What is the best way to get a beta map for the between-run average of a trial type to use in RSA?

Long title, but hopefully an easy question! :upside_down_face: (if you don’t need the details, the question is right below this)

My question:
To conduct RSA on a task consisting of stimuli drawn from multiple trial types which are distributed equally across multiple runs, what is the best way to get an average representation of voxel activity for each trial type, which could then be used to create an RDM (comparing trial types to one another) for each subject?

Potential options:
I thought of three options for proceeding: each would generate one RDM per subject that compares activity for each trial type. As a note, for a given run, each individual stimulus is effectively a different trial type - basically, each stimulus shares a feature which varies along a discrete range of values, and each value is presented once per run. I want to ask questions regarding the discrete feature values.

  1. Create an RDM for each run using single-trial beta estimates for each stimulus (and therefore each trial type) as input, then average those RDMs.
  2. Separately average single-trial beta estimates for each stimulus (trial type) across runs, then use those averages to create the RDM.
  3. Create a single model with all runs, and use that to generate a contrast for each trial type which would contain the average of the data from all the runs; this would be used for RDM construction. For example, to isolate one trial type from a task containing two runs and four trial types, the contrast would look something like [.5 0 0 0 .5 0 0 0].
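
To make the three options concrete, here is a minimal NumPy sketch. It uses randomly generated stand-in data, not real betas. The array layout `betas[run, trial_type, voxel]`, the correlation-distance metric, and the helper names `rdm` and `average_contrast` are all assumptions made for illustration. The contrast helper further assumes the design matrix columns are ordered run by run, with no nuisance regressors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_types, n_voxels = 4, 20, 50

# Hypothetical single-trial beta estimates: betas[run, trial_type, voxel]
betas = rng.standard_normal((n_runs, n_types, n_voxels))


def rdm(patterns):
    """Correlation-distance RDM (1 - Pearson r) between the rows of `patterns`."""
    return 1.0 - np.corrcoef(patterns)


# Option 1: build one RDM per run, then average the RDMs.
rdm_option1 = np.mean([rdm(betas[r]) for r in range(n_runs)], axis=0)

# Option 2: average the betas across runs, then build a single RDM.
rdm_option2 = rdm(betas.mean(axis=0))


def average_contrast(k, n_runs, n_types):
    """Option 3: contrast that averages trial type k over all runs.

    Assumes columns are ordered run by run, trial types in the same
    order within each run, and no nuisance columns.
    """
    c = np.zeros(n_runs * n_types)
    c[k::n_types] = 1.0 / n_runs
    return c


# Two runs, four trial types, first trial type: weights of 0.5 at positions 0 and 4.
print(average_contrast(0, 2, 4))
```

Applied to stacked per-run betas, the option 3 contrast gives the same pattern as the option 2 average, so in this simplified setting the two differ only in how the model is estimated.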

Pros and cons:

  1. This seems like it would introduce issues, such as the temporal correlation you get from within-run analyses, that I could avoid by averaging across runs. On the other hand, due to the design of the task, most of the interesting data may be present only in the first run, and averaging across runs would reduce our power to detect an effect and limit the kinds of questions we can ask.
  2. This one just seems like an odd mix of #1 and #3. It also seems statistically inappropriate with regard to accounting for error/variance, maybe?
  3. I currently favor this option slightly more than #1. However, in this 2008 Kriegeskorte paper, it seems like they concatenated runs in order to average stimuli together rather than modeling each run separately. Maybe I’m misunderstanding, though!

Task explanation:
The goal is to determine whether stimuli in a two-category learning task are represented in a binary-like manner, strictly by category, or more on a continuum (i.e., greater neural pattern similarity closer to the category decision bound). The stimuli contained some number of dots (7-16 or 18-27), and additionally varied on other parameters (to make each stimulus unique). Each dot bin was treated as a trial type.
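
The two hypotheses can be written down as model RDMs to compare against each subject's neural RDM. The sketch below is only one possible formalization. The bound at 17 dots is an assumption (it is the gap between the two ranges), and the "continuum" model here is a plain numerical-distance model, which simplifies the hypothesis described above. The variable names are hypothetical.

```python
import numpy as np

# The 20 dot bins described above: 7-16 and 18-27.
dots = np.array(list(range(7, 17)) + list(range(18, 28)))

# Assumed category decision bound, placed in the gap between the ranges.
bound = 17
category = (dots > bound).astype(int)

# Categorical (binary) model: 0 within a category, 1 between categories.
category_rdm = (category[:, None] != category[None, :]).astype(float)

# Continuum model (one simple choice): dissimilarity grows with the
# difference in dot count, scaled so the largest value is 1.
continuum_rdm = np.abs(dots[:, None] - dots[None, :]).astype(float)
continuum_rdm /= continuum_rdm.max()

# Off-diagonal upper-triangle entries, the usual vector form for
# comparing a model RDM with a neural RDM.
upper = np.triu_indices(len(dots), k=1)
category_vec = category_rdm[upper]
continuum_vec = continuum_rdm[upper]
```

Each model vector could then be rank-correlated with the matching entries of a subject's neural RDM.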

I collected four runs of data for each of two different phases (training and testing), and in each run, a stimulus from each dot bin was presented once. That left me with (for each phase), for example, four stimuli containing 7 dots, four containing 8 dots, etc.

My primary analysis of interest concerns how category is represented in the first run or two of the testing phase; however, I’m also interested in how this feature is represented in general.

Your thoughts are always appreciated!!

IIUC, what you need is to compute the fixed effects across sessions to compute RDMs.
Option 2 is a coarse but valid way to obtain such a fixed-effects estimate. If the final goal is only to compute RDMs, I think this is more than enough. If you have effect size and variance estimates for each run, you can do better; see e.g. the Nilearn documentation (Nilearn: Statistical Analysis for NeuroImaging in Python).
Option 3 is another, less efficient version of option 2.
I would a priori discourage you from using option 1.
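
As a rough illustration of what combining per-run effect size and variance estimates can look like, here is a minimal NumPy sketch of a fixed-effects combination across runs. The function name `fixed_effects` and the `(n_runs, n_voxels)` array layout are assumptions for this example. Nilearn provides `nilearn.glm.compute_fixed_effects` for image inputs; check its documentation for the exact behavior rather than relying on this sketch.

```python
import numpy as np


def fixed_effects(effects, variances, precision_weighted=True):
    """Combine per-run effect and variance maps into one estimate per voxel.

    effects, variances: array-likes of shape (n_runs, n_voxels).
    Runs are assumed to be independent.
    """
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if precision_weighted:
        # Weight each run by the inverse of its variance.
        weights = 1.0 / variances
        fx_variance = 1.0 / weights.sum(axis=0)
        fx_effect = (weights * effects).sum(axis=0) * fx_variance
    else:
        # Plain average across runs (what option 2 does with the betas).
        n_runs = effects.shape[0]
        fx_effect = effects.mean(axis=0)
        fx_variance = variances.sum(axis=0) / n_runs**2
    return fx_effect, fx_variance


# Toy example: two runs, two voxels.
effects = [[1.0, 2.0], [3.0, 2.0]]
variances = [[1.0, 1.0], [1.0, 4.0]]
weighted_effect, weighted_variance = fixed_effects(effects, variances)
plain_effect, plain_variance = fixed_effects(effects, variances, precision_weighted=False)
```

When all runs have equal variance, the two versions give the same effect estimate; they differ when some runs are noisier than others.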

Thank you for your clear response!