I have a dataset of n subjects, with 347 stimuli and activity in 8 regions. I have a representation (embedding) of the stimuli, and I want to apply RSA to check the relation between neural pattern similarity and that representation in each of the regions (to find the region whose activity patterns most resemble the embeddings).
My data is in the form of a pandas dataframe with the columns:
subject stimuli_embedding region1_activity region2_activity... region6_activity
What would be the best way to apply RSA (with Python) to these data?
Is what you call region2_activity the average activity within the region, or a set of values for all voxels in the region?
Can you please explain why I can’t run RSA with average values? Can’t I run it between subjects?
You can still run it formally, but it won’t be informative. The information that you keep from each region is just one scalar value, the amount of activity, which is very little to characterize the pattern of activity of that region and compare it to some metric of stimulus relatedness.
In that case, doing it across subjects does not help.
The only thing that makes sense to me is to use the region-level averages as input to a global classification/regression method.
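To make the point about scalar averages concrete, here is a minimal sketch on simulated data (not the dataset discussed in this thread). The sizes, the noise level, and the names `embedding`, `patterns` and `region_mean` are all assumptions made for illustration. It builds a model dissimilarity matrix from a stimulus embedding and compares it, using a Spearman correlation, with a neural dissimilarity matrix computed either from full voxel patterns or from the regional average alone.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli, n_dims, n_voxels = 40, 5, 30  # hypothetical sizes

# Hypothetical stimulus embedding, one row per stimulus.
embedding = rng.normal(size=(n_stimuli, n_dims))

# Simulated voxel patterns for one region: a linear mix of the embedding plus noise.
mixing = rng.normal(size=(n_dims, n_voxels))
patterns = embedding @ mixing + 0.5 * rng.normal(size=(n_stimuli, n_voxels))

# Regional average: a single scalar per stimulus.
region_mean = patterns.mean(axis=1, keepdims=True)

# Condensed dissimilarity matrices (upper triangle of each RDM).
model_rdm = pdist(embedding, metric="euclidean")
pattern_rdm = pdist(patterns, metric="euclidean")
mean_rdm = pdist(region_mean, metric="euclidean")  # reduces to |a_i - a_j|

# Second-order comparison: rank correlation between model and neural RDMs.
rho_pattern, _ = spearmanr(model_rdm, pattern_rdm)
rho_mean, _ = spearmanr(model_rdm, mean_rdm)

print(f"RSA with voxel patterns:  rho = {rho_pattern:.2f}")
print(f"RSA with regional average: rho = {rho_mean:.2f}")
```

With a single value per stimulus, the neural dissimilarity can only express a difference in overall activity between two stimuli, so in a simulation like this one the match to the model is usually much weaker than with the full pattern.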
@bthirion Thanks. Can I do a between-subject similarity analysis? For example, between each of the 8 per-region activity vectors of size [1 x n] and the stimuli embedding?
That way, for example, I could see that there is a high correlation between region #1 activity and the embedding, but only chance-level correlation between region #2 and the embedding, which would suggest that region #1 represents the data better?
I think that comparing the similarity of within-region activation across subjects to “stimuli” would make sense if the output variables were not actually stimuli but some subject-related quantity, like a behavioral score, response time or any other individual characteristic.
Here I don’t really understand what you would infer from it. Sorry if I misunderstood.
@bthirion Thank you very much. The idea is to change the parameters of the embedding to see which setting correlates better with the brain data. The parameters that fit best (e.g. giving more importance to specific features in the representation) would then plausibly resemble more closely the way these data are cognitively represented, so we can learn something about the representation in the brain.
Does it make sense?
Yes, but you can get this type of information by regression: which feature(s) best explain(s) brain activity. This is called an “encoding model” in the literature.
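As a rough illustration of what such an encoding model can look like, here is a minimal sketch on simulated data. It assumes one row per stimulus (for instance after averaging activity over subjects, which is an assumption and not something stated in this thread), and the names `X`, `y_region1`, `y_region2`, `ridge_fit` and `cv_score` are hypothetical. The ridge regression is written in closed form with NumPy only; a library estimator such as scikit-learn's `Ridge` would be the more usual choice in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_features = 200, 10  # hypothetical sizes

X = rng.normal(size=(n_stimuli, n_features))         # stimulus embedding
true_w = rng.normal(size=n_features)
y_region1 = X @ true_w + rng.normal(size=n_stimuli)  # region driven by the features
y_region2 = rng.normal(size=n_stimuli)               # region unrelated to the features


def ridge_fit(X, y, alpha):
    """Closed-form ridge weights: (X'X + alpha * I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def cv_score(X, y, alpha=1.0, n_folds=5):
    """Mean correlation between held-out predictions and observed activity."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    scores = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Center with training statistics only, to avoid leaking test data.
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        w = ridge_fit(X[train] - x_mean, y[train] - y_mean, alpha)
        pred = (X[test_idx] - x_mean) @ w + y_mean
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(scores))


score_region1 = cv_score(X, y_region1)
score_region2 = cv_score(X, y_region2)
print(f"region 1: r = {score_region1:.2f}, region 2: r = {score_region2:.2f}")
```

Comparing the held-out score across regions, or across versions of the embedding, addresses the same question as the proposed similarity analysis: which representation best explains activity in which region.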
@bthirion Thank you for your answers.
I am aware of encoding models and have read quite a few of these papers, but:
- I fail to understand why this is the same information.
- Doesn’t the encoding model have many more degrees of freedom that are not present in what I suggested (e.g. the regression model with its parameters, the evaluation metric)?
- Doesn’t the encoding model assume that the features themselves have meaning, which would make it unsuitable for embedding methods such as PCA, VAE, BERT (for sentences), etc., in which the representation vector is meaningful but each feature by itself is meaningless?
1. Well, the most precise response to the question “what stimulus features explain activity in region XXX?” is given by an encoding model, which explicitly tests that. RSA is a slightly more indirect way to answer the question. Note that the two are related, see e.g.
But RSA has even more degrees of freedom (searchlight model, dissimilarity model, comparison statistic etc.)
With encoding you can also test groups of features, if you believe that each of them, taken separately, is not meaningful.
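One possible way to test groups of features, sketched here on simulated data, is to compare the held-out fit of the full model with the fit obtained after removing one group. The two groups, the fixed weights, the train/test split and the helper `heldout_r2` are all hypothetical choices made for this example; it is only one of several ways to assess a group of features.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stimuli = 300  # hypothetical size

# Two hypothetical feature groups, e.g. two blocks of embedding dimensions.
group_a = rng.normal(size=(n_stimuli, 4))
group_b = rng.normal(size=(n_stimuli, 4))
X = np.hstack([group_a, group_b])
groups = {"a": slice(0, 4), "b": slice(4, 8)}

# Simulated regional activity that depends on group "a" only.
y = group_a @ np.array([1.0, -1.0, 0.5, 2.0]) + rng.normal(size=n_stimuli)

train, test = np.arange(0, 200), np.arange(200, n_stimuli)


def heldout_r2(X, y, alpha=1.0):
    """Fit ridge on the training stimuli and return R^2 on the test stimuli."""
    x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
    Xc = X[train] - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]),
                        Xc.T @ (y[train] - y_mean))
    pred = (X[test] - x_mean) @ w + y_mean
    ss_res = np.sum((y[test] - pred) ** 2)
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return 1.0 - ss_res / ss_tot


full_r2 = heldout_r2(X, y)

# Unique contribution of a group: drop in held-out R^2 when it is removed.
unique = {}
for name, cols in groups.items():
    keep = np.ones(X.shape[1], dtype=bool)
    keep[cols] = False
    unique[name] = full_r2 - heldout_r2(X[:, keep], y)

print(f"full model R^2 = {full_r2:.2f}")
print({name: round(value, 2) for name, value in unique.items()})
```

Here no individual dimension needs to be interpretable: the question is only whether the group as a whole improves prediction of held-out activity.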
Thanks, I get your point on why they are conceptually the same / answer the same research question. But does building an encoding model contradict the similarity analysis or make it redundant? Isn’t it possible to come up with a finding in one but not in the other?
No, my point is rather that these are two formulations of the same problem. Practically, I would not expect strong differences between the results of RSA and encoding analyses.