Hi Sebastian,

Thanks a lot for your help. So if I understand correctly, instead of creating 100,000 averaged group searchlight maps and running permutation tests on these, you conducted t-tests on the demeaned accuracy maps for the original data, which is adapted from Lee (2012)

Correct.

and created a null distribution of t-maps by retraining classifiers on 100 permuted class labels, resulting in 100 random SL maps for each subject, and then randomly picking one from each subject to conduct the voxel-wise t-test

Yes, from the pool of 100 permutations per subject, I then constructed 10^4 random group maps by selecting a random permutation from each subject.
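A minimal NumPy sketch of that resampling step, not the code actually used. The names `perm_maps` and `random_group_tmaps` are hypothetical, and it assumes the permuted maps are already demeaned, so the one-sample t-test is against zero:

```python
import numpy as np

def random_group_tmaps(perm_maps, n_group_maps, seed=0):
    """Build random group t-maps from per-subject permutation maps.

    perm_maps: array of shape (n_subjects, n_perms, n_voxels), assumed
    to hold demeaned maps so the null hypothesis is a mean of zero.
    """
    rng = np.random.default_rng(seed)
    n_subjects, n_perms, n_voxels = perm_maps.shape
    subj_idx = np.arange(n_subjects)
    tmaps = np.empty((n_group_maps, n_voxels))
    for i in range(n_group_maps):
        # Draw one permutation index per subject, independently.
        picks = rng.integers(0, n_perms, size=n_subjects)
        sample = perm_maps[subj_idx, picks]  # (n_subjects, n_voxels)
        # Voxel-wise one-sample t-statistic against zero.
        sem = sample.std(axis=0, ddof=1) / np.sqrt(n_subjects)
        tmaps[i] = sample.mean(axis=0) / sem
    return tmaps
```

With 100 permutations per subject, `random_group_tmaps(perm_maps, 10_000)` would give the 10^4 random group maps; only the cheap resampling is repeated, not the searchlights.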

So you still have to create 100 SLs for each subject, which should be the most time consuming part?

It’s the embarrassingly parallel part, so running the permutations on a couple dozen subjects across 9 analyses took about a week on the cluster. So yes, in terms of CPU time, that’s probably the heavier lift, but in terms of wall time and attention, it was extremely low effort.

What I found to be the real chore of Stelzer was building the per-vertex (or voxel) histograms for 10^5 random maps. This is not as easily parallelizable, and in my experience took several days on a dedicated server. The choice of surface mesh (`fsaverage`, *i.e.*, 164k vertices per hemisphere) was one problem, but a bit more fundamentally, if any input changes, the entire calculation must be re-performed.

This is also the space issue: constructing a per-voxel histogram means 100,000 chance maps, which are then concatenated and sorted (uncompressed, since you’ll almost certainly need to take advantage of memory mapping just to do the job). Supposing you never want to set your voxel threshold at p > 0.05, you still need to keep the top 5% of the sorted maps (5,000 of them) once it’s done.
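To make the bookkeeping concrete, here is a small in-memory sketch of the sort-and-keep-the-tail step. The function names are hypothetical, and a real run at this scale would need memory-mapped arrays rather than a single `np.sort` call:

```python
import numpy as np

def upper_tail(chance_maps, alpha=0.05):
    """Sort chance values per voxel and keep only the top alpha fraction.

    chance_maps: array of shape (n_maps, n_voxels).
    Returns an array of shape (n_keep, n_voxels), ascending per voxel.
    """
    n_maps = chance_maps.shape[0]
    n_keep = int(round(alpha * n_maps))
    return np.sort(chance_maps, axis=0)[-n_keep:]

def voxel_threshold(tail, p, n_maps):
    """Per-voxel value reached or exceeded by a fraction p of chance maps."""
    k = int(round(p * n_maps))
    if not 1 <= k <= tail.shape[0]:
        raise ValueError("p is outside the range covered by the stored tail")
    return tail[-k]
```

Any threshold up to `alpha` can be read back from the stored tail, but a change to any input means re-sorting everything.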

By skipping the per-vertex(/voxel) histogram stage, you fail to model the spatial inhomogeneity of the null distribution of your figure of merit, but you save yourself some fiddly calculation that, in my estimation, didn’t buy anything in interpretability.

Instead of creating 100 SL maps for each subject, would it also be appropriate to shuffle the behavioral variable, the one I’m correlating with the similarity scores, 10 000 times to get the 10 000 random t-maps?

Regarding the use of correlation instead of accuracies, I assumed that I could convert the correlations I get from correlating the behavior to the similarity scores at each voxel to t-statistics and then proceed the same way as you did?

I’m hesitant to give advice here, as this isn’t an MVPA approach I’ve used. But also, please don’t feel the need to coerce your figures into t-statistics, just to use my approach so precisely. The overall gist I’d like to put forward is: if you can calculate an empirical thresholded map, and sufficiently many (e.g. 10^4) random thresholded maps, you can get a cluster size null distribution without going through a per-voxel/vertex null distribution.
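A rough sketch of that gist for volumetric (voxel) data, using `scipy.ndimage.label` with its default face connectivity. The function names are hypothetical, and taking the largest cluster in each random map as the null is an assumption made here for simplicity; pooling all cluster sizes is another option:

```python
import numpy as np
from scipy import ndimage

def max_cluster_size(mask):
    """Size in voxels of the largest connected cluster in a boolean mask."""
    labels, n_clusters = ndimage.label(mask)
    if n_clusters == 0:
        return 0
    # bincount index 0 is the background, so drop it.
    return int(np.bincount(labels.ravel())[1:].max())

def cluster_size_pvalues(empirical_mask, random_masks):
    """Compare empirical cluster sizes to a max-cluster-size null.

    empirical_mask: boolean array, the thresholded empirical map.
    random_masks: iterable of boolean arrays, the thresholded random maps.
    Returns (sizes, pvals), one entry per empirical cluster.
    """
    null = np.array([max_cluster_size(m) for m in random_masks])
    labels, _ = ndimage.label(empirical_mask)
    sizes = np.bincount(labels.ravel())[1:]
    # Add-one correction so p-values are never exactly zero.
    pvals = np.array([(1 + np.sum(null >= s)) / (1 + null.size) for s in sizes])
    return sizes, pvals
```

Nothing here needs a per-voxel null distribution: each random map is thresholded the same way as the empirical one and reduced to a single number. Surface data would need a mesh-aware clustering step in place of `ndimage.label`.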

It sounds like you already have an empirical map, and are now looking to threshold it appropriately. That puts you in a pretty good place, so the problem now is how to appropriately shuffle labels to construct a proper permutation test. Going back to your first post:

I have run a similarity searchlight (correlation) looking at the difference between two conditions (Ultimatum Game vs Dictator Game) on 31 subjects from two different cohorts. As a next step I am correlating the dissimilarity scores at each voxel with the average difference in behavior between the two conditions, while controlling for the group membership (cohorts) as a dummy variable.

My intuition would be to shuffle the ultimatum/dictator game labels, and go from there. The idea is that you want to destroy the connection between a real brain state and the condition label. I don’t think shuffling the behavioral variable will have the same effect. But I’ve given this only a couple minutes’ thought.