Cluster correction for partial correlation on searchlight map

Sebastian · July 9, 2018, 9:04am

Dear experts,

I am not sure whether this is the right forum for this question, so if not I hope someone could give me a suggestion as to where else to post it.
I have run a similarity searchlight (correlation) looking at the difference between two conditions (Ultimatum Game vs Dictator Game) on 31 one subjects from two different cohorts. As a next step I am correlating the dissimilarity scores at each voxel with the average difference in behavior between the two conditions, while controlling for the group membership (cohorts) as a dummy variable. I am now wondering how to implement cluster correction for this analysis? Is the only viable way the Stelzer algorithm (http://mvpa.blogspot.com/2012/10/permuting-searchlight-maps-stelzer.html) or are there other alternatives? I have considered for instance fsl randomize or just the simple cluster function?

Thanks for your help,

Sebastian

effigies · July 9, 2018, 2:50pm

Hi Sebastian,

You may want to consider a partially-parametric approach, for the sake of computational and storage complexity. For my MVPA studies, I settled on a hybrid of Lee, et al. (2012) and Stelzer, et al. (2013), which ended up being slightly more conservative than Stelzer but much cheaper in terms of resources and time. I used cross-validation accuracy (CVA) as the figure of merit, not a correlation measure, but I believe Stelzer et al, did, as well, so you’ve presumably already considered how to adapt that. I also adapted their approach of clusters to contiguous clusters on the surface, but I don’t believe it’s integral to the statistical idea.

Probably best to just quote my dissertation, rather than try to reconstruct it here, 2 years later:

For each classification result, each subject’s accuracy map was registered to the FreeSurfer fsaverage template for group analysis. A map of relative accuracy was created by subtracting the mean cross-validation accuracy across vertices from each vertex, such that the resulting image had a mean of zero. One-tailed
t-tests were used to assess whether individual vertices had consistently high relative accuracy across sub-
jects (Lee et al., 2012). The map was subjected to a p < 0.05 (uncorrected) threshold, and we defined clusters as connected subgraphs (of supra-threshold vertices) on the surface, which were used for further inference.

To perform cluster-extent based thresholding, we generated a null distribution of chance cluster sizes. For each subject, we permuted class labels 100 times and retrained classifiers to generate random accuracy maps (see Stelzer et al., 2013); each such map was registered to the fsaverage template. Choosing one random accuracy map from each subject and subjecting it to the thresholding process described above produced a set of cluster sizes. In this way, we constructed 10^4 sets of empirical chance cluster sizes, providing a null-distribution for assigning cluster-level significance.

A similar description should also appear in doi:10.1016/j.neuroimage.2016.07.023, but I don’t have journal access at the moment to excerpt instead.

(Dissertation here; Section 2.2.7.1 excerpted above.)

Sebastian · July 10, 2018, 12:18pm

Hi Chris,

Thanks a lot for your help, so if I understand correctly, instead of creating a 100,000 averaged group searchlight maps and running permutation tests on these you conducted t-tests on the demeaned accuracy maps for the original data, which is adapted from Lee (2012) and created a null distribution of t-maps by retraining classifiers on 100 permuted class labels resulting in 100 rnadom SL maps for each subject and then randomly picking one from each subject to conduct the voxel wise t-test, which is adapted from Stelzer. Is this right? So you still have to create 100 SLs for each subject, which should be the most time consuming part? I am probably missing something, but I don’t fully understand at which part time and storage space is saved?

Instead of creating 100 SL maps for each subjects would it also be appropriate to shuffle the behavioral variable, the one I’m correlating with the similarity scores, 10 000 times to get the 10 000 random t-maps?

Regarding the use of correlation instead of accuracies, I assumed that I could convert the correlations I get from correlating the behavior to the similarity scores at each voxel to tstats and then proceed the same way as you did?

Thanks again a lot for your help,

Sebastian

effigies · July 10, 2018, 1:42pm

Hi Sebastian,

Thanks a lot for your help, so if I understand correctly, instead of creating a 100,000 averaged group searchlight maps and running permutation tests on these you conducted t-tests on the demeaned accuracy maps for the original data, which is adapted from Lee (2012)

Correct.

and created a null distribution of t-maps by retraining classifiers on 100 permuted class labels resulting in 100 rnadom SL maps for each subject and then randomly picking one from each subject to conduct the voxel wise t-test

Yes, from the pool of 100 permutations per subject, I then constructed 10^4 random group maps by selecting a random permutation from each subject.

So you still have to create 100 SLs for each subject, which should be the most time consuming part?

It’s the embarrassingly parallel part, so running the permutations on a couple dozen subjects, across 9 analyses took about a week on the cluster. So yes, in terms of CPU time, that’s probably the heavier lift, but in terms of wall time and attention, it was extremely low effort.

What I found to be the real chore of Stelzer was building the per-vertex (or voxel) histograms for 10^5 random maps. This is not as easily parallelizable, and in my experience took several days of time with a dedicated server. The choice of surface mesh (fsaverage, i.e., 164k vertices per hemisphere) was one problem, but a bit more fundamentally, if any input changes, the entire calculation must be re-performed.

This is also the space issue: constructing a per-voxel histogram means 100000 chance maps, which are then concatenated and sorted (uncompressed, since you’ll almost certainly need to take advantage of memory mapping just to do the job). Supposing you never want to set your voxel threshold at p > 0.05, you still need to keep 500 maps, once it’s done.

By skipping the per-vertex(/voxel) histogram stage, you fail to model the spatial inhomogeneity of the null distribution of your figure of merit, but you save yourself some fiddly calculation that, in my estimation, didn’t buy anything in interpretability.

Instead of creating 100 SL maps for each subjects would it also be appropriate to shuffle the behavioral variable, the one I’m correlating with the similarity scores, 10 000 times to get the 10 000 random t-maps?

Regarding the use of correlation instead of accuracies, I assumed that I could convert the correlations I get from correlating the behavior to the similarity scores at each voxel to tstats and then proceed the same way as you did?

I’m hesitant to give advice here, as this isn’t an MVPA approach I’ve used. But also, please don’t feel the need to coerce your figures into t-statistics, just to use my approach so precisely. The overall gist I’d like to put forward is: if you can calculate an empirical thresholded map, and sufficiently many (e.g. 10^4) random thresholded maps, you can get a cluster size null distribution without going through a per-voxel/vertex null distribution.

It sounds like you already have an empirical map, and are now looking to threshold it appropriately. That puts you in a pretty good place, so the problem now is how to appropriately shuffle labels to construct a proper permutation test. Going back to your first post:

I have run a similarity searchlight (correlation) looking at the difference between two conditions (Ultimatum Game vs Dictator Game) on 31 one subjects from two different cohorts. As a next step I am correlating the dissimilarity scores at each voxel with the average difference in behavior between the two conditions, while controlling for the group membership (cohorts) as a dummy variable.

My intuition would be to shuffle the ultimatum/dictator game labels, and go from there. The idea is that you want to destroy the connection between a real brain state and the condition label. I don’t think shuffling the behavioral variable will be have the same effect. But I’ve given this only a couple minutes’ thought.

Sebastian · July 11, 2018, 8:45am

Ok, thanks! Your comments have been very helpful. The only thing that is not entirely clear to me is how you get the cluster size null distribution. Would you have some could available that you would be willing to share that I could use as a guideline?

Thanks for your help!

effigies · July 11, 2018, 9:57pm

The only thing that is not entirely clear to me is how you get the cluster size null distribution.

If you can create an empirical map with clusters (e.g., by thresholding), then you should perform the exact same process, just starting with random individual subject maps.

You’re going to need to establish cluster size. On the surface, you can take connected subgraphs of the surface mesh as your cluster, and then choose a metric, e.g. vertex count or maximum geodesic distance between any pair of vertices in the cluster, etc. It’s probably worth thinking through a few different metrics. Volumetric clusters are generally simpler, as you can just take the number of voxels in the cluster, which directly corresponds to volume. I suspect there are tools for identifying clusters in the volume, but I can’t think of them off the top of my head.

Once you have a method for calculating the set of cluster sizes in a thresholded map, you simply apply that to each of your random group-level maps, and concatenate all the sizes into one array, which is a big pile of cluster sizes that have been found by chance. Supposing you have a sorted array of sizes hist and some target p value p:

n = len(hist)
threshold = hist[int(n * (1 - p))]

(That may be too liberal by one index… You should probably check my math.)

Would you have some could available that you would be willing to share that I could use as a guideline?

I can try to dig some up, but it may take me a little time. Maybe ping me in a week if I haven’t gotten back to you?

Sebastian · July 18, 2018, 3:35pm

Hi Chris,

Thanks for your help again. Have you been able to find the code?

Thanks,

Sebastian

effigies · July 18, 2018, 6:58pm

Hi, sorry, I’ve been in the hospital. I’ll try to get to this soon.

Sebastian · July 24, 2018, 8:42am

Oh, I am sorry to hear. I hope you will get well soon!

effigies · July 24, 2018, 2:29pm

Hi Sebastian,

I managed to dig up that code. It’s a bit hard to follow in the original files, just because the functions are spread over a few files and several analyses are represented in each. I’ve curated it here:

gist.github.com

https://gist.github.com/effigies/0267812074c8eb49492ac0584d6730c1

partial_permutation_testing.md

# Partially-parametric permutation testing

Cluster thresholding in multi-voxel pattern analysis (MVPA) is an open problem, as many figures of merit are possible, few
(if any) of which have been sufficiently analyzed to permit a parametric solution or a guarantee of compatibility with
pre-computed simulations. [Stelzer, et al. 2012](https://doi.org/10.1016/j.neuroimage.2012.09.063) represents probably the
most conservative approach, constructing voxel-wise and then cluster-wise null distributions at the group level, based on
permuting the training labels at the individual level.

In [Mapping the cortical representation of speech sounds in a syllable repetition
task](https://doi.org/10.1016/j.neuroimage.2016.07.023), we adapted this approach to skip the voxel-wise null distribution,

This file has been truncated. show original

Some of it can be taken whole, while other functions will need adapting to be usable on a different computer or different study, but it should be enough to follow along and compare to your attempts.

Sebastian · August 8, 2018, 1:19pm

Thank you very much!!