Maximizing sensitivity of the MVPA analysis

Dear Martin and all,

I would like to maximize the sensitivity of my decoding analysis pipeline.

I have four fMRI blocks/runs per participant. Each run contains 60 trials: 30 trials of stimulus A and 30 trials of stimulus B, for a total of 240 stimulus trials per participant. We used sparse fMRI for the auditory stimuli, with a TR of 3000 ms and a TA of 1000 ms. Participants had to indicate after each trial whether they perceived A or B. In a first step of the analysis, I would like to decode stimulus identity. The stimuli were ambiguous, and in a second step we would like to decode the response/decision made by the participant, in order to differentiate the areas required for decision making from the areas implicated in sensory stimulus processing.

In an earlier analysis on another dataset, I used a first-level design including one beta weight per stimulus trial (48 stimulus trials per participant in total) as predictors.

Now I have 240 trials per participant (cross-validation design with two folds), and it takes almost 7 hours to compute the accuracy map of a single participant. I foresee that permutation testing will take too much computation time.

I wanted to ask for your advice. How many beta weights would you define? How would you group/chunk the stimulus trials for the betas?

I just read

Sohoglu, Ediz, Sukhbinder Kumar, Maria Chait, and Timothy D. Griffiths. “Multivoxel Codes for Representing and Integrating Acoustic Features in Human Cortex.” BioRxiv, August 9, 2019, 730234. https://doi.org/10.1101/730234.

And I was wondering whether their approach using cross-validated multivariate analysis of variance (Allefeld and Haynes, 2014) would be more appropriate for my classification problems, perhaps also looking at the interaction between stimulus identity and decision?

Many thanks for your advice

Best regards,

Basil

Hi Basil,

There are many ways of speeding up an analysis. The easiest are probably not to upsample your data (which some people do during preprocessing) and to use fewer samples by averaging data of the same label. Larger searchlights also help a lot when you have many samples (at the cost of lower spatial specificity), as does a slightly different regularization (e.g. try c = 0.1 or 0.01, which shouldn’t affect generalization a lot). There are classifiers that don’t use the covariance and can be calculated in parallel (e.g. Gaussian Naive Bayes - not implemented in TDT though :( ). Liblinear, which is implemented, can also be faster for such simple problems. I also have an alpha version of a fast SVM (Pegasos SVM) that was about 2x as fast in my case, which you can try if you would like (just shoot me an email).
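
For orientation, here is a minimal sketch of where these settings live in the cfg, following the usual TDT templates and decoding_defaults.m (the radius and parameter strings are placeholders to adapt; averaging data of the same label happens in your first-level design rather than in the cfg):

cfg = decoding_defaults;
cfg.analysis = 'searchlight';

% example: a larger searchlight radius (at the cost of spatial specificity)
cfg.searchlight.unit = 'voxels';
cfg.searchlight.radius = 4;

% example: weaker regularization for libsvm (set the field that matches your
% cfg.decoding.method, i.e. classification vs. classification_kernel)
cfg.decoding.software = 'libsvm';
cfg.decoding.train.classification.model_parameters = '-s 0 -t 0 -c 0.1 -b 0 -q';

% alternative: liblinear, which can be faster for simple linear problems
% (it expects a different parameter string; see decoding_defaults.m)
% cfg.decoding.software = 'liblinear';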

In the ideal case, I would use only one beta per run, assuming there is not much to be gained by the classifier learning the covariance structure of the data themselves (which is difficult anyway, considering how few trials we are usually dealing with). That sounds like very little, and it indeed prevents the use of permutation testing. But valid permutation testing is difficult anyway, given that the only valid way to exchange labels within a run is to exchange all of them (i.e. 2 valid permutations per run).

I’m a great fan of their approach, specifically when you have multiple factors, which is very difficult to analyze using decoding approaches. Again, the number of runs would limit the number of permutations, and the standard approach is not the same as cross-classification. I know Carsten implemented a version with out-of-sample cross-classification, but I’m not sure whether it’s on his website - you would have to ask him.

If you wanted to mimic the effect of CV-MANOVA using our toolbox, you could use the crossnobis distance as an estimate instead and then use cross-classification. With this approach, you would prewhiten your data using the residuals of the first-level model (see decoding_template_crossnobis.m for how to do this). Then you would use make_design_xclass or make_design_xclass_cv (the latter would leave one run out, the former wouldn’t).
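
Just to make the design part concrete, here is a rough sketch with hypothetical regressor names (the prewhitening/crossnobis settings themselves are spelled out in decoding_template_crossnobis.m, so only the cross-classification design is shown; if I remember the signature correctly, decoding_describe_data takes the xclass assignment as an additional argument):

cfg = decoding_defaults;                    % or your cfg from the crossnobis template
beta_loc = '/path/to/first_level';          % placeholder SPM first-level directory
regressor_names = design_from_spm(beta_loc);

% hypothetical labels: train on stimulus identity (set 1), test on decision (set 2)
labelnames = {'stimA' 'stimB' 'respA' 'respB'};
labels     = [ 1 -1  1 -1];
xclass     = [ 1  1  2  2];
cfg = decoding_describe_data(cfg, labelnames, labels, regressor_names, beta_loc, xclass);

cfg.design = make_design_xclass_cv(cfg);    % leaves one run out; make_design_xclass would not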

Again, there is the same issue with permutation testing. The only way to get around this is to artificially split up your runs into mini-runs and assume they are somewhat independent from each other, but that brings other problems and makes balancing very difficult. It’s hard to know what happens when you just leave out one of your mini-runs: data from the same physical run will be more similar to the test data, and since less of that run will be part of the training set (because you took out part of it for the test set), this can lead to weird effects. Ideally, you would then leave out one mini-run per physical run (i.e. leave-4-out), but maybe earlier time points in each run are different from later time points? It’s not easy to do things right for within-subject permutation testing.

Best,
Martin

Hi Martin,

Thanks a lot for your quick and comprehensive response. To sum up, you would either model the betas at the single-trial level or, if that is not feasible due to computational constraints, model one beta per stimulus per run, correct?

How do I set the regularization parameter in TDT?

Best,

Basil

There are cases for both, and the power can sometimes even be higher with the latter approach. If I have a sufficient number of runs, I tend to use the latter approach; with fewer runs, the former.

For libsvm (these are the default parameters) you would change c:
cfg.decoding.train.classification.model_parameters = '-s 0 -t 0 -c 1 -b 0 -q';
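
So lowering the regularization to c = 0.1, for example, would just mean changing the -c flag:

cfg.decoding.train.classification.model_parameters = '-s 0 -t 0 -c 0.1 -b 0 -q';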

See decoding_tutorial.m or decoding_defaults.m.

Best,
Martin

Another alternative might be to change the voxels the analysis runs on - you’re running a searchlight analysis? Have you considered defining ROIs instead? If you’re interested in a set of brain regions (e.g., auditory cortex), carrying out the analyses within those regions can massively increase speed and interpretive power.
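
For reference, a minimal sketch of what such an ROI setup looks like in TDT (mask file names and labels are placeholders; the data description and design are built exactly as in the searchlight case):

cfg = decoding_defaults;
cfg.analysis = 'ROI';
cfg.files.mask = {'/path/to/auditory_left.nii', '/path/to/auditory_right.nii'};  % placeholder ROI masks
cfg.results.dir = '/path/to/results_roi';                                        % placeholder output directory

beta_loc = '/path/to/first_level';                                               % placeholder first-level directory
regressor_names = design_from_spm(beta_loc);
cfg = decoding_describe_data(cfg, {'stimA' 'stimB'}, [1 -1], regressor_names, beta_loc);  % hypothetical labels
cfg.design = make_design_cv(cfg);

results = decoding(cfg);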
