Significant below chance decoding (searchlight, cross-validation)

Hi all,

I have an unexpected result in my MVPA analysis and cannot find where it comes from. Maybe I can find some help here.
I ran a whole-brain searchlight MVPA analysis on an fMRI dataset using Nilearn:
searchlight = nilearn.decoding.SearchLight(mask_img[sub],
    radius=Searchlight['radius'],  # 10
    estimator=Searchlight['classifier'],  # svc
    scoring=None,
    cv=single_split[sub])  # leave 2 runs out

The experimental protocol included 2 conditions (A, B) with 2 classes per condition. I ran several leave-two-runs-out analyses at the individual level, then entered the [accuracy maps - 0.5] into SnPM (5000 permutations, 0.05 FWE cluster-corrected) for the group-level analyses.

  • Intra-condition decoding: in each condition, I trained the classifier to decode the two classes. The results were very strong, and I observed common regions that decode the classes in both intra-condition analyses (A, B).
  • Cross-condition decoding: I trained the classifier on one condition and tested it on the other. I found significant below-chance decoding in some regions (mostly the regions common to the intra-condition decodings) and nothing significant above chance. I was expecting the exact opposite!
  • I thought that perhaps the class patterns were strictly opposite in one condition compared to the other. To understand this better, I performed a univariate analysis, which confirmed that the beta values in these regions are higher for class1 than for class2 in all conditions. So the effect goes in the same direction (although this reflects signal intensity more than spatial representation).

I have the impression that something is wrong: I can’t believe that the brain decodes in a totally opposite way in one condition compared to the other, and in exactly the expected regions. The results of the univariate contrasts (Class1 > Class2) reinforce this feeling.

Have you ever had this kind of result? Do you have any idea what I could test to better understand what is going on? Maybe I could plot some of the decoding outputs, but I don’t know which ones would help.

Thanks for your help,

Jo Etzel has some posts on below chance decoding in her blog.

I remember reading those posts when faced with some results of that type, but I don’t remember where that rabbit hole brought me.

But maybe it can already help you.

Thanks, yes, I had checked those posts before, but they didn’t really help me.
I wonder what kind of visual/graphical inspection (with Nilearn or scikit-learn) I could do to check, for example, the direction of the linear classification (positive or negative from class1 to class2), and thus verify whether the direction is the same or opposite in the two intra-condition decodings. If there is an “opposite classification”, this could explain the below-chance results in the cross-condition decoding.


This sounds like the class labels might be part of the issue; am I understanding that you have two conditions (A & B), with two classes in each (say, “high” and “low” in A and “up” and “down” in B), and you want to do some cross classification (e.g., train for high vs low, test with up vs down, expecting that high=up and low=down)?

In that case, with R (I don’t know nilearn/scikit-learn, but it is likely similar), the text categories (high, low, up, down) will be assigned levels in alphabetical order, so depending on how you set up the model, it would consider high = class 1 in A but down = class 1 in B.

A way to test if the labels are causing the strange results could be to make two copies of a toy dataset (i.e., with very high accuracy), one copy for A and one for B. Then give the examples the labels you’re using (high/low, up/down) and see how the results turn out. Since the toy A & B are identical, the cross-classification should work extremely well (since it’s toy data), so you can see if the classes flip.
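For anyone who wants to try this copy test with scikit-learn, here is a minimal sketch. It is only an illustration under stated assumptions: the data are random toy arrays rather than real beta maps, and the label names (high/low, up/down) and the `b_to_a` mapping are hypothetical. It compares an explicit label mapping against integer codes assigned alphabetically within each condition.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy "condition A": 40 examples x 20 voxels, two well-separated classes.
n_per_class, n_voxels = 20, 20
X_a = np.vstack([
    rng.normal(loc=+1.0, scale=0.5, size=(n_per_class, n_voxels)),
    rng.normal(loc=-1.0, scale=0.5, size=(n_per_class, n_voxels)),
])
labels_a = np.array(["high"] * n_per_class + ["low"] * n_per_class)

# Toy "condition B" is an exact copy of A, but uses the other label set.
X_b = X_a.copy()
labels_b = np.array(["up"] * n_per_class + ["down"] * n_per_class)

# Case 1: explicit (hypothetical) mapping stating which B label matches which A label.
b_to_a = {"up": "high", "down": "low"}
labels_b_mapped = np.array([b_to_a[lab] for lab in labels_b])
clf = SVC(kernel="linear").fit(X_a, labels_a)
acc_mapped = clf.score(X_b, labels_b_mapped)

# Case 2: integer codes assigned alphabetically within each condition.
codes_a = np.unique(labels_a, return_inverse=True)[1]  # high=0, low=1
codes_b = np.unique(labels_b, return_inverse=True)[1]  # down=0, up=1
clf_codes = SVC(kernel="linear").fit(X_a, codes_a)
acc_alphabetical = clf_codes.score(X_b, codes_b)

print("explicit mapping:", acc_mapped)
print("alphabetical codes:", acc_alphabetical)
```

Since B is a copy of A, the explicit mapping should give near-perfect accuracy, while a systematic drop to near zero with the alphabetical codes would indicate that the classes flipped.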

good luck!

Thanks for your suggestion! You’re right, in R and Python the labels are assigned in alphabetical order, so it could be an issue. However, I gave the same label names to the two classes in both conditions (A: “class1” and “class2”, B: “class1” and “class2”), not two different sets of labels. E.g., train to decode class1 vs class2 in A, then test on class1 vs class2 in B.


Not the labels, then. Have you tried the “make a copy of A and call it B” test? That could help identify if there’s something odd in the cross-condition code.

Another troubleshooting idea is to check the class balance in all versions; I assume the cross-validation scheme is different for the single-condition and cross-condition cases? If so, confirm that the training sets have equal numbers of examples from each class each time. Also, if the number of examples or the amount of compression varies a lot (e.g., if summarizing to one example per run, with 100 trials per run in A but only 5 in B), it could have strange effects.
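A balance check like this can be done by counting labels per fold. The sketch below assumes a hypothetical design (6 runs, 2 conditions, 2 classes, 4 trials per cell) and a hypothetical helper, `cross_condition_splits`, that trains on one condition in the kept runs and tests on the other condition in the two left-out runs. The real `single_split[sub]` object from the original post may be built differently.

```python
import numpy as np
from itertools import combinations
from collections import Counter

# Hypothetical design: 6 runs, each with 8 trials of A then 8 of B,
# and within each condition 4 trials of class1 then 4 of class2.
runs = np.repeat(np.arange(6), 16)
conditions = np.tile(np.repeat(["A", "B"], 8), 6)
labels = np.tile(np.repeat(["class1", "class2"], 4), 12)

def cross_condition_splits(runs, conditions, train_cond, test_cond, n_out=2):
    """Yield (train_idx, test_idx) pairs: train on train_cond in the kept
    runs, test on test_cond in the n_out left-out runs."""
    for held_out in combinations(np.unique(runs), n_out):
        in_test_runs = np.isin(runs, held_out)
        train_idx = np.flatnonzero(~in_test_runs & (conditions == train_cond))
        test_idx = np.flatnonzero(in_test_runs & (conditions == test_cond))
        yield train_idx, test_idx

splits = list(cross_condition_splits(runs, conditions, "A", "B"))

# Count examples of each class in every training and test set.
fold_counts = [
    (Counter(labels[train].tolist()), Counter(labels[test].tolist()))
    for train, test in splits
]
for i, (train_counts, test_counts) in enumerate(fold_counts):
    balanced = len(set(train_counts.values())) == 1
    print(f"fold {i}: train={dict(train_counts)} test={dict(test_counts)} "
          f"balanced={balanced}")
```

A list of (train, test) index pairs like `splits` is also a form that scikit-learn accepts as a `cv` argument, so the same folds can be inspected and then reused for decoding.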

You could check the hyperplane (linear decoder so its equation is straightforward) and which class assignment (side of the hyperplane) individual examples will get. But this seems unlikely to be an issue, since you’re using the same class labels for A & B.
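With scikit-learn, one possible way to look at the hyperplane is to fit a linear SVC separately in each condition (on a single ROI or searchlight sphere) and compare the weight vectors. The sketch below uses random toy data, so `make_condition`, `weight_cosine`, and the shared `pattern` are assumptions for illustration only. A deliberately flipped copy of condition B is included to show what an "opposite" hyperplane would look like.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_per_class, n_voxels = 30, 50
pattern = rng.normal(size=n_voxels)  # toy class1-minus-class2 pattern (assumption)

def make_condition(sign):
    """Toy ROI data: class1 = +sign*pattern, class2 = -sign*pattern, plus noise."""
    X = np.vstack([
        +sign * pattern + rng.normal(scale=1.0, size=(n_per_class, n_voxels)),
        -sign * pattern + rng.normal(scale=1.0, size=(n_per_class, n_voxels)),
    ])
    y = np.array(["class1"] * n_per_class + ["class2"] * n_per_class)
    return X, y

def weight_cosine(clf_1, clf_2):
    """Cosine similarity between the weight vectors of two fitted linear SVCs."""
    w1, w2 = clf_1.coef_.ravel(), clf_2.coef_.ravel()
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))

X_a, y_a = make_condition(+1)
X_b, y_b = make_condition(+1)      # same class direction as A
X_b_flip, _ = make_condition(-1)   # opposite class direction, for comparison

clf_a = SVC(kernel="linear").fit(X_a, y_a)
clf_b = SVC(kernel="linear").fit(X_b, y_b)
clf_b_flip = SVC(kernel="linear").fit(X_b_flip, y_b)

# In a binary SVC, positive decision_function values go to classes_[1].
print("class order:", clf_a.classes_)
print("mean decision value of A's classifier on B's class1 examples:",
      clf_a.decision_function(X_b[y_b == "class1"]).mean())

cos_same = weight_cosine(clf_a, clf_b)
cos_flip = weight_cosine(clf_a, clf_b_flip)
acc_same = clf_a.score(X_b, y_b)
acc_flip = clf_a.score(X_b_flip, y_b)
print("cosine(A, B):", cos_same, " cross-accuracy:", acc_same)
print("cosine(A, flipped B):", cos_flip, " cross-accuracy:", acc_flip)
```

A clearly negative cosine between the A and B weight vectors, together with cross-condition accuracy well below 0.5, would point to a genuine flip in the data or in how the conditions were defined, rather than to the classifier itself.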

Also, is this “flipping” of results showing up in single subjects, or just at the group level? Strange things can happen, particularly with searchlight analyses. To reduce the complexity, I’d suggest switching to an ROI-based analysis for the troubleshooting (i.e., a single sensible anatomic area or searchlight).

Yes, I tested different possibilities for the labels and created toy labeling data. For example, in the training runs I named B as A and left the labels in the testing runs unchanged (A always A), so it was still a crossmodal decoding, but the algorithm did not know it. Unfortunately, I got exactly the same results. The problem doesn’t seem to be the labels.

I have the exact same number of trials per label per condition in each run, so the balance between classes is the same for all versions, as is the number of trials.

I think it’s a good idea to check the hyperplane. I’m using nilearn/sklearn, but I’m not sure how to test it.
The flipping happens at both levels (subjects and group), so I think the next step, as you mention, should be to redo the analysis in an ROI and see what happens!

Thanks for your help

Just to let you know that I finally found the problem. It was an error in the first steps, when I defined each condition and created my beta maps. Now I no longer have significant below-chance decoding at the group level!
Thanks for your help.

