TDT: Cross-Classification with unbalanced datasets

Dear Dr Hebart,

we have a dataset that is structured as follows:

  • event-related design
  • conditioning paradigm (CS+ and CS-)
  • 3 different conditions (A,B,C) in 3 separate scanning runs
  • for each condition (A,B,C) there are events of interest (CS+) and baseline events (CS-) respectively
  • -> we derive GLM beta estimates for each subject & condition (A,B,C) modeling CS+ and CS- respectively

We intend to use TDT to evaluate whether an SVM classifier trained on
CS+(A) vs [ CS+(B), CS+(C), CS-(A), CS-(B), CS-(C) ]
performs well for e.g.
CS+(A) vs CS-(A)
but poorly for e.g.
CS+(B) vs CS-(B)

I.e. we want to perform cross-classification, where we train with one set of data/conditions and test with another.

Our training dataset is therefore quite unbalanced, and we assume we need to address this problem.

In the TDT template decoding_template_unbalanced_data.m, you describe these options as ways to deal with unbalanced data:

  • supersampling
  • AUC_minus_chance
  • repeated subsampling (bootstrap)
  • balance ensemble approach

As we understand it, cross-classification is set up with
cfg = make_design_xclass(cfg)
whereas bootstrapping is set up with
cfg = make_design_boot(cfg)

It seems that cross-classification and bootstrapping, used in the combined fashion described above, are mutually exclusive?

So our request is as follows:

  1. Is dealing with the unbalanced dataset by bootstrapping recommended/valid in our case?
  2. How do we “combine” our cross-classification approach with bootstrapping?

Thank you very much in advance for your help.

Best regards,

Daniel Hoffmann

Hi Daniel,

This sounds interesting. To answer your specific questions:

Yes, you can use subsampling for unbalanced data. You cannot use it to control for confounds, though (I still need to remove that option because it is invalid). I personally prefer using AUC or our ensemble approach described in decoding_template_unbalanced_data.m.
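For reference, the AUC output mentioned above can be requested with a one-line setting (output names as used in the TDT templates; double-check the exact spelling against decoding_template_unbalanced_data.m):

```matlab
% Report AUC minus chance instead of raw accuracy (robust to unbalanced data)
cfg.results.output = {'AUC_minus_chance'};

% For imbalances restricted to the test set, balanced accuracy also works:
% cfg.results.output = {'balanced_accuracy'};
```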

We haven’t written a function that does both repeated subsampling and cross-classification, although I think it should be fairly easy to do. If you are good at coding you can probably set this up manually yourself. Alternatively, you could take cfg.files, reduce it to the runs that you want to use for training, and just use make_design_boot. Since in every iteration your test data should be balanced, you can then manually append the test entries to the resulting design matrices. If the test data are also unbalanced, you can either use a classification output that can deal with it (e.g. balanced accuracy) or run make_design_boot separately for the test data and concatenate both matrices.
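A rough sketch of that manual route might look as follows (untested; `train_idx` is something you would define yourself, and the design fields follow the usual TDT convention of files-by-steps matrices):

```matlab
% Build a bootstrapped design from the training runs only
cfg_boot  = cfg;
train_idx = ismember(cfg.files.chunk, [1 2]);   % e.g. runs 1-2 for training (adjust!)
cfg_boot.files.name  = cfg.files.name(train_idx);
cfg_boot.files.chunk = cfg.files.chunk(train_idx);
cfg_boot.files.label = cfg.files.label(train_idx);
cfg_boot = make_design_boot(cfg_boot);

% Append the (balanced) test files to every decoding step
test_lab = cfg.files.label(~train_idx);
n_test   = numel(test_lab);
n_steps  = size(cfg_boot.design.train, 2);

cfg_boot.design.train = [cfg_boot.design.train; zeros(n_test, n_steps)];
cfg_boot.design.test  = [cfg_boot.design.test;  ones(n_test, n_steps)];
cfg_boot.design.label = [cfg_boot.design.label; repmat(test_lab(:), 1, n_steps)];

% The appended design rows must correspond to entries in cfg_boot.files, so
% add the test files (name/chunk/label) back in the same order before running
% the decoding.
```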

Now taking this one step back, I noticed several things:

  1. Based on your description it looks as if you just want to do normal cross-validation (classifying CS+A vs CS-A). So perhaps there is no imbalance after all? Or do you want to train a classifier on CS+(A) vs. all and test it on CS+(A) vs. CS-(A)?

  2. Maybe I don’t understand your design but it sounds as if all trials for CS+(A) and CS-(A) are in one run, all trials for CS+(B) and CS-(B) in another etc. If I understand this correctly, this has two unintended consequences. First, you cannot do leave-one-run-out cross-validation. With three runs I would recommend trialwise decoding rather than runwise decoding anyway. Maybe the trials are spaced far enough that you can do within-run decoding. Second, run will be confounded with condition. If you classify CS+(A) vs. all, then unintentionally you will be classifying run 1 vs. all runs. This will lead to very high decoding pretty much everywhere. If that’s the case you will have to think of clever ways to balance everything. Check out this paper if you want to test whether your decoding design is confounded.

Hope this helps!



Since this question is related to the analysis in my own study, I just want to add a few points that reviewers might ask you to take care of. Note, however, that I did not use TDT back then.
My study had 4 conditions, with 2 different stimuli each:
CS+(A), CS-(A)
CS+(B), CS-(B)
NS1(A), NS2(A) (where NS stands for “neutral stimulus”, i.e. a context without any reinforcement)
NS1(B), NS2(B)
CS and NS were presented in separate runs, which is why I used single-trial estimates (instead of run-wise estimates) for the decoding (I recommend this article on how to set up your GLM for single trials). Issues of concern:

  1. Since I excluded all reinforced CS+ (i.e. CS+US+) trials, the classification CS+(A) vs CS-(A) was imbalanced, with 12 vs 24 trials. For an SVM, this does not automatically result in a chance level of 66.6%, which is why I computed the chance level empirically from 1000 classifications with permuted labels. I used 3-fold cross-validation in the scheme 1 2 3 1 2 3 …
  2. For the classification of NS, I ran into another problem: in the neutral condition I could use all trials, because no reinforcement was present. To make it comparable to the CS classification, I removed 12 trials from one condition (going from 24 vs 24 trials to 12 vs 24). And to make sure I did not throw away data (and/or bias my classification by discarding a particular subset), I redid the NS1 vs NS2 classification 100 times, each time discarding a different set of 12 trials. A similar issue would probably arise for you if you classify CS+(A) vs CS+(B), as your description seemed to suggest (CS+(A) vs [ CS+(B), CS+(C), CS-(A), CS-(B), CS-(C) ]).
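For what it's worth, both points above could be sketched roughly as follows (plain MATLAB rather than TDT; `run_cv`, `data`, and `labels` are purely hypothetical placeholders for whatever cross-validated classification you use):

```matlab
% Point 1: empirical chance level from permuted labels (12 vs 24 imbalance)
n_perm   = 1000;
perm_acc = zeros(n_perm, 1);
for p = 1:n_perm
    shuffled    = labels(randperm(numel(labels)));  % permute class labels
    perm_acc(p) = run_cv(data, shuffled);           % identical 3-fold CV
end
chance_level = mean(perm_acc);  % empirical chance level under imbalance

% Point 2: repeat NS1 vs NS2, discarding a different 12-trial subset each time
n_rep = 100;
acc   = zeros(n_rep, 1);
idx1  = find(labels == 1);      % 24 trials of the condition to subsample
idx2  = find(labels == 2);      % 24 trials kept complete
for r = 1:n_rep
    keep   = idx1(randperm(numel(idx1), 12));       % keep a different 12
    sel    = [keep(:); idx2(:)];
    acc(r) = run_cv(data(sel, :), labels(sel));
end
mean_acc = mean(acc);  % averaging over subsamples avoids discarding bias
```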

Kind regards

I like the idea but I’m not sure I would generally recommend this. While permutations can allow you to estimate what bias you have in the estimation of chance, there might also be biases in the variance introduced by this. I guess for random effects analyses this should be less of an issue but I would at least be cautious with this. Using balanced_accuracy for imbalances in testing and AUC or our ensemble approach for general imbalances can often help overcome issues with unbalanced data while preserving the nominal chance level.