TDT cross validation design

Hello, I want to run an experiment comprising 3 or 4 runs. Separate runs are just a way to let subjects take some rest; the conditions are the same in all runs. There are, say, 5 conditions (e.g. words, faces, tools…), organized in short homogeneous blocks (e.g. 10 faces at 1/second, i.e. 10 seconds per block). In other words, runs consist of a random alternation of 5 types of short blocks. Let’s assume that I want to decode faces from tools. I speculate that the best CV design would be:
(1) to create a GLM with 1 regressor per block + 1 regressor per run
(2) on each CV step, leave out one random “faces” block and one random “tools” block, ignoring the distinction between runs.
My questions are:

  • Am I correct?
  • Would the decoding be easy to implement using TDT?

Thank you very much in advance!

Sorry for the late reply, please add the tag “tdt” in the future if you would like faster replies!

The short answer is: whenever possible you want to run a leave-one-run-out cross-validation, to ensure independence of training and test data and to make sure the results are not biased (positively or negatively) by a confound — in your case, run.

I will (hopefully next week) post some of my lectures online that will include a lecture on designing MVPA studies that explains this in more detail.

General recommendations (these are just rules of thumb):

  • more runs, ideally 8–10; avoid fewer than 4–6
  • repeat all conditions in all runs, and have the same number of blocks per condition in each run
  • potentially add a little bit of time between blocks to avoid carryover effects
  • model either effects per block or per run but not both (unless you think of the constant effect as a nuisance variable, which here in my opinion isn’t the case)

Then, get beta maps, one per condition per run (or one per condition per block per run), and carry out leave-one-run-out cross-validation, which is super easy to implement in TDT (if you use SPM or AFNI, even with one line of code) :slight_smile:
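For illustration, a leave-one-run-out analysis along these lines might look roughly as follows in TDT — a minimal sketch, assuming an SPM first-level model with one beta per condition per run; the path and the condition names are placeholders, not part of the original question:

```matlab
% Minimal sketch of a leave-one-run-out decoding analysis with TDT.
% Assumes an SPM first-level GLM with one regressor per condition per run.
cfg = decoding_defaults;                 % start from TDT defaults
cfg.analysis = 'searchlight';            % or 'roi' / 'wholebrain'

beta_loc = '/path/to/firstlevel';        % placeholder: SPM.mat directory
regressor_names = design_from_spm(beta_loc);

% Describe which betas belong to which label ('faces' vs 'tools' here)
cfg = decoding_describe_data(cfg, {'faces','tools'}, [1 -1], ...
    regressor_names, beta_loc);

% One line creates the full leave-one-run-out cross-validation design
cfg.design = make_design_cv(cfg);

results = decoding(cfg);
```

The key point is that `make_design_cv` builds the train/test assignment from the run (chunk) information automatically, so training and test data always come from different runs.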


Hey @ulysses & @Martin,

in case you don’t know it already, this might be an interesting read:
Assessing and tuning brain decoders: cross-validation, caveats, and guidelines by Gaël and friends.
As you can tell from the name of the paper, it also includes a super neat and comprehensive assessment of different CV strategies.
Note: that’s the link to the preprint version on arXiv. The final version was published in NeuroImage.

HTH, best, Peer

I am looking for an example of how to perform decoding with cross-validation across MORE THAN TWO conditions. The example using make_design_cv is fine, but generalizing it to 4 categories is a bit tricky…
Thank you in advance

PS: more specifically, I just want to decode across 4 categories, and I tried to follow closely the example provided with the Haxby data. I generate a design which looks perfectly fine, but I get the following error message:

Unable to perform assignment because the size of the left side is 32-by-1 and the size of the right side is 32-by-6.

Error in AUCstats_matrix (line 24)
label_position(:,i_label) = true_labels == labels(i_label);

Error in decoding_transform_results (line 192)
output = 100*mean(AUCstats_matrix(decision_values,true_labels,labels));

Error in decoding_generate_output (line 35)
output = decoding_transform_results(curr_output,decoding_out,chancelevel,cfg,data);

Error in decoding (line 568)
results = decoding_generate_output(cfg,results,decoding_out,i_decoding,curr_decoding,current_data);

The change I made to the code to move from 2 to 4 categories is just declaring 4 labelnames:
labelnames = {

and then:
cfg = decoding_describe_data(cfg,labelnames,[1 2 3 4],regressor_names,beta_loc);

the design is the following, which looks OK to me:

I also get the following warning, which makes sense:

Warning: More than 2 labels for AUC. Running all pairwise comparisons and averaging (using AUCstats_pairwise.m).

Hi Laurent,

Apologies for the delay, I only just returned from my vacation.

It seems like it’s becoming more and more common for people to use AUC with more than 2 conditions. There is, unfortunately, still a bug in the AUC function for more than 2 conditions (apologies!). Since internally the classifier compares all pairs of conditions anyway, for now it would be great if you could just run all pairwise comparisons, use AUC as an output, and average them (which for all pairs of 4 conditions should still be manageable). Some options:

  • Generate the design matrix for all pairs of conditions by setting all other values to 0, then concatenate them.
  • Run all analyses separately for each pair and use an external tool to average the searchlight maps (or ROI results).
  • Just use accuracy for now, which should yield similar results.
  • My preference would be signed decision values, which are even more informative than AUC: they report the signed distance to the separating hyperplane, weighted by whether the classification was correct (i.e. times 1) or incorrect (i.e. times -1).
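One way to set up the "run each pair separately" workaround is a simple loop over all 6 pairs of the 4 conditions — a hedged sketch, where the condition names, output directory, and `regressor_names`/`beta_loc` variables are placeholders standing in for whatever your own setup uses:

```matlab
% Sketch: run one two-class decoding analysis per pair of conditions,
% writing each result to its own directory; the per-pair maps can then
% be averaged with an external tool.
labelnames = {'cond1','cond2','cond3','cond4'};  % placeholder names
out_base   = '/path/to/results';                 % placeholder directory
pairs = nchoosek(1:numel(labelnames), 2);        % all 6 pairs

for i_pair = 1:size(pairs, 1)
    cfg = decoding_defaults;
    cfg.results.output = {'AUC_minus_chance'};   % two-class AUC works fine
    cfg.results.dir = fullfile(out_base, ...
        sprintf('pair_%d_%d', pairs(i_pair,1), pairs(i_pair,2)));

    % Only the two conditions of the current pair enter this analysis
    cfg = decoding_describe_data(cfg, labelnames(pairs(i_pair,:)), ...
        [1 -1], regressor_names, beta_loc);

    cfg.design = make_design_cv(cfg);            % leave-one-run-out CV
    results = decoding(cfg);
end
```

Averaging the six resulting maps then gives the mean pairwise AUC that the buggy multi-class path was supposed to compute directly.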

I will put this on the top of my list of priorities for fixing this in TDT. It will likely take a few weeks with the current backlog.


Thank you so much!
Indeed, all the other output types seem to be working fine.
All the best