Dear TDT experts,
I am running a between-subject classification with a leave-one-subject-out cross-validation design. Is there a way to access all the accuracies from each cross-validation step in addition to the mean overall accuracy?
Ahoi hoi @tbertram,
thank you very much for your post and welcome to Neurostars, it’s great to have you here.
First things first: could you maybe add the TDT tag to your post so it’s indexed and archived accordingly (makes it easier to find later, etc.) and so that folks become aware of your question (tagging @Martin here)? I’m not that familiar with TDT, but I would assume that it returns not only the mean accuracy across CV folds but a structure that contains different types of information, including CV-fold-specific ones. Sorry for just wildly guessing here…
Also sorry for addressing something outside the main scope of your question, but here are some interesting reads concerning CV strategies:
Cross-validation failure: Small sample sizes lead to large error bars (preprint)
Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines (preprint)
HTH, cheers, Peer
Thanks for your message (and thanks @PeerHerholz for adding me).
Because of memory constraints, we don’t output all accuracies by default, but since running analyses is so fast, I hope it is not an issue for you to run analyses several times.
When setting up your design (before make_design_cv or whatever design function you pick), set
cfg.design.set = 1:n_sub;
where n_sub is your number of subjects. Together with the flag cfg.results.setwise = 1 (which is the default), this will provide one result for each cross-validation iteration.
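To make concrete what "one result for each cross-validation iteration" means in a leave-one-subject-out design, here is a minimal, toolbox-independent sketch in plain Python. It is not TDT code and does not reflect how TDT computes or stores results internally. The function name per_fold_accuracy and the toy data are hypothetical, and it assumes each held-out subject contributes several test samples.

```python
from collections import defaultdict


def per_fold_accuracy(subject_ids, true_labels, predicted_labels):
    """Accuracy per held-out subject (one CV fold per subject).

    Hypothetical helper for illustration only; not part of TDT.
    """
    hits = defaultdict(int)    # correct predictions per subject
    counts = defaultdict(int)  # test samples per subject
    for subj, truth, pred in zip(subject_ids, true_labels, predicted_labels):
        counts[subj] += 1
        hits[subj] += int(truth == pred)
    return {subj: hits[subj] / counts[subj] for subj in counts}


# Hypothetical toy data: 3 subjects, 2 test samples each (labels 1 and -1).
subject_ids = [1, 1, 2, 2, 3, 3]
true_labels = [1, -1, 1, -1, 1, -1]
predicted_labels = [1, -1, 1, 1, -1, 1]

fold_acc = per_fold_accuracy(subject_ids, true_labels, predicted_labels)
# The overall result is the mean of the per-fold accuracies.
mean_acc = sum(fold_acc.values()) / len(fold_acc)

print(fold_acc)
print(mean_acc)
```

The mean hides how much the folds differ from each other, which is why the per-fold values are worth inspecting alongside it.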
Now, a few things to bear in mind:
- make sure the training data is balanced (and ideally the test data, too, but there are easier ways to deal with this, e.g. using balanced accuracy or AUC as output instead of accuracy)
- ideally, you could run a leave-one-pair-out CV scheme (or even better, e.g. split-half or similar, to reduce the variance of the estimators). However, for searchlight analyses, the issue that Peer pointed out with Gael’s paper is less of a problem, because there you can smooth the resulting decoding accuracies, which will reduce the variance.
- when using such schemes, it becomes more difficult to get per-subject results. In that case, I would return true_labels as results and compare them individually. You can vectorize them and relate them to the individual subjects for which you have the indices in the test design matrix (