Get models from TDT

Hi TDT experts,

I am a new user of TDT. After working through some exercises in the decoding tutorial, I have a question: can I obtain the trained model after cross-validation and apply it to another, independent test dataset? Is this a reasonable approach, and if so, which TDT function should I use?

Best,
Xinqi

Dear Xinqi,

I understand your confusion. Usually, when running machine learning classification, you first run cross-validation to optimize hyperparameters and then apply the final model to independent test data. In decoding for studying brain function, however, the goal is usually not to build a classifier, but to find out whether there is a statistical dependency between the labels and the data. We therefore usually don’t optimize the hyperparameters (e.g. the parameter C in support vector classifiers), and if we do, we use nested cross-validation. Instead, we run cross-validation on all of the data, solely to get an estimate of the information content that is not positively biased, i.e. there is no separate test set. See my paper that explains this distinction in more detail (if you have no access, the preprint is here).

Now, in TDT you will get a model for each cross-validation iteration, so which one is the “correct” one to apply to separate data? Well, assuming that your estimate of the information content is fine (i.e. you are happy with the accuracy for a given set of hyperparameters), you can just run TDT with all of your original data as training data and your separate data as test data. You just cannot use the automatic routines, because with separate data you need to specify the file names, labels, and chunks manually. But that should all be covered in the tutorial and the templates.
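To make this concrete, here is a rough sketch of how the manual setup could look. The file lists, labels, and chunk vectors below are made-up placeholders you would fill in yourself, and I am leaving out mask and output settings; the cfg.design fields follow the structure described in the templates:

% train_files/test_files are hypothetical cell arrays of beta image paths,
% train_labels/test_labels and train_chunks/test_chunks are column vectors
cfg = decoding_defaults;
cfg.files.name  = [train_files; test_files];   % all training images, then all test images
cfg.files.label = [train_labels; test_labels]; % e.g. 1 / -1 per image
cfg.files.chunk = [train_chunks; test_chunks]; % run numbers

n_train = numel(train_files);
n_test  = numel(test_files);

% one decoding step: train on all training images, test on all test images
cfg.design.train = [ones(n_train,1); zeros(n_test,1)];
cfg.design.test  = [zeros(n_train,1); ones(n_test,1)];
cfg.design.label = cfg.files.label;
cfg.design.set   = 1;

results = decoding(cfg);

Off the top of my head, make_design_xclass builds a very similar cross-set design automatically, so you may want to have a look at that as well.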

If you just want to inspect the model parameters, set
cfg.results.output = {'accuracy_minus_chance', 'model_parameters'};
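The fitted models then end up in the results structure, one per decoding step. Off the top of my head (please double-check the exact field layout in your version), something like:

results = decoding(cfg);
results.model_parameters.output  % fitted models (libsvm structs by default), one per step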

Best,
Martin


Dear Martin,

Thanks for your detailed explanation! It really cleared up my confusion.

Best,
Xinqi

Dear Martin,

I have a related question. You mentioned that a training set which yields a satisfactory estimate of the information content can then be applied to a separate test set. Is it correct that you propose to simply swap in the new test set in the decoding design? Let’s assume the classifier was trained with n cross-validation folds. Does this constrain how the design matrix for the subsequent decoding analysis on the new dataset should be specified? In essence, my question is whether the new test set needs to have the same n cross-validation folds as the original training set that produced the satisfactory information estimate in the first run, and if so, whether all stimuli from the separate test set should be included in every cross-validation fold of the decoding design?

Many thanks for your advice.

Kind regards,

Basil

Hi Basil,

Sorry, for some reason your reply fell through the cracks.

That’s a good question. I think the separate test set is just the test set, but whether applying the model to the full dataset is equivalent to applying it to, say, 90% of the data as in 10-fold CV depends on the structure of the data. If the data come from the same distribution, then it shouldn’t matter. In that case, if you come from the “looking-for-information” perspective rather than the “building-a-model” perspective, think of the separate test set as one fold of your outer cross-validation. The inner cross-validation would then be a nested CV for finding the right hyperparameters. Once you have found them, you can apply them to the entire training set.
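To spell out that logic with a toy example, here is a self-contained nested-CV sketch in plain MATLAB (not TDT code; fitcsvm and cvpartition are from the Statistics and Machine Learning Toolbox, and the data and the grid of C values are made up):

rng(1);
X = randn(120,10);              % toy data: 120 samples, 10 features
y = [ones(60,1); -ones(60,1)];  % two classes
candidate_c = [0.01 0.1 1 10];  % hyperparameter grid for the SVM C parameter

outer = cvpartition(y,'KFold',5);  % outer CV: unbiased accuracy estimate
outer_acc = zeros(outer.NumTestSets,1);
for i = 1:outer.NumTestSets
    Xtr = X(training(outer,i),:); ytr = y(training(outer,i));
    Xte = X(test(outer,i),:);     yte = y(test(outer,i));
    % inner CV on the training portion only, to select C
    inner = cvpartition(ytr,'KFold',5);
    mean_acc = zeros(numel(candidate_c),1);
    for k = 1:numel(candidate_c)
        acc = zeros(inner.NumTestSets,1);
        for j = 1:inner.NumTestSets
            mdl = fitcsvm(Xtr(training(inner,j),:), ytr(training(inner,j)), ...
                'BoxConstraint', candidate_c(k));
            acc(j) = mean(predict(mdl, Xtr(test(inner,j),:)) == ytr(test(inner,j)));
        end
        mean_acc(k) = mean(acc);
    end
    [~, best] = max(mean_acc);
    % retrain on the full outer training portion with the selected C,
    % evaluate once on the held-out outer fold
    mdl = fitcsvm(Xtr, ytr, 'BoxConstraint', candidate_c(best));
    outer_acc(i) = mean(predict(mdl, Xte) == yte);
end
fprintf('Nested-CV accuracy: %.1f%%\n', 100*mean(outer_acc));

In your case, the separate test set simply plays the role of one such outer fold.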

Now, whether the data come from the same distribution is more difficult to judge, but in the worst case your classifier will just perform worse than expected. In odd cases, it can even perform below chance, but the reasons for that are complicated.

TL;DR: Once you have your hyperparameters from the training set, you can train a model on the entire training set without cross-validation and apply it to the separate test data.

Hope this helps,
Martin

Dear Martin,

Many thanks for your information. This was helpful.

Best wishes,
Basil