Training and Testing with different data sets

Karel_Lopez · March 7, 2019, 3:16pm

Dear TDT experts,

I would like to train the classifier on one data set and then test it on a completely different data set. The data come from a memory experiment, so I have one set of beta images from an encoding phase (7 runs/sessions) and another from a retrieval phase (7 runs/sessions).
How can I train the classifier on some encoding conditions (labels) and then test it on the retrieval conditions (labels)?
I tried joining all the sessions, from 1 to 14, so that TDT recognizes all my regressors. But when I choose the labels I want to train on and the ones I want to test on, there is a problem: with make_design_cv, the design includes all the chosen labels for both training and testing. With make_design_xclass_cv, the design looks much closer to what I am looking for, but I get an error because the number of labels used varies across decoding steps. Something similar occurs when I modify the design manually.

Is there a way, or a type of design, that achieves this?

Best regards,
Karel

Martin · March 7, 2019, 7:55pm

Hi Karel,

It’s kind of difficult to know what the problem is without more details (e.g. error messages). I think you don’t want to use xclass_cv but xclass (and make sure to set twoway = 0). Also, all labels need to be present in the training and the testing data.

It sounds as if you have different labels in different runs. You have to be careful with that, because the effects of run are huge, which can lead to quite some confounds if you don’t have the effect of run balanced across labels.
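One way to check the balance of labels across runs is to tabulate how often each label occurs in each run. The following is only a minimal sketch in plain MATLAB: the `chunk` and `label` vectors are made-up illustration values (in TDT they would presumably correspond to `cfg.files.chunk` and `cfg.files.label` after describing the data).

```matlab
% Hypothetical run (chunk) number and label for each beta image.
chunk = [1 1 2 2 3 3 4]';
label = [1 -1 1 -1 1 -1 1]';

% Map runs and labels to consecutive indices.
[runs, ~, run_idx] = unique(chunk);
[labs, ~, lab_idx] = unique(label);   % labs is sorted, here [-1; 1]

% counts(i, j) = number of images in run i that carry label j.
counts = accumarray([run_idx(:) lab_idx(:)], 1, [numel(runs) numel(labs)]);
disp(counts);

% Runs in which at least one label is missing entirely.
incomplete_runs = runs(any(counts == 0, 2));
disp(incomplete_runs);
```

In this made-up example, run 4 contains only label 1, so any difference between labels there cannot be separated from the effect of that run.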

Best,
Martin

Karel_Lopez · March 19, 2019, 11:30am

Hi Martin,
Thanks for the reply. I tried xclass and it worked. The main idea was to train the classifier on a defined set of regressors/conditions to discriminate between faces and scenes (beta images from the first part of the task, in which participants saw faces and scenes), and then test it on different regressors/conditions (beta images from the second part of the task, in which participants had to remember/retrieve whether they had seen a face or a scene). Here is a brief example of how I tried to do that:

labelname1='encoding_faces*'
labelname2='encoding_scenes*'
labelname3='retrieval_faces*'
labelname4='retrieval_scenes*'

cfg = decoding_describe_data(cfg,{labelname1 labelname2 labelname3 labelname4},[1 -1 1 -1],regressor_names,beta_loc,[1 1 2 2]);
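For context, here is a rough sketch of how this call might sit in a complete cross-classification script. It is not the script actually used in this thread: the `cfg.analysis` setting, the results folder and the `cfg.files.twoway` field name are assumptions that should be checked against the help text of `make_design_xclass` in your TDT version, and `beta_loc` is assumed to point to the first-level folder containing the SPM.mat of the concatenated 14 sessions.

```matlab
% Sketch only: requires TDT on the MATLAB path and an existing beta_loc.
cfg = decoding_defaults;
cfg.analysis = 'searchlight';                            % assumption; other analysis types exist
cfg.results.dir = fullfile(beta_loc, 'xclass_results');  % hypothetical output folder

% Read the regressor names from the SPM design.
regressor_names = design_from_spm(beta_loc);

labelnames = {'encoding_faces*', 'encoding_scenes*', ...
              'retrieval_faces*', 'retrieval_scenes*'};
labels = [1 -1 1 -1];   % faces = 1, scenes = -1 in both phases
xclass = [1 1 2 2];     % 1 = training set (encoding), 2 = test set (retrieval)

cfg = decoding_describe_data(cfg, labelnames, labels, ...
                             regressor_names, beta_loc, xclass);

% One-way design: train on encoding, test on retrieval only.
cfg.files.twoway = 0;   % assumed field name, see the lead-in above
cfg.design = make_design_xclass(cfg);

results = decoding(cfg);
```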

With this setup, the number of runs and trials is balanced in the training set. The test set, however, is not balanced, because the number of trials in a condition depends on the participant’s performance, so some regressors are not present in the same number of runs. Does the test set need to be balanced?

Best,
Karel

Martin · March 19, 2019, 12:13pm

If you do classification on trialwise regressors, I would use balanced accuracy or, better, AUC for estimating performance in the test set, to make sure the result is not just classifier bias. If you do classification on runwise regressors, then there should still be one regressor per condition per run and it should be fine.
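To illustrate why these measures help with an unbalanced test set, here is a minimal plain-MATLAB sketch that computes accuracy, balanced accuracy and AUC by hand. The labels and decision values are made up for illustration and are not TDT output.

```matlab
% Hypothetical unbalanced test set: six samples of class 1, two of class -1.
true_labels     = [ 1   1   1   1    1   1   -1   -1 ]';
decision_values = [0.9 0.4 0.2 -0.1 0.7 0.3 -0.5 0.1]';  % larger = more like class 1

% Predicted label: threshold the decision value at 0.
predicted = ones(size(decision_values));
predicted(decision_values < 0) = -1;

% Plain accuracy is dominated by the larger class.
accuracy = mean(predicted == true_labels);

% Balanced accuracy: average of the per-class hit rates.
is_pos = (true_labels == 1);
is_neg = (true_labels == -1);
hit_rate_pos = mean(predicted(is_pos) == 1);
hit_rate_neg = mean(predicted(is_neg) == -1);
balanced_accuracy = (hit_rate_pos + hit_rate_neg) / 2;

% AUC via pairwise comparison: proportion of (positive, negative) pairs in
% which the positive sample has the larger decision value (ties count 0.5).
pos = decision_values(is_pos);
neg = decision_values(is_neg);
diffs = bsxfun(@minus, pos, neg');
auc = (sum(diffs(:) > 0) + 0.5 * sum(diffs(:) == 0)) / numel(diffs);

fprintf('accuracy = %.4f\n', accuracy);                    % 0.7500
fprintf('balanced accuracy = %.4f\n', balanced_accuracy);  % 0.6667
fprintf('AUC = %.4f\n', auc);                              % 0.9167
```

Here plain accuracy (0.75) looks better than balanced accuracy (0.67) because the classifier favours the larger class. AUC does not depend on the threshold at all, which is why it is less sensitive to classifier bias. In TDT itself these measures are, as far as I recall, requested through `cfg.results.output` (for example `'balanced_accuracy'` or `'AUC_minus_chance'`); the exact output names should be checked against your TDT version.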