Training Accuracy in TDT

RjZ · January 30, 2019, 4:02pm

Dear TDT Developers/Users,

Maybe I missed something, but how does one check the training accuracy of the classifier? My understanding is that ‘accuracy/accuracy minus chance’ refers only to performance on the test data. Since I have below-chance decoding performance, I would like to check how well the classifier actually learns the training data, so that I can determine whether the problem lies in poor training (low accuracy already during training) or in poor prediction (in which case, maybe overfitting the training data?). Or are there other ways to check the training performance in the toolbox?

Your suggestion would be much appreciated!

Cheers,
RJ

Martin · January 30, 2019, 9:50pm

Hi RJ,

In most cases, the training accuracy is 100%. The reason is that we often have very simple classification problems, so the classifier will be able to separate the training data perfectly. This will almost certainly be the case when you have more voxels than samples. For that reason, we just don’t report it, because computing it costs unnecessary time.
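To illustrate the point about having more voxels than samples, here is a minimal sketch in plain MATLAB (no TDT required). It is not what TDT does internally: the data are pure noise, and a minimum-norm least-squares solution stands in for a linear classifier. All variable names and sizes are made up for the example.

```matlab
% Toy data: 20 samples, 100 "voxels", pure noise, arbitrary labels.
rng(0);                                  % fixed seed for reproducibility
n_samples = 20;
n_voxels  = 100;
X = randn(n_samples, n_voxels);          % noise only, no real class signal
y = [ones(n_samples/2, 1); -ones(n_samples/2, 1)];

% Minimum-norm least-squares weights (a stand-in for a linear classifier).
% With more voxels than samples, X has full row rank, so X * w reproduces y.
w = pinv(X) * y;

% Training accuracy: predict the same samples that were used for fitting.
train_acc = mean(sign(X * w) == y);
fprintf('Training accuracy: %.2f\n', train_acc);   % prints 1.00

% Fresh noise samples with the same labels: there is nothing to generalize to.
X_new = randn(n_samples, n_voxels);
new_acc = mean(sign(X_new * w) == y);
fprintf('Accuracy on new data: %.2f\n', new_acc);
```

Even though the labels are unrelated to the data, the training samples are separated perfectly, which is why a training accuracy of 100% by itself tells you very little.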

However, if you still want to check it, you need to set your testing data to your training data. Once your design has been created, set
cfg.design.test = cfg.design.train;

Then, set the flag
cfg.design.nonindependence = 'ok';
otherwise TDT won’t let you run that analysis.
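Put together, the order of the steps might look like the sketch below. It assumes the usual TDT template workflow, i.e. that cfg already describes your files and labels, that the design is created with make_design_cv, and that the analysis is started with decoding(cfg). The results folder name is hypothetical and only there so the check does not overwrite your real results. Please verify the function and field names against your TDT version.

```matlab
% Assumes cfg is already set up (files, labels, chunks) as in the TDT templates.
cfg.design = make_design_cv(cfg);            % create the design first

% Test on exactly the data that was used for training.
cfg.design.test = cfg.design.train;
cfg.design.nonindependence = 'ok';           % allow the non-independent design

% Hypothetical separate output folder for this sanity check.
cfg.results.dir = fullfile(pwd, 'training_accuracy_check');
cfg.results.output = {'accuracy'};

results = decoding(cfg);                     % 'accuracy' now reflects training accuracy
```

Keep in mind that the resulting numbers describe the fit to the training data only and say nothing about generalization.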

Regarding your below-chance decoding performance: that’s unfortunate. Typical reasons for this are (1) bad luck (the null distribution can have a long tail in the negative range, so it might actually be a null effect), or (2) an uncontrolled confound. The latter means that in some chunks (i.e. runs) there is a positive relationship between the confound and your labels, while in other chunks there is a negative relationship. This can be introduced when you explicitly try to counterbalance conditions between runs, or more generally when your data are non-stationary. See this paper here (arxiv version here). I also explain the effect in this talk (slides here). Other times, it is more difficult to find out what is going on. Reading this may help, and perhaps this.
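The confound scenario can be reproduced with a small simulation in plain MATLAB. This is only a sketch under strong assumptions: a single feature that carries nothing but the confound, four runs in which the sign of the confound-label relationship flips, and a nearest-class-mean classifier instead of the classifier you actually use. All names and numbers are made up.

```matlab
rng(1);                                      % fixed seed for reproducibility
n_runs      = 4;
n_per_class = 20;
run_sign    = [1 1 -1 -1];                   % confound-label relationship per run

x = []; y = []; chunk = [];
for r = 1:n_runs
    labels = [ones(n_per_class, 1); -ones(n_per_class, 1)];
    x      = [x; run_sign(r) * labels + 0.5 * randn(2 * n_per_class, 1)];
    y      = [y; labels];
    chunk  = [chunk; r * ones(2 * n_per_class, 1)];
end

% Leave-one-run-out cross-validation with a nearest-class-mean classifier.
acc = zeros(1, n_runs);
for r = 1:n_runs
    train = chunk ~= r;
    test  = chunk == r;
    m_pos = mean(x(train & y == 1));         % class means from training runs only
    m_neg = mean(x(train & y == -1));
    pred  = ones(sum(test), 1);
    pred(abs(x(test) - m_neg) < abs(x(test) - m_pos)) = -1;
    acc(r) = mean(pred == y(test));
end

mean_acc = mean(acc);
fprintf('Mean cross-validated accuracy: %.2f\n', mean_acc);
```

In each fold, the training runs are dominated by the relationship opposite to the one in the left-out run, so the classifier systematically predicts the wrong label and the accuracy ends up below chance rather than at chance.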

For a start, perhaps try a simpler split-half analysis (odd vs. even runs), e.g. using a correlation classifier, which may avoid some of these issues.
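The idea behind a split-half correlation analysis can be sketched in plain MATLAB as follows. This is a conceptual illustration on simulated data, not TDT's own implementation: the class patterns, the number of runs and voxels, and all variable names are assumptions made for the example.

```matlab
rng(2);                                      % fixed seed for reproducibility
n_runs = 6;
n_vox  = 100;
signal = randn(2, n_vox);                    % hypothetical pattern for each class

% One noisy pattern per run and class (e.g. one beta image each).
data = zeros(n_runs, 2, n_vox);
for r = 1:n_runs
    for c = 1:2
        data(r, c, :) = signal(c, :) + randn(1, n_vox);
    end
end

% Average patterns within odd and within even runs: each result is 2 x n_vox.
odd_mean  = squeeze(mean(data(1:2:end, :, :), 1));
even_mean = squeeze(mean(data(2:2:end, :, :), 1));

% Correlate odd-half patterns with even-half patterns.
C = corrcoef([odd_mean; even_mean]');        % 4 x 4 correlation matrix
R = C(1:2, 3:4);                             % rows: odd classes, columns: even classes

within  = mean(diag(R));                     % same class across halves
between = mean([R(1, 2), R(2, 1)]);          % different classes across halves
fprintf('Within: %.2f, between: %.2f\n', within, between);
```

If the patterns carry class information, the within-class correlation should exceed the between-class correlation. Because only two independent halves are compared, there are fewer ways for run-specific effects to flip the result than in leave-one-run-out cross-validation.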

Best,
Martin