Dear experts, dear community,
The prime tenet of classification is: do not engage in double dipping. Put differently, make sure your training and test data are not ‘married’ before the analysis.
Recently, a colleague of mine claimed that motion correction (e.g., MCFLIRT) would count as double dipping if the data are not partitioned before the analysis. As I am new to this type of analysis, I would need some community support to evaluate the validity of this statement, so please bear with me.
Premise
Let’s assume we have an fMRI experiment with 4 conditions (A-D) and we want to train a classifier on SINGLE-TRIAL beta-maps to distinguish between them.
Workflow
Normally, I would feed the whole dataset into a preprocessing pipeline (motion correction, B0 unwarping, smoothing, ICA-AROMA, temporal filtering), then fit an LSS (least squares separate) or LSA (least squares all) GLM to obtain the single-trial beta maps, prewhitening the data with, e.g., 3dREMLfit.
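To make the LSA/LSS distinction concrete, here is a minimal sketch of the two design-matrix layouts. It rests on several assumptions that are not part of my actual pipeline: unconvolved boxcar regressors (a real GLM would convolve with an HRF), no prewhitening, a random stand-in for one voxel's time series, and hypothetical onsets and helper names. What it shows is that in both schemes every single-trial beta is estimated from the full time series.

```python
import numpy as np

def boxcar(n_vols, onset, duration):
    """Unconvolved boxcar regressor (simplification: no HRF convolution)."""
    reg = np.zeros(n_vols)
    reg[onset:onset + duration] = 1.0
    return reg

def lsa_design(n_vols, onsets, duration):
    """LSA: a single GLM with one regressor per trial plus an intercept."""
    cols = [boxcar(n_vols, o, duration) for o in onsets]
    cols.append(np.ones(n_vols))
    return np.column_stack(cols)

def lss_designs(n_vols, onsets, duration):
    """LSS: one GLM per trial (target trial, all other trials pooled, intercept)."""
    designs = []
    for i, onset in enumerate(onsets):
        target = boxcar(n_vols, onset, duration)
        others = np.zeros(n_vols)
        for j, other_onset in enumerate(onsets):
            if j != i:
                others += boxcar(n_vols, other_onset, duration)
        designs.append(np.column_stack([target, others, np.ones(n_vols)]))
    return designs

# Hypothetical toy run: 60 volumes, 4 trials lasting 3 volumes each.
n_vols, onsets, duration = 60, [5, 20, 35, 50], 3
X_lsa = lsa_design(n_vols, onsets, duration)
X_lss = lss_designs(n_vols, onsets, duration)

rng = np.random.default_rng(0)
y = rng.standard_normal(n_vols)  # stand-in for one voxel's time series

# One beta per trial under each scheme (ordinary least squares).
betas_lsa = np.linalg.lstsq(X_lsa, y, rcond=None)[0][:len(onsets)]
betas_lss = np.array([np.linalg.lstsq(X, y, rcond=None)[0][0] for X in X_lss])
```

In practice the design matrices would come from the GLM software itself; the sketch only illustrates how the regressors are arranged.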
Problem
Training and test data have to be independent of each other, meaning they are not allowed to ‘see’ each other prior to the classification. However, according to my colleague, there are 3 problems with my pipeline:
- Prewhitening and the GLM with all the data points:
Prewhitening uses the GLM output to remove autocorrelations. However, since it uses all the data, the ‘training’ and ‘test’ beta maps were already in a common GLM, meaning they influenced each other prior to the analysis → double dipping
- ICA-AROMA
ICA-AROMA uses GLMs to find good and bad components and tries to remove the bad ones automatically → same problem as above. Test and training betas were in the same GLM, constituting double dipping.
- Motion Correction
Ultimately, motion correction is just a GLM, so the same logic applies.
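For reference, the textbook form of double dipping behind the independence requirement can be sketched on pure noise. This toy example is not one of the three preprocessing steps above: it uses a label-dependent feature selection, a hand-written nearest-centroid classifier, and made-up dimensions (40 samples, 2000 features), all of which are assumptions for illustration. Selecting features on all samples before leave-one-out cross-validation typically inflates accuracy, while selecting inside each training fold does not.

```python
import numpy as np

def top_features(X, y, k):
    """Indices of the k features with the largest absolute class-mean difference."""
    diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    return np.argsort(diff)[-k:]

def nearest_centroid_predict(X_train, y_train, x_test):
    """Predict the class whose training centroid is closest to x_test."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    return int(np.linalg.norm(x_test - c1) < np.linalg.norm(x_test - c0))

def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy with feature selection inside or outside the folds."""
    n = len(y)
    global_feats = top_features(X, y, k)  # uses ALL samples, including test ones
    hits = 0
    for i in range(n):
        train = np.arange(n) != i
        if select_inside_fold:
            feats = top_features(X[train], y[train], k)
        else:
            feats = global_feats
        pred = nearest_centroid_predict(X[train][:, feats], y[train], X[i, feats])
        hits += int(pred == y[i])
    return hits / n

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2000))  # pure noise: no real class information
y = np.repeat([0, 1], 20)

leaky_acc = loo_accuracy(X, y, k=10, select_inside_fold=False)
clean_acc = loo_accuracy(X, y, k=10, select_inside_fold=True)
print(f"selection on all data: {leaky_acc:.2f}, selection per fold: {clean_acc:.2f}")
```

Whether label-free steps such as motion correction behave like the leaky variant here is exactly the question raised in this post; the sketch does not settle it.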
Open Questions
I have never explicitly read about this potential issue in the classification literature (but I am also a newbie).
Assuming you have one long run and you preprocess this run in one go, do (1) motion correction, (2) ICA and (3) prewhitening count as double dipping?
(Logically, they should, but I am not really sure.)
Potential Solution
If these steps count as double dipping and I want to train my classifier on single-trial data, I could only come up with the following idea:
Partition my data from the get-go:
- cut out each trial together with the preceding fixation cross (as baseline)
- run each of the preprocessing steps separately for each trial (motion correction, smoothing, temporal filtering; no ICA, since that makes no sense for 15 data points, I guess) and
- then create a single GLM for each trial → this will be very noisy and I have to see if it works
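The cutting step in the list above could look roughly like the sketch below. Everything specific in it is an assumption: the (time, voxels) array layout, the helper names, the onsets, and a split of 5 baseline plus 10 trial volumes (chosen only so that each segment has 15 data points). The per-trial estimate is a baseline-corrected mean, which corresponds to a boxcar-plus-intercept GLM without HRF convolution, and the per-trial preprocessing itself is left out.

```python
import numpy as np

def cut_trials(data, onsets, baseline_vols, trial_vols):
    """Cut one (baseline + trial) segment per onset from a (time, voxels) array."""
    segments = []
    for onset in onsets:
        start, stop = onset - baseline_vols, onset + trial_vols
        if start < 0 or stop > data.shape[0]:
            raise ValueError(f"trial at volume {onset} does not fit in the run")
        segments.append(data[start:stop].copy())
    return segments

def trial_pattern(segment, baseline_vols):
    """Mean trial signal minus mean fixation baseline, per voxel."""
    baseline = segment[:baseline_vols].mean(axis=0)
    return segment[baseline_vols:].mean(axis=0) - baseline

# Hypothetical toy run: 200 volumes, 3 voxels, a response of 2.0 during each trial.
baseline_vols, trial_vols = 5, 10
onsets = [20, 60, 100, 140]
data = np.zeros((200, 3))
for onset in onsets:
    data[onset:onset + trial_vols] += 2.0

segments = cut_trials(data, onsets, baseline_vols, trial_vols)
patterns = np.vstack([trial_pattern(seg, baseline_vols) for seg in segments])
```

Each row of `patterns` would then be one single-trial sample for the classifier, estimated without touching any volume outside its own segment.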
Conclusion
Does motion correction count as double dipping?
Does ICA-AROMA count as double dipping?
Does prewhitening count as double dipping?
Does anyone have a better idea of how I could proceed from here?
Does anyone have literature recommendations for my problem?
Thanks in advance!