Classification: Are common preprocessing steps (e.g., motion correction) considered double dipping?

Dear experts,
Dear community,

The prime tenet of classification is: do not engage in double-dipping. Put differently, make sure your training and test data are not mixed before the analysis.
Recently, a colleague of mine claimed that motion correction (e.g., MCFLIRT) would count as double dipping if the data is not partitioned before the analysis. As I am new to this type of analysis, I need some community support to evaluate the validity of this statement, so please bear with me.

Premise
Let’s assume we have an fMRI experiment with 4 conditions (A-D) and we want to train a classifier on SINGLE-TRIAL beta-maps to distinguish between them.

Workflow
Normally, I would feed the whole dataset into a preprocessing pipeline (motion correction, B0 unwarping, smoothing, ICA-AROMA, temporal filtering), then create an LSS or LSA GLM for the beta maps and prewhiten the data using e.g. 3dREMLfit.
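To make the single-trial GLM step concrete, here is a rough sketch of an LSS loop. I wrote it with nilearn purely for illustration (my actual pipeline uses the FSL/AFNI tools named above), and the filenames, TR, and column names are made up:

```python
# Illustrative LSS loop (not my actual pipeline): each trial gets its own
# regressor while all remaining trials are collapsed into one "other" regressor.
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

events = pd.read_table("run1_events.tsv")      # hypothetical: onset, duration, trial_type
run_img = "run1_preprocessed.nii.gz"           # hypothetical preprocessed run

beta_maps = []
for i in events.index:
    lss_events = events.copy()
    lss_events["trial_type"] = "other"
    lss_events.loc[i, "trial_type"] = "this_trial"

    glm = FirstLevelModel(t_r=2.0, noise_model="ar1")
    glm = glm.fit(run_img, events=lss_events)
    beta_maps.append(glm.compute_contrast("this_trial", output_type="effect_size"))
```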

Problem
Train and test data have to be independent of each other, meaning they are not allowed to ‘see’ each other prior to classification. However, according to my colleague there are three problems with my pipeline:

  1. Prewhitening and the GLM with all the data points:

Prewhitening uses the GLM output to remove autocorrelations. However, since it uses all the data, the ‘training’ and ‘test’ beta maps were already estimated in a common GLM, meaning they influenced each other prior to the analysis → double dipping

  2. ICA-AROMA

ICA-AROMA uses GLMs to find good and bad components and tries to automatically remove the bad ones → same problem as above: test and train betas were in the same GLM, constituting double dipping.

  3. Motion Correction

Ultimately, motion correction is just a GLM, so the same logic applies.

Open Questions

I have never explicitly read about this potential issue in the classification literature (but I am also a newbie).

Assuming you have one long run and you preprocess this run in one go, does (1) motion correction, (2) ICA and (3) prewhitening count as double dipping?
(Logically, it should but I am not really sure)

Potential Solution
If these steps count as double-dipping and I want to train my classifier on single-trial data, the only idea I could come up with is the following:
Partition my data from the get-go:

  1. Cut out each trial and the preceding fixation cross (as baseline); a rough sketch of this slicing step follows the list.
  2. Run each preprocessing step separately for each trial (motion correction, smoothing, temporal filtering; no ICA, which makes no sense for ~15 data points I guess), and
  3. then create a single GLM per trial → this will be very noisy, and I will have to see if it works.
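Purely as a sketch of step 1, and assuming made-up filenames, a TR of 2 s, and arbitrary window lengths, the slicing could look something like this (whether this is a good idea is exactly my question):

```python
# Hypothetical sketch of step 1: cut each trial plus the preceding fixation
# cross out of the raw 4D run. Filenames, TR, and window lengths are made up.
import pandas as pd
from nilearn.image import index_img

tr = 2.0                                       # assumed repetition time
events = pd.read_table("run1_events.tsv")      # hypothetical: onset, duration
raw_run = "run1_raw.nii.gz"                    # hypothetical unpreprocessed run

trial_chunks = []
for _, trial in events.iterrows():
    start_vol = int((trial["onset"] - 8.0) / tr)                     # ~8 s of fixation before
    end_vol = int((trial["onset"] + trial["duration"] + 12.0) / tr)  # HRF tail after
    trial_chunks.append(index_img(raw_run, slice(max(start_vol, 0), end_vol)))
# each chunk would then get its own preprocessing and its own tiny GLM
```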

Conclusion
Does motion correction count as double-dipping?
Does ICA-AROMA count as double-dipping?
Does prewhitening count as double-dipping?
Does anyone have a better idea how I could proceed from here?
Does anyone have literature recommendations for my problem?

Thanks in advance!

Double-dipping is using class labels, or some measure that correlates with class labels, to drive some decision in processing such that class A is processed differently than class B. The processes you’re describing would not induce this in themselves, but you can definitely find yourself in the situation where your classes do get processed differently due to problems in your experimental design.

For example, if you have a task where in condition A the subject subvocalizes a word and in condition B they speak it out loud, you would very likely be inducing motion that is confounded with the measures of interest. Head motion correction and regressing out motion parameters are intended to remove this confound, but at the cost of also removing real signals that you have good reason to believe should be there and correlate with the head motion. In a regular GLM context, I would expect to lose the ability to detect these signals against baseline, but it’s plausible that in a classification context you would get higher discriminability against other conditions. Alternatively, you might have a condition that only occurs in run 1 and another that only occurs in run 2; now there is no direct way to contrast them because they do not have a shared baseline.
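To be explicit about what I mean by regressing out motion parameters, here is a minimal sketch, assuming nilearn and made-up filenames (the motion estimates could equally come from MCFLIRT’s parameter output):

```python
# Sketch: motion parameters entered as nuisance regressors in the GLM.
# Variance shared between head motion and the task regressors is removed,
# which is why task-correlated motion costs you real signal.
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

events = pd.read_table("run1_events.tsv")         # hypothetical events file
motion = pd.read_table("run1_motion_params.tsv")  # hypothetical: 6 columns per volume

glm = FirstLevelModel(t_r=2.0, noise_model="ar1")
glm = glm.fit("run1_preprocessed.nii.gz", events=events, confounds=motion)
```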

I haven’t thought much about how ICA-AROMA might play out in this context.

To describe running a GLM with pre-whitening as double-dipping seems like a category error to me. The problem here is signal independence, not labeling. LS-S and pre-whitening are in fact intended to isolate combined signals to the greatest extent possible.

The statistical problem you’re describing seems more to be about independence of measurements and generalizability of results. You can train and test within a run, and the statistics may be valid, but they can only tell you something about that run. You can collect multiple runs and analyze them in a leave-one-run-out cross-validation scheme, and that can tell you something about that session. You can bring the subject back for multiple sessions, and learn more about that subject. And you can run many subjects and learn something about their cohort. But every study will have limited generalizability.
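As a concrete illustration of the leave-one-run-out scheme, here is a minimal sketch with scikit-learn, assuming you already have a trials × voxels matrix of single-trial betas, condition labels, and a run index per trial (all filenames here are made up):

```python
# Sketch of leave-one-run-out cross-validation on single-trial beta maps.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X = np.load("single_trial_betas.npy")    # hypothetical: n_trials x n_voxels
y = np.load("condition_labels.npy")      # hypothetical: one of A-D per trial
runs = np.load("run_labels.npy")         # hypothetical: run index per trial

clf = make_pipeline(StandardScaler(), LinearSVC())
scores = cross_val_score(clf, X, y, groups=runs, cv=LeaveOneGroupOut())
print("fold accuracies:", scores, "mean:", scores.mean())
```

Because whole runs are held out, the trials used for testing never share a fold with trials from the same run used for training.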

Anyway, this got a bit long. If your colleague has an argument that these common preprocessing steps make data unusable in a classification context, I would be interested to read it, but I don’t think you can say, without looking at the task and study design, that preprocessing is double-dipping.