ComBat and cross-validation

neuroimaging
mvpa

#1

Dear Neurostars,

Recently a few papers came out on ComBat, a tool for removing site effects from multi-site datasets. According to these papers, it does a pretty good job of removing unwanted scanner artifacts from modalities like DTI, functional connectivity (ref) and cortical thickness measurements (ref).

In short, ComBat is an extension of a linear regression model that uses empirical Bayes to improve the estimation of the site parameters. I wanted to give this a try myself, and luckily ComBat code is available for Matlab, R and Python.
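For reference, the model as I understand it from the ComBat papers can be written as follows. The notation (site i, subject j, feature v) is my own shorthand and may differ slightly from the papers, so treat this as a sketch rather than the definitive formulation.

```latex
% Observed feature v for subject j at site i:
%   \alpha_v      overall mean of feature v
%   X_{ij}        covariates to preserve (e.g. age, sex, diagnosis)
%   \gamma_{iv}   additive site effect
%   \delta_{iv}   multiplicative site effect
y_{ijv} = \alpha_v + X_{ij}^{\top}\beta_v + \gamma_{iv} + \delta_{iv}\,\varepsilon_{ijv}

% Harmonized value, with \gamma^{*}_{iv} and \delta^{*}_{iv} the
% empirical Bayes estimates of the site effects:
y_{ijv}^{\mathrm{ComBat}} =
  \frac{y_{ijv} - \hat{\alpha}_v - X_{ij}^{\top}\hat{\beta}_v - \gamma^{*}_{iv}}{\delta^{*}_{iv}}
  + \hat{\alpha}_v + X_{ij}^{\top}\hat{\beta}_v
```

The empirical Bayes step pools information across features to shrink the per-site estimates, which is what distinguishes this from fitting a separate regression per feature.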

Now I want to try out ComBat for my classification study, in which I try to separate patients from controls using cortical features of >4000 subjects from 46 unique sites around the world. My model optimization, training and testing are performed in separate (inner and outer) cross-validation loops. The problem, however, is that ComBat seems to be a “one-shot” approach, in the sense that it is run only once on the entire dataset, instead of fitting the harmonization model’s parameters on the training data only and then applying them to both training and test data, as you would typically do.
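To make the "fit on training data only, apply to both" idea concrete, here is a minimal sketch in Python. It is not ComBat: it is a plain per-site mean and variance alignment with no covariates and no empirical Bayes shrinkage. The class name `SiteLocationScale`, the helper functions and the simulated data are all hypothetical and only serve the illustration.

```python
import numpy as np


class SiteLocationScale:
    """Simplified per-site harmonization (NOT full ComBat).

    Site parameters are estimated in fit() and reused in transform(),
    so test data never influence the estimates.
    """

    def fit(self, X, sites):
        # Pooled training statistics that every site is mapped onto.
        self.grand_mean_ = X.mean(axis=0)
        self.grand_std_ = X.std(axis=0) + 1e-8
        # Per-site location and scale, estimated on training data only.
        self.site_mean_ = {}
        self.site_std_ = {}
        for s in np.unique(sites):
            Xs = X[sites == s]
            self.site_mean_[s] = Xs.mean(axis=0)
            self.site_std_[s] = Xs.std(axis=0) + 1e-8
        return self

    def transform(self, X, sites):
        X_out = np.empty_like(X, dtype=float)
        for s in np.unique(sites):
            if s not in self.site_mean_:
                raise ValueError(f"site {s!r} was not seen during fit")
            mask = sites == s
            z = (X[mask] - self.site_mean_[s]) / self.site_std_[s]
            X_out[mask] = z * self.grand_std_ + self.grand_mean_
        return X_out


def make_data(n_per_site=200, n_sites=3, n_features=5, seed=0):
    """Simulate features with additive and multiplicative site effects."""
    rng = np.random.default_rng(seed)
    sites = np.repeat(np.arange(n_sites), n_per_site)
    offsets = rng.normal(0, 3, size=(n_sites, n_features))
    scales = rng.uniform(0.5, 2.0, size=(n_sites, n_features))
    noise = rng.normal(size=(sites.size, n_features))
    return offsets[sites] + scales[sites] * noise, sites


def site_mean_gap(X, sites):
    """Average (over features) range of the per-site means."""
    means = np.array([X[sites == s].mean(axis=0) for s in np.unique(sites)])
    return (means.max(axis=0) - means.min(axis=0)).mean()


X, sites = make_data()
n_folds = 5
# Random fold assignment with equal fold sizes.
folds = np.random.default_rng(1).permutation(len(sites)) % n_folds

gap_before, gap_after = [], []
for k in range(n_folds):
    train, test = folds != k, folds == k
    harm = SiteLocationScale().fit(X[train], sites[train])
    X_train_h = harm.transform(X[train], sites[train])
    X_test_h = harm.transform(X[test], sites[test])
    # A classifier would be fit on X_train_h and scored on X_test_h here.
    gap_before.append(site_mean_gap(X[test], sites[test]))
    gap_after.append(site_mean_gap(X_test_h, sites[test]))

print("mean site gap in test folds, raw:       ", np.mean(gap_before))
print("mean site gap in test folds, harmonized:", np.mean(gap_after))
```

One limitation of this design is that a site which appears only in the test fold cannot be transformed, which is why `transform` raises an error for unseen sites. With many small sites that may require stratifying the folds by site.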

Is anyone more familiar with this kind of harmonization technique, or does anyone have recommendations on how to get rid of site-specific effects in multi-site datasets in a cross-validated manner?

Many thanks,

Best,

Willem


#2

For anyone interested, here are the GitHub repos for ComBat implemented in Matlab, R and Python:



#3

I am not an expert in this kind of approach.
However, I also coded a Python version of ComBat and applied it to DWI, as in Fortin’s paper.
In my experience, there are two key parameters (I can't recall them now), computed during the harmonization process, that you could export and reuse on new data.
However, I would not recommend doing so, since these algorithms are very sample-dependent. I have tried harmonizing the whole sample or just subsets of patients, and obtained different results. In my case, I decided to harmonize my data on an analysis-specific basis.
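A toy example of why the estimates depend on the sample, using simulated numbers and a plain per-site mean rather than ComBat's actual estimates. The site offset and group effect below are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# One simulated site with a mix of patients and controls.
n = 400
is_patient = rng.random(n) < 0.5
site_offset = 2.0    # hypothetical scanner shift
group_effect = 1.0   # hypothetical patient-vs-control difference
y = site_offset + group_effect * is_patient + rng.normal(size=n)

# "Site mean" estimated from different subsets of the same site.
mean_all = y.mean()
mean_patients = y[is_patient].mean()
mean_controls = y[~is_patient].mean()

print("site mean, whole sample: ", mean_all)
print("site mean, patients only:", mean_patients)
print("site mean, controls only:", mean_controls)
```

The patients-only estimate absorbs the group effect into the site term, so removing it would also remove part of the difference you may want to study. That is one reason parameters exported from one sample may not transfer to another with a different composition.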