ComBat and cross-validation

Dear Neurostars,

Recently a few papers came out on ComBat, a tool that can be used for removing site effects from multi-site datasets. According to these papers, it does a pretty good job of removing unwanted scanner artifacts from modalities like DTI, functional connectivity (ref) and cortical thickness measurements (ref).

In short, ComBat is an extension of a linear regression model that uses empirical Bayes to improve the estimation of the site parameters. I wanted to give this a try myself, and luckily ComBat code is available for Matlab, R and Python.
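To make the model concrete: ComBat treats each site as contributing an additive shift (location) and a multiplicative spread (scale) per feature, estimated with empirical Bayes shrinkage. Here is a minimal toy sketch of just the location/scale idea, without covariate adjustment or the empirical-Bayes part (all names are illustrative, not the actual neuroCombat API):

```python
import numpy as np

def naive_site_adjust(data, sites):
    """Toy location/scale site correction: for each site, remove the
    site-specific shift (gamma) and rescale by the site-specific spread
    (delta). Real ComBat additionally regresses out biological covariates
    and shrinks gamma/delta across features with empirical Bayes."""
    data = np.asarray(data, dtype=float)
    grand_mean = data.mean(axis=0)
    pooled_sd = data.std(axis=0, ddof=1)
    adjusted = np.empty_like(data)
    for site in np.unique(sites):
        idx = sites == site
        gamma = data[idx].mean(axis=0) - grand_mean        # additive site effect
        delta = data[idx].std(axis=0, ddof=1) / pooled_sd  # multiplicative site effect
        adjusted[idx] = (data[idx] - grand_mean - gamma) / delta + grand_mean
    return adjusted

# two simulated sites with different offsets and scales
rng = np.random.default_rng(0)
sites = np.array([0] * 50 + [1] * 50)
data = rng.normal(0, 1, (100, 3))
data[sites == 1] = data[sites == 1] * 2.0 + 5.0
harmonized = naive_site_adjust(data, sites)
```

After adjustment, both sites share the same per-feature mean and spread; the real method differs mainly in how robustly gamma and delta are estimated for small sites.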

Now I want to try out ComBat for my classification study, in which I try to separate patients from controls using cortical features of >4000 subjects coming from 46 unique sites around the world. My model optimization, training and testing are performed in separate (inner and outer) cross-validation loops. The problem, however, is that ComBat seems to be a “one-shot” approach, in the sense that it is run only once on the entire dataset, instead of fitting the harmonization model’s parameters on the training data only and then applying them to both training and test data, as you would typically do.

Is anyone more familiar with these kinds of harmonization techniques, or does anyone have recommendations on how to get rid of site-specific effects in multi-site datasets in a cross-validated manner?

Many thanks,




For anyone interested, here are the GitHub repos for ComBat implemented in Matlab, R and Python:

I am not an expert in these kinds of approaches.
However, I also coded a Python version of ComBat and applied it to DWI, as in Fortin’s paper.
In my experience, there are two key parameters (can’t recall them now), computed during the harmonization process, that you could export and re-use on new data.
However, I would not recommend doing so, since these algorithms are very sample-dependent. I have tried harmonizing the whole sample as well as just subsets of patients, and obtained different results. In my case, I decided to harmonize my data in an analysis-specific way.

Hi Willem,

Could you please share with us what you did to apply Combat in a cross-validation?


I’ve found that training combat on the training set and applying it to the test set can actually induce site effects in the test set. I’ve had the most luck applying it separately to the training and test sets. Assuming that your data is sufficiently large, that’s what I’d do.

Here’s some example code where I do that:
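A minimal sketch of that pattern, with a toy `harmonize` stand-in for the real ComBat call (illustrative names, not neuroCombat’s actual API; substitute your actual harmonization function):

```python
import numpy as np
from sklearn.model_selection import train_test_split

def harmonize(data, sites):
    """Stand-in for a ComBat call: re-center each site at the grand mean
    of the split it is given. Replace with your real harmonization."""
    out = data.astype(float)
    grand_mean = data.mean(axis=0)
    for s in np.unique(sites):
        out[sites == s] += grand_mean - data[sites == s].mean(axis=0)
    return out

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
sites = rng.integers(0, 4, size=200)
y = rng.integers(0, 2, size=200)

# stratify the split by site so each site appears in both halves
X_tr, X_te, s_tr, s_te, y_tr, y_te = train_test_split(
    X, sites, y, test_size=0.3, stratify=sites, random_state=0)

# key point: harmonize train and test independently, so no test-set
# statistics leak into training, and training-set statistics are not
# forced onto a differently composed test sample
X_tr_h = harmonize(X_tr, s_tr)
X_te_h = harmonize(X_te, s_te)
```

The trade-off is that each split must be large enough per site for the site parameters to be estimated stably, which is why this only works with sufficiently big samples.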

@WillemB2104 neuroCombat seems to be abandoned, do you have any other channels for contacting the author and seeing if they’d be interested in reviving the project?

@slieped would you mind linking to your implementation? Did you manage to implement the non-parametric approach as well as the parametric one?

Hi there, sorry for the delay. I found the neuroCombat author’s email address through his GitHub repo and will ask around there. Our colleagues from NeuroFedora contacted him this week as well, and they had the same questions regarding the project status. Might be worth following too, see here:

Will keep you posted!

It looks like Dr. Fortin himself has taken over the repo:

Hi everyone, new to the community so here’s a quick introduction.

I’m Laura, a PhD student at Amsterdam UMC, and I’m using multi-site consortium data to develop a brain age prediction model.

I’m following up on some of the discussion in the Q&A after the multi-site harmonization session at HBM. I’m also very keen to find out whether it is possible to apply the ComBat parameters learned on the training set to the test set in a cross-validation framework.

As I understood from the Q&A session, @Shotgunosine you have managed to do so, but unfortunately it resulted in the classifier performing below chance. I was wondering whether you think it would also lead to poorer performance in the case of regression (compared to no harmonization)?

I was also hoping you could share your thoughts on the following approach:

Perform a half-split on all the control data I have from >18 scanners. Use 50% of this data to train a brain age prediction model using cross-validation, ensuring that each fold has the same scanners in the training and test sets.

Now ideally, I’d like to run ComBat on each training fold and predict on the test set. I would then want to apply the same learned weights to the other 50% of the control data (and 100% of the patient data).

But I unfortunately don’t know how to do this, and maybe it doesn’t work. So alternatively, what are your thoughts on running ComBat separately on the training and test sets in each fold?

I would assume that the site effects will be estimated differently in the test sets, due to the different number of subjects per site.
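For the fit-on-train / apply-to-test variant you describe, the pattern would look something like this (a toy sketch with made-up `combat_like_fit` / `combat_like_apply` helpers, not a real package API; only an additive per-site offset, no scaling, covariates, or empirical Bayes):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def combat_like_fit(data, sites):
    """Estimate a per-site additive offset on the training data only
    (a stand-in for ComBat's fitted site parameters)."""
    grand_mean = data.mean(axis=0)
    return {s: data[sites == s].mean(axis=0) - grand_mean
            for s in np.unique(sites)}

def combat_like_apply(data, sites, offsets):
    """Remove the training-estimated offsets from any split. A site
    unseen during fitting raises KeyError here -- a real failure mode
    when transferring harmonization parameters to new samples."""
    out = data.astype(float)
    for s in np.unique(sites):
        out[sites == s] -= offsets[s]
    return out

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
sites = rng.integers(0, 3, size=300)
X = X + sites[:, None] * 2.0  # inject an additive site shift

# stratify folds by site so every site appears in every training fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for tr, te in cv.split(X, sites):
    offsets = combat_like_fit(X[tr], sites[tr])
    X_tr_h = combat_like_apply(X[tr], sites[tr], offsets)
    X_te_h = combat_like_apply(X[te], sites[te], offsets)
    # ...fit the prediction model on X_tr_h, evaluate on X_te_h...
```

The held-out 50% could then be transformed with the same `offsets`, provided every scanner in it was present in the training data; whether the resulting correction is adequate for a differently composed sample is exactly the open question in this thread.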

I found a paper by Wachinger et al. that seems to apply ComBat within a brain age prediction framework (resulting in improved metrics), but as the authors did a leave-site-out evaluation, I am assuming they applied ComBat to the whole dataset prior to training the brain age prediction model (i.e. data leakage)? But I might be wrong here.

And maybe as a final more general question, how bad is it to harmonize the imaging features across all samples prior to training your model?

Looking forward to connecting and hearing your thoughts and ideas!


Based on my experience, I’d run it separately on the training and test set.

For those who are interested, here are combat_fit and combat_apply functions in R.
