Combat versus Mixed Effects Models

Hi all,


Our lab just read a paper about Combat which is a method to eliminate site effects on your data (in this case using functional correlation/coherence matrices).
So if you collect MRI data from multiple locations (or are merging multiple studies from open neuro), you can extract measures from those data and “harmonize” them.


Is combat any better/different than including site as a fixed and random effect using a mixed effects modeling approach?

Like the model below (in R syntax)

lmer(fc ~ age + sex + site + (1|site))

Has anyone compared Combat and a mixed effects approach using neuroimaging data?
I see a paper using/simulating transcriptomics data, but I’m curious is anyone has an opinion/evidence in the neuroscience community.

@Shotgunosine, internet sleuthing suggests you are interested in/knowledgeable about this topic, do you have any thoughts?


I don’t have an answer or strong opinion on this yet, but jumping in to follow the thread, as well as to point out that I am interested in working with others interested in multi-site harmonization (deconfounding in general). I did bunch of research to get a sense of where our community is at, and I feel several open challenges remain, and that widely used code is not tested sufficiently to trust them (including the method you mention here)… I personally believe a good way we can start solving this complex problem is developing common library of trustworthy implementations (for ComBat, LMER etc), so we can compare them properly…

here is what I have already… love to work with anyone interested in implementing ComBat or related techniques:


I haven’t run the comparison myself, but my intuition is that ComBat and mixed effects regression should be broadly equivalent. If you were doing an analysis where a mixed effects regression was feasible, I would go with that. In analysis where it isn’t feasible, for example, some machine learning approaches, adding a site effect removal step with ComBat might be easier, but care should be taken to apply ComBat separately to the training and test data to avoid leaking information.

@raamana I don’t know if implementing all of the approaches in a single package is going to be feasible. For lme4 in python I’ve been using pymer4, and I made a little wrapper to make it slightly more sklearn-esque, sklearn-lmer. For combat in python I’ve just been using an rpy2 wrap of the original combat code.


I don’t mean to include scipy and stan and statsmodels etc in that library, all I mean is to have an intuitive Interface and trustworthy algorithm-level implementations that are tested uniformly to not just behave the same way for the purposes of deconfounding… take a quick look through the code:

I’ve organized and named the classes in a way that I think makes sense and beneficial for our community, as we all need these more and more going forward…

And take a look at the example in docs which addresses an important problem you just mentioned (mistakes in the cross-validation scenarios):

1 Like

If it’s easier to brainstorm this, I can organize a quick Zoom meeting… let me know!

Also, I remember we talked about rpy2 and pythons combat sometime ago, my position remains the same: need a pure/mostly python library that is well-tested… that doesn’t exist to my knowledge (even with rpy2 and R implementation of combat)

I agree that a well tested version of ComBat doesn’t exist. Unfortunately my priorities are elsewhere these days and I don’t have much time to work on implementing something like that.

Thank you for sharing @raamana, I also do not currently have the bandwidth to tackle this problem right now, but I am graduating this summer and I would be interested to circling back on this post graduation.

1 Like

No worries… I know most would be in that boat… posting this openly so anyone focusing on multi-site harmonization or deconfounding as their main research project is welcome to use and contribute…

above effort can potentially lead to high impact and/or surprising results, as most prior results were produced without a good implementation (and likely incorrect)…

1 Like

It’s a bit more complicated. I found the description of ComBat (with some other competing methods) in this paper useful:

The main issue is that ComBat lets the error variance vary site, constrained by a Bayesian model that is fit with Empirical Bayes. Further, the site random intercepts have a common variance over voxels but are site-specific, but the inter-site variation is again constrained by a prior.

As I understand it, lmer is not very flexible with what you can do with the measurement error term (e.g. you can’t impose an AR structure, or even heteroscedasticity).

So, no, ComBat is not something you could easily fit in a vanilla lmer model.


Thank you @nicholst for the additional clarification, I’ll try to translate this to a nlme model, which allows for error variance to vary by site (i.e., heteroscedasticity).

In my imaginary scenario, I’m trying to compare FA in the uncinate between Bipolar and Control groups. Let’s say there are a 100 voxels in the uncinate, so each person gets 100 observations of uncinate FA. Further, ~50 people were collected at 4 different sites.
So my data could look like this:

| participant_id | site | voxel | fa   | group |
| C1             | A    | 1     | 0.31 | C     |
| C1             | A    | 2     | 0.29 | C     |
| C1             | A    | 3     | 0.21 | C     |
| ...            | ...  | ...   | ...  | ...   |
| B50            | D    | 1     | 0.43 | B     |
| B50            | D    | 2     | 0.19 | B     |

If I wanted to model the difference in FA between Bipolar and Controls while adjusting for site variance (that can vary between site), and including random intercepts for voxels (nested within site?), the model could look like:

lme(fa ~ group + site,
    weights=varIdent(form = ~1 | site))

I haven’t tested this syntax, but from what I could translate, this “looks” like what you’ve explained (excluding the priors). Does this LMM look closer to Combat to you?

pinging @Jeanette_Mumford because I think she would also have an opinion on this.

Thanks for this discussion!


Dear @jdkent,

I don’t know nlme so well, but, sure, it does seem like this is going to get quite close. The main thing is that ComBat uses an Empirical Bayes approach… not sure how different that’ll be from a completely unconstrained approach. I imagine if you have very few sites ComBat might be better behaved… maybe? Not sure.

1 Like

One other thing to consider is that if I remember correctly, most uses of ComBat in neuroimaging have been applying it at the level of ROI statistics. ComBat is assuming that all of the correction factors for a given site come from the same distribution, so if you’re doing a voxel level analysis, that’d be assuming that the voxels are accurately modeled as coming from the same distribution. It’s easy to imagine ways that a voxel level analysis might violate that assumption if the masking or tissue segmentation isn’t great. Applying it at the level of ROI statistics makes it a little bit less likely that the assumption of a single distribution is going to be violated.

Also @nicholst thanks for clarifying the difference between ComBat and a mixed effects approach. I’m not sure how big the difference in results would be in practice but I’d be interested to see.

1 Like

Hi All - I am currently finding myself faced with this choice as well. In my case, my scanner effect is between subjects (similar to jdkent’s description) due to a scanner brand change in the middle of data collection. Some threads have suggested this could be addressed as a nested effect in a linear mixed-effects model. I am not sure that this is the best approach for a between-subjects factor with combat correct as an alternative option.

Any updated work or suggestions on this topic is greatly appreciated.