fMRIprep+ICA-AROMA filtering including WM, CSF, etc. confounds in fsl_regfilt


I am using fmriprep to preprocess my rsfMRI data and I am curious if appending ‘CSF’ and ‘WM’ columns from the *confounds.tsv (as well as perhaps other columns, e.g., ‘GlobalSignal’) to the MELODIC mixing matrix is an acceptable approach for simultaneously regressing out “noise” independent components and physiological signals. I have some Nipype code for doing this ( ), but the process goes as follows:

  1. run fmriprep with the AROMA option turned ‘on’
  2. append columns in the *confounds.tsv I wish to remove (i.e., WM, CSF) to the *MELODICmix.tsv, save as e.g., *MELODICmixPLUSPhysio.tsv
  3. append the indices of those columns to the *AROMAnoiseICs.csv, save as *AROMAnoiseICsPLUSPhysio.tsv
  4. run fsl_regfilt, with the “design” file being *MELODICmixPLUSPhysio.tsv and the “filter columns” being *AROMAnoiseICsPLUSPhysio.tsv.

This approach should theoretically get around having to re-extract the WM, CSF, etc. from the *variant-smoothAROMAnonaggr_preproc.nii.gz file if I want to use ICA-AROMA in combination with physio confounds ( ). The one concern I have, however, is whether there is an issue with having potentially “competing” physiological signals in both the “signal” and “noise” components of the fsl_regfilt equation. That is, there may be “WM” and “CSF” components in the MELODIC mixing matrix that aren’t flagged by ICA-AROMA (and thus kept as “signal”), which are competing for variance with the “WM” and “CSF” signals from the *confounds.tsv (which I explicitly define above as “noise”) in fsl_regfilt.

Any thoughts or prior experiences with this problem would be greatly appreciated!


1 Like

Hi @burwell,

I’m going to ping @jdkent and @rastko here, as they are our experts in denoising :D.

There was some great discussion about this issue here, with two great outcomes in the form of Jupyter notebooks (here and here).

Hopefully, you’ll find there some useful information.


1 Like

Thanks @oesteban!

I don’t think I have an answer for you, but I can say the method of regressing WM/CSF post denoising is different from including the WM/CSF in the melodic mixing matrix. And I think the explanation you give is good, but I will re-phrase in my own words to see if that gets us anywhere.

regressing post denoising

In this scenario, the data will be made completely orthogonal to WM/CSF, That is, there will be no shared variance between the residual data and WM/CSF.

pro(?): the data will be completely independent of the WM/CSF signal.
con: you have to recalculate the WM/CSF measures (and you probably do not want to extract them from a smoothed image)

adding WM/CSF to Melodic mixing matrix

In this scenario, the data will somewhat independent of the WM/CSF, depending on how much variance was shared between the signal components and the WM/CSF. And then, depending on the specific data, a portion of the shared variance will be attributed to the signal components and a portion to WM/CSF. I don’t believe there will be a “winner-take-all” effect where either the signal components or the WM/CSF take all the shared variance, and leave nothing for the other. If the shared variance feels like a large concern, you can create a covariance matrix with the signal components and WM/CSF to see if there is decent variance shared (I don’t have a good idea on what constitutes a large amount of shared variance).

pros: you do not have to recalculate WM/CSF
cons: The data will not be independent with respect to WM/CSF

  • this assumes non-aggressive denoising, if you do aggressive denoising, then the signal components will be made orthogonal to all other columns in the melodic mixing matrix.

EDIT: thought of a third option

make the WM/CSF orthogonal to the columns in the melodic mixing file

If you take the original WM/CSF signals, make them orthogonal to the noise columns within the melodic mixing file, and then regress the WM/CSF residuals from your data. This will ensure that the shared variance from the WM/CSF and the noise columns will not be introduced (I think), but any shared variance between WM/CSF and the signals will be given to the WM/CSF.

Alternatively, you could make the WM/CSF orthogonal to all the columns in the melodic mixing file, and then only unique WM/CSF variance will be regressed from the data.

So no need to recalculate the WM/CSF signal from the data.

@oesteban: I was working on a prerequisite to solve this problem in fmriprep to at least make the first scenario easier, I’ll ping the people there to see if I can continue working on it, or if it’s still too much of a moving target.



Thank you @jdkent for the detailed response, and simple pro/con takeaways!

As you’ve outlined, Scenario 1 appears to be the most surefire method (and I believe original method from Pruim et al., 2015) to ensure that the WM/CSF signals are orthogonal to the output time-series, and Scenario 2 might result in residual associations with WM/CSF. I was really hoping to do all of the “denoising/regression” in one step (i.e., including CosineXX, NonStationaryXX, WM, CSF, etc. in the same “noise” matrix), which makes your Scenario 3 appealing.

I guess I’ll have to try all three and see which works best. The Jupyter links sent by @oesteban look pretty helpful too, but I’ll need a little more time to work through them.

Thanks much!

Hi both,

Thanks for this discussion, I am thinking about adopting the procedures you outlined here. I am just having some trouble thinking about how you can integrate confounds from T1 space and the AROMA noise IC’s in MNI space - are the WM/CSF confounds in T1 space still valid or do they need to recalculated.

Ideally, I would like to extract ROI timecourses (aparcaseg) in T1 space that have been denoised using ICA-AROMA, WM, CSF, high pass filter and a linear detrend. But I am not sure if it is valid to use the AROMA noise components in data from T1 space. Do you have any thoughts on that?

Perhaps a similar approach to this post: Preprocessing for atlas based connectivity matrices

Aslo @jdkent concerning your tests on confound regressions ( was it slightly better to regress the WM and CSF signals in T1 space from the smoothAROMAnonaggr_bold.nii.gz, as opposed to recalculating the regressors after ICA-AROMA?

Many thanks,

1 Like

This is a good point you raise, @JoffJones, regarding the discrepancy between the WM/CSF obtained from T1 / native space, and the AROMA components obtained from MNI space. FWIW, I have since decided to proceed with Scenario 1 explained by @jdkent on Jan 7. Perhaps others on this thread could speak to using AROMA component time-series (MNI space) to preprocessing data in native space.

Thanks for the quick reply @burwell!
I was also thinking about doing the same using AFNI’s 3dTproject to simultaneously remove WM, CSF, linear detrend and highpass filter from the data after ICA-AROMA (in MNI space). In Pruim et al (2015) they re-calculated the WM and CSF after ICA-AROMA, but these estimates aren’t currently provided in fmriprep - although there is talk of including it (

If you’re interested, I have some Python code that extracts the WM, CSF, and global signals from AROMA-corrected data (unsmoothed), and then uses 3dTproject to do simultaneous highpass cosine / WM / CSF regression, followed by ROI extraction. See my the script in my GitHub repo for this project ( ). I admit, the code is a little kloodgy, so if you have a questions please don’t hesitate to ask or post an issue to the GitHub!


Thanks, that is really handy! I will defintitely check it out. Does this use the eroded the WM and CSF masks as in fmriprep?

Yep, it uses eroded WM / CSF masks via fslmaths / Threshold (see lines 214-215 in the script). I’m not sure if the kernel for erosion is identical to what’s normally implemented in fmriprep, but they’re eroded nonetheless!

Sorry to bump a long-ago thread, but we found this discussion very helpful, stimulating discussion and a question about the third scenario described by @jdkent as:

If anyone is still listening to this thread, we were wondering if it would be equivalent to reverse the orthogonalization, by orthogonalizing the ICA-AROMA signal components to the WM/CSF, and then just include the residual ICA-AROMA signal, ICA-AROMA noise, and the WM/CSF (as additional noise) all in the same regression, filtering out all noise components? Variance shared between WM/CSF and the signal components would be assigned to WM/CSF (and therefore removed) and variance shared between WM/CSF and the noise components would be shared between them somehow (and also removed). This approach would save computation and disk space (which is at a premium right now with the number of subjects we are running). We can test this empirically, but welcome the collective’s expertise!