Using FMRIPREP on (somewhat) preprocessed data (ABCD)

I apologize in advance if this question has been asked already in some way shape or form, as it seems pretty elementary, but I figured this is the best resource of experts I could consult!

I would like to use FMRIPREP on minimally preprocessed, publicly available data. One basic question I have is whether running motion correction again on already motion-corrected data will…do bad things, I guess. Other questions and ruminations are below.

ABCD says that the minimally preprocessed T1s have been corrected for gradient nonlinearity distortions and intensity inhomogeneity, resampled to 1mm isotropic and aligned to an in-house atlas, however, when they ran Freesurfer they bypassed the initial intensity scaling and N3 intensity inhomogeneity correction due to the in-house processes run beforehand. Functional data has been HMC-ed with 3dvolreg, corrected for B0 distortion and gradient nonlinearities, and “resampled with cubic interpolation into alignment with each other, correcting between-scan motion, using a scan in the middle of the session as the reference.”

Given that they used 3dvolreg and FMRIPREP uses MCFLIRT, I worry about doing motion correction again (and not having it be to a reference in middle of all scan sessions). Should I be? Also, will running INU again cause issues? I believe I can skip distortion correction in the functional stream, but I don’t see an option for the structural intensity normalization. Is there a way to turn off these modules of FMRIPREP, and just run from skull-stripping and alignment between T1 and functional onwards? What would experts recommend doing from this starting point?

Basically, I’d still like to recon these folks as I’d like to do surface-based analyses, and ABCD surfaces are not being released yet. Also, will want to regress out CSF, etc, so having those confound regressors will be helpful, and the downstream analysis pipeline I’d like to use works with FMRIPREP output, so I was looking for a way to plug these subjects into that easily.

It seems to me that your data are pretty much preprocessed. What do you want to use from fMRIPrep?

I think what you really want is: 1) organize the data in a BIDS-Derivatives compliant fashion; 2) run CompCor or ICA-AROMA as fMRIPrep would do?; and 3) run FreeSurfer’s recon (skipping some steps).

You may find some of the workflows used by fMRIPrep and sMRIPrep (for the anatomical processing) useful, but I don’t think the tool fits your purpose.

Happy to help you find the pieces you need.

Thank you so much for the rapid reply and for your clarifications! That is exactly what I want to do–run Compcor and recon-all (I’ve managed to figure out how to organize data in an approximately BIDS-readable fashion) on these participants, before doing downstream confound regression and temporal filtering. I know how to bypass the steps in recon-all, and I think I could figure out how to put together a CompCor workflow using Nipype from the workflows in FMRIPREP.

However, talking to some folks in other labs here encountered thoughts about needing to re-interpolate the functional scans (which have already been resampled with cubic interpolation) to bring them T1 or to template space down the line being a problem. They thought that it’d be preferable to do it all in a single-shot interpolation as FMRIPREP does, and thus suggested just starting again with the raw (unprocessed) data and running that through FMRIPREP, which I could also do. Thoughts?

For aCompCor you can avoid re-interpolation at the estimation step by projecting the ROIs from T1w space into the preprocessed BOLD.

For AROMA, it only works in MNI space at this point so I guess there is no other way around.

Alternatively, you could try to gain access to the original data and run fMRIPrep :smiley:

We ran into similar issues with their pseudo-BIDS data. We’re currently planning to run HeuDiConv to convert their dicoms into BIDS format and then use FMRIPREP. But, we’ve hit a bit of a glitch with HeuDiConv (

One other thing I’ve noticed with the data so far (and many multiband sequences) is that the first several volumes could be discarded. Right now, we’re planning to shave off the first 5 volumes and adjust the timing files by the appropriate length of time.

Oh man, thanks for chiming in @dvsmith. That’s not reassuring–can you let me know if you get some resolution via email?

Re: discarding the first few volumes, I believe in the ABCD preprocessing paper they say they say “a total of 16 initial frames (12.8 seconds) are discarded. On Siemens and Philips scanners, the first eight frames make up the pre-scan reference, and are not saved as DICOMS. An additional eight frames are discarded as part of the pre-analysis processing, for a total of 16 initial frames. On GE scanners with software version DV25, the first 12 frames make up the pre-scan reference. Instead of being discarded, those 12 reference scans are combined into one, and saved as the first frame, for a total of five initial frames to be discarded as part of the pre-analysis processing for GE DV25 series. On GE scanners with software version DV26, the pre-scan reference is not retained at all, and a total of 16 initial frames are discarded for GE DV26 scans as part of the pre-analysis processing.”

I was planning to follow them and discard at least 8 vols from the Siemens and Phillips scanners.

Will do! I’m hoping it’s resolved next week, but I fear there could be other issues since we’ve only been testing a small subset so far (i.e., 5 subjects, and 2 of those happened to be Siemens data, which worked fine).

Thanks for clarifying the timing. I think I’m still missing something, and messing up the timing is one of biggest concerns with working with these data! Here’s what I got from Don Hagler and the ABCD team at UCSD:

The DICOMs are our preferred “raw” data format because they contain many private tags that could get lost in conversion. We put the files into a directory structure that mimics BIDS. Because of storage cost considerations, we don’t plan to also share the raw data as nifti files in BIDS, but we do hope to eventually release a docker container to convert the fast track data to raw BIDS.

About the stimulus timing files, no adjustment is required for them. Time=0 is the start of the first non-discarded TR, and is specified in the events files by values in specific columns of specific rows (a starting time that should be subtracted from all other stimulus times).

We have shared a set of matlab functions for reading the event files and generating timing files that can be used by AFNI or FSL:

But he also said this, which I think matches what you said:

For Siemens and Philips, you should remove 8 initial TRs.

Based on your reading of this, do you think time=0 in the *_events.tsv files that are included with their pseudo-BIDS data (mproc) matches time=0 in the dicom files (the fasttrack)?

The GE data seem pretty tricky to work with. I think the scanner/software version are in the *.json files (at least the ones that come out of dcm2niix/HeuDiConv), so I guess we could use that and treat those data differently instead of just removing the first 8 volumes and subtracting the 6.4 seconds from all the *_events.tsv files? Definitely don’t want to screw up the timing… :frowning_face:

My intuition reading this is actually the opposite–but maybe I haven’t thought about it too enough. I think it could be read either way, though.

My interpretation of what he means is that for Siemens and Phillips BOLD scans, you should remove the first 8 vols from the fasttrack data in order to match the t=0 in the mproc data. That’s my interpretation, but I think you should probably clarify!

I haven’t even tried working with GE scanner data, honestly, and given the trickiness, I may just skip sites with GE scanners. What have you decided do about the .json files ABCD gives you with Fast Track data (vs. the BIDS .json sidecar files)? I’m in the process of sorting heudiconv now, and was going to cross-check that I wasn’t losing any information by discarding ABCD’s .json sidecar files, but if you’ve already tackled that, I’d be curious what you came up with.

FMRIPREP automatically detects non-steady state (how many volumes should be removed) and, instead of removing them, encodes them as outlier regressors in the regressors.tsv file which you can you in the analysis. In theory you should be able to feed it data from different scanners and it will decide how many “bad volumes” are there on a case by case basis.


This is a tricky dataset! Hopefully we can figure it out together! I just looked through the data (Siemens) and I can confirm that the fasttrack data match the mproc data in terms of timing (and at least for this one subject):

tug87422@cla18994 /data/projects/abcd-mid/raw $ fslnvols mproc/sub-NDARINVE5XTBT7L/ses-baselineYear1Arm1/func/sub-NDARINVE5XTBT7L_ses-baselineYear1Arm1_task-mid_run-01_bold.nii 
tug87422@cla18994 /data/projects/abcd-mid/raw $ ls -1 fasttrack/sub-NDARINVE5XTBT7L/ses-baselineYear1Arm1/func/ABCD-MID-fMRI_run-20170603173136/sub-NDARINVE5XTBT7L_ses-baselineYear1Arm1_dicom00*.dcm | wc -l

I haven’t even begun to write in conditional statements for different scanners, but I may need to do that! I think I’m really only looking at the Siemens and Phillips data.

Right now, I’m ignoring their *.json files and only using the output of HeuDiConv. Here’s a link to my current workflow ( If HeuDiConv can be updated to work with the GE data, it might help resolve some of these issues, but I’m still a bit paranoid about timing. So, in the case, I may go with Chris’ suggestion and just include the dummy regressors. My only (minor) reservation about that approach is that I suspect ICA-AROMA will not work optimally if its output is biased by the non-steady state volumes.

We remove non-steady-state volumes prior to running ICA-AROMA.

That’s awesome! Thanks for clarifying and making such a great product for the community! We’ll adapt our workflow to include the NonSteadyVolumes regressor in the future in cases where AROMA is run outside of FMRIPrep (a rarity for us, but happened with one dataset where we started with FMRIPrep data). We already default to using AROMA in FMRIPrep, so it seems like all the non-steady state volumes are regressed out during the denoising process, if I understand correctly.

fMRIPrep preserves the number of volumes, so in this case, the NSS volumes are removed, ICA-AROMA is run, and then those volumes are prepended. You will still want to use the non_steady_state_* regressors for desc-confounds_regressors.tsv when analyzing your AROMA-denoised data.

1 Like

Sounds good. Thanks! We’ll be doing that going forward. Seems to be really important for MB sequences… Thanks again for such a great tool and resource!

A post was split to a new topic: ABCD processing preparation

I’m coming back to this super late, but because I saw discussion of a topic I’m interested in–I’ve seen some posts here that ICA-AROMA is not well-suited for multiband data, but @dvsmith you seem to think it is okay? @effigies did you have particular concerns?

Hi @Ursula_Tooley, no I don’t. Mostly I wasn’t prepared to endorse the practice, since I haven’t tested it myself. Since I do sometimes make explicit recommendations for fMRIPrep settings, I didn’t want to be interpreted as an authority in this context. It sounds like others are comfortable with it, but I would still suggest to anybody to try it with their own data, examine the results, and see whether it seems to help.


Hi @Ursula_Tooley – I agree with @effigies that users should try it out and examine the results. In our experience, ICA-AROMA does seem to produce reasonable results with the HCP data and ABCD data (and perhaps similar multiband data).

Unrelated to ICA-AROMA but related to the discussion about timing and the ABCD data earlier this thread: I’ve had a bit of back and forth with USCD and specifically Don Hagler about this issue (see mailing list for details), and we’ve recently confirmed that their documentation might be slightly misleading. Although it’s true that the number of volumes/TR you might want to remove will be more or less depending on the scanner, any adjustment to the imaging data should be matched by an equivalent adjustment to the timing files. So, if you decide to remove 10 volumes/TRs from the imaging data (fast track or mproc), you need to subtract 8 seconds (10 * 0.8) from the onsets in the *_events.tsv files.

1 Like

That’s really helpful, @dvsmith, thanks for sharing! I swear, there’s needs to be an ABCD wiki somewhere…there is so much knowledge floating around in disparate places. In case you haven’t seen and this is still relevant for you, there’s a new repo that would hopefully streamline what was a bit of an onerous process for me:

I’ll message you separately about the mailing list–I’m not aware of it, my communication with Don Hagler had all been through the NDA Help Desk.

Thanks! We’ve been using HeuDiConv to convert their DICOMS to BIDS format, but it only works on the Siemens data at the moment – hoping we can get that fixed soon, but this looks like another option!