Question about sorting out motion outliers in subjects of dHCP dataset

Huiqing_Hu · April 22, 2020, 3:23pm

Hi all,

I am trying to sort out motion-outlier volumes in each subject and exclude subjects with more than 160 motion outliers.
At first, I applied the “fsl_motion_outliers” command to the preprocessed functional data and found that only 4 subjects would be excluded. This number isn’t consistent with the one reported in a paper written by the dHCP group (https://www.biorxiv.org/content/10.1101/2020.01.20.912881v2.full.pdf). After comparing the “fsl_motion_outliers” script with the “dHCP neonatal fMRI pipeline” scripts, I realised this is because I shouldn’t use the preprocessed functional data to get outliers. I might have to calculate DVARS after the raw functional data are motion and distortion corrected and before the data are cleaned (ICA + FIX), which means I need to run the PRE-MCDC and MCDC steps on the raw functional data. Is my understanding about the raw data correct? If yes, these steps may need quite a bit of work.
Does anybody know if a list of the number of motion outliers in each subject has been provided somewhere? Or is anybody in the dHCP group able to provide this list?

Thanks in advance for your help!

Kind regards,
Huiqing

seanfitz · April 22, 2020, 4:08pm

Hi Huiqing

There are numerous ways to determine volumes that are motion outliers, but there are two common features, 1) applying some metric as an estimate for motion, and 2) determining a threshold for that metric. We do not supply ‘outliers’ because there is no one standard.

fsl_motion_outliers has a choice of 5 metrics and allows you to either set the threshold manually, or determine the threshold per file automatically as box-plot cutoff = P75 + 1.5*IQR
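
To make the automatic cutoff concrete, here is a minimal sketch of the box-plot rule P75 + 1.5*IQR applied to a list of per-volume metric values. It is not the fsl_motion_outliers implementation; the function name `boxplot_outliers` and the toy values are made up for illustration, and the exact percentile interpolation used by FSL may differ from the one assumed here.

```python
import statistics

def boxplot_outliers(metric):
    """Return (threshold, flags) using the box-plot rule P75 + 1.5 * IQR.

    Assumption: quartiles use linear interpolation between data points
    (the 'inclusive' method); FSL's own percentile rule may differ slightly.
    """
    q1, _, q3 = statistics.quantiles(metric, n=4, method="inclusive")
    threshold = q3 + 1.5 * (q3 - q1)
    flags = [value > threshold for value in metric]
    return threshold, flags

# Toy per-volume metric values (hypothetical): one obvious spike at index 5.
values = [1.0, 1.1, 0.9, 1.2, 1.0, 5.0, 1.1, 0.8]
threshold, flags = boxplot_outliers(values)
n_outliers = sum(flags)
```

Because the threshold is derived from each file's own distribution, a file with very little volume-to-volume variation gets a low cutoff, and a noisy file gets a high one.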

If you use fsl_motion_outliers with DVARS as the metric, and the automatic threshold, on the clean dHCP fMRI data, then you will not get many outliers (as you found) because the cleaning process has dramatically reduced the volume-to-volume variation. You should either run it on the RAW data, or the post-mcdc (pre-denoised) data. Both are available for download.

But perhaps the easiest option would be to use the framewise_displacement metric, contained within func/sub-{subid}_ses-{sesid}_motion.tsv, and set a fixed displacement threshold. 0.25mm is a popular threshold, but we have found it to be too conservative in this cohort, so you will want to relax that. See appendix 9.4 in https://www.biorxiv.org/content/10.1101/766030v2
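
As a rough sketch of the fixed-threshold approach, the snippet below reads a tab-separated file and flags rows whose framewise_displacement exceeds a chosen value. It assumes the TSV has a header row with a `framewise_displacement` column and a numeric value in every row; the function name `flag_fd_outliers`, the in-memory example data, and the 0.25 threshold used here are purely illustrative, so check the real file layout and pick a threshold suited to the cohort.

```python
import csv
import io

def flag_fd_outliers(tsv_file, threshold_mm, column="framewise_displacement"):
    """Return one boolean per row: True where displacement exceeds the threshold.

    Assumption: the file is tab-separated with a header row, and every row
    holds a numeric value in `column`.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return [float(row[column]) > threshold_mm for row in reader]

# Hypothetical stand-in for a motion.tsv file (values are made up).
example_tsv = io.StringIO(
    "framewise_displacement\n0.05\n0.40\n0.10\n1.20\n0.02\n"
)
flags = flag_fd_outliers(example_tsv, threshold_mm=0.25)
n_outliers = sum(flags)
```

For a real subject you would pass an opened file handle instead of the `io.StringIO` object, for example `open(path, newline="")`.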

Why did you choose 160 as the number of corrupted volumes by which to exclude subjects? It is only ~7% of volumes, which means you will discard subjects for whom 93% of volumes are ok…

Hope this helps.

Cheers, Sean

Huiqing_Hu · April 22, 2020, 4:45pm

Hi Sean,

Thanks for your quick reply!
Sorting out outliers by using the post-mcdc (pre-denoised) data would be a good choice, but I didn’t find this data in the dataset that I downloaded, which includes the ‘sourcedata’ folder and the ‘dhcp_fmri_pipeline’ folder. Could you please send me the link to download it?
I also noticed that you didn’t set a minimum duration of continuous uncorrupted data in this paper (https://www.biorxiv.org/content/10.1101/766030v2). The ‘160 volumes’ option follows the criteria used in this paper https://www.biorxiv.org/content/10.1101/2020.01.20.912881v2.full.pdf (Functional data pre-processing part). Specifically, they chose the continuous set of 1600 volumes with the minimum number of motion-outlier volumes for the subsequent analyses, and subjects with motion outliers in more than 10% of the cropped dataset (160 motion-outlier volumes) were excluded entirely.
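
To illustrate the cropping criterion as I understand it, here is a minimal sketch that slides a fixed-length window over per-volume outlier flags, picks the contiguous window with the fewest outliers, and then applies a 10% exclusion rule. This is my own reading of the description, not code from the paper; the function name `best_window` and the toy flags are hypothetical, and the window is shortened to 4 volumes so the example is easy to follow (the paper uses 1600).

```python
def best_window(flags, window):
    """Return (start, n_outliers) for the contiguous window with fewest outliers.

    `flags` holds 1 for an outlier volume and 0 otherwise. Ties are resolved
    in favour of the earliest window.
    """
    if window > len(flags):
        raise ValueError("window is longer than the time series")
    count = sum(flags[:window])
    best_start, best_count = 0, count
    for start in range(1, len(flags) - window + 1):
        # Slide by one volume: add the entering flag, drop the leaving one.
        count += flags[start + window - 1] - flags[start - 1]
        if count < best_count:
            best_start, best_count = start, count
    return best_start, best_count

# Toy outlier flags for 10 volumes (made up for illustration).
toy_flags = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
window = 4
start, n_out = best_window(toy_flags, window)

# Exclude the subject if more than 10% of the cropped volumes are outliers.
exclude = n_out > 0.10 * window
```

With a real subject, `toy_flags` would be replaced by the per-volume outlier flags and `window` set to the number of volumes to keep.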

Best,
Huiqing

seanfitz · April 24, 2020, 2:05pm

Hi Huiqing

The intermediate (MCDC) fMRI is currently missing from our site for some reason. I will get it added again ASAP and then update you.

I know it is common in the neonatal literature to exclude subjects that do not have long periods without outliers. However, in my opinion, it is too conservative for this cohort to do this based on metrics that were run prior to denoising. Too much data is wasted. Furthermore, as you have seen yourself, if you examine post-denoised data, there is very little residual artefact to cause outliers.

You are correct that I do not impose a minimum duration criterion. I find that only a small proportion of frames (~6.5%) are flagged as outliers prior to denoising. However, the movements tend to be short, transient, and spread out in time, so that if we impose minimum durations, we end up excluding many subjects, even though most frames are fine.

If you were still concerned about residual movement contamination after denoising, you could still apply some form of frame censoring to the cleaned data.

Cheers, Sean

seanfitz · April 24, 2020, 2:11pm

Hi Huiqing

The fMRI intermediates are now available on the dHCP xnat:
https://data.developingconnectome.org/app/template/Login.vm

Just log in, select Data Release 2 2019, and then select Download Data Set (.tar.gz). This will download a small .tar.gz that will contain 5 torrents, one of which is the fMRI intermediates dhcp_2nd_rel_fmri_intermediate.torrent.

Cheers, Sean

Huiqing_Hu · April 28, 2020, 8:53pm

Hi Sean,

Thank you very much for your help! I will go with the intermediate fMRI data.

Best,
Huiqing