Is my fmriprep run time typical for my data?

Hi,

When I run fmriprep for a single subject, it takes 100+ hours to complete, and I was wondering whether this is normal. Each subject folder contains 3 sessions, each session has 8 functional images, and each functional image has 251 volumes. Each session also has 3 pairs of spin-echo EPI images with opposite phase-encoding directions (“pepolar”). For susceptibility distortion correction, I assign each functional image the spin-echo pair that is closest to it in acquisition time, by adding the “IntendedFor” field to the spin-echo JSON sidecar files.
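For illustration, this is roughly what one of those sidecar assignments looks like; the filenames and the use of jq below are hypothetical examples, not my actual dataset or scripts:

# Hypothetical example only: append an "IntendedFor" list (paths relative to the
# subject directory) to a spin-echo fieldmap sidecar using jq.
jq '.IntendedFor = ["ses-1/func/sub-01_ses-1_task-soe_dir-AP_run-1_bold.nii.gz",
                    "ses-1/func/sub-01_ses-1_task-soe_dir-AP_run-2_bold.nii.gz"]' \
    sub-01/ses-1/fmap/sub-01_ses-1_dir-PA_run-1_epi.json > tmp.json \
    && mv tmp.json sub-01/ses-1/fmap/sub-01_ses-1_dir-PA_run-1_epi.json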

I am running fmriprep v20.1.0 on an HPC and split the anatomy and functional preprocessing into two commands. The anatomy command finishes in a reasonable time (less than 12 hours), but the functional command requires 100+ hours to complete. I am requesting 100 GB of memory and 10 cores for the functional preprocessing. Here is the anatomy preprocessing command:

Freesurfer=/cm/shared/apps/freesurfer/6.0.0

unset PYTHONPATH
export SINGULARITYENV_TEMPLATEFLOW_HOME=/home/fmriprep/.cache/templateflow

singularity run --cleanenv \
    -B ${HOME}/.cache:/home/fmriprep/.cache \
    -B ${Freesurfer}:/work_dir \
    ${container} \
    ${bids_dir} ${derivatives_dir} participant \
    --participant_label ${participant} \
    --nprocs ${nthreads} \
    --omp-nthreads ${omp_nthreads} \
    --anat-only \
    --ignore slicetiming \
    --output-spaces MNI152NLin2009cAsym T1w fsLR:den-32k \
    --medial-surface-nan \
    --random-seed 1234567890 \
    --use-aroma \
    --fs-license-file /work_dir/license.txt \
    --no-submm-recon \
    --cifti-output 91k \
    --work-dir ${work_dir} \
    --stop-on-first-crash \
    --notrack

Here is my functional preprocessing command (it is the same as the anatomy command, just without --anat-only):

Freesurfer=/cm/shared/apps/freesurfer/6.0.0

unset PYTHONPATH
export SINGULARITYENV_TEMPLATEFLOW_HOME=/home/fmriprep/.cache/templateflow

singularity run --cleanenv \
    -B ${HOME}/.cache:/home/fmriprep/.cache \
    -B ${Freesurfer}:/work_dir \
    ${container} \
    ${bids_dir} ${derivatives_dir} participant \
    --participant_label ${participant} \
    --nprocs ${nthreads} \
    --omp-nthreads ${omp_nthreads} \
    --ignore slicetiming \
    --output-spaces MNI152NLin2009cAsym T1w fsLR:den-32k \
    --medial-surface-nan \
    --random-seed 1234567890 \
    --use-aroma \
    --fs-license-file /work_dir/license.txt \
    --no-submm-recon \
    --cifti-output 91k \
    --work-dir ${work_dir} \
    --stop-on-first-crash \
    --notrack

I mention the number of spin-echo images and how they are assigned because, whenever I check the batch job log while the functional command is running, it is usually in a “qwarp” step. Here are the last lines from the log of a functional preprocessing job that is currently running:

     [Node] Running "op2ref_reg" ("nipype.interfaces.ants.registration.Registration"), a CommandLine Interface with command:
antsRegistration --collapse-output-transforms 1 --dimensionality 3 --float 1 --initialize-transforms-per-stage 0 --interpolation LanczosWindowedSinc --output [ transform, transform_Warped.nii.gz ] --transform Translation[ 0.05 ] --metric Mattes[ /rcc/stor1/depts/neurology/users/jheffernan/soe/work_dir/fmriprep_wf/single_subject_soe102_wf/func_preproc_ses_1_task_soe_dir_AP_run_8_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/apply_mask/uni_xform_masked.nii.gz, /rcc/stor1/depts/neurology/users/jheffernan/soe/work_dir/fmriprep_wf/single_subject_soe102_wf/func_preproc_ses_1_task_soe_dir_AP_run_8_wf/sdc_estimate_wf/pepolar_unwarp_wf/prepare_epi_wf/ref_op_wf/apply_mask/uni_xform_masked.nii.gz, 1, 64, Random, 0.5 ] --convergence [ 500, 1e-07, 200 ] --smoothing-sigmas 8.0mm --shrink-factors 2 --use-estimate-learning-rate-once 1 --use-histogram-matching 1 --transform Rigid[ 0.01 ] --metric Mattes[ /rcc/stor1/depts/neurology/users/jheffernan/soe/work_dir/fmriprep_wf/single_subject_soe102_wf/func_preproc_ses_1_task_soe_dir_AP_run_8_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/apply_mask/uni_xform_masked.nii.gz, /rcc/stor1/depts/neurology/users/jheffernan/soe/work_dir/fmriprep_wf/single_subject_soe102_wf/func_preproc_ses_1_task_soe_dir_AP_run_8_wf/sdc_estimate_wf/pepolar_unwarp_wf/prepare_epi_wf/ref_op_wf/apply_mask/uni_xform_masked.nii.gz, 1, 64, Random, 0.5 ] --convergence [ 200, 1e-08, 100 ] --smoothing-sigmas 2.0mm --shrink-factors 1 --use-estimate-learning-rate-once 1 --use-histogram-matching 1 --winsorize-image-intensities [ 0.005, 0.998 ]  --write-composite-transform 1
200728-23:41:07,7 nipype.workflow INFO:
     [Node] Finished "fmriprep_wf.single_subject_soe102_wf.func_preproc_ses_1_task_soe_dir_AP_run_8_wf.sdc_estimate_wf.pepolar_unwarp_wf.prepare_epi_wf.op2ref_reg".
200728-23:41:07,816 nipype.workflow INFO:
     [Node] Setting-up "fmriprep_wf.single_subject_soe102_wf.func_preproc_ses_1_task_soe_dir_AP_run_8_wf.sdc_estimate_wf.pepolar_unwarp_wf.qwarp" in "/rcc/stor1/depts/neurology/users/jheffernan/soe/work_dir/fmriprep_wf/single_subject_soe102_wf/func_preproc_ses_1_task_soe_dir_AP_run_8_wf/sdc_estimate_wf/pepolar_unwarp_wf/qwarp".
200728-23:41:07,854 nipype.workflow INFO:
     [Node] Running "qwarp" ("nipype.interfaces.afni.preprocess.QwarpPlusMinus"), a CommandLine Interface with command:
3dQwarp -prefix Qwarp.nii.gz -plusminus -noXdis -noZdis -base /rcc/stor1/depts/neurology/users/jheffernan/soe/work_dir/fmriprep_wf/single_subject_soe102_wf/func_preproc_ses_1_task_soe_dir_AP_run_8_wf/sdc_estimate_wf/pepolar_unwarp_wf/qwarp/transform_Warped.nii.gz -blur -1.0 -1.0 -source /rcc/stor1/depts/neurology/users/jheffernan/soe/work_dir/fmriprep_wf/single_subject_soe102_wf/func_preproc_ses_1_task_soe_dir_AP_run_8_wf/sdc_estimate_wf/pepolar_unwarp_wf/qwarp/transform_Warped_c0000.nii.gz -minpatch 9 -nopadWARP -noweight -pblur 0.05 0.05

Bumping this post!

@mphagen and I have been trying to run fmriprep v20.1.1 for some time, and we repeatedly run into timeout errors on my HPC cluster. The data typically have 2 sessions per subject with 6 runs per session, and I’m running ICA-AROMA (which is the goal of this fmriprep run).

This is my command:

PYTHONPATH=""  singularity run --cleanenv docker://poldracklab/fmriprep:20.1.1 /scratch/.../henrymj/BIDS_scans /scratch/.../henrymj/BIDS_scans/derivatives participant --participant_label sub-s003 --mem-mb 180000 --nthreads 20 --omp-nthreads 16 --ignore sbref -vv -w /scratch/.../henrymj/uh2_workdir  --use-aroma

Any guidance or insight (maybe from @effigies) would be greatly appreciated.

Here are examples of the warnings I’m getting (included in case they hint at what could be going on).

200728-21:55:33,272 nipype.interface WARNING:
         q-forms of reference and mask are substantially different
200728-22:22:56,515 nipype.interface WARNING:
	Changing /scratch/.../henrymj/BIDS_scans/derivatives/fmriprep/sub-s003/ses-2/func/sub-s003_ses-2_task-WATT3_run-1_space-MNI152NLin2009cAsym_desc-aseg_dseg.nii.gz dtype from float64 to int16
200728-22:22:56,515 nipype.interface WARNING:
	Changing /scratch/.../henrymj/BIDS_scans/derivatives/fmriprep/sub-s003/ses-2/func/sub-s003_ses-2_task-WATT3_run-1_space-MNI152NLin2009cAsym_desc-brain_mask.nii.gz dtype from float64 to uint8
200728-21:55:52,21 nipype.interface INFO:
         stderr 2020-07-28T21:55:52.021003:*+ WARNING:   If you are performing spatial transformations on an oblique dset, 
200728-21:55:52,21 nipype.interface INFO:
         stderr 2020-07-28T21:55:52.021003:  such as /scratch/.../henrymj/uh2_workdir/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_2_task_twoByTwo_run_1_wf/bold_bold_trans_wf/bold_reference_wf/gen_ref/slice.nii.gz,
200728-21:55:52,21 nipype.interface INFO:
         stderr 2020-07-28T21:55:52.021003:  or viewing/combining it with volumes of differing obliquity,
200728-21:55:52,21 nipype.interface INFO:
         stderr 2020-07-28T21:55:52.021003:  you should consider running: 
200728-21:55:52,21 nipype.interface INFO:
         stderr 2020-07-28T21:55:52.021003:     3dWarp -deoblique 
200728-21:55:52,21 nipype.interface INFO:
         stderr 2020-07-28T21:55:52.021003:  on this and  other oblique datasets in the same session.
200728-21:55:52,21 nipype.interface INFO:
         stderr 2020-07-28T21:55:52.021003: See 3dWarp -help for details.

Thanks in advance!

Hi @heffjos, we just got some useful feedback from @effigies, who peered through our log files:

Okay, looking at that, I’m seeing a couple of things: you have --omp-nthreads at 16, which is higher than pretty much any tool can use effectively. It’s also over half of --nthreads, so one large job will block any other large tasks. If you have 20 cores, I might set --omp-nthreads to 4, 5, or 8, so that 5, 4, or 2 large tasks can run simultaneously (and any remaining cores can be used by small tasks).

Following these notes, we cut our --omp-nthreads down to 5 and reran a few test subjects. They went from hitting the time limit to finishing in ~22 hours! I don’t see your thread settings in your post, but I hope this is helpful for you.
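
For reference, here is a sketch of our command with that change; everything other than the thread flags uses placeholder variables rather than our exact paths:

# Sketch only: placeholder variables, not our exact script.
PYTHONPATH="" singularity run --cleanenv docker://poldracklab/fmriprep:20.1.1 \
    ${bids_dir} ${derivatives_dir} participant \
    --participant_label ${participant} \
    --mem-mb 180000 \
    --nthreads 20 \
    --omp-nthreads 5 \
    --ignore sbref \
    --use-aroma \
    -w ${work_dir}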

Best,
@henrymj

Just to add on to that comment: ANTs (and some FreeSurfer) processes can see nearly linear speedups on up to 8 cores, but scaling gets much worse beyond that point. You can either target 8 to maximize the speed of individual jobs, or pick a factor of your available cores to maximize the chance that multithreaded jobs keep all cores busy.


I tested two different --nprocs / --omp-nthreads combinations on 2 other data sets:
--nprocs 8 --omp-nthreads 4: finished in 73 hours, which is an improvement.
--nprocs 16 --omp-nthreads 8: finished in 22 hours, which is great.