Question about automatic parallel processing in fmriprep for multiple subjects

Dear Neurostars Community,

I am unsure whether fmriprep automatically processes multiple subjects in parallel or whether it processes them sequentially by default. I couldn’t find explicit information in the official documentation.

The documentation briefly mentions how to run subjects in parallel, which led me to guess that fmriprep processes subjects sequentially by default and requires manual intervention to parallelize at the subject level. Is this understanding correct?

For context, I am using fmriprep-docker on my workstation (32 cores, 64 threads; 256 GB RAM) without a cluster environment such as SLURM. While I have come across discussions about parallel processing on Neurostars, most seem tailored to datasets much larger than mine. I am working with structural and task data from 50 subjects, and some of the suggested settings don’t seem applicable to my scenario. I currently run all subjects with one simple fmriprep-docker command, but it seems quite slow.

fmriprep-docker "${bids_dir}" "${prep_dir}" participant --fs-license-file "${license_path}" --fs-no-reconall -w "${working_dir}" --stop-on-first-crash | tee "${log_path}"

What would be the best practice for running multiple subjects in my case? Should I start multiple Docker instances, each running a single subject, or should I run a few instances processing several subjects at a time? Also, what would be appropriate settings for parameters like omp-nthreads and nprocs for individual subject processing?

I appreciate any guidance or experiences you could share on this matter.
Thank you!

Kun

Hi @const,

Fmriprep should parallelize across subjects by default.

That’s a lot of subjects for so few threads. Maybe consider running in batches of 5, for example as sketched below? You can also consider brainlife.io for cloud-based parallel processing.
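Roughly, an untested sketch of that batching approach could look like the following (it reuses the placeholders from your command; ${log_dir} and the per-subject work directories are my additions, not required by fmriprep):

# Untested sketch: one fmriprep-docker instance per subject, 5 subjects at a time.
subjects=($(ls -d "${bids_dir}"/sub-* | xargs -n1 basename | sed 's/^sub-//'))
batch_size=5
for ((i = 0; i < ${#subjects[@]}; i += batch_size)); do
    for sub in "${subjects[@]:i:batch_size}"; do
        fmriprep-docker "${bids_dir}" "${prep_dir}" participant \
            --participant-label "${sub}" \
            --fs-license-file "${license_path}" \
            -w "${working_dir}/${sub}" \
            > "${log_dir}/sub-${sub}.log" 2>&1 &
    done
    wait   # let the current batch of 5 finish before starting the next
done

Giving each instance its own work directory and log file keeps intermediate files and error messages from different subjects separate.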

The --fs-no-reconall flag is not recommended.

This has already been profiled here: FAQ - Frequently Asked Questions — fmriprep version documentation. But of course it also depends on how much data you have per subject.

Best,
Steven

Hi Steven,

Thanks for your valuable comments. I have a few more quick questions:

If fmriprep parallelizes by default, why is manual parallelization often discussed? Also, I’ve observed low efficiency (low resource utilization) in its default parallel processing – could you shed some light on this?

Regarding the --fs-no-reconall flag, several discussions suggest using it to save time when surface preprocessing is not required. Why is it generally not recommended?

Thanks for your recommendation. I have reviewed the document and several related discussions in the forum. However, I’m still somewhat puzzled about the nprocs and omp-nthreads settings.
According to the usage notes, omp-nthreads is the maximum number of threads per process, while nprocs is the maximum number of threads across all processes. I’m unclear about what exactly constitutes a ‘process’ in this context. Is it each Docker instance, each subject, or something else? This is a bit confusing, especially given the CPU/OS-level meanings of “process” and “thread”.

Hi @const,

Even if you are not using surface (e.g., GIFTI or CIFTI) outputs, surface-based workflows for spatial normalization tend to outperform volumetric workflows, and surface-based workflows are disabled with --fs-no-reconall. The developers even considered dropping it as an argument: Drop FSL/``--fs-no-reconall`` workflow path · Issue #2794 · nipreps/fmriprep · GitHub

In this context, a process is any Nipype processing node. These can range from trivial (e.g., thresholding an image) to complex (a step of recon-all). I’ll typically use 12 or 16 threads in total with a maximum of 8 per process.
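With your command, that would look something like this (untested; the participant label "01" is just an example):

fmriprep-docker "${bids_dir}" "${prep_dir}" participant \
    --participant-label 01 \
    --fs-license-file "${license_path}" \
    -w "${working_dir}" \
    --nprocs 16 --omp-nthreads 8 \
    --stop-on-first-crash | tee "${log_path}"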

Maybe because many users here are working on high-performance computing clusters, where it is convenient to schedule large job arrays (one job per subject). It also makes error logging and tracking easier when jobs are kept separate.
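For illustration, a minimal (untested) SLURM array script along these lines runs one subject per array task; subjects.txt is a hypothetical file with one participant label per line, and on a real cluster you would typically swap fmriprep-docker for a Singularity/Apptainer call, but the structure is the same:

#!/bin/bash
#SBATCH --array=1-50
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
# Pick the participant label for this array task (subjects.txt: one label per line).
sub=$(sed -n "${SLURM_ARRAY_TASK_ID}p" subjects.txt)
fmriprep-docker "${bids_dir}" "${prep_dir}" participant \
    --participant-label "${sub}" \
    --fs-license-file "${license_path}" \
    -w "${working_dir}/${sub}" \
    --nprocs 16 --omp-nthreads 8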

Regarding the resource utilization, it would help if you described in more detail what you are experiencing.

Best,
Steven

Hi Steven,

Thanks for your detailed reply, it’s really helpful!

Regarding resource utilization, I simply use the following command to run fmriprep and find that only a few CPU threads (normally fewer than four) are used.

fmriprep-docker "${bids_dir}" "${prep_dir}" participant --fs-license-file "${license_path}" --fs-no-reconall -w "${working_dir}" --stop-on-first-crash | tee "${log_path}"

Currently, I divide the subjects into multiple batches and process each batch by starting 5 fmriprep instances, with each instance handling one subject. It’s been working great in terms of CPU utilization.

Thanks for your help again!

Best,
Kun

Hi @const,

No problem, happy to help!

Did you change your Docker settings to allow for more resource usage, like in this screenshot?


Best,
Steven

I use the default Linux Docker setup (the container can use all host resources by default).
I have tried the previous command on two different Linux servers and got similarly low CPU utilization.