fMRIPrep takes longer than expected per subject on a local workstation

Hi - I’m running fMRIPrep via Docker on a local workstation (Ubuntu 18.04.3) with 12 cores/24 threads and 32 GB of RAM, with AROMA enabled but without slice-timing correction or recon-all. My BOLD files are ~390 volumes each (4 runs, multiband factor 7, TR = 1 s). I have not limited memory usage with any flags. Each subject’s preprocessing takes about 6.5-7 hours. Is this in line with what one should expect with these specs? I’ve checked some benchmarking in a previous thread and the estimates seem shorter than what I’m experiencing. If it matters at all, my vm.overcommit_memory is set to 0, and vm.overcommit_ratio is 50.

Thanks!
Ahmet

Can you share your full command?

Sure. I’m running version 1.5.3.rc2:

for subj in 14006 14014 14018 14023 14027 14048 14055 14072 14077 14091; do
  docker run --rm -it \
    -v /media/ahmetc/Seagate4TB/SSRTE/Nifti:/data \
    -v /media/ahmetc/Seagate4TB/SSRTE/Nifti/derivatives:/out \
    -v /media/ahmetc/Seagate4TB/SSRTE/Work:/work \
    -v /media/ahmetc/Seagate4TB/SSRTE:/maindir \
    poldracklab/fmriprep:latest /data /out participant \
    --participant-label S${subj} -w /work \
    --fs-no-reconall --use-aroma --ignore slicetiming \
    --fs-license-file /maindir/license.txt -v
done

This whole thing runs without errors, in case it matters.

Just wanted to bump this to add that my swap space is ~2 GB. I thought that could potentially be an issue (I’ve seen some Linux users dedicate twice their RAM to a swap partition), but even while fMRIPrep runs through MELODIC, my swap usage is nominal (<16 MB). I also don’t think I’ve seen more than 8 cores in use at a time, although I might be wrong about that!
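In case it helps anyone reproduce these numbers: the RAM, swap, overcommit, and CPU figures I mentioned above can all be read straight from /proc (no extra tools needed), e.g.:

```shell
# Quick sanity checks for the figures above, read directly from /proc:
grep -E 'MemTotal|SwapTotal|SwapFree' /proc/meminfo   # RAM and swap sizes/usage
cat /proc/sys/vm/overcommit_memory                    # 0 = heuristic overcommit
cat /proc/sys/vm/overcommit_ratio                     # percentage used when mode is 2
nproc                                                 # logical CPUs visible
```

For live per-core usage while fMRIPrep runs, `htop` (or plain `top` with the `1` key) shows the same thing interactively.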

Ahmet

It may be AROMA that’s taking so long. If you try it without AROMA, do you see reasonable durations?

If I remember correctly, running without AROMA shaves off a good 3 hours. Would running AROMA with any of the CPU/thread flags help in my case (e.g., allocating more than 8 processors per task)? I imagine these flags restrict resource consumption rather than expand it.

Thanks!
Ahmet

The default threading parameters are generally pretty good, and they are:

  1. --n-cpus is the total number of threads fMRIPrep or any of its subprocesses can create. It defaults to the number of cores you have. You might want to decrease it if you want to run multiple fMRIPrep processes in parallel, or if you’re on an interactive system and need to keep at least one core free so you can still interact with your computer.
  2. --omp-nthreads is the maximum number of threads any single job can spawn. If --n-cpus is 9 or fewer, this defaults to --n-cpus - 1, which lets small jobs run and empty the queue while large jobs are running. It maxes out at 8, beyond which more cores generally don’t help.

It’s possible that bumping up --omp-nthreads could help the speed of AROMA… I haven’t ever looked at how well it utilizes concurrency. But it may come at the cost of other jobs not using your cores as efficiently. It’s hard to say.

One thing you could try is --omp-nthreads 6, which would allow two large jobs to run simultaneously, rather than having each use 8 cores and one wait for the other. But again, the tradeoffs may or may not work out.
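Concretely, taking the command you posted earlier in the thread, that would just mean appending the two flags (values here are only the ones we discussed, not a tuned recommendation):

```shell
# Sketch: the same docker invocation as above, with explicit threading flags.
# --omp-nthreads 6 caps any single job at 6 threads, leaving headroom for two
# large jobs to run concurrently within the --n-cpus 24 total.
docker run --rm -it \
  -v /media/ahmetc/Seagate4TB/SSRTE/Nifti:/data \
  -v /media/ahmetc/Seagate4TB/SSRTE/Nifti/derivatives:/out \
  -v /media/ahmetc/Seagate4TB/SSRTE/Work:/work \
  -v /media/ahmetc/Seagate4TB/SSRTE:/maindir \
  poldracklab/fmriprep:latest /data /out participant \
  --participant-label S14006 -w /work \
  --fs-no-reconall --use-aroma --ignore slicetiming \
  --fs-license-file /maindir/license.txt \
  --n-cpus 24 --omp-nthreads 6 -v
```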

Thanks! That’s very informative. I’ll try rerunning with --omp-nthreads set to 6 and report back with findings. Maybe running two fMRIPrep sessions in parallel with --n-cpus set to 12 each (as opposed to one session hoarding all 24) will help, too.

Ahmet

So, it looks like --omp-nthreads 6 decreased the runtime from 7-ish hours to 5. I then ran two fMRIPrep processes in parallel with --n-cpus 12 and that didn’t help much (~11 hours altogether) before it errored out towards the end (I assume because the two processes were using the same template files and there was some disagreement at some point). In any case, even a 2-hour improvement when processing one at a time is great, so thanks!
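For anyone who wants to retry the parallel route: assuming the failure really was the two processes sharing intermediate files, one fix would be to give each instance its own -w subdirectory. This is a hypothetical sketch (the batch split and the run_batch helper are mine, not from fMRIPrep); the echo prints each command for inspection instead of running it, so drop the echo to launch for real:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: two fMRIPrep instances in parallel, each with its own
# working directory so they never write intermediate files into the same place.
subjects=(14006 14014 14018 14023 14027 14048 14055 14072 14077 14091)
half=$(( ${#subjects[@]} / 2 ))
batch1=("${subjects[@]:0:$half}")   # first half of the subjects
batch2=("${subjects[@]:$half}")     # second half

run_batch () {          # $1 = private work subdir; remaining args = subjects
  local workdir=$1; shift
  for subj in "$@"; do
    # echo prints the command rather than running it; remove it to launch.
    # Note: no -it here, since backgrounded jobs can't be interactive.
    echo docker run --rm \
      -v /media/ahmetc/Seagate4TB/SSRTE/Nifti:/data \
      -v /media/ahmetc/Seagate4TB/SSRTE/Nifti/derivatives:/out \
      -v /media/ahmetc/Seagate4TB/SSRTE/Work/"$workdir":/work \
      -v /media/ahmetc/Seagate4TB/SSRTE:/maindir \
      poldracklab/fmriprep:latest /data /out participant \
      --participant-label "S${subj}" -w /work \
      --fs-no-reconall --use-aroma --ignore slicetiming \
      --fs-license-file /maindir/license.txt \
      --n-cpus 12 --omp-nthreads 6
  done
}

run_batch batch1 "${batch1[@]}" &   # each batch gets its own /work subdir
run_batch batch2 "${batch2[@]}" &
wait
```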

Ahmet
