fMRIPrep (v1.5.4) runtime paradoxes and --omp-nthreads

I would like to report some strange findings on memory consumption and runtime for fMRIPrep.


I am tuning the runtime parameters for our server. We use SGE job scheduling, booking CPU and memory through qsub. fMRIPrep runs as a Singularity container (converted from the Docker image, tag 1.5.4). Our subjects all have longitudinal data: about 3 annual sessions with functional data, but a total of 7 sessions, since there is no functional data in the first 3-4 years (so I guess the anatomical pipeline is constructing a template from 7+ T1w files from all sessions).
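For reference, the submission looks roughly like this (a sketch, not our exact script: the wrapper name is a placeholder, and under SGE the `h_vmem` request is enforced per slot):

```shell
# Book 2 slots in an SMP parallel environment.
# h_vmem is a per-slot limit, so 2 x 7G = 14G total,
# which sits above the 13.9G peak we have observed.
NSLOTS=2
VMEM_PER_SLOT=7G
echo "qsub -pe smp ${NSLOTS} -l h_vmem=${VMEM_PER_SLOT} run_fmriprep.sh"
```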


Running a subject with these characteristics requires more than the 8 GB suggested in the documentation: RAM consumption on some test subjects has been 8.5-13.9 GB.
We allot 2 slots (i.e., 2 CPU cores), but since the server has multithreading enabled I pass --nthreads 4 to fMRIPrep (we know this speeds up ANTs registration). As for --omp-nthreads, I remember the advice was to set it to NTHREADS - 1, but I once tried OMPTHREADS=NTHREADS and, strangely, the processing time was higher. So I ran a test on the same subject, at the same time, with all parameters identical except --omp-nthreads: one run with omp=3, the other with omp=4. I expected omp=4 to be faster, or at least no slower. On the contrary, omp=4 was about 50 minutes slower (out of ~17 hours of processing), while its memory consumption was lower. I am a bit confused by these discrepancies in runtime and memory consumption, but thought I would report them here. It looks like I need to use omp=3 after all. Here are the details of the test.
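For concreteness, this is how the thread variables used in the command below are derived (a minimal sketch; the variable names are mine, matching the command's placeholders):

```shell
# 2 SGE slots x 2 hardware threads per core = 4 usable threads
NTHREADS=4
# Leave one thread free for single-threaded nodes;
# this (omp=3) was the faster setting in our test
OMPTHREADS=$((NTHREADS - 1))
echo "nthreads=${NTHREADS} omp-nthreads=${OMPTHREADS}"
```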

singularity run --cleanenv $SINGULARITY_IMAGE \
    participant \
    --participant-label $SUBJECTID \
    --longitudinal \
    --nthreads $NTHREADS \
    --omp-nthreads $OMPTHREADS \
    --write-graph \
    --force-syn \
    --fs-license-file $FS_LICENSE \
    --work-dir $WORKDIR \
    --cifti-output \
    --resource-monitor \
    --notrack \
    --no-submm-recon \
    --use-aroma \
    --output-spaces \
      MNI152NLin2009cAsym \
      MNI152NLin2009cAsym:res-2 \
      OASIS30ANTs \
      fsaverage \
      fsaverage5 \
      T1w

Runtime omp=3:

 User             = dorian
 Queue            = all.q@mri
 Host             = mri
 Start Time       = 12/29/2019 14:53:42
 End Time         = 12/30/2019 07:58:16
 User Time        = 02:42:52
 System Time      = 00:00:39
 Wallclock Time   = 17:04:34
 CPU              = 40:03:21
 Max vmem         = 11.574G
 Exit Status      = 0

Runtime omp=4:

 User             = dorian
 Queue            = all.q@mri
 Host             = mri
 Start Time       = 12/29/2019 14:53:57
 End Time         = 12/30/2019 08:52:46
 User Time        = 01:19:25
 System Time      = 00:00:43
 Wallclock Time   = 17:58:49
 CPU              = 42:29:57
 Max vmem         = 10.324G
 Exit Status      = 0

By the way, in an attempt to keep memory consumption low I tried --low-mem, but it made little difference: a couple of test subjects still needed 10+ GB. The only thing I have not tried is --mem-mb.
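If I do try it, the idea would be something like the following (an untested sketch: check `fmriprep --help` for the exact flag spelling in 1.5.4, and the 12000 MB budget is only illustrative, chosen to sit below our 2-slot allocation):

```shell
# Hypothetical hard memory budget, in megabytes, for fMRIPrep's
# internal estimator; other flags elided from the command above
MEM_MB=12000
echo "singularity run --cleanenv \$SINGULARITY_IMAGE ... --mem-mb ${MEM_MB}"
```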

The speed advantage gained by having omp-nthreads < nthreads is that it leaves at least one core open to churn through single-threaded jobs while large jobs are consuming the other cores.

So the speedup with a lower --omp-nthreads is expected, then. Is the associated higher memory footprint also expected?