This is related to this post (How much RAM/CPUs is reasonable to run pipelines like fmriprep?), but slightly different so I thought I would make a new thread.
I’ve been running into an issue where fmriprep uses much more memory than it was assigned, which kills my job when I run it on the cluster. On our cluster, each node has 8 CPUs and 56 GB of virtual memory.
I’ve tried the --low-mem option (with the working directory in /scratch), and I’ve even tried requesting 8 CPUs and 56 GB for the job while only using 6 CPUs, to limit the number of processes running in parallel. However, the job always exceeds the memory limit. I get this warning at the start:
```
180309-23:12:29,806 workflow WARNING:
Some nodes exceed the total amount of memory available (39.06GB).
```
…and then this crash at the end:
```
=>> PBS: job killed: vmem 69282582528 exceeded limit 60129542144
Terminated
compute-0-54:<user>[92]
compute-0-54:<user>[92] /usr/local/miniconda/lib/python3.6/multiprocessing/semaphore_tracker.py:129: UserWarning: semaphore_tracker: There appear to be 4 leaked semaphores to clean up at shutdown
len(cache))
```
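For reference, the byte counts in the PBS message are easier to compare in GiB (1 GiB = 1024³ bytes). A small bash/awk sketch; the helper name `to_gib` is my own, not part of fmriprep or PBS:

```shell
#!/usr/bin/env bash
# Convert the byte counts from the PBS kill message to GiB.
# to_gib is a hypothetical helper; awk does the floating-point division.
to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 ^ 3) }'
}

used=69282582528    # vmem reported when the job was killed
limit=60129542144   # vmem limit enforced by PBS

echo "used:  $(to_gib "$used") GiB"
echo "limit: $(to_gib "$limit") GiB"

# The "39.06GB" in the fmriprep warning appears to be --mem_mb 40000 / 1024
awk 'BEGIN { printf "mem_mb 40000 = %.2f GiB\n", 40000 / 1024 }'
```

This shows the limit is exactly 56 GiB (the 56 GB requested), the job was killed at roughly 64.5 GiB, and the 39.06 GB budget in the warning matches the --mem_mb value used in the call. Note that PBS is counting virtual memory (vmem) here, which can be larger than the memory the processes actually touch.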
One possibility is that the job is simply too big: it’s a 7T dataset, and the raw BIDS-formatted dataset takes up 8.4 GB on disk.
There are some other quirks of the dataset: the anatomical image is a “spoofed” anatomical built from a series of EPI images, so it has no skull to strip. However, the output does not indicate any problems with skull stripping.
I’m also getting a message about leaked semaphores, which I do not understand.
I hope I’m not missing anything obvious. One last resort would be to run fmriprep separately for each task, but I’d rather avoid that if possible, since the code would be messier and the processing would take longer.
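If it does come to running each task separately, the extra code can stay small. The sketch below assumes this fmriprep version accepts `--task-id` (check `fmriprep --help` for your image), and it falls back to placeholder values for `SINGULARITY`, `IMAGE` and `SUBJ` if they are not already set; `task_cmd` is a hypothetical helper. It only prints the commands so they can be reviewed first:

```shell
#!/usr/bin/env bash
# Sketch: one fmriprep call per task instead of one call for all BOLD runs.
# Assumes --task-id is supported by this fmriprep version.
SINGULARITY=${SINGULARITY:-singularity}
IMAGE=${IMAGE:-/path/to/fmriprep.simg}   # placeholder path
SUBJ=${SUBJ:-001}                        # placeholder participant label

# Print a shell-quoted fmriprep command for one task label.
task_cmd() {
  local args=(
    "$SINGULARITY" run
    --bind "/scratch:/scratch"
    --bind "/homes/9/<user>:/license"
    "$IMAGE"
    /scratch/BIDS_raw /scratch/BIDS_preproc
    participant
    --participant-label "$SUBJ"
    --task-id "$1"
    --fs-no-reconall
    --fs-license-file /license/license.txt
    --omp-nthreads 6
    --nthreads 6
    --mem_mb 40000
    --low-mem
    --use-aroma
    --write-graph
  )
  printf '%q ' "${args[@]}"   # %q quotes each argument safely for bash
  echo
}

# Task labels taken from the BOLD filenames in the log.
for task in rest stress wm; do
  task_cmd "$task"
done
```

Each run would then build BOLD workflows for only one task, so fewer of them compete for the memory budget at once. Whether that keeps the peak under the vmem limit is untested here.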
Really appreciate the help!
The call to fmriprep is:
```shell
$SINGULARITY run \
    --bind "/scratch:/scratch" \
    --bind "/homes/9/<user>:/license" \
    $IMAGE \
    /scratch/BIDS_raw /scratch/BIDS_preproc \
    participant \
    --participant-label $SUBJ \
    --fs-no-reconall \
    --fs-license-file '/license/license.txt' \
    --omp-nthreads 6 \
    --nthreads 6 \
    --mem_mb 40000 \
    --low-mem \
    --use-aroma \
    --write-graph
```
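One way to keep the scheduler limit and fmriprep’s budget consistent is to derive --mem_mb from the vmem request in the job script. This is a sketch assuming Torque/PBS directive syntax (adjust for your site); `mem_mb_for` and the 60% fraction are hypothetical choices, not fmriprep defaults. The budget is set below the limit because PBS enforces virtual memory, while nipype schedules nodes on its own per-node estimates:

```shell
#!/usr/bin/env bash
#PBS -l nodes=1:ppn=8
#PBS -l vmem=56gb
# Sketch: derive fmriprep's --mem_mb from the PBS vmem request.

VMEM_GIB=56       # must match the #PBS vmem request above
BUDGET_PCT=60     # arbitrary example fraction; tune for your data

# mem_mb_for LIMIT_GIB PERCENT: integer MB budget to pass to --mem_mb
mem_mb_for() {
  echo $(( $1 * 1024 * $2 / 100 ))
}

MEM_MB=$(mem_mb_for "$VMEM_GIB" "$BUDGET_PCT")
echo "passing --mem_mb $MEM_MB to fmriprep"
```

For comparison, the 40000 used above corresponds to roughly 70% of the limit (`mem_mb_for 56 70` gives 40140).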
And the first bit of output is:
```
180309-23:10:32,319 workflow IMPORTANT:
Running fMRIPREP version 1.0.8:
* BIDS dataset path: /scratch/BIDS_raw.
* Participant list: ['001'].
* Run identifier: 20180309-231032_50c69fd2-cf5e-4ed2-9ca4-f61403555935.
180309-23:10:32,638 workflow IMPORTANT:
Creating bold processing workflow for "/scratch/BIDS_raw/sub-001/func/sub-001_task-rest_run-01_bold.nii.gz" (1.36 GB / 256 TRs). Memory resampled/largemem=5.43/8.90 GB.
180309-23:10:53,667 workflow IMPORTANT:
Slice-timing correction will be included.
180309-23:10:54,291 workflow WARNING:
No fieldmaps found or they were ignored, building base workflow for dataset /scratch/BIDS_raw/sub-001/func/sub-001_task-rest_run-01_bold.nii.gz.
180309-23:10:56,246 workflow IMPORTANT:
Creating bold processing workflow for "/scratch/BIDS_raw/sub-001/func/sub-001_task-rest_run-02_bold.nii.gz" (1.32 GB / 256 TRs). Memory resampled/largemem=5.27/8.64 GB.
180309-23:11:02,720 workflow IMPORTANT:
Slice-timing correction will be included.
180309-23:11:03,344 workflow WARNING:
No fieldmaps found or they were ignored, building base workflow for dataset /scratch/BIDS_raw/sub-001/func/sub-001_task-rest_run-02_bold.nii.gz.
180309-23:11:05,311 workflow IMPORTANT:
Creating bold processing workflow for "/scratch/BIDS_raw/sub-001/func/sub-001_task-rest_run-03_bold.nii.gz" (1.31 GB / 256 TRs). Memory resampled/largemem=5.24/8.60 GB.
180309-23:11:11,749 workflow IMPORTANT:
Slice-timing correction will be included.
180309-23:11:12,374 workflow WARNING:
No fieldmaps found or they were ignored, building base workflow for dataset /scratch/BIDS_raw/sub-001/func/sub-001_task-rest_run-03_bold.nii.gz.
180309-23:11:14,353 workflow IMPORTANT:
Creating bold processing workflow for "/scratch/BIDS_raw/sub-001/func/sub-001_task-rest_run-04_bold.nii.gz" (1.31 GB / 256 TRs). Memory resampled/largemem=5.23/8.57 GB.
180309-23:11:20,854 workflow IMPORTANT:
Slice-timing correction will be included.
180309-23:11:21,480 workflow WARNING:
No fieldmaps found or they were ignored, building base workflow for dataset /scratch/BIDS_raw/sub-001/func/sub-001_task-rest_run-04_bold.nii.gz.
180309-23:11:23,327 workflow IMPORTANT:
Creating bold processing workflow for "/scratch/BIDS_raw/sub-001/func/sub-001_task-stress_bold.nii.gz" (0.88 GB / 172 TRs). Memory resampled/largemem=3.54/5.06 GB.
180309-23:11:29,789 workflow IMPORTANT:
Slice-timing correction will be included.
180309-23:11:30,569 workflow WARNING:
No fieldmaps found or they were ignored, building base workflow for dataset /scratch/BIDS_raw/sub-001/func/sub-001_task-stress_bold.nii.gz.
180309-23:11:32,414 workflow IMPORTANT:
Creating bold processing workflow for "/scratch/BIDS_raw/sub-001/func/sub-001_task-wm_bold.nii.gz" (1.28 GB / 249 TRs). Memory resampled/largemem=5.13/8.32 GB.
180309-23:11:38,903 workflow IMPORTANT:
Slice-timing correction will be included.
180309-23:11:39,526 workflow WARNING:
No fieldmaps found or they were ignored, building base workflow for dataset /scratch/BIDS_raw/sub-001/func/sub-001_task-wm_bold.nii.gz.
180309-23:12:29,806 workflow WARNING:
Some nodes exceed the total amount of memory available (39.06GB).
```
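fmriprep prints a memory estimate for each BOLD run. Summing them is a rough way to compare the six runs against the 39.06 GB budget. The awk sketch below (the function name `sum_bold_mem` is hypothetical) pulls both numbers out of each `Memory resampled/largemem=` line:

```shell
#!/usr/bin/env bash
# Sum fmriprep's per-run "Memory resampled/largemem=X/Y GB" estimates.
# Real usage: sum_bold_mem < fmriprep.log
sum_bold_mem() {
  awk '
    match($0, /resampled\/largemem=[0-9.]+\/[0-9.]+/) {
      # skip the 19 characters of "resampled/largemem=" to reach "X/Y"
      split(substr($0, RSTART + 19, RLENGTH - 19), gb, "/")
      res += gb[1]; big += gb[2]; n++
    }
    END { printf "%d runs: resampled=%.2f GB, largemem=%.2f GB\n", n, res, big }'
}

# Demo on the six estimates from the log excerpt above
sum_bold_mem <<'EOF'
Memory resampled/largemem=5.43/8.90 GB.
Memory resampled/largemem=5.27/8.64 GB.
Memory resampled/largemem=5.24/8.60 GB.
Memory resampled/largemem=5.23/8.57 GB.
Memory resampled/largemem=3.54/5.06 GB.
Memory resampled/largemem=5.13/8.32 GB.
EOF
# 6 runs: resampled=29.84 GB, largemem=48.09 GB
```

By these estimates, the "largemem" steps for all six runs would need about 48 GB if they ran at the same time, more than the 39.06 GB budget. These are fmriprep’s own estimates, and nipype will not necessarily schedule all of those steps at once, so this is only a rough indication, not a measured peak.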
And the final error (with preceding text):
```
180310-04:32:06,346 workflow INFO:
[Node] Finished "fmriprep_wf.single_subject_001_wf.func_preproc_task_stress_wf.bold_reg_wf.fsl_bbr_wf.fsl2itk_inv".
180310-04:32:09,292 workflow INFO:
[Node] Setting-up "fmriprep_wf.single_subject_001_wf.func_preproc_task_stress_wf.bold_confounds_wf.tcc_tfm" in "/scratch/work/fmriprep_wf/single_subject_001_wf/func_preproc_task_stress_wf/bold_confounds_wf/tcc_tfm".
180310-04:32:09,296 workflow INFO:
[Node] Running "tcc_tfm" ("niworkflows.interfaces.fixes.FixHeaderApplyTransforms"), a CommandLine Interface with command:
antsApplyTransforms --default-value 0 --float 1 --input /scratch/work/fmriprep_wf/single_subject_001_wf/func_preproc_task_stress_wf/bold_confounds_wf/csf_roi/highres001_BrainExtractionMask_eroded.nii.gz --interpolation NearestNeighbor --output highres001_BrainExtractionMask_eroded_trans.nii.gz --reference-image /scratch/work/fmriprep_wf/single_subject_001_wf/func_preproc_task_stress_wf/bold_bold_trans_wf/bold_reference_wf/enhance_and_skullstrip_bold_wf/combine_masks/ref_image_corrected_brain_mask_maths.nii.gz --transform /scratch/work/fmriprep_wf/single_subject_001_wf/func_preproc_task_stress_wf/bold_reg_wf/fsl_bbr_wf/fsl2itk_inv/affine.txt
180310-04:32:10,480 workflow INFO:
[Node] Finished "fmriprep_wf.single_subject_001_wf.func_preproc_task_stress_wf.bold_confounds_wf.tcc_tfm".
180310-04:32:12,129 workflow INFO:
[Node] Setting-up "fmriprep_wf.single_subject_001_wf.func_preproc_task_stress_wf.bold_confounds_wf.tcc_msk" in "/scratch/work/fmriprep_wf/single_subject_001_wf/func_preproc_task_stress_wf/bold_confounds_wf/tcc_msk".
180310-04:32:12,132 workflow INFO:
[Node] Running "tcc_msk" ("niworkflows.nipype.interfaces.utility.wrappers.Function")
180310-04:32:12,365 workflow INFO:
[Node] Finished "fmriprep_wf.single_subject_001_wf.func_preproc_task_stress_wf.bold_confounds_wf.tcc_msk".
180310-04:32:14,963 workflow INFO:
[Node] Setting-up "fmriprep_wf.single_subject_001_wf.func_preproc_task_stress_wf.bold_confounds_wf.tcompcor" in "/scratch/work/fmriprep_wf/single_subject_001_wf/func_preproc_task_stress_wf/bold_confounds_wf/tcompcor".
180310-04:32:14,966 workflow INFO:
[Node] Running "tcompcor" ("fmriprep.interfaces.patches.RobustTCompCor")
180310-04:32:25,88 workflow INFO:
[Node] Finished "fmriprep_wf.single_subject_001_wf.func_preproc_task_wm_wf.bold_reg_wf.mask_t1w_tfm".
```
It then crashes with:
```
=>> PBS: job killed: vmem 69282582528 exceeded limit 60129542144
Terminated
compute-0-54:<user>[92]
compute-0-54:<user>[92] /usr/local/miniconda/lib/python3.6/multiprocessing/semaphore_tracker.py:129: UserWarning: semaphore_tracker: There appear to be 4 leaked semaphores to clean up at shutdown
len(cache))
```
I can link the full output if it helps, but it’s too large for the initial post.