Summary of what happened:
Thanks for your time! I am having a software issue when running fMRIPrep on my data, described below:
We submitted an fMRIPrep workflow through an HPC job submission command, with fMRIPrep itself run in a Singularity container. The job was expected to run for an extended period (up to 36 hours), but the process terminated prematurely (after ~2 hours) with the following error: RuntimeError: Robust spatial normalization failed after 3 retries.
The logs in the working directory (specified below) show the message: CPU time limit exceeded
We’ve attempted to troubleshoot by modifying HPC resource allocation and the workflow execution parameters but haven’t been able to solve the issue so far. Any insight would be helpful.
Alisha Kodibagkar
Command used (and if a helper script was used, a link to the helper script or the command generated):
HPC job submission command (using Slurm workload manager):
sbatch -o /path/to/logs/%j_$SLURM_JOB_NAME.log --time=36:00:00 --cpus-per-task=16 --mem-per-cpu=4G ${scripts}/ ${sub}
fMRIPrep workflow execution command in wrapper script:
singularity run --cleanenv \
-B /path/to/project \
fmriprep-23.1.3.simg \
/path/to/data/ \
${outdir} \
participant \
--participant_label ${sub} \
--skip-bids-validation \
--fs-license-file /path/to/license.txt \
--fs-no-reconall \
--use-syn-sdc warn \
--output-spaces MNI152NLin2009cAsym:res-2 \
--debug all \
--nthreads 16 \
--omp-nthreads 16 \
--n-cpus 8
Environment (Docker, Singularity / Apptainer, custom installation): Singularity
The workflow is running within a Singularity container environment
Data formatted according to a validatable standard? Please provide the output of the validator:
Relevant log outputs (up to 20 lines):
Retry #3 failed.
Log of failed retry saved (/path/to/work/fmriprep_23_1_wf/single_subject_0011_wf/anat_preproc_wf/anat_norm_wf/_template_MNI152NLin2009cAsym/registration/merged.nipype-0003).
250129-13:23:03,859 nipype.workflow WARNING:
Storing result file without outputs
cat merged.nipype-0003:
CPU time limit exceeded
Traceback (most recent call last):
File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nipype/interfaces/base/", line 397, in run
runtime = self._run_interface(runtime)
File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/niworkflows/interfaces/", line 246, in _run_interface
raise RuntimeError(
RuntimeError: Robust spatial normalization failed after 3 retries.
Screenshots / relevant information:
We attempted modifying the following parameters in the HPC job submission command:
We have also attempted modifying the following parameters in the workflow execution command:
The error appears ~2 hours into execution, although we allocate multiple days’ time in the job submission.
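One check that might help narrow this down: "CPU time limit exceeded" is the message a shell prints when a process is killed by SIGXCPU, which is what happens when it exceeds its per-process CPU-time limit (RLIMIT_CPU). That limit counts CPU seconds summed over all threads, so it is separate from the Slurm `--time` wall-clock limit and could in principle be reached well before 36 hours by a multithreaded registration. Below is a minimal sketch for printing the limit from inside the job script. The function name `report_cpu_limit` is made up for illustration, and whether our cluster actually sets such a limit is an assumption we have not confirmed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the per-process CPU-time limits of the current shell.
# Values are in seconds of CPU time, or "unlimited" when no limit is set.
report_cpu_limit() {
    echo "soft CPU-time limit: $(ulimit -S -t)"   # soft limit: SIGXCPU is sent when exceeded
    echo "hard CPU-time limit: $(ulimit -H -t)"   # hard limit: ceiling for the soft limit
}

# Call this near the top of the wrapper script, before the singularity command,
# so the values end up in the Slurm job log.
report_cpu_limit
```

If either value is a finite number rather than `unlimited`, the registration step may be hitting that limit. Running `ulimit -t` inside the container (for example through `singularity exec` with the same image) should show whether the limit is inherited there as well.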