CPU time limit error for fMRIPrep

Summary of what happened:

Hello,

Thanks for your time! I am having a software issue when running fMRIPrep on my data, described below:
We submitted an fMRIPrep workflow via an HPC job submission command that calls a wrapper script running fMRIPrep in a Singularity container (both commands are below). Although the job was allocated up to 36 hours, the process terminated prematurely after ~2 hours with the following error: RuntimeError: Robust spatial normalization failed after 3 retries. In the logs in the working directory (shown below), the underlying message is: CPU time limit exceeded
We’ve attempted to troubleshoot by modifying HPC resource allocation and the workflow execution parameters but haven’t been able to solve the issue so far. Any insight would be helpful.

Thanks,
Alisha Kodibagkar

Command used (and if a helper script was used, a link to the helper script or the command generated):

HPC job submission command (using Slurm workload manager):

sbatch -o /path/to/logs/%j_$SLURM_JOB_NAME.log --time=36:00:00 --cpus-per-task=16 --mem-per-cpu=4G ${scripts}/fmriprep_wrapper.sh ${sub}

fMRIPrep workflow execution command in wrapper script:

singularity run --cleanenv \
  -B /path/to/project \
  fmriprep-23.1.3.simg \
  /path/to/data/ \
  ${outdir} \
  participant \
  --participant_label ${sub} \
  --skip-bids-validation \
  --fs-license-file /path/to/license.txt \
  --fs-no-reconall \
  --use-syn-sdc warn \
  --output-spaces MNI152NLin2009cAsym:res-2 \
  --debug all \
  --nthreads 16 \
  --omp-nthreads 16 \
  --n-cpus 8

Version:

fMRIPrep-23.1.3

Environment (Docker, Singularity / Apptainer, custom installation): Singularity

The workflow runs inside a Singularity container.

Data formatted according to a validatable standard? Please provide the output of the validator:

PASTE VALIDATOR OUTPUT HERE

Relevant log outputs (up to 20 lines):

Retry #3 failed.
Log of failed retry saved (/path/to/work/fmriprep_23_1_wf/single_subject_0011_wf/anat_preproc_wf/anat_norm_wf/_template_MNI152NLin2009cAsym/registration/merged.nipype-0003).
250129-13:23:03,859 nipype.workflow WARNING:
	 Storing result file without outputs

cat merged.nipype-0003:
CPU time limit exceeded

Traceback:
	Traceback (most recent call last):
	  File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 397, in run
	    runtime = self._run_interface(runtime)
	  File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/niworkflows/interfaces/norm.py", line 246, in _run_interface
	    raise RuntimeError(
	RuntimeError: Robust spatial normalization failed after 3 retries.

Screenshots / relevant information:

We attempted modifying the following parameters in the HPC job submission command:

--cpus-per-task
--mem-per-cpu
--mem

We have also attempted modifying the following parameters in the workflow execution command:

--n-cpus 
--nthreads  
--omp-nthreads 

The error appears ~2 hours into execution, even though the job allocation (--time=36:00:00) is far longer.


Hi @Alisha_Kodibagkar, and welcome to neurostars!

Can you try adding --propagate=NONE to your sbatch arguments? Slurm can propagate the resource limits of your submission environment (including a CPU-time limit) to the compute job, and the "CPU time limit exceeded" message is a process being killed for hitting such a limit; --propagate=NONE stops those limits from being carried over.
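For example, keeping everything else from the submission command in your post the same:

sbatch --propagate=NONE -o /path/to/logs/%j_$SLURM_JOB_NAME.log --time=36:00:00 --cpus-per-task=16 --mem-per-cpu=4G ${scripts}/fmriprep_wrapper.sh ${sub}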

Also, it is recommended to update to the most recent fMRIPrep version, not to skip BIDS validation (the validator report missing from your original post might have been informative), and not to use the --fs-no-reconall flag.
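If you want to include that report, one way (assuming Node.js/npx is available on your system; fMRIPrep also runs validation itself once --skip-bids-validation is dropped) is to run the standalone validator against your dataset:

npx bids-validator /path/to/data/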

I think you included the --n-cpus 8 flag by mistake, since you already defined --nthreads above? In fMRIPrep those two flags are aliases for the same setting, so giving them different values is at best confusing.

Additionally, you should set fMRIPrep's --mem flag to something slightly below what you are giving the job via your sbatch parameters.
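Putting those points together, the fMRIPrep call in your wrapper might look something like the sketch below (dropping --n-cpus, --skip-bids-validation, and --fs-no-reconall per the above, and swapping in a newer image once you update). The 60G value is only an example chosen to sit just under the 64 GB that --cpus-per-task=16 with --mem-per-cpu=4G currently gives the job; adjust it to your actual allocation:

singularity run --cleanenv \
  -B /path/to/project \
  fmriprep-23.1.3.simg \
  /path/to/data/ \
  ${outdir} \
  participant \
  --participant_label ${sub} \
  --fs-license-file /path/to/license.txt \
  --use-syn-sdc warn \
  --output-spaces MNI152NLin2009cAsym:res-2 \
  --debug all \
  --nthreads 16 \
  --omp-nthreads 16 \
  --mem 60G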

Best,
Steven