fMRIPrep failing on HPC via Singularity

I will add --fs-subjects-dir /output/freesurfer to my fMRIPrep command and create a new project to rerun the process.

I’ll let you know as soon as it’s done.

Best,
Yunhong

Hi Yunhong,
I am not sure if this is helpful, but here is the code I used in my slurm job file.

export SINGULARITYENV_TEMPLATEFLOW_HOME=/templates
export TEMPLATEFLOW_HOME=/home/gshearre/.templateflow
export FS_LICENSE=/project/brainmri/software/freesurfer/license.txt


singularity run --cleanenv \
    -B /project/brainmri/data/bro/BIDS:/data \
    -B /home/gshearre/.templateflow:/templates \
    -B /gscratch/gshearre:/work \
    -B /project/brainmri/software/freesurfer/license.txt:/license.txt:ro \
    /project/brainmri/software/fmriprep-latest.simg \
    /data /data/derivatives/fMRIPrep-23.1.2 participant \
    --participant-label $id \
    -w /work/ -vvv \
    --omp-nthreads 8 --nthreads 12 --mem_mb 30000 \
    --fs-license-file /license.txt \
    --skip_bids_validation

A couple of tips and things I had to do to get it to work:

  1. Make sure the work dir points to your scratch space (/gscratch/gshearre:/work). I had a couple of failures from trying to write to a scratch directory that I thought existed but didn’t.
  2. This was a weird one, but .templateflow only worked in my home dir, not when it was in my project dir.
    2.1. I had to stop using this shorthand:

${TEMPLATEFLOW_HOST_HOME}:${SINGULARITYENV_TEMPLATEFLOW_HOME}

and actually write out the paths. The same goes for:

${SINGULARITYENV_FS_LICENSE}

  3. I had to delete FreeSurfer intermediates multiple times.

Hope this helps!

Grace
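
The first tip (a bind source that turned out not to exist) can be caught before the job is submitted. Below is a minimal sketch, assuming the host paths from the command above; check_host_paths is a hypothetical helper written for this example, not part of Singularity or fMRIPrep, and it only tests that each path exists on the host.

```bash
#!/usr/bin/env bash
# Sketch: verify that every host-side bind source exists before submitting.
# check_host_paths is a hypothetical helper, not a Singularity or fMRIPrep tool.
check_host_paths() {
    local missing=0 p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing host path: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Host paths are written out in full, as in the job script above.
if check_host_paths \
    /project/brainmri/data/bro/BIDS \
    /home/gshearre/.templateflow \
    /gscratch/gshearre \
    /project/brainmri/software/freesurfer/license.txt
then
    echo "all bind sources exist"
else
    echo "fix the paths listed above before submitting"
fi
```

Running this on the login node right before sbatch should surface a missing scratch directory immediately instead of after the job has queued and failed.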

Hi @Grace_Shearrer thank you for your help.

Could you tell me which Slurm settings you used for the job? These are my settings:

#SBATCH --job-name=fmriprep_test
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4gb
# Outputs ----------------------------------
#SBATCH --output=logs/%x-%A_%a.out
#SBATCH --error=logs/%x-%A_%a.err
# Email notifications -------------------------------
#SBATCH --mail-type=ALL

Cheers,
Yunhong

The issue is very likely that memory usage is being underestimated. If you tell fMRIPrep that it has 24GB of RAM while telling SLURM to give it 32GB, fMRIPrep will schedule its tasks more conservatively.

I set the performance options of fMRIPrep like this:

    --mem-mb 32000 \
    --n-cpus 8 \

Should I allocate more memory to slurm and fmriprep?
For example:

#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=6gb
--mem-mb 48000 \
--n-cpus 8 \

SLURM is a process supervisor and will enforce constraints by killing processes or declining to allocate memory (which will generally cause a job to be killed). fMRIPrep uses nipype, which is a workflow manager that schedules tasks. We tag the tasks with expected memory usage, which instructs the scheduler not to start tasks if it expects to exceed the memory limit, but this is only as good as the estimation.

Generally rough estimates are fine, as operating systems have some slack for processes that request but don’t use memory, which is extremely common. SLURM, on the other hand, needs to make sure that you’re not using more memory than you requested, as you could otherwise interfere with other jobs allocated on the same node.

Thus the solution in cases like yours is to tell SLURM a higher number than you tell fMRIPrep. You can either keep --mem-per-cpu=4gb and tell fMRIPrep --mem-mb 24000 or you can do --mem-per-cpu=5gb and keep --mem-mb 32000.
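
The arithmetic behind those two options can be sketched as follows. This is only an illustration: fmriprep_mem_mb is a hypothetical helper, the 25% default headroom is an assumption chosen to roughly match the 24 GB vs. 32 GB example above, and the fallback values are the ones from this thread. SLURM_CPUS_PER_TASK and SLURM_MEM_PER_CPU are environment variables that Slurm sets inside a job when the matching options are given (the latter in MB).

```bash
#!/usr/bin/env bash
# Sketch: derive fMRIPrep's --mem-mb from the Slurm allocation, leaving headroom.
# fmriprep_mem_mb is a hypothetical helper, not part of fMRIPrep or Slurm.
fmriprep_mem_mb() {
    local cpus=$1 mem_per_cpu_mb=$2 headroom_pct=${3:-25}
    local total_mb=$(( cpus * mem_per_cpu_mb ))
    # Report the allocation minus the headroom percentage (integer math).
    echo $(( total_mb * (100 - headroom_pct) / 100 ))
}

# Fall back to the values from this thread when not running inside a Slurm job.
cpus=${SLURM_CPUS_PER_TASK:-8}
mem_per_cpu_mb=${SLURM_MEM_PER_CPU:-4096}

mem_mb=$(fmriprep_mem_mb "$cpus" "$mem_per_cpu_mb")
echo "Slurm total: $(( cpus * mem_per_cpu_mb )) MB; pass --mem-mb $mem_mb to fMRIPrep"
```

Computing the value inside the job script keeps the two numbers from drifting apart when the #SBATCH lines are edited later. The right amount of headroom depends on the dataset, so treat the percentage as a starting point.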

Thank you very much for your help @effigies

To check that I understand what you are saying: should the total memory I request from Slurm (memory per CPU times the number of CPUs) be higher than the amount I give fMRIPrep?

Is the FreeSurfer error I reported related to my previous Slurm memory settings?

Following @Steven’s advice, I added --fs-subjects-dir /output/freesurfer, and it worked. But I do not understand why adding this flag helps.

I checked the fMRIPrep documentation for 23.2.1, which says: ‘You can use the --fs-subjects-dir flag to specify a different location to save FreeSurfer outputs. If precomputed results are found, they will be reused.’ Does this mean that if there is no FreeSurfer output under the specified path, fMRIPrep calls FreeSurfer to run the reconstruction from scratch?

Cheers,
Yunhong

Hi @Yunhong_Wang

Your SLURM memory allowance, whether defined per CPU or as a single total, should be higher than the memory allowance set in the fMRIPrep command.

Changing your FreeSurfer output directory forces fMRIPrep to recompute the FreeSurfer outputs, because no precomputed results exist in the new location.

Best,
Steven
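
To see in advance whether a subjects directory holds results that might be reused, a rough check like the one below can help. It is a heuristic sketch, not fMRIPrep's own reuse logic: fs_status is a hypothetical helper, and it assumes the marker files that FreeSurfer's recon-all writes under each subject's scripts/ folder (recon-all.done on completion, IsRunning* while a run is active or after it was killed). The directory path and the subject label sub-01 are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: classify one subject's FreeSurfer directory on the host.
# fs_status is a hypothetical helper; fMRIPrep's actual checks may differ.
fs_status() {
    local subjects_dir=$1 subject=$2
    local scripts="$subjects_dir/$subject/scripts"
    local f
    if [ ! -d "$subjects_dir/$subject" ]; then
        echo "absent"        # nothing to reuse; recon-all would start fresh
        return 0
    fi
    for f in "$scripts"/IsRunning*; do
        if [ -e "$f" ]; then
            echo "interrupted"   # leftover lock file from a killed run
            return 0
        fi
    done
    if [ -f "$scripts/recon-all.done" ]; then
        echo "complete"
    else
        echo "partial"
    fi
}

# Placeholder arguments: point these at your own host-side directory and label.
fs_status "${FS_SUBJECTS_DIR_HOST:-/path/to/derivatives/freesurfer}" "sub-01"
```

A result of "interrupted" or "partial" points to the kind of leftover intermediates that had to be deleted earlier in this thread, which is also what pointing --fs-subjects-dir at an empty location sidesteps.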

Thank you for all of your help @Steven @effigies @Grace_Shearrer