Summary of what happened:
I am trying to run fMRIPrep on a dataset on my institution's HPC cluster. It crashes at the brain extraction stage: the T1w_template_maths_corrected.nii.gz file is never written, so the remaining processes can't run.
Command used (and if a helper script was used, a link to the helper script or the command generated):
Slurm submission script (slurm_fmriprep.sh):
#!/bin/bash
# Run from BIDS code/preprocessing directory: sbatch slurm_fmriprep.sh
# Name of job?
#SBATCH --job-name=fmriprep
# Set partition
#SBATCH --partition=amd
# How long is job?
#SBATCH -t 14:00:00
# Where to output log files? They are named fmriprep-<jobID>_<arrayIndex>.log
# make sure this logs directory exists!! otherwise the script won't run
#SBATCH --output='../../data/bids/derivatives/fmriprep/logs/fmriprep-%A_%a.log'
# How many CPUs, and how much memory per CPU (in MB)? 8 x 20000 MB = 160 GB total
#SBATCH --cpus-per-task=8 --mem-per-cpu=20000
# Update with your email
#SBATCH --mail-user=YOURID@pomona.edu
#SBATCH --mail-type=BEGIN,END,FAIL
# Remove modules because Singularity shouldn't need them
echo "Purging modules"
module purge
# Print job submission info
echo "Slurm job ID: " $SLURM_JOB_ID
date
# Set subject ID based on array index
subjs=($@) # You can input a list of subjects by running
# submit_job_array.sh sub-01 sub-02 ....... or just let
# this script collect all subjects in the BIDS directory
bids=/bigdata/lab/sburnslab/MRI_data/NIMH-volunteer/data/bids
if [[ $# -eq 0 ]]; then
# first go to data directory, grab all subjects,
# and assign to an array
pushd $bids
subjs=(sub-*) # a shell glob is safer than parsing ls output
popd
fi
# take the length of the array
# this will be useful for indexing later
len=$(( ${#subjs[@]} - 1 )) # highest zero-based array index
echo "Spawning ${#subjs[@]} sub-jobs."
sbatch --array=0-$len ./run_fmriprep.sh "${subjs[@]}"
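As a sanity check on the indexing above, here is a minimal sketch (with made-up subject names) of how each array task picks its subject in the second script:

```shell
# Hypothetical three-subject list; run_fmriprep.sh receives this list
# as arguments and indexes into it with SLURM_ARRAY_TASK_ID.
subjs=(sub-01 sub-02 sub-03)
len=$(( ${#subjs[@]} - 1 ))        # highest zero-based index -> --array=0-2
echo "array range: 0-$len"
SLURM_ARRAY_TASK_ID=1              # each sub-job gets its own index from Slurm
echo "task 1 processes: ${subjs[$SLURM_ARRAY_TASK_ID]}"
```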
fMRIPrep call script (run_fmriprep.sh):
#!/bin/bash
subjs=("$@")
subject=${subjs[$SLURM_ARRAY_TASK_ID]}
#change the capitalized parts of these filepaths to match your project
project_dir=/bigdata/lab/sburnslab/MRI_data/NIMH-volunteer
scratch_dir=/scratch/$SLURM_JOB_USER/$SLURM_JOB_ID
data_dir=$project_dir/data
bids_dir=$data_dir/bids
derivatives_dir=$bids_dir/derivatives
scripts_dir=$project_dir/code/preprocessing
module load fmriprep
#change your fmriprep options below
singularity run --cleanenv --bind $project_dir --bind $scratch_dir \
/opt/linux/rocky/8/software/fmriprep/23.0.0/bin/fmriprep \
$bids_dir $derivatives_dir participant \
--participant-label $subject \
--fs-no-reconall \
--fs-license-file $project_dir/code/preprocessing/license.txt \
--anat-only \
--verbose \
--no-submm-recon \
--nthreads 8 --omp-nthreads 8 \
--output-spaces T1w fsaverage:den-41k \
MNI152NLin2009cAsym:res-native \
--write-graph --work-dir $scratch_dir \
--ignore fieldmaps \
--use-syn-sdc warn \
--output-layout legacy
Version:
23.0.0
Environment (Docker, Singularity / Apptainer, custom installation):
Singularity on an institutional HPC cluster
Data formatted according to a validatable standard? Please provide the output of the validator:
The online BIDS validator reported no errors.
Relevant log outputs (up to 20 lines):
log
250208-09:24:16,993 nipype.workflow INFO:
[Node] Setting-up "_inu_n40" in "/scratch/smb02011/16727/fmriprep_23_0_wf/single_subject_ON99620_wf/anat_preproc_wf/brain_extraction_wf/inu_n4/mapflow/_inu_n40".
250208-09:24:16,995 nipype.workflow INFO:
[Node] Executing "_inu_n40" <nipype.interfaces.ants.segmentation.N4BiasFieldCorrection>
250208-09:24:18,933 nipype.workflow INFO:
[MultiProc] Running 1 tasks, and 2 jobs ready. Free memory (GB): 452.66/452.86, Free processors: 0/8.
Currently running:
* fmriprep_23_0_wf.single_subject_ON99620_wf.anat_preproc_wf.brain_extraction_wf.inu_n4
250208-09:25:38,448 nipype.workflow INFO:
[Node] Finished "_inu_n40", elapsed time 81.333546s.
250208-09:25:38,450 nipype.workflow WARNING:
Storing result file without outputs
250208-09:25:38,452 nipype.workflow WARNING:
[Node] Error on "_inu_n40" (/scratch/smb02011/16727/fmriprep_23_0_wf/single_subject_ON99620_wf/anat_preproc_wf/brain_extraction_wf/inu_n4/mapflow/_inu_n40)
250208-09:25:38,455 nipype.workflow WARNING:
Storing result file without outputs
250208-09:25:38,456 nipype.workflow WARNING:
[Node] Error on "fmriprep_23_0_wf.single_subject_ON99620_wf.anat_preproc_wf.brain_extraction_wf.inu_n4" (/scratch/smb02011/16727/fmriprep_23_0_wf/single_subject_ON99620_wf/anat_preproc_wf/brain_extraction_wf/inu_n4)
250208-09:25:40,315 nipype.workflow ERROR:
Node inu_n4 failed to run on host a001.hpc.pomona.edu.
250208-09:25:40,323 nipype.workflow ERROR:
Saving crash info to /bigdata/lab/sburnslab/MRI_data/NIMH-volunteer/data/bids/derivatives/fmriprep/sub-ON99620/log/20250208-092004_7fc4b1f2-9c87-4b85-bbc6-ad86d24dc924/crash-20250208-092540-smb02011-inu_n4-f4edbc88-606e-47bc-bc27-cee3846fc193.txt
crash report
Node: fmriprep_23_0_wf.single_subject_ON99620_wf.anat_preproc_wf.brain_extraction_wf.inu_n4
Working directory: /scratch/smb02011/16727/fmriprep_23_0_wf/single_subject_ON99620_wf/anat_preproc_wf/brain_extraction_wf/inu_n4
Node inputs:
args = <undefined>
bias_image = <undefined>
bspline_fitting_distance = 200.0
bspline_order = <undefined>
convergence_threshold = 1e-07
copy_header = True
dimension = 3
environ = {'NSLOTS': '8'}
histogram_sharpening = <undefined>
input_image = ['/scratch/smb02011/16727/fmriprep_23_0_wf/single_subject_ON99620_wf/anat_preproc_wf/brain_extraction_wf/truncate_images/mapflow/_truncate_images0/sub-ON99620_ses-01_acq-MPRAGE_T1w_template_maths.nii.gz']
mask_image = <undefined>
n_iterations = [50, 50, 50, 50]
num_threads = 8
output_image = <undefined>
rescale_intensities = False
save_bias = False
shrink_factor = 4
weight_image = <undefined>
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
result["result"] = node.run(updatehash=updatehash)
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
result = self._run_interface(execute=True)
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 1380, in _run_interface
result = self._collate_results(
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 1293, in _collate_results
raise NodeExecutionError(
nipype.pipeline.engine.nodes.NodeExecutionError: Subnodes of node: inu_n4 failed:
Subnode 0 failed
Error: Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 454, in aggregate_outputs
setattr(outputs, key, val)
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/traits_extension.py", line 330, in validate
value = super(File, self).validate(objekt, name, value, return_pathlike=True)
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/traits_extension.py", line 135, in validate
self.error(objekt, name, str(value))
File "/opt/conda/lib/python3.9/site-packages/traits/base_trait_handler.py", line 74, in error
raise TraitError(
traits.trait_errors.TraitError: The 'output_image' trait of a N4BiasFieldCorrectionOutputSpec instance must be a pathlike object or string representing an existing file, but a value of '/scratch/smb02011/16727/fmriprep_23_0_wf/single_subject_ON99620_wf/anat_preproc_wf/brain_extraction_wf/inu_n4/mapflow/_inu_n40/sub-ON99620_ses-01_acq-MPRAGE_T1w_template_maths_corrected.nii.gz' <class 'str'> was specified.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 399, in run
runtime = self._post_run_hook(runtime)
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/mixins/fixheader.py", line 127, in _post_run_hook
outputs = self.aggregate_outputs(runtime=runtime).get_traitsfree()
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 461, in aggregate_outputs
raise FileNotFoundError(msg)
FileNotFoundError: No such file or directory '/scratch/smb02011/16727/fmriprep_23_0_wf/single_subject_ON99620_wf/anat_preproc_wf/brain_extraction_wf/inu_n4/mapflow/_inu_n40/sub-ON99620_ses-01_acq-MPRAGE_T1w_template_maths_corrected.nii.gz' for output 'output_image' of a N4BiasFieldCorrection interface
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/utils.py", line 94, in nodelist_runner
result = node.run(updatehash=updatehash)
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
result = self._run_interface(execute=True)
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
return self._run_command(execute)
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node _inu_n40.
Cmdline:
N4BiasFieldCorrection --bspline-fitting [ 200 ] -d 3 --input-image /scratch/smb02011/16727/fmriprep_23_0_wf/single_subject_ON99620_wf/anat_preproc_wf/brain_extraction_wf/truncate_images/mapflow/_truncate_images0/sub-ON99620_ses-01_acq-MPRAGE_T1w_template_maths.nii.gz --convergence [ 50x50x50x50, 1e-07 ] --output sub-ON99620_ses-01_acq-MPRAGE_T1w_template_maths_corrected.nii.gz --shrink-factor 4
Stdout:
Stderr:
Killed
Traceback:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 454, in aggregate_outputs
setattr(outputs, key, val)
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/traits_extension.py", line 330, in validate
value = super(File, self).validate(objekt, name, value, return_pathlike=True)
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/traits_extension.py", line 135, in validate
self.error(objekt, name, str(value))
File "/opt/conda/lib/python3.9/site-packages/traits/base_trait_handler.py", line 74, in error
raise TraitError(
traits.trait_errors.TraitError: The 'output_image' trait of a N4BiasFieldCorrectionOutputSpec instance must be a pathlike object or string representing an existing file, but a value of '/scratch/smb02011/16727/fmriprep_23_0_wf/single_subject_ON99620_wf/anat_preproc_wf/brain_extraction_wf/inu_n4/mapflow/_inu_n40/sub-ON99620_ses-01_acq-MPRAGE_T1w_template_maths_corrected.nii.gz' <class 'str'> was specified.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 399, in run
runtime = self._post_run_hook(runtime)
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/mixins/fixheader.py", line 127, in _post_run_hook
outputs = self.aggregate_outputs(runtime=runtime).get_traitsfree()
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 461, in aggregate_outputs
raise FileNotFoundError(msg)
FileNotFoundError: No such file or directory '/scratch/smb02011/16727/fmriprep_23_0_wf/single_subject_ON99620_wf/anat_preproc_wf/brain_extraction_wf/inu_n4/mapflow/_inu_n40/sub-ON99620_ses-01_acq-MPRAGE_T1w_template_maths_corrected.nii.gz' for output 'output_image' of a N4BiasFieldCorrection interface
Screenshots / relevant information:
The closest posts I've found on Neurostars suggest this is a memory or disk-space issue, but I'm allocating what I think is a lot of memory to the job (8 CPUs x 20 GB), and at the time of the crash only 5.6 GB of the 776 GB of available disk space was in use. Is there something else I'm doing wrong? Note that I have --fs-no-reconall set here to try to isolate the issue, but the same error also occurred when running FreeSurfer. I also tried a different working directory (scratch_dir=$project_dir/data/work), but that did not change the error.
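One check I plan to try, since the bare `Killed` on stderr usually means the kernel or Slurm's memory cgroup terminated N4BiasFieldCorrection (this is my guess; 16727 is the job ID from this run, and which sacct fields are recorded varies by cluster):

```shell
# Query Slurm accounting for evidence of an out-of-memory kill.
# Guarded so it degrades gracefully on machines without sacct.
if command -v sacct >/dev/null 2>&1; then
    sacct -j 16727 --format=JobID,State,ExitCode,MaxRSS,ReqMem
else
    echo "sacct not available on this host"
fi
# A State of OUT_OF_MEMORY, or an ExitCode ending in :9 (killed by SIGKILL),
# would point at memory limits rather than disk space.
```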