fMRIPrep Error on HPC System

Summary of what happened:

I have been trying to get fMRIPrep running on the HPC cluster I'm working with for almost two weeks now. This error in particular is hard to debug because there isn't much information to go on. If there is any way I can provide more useful details, please let me know, since that would probably help me as well.

Command used (and if a helper script was used, a link to the helper script or the command generated):

module load apptainer

project=~/projects/<lab group>/<username>/bird_data_analysis

# Extract zipped BOLD data to temp directory
cp ${project}/data/raw_data/sub_${sub_num}.tar.gz $SLURM_TMPDIR/
# Places fmri_processing directory in SLURM_TMPDIR
tar -xzf $SLURM_TMPDIR/sub_${sub_num}.tar.gz -C $SLURM_TMPDIR/

# Create directories for fMRIPrep to access at runtime
mkdir $SLURM_TMPDIR/work_dir
mkdir $SLURM_TMPDIR/sub_${sub_num}_out
mkdir $SLURM_TMPDIR/image
mkdir $SLURM_TMPDIR/license

# Required fMRIPrep files
cp ${project}/dataset_description.json $SLURM_TMPDIR/fmri_processing/results/TC2See
cp ${project}/fmriprep2.simg $SLURM_TMPDIR/image
cp ${project}/license.txt $SLURM_TMPDIR/license

apptainer run --cleanenv \
    -B $SLURM_TMPDIR/fmri_processing/results/TC2See:/raw \
    -B $SLURM_TMPDIR/sub_${sub_num}_out:/output \
    -B $SLURM_TMPDIR/work_dir:/work_dir \
    -B $SLURM_TMPDIR/image:/image \
    -B $SLURM_TMPDIR/license:/license \
    $SLURM_TMPDIR/image/fmriprep2.simg \
    /raw /output participant \
    --participant-label ${sub_num} \
    --work-dir /work_dir \
    --fs-license-file /license/license.txt \
    --output-spaces fsaverage

fMRIPrep version:

23.2.1

Environment (Docker, Singularity / Apptainer, custom installation):

I am using an image downloaded using:

singularity build fmriprep-23.2.1.simg docker://poldracklab/fmriprep:23.2.1

and this is being run on an HPC cluster.

Data formatted according to a validatable standard? Please provide the output of the validator:

Relevant log outputs (up to 20 lines):

 Node res_tmpl failed to run on host
240313-23:45:36,666 nipype.workflow ERROR:
         Saving crash info to /output/sub-10/log/20240313-234447_6c8dff82-48bf-45de-a65b-4731b65aaf7e/crash-20240313-234536-<my_username>-res_tmpl-6e55d8d7-b76e-4ad6-b989-591474ca4258.txt
Traceback (most recent call last):
  File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nipype/pipeline/plugins/", line 67, in run_node
    result["result"] =
  File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nipype/pipeline/engine/", line 527, in run
    result = self._run_interface(execute=True)
  File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nipype/pipeline/engine/", line 645, in _run_interface
    return self._run_command(execute)
  File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nipype/pipeline/engine/", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node res_tmpl.

        Traceback (most recent call last):
          File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nipype/interfaces/base/", line 397, in run
            runtime = self._run_interface(runtime)
          File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/niworkflows/interfaces/", line 314, in _run_interface
          File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/niworkflows/utils/", line 255, in resample_by_spacing
            data = gaussian_filter(in_file.get_fdata(), smooth)
          File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nibabel/", line 373, in get_fdata
            data = np.asanyarray(self._dataobj, dtype=dtype)
          File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nibabel/", line 439, in __array__
            arr = self._get_scaled(dtype=dtype, slicer=())
          File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nibabel/", line 406, in _get_scaled
            scaled = apply_read_scaling(self._get_unscaled(slicer=slicer), scl_slope, scl_inter)
          File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nibabel/", line 376, in _get_unscaled
            return array_from_file(
          File "/opt/conda/envs/fmriprep/lib/python3.10/site-packages/nibabel/", line 472, in array_from_file
            raise OSError(
        OSError: Expected 64364544 bytes, got 14000589 bytes from object
         - could the file be damaged?

Screenshots / relevant information:

I have tried requesting up to 64 GB per CPU just in case, but still hit the same error.
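The byte-count mismatch in the OSError ("Expected 64364544 bytes, got 14000589 bytes") usually points to a truncated file rather than a memory problem. Since the inputs here are gzip-compressed, their integrity can be checked before running fMRIPrep. A minimal sketch using a synthetic file (the filenames are placeholders, not my actual data):

```shell
# Make a sample file and compress it (stand-in for a sub-XX .nii.gz or .tar.gz)
head -c 1048576 /dev/urandom > sample.bin
gzip -kf sample.bin

# An intact gzip stream passes the built-in integrity test
gzip -t sample.bin.gz && echo "intact: OK"

# Simulate a truncated transfer: keep only the first 4 KB of the archive
head -c 4096 sample.bin.gz > truncated.bin.gz
gzip -t truncated.bin.gz 2>/dev/null || echo "truncated: integrity check FAILED"
```

For tarballs, `tar -tzf archive.tar.gz > /dev/null` performs the equivalent check.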

Hi @James_McKinnon and welcome to neurostars!

That Docker Hub address is incorrect. The nipreps/fmriprep Docker distribution is the only officially maintained one.

Besides that, is the error specific to that subject or consistent across your dataset? Is it possible there was a DICOM conversion error?


Hi Steven, thank you so much for getting back to me.

I just used this command:

singularity build fmriprep-23.2.1.simg docker://nipreps/fmriprep:23.2.1

It said all of the blobs already exist and skipped over them, so it's possible this was the command I used in the first place. I didn't have the original command saved; the one I posted above came from the Singularity documentation because that's what I thought I used, but I could be wrong. I'll try the job again using this new copy, though.

Honestly, I'm not 100% sure what a DICOM conversion error is, but I know that fMRIPrep has worked on this exact data using Docker on my own computer in the past. I zipped those exact files, copied them to the HPC system, and extracted them at the start of the script above.

I saw other posts where people were recommending a different working directory but in my case this one has a ton of storage capacity so I don’t think that’s a problem.

I ran the code with the new downloaded image and still got the same error.

I also just realized there is this at the very top of the output, not sure if it's helpful:

INFO:    underlay of /etc/localtime required more than 50 (100) bind mounts
(node:14115) Warning: Closing directory handle on garbage collection
(Use `node --trace-warnings ...` to show where the warning was created)
        1: [WARN] The recommended file /README is missing. See Section 03 (Modality agnostic files) of the BIDS specification. (code: 101 - README_FILE_MISSING)
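As an aside, that last line is only the BIDS validator warning about a missing dataset README; a one-line placeholder at the dataset root silences it. A sketch using the bind-mounted path from the script above (the README text itself is just a placeholder):

```shell
# Path from the job script above; $SLURM_TMPDIR is set by SLURM inside a job
bids_root=${SLURM_TMPDIR:-/tmp}/fmri_processing/results/TC2See
mkdir -p "$bids_root"
echo "TC2See bird fMRI dataset" > "$bids_root/README"
```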

Hi @James_McKinnon,

It's possible something went wrong during the data transfer; perhaps try re-copying the data? It would also help to know whether this error occurs for all subjects or just a few, as I asked earlier.

That suggestion isn't always about storage, but rather about forcing fMRIPrep to recalculate things from scratch.


Hi @Steven, my apologies, I completely forgot to mention that this is run for one subject at a time. I will try again after re-copying the data, but it was a fairly simple script that zipped each subject's data and then used scp to send the archives to the server.
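One way to rule out transfer corruption when re-copying is to checksum each archive on the source machine and verify it on the cluster after the scp. A sketch, where the subject number, host, and destination path are hypothetical:

```shell
# On the source machine: record a checksum next to each archive
sha256sum sub_01.tar.gz > sub_01.tar.gz.sha256

# Copy the archive and its checksum together (hypothetical destination)
# scp sub_01.tar.gz sub_01.tar.gz.sha256 user@cluster:~/projects/data/raw_data/

# On the cluster: confirm the bytes that arrived match the bytes that were sent
sha256sum -c sub_01.tar.gz.sha256    # prints "sub_01.tar.gz: OK" on success
```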

Is there anything that you can think of that may give me more information about what’s going wrong?

Hi @James_McKinnon

It would help to know if this error happens for all subjects or just isolated ones.


Hi @Steven, I have tried running the script for 4 different subjects and got the same error each time. Are you saying it would help to run one job for all subjects, i.e. specifying multiple participant labels? Or running them all separately in their own jobs?

Hi @James_McKinnon,

Good to know.

Can you compare the file sizes of the images on the old and new machines and confirm they're the same? Can you try a different method of copying the data? Do the new files load correctly in image viewers?
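To make the size comparison concrete: `stat` prints exact byte counts, and `rsync --checksum` verifies every file it transfers. A sketch, where the filename, host, and paths are all hypothetical:

```shell
# Exact byte count of a file; the numbers on the two machines must match.
# (sub-10_bold.nii.gz is a placeholder; a stand-in file is created for the demo)
printf 'demo' > sub-10_bold.nii.gz
stat -c %s sub-10_bold.nii.gz         # prints 4 for this 4-byte stand-in

# Remote side, run over ssh (hypothetical host and path):
# ssh user@cluster stat -c %s /path/to/sub-10_bold.nii.gz

# rsync re-verifies each file by checksum rather than by size/mtime alone,
# and resumes cleanly if a transfer is interrupted:
# rsync -av --checksum data/raw_data/ user@cluster:~/projects/data/raw_data/
```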

