fMRIPrep via Singularity: first subject runs fine, others crash when using HPC slurm array job

Hello Neurostars!

I’m setting up fmriprep on our university’s HPC cluster using a singularity image. Everything works fine if I submit a job for a single subject, but I run into issues when trying to submit additional subjects via an array job: the first subject runs fine, but any additional subjects crash a few seconds after the fmriprep process starts. Interestingly, I get the same error when re-running a single subject whose preprocessing previously started but never finished. The error looks like this:

singularity: error while loading shared libraries: libseccomp.so.2: cannot open shared object file: No such file or directory

Note: I have included a line that makes the nth subject's job sleep for n minutes before running, to stagger the calls to fmriprep.

I’m including the script I’m using below. Thank you for any help you can offer!

#!/bin/bash -l

#  Author: Tom Morin
#    Date: July, 2022
# Purpose: Run fmriprep from a singularity container. Adapted from Andy's Brain
#          Book and fMRIPrep documentation:
#          andysbrainbook.readthedocs.io/en/latest/OpenScience/OS/fMRIPrep.html
#          fmriprep.org/en/1.5.9/singularity.html

#SBATCH --account=guest
#SBATCH --partition=guest-compute
#SBATCH --time=24:00:00
#SBATCH --job-name=fmriprep
#SBATCH --output=fmriprep_%j.o
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4G
#SBATCH --cpus-per-task=16
#SBATCH --array=0-1
#SBATCH --nodes=1

# Get subject from list for this array task
while read sub; do
    subjects+=("$sub")
done < "$1"
index=$(($SLURM_ARRAY_TASK_ID))
SUB=${subjects[$index]}
echo "Subject: $SUB"

# SLEEP to stagger calls to fmriprep
sleep ${index}m

# Load necessary modules
shopt -s expand_aliases
source ~/.bash_aliases
load_singularity
load_freesurfer

# Set User Inputs
BIDS_dir=/work/tommorin/REST_BIDS
WORK_dir=/work/tommorin/fmriprep_work_$SUB
nthreads=12
mem=20 #gb

mkdir -p $WORK_dir

# Convert virtual memory from gb to mb
mem=${mem//[!0-9]/}               # remove "gb" at end
mem_mb=$(( mem * 1000 - 5000 ))   # reduce some mem for buffer space during preproc

export SINGULARITYENV_TEMPLATEFLOW_HOME="/templateflow"
export SINGULARITYENV_FS_LICENSE=/freesurfer/license.txt

# Run fmriprep
echo "Running fmriprep..."
singularity run --cleanenv \
  -B /share/labs/berry/freesurfer/7.3.2:/freesurfer \
  -B $BIDS_dir:/data \
  -B /home/tommorin/.cache/templateflow:/templateflow \
  -B $WORK_dir:/workdir \
  /share/labs/berry/fmriprep.simg \
  /data \
  /data/derivatives \
  participant \
  --participant-label $SUB \
  --skip-bids-validation \
  --md-only-boilerplate \
  --fs-license-file /freesurfer/license.txt \
  --output-spaces MNI152NLin2009cAsym:res-2 fsaverage \
  --nthreads $nthreads \
  --mem_mb $mem_mb \
  --omp-nthreads 8 \
  -w /workdir/ \
  -vv

Hi, and welcome to Neurostars!

Can you clarify: when you get the "error while loading shared libraries" error, has fMRIPrep begun running? That is, do you see something like this at the beginning of your log files?

220413-22:00:16,361 nipype.workflow IMPORTANT:


	 Running fMRIPrep version 21.0.1

         License NOTICE ##################################################
         fMRIPrep 21.0.1
         Copyright 2021 The NiPreps Developers.

Also, what version of fMRIPrep are you using? If it is not the most recent (22.0.2), can you try to replicate the error after updating?

I am also curious as to how you are preparing the job array. Can you confirm that the subjects are being defined correctly (e.g. by looking at the log files, since you seem to check for that when you echo "Subject: $SUB")?

Best,
Steven

Hi Steven,

Thanks for your reply! Here’s what I’ve got:

For the first subject, which works, fMRIPrep does start running. However, for subsequent subjects, it crashes before fMRIPrep has started. Here’s the full log file for a subject that crashes (note that the “Running fmriprep…” line is my own script’s echo):

Subject: B07237
Running fmriprep...
singularity: error while loading shared libraries: libseccomp.so.2: cannot open shared object file: No such file or directory

I am using fMRIPrep version 20.2.0. I will update to 22.0.2 and let you know how it goes.

I call the array job with:

sbatch fmriprep.sh subjects.txt

where fmriprep.sh is the script I included in the original post, and subjects.txt is a list of subjects (for now, just two subjects):

B07277
B07237

From the log files, the subjects seem to be defined correctly. Also, if I list the subjects in reverse order, the first subject still runs and the second is the one that crashes.

Got it, thanks for the info. Yes, please update this thread if the error persists in the most recent version.

For what it’s worth, I also submit fMRIPrep singularity jobs via SBATCH arrays.

What I do is I have a separate script that grabs the subjects and prepares the job array, and a single-subject fmriprep bash script that takes subject ID as an input.

It’s something like this:

submit_job_array.sh

subjs=($@) # You can input a list of subjects by running
# submit_job_array.sh sub-01 sub-02 ....... or just let 
# this script collect all subjects in the BIDS directory

bids=$FULL_PATH_TO_BIDS_DIRECTORY

if [[ $# -eq 0 ]]; then
    # first go to data directory, grab all subjects,
    # and assign to an array
    pushd $bids
    subjs=($(ls sub-* -d))
    popd
fi

# take the length of the array
# this will be useful for indexing later
len=$(expr ${#subjs[@]} - 1) # len - 1

echo Spawning ${#subjs[@]} sub-jobs.

sbatch --array=0-$len $PATH_TO_SINGLESUBJECT_FMRIPREP.sh ${subjs[@]}

Then the fmriprep.sh script is similar to yours, where you can get the individual subject with something like:

subjs=($@)
subject=${subjs[${SLURM_ARRAY_TASK_ID}]}

and then run fmriprep as you normally would. You would just want to change the SBATCH header in the fmriprep script such that you are not defining an array.
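To make that concrete, here is a minimal sketch of what the single-subject script's header could look like (the script name, resource values, and output pattern are placeholders to adapt to your cluster):

#!/bin/bash -l

# single_subject_fmriprep.sh (hypothetical name; adjust resources for your cluster)
# Note: no --array directive here; the array range is set by submit_job_array.sh at submission time.
# %A is the array job ID and %a is the array task index, giving one log file per subject.
#SBATCH --account=guest
#SBATCH --partition=guest-compute
#SBATCH --time=24:00:00
#SBATCH --job-name=fmriprep
#SBATCH --output=fmriprep_%A_%a.o
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4G
#SBATCH --cpus-per-task=16

# Pick this task's subject from the arguments passed along by sbatch
subjs=($@)
subject=${subjs[${SLURM_ARRAY_TASK_ID}]}
echo "Subject: $subject"

# ...then build the singularity run command for $subject exactly as in your original script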

Does this make sense?

Best,
Steven

Thanks Steven! I’ve updated the Singularity image to fMRIPrep v22.0.2 and I’ve implemented your method to submit the job array. (I like the way you’ve organized your scripts!)

Sadly, I’m still running into the same issue. The first subject runs, although now there are new error/warning messages in the log file just before fMRIPrep starts running:

B07237
Running fmriprep...
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-ezjns54e because the default path (/home/fmriprep/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
221012-11:45:34,844 cli INFO:
	 Telemetry system to collect crashes and errors is enabled - thanks for your feedback!. Use option ``--notrack`` to opt out.
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
221012-11:46:10,213 nipype.workflow IMPORTANT:
	 Running fMRIPrep version 22.0.2

         License NOTICE ##################################################
         fMRIPrep 22.0.2
         Copyright 2022 The NiPreps Developers.

The second subject still fails with the same error as before in the log file:

B07277
Running fmriprep...
singularity: error while loading shared libraries: libseccomp.so.2: cannot open shared object file: No such file or directory

Finally, I also tried submitting the two subjects as separate jobs rather than as a job array, but again ran into the same issue (the first subject runs, but the second crashes after a few seconds with the same error).

I’m wondering if I am only able to run one instance of the singularity image for some reason. Or perhaps there is some setting that only allows one instance of fmriprep to access the directories I’m using? Other simple test job arrays seem to work fine, e.g. a quick bash script that just prints out a subject name (a sketch of what I mean is below).
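The sort of test I mean is roughly this (a sketch; it reads the same subjects.txt and just echoes each subject):

#!/bin/bash -l

#SBATCH --job-name=array_test
#SBATCH --output=array_test_%A_%a.o
#SBATCH --ntasks=1
#SBATCH --array=0-1

# Read subject IDs from the file passed as the first argument
subjects=()
while read sub; do
    subjects+=("$sub")
done < "$1"

# Print the subject assigned to this array task
echo "Subject: ${subjects[$SLURM_ARRAY_TASK_ID]}"

Submitted with sbatch array_test.sh subjects.txt, both tasks print their subject as expected.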

Hm, strange! I doubt this is an issue with the same image being used multiple times. I would need to know more about your HPC setup/permissions to best diagnose this. What about giving each subject their own working directory?

EDIT: I see you already do this

Looking at the warning messages in the log, do you have write permissions to /tmp?
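As an aside, the Matplotlib and Fontconfig messages are only about cache directories that are not writable inside the container; they should not be what kills the second job, but you could silence them by pointing the caches at a directory you already bind. A sketch, reusing the /workdir bind from your script, placed before the singularity call:

# SINGULARITYENV_* variables are exported into the container by Singularity
export SINGULARITYENV_MPLCONFIGDIR=/workdir/.matplotlib   # writable matplotlib config/cache dir
export SINGULARITYENV_XDG_CACHE_HOME=/workdir/.cache      # fontconfig (and other) caches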

As an update, I believe this is an issue with our cluster. I developed a simple test case using “cowsay”:

I pulled the singularity image for cowsay:
$ singularity pull docker://godlovedc/lolcow

Then I submitted an array job in the same manner we’ve been submitting the fmriprep array jobs (a sketch of the test script is included after the output below), asking three cows to say “moo 0”, “moo 1”, and “moo 2”. Just as with fmriprep, the first job runs successfully and I get the following output:

 _______
< moo 0 >
 -------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

But for the other two jobs, I get the same error:
singularity: error while loading shared libraries: libseccomp.so.2: cannot open shared object file: No such file or directory
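For reference, a test submission script along the lines of what I did looks roughly like this (a sketch; the image filename produced by singularity pull, e.g. lolcow_latest.sif, may differ depending on the Singularity version):

#!/bin/bash -l

#SBATCH --job-name=cowtest
#SBATCH --output=cowtest_%A_%a.o
#SBATCH --ntasks=1
#SBATCH --array=0-2

# Load singularity the same way as in the fmriprep script
shopt -s expand_aliases
source ~/.bash_aliases
load_singularity

# Each array task asks the container's cowsay to say "moo <task index>"
singularity exec lolcow_latest.sif cowsay "moo ${SLURM_ARRAY_TASK_ID}"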

Notably, I was able to try this out on another university’s SLURM cluster, where I also have an affiliation, and everything worked fine. Back at my primary university, where things are not working, we do not have a cluster admin, so it might take some time to find a fix…

Since I’ve determined that this is likely a cluster issue and not an fmriprep issue, I think I can go ahead and close the thread. If anyone has recommendations on resources for configuring singularity on a cluster, please let me know!

As an alternative, you could try uploading and running your analyses on Brainlife.io. I realize this involves learning a new tool, which was probably not the original intention, and it would be best to figure out what is going on with the cluster, but it could be a good temporary workaround.