MRIQC Singularity running on a SLURM cluster compute node is not mounting sysfs

Hi, dear MRIQC experts,
I am trying to run the MRIQC Singularity image on a SLURM cluster, and Singularity is complaining about mounting sysfs. I have no idea why this problem appeared and would appreciate your suggestions.
Here is the error message (in Singularity debug mode):
" WARNING mount of sysfs: no such file or directory
WARNING passwd file doesn’t exist in container, not updating
WARNING group file doesn’t exist in container, not updating
FATAL shell /bin/sh doesn’t exist in container"
Here are my steps and how I run it:

  1. I built the Singularity image with “sudo” from Docker Hub: mriqc_v0.15.2.simg;
  2. This image works well on my local computer with:
    singularity run -B $HOME:/home/mriqc --home /home/mriqc --cleanenv \
    -B ${HOME}/project/HC_BIDS:/data:ro \
    -B ${HOME}/project/mriqc_out:/out \
    -B ${HOME}/project/mriqc_work:/mriqc_work \
    ${HOME}/container_images/mriqc_v0.15.2.simg /data /out participant \
    --participant-label sub-0039 sub-0053 -w /mriqc_work --session-id 1 --ica --no-sub --verbose-reports
  3. I can even start this image with the same configuration on the head node of the SLURM cluster (I then stopped it);
  4. When I submit the job with sbatch (as a 1-40 array), it gives the sysfs mounting error above;
  5. I tried clearing the whole Singularity cache and, separately, the cache for this container; neither attempt solved the problem;
  6. I built and ran this version of MRIQC on the same cluster for a different dataset half a year ago, and it worked perfectly well;
  7. I ran Singularity in debug mode on both the cluster head node and a compute node; the main differences between the logs are below:
    7.1) head node:
    DEBUG [U=3066619,P=67610]create() Mount all
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting tmpfs to /var/singularity/mnt/session
    DEBUG [U=3066619,P=67610]mountImage() Mounting loop device /dev/loop0 to /var/singularity/mnt/session/rootfs of type squashfs
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting overlay to /var/singularity/mnt/session/final
    DEBUG [U=3066619,P=67610]setPropagationMount() Set RPC mount propagation flag to SLAVE
    VERBOSE [U=3066619,P=67610]Passwd() Checking for template passwd file: /var/singularity/mnt/session/rootfs/etc/passwd
    VERBOSE [U=3066619,P=67610]Passwd() Creating passwd content
    VERBOSE [U=3066619,P=67610]Passwd() Creating template passwd file and appending user data: /var/singularity/mnt/session/rootfs/etc/passwd
    DEBUG [U=3066619,P=67610]addIdentityMount() Adding /etc/passwd to mount list
    VERBOSE [U=3066619,P=67610]addIdentityMount() Default mount: /etc/passwd:/etc/passwd
    VERBOSE [U=3066619,P=67610]Group() Checking for template group file: /var/singularity/mnt/session/rootfs/etc/group
    VERBOSE [U=3066619,P=67610]Group() Creating group content
    DEBUG [U=3066619,P=67610]addIdentityMount() Adding /etc/group to mount list
    VERBOSE [U=3066619,P=67610]addIdentityMount() Default mount: /etc/group:/etc/group
    DEBUG [U=3066619,P=67610]mountGeneric() Remounting /var/singularity/mnt/session/final
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /dev to /var/singularity/mnt/session/final/dev
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /etc/localtime to /var/singularity/mnt/session/final/usr/share/zoneinfo/UTC
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /etc/hosts to /var/singularity/mnt/session/final/etc/hosts
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /var/singularity/mnt/session/actions to /var/singularity/mnt/session/final/.singularity.d/actions
    DEBUG [U=3066619,P=67610]mountGeneric() Remounting /var/singularity/mnt/session/final/.singularity.d/actions
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /proc to /var/singularity/mnt/session/final/proc
    DEBUG [U=3066619,P=67610]mountGeneric() Remounting /var/singularity/mnt/session/final/proc
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting sysfs to /var/singularity/mnt/session/final/sys
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /home/vincentq to /var/singularity/mnt/session/home/vincentq
    DEBUG [U=3066619,P=67610]mountGeneric() Remounting /var/singularity/mnt/session/home/vincentq
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /var/singularity/mnt/session/home/vincentq to /var/singularity/mnt/session/final/home/vincentq
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /tmp to /var/singularity/mnt/session/final/tmp
    DEBUG [U=3066619,P=67610]mountGeneric() Remounting /var/singularity/mnt/session/final/tmp
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /var/tmp to /var/singularity/mnt/session/final/var/tmp
    DEBUG [U=3066619,P=67610]mountGeneric() Remounting /var/singularity/mnt/session/final/var/tmp
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /home/vincentq/scratch to /var/singularity/mnt/session/final/lustre04/scratch/vincentq
    DEBUG [U=3066619,P=67610]mountGeneric() Remounting /var/singularity/mnt/session/final/lustre04/scratch/vincentq
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /var/singularity/mnt/session/etc/resolv.conf to /var/singularity/mnt/session/final/etc/resolv.conf
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /var/singularity/mnt/session/etc/passwd to /var/singularity/mnt/session/final/etc/passwd
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /var/singularity/mnt/session/etc/group to /var/singularity/mnt/session/final/etc/group
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /home/vincentq/scratch/HC_BIDS to /var/singularity/mnt/session/final/data
    DEBUG [U=3066619,P=67610]mountGeneric() Remounting /var/singularity/mnt/session/final/data
    DEBUG [U=3066619,P=67610]mountGeneric() Mounting /home/vincentq/scratch/HC_mriqc to /var/singularity/mnt/session/final/out
    DEBUG [U=3066619,P=67610]mountGeneric() Remounting /var/singularity/mnt/session/final/out
    DEBUG [U=3066619,P=67610]create() Chroot into /var/singularity/mnt/session/final
    DEBUG [U=0,P=67632] Chroot() Hold reference to host / directory
    DEBUG [U=0,P=67632] Chroot() Called pivot_root on /var/singularity/mnt/session/final
    DEBUG [U=0,P=67632] Chroot() Change current directory to host / directory
    DEBUG [U=0,P=67632] Chroot() Apply slave mount propagation for host / directory
    DEBUG [U=0,P=67632] Chroot() Called unmount(/, syscall.MNT_DETACH)
    DEBUG [U=0,P=67632] Chroot() Changing directory to / to avoid getpwd issues
    DEBUG [U=3066619,P=67610]create() Chdir into / to avoid errors
    VERBOSE [U=0,P=67629] wait_child() rpc server exited with status 0
    DEBUG [U=0,P=67629] apply_container_privileges() Set user ID to 3066619
    DEBUG [U=3066619,P=67629] set_parent_death_signal() Set parent death signal to 9
    DEBUG [U=3066619,P=67629]startup() singularity runtime engine selected
    VERBOSE [U=3066619,P=67629]startup() Execute stage 2
    DEBUG [U=3066619,P=67629]StageTwo() Entering stage 2
    DEBUG [U=3066619,P=67610]PostStartProcess() Post start process

7.2) compute node
VERBOSE [U=3066619,P=42257]addCwdMount() Default mount: /lustre04/scratch/vincentq: to the container
DEBUG [U=3066619,P=42257]create() Mount all
DEBUG [U=3066619,P=42257]mountGeneric() Mounting tmpfs to /var/singularity/mnt/session
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /home/vincentq/scratch/HC_mriqc_work to /var/singularity/mnt/session/rootfs
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/rootfs
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/rootfs
DEBUG [U=3066619,P=42257]addActionsMount() Ignoring actions mount, /var/singularity/mnt/session/rootfs/.singularity.d/actions doesn’t exist
WARNING [U=3066619,P=42257]createLayer() skipping mount of sysfs: no such file or directory
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /var/singularity/mnt/session/underlay to /var/singularity/mnt/session/final
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/final
DEBUG [U=3066619,P=42257]setPropagationMount() Set RPC mount propagation flag to SLAVE
VERBOSE [U=3066619,P=42257]Passwd() Checking for template passwd file: /var/singularity/mnt/session/rootfs/etc/passwd
WARNING [U=3066619,P=42257]addIdentityMount() passwd file doesn’t exist in container, not updating
VERBOSE [U=3066619,P=42257]Group() Checking for template group file: /var/singularity/mnt/session/rootfs/etc/group
WARNING [U=3066619,P=42257]addIdentityMount() group file doesn’t exist in container, not updating
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /dev to /var/singularity/mnt/session/final/dev
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /etc/localtime to /var/singularity/mnt/session/final/etc/localtime
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /etc/hosts to /var/singularity/mnt/session/final/etc/hosts
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /proc to /var/singularity/mnt/session/final/proc
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/final/proc
DEBUG [U=3066619,P=42257]mountGeneric() Mounting sysfs to /var/singularity/mnt/session/final/sys
WARNING [U=3066619,P=42257]mountGeneric() Skipping mount sysfs [kernel]: /sys doesn’t exist in container
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /var/singularity/mnt/session/home/mriqc to /var/singularity/mnt/session/final/home/mriqc
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/final/home/mriqc
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /tmp to /var/singularity/mnt/session/final/tmp
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/final/tmp
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /var/tmp to /var/singularity/mnt/session/final/var/tmp
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/final/var/tmp
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /home/vincentq/scratch to /var/singularity/mnt/session/final/lustre04/scratch/vincentq
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/final/lustre04/scratch/vincentq
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /var/singularity/mnt/session/etc/resolv.conf to /var/singularity/mnt/session/final/etc/resolv.conf
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /home/vincentq to /var/singularity/mnt/session/final/home/mriqc
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/final/home/mriqc
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /home/vincentq/scratch/HC_BIDS to /var/singularity/mnt/session/final/data
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/final/data
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /home/vincentq/scratch/HC_mriqc to /var/singularity/mnt/session/final/out
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/final/out
DEBUG [U=3066619,P=42257]mountGeneric() Mounting /lustre04/scratch/vincentq to /var/singularity/mnt/session/final/mriqc_work
DEBUG [U=3066619,P=42257]mountGeneric() Remounting /var/singularity/mnt/session/final/mriqc_work
DEBUG [U=3066619,P=42257]create() Chroot into /var/singularity/mnt/session/final
DEBUG [U=0,P=42344] Chroot() Hold reference to host / directory
DEBUG [U=0,P=42344] Chroot() Called pivot_root on /var/singularity/mnt/session/final
DEBUG [U=0,P=42344] Chroot() Change current directory to host / directory
DEBUG [U=0,P=42344] Chroot() Apply slave mount propagation for host / directory
DEBUG [U=0,P=42344] Chroot() Called unmount(/, syscall.MNT_DETACH)
DEBUG [U=0,P=42344] Chroot() Changing directory to / to avoid getpwd issues
DEBUG [U=3066619,P=42257]create() Chdir into / to avoid errors
VERBOSE [U=0,P=42343] wait_child() rpc server exited with status 0
DEBUG [U=0,P=42343] apply_container_privileges() Set user ID to 3066619
DEBUG [U=3066619,P=42343] set_parent_death_signal() Set parent death signal to 9
DEBUG [U=3066619,P=42343]startup() singularity runtime engine selected
VERBOSE [U=3066619,P=42343]startup() Execute stage 2
DEBUG [U=3066619,P=42343]StageTwo() Entering stage 2
FATAL [U=3066619,P=42343]StageTwo() shell /bin/sh doesn’t exist in container
DEBUG [U=3066619,P=42257]startContainer() stage 2 process reported an error, waiting status
DEBUG [U=3066619,P=42257]Master() Child exited with exit status 255
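
To make differences like these easier to spot, the two debug logs can be compared after stripping the per-process prefixes. This is only a sketch: the file names head_node.log and compute_node.log and the helper normalize_log are hypothetical, and it assumes each file holds the saved debug output of one run of the same command.

```shell
#!/bin/bash
# Hypothetical helper: drop the "[U=<uid>,P=<pid>]" prefix so that lines from
# two different runs can be compared directly.
normalize_log() {
    sed -E 's/\[U=[0-9]+,P=[0-9]+\] ?//' "$1"
}

# Assumed file names; each holds the saved debug output of one run.
HEAD_LOG="head_node.log"
COMPUTE_LOG="compute_node.log"

if [ -f "${HEAD_LOG}" ] && [ -f "${COMPUTE_LOG}" ]; then
    tmp_dir=$(mktemp -d)
    normalize_log "${HEAD_LOG}" > "${tmp_dir}/head.txt"
    normalize_log "${COMPUTE_LOG}" > "${tmp_dir}/compute.txt"
    # diff exits non-zero when the files differ, which is expected here.
    diff "${tmp_dir}/head.txt" "${tmp_dir}/compute.txt" || true
    rm -rf "${tmp_dir}"
fi
```

With the prefixes removed, the remaining differences are the mount sources and targets themselves, such as what gets mounted to /var/singularity/mnt/session/rootfs in each run.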

Hi @1118 !

Could you share your SLURM submission script ? Since it works well locally and on the head node, I’m wondering what your sbatch script includes.

Elizabeth

Hi, dear @emdupre, many thanks for your feedback. Here is the script I am using:

  1. I submit the job with
    sbatch mriqc.slurm ${BIDS_DIR} ${OUT_DIR} ${SUB_LIST} ${CON_IMG} ${MRIQC_WORK_DIR} >> ${LOG_DIR};
  2. mriqc.slurm
    #!/bin/bash
    #SBATCH --job-name=str
    #SBATCH --time=6:00:00
    #SBATCH --account=str
    #SBATCH --cpus-per-task=8
    #SBATCH --mem-per-cpu=6GB
    #SBATCH --array=1-40
    #Outputs ----------------------------------
    #SBATCH -o %x-%A-%a_%j.out
    #SBATCH -e %x-%A-%a_%j.err
    #SBATCH --mail-user=str@gmail.com
    #SBATCH --mail-type=ALL
    #------------------------------------------
    DATA_DIR=(${@:1:1})
    OUT_DIR=(${@:2:1})
    SUB_LIST=(${@:3:1})
    CON_IMG=(${@:4:1})
    MRIQC_WORK_DIR=(${@:5:1})
    echo ${DATA_DIR} ${OUT_DIR} ${PARA_LIST} ${SUB_LIST} ${MRIQC_WORK_DIR}
    #Begin work section
    SUB_STR=$(sed -n "${SLURM_ARRAY_TASK_ID}p" ${SUB_LIST})
    echo ${SUB_STR}
    SUB_ID="$(cut -d'-' -f2 <<<${SUB_STR})"
    echo "Slurm task ID: " ${SLURM_ARRAY_TASK_ID} "SUBJECT_ID: " ${SUB_ID}
    singularity run -B $HOME:/home/mriqc --home /home/mriqc --cleanenv \
    -B ${DATA_DIR}:/data:ro \
    -B ${OUT_DIR}:/out \
    -B ${MRIQC_WORK_DIR}:/mriqc_work \
    ${CON_IMG} /data /out participant \
    --participant-label ${SUB_ID} -w /mriqc_work --session-id 1 --ica --no-sub --n_procs 8 --verbose-reports

Thanks, @1118 !

And sorry, just to follow up with one more question:

Could you share the beginning of a log file ? Not the one in console, but what it’s writing to. I just want to see what those variables you’re “echo” ing read as.

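In particular, I see this difference between the successful:

DEBUG [U=3066619,P=67610]mountImage() Mounting loop device /dev/loop0 to /var/singularity/mnt/session/rootfs of type squashfs

and failing builds:

DEBUG [U=3066619,P=42257]mountGeneric() Mounting /home/vincentq/scratch/HC_mriqc_work to /var/singularity/mnt/session/rootfs

where it seems to be trying to mount your working directory to the container when it shouldn’t be.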

Here you go
mriqc singularity log
and the full slurm logs are:
https://drive.google.com/file/d/1uzRJPYqvLoKx-HEhuRJoqISHdeP034v4/view?usp=sharing

In the logs you sent, I see these three outputs:

/home/vincentq/scratch/HC_BIDS /home/vincentq/scratch/HC_mriqc /home/vincentq/scratch/HC_mriqc_subjects.list

when I’m expecting these five variables:

echo ${DATA_DIR} ${OUT_DIR} ${PARA_LIST} ${SUB_LIST} ${MRIQC_WORK_DIR}

Could you try updating your scripts (maybe by adding a few echo statements) to confirm that you’re assigning these variables correctly ? One thing that might help, too, is if this section could be simplified:

DATA_DIR=(${@:1:1})
OUT_DIR=(${@:2:1})
SUB_LIST=(${@:3:1})
CON_IMG=(${@:4:1})
MRIQC_WORK_DIR=(${@:5:1})

I’d imagine almost all of those should be relatively constant and so could be strings included in the script itself, rather than variables you pass in !
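
As a rough sketch of the kind of check I mean, something like the following could go near the top of mriqc.slurm. The function name check_args is made up for illustration, and the five names are simply the ones the script already assigns, in the same order.

```shell
#!/bin/bash
# Hypothetical helper: print each expected argument next to its name and
# fail if the count is wrong or any value is empty.
check_args() {
    if [ "$#" -ne 5 ]; then
        echo "expected 5 arguments, got $#" >&2
        return 1
    fi
    for name in DATA_DIR OUT_DIR SUB_LIST CON_IMG MRIQC_WORK_DIR; do
        echo "${name}=$1"
        if [ -z "$1" ]; then
            echo "${name} is empty" >&2
            return 1
        fi
        shift
    done
}

# Inside mriqc.slurm this would be called as:
#   check_args "$@" || exit 1
```

If the submitting script passes fewer than five values, the job then stops with a clear message in the .err file instead of handing a misplaced path to singularity.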

Elizabeth

Many thanks. I checked my scripts and finally tracked down the problem. The FATAL error reported by Singularity is due to the failure to mount mriqc_workdir, which is caused by how the arguments are passed from my mriqc.sh to mriqc.slurm, as shown in the slurm.out file.
As for the argument-passing problem, I was using different variable names for the container image: CON_IMG_DIR in mriqc.sh (passed as the 4th parameter to mriqc.slurm) and CON_IMG in mriqc.slurm (which reads the 4th input parameter). Each name was consistent within its own file. I changed the variable name in mriqc.slurm to CON_IMG_DIR, and then it worked.
Many thanks for your patience and experience!
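
For anyone who hits the same symptom, here is a minimal sketch of how an empty variable can shift positional arguments. It assumes, as the three-value echo output in the logs suggests, that the fourth value was empty when sbatch was called; count_args and the paths are hypothetical.

```shell
#!/bin/bash
# Print how many positional arguments the callee actually receives.
count_args() {
    echo "$#"
}

CON_IMG=""                      # assumption: stands in for a name that was never assigned
MRIQC_WORK_DIR="/scratch/work"  # hypothetical path

# Unquoted: the empty variable vanishes and the work dir slides into slot 4.
count_args /data /out subjects.list ${CON_IMG} ${MRIQC_WORK_DIR}        # prints 4

# Quoted: the empty string is kept as its own (empty) fourth argument.
count_args /data /out subjects.list "${CON_IMG}" "${MRIQC_WORK_DIR}"    # prints 5
```

Quoting the variables on the sbatch line keeps every argument in its slot, and writing the fourth one as "${CON_IMG:?}" would make bash stop with an error when it is unset or empty.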