Summary of what happened:
Hi,
I am trying to run fMRIPrep on an HPC cluster. Everything seems to work until the FreeSurfer steps, where it keeps hanging: it doesn’t exit or throw an error, it just hangs.
Command used (and if a helper script was used, a link to the helper script or the command generated):
Here is my setup…
#!/bin/bash
#
#-----------------------------------------------------------------------------
#
#SBATCH -J lastest # Job name
#SBATCH -o 128_max_tempflow_fMRIprep_.%j # Name of stdout output file (%j expands to jobId)
#SBATCH -p normal # Queue name
#SBATCH -N 1 # Total number of nodes requested (68 cores/node)
#SBATCH -n 1 # Total number of mpi tasks requested
#SBATCH -t 48:00:00 # Run time (hh:mm:ss) - 48 hours
#SBATCH -A IBN22006 # allocation to run under
# created by Jennifer May, 2023
# -----------------------------------------------------------------------------------------
SLURM_CPUS_PER_TASK=64 #Cores = 128 (64 cores / socket)
SLURM_MEM_PER_NODE=128000 #total is 128 GB of RAM per node
TEMPLATEFLOW_HOST_HOME=/home1/06953/jes6785/.cache/templateflow
# location and user inputs - make sure this is up to date for your computer
#----------------------------------------------------------------------------------
module load tacc-apptainer
# DIRECTORY LOCATIONS
BIDS_DIR=/scratch/06953/jes6785/NECTARY_DATA/
OUTPUT_DIR=${BIDS_DIR}derivatives/fmriprep-v23.0.2/
# remove stale FreeSurfer "IsRunning" lock files left behind by an interrupted run
find "${OUTPUT_DIR}sourcedata/freesurfer/sub-B043/" -name "*IsRunning*" -type f -delete
unset PYTHONPATH;
apptainer run -B /scratch/06953/jes6785/NECTARY_DATA/:/scratch/06953/jes6785/NECTARY_DATA/ \
-B /scratch/06953/jes6785/working_dir/:/scratch/06953/jes6785/working_dir/ \
-B /scratch/06953/jes6785/NECTARY_DATA/derivatives/fmriprep-v23.0.2/code:/scratch/06953/jes6785/NECTARY_DATA/derivatives/fmriprep-v23.0.2/code \
-B /home1/06953/jes6785/.cache/templateflow:/opt/templateflow --cleanenv \
/work/06953/jes6785/Containers/fmriprep_23.0.2.sif \
/scratch/06953/jes6785/NECTARY_DATA/ \
/scratch/06953/jes6785/NECTARY_DATA/derivatives/fmriprep-v23.0.2/ \
participant --participant-label B043 -w /scratch/06953/jes6785/working_dir/ \
--fs-license-file /scratch/06953/jes6785/NECTARY_DATA/derivatives/fmriprep-v23.0.2/code/license_2.txt \
--skip_bids_validation -vvv --nprocs $SLURM_CPUS_PER_TASK \
--mem_mb $SLURM_MEM_PER_NODE \
--bids-filter-file /scratch/06953/jes6785/NECTARY_DATA/derivatives/fmriprep-v23.0.2/code/ses-01_bf.json
Version:
23.0.2
Environment (Docker, Singularity, custom installation):
Singularity/Apptainer
Data formatted according to a validatable standard? Please provide the output of the validator:
Relevant log outputs (up to 20 lines):
This is the output I keep getting over and over again…
[Node] Up-to-date cache found for "fmriprep_23_0_wf.single_subject_B043_wf.func_preproc_ses_01_task_cyb_dir_AP_wf.bold_confounds_wf.tcc_metadata_fmt".
230515-09:23:04,909 nipype.workflow DEBUG:
Checking hash "fmriprep_23_0_wf.single_subject_B043_wf.func_preproc_ses_01_task_cyb_dir_AP_wf.bold_confounds_wf.tcc_metadata_fmt" locally: cached=True, updated=True.
230515-09:23:04,909 nipype.workflow DEBUG:
Skipping cached node fmriprep_23_0_wf.single_subject_B043_wf.func_preproc_ses_01_task_cyb_dir_AP_wf.bold_confounds_wf.tcc_metadata_fmt with ID 195.
230515-09:23:04,910 nipype.workflow INFO:
[Job 195] Cached (fmriprep_23_0_wf.single_subject_B043_wf.func_preproc_ses_01_task_cyb_dir_AP_wf.bold_confounds_wf.tcc_metadata_fmt).
230515-09:23:06,803 nipype.workflow DEBUG:
Progress: 376 jobs, 199/1/0 (done/running/ready), 1/176 (pending_tasks/waiting).
230515-09:23:06,804 nipype.workflow DEBUG:
Tasks currently running: 1. Pending: 1.
230515-09:23:06,806 nipype.workflow INFO:
[MultiProc] Running 1 tasks, and 0 jobs ready. Free memory (GB): 123.00/128.00, Free processors: 56/64.
Currently running:
* fmriprep_23_0_wf.single_subject_B043_wf.anat_preproc_wf.surface_recon_wf.autorecon_resume_wf.autorecon2_vol
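The MultiProc status above only shows that autorecon2_vol is still running; it does not show whether FreeSurfer itself is making progress. One rough check is how recently recon-all last wrote to its log. This is a minimal sketch with a hypothetical helper name, assuming GNU stat (as on most Linux clusters) and the FreeSurfer subjects directory that the script's cleanup step targets (${OUTPUT_DIR}sourcedata/freesurfer/).

```bash
# Hypothetical helper (not part of fMRIPrep or FreeSurfer): print how many
# minutes ago FreeSurfer last wrote to <subject_dir>/scripts/recon-all.log.
# Assumes GNU stat (stat -c %Y = modification time in seconds since the epoch).
recon_log_age_minutes() {
  local subject_dir="$1"
  local log="${subject_dir%/}/scripts/recon-all.log"
  if [ ! -f "$log" ]; then
    echo "no recon-all.log found at $log" >&2
    return 1
  fi
  local now last
  now=$(date +%s)
  last=$(stat -c %Y "$log")
  # Whole minutes since the last write
  echo $(( (now - last) / 60 ))
}

# Example, using the subject directory from the script:
# SUBJ_DIR=/scratch/06953/jes6785/NECTARY_DATA/derivatives/fmriprep-v23.0.2/sourcedata/freesurfer/sub-B043
# recon_log_age_minutes "$SUBJ_DIR"
# tail -n 20 "$SUBJ_DIR/scripts/recon-all.log"
```

A log that is still being updated suggests the step is slow rather than stuck; one that has not changed for a long time is a hint (not proof) of a hang, and its last lines should show which FreeSurfer command was running when output stopped.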
Screenshots / relevant information:
How can I fix this? Any help would be appreciated.
What I have tried…
I’ve tried adjusting the --nprocs $SLURM_CPUS_PER_TASK and --mem_mb $SLURM_MEM_PER_NODE values, but this doesn’t seem to resolve the problem. What am I missing here?
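Because the script assigns SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE by hand, the values passed to --nprocs and --mem_mb are not tied to what the job actually requested (the #SBATCH header asks for -N 1 -n 1 only). A minimal sketch of deriving both flags from the allocation instead, assuming the cluster lets jobs request --cpus-per-task and --mem so that SLURM exports those variables (SLURM_MEM_PER_NODE is in MB); some sites allocate whole nodes and ignore or reject these options, so check local documentation. The function name and the 10% memory headroom are illustrative assumptions, not fMRIPrep defaults.

```bash
# Hypothetical helper: build fMRIPrep's --nprocs/--mem_mb flags from the SLURM
# allocation. Falls back to the values hard-coded in the script (64 CPUs,
# 128000 MB) when SLURM does not export the variables.
fmriprep_resource_flags() {
  local cpus="${SLURM_CPUS_PER_TASK:-64}"
  local mem_mb="${SLURM_MEM_PER_NODE:-128000}"
  # Keep ~10% of memory back for the OS and container runtime
  # (an arbitrary illustrative margin).
  local usable_mb=$(( mem_mb * 9 / 10 ))
  echo "--nprocs ${cpus} --mem_mb ${usable_mb}"
}

# Usage inside the job script, after adding e.g. "#SBATCH --cpus-per-task=64"
# and "#SBATCH --mem=..." and removing the two hand-set SLURM_* assignments:
#   FLAGS=$(fmriprep_resource_flags)
#   apptainer run [binds, image and paths as in the script] $FLAGS
# $FLAGS is left unquoted on purpose so it splits into separate arguments.
```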