Summary of what happened:
Dear neurostars community,
I am encountering persistent errors when running QSIPrep version 0.23.0 on an HPC system using Apptainer. Despite multiple attempts, the workflow fails at the gather_inputs and synthseg nodes, and I have been unable to resolve the issue. The details of my setup and the error messages are below.
I cannot update the Apptainer version because I do not have admin privileges on the cluster.
Key observations:
- The error occurs regardless of whether I run interactively or via SLURM.
- The anat_nlin_normalization node appears to complete successfully before subsequent nodes fail.
- The error is not resolved by re-running the workflow or clearing the working directory.
Request for Assistance
Could you please advise on the following:
- Diagnosing the Issue: What could be causing the failure in the gather_inputs and synthseg nodes? Are there logs or specific files I should check for more details?
- Resolution Steps: Given that I cannot update the Apptainer or QSIPrep versions due to system constraints, are there any suggested workarounds or parameter adjustments to avoid this issue?
- Workflow Caching: Could the outdated cache warning indicate an issue? If so, what is the best way to clear/reset the workflow cache entirely?
Any insights or recommendations would be greatly appreciated. Please let me know if additional information is needed.
Thank you for your help!
Command used (and if a helper script was used, a link to the helper script or the command generated):
Interactively:
qsiprep input/ output/ -w ~/palmer_scratch/qsi_work/ participant --participant-label 01 --output-resolution 2.5 --skip-bids-validation
Slurm:
#!/bin/bash
#SBATCH --job-name=qsiprep
#SBATCH --output=q_%j.out
#SBATCH --error=q_%j.err
#SBATCH --cpus-per-task=55
#SBATCH --mail-type=ALL
#SBATCH --mem 100GB
#SBATCH --time 24:00:00
#SBATCH --partition day
PARTICIPANT_FILE="participants.txt"
PARTICIPANT_NUMBERS=$(tr '\n' ' ' < "$PARTICIPANT_FILE")
NUM_PARTICIPANTS=$(wc -l < "$PARTICIPANT_FILE")
OUTPUT_DIR="/home/kab285/palmer_scratch/fmriprep/output/"
WORK_DIR="/home/kab285/palmer_scratch/qsi_work/"
RESOLUTION=2.5
apptainer exec qsiprep.sif qsiprep . $OUTPUT_DIR -w $WORK_DIR participant --participant-label $PARTICIPANT_NUMBERS --output-resolution $RESOLUTION --anat_modality T1W
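For clarity, here is a minimal sketch of how the script above turns participants.txt into the arguments for --participant-label. The file contents (01, 02, 03) are hypothetical and only illustrate the expansion; the variable NUM_LABELS is introduced here for the check and is not part of my script.

```shell
#!/bin/bash
# Hypothetical participants file, one label per line (contents are assumed).
PARTICIPANT_FILE="$(mktemp)"
printf '01\n02\n03\n' > "$PARTICIPANT_FILE"

# Same construction as in the SLURM script: newlines become spaces.
PARTICIPANT_NUMBERS=$(tr '\n' ' ' < "$PARTICIPANT_FILE")

# Left unquoted on purpose so the shell splits it into one argument per label,
# which is what happens on the qsiprep command line.
set -- $PARTICIPANT_NUMBERS
NUM_LABELS=$#

echo "labels passed to --participant-label: $NUM_LABELS"
for label in "$@"; do
    echo "  sub-$label"
done

rm -f "$PARTICIPANT_FILE"
```

Because the variable is expanded unquoted, each label becomes a separate argument; an empty or missing participants.txt would leave --participant-label with no values.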
Version:
0.23.0
Environment (Docker, Singularity / Apptainer, custom installation):
Apptainer
Data formatted according to a validatable standard? Please provide the output of the validator:
PASTE VALIDATOR OUTPUT HERE
Relevant log outputs (up to 20 lines):
[Nipype] Setting-up "qsiprep_0_23_wf.sub_01_wf.anat_preproc_wf.anat_normalization_wf.anat_nlin_normalization".
[INFO] Outdated cache found for "qsiprep_0_23_wf.sub_01_wf.anat_preproc_wf.anat_normalization_wf.anat_nlin_normalization".
[INFO] Executing "anat_nlin_normalization" <qsiprep.interfaces.niworkflows.RobustMNINormalizationRPT>
...
[ERROR] could not run node: qsiprep_0_23_wf.sub_01_wf.dwi_preproc_wf.hmc_sdc_wf.gather_inputs
[ERROR] could not run node: qsiprep_0_23_wf.sub_01_wf.anat_preproc_wf.synthseg_wf.synthseg
[CRITICAL] QSIPrep failed: 2 raised. Re-raising first.
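The lines above only name the failing nodes, so the full tracebacks are presumably in Nipype crash files. Below is a minimal sketch of how I could look for them. It assumes Nipype's usual `crash-*` file naming and that the files end up somewhere under the output or working directory; the exact location for QSIPrep 0.23.0 is an assumption on my part, and `find_crash_files` is a hypothetical helper name.

```shell
#!/bin/bash
# Directories taken from the SLURM script above; adjust if yours differ.
OUTPUT_DIR="${OUTPUT_DIR:-/home/kab285/palmer_scratch/fmriprep/output/}"
WORK_DIR="${WORK_DIR:-/home/kab285/palmer_scratch/qsi_work/}"

# Hypothetical helper: print every file whose name starts with "crash-"
# under the given directories, skipping directories that do not exist.
find_crash_files() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            find "$dir" -type f -name 'crash-*' 2>/dev/null
        fi
    done
    return 0
}

find_crash_files "$OUTPUT_DIR" "$WORK_DIR"
```

If this prints any paths for gather_inputs or synthseg, I can post the contents of those files here.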