Running Tractoflow in HPC multi-node

Summary of what happened:

I’m trying to run Tractoflow on an HPC cluster managed by SLURM, using multiple nodes. I changed the nextflow.config file to include executor = 'slurm'.

Command used:

#!/bin/bash
#SBATCH --account=haslab
#SBATCH --job-name=tractoflow-all-run
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --time=48:00:00
#SBATCH --output=/projects/ricarlojo/Tractoflow-adap-workspace/run_%j_n2_cpu.out  # std out
#SBATCH --error=/projects/ricarlojo/Tractoflow-adap-workspace/run_%j_n2_cpu.err   # std err

export OMP_NUM_THREADS=8
export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=8
export NXF_VER=21.10.6
export MPLCONFIGDIR=/projects/ricarlojo/Tractoflow-workspace/Matlab-w


nextflow run /projects/ricarlojo/tractoflow-adapt/main.nf --input /home/ricarlojo/HCP_files/T1w \
-with-singularity /projects/ricarlojo/scilus_1.6.0.sif --output_dir /projects/ricarlojo/Tractoflow-adap-workspace/results-n2 -with-tower

Version:

v.2.4.3

Environment:

Singularity

Relevant log outputs:

[b4/361e0f] NOTE: Error submitting process 'N4_T1 (S2)' for execution -- Execution is retried (2)
[9a/fadafa] NOTE: Error submitting process 'N4_T1 (S1)' for execution -- Execution is retried (2)
[b7/5ac631] NOTE: Error submitting process 'Denoise_DWI (S2)' for execution -- Execution is retried (3)
[ed/209420] NOTE: Error submitting process 'Denoise_DWI (S1)' for execution -- Execution is retried (3)
[18/3471df] NOTE: Error submitting process 'README (README)' for execution -- Execution is retried (3)
[5a/f68310] NOTE: Error submitting process 'Denoise_DWI (S3)' for execution -- Error is ignored
[b5/a45e3d] NOTE: Error submitting process 'N4_T1 (S3)' for execution -- Execution is retried (3)
[17/ec3d8b] NOTE: Error submitting process 'N4_T1 (S2)' for execution -- Execution is retried (3)
[0b/127b50] NOTE: Error submitting process 'N4_T1 (S1)' for execution -- Execution is retried (3)
[b5/9910bc] NOTE: Error submitting process 'Denoise_DWI (S2)' for execution -- Error is ignored
[e5/4f1aa4] NOTE: Error submitting process 'Denoise_DWI (S1)' for execution -- Error is ignored
[8b/471a2e] NOTE: Error submitting process 'README (README)' for execution -- Error is ignored
[27/ba8bec] NOTE: Error submitting process 'N4_T1 (S3)' for execution -- Error is ignored
[b3/505148] NOTE: Error submitting process 'N4_T1 (S2)' for execution -- Error is ignored
[2a/ffb959] NOTE: Error submitting process 'N4_T1 (S1)' for execution -- Error is ignored

Hello @Ricardo_A,

I would need to know what’s behind these errors.
Can you check one of them? Please run this command:

cat work/18/3471df*/.command.err

Can you also give me the singularity version you have with this command line:

singularity version

Thank you
Arnaud

Hello @abore,

I checked the work folder recursively, and no .command.err file is being created for any process; only the .command.run and .command.sh files are present.
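
For reference, here is a minimal sketch of how this check can be reproduced, and where else to look. A missing .command.err usually suggests the task never started, so the cause is more likely recorded in Nextflow's own log than in the task folder. The sketch assumes the default work directory and the .nextflow.log file in the launch folder. The function names and the WORK_DIR and NXF_LOG variables are hypothetical helpers, and the search phrase is copied from the console output above, so the exact wording in the log may differ.

```shell
#!/bin/bash
# Sketch only: WORK_DIR and NXF_LOG are assumed defaults, adjust to your launch folder.
WORK_DIR="${WORK_DIR:-work}"
NXF_LOG="${NXF_LOG:-.nextflow.log}"

# List task directories (work/xx/hash) that contain .command.run but no .command.err.
missing_err_dirs() {
    local work_dir="$1"
    find "$work_dir" -mindepth 2 -maxdepth 2 -type d | sort | while IFS= read -r task_dir; do
        if [ -f "$task_dir/.command.run" ] && [ ! -f "$task_dir/.command.err" ]; then
            echo "$task_dir"
        fi
    done
}

# Show each submission error in the log with the 10 lines that follow it.
# The phrase comes from the console messages; the log wording is an assumption.
submission_errors() {
    local log_file="$1"
    grep -A 10 "Error submitting process" "$log_file"
}

# Run both checks only when the expected paths exist.
if [ -d "$WORK_DIR" ]; then missing_err_dirs "$WORK_DIR"; fi
if [ -f "$NXF_LOG" ]; then submission_errors "$NXF_LOG"; fi
```

Run it from the folder where nextflow was launched. If the second part prints a message from sbatch, that message is probably the real reason the submission was rejected.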

My singularity version is 4.1.2-1.el9.

Thank you
Ricardo