I'm trying to run Tractoflow on an HPC cluster managed by SLURM, using multiple nodes. I changed the nextflow.config file to include executor = 'slurm', but process submission fails:
[b4/361e0f] NOTE: Error submitting process 'N4_T1 (S2)' for execution -- Execution is retried (2)
[9a/fadafa] NOTE: Error submitting process 'N4_T1 (S1)' for execution -- Execution is retried (2)
[b7/5ac631] NOTE: Error submitting process 'Denoise_DWI (S2)' for execution -- Execution is retried (3)
[ed/209420] NOTE: Error submitting process 'Denoise_DWI (S1)' for execution -- Execution is retried (3)
[18/3471df] NOTE: Error submitting process 'README (README)' for execution -- Execution is retried (3)
[5a/f68310] NOTE: Error submitting process 'Denoise_DWI (S3)' for execution -- Error is ignored
[b5/a45e3d] NOTE: Error submitting process 'N4_T1 (S3)' for execution -- Execution is retried (3)
[17/ec3d8b] NOTE: Error submitting process 'N4_T1 (S2)' for execution -- Execution is retried (3)
[0b/127b50] NOTE: Error submitting process 'N4_T1 (S1)' for execution -- Execution is retried (3)
[b5/9910bc] NOTE: Error submitting process 'Denoise_DWI (S2)' for execution -- Error is ignored
[e5/4f1aa4] NOTE: Error submitting process 'Denoise_DWI (S1)' for execution -- Error is ignored
[8b/471a2e] NOTE: Error submitting process 'README (README)' for execution -- Error is ignored
[27/ba8bec] NOTE: Error submitting process 'N4_T1 (S3)' for execution -- Error is ignored
[b3/505148] NOTE: Error submitting process 'N4_T1 (S2)' for execution -- Error is ignored
[2a/ffb959] NOTE: Error submitting process 'N4_T1 (S1)' for execution -- Error is ignored
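To see at a glance which processes were retried and which were finally ignored, the notes above can be tallied with a short script. This is only a sketch: the regular expression is modelled on the lines pasted in this thread, and the function name summarize_submit_errors is a made-up helper, not part of Nextflow or Tractoflow.

```python
import re
from collections import Counter

# Pattern modelled on the NOTE lines pasted above (an assumption, not an
# official Nextflow log format specification).
NOTE_RE = re.compile(
    r"\[(?P<hash>[0-9a-f]{2}/[0-9a-f]{6})\] NOTE: Error submitting process "
    r"'(?P<process>[^']+)' for execution -- (?P<outcome>.+)"
)


def summarize_submit_errors(lines):
    """Count submission-error notes per (process, outcome) pair."""
    counts = Counter()
    for line in lines:
        match = NOTE_RE.search(line)
        if match is None:
            continue  # skip anything that is not a submission-error note
        outcome = "ignored" if "ignored" in match["outcome"] else "retried"
        counts[(match["process"], outcome)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < console_output.txt
    for (process, outcome), n in sorted(summarize_submit_errors(sys.stdin).items()):
        print(f"{process}: {outcome} x{n}")
```

If every process ends up as "ignored", the failure happens at submission and not inside a particular Tractoflow step.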
I checked the work folder recursively, and no error file is being created for any task; only the .command.run and .command.sh files are there.
This is the first time I have seen something like this; I have always found some *.err files.
Another thing: I'm no longer using Singularity this way. I now run it through Apptainer, whose version is close to 1.3.x. I don't know whether that makes a difference.
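For anyone who wants to repeat that check, here is a minimal sketch that walks the work folder and lists the task directories that have a .command.run script but no error file. It assumes the usual Nextflow layout, where each task directory holds .command.run, .command.sh and, once the job has actually started, .command.err. The function name tasks_missing_file and the default folder name "work" are illustrative choices.

```python
from pathlib import Path


def tasks_missing_file(work_dir, expected=".command.err"):
    """Return task directories that contain .command.run but not `expected`."""
    missing = []
    # Every Nextflow task directory holds a .command.run wrapper script.
    for run_script in Path(work_dir).rglob(".command.run"):
        task_dir = run_script.parent
        if not (task_dir / expected).exists():
            missing.append(task_dir)
    return sorted(missing)


if __name__ == "__main__":
    # "work" is the default Nextflow work folder; adjust if -w was used.
    for task_dir in tasks_missing_file("work"):
        print(task_dir)
```

A task directory with no error file at all usually means the job never started on a compute node, which points to the submission step and not to the pipeline itself.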
Can you share your .nextflow.log?
I hope we can get more information from this file. Another option would be to try a single node and remove executor = 'slurm' from the nextflow.config.
You are right, it was definitely an issue with the way you submitted your command with SLURM.
Tell me if you get any Tractoflow-related errors.
Best,
Arnaud