Summary of what happened:
Hi,
I am back to running QSIPrep on an HPC system on some in-house data that contains T1w images, AP/PA fieldmaps, and dMRI data.
The running time for a single subject is hard to predict, even though my allocation request stays the same across job submissions for the set of BIDS studies in my dataset.
I had a job running for 12 hours on Wednesday for a study with a single participant, and the job did not complete. All anat processing seems to have completed:
sub-ID_desc-aseg_dseg.nii.gz
sub-ID_desc-brain_mask.nii.gz
sub-ID_desc-preproc_T1w.nii.gz
sub-ID_dseg.nii.gz
sub-ID_from-MNI152NLin2009cAsym_to-T1w_mode-image_xfm.h5
sub-ID_from-orig_to-T1w_mode-image_xfm.txt
sub-ID_from-T1wACPC_to-T1wNative_mode-image_xfm.mat
sub-ID_from-T1wNative_to-T1wACPC_mode-image_xfm.mat
sub-ID_from-T1w_to-MNI152NLin2009cAsym_mode-image_xfm.h5
but when the allocated time ran out, the dwi folder contained only:
sub-ID_acq-dir99_confounds.tsv
sub-ID_acq-dir99_desc-SliceQC_dwi.json
I had another participant with the same pre-processing steps whose job completed in slightly over 6 hours.
Yesterday at 9:35 AM I re-submitted the job with 24 hours of allocated runtime. Now that the allocation has run out today, I see that the last log entry is from 9:36 AM yesterday, with the message:
(...)
### References
250109-09:36:43,35 nipype.workflow INFO:
[Node] Setting-up "qsiprep_wf.single_subject_ID_wf.anat_preproc_wf.output_grid_wf.deoblique_autobox" in "/workdir/qsiprep_wf/single_subject_ID_wf/anat_preproc_wf/output_grid_wf/deoblique_autobox".
That is, the log shows almost no entries for the processing that was actually done. I am really confused by this.
I see in the output directory that by 9:37 AM yesterday it had already pre-processed the anat data, and supposedly all data files had been written.
However, the dwi folder contains the same two files as on Wednesday; i.e., no real progress has been made in more than 12 hours.
So:
- It seems odd that, for the same allocation (except for the time), the runtime varies so much across participants whose scanning protocol was the same. Am I missing something?
- It seems odd that the log files vary so much in their contents across different runs. Am I missing something?
- I am wondering what I may be missing to get the pre-processing to complete for this participant, as there is no trace of failure in the logs.
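One check that could help tell a stalled run from a slow one is to list the files in the working directory that changed recently. The sketch below is only an illustration: `recent_files` is a hypothetical helper, not part of QSIPrep, and it assumes a `find` that supports `-mmin` (the GNU and BSD versions do).

```shell
# Hypothetical helper (not part of QSIPrep): list regular files under a
# directory that were modified within the last N minutes.
# Assumes a find implementation that supports -mmin.
recent_files() {
    dir="$1"
    minutes="$2"
    find "$dir" -type f -mmin "-${minutes}" | sort
}

# Example usage while the job is running, with the work directory
# from the command in this post:
#   recent_files "${work_dirname}" 30
```

An empty listing while the job is still running would suggest that nothing is being written, although a single long-running node could also produce no new files for a while.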
Thanks.
Command used (and if a helper script was used, a link to the helper script or the command generated):
I am running the following command in my SLURM script:
cmd="singularity run --cleanenv \
--bind ${in_bids_dirname} \
--bind ${out_dirname} \
--bind ${fs_license_fname}:${mapped_fs_license_fname} \
--bind ${work_dirname} \
${qsiprep_singularity_fname} \
${in_bids_dirname} \
${out_dirname} \
participant \
--participant-label ${participant} \
--output-resolution ${output_resolution} \
--fs-license-file ${mapped_fs_license_fname} \
--work_dir ${work_dirname} \
--skip_bids_validation"
Version:
0.19.0
Environment (Docker, Singularity / Apptainer, custom installation):
Singularity
Data formatted according to a validatable standard? Please provide the output of the validator:
PASTE VALIDATOR OUTPUT HERE
Relevant log outputs (up to 20 lines):
The log file ended with the following lines:
(...)
250108-18:27:15,268 nipype.workflow INFO:
[Node] Finished "split_eddy_lps", elapsed time 113.47129s.
250108-18:27:16,206 nipype.workflow INFO:
[Node] Setting-up "qsiprep_wf.single_subject_ID_wf.dwi_finalize_acq_dir99_wf.transform_dwis_t1.get_interpolation" in "/workdir/qsiprep_wf/single_subject_ID_wf/dwi_finalize_acq_dir99_wf/transform_dwis_t1/get_interpolation".
250108-18:27:16,436 nipype.workflow INFO:
[Node] Executing "get_interpolation" <qsiprep.interfaces.images.ChooseInterpolator>
[1. 1. 1.] [1.3499999 1.3499999 1.3499999]
250108-18:27:16,548 nipype.interface WARNING:
Using BSpline interpolation for upsampling
250108-18:27:16,549 nipype.workflow INFO:
[Node] Finished "get_interpolation", elapsed time 0.109116s.
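To compare runs, the `[Node] Finished` lines in logs like the one above can be tabulated by elapsed time. This is a minimal sketch that assumes the log has been saved to a file and that the lines follow the format shown here; `node_times` and the file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: print "<elapsed seconds><TAB><node name>" for every
# '[Node] Finished "name", elapsed time N.NNs.' line in a log file,
# slowest node first.
node_times() {
    awk -F'"' '/\[Node\] Finished/ {
        t = $3                           # text after the quoted node name
        sub(/.*elapsed time /, "", t)    # drop everything before the number
        sub(/s\..*/, "", t)              # drop the trailing "s."
        print t "\t" $2
    }' "$1" | LC_ALL=C sort -rn
}

# Example usage (the log file name is an assumption):
#   node_times qsiprep.log | head -n 20
```

Running this on the logs of the 6-hour participant and of the incomplete one would show which nodes finished in each run, so the first node missing from the incomplete run is a candidate for where it stalled.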