Very long initialization time on HPC (Qsiprep)

taka2223 · October 23, 2023, 10:36am

Summary of what happened:

I want to use QSIPrep to preprocess HCP-A data. I submitted jobs both through Slurm and as plain bash commands, but both took a very long time just to initialize. The bash version took about 40 minutes, and the Slurm version took almost 4 hours to start processing. For comparison, fMRIPrep takes about 20 to 30 minutes to start under Slurm.

Command used (and if a helper script was used, a link to the helper script or the command generated):

The script I used:

#!/bin/bash
export SINGULARITYENV_FS_LICENSE=/work/users/t/d/freesurfer/license.txt
singularity run --cleanenv -B /work -B /proj /work/users/t/d/qsiprep.sif \
    /work/users/t/d/HCPA-bids /proj/projects/HCPA-qsi-derivatives \
    participant \
    --participant-label "${1}" \
    --fs-license-file /work/users/t/d/freesurfer/license.txt \
    -w  /proj/projects/HCPA-qsi-work \
    --output-resolution 1.2 \
    -v \
    --skip-bids-validation \
    --nthreads 16 \
    --omp-nthreads 8 \
    --distortion-group-merge average \
    --recon_spec mrtrix_multishell_msmt_ACT-hsvs \
    --freesurfer-input /proj/projects/HCPA_bids_out/sourcedata/freesurfer

The command I used to submit the job:

sbatch -n 1 --cpus-per-task 16 -t 2-00:00:00 --mem=64g --wrap="/work/users/t/d/run_qsiprep_hcpa_bash ${participant_label}" -o ${participant_label}.out -e ${participant_label}.err

Version:

0.19

Environment (Docker, Singularity, custom installation):

Singularity

Data formatted according to a validatable standard? Please provide the output of the validator:

Relevant log outputs (up to 20 lines):

231023-02:11:39,854 nipype.workflow INFO:
	 Running with omp_nthreads=8, nthreads=16
231023-02:11:39,914 nipype.workflow IMPORTANT:
	 
    Running qsiprep version 0.19.0:
      * BIDS dataset path: /work/users/t/d/HCPA-bids.
      * Participant list: ['HCAXXXXXX'].
      * Run identifier: 20231022-222834_ded2c545-80b2-4f4f-8780-b61aac324455.
    
231023-03:31:38,902 nipype.workflow INFO:
       Running nonlinear normalization to template

The log shows an interval of about 80 minutes (02:11 to 03:31) between startup and the first processing step. Before the first line was even printed, the step INFO: Converting SIF file to temporary sandbox... had already lasted about 2 hours. Is this because QSIPrep requires far more resources than fMRIPrep?

Steven · October 25, 2023, 4:02pm

Hi @taka2223,

The first thing these BIDS apps do is build a PyBIDS SQLite index of your BIDS dataset. This takes a long time for big datasets. You can build this index ahead of time and reuse it to save initialization time. Alternatively, what I like to do is make temporary single-subject BIDS directories that use symlinks (shortcuts) to the original data, so that they do not take up much storage.

Best,
Steven
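
A minimal sketch of the temporary single-subject approach described above, assuming a standard BIDS layout with `sub-<label>` directories and a top-level `dataset_description.json`. The function name `make_single_subject_bids` and its arguments are hypothetical, not part of QSIPrep or PyBIDS. It links individual files rather than whole directories, and when running under Singularity the original dataset path must still be bind-mounted so the links resolve inside the container.

```shell
#!/bin/bash
# Hypothetical helper (not part of QSIPrep): build a temporary
# single-subject BIDS directory made of symlinks to the original data.
# Usage: make_single_subject_bids <bids_root> <dest_dir> <participant_label>
make_single_subject_bids() {
    local src dest label
    src=$(realpath "$1") || return 1   # absolute path so links stay valid
    dest=$2
    label=${3#sub-}                    # accept "01" or "sub-01"

    if [ ! -d "${src}/sub-${label}" ]; then
        echo "No such subject directory: ${src}/sub-${label}" >&2
        return 1
    fi

    mkdir -p "${dest}"

    # Link the top-level dataset description if the source has one.
    if [ -f "${src}/dataset_description.json" ]; then
        ln -sfn "${src}/dataset_description.json" \
            "${dest}/dataset_description.json"
    fi

    # Recreate the subject's directory tree as real directories.
    (cd "${src}" && find "sub-${label}" -type d) |
        while IFS= read -r d; do
            mkdir -p "${dest}/${d}"
        done

    # Symlink every file in the subject's tree back to the original.
    (cd "${src}" && find "sub-${label}" ! -type d) |
        while IFS= read -r f; do
            ln -sfn "${src}/${f}" "${dest}/${f}"
        done
}

# Example (paths are placeholders):
#   make_single_subject_bids /path/to/bids /path/to/tmp-bids-01 01
```

Because only one subject is visible in the temporary directory, the indexing step has far fewer files to scan than it would for the full dataset.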

taka2223 · October 26, 2023, 6:53am

Thank you very much! @Steven
So, as I understand it, if a task does not need group analysis, it is faster to maintain a separate BIDS directory for each subject, and this also saves storage. Am I right?

Steven · October 26, 2023, 12:37pm

Hi @taka2223,

You can do what I suggested and still do group analyses later. You can set your output directory to go back to the original BIDS location instead of the single-subject BIDS. You also might find it easier to premake the BIDS SQLITE object as described in my first answer.

Just to clarify, the single-subject BIDS organization is just a temporary setup for running BIDS apps. It will not save storage (especially if you are copying the data to the single-subject BIDS dataset instead of making symlinks/shortcuts). Once you’re done with the preprocessing you can delete the temporary datasets.

Best,
Steven
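
To illustrate the point about the output directory, here is a rough dry-run sketch that only prints a command: the input is the temporary single-subject directory, while the output goes to one shared derivatives directory so group analyses remain possible later. The function name `print_qsiprep_cmd` and the `QSIPREP_SIF` variable are assumptions made for illustration. The flags are limited to ones that already appear in the script earlier in this thread.

```shell
#!/bin/bash
# Hypothetical dry-run helper: prints the command instead of running it.
# Usage: print_qsiprep_cmd <tmp_bids_dir> <shared_derivatives_dir> <work_dir> <label>
print_qsiprep_cmd() {
    local tmp_bids=$1 derivatives=$2 work=$3 label=$4
    # QSIPREP_SIF is an assumed variable name for the container image path.
    local image=${QSIPREP_SIF:-qsiprep.sif}

    # Input: temporary single-subject BIDS directory.
    # Output: the shared derivatives directory used for all subjects.
    printf '%s ' singularity run --cleanenv -B /work -B /proj "${image}" \
        "${tmp_bids}" "${derivatives}" participant \
        --participant-label "${label}" -w "${work}"
    printf '\n'
}

# Example (paths are placeholders):
#   print_qsiprep_cmd /path/to/tmp-bids-01 /path/to/derivatives /path/to/work 01
```

Once a subject has finished, its temporary input directory can be deleted without touching the shared derivatives.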