QSIPrep Optimal Compute Resources

Hi there! I’m using qsiprep to preprocess diffusion imaging and was wondering if other people have figured out an optimal combination of resources to run it efficiently, in particular when using cluster resources.

For example, with the SLURM job scheduler on a shared cluster, I’ve been using scripts like:

#SBATCH --nodes=1          # I don't think QSIPrep can handle several compute nodes at the same time
#SBATCH --ntasks=12        # number of threads/cores to use
#SBATCH --mem=40G          # RAM required
#SBATCH --time=24:00:00    # time before job is killed (important for the queue system prioritization)

...
# QSIPrep call
singularity run --cleanenv \
    -B /path/to/derivatives:/derivatives:ro \
    -B /path/to/derivatives:/out \
    -B ${WORKDIR}:${WORKDIR} \
    -B /local/path/to/freesurfer/license.txt:/opt/freesurfer/license.txt \
    /path/to/qsiprep-0.20.0.sif \
    /derivatives/nii /out participant \
    --fs-license-file /opt/freesurfer/license.txt \
    --output-resolution ${RESOLUTION} \
    --work-dir ${WORKDIR} \
    --nthreads ${SLURM_NTASKS:-1} \
    -v -v

Sometimes it makes use of most of these threads, but at other times it doesn’t… (if I’m reading the log below correctly):

240313-10:56:51,510 nipype.workflow INFO:
	 [MultiProc] Running 2 tasks, and 0 jobs ready. Free memory (GB): 226.14/226.54, Free processors: 10/12.
                     Currently running:
                       * qsiprep_wf.single_subject_test_wf.dwi_preproc_ses_03_acq_ABCDscannerCorrected_dir_AP_wf.pre_hmc_wf.merge_and_denoise_wf.dwi_denoise_ses_03_acq_ABCDscannerCorrected_dir_AP_dwi_wf.denoiser
                       * qsiprep_wf.single_subject_test_wf.anat_preproc_wf.output_grid_wf.deoblique_autobox
240313-11:51:07,545 nipype.interface INFO:
	 Generating denoising visual report

If submitting many jobs, or if compute resources are scarce, keeping --nthreads low makes sense; but if a lot of resources are available, or if we want to speed up a specific job, what’s the best way to leverage them?

Thanks!


Hi @pierre-nedelec and welcome to Neurostars!

I think --cpus-per-task=12 would be more appropriate here than --ntasks=12, since qsiprep runs as a single multithreaded task rather than 12 separate tasks. Maybe it doesn’t make a difference in practice.

You should also specify memory in the qsiprep command (e.g., --mem_mb 39500) to doubly ensure qsiprep will not try to go above the SLURM-defined limit.
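
A minimal sketch of how the two could be kept in sync (the MEM_MB variable and the 500 MB headroom are just illustrative choices of mine; SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE are only exported when the corresponding directives are set, so check what your site provides):

#SBATCH --cpus-per-task=12
#SBATCH --mem=40G

# SLURM_MEM_PER_NODE is in MB; leave a little headroom below the
# hard limit so the job isn't killed right at the margin.
MEM_MB=$(( ${SLURM_MEM_PER_NODE:-40000} - 500 ))

# ...then in the qsiprep call:
#     --nthreads ${SLURM_CPUS_PER_TASK:-1} \
#     --mem_mb ${MEM_MB} \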

Some tasks are limited to using only a low number of CPUs (either because they are simple and wouldn’t benefit from more, or because they would otherwise eat up too much memory). Outside of those, qsiprep should try to maximize the thread count for efficiency.

BUT, for some computer reasons I do not fully understand, some jobs run better when there are 2^n cores (4, 8, 16). So you can set --omp-nthreads 8 (for example) so that resource-hungry tasks get 8 threads regardless, even if more are available. This keeps your resource-intensive tasks at a 2^n value while leaving more resources for other tasks. I haven’t benchmarked the performance, but empirically it has helped me.
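
Putting it together, the call from your script might look something like this (a sketch based on the flags above, not a benchmarked recommendation; the cap of 8 is just an example):

# --nthreads caps the total threads across concurrently running tasks;
# --omp-nthreads caps the threads a single multithreaded task can use.
singularity run --cleanenv \
    -B /path/to/derivatives:/derivatives:ro \
    -B /path/to/derivatives:/out \
    -B ${WORKDIR}:${WORKDIR} \
    -B /local/path/to/freesurfer/license.txt:/opt/freesurfer/license.txt \
    /path/to/qsiprep-0.20.0.sif \
    /derivatives/nii /out participant \
    --fs-license-file /opt/freesurfer/license.txt \
    --output-resolution ${RESOLUTION} \
    --work-dir ${WORKDIR} \
    --nthreads ${SLURM_CPUS_PER_TASK:-1} \
    --omp-nthreads 8 \
    --mem_mb 39500 \
    -v -v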

Best,
Steven
