QSIPREP 0.18.1 stuck during preprocessing?

Hi!

I’m new to using QSIPREP and I’m wondering about the different steps it runs in the background…

I’ve been running the following command on a compute cluster for almost 10 hours, but I have the feeling that the process is stuck somewhere, because the output no longer changes… Various errors are also appearing, some due to multi-threading and others I can’t identify…

I also tested with --sloppy, but it produces the same result.

If any of you have experience with this workflow, I’d love to hear your opinion…

Command used:

SUB="sub-JEU26"

singularity run --cleanenv \
    -B ${HOME}/EcriPark_Code:/code,${EcriPark}:/data,/scratch/lcorcos/EcriPark_QSIPREP:/out,${FREESURFER_HOME}/license.txt,/scratch/lcorcos/Temp_QSIPREP:/tmp \
    ${HOME}/qsiprep-0.18.1.sif /data/ \
    /out/ participant --participant_label ${SUB} \
    --skip_bids_validation \
    --fs_license_file ${FREESURFER_HOME}/license.txt \
    --work_dir /tmp/ \
    --output_resolution 1 \
    --eddy_config /code/eddy_params.json \
    --verbose \
    --anat_modality T2w \
    --b0_threshold 50 \
    --dwi_denoise_window 5 \
    --denoise_method dwidenoise \
    --unringing_method mrdegibbs \
    --distortion_group_merge average \
    --b0_motion_corr_to iterative \
    --hmc_transform Affine \
    --hmc_model eddy \
    --skull_strip_template OASIS \
    --write_graph

Version:

qsiprep 0.18.1
freesurfer-linux-centos8_x86_64-7.4.1-20230613-7eb8460
singularity 1.1.6-1.el7

Hardware:

Dell PowerEdge C4140 (32 cores) Intel Xeon CPU 5218
380 GB RAM
NVIDIA Tesla V100

Relevant log outputs

Attached are various outputs for different subjects: the “shell” output of qsiprep, and the logs of the same two subjects from the qsiprep output folder.

You can access the files here: 194.9 KB folder on MEGA

Thanks for your help

Hi @MagicLudo,

Any chance you can update your version of Singularity/Apptainer or use a recent Docker version? And how much memory and how many CPUs are you devoting to the job? I am not sure whether Singularity uses all available resources by default.

Also, one process that takes a long time and tends to use a lot of memory is SynthSeg; in an upcoming release, SynthSeg will be forced to use only a single thread, which should reduce memory usage.

I also see you have errors for getting the template image. Does your machine have internet access?

Best,
Steven

Hi @MagicLudo ,

Complementary comments to those from @Steven, but I am not sure it will help here:

It would be interesting to know the content of /code/eddy_params.json. When you have access to a GPU, eddy is much faster with eddy_cuda than with eddy_openmp. In that case, your singularity command also needs the --nv argument when you run on a node with a GPU.

Other comments:

  • --anat_modality T2w is a new option; be careful and check the result.
  • --output_resolution 1 will drastically upsample your processed DWI images and increase the data size. To begin with, you may want to stay close to the native DWI resolution.
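
To get a sense of the size increase mentioned in the last point: the voxel count (and roughly the file size) grows with the cube of the resolution ratio. A minimal sketch, assuming a hypothetical 2 mm native DWI resolution (check yours, e.g. with MRtrix3's `mrinfo -spacing`):

```shell
# Rough estimate of the data-size increase when upsampling DWI data.
# The 2 mm native resolution below is a hypothetical example.
native_mm=2
target_mm=1
# Voxel count scales with the cube of the resolution ratio
ratio=$(( (native_mm / target_mm) ** 3 ))
echo "Approximate data-size increase: ${ratio}x"
```

So going from 2 mm to 1 mm isotropic multiplies the voxel count (and memory footprint of resampling steps) by roughly 8.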

Hi @Steven

No, unfortunately I’m not an administrator, so I can’t update the installed programs.

From what I’ve read in the output, it seems that qsiprep has access to all available resources.

Normally, yes, the cluster has internet access. But I changed --anat_modality T2w to --anat_modality T1w and the template-image error disappeared.

Hi @jsein

The file eddy_params.json:

{
  "flm": "quadratic",
  "slm": "linear",
  "fep": false,
  "interp": "spline",
  "nvoxhp": 1000,
  "fudge_factor": 10,
  "dont_sep_offs_move": false,
  "dont_peas": false,
  "niter": 5,
  "method": "jac",
  "repol": true,
  "num_threads": 1,
  "is_shelled": true,
  "use_cuda": true,
  "cnr_maps": true,
  "residuals": true,
  "output_type": "NIFTI_GZ",
  "estimate_move_by_susceptibility": true,
  "mporder": 8,
  "slice_order": "/home/lcorcos/EcriPark_Code/slspec_EcriPark.txt",
  "args": "--ol_nstd=5"
}

I’m sorry, but I don’t see where I can put these commands (eddy_cuda or eddy_openmp)… In the eddy_params.json file, I did notice this line: "use_cuda": true
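
As a side note, one way to catch config problems before launching a multi-hour job is to check that the JSON parses and that use_cuda is set. A minimal sketch (assumes python3 is on the PATH; the trimmed config written to /tmp below is illustrative, not the full file above):

```shell
# Write a trimmed illustrative eddy config and verify it parses with use_cuda enabled
cat > /tmp/eddy_check.json <<'EOF'
{"use_cuda": true, "mporder": 8, "num_threads": 1}
EOF
use_cuda=$(python3 -c "import json; print(json.load(open('/tmp/eddy_check.json'))['use_cuda'])")
echo "use_cuda = ${use_cuda}"
```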

Thanks for the --nv tip!

After changing --anat_modality, the process goes further…

With regard to --output_resolution, it seems to me that the docs indicate this is the resolution to which the data will be resampled after preprocessing, so it’s only applied at the very end of the pipeline, right?


I launched two other tests (which are still ongoing).
First test (12h of runtime):

singularity run --cleanenv \
    -B ${HOME}/EcriPark_Code:/code,${EcriPark}:/data,/scratch/lcorcos/EcriPark_QSIPREP:/out,${FREESURFER_HOME}/license.txt,/scratch/lcorcos/Temp_QSIPREP:/tmp \
    --nv ${HOME}/qsiprep-0.18.1.sif /data/ \
    /out/ participant --participant_label ${SUB} \
    --skip_bids_validation \
    --sloppy \
    --fs_license_file ${FREESURFER_HOME}/license.txt \
    --work_dir /tmp/ \
    --output_resolution 1 \
    --eddy_config /code/eddy_params.json \
    --anatomical-template MNI152NLin2009cAsym \
    --verbose \
    --anat_modality T1w \
    --b0_threshold 50 \
    --dwi_denoise_window 5 \
    --denoise_method dwidenoise \
    --unringing_method mrdegibbs \
    --distortion_group_merge average \
    --b0_motion_corr_to iterative \
    --hmc_transform Affine \
    --hmc_model eddy \
    --skull_strip_template OASIS \
    --write_graph

→ No crash file, but the last line of the log (attached) is from 10 hours ago:

230810-17:07:29,977 nipype.workflow INFO:
	 [MultiProc] Running 1 tasks, and 0 jobs ready. Free memory (GB): 338.68/338.88, Free processors: 31/32.
                     Currently running:
                       * qsiprep_wf.single_subject_JEU26_wf.sub_JEU26_ses_01_final_merge_wf.distortion_merger
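
(A quick way to tell whether such a job is truly stalled, rather than just quiet, is to check whether anything under the working directory has been modified recently. A sketch with a hypothetical demo directory standing in for the real --work_dir; requires GNU find:)

```shell
# If nothing under the work dir changed in the last hour, the job is likely stuck
WORKDIR=/tmp/qsiprep_workdir_demo   # hypothetical; point this at your --work_dir
mkdir -p "$WORKDIR" && touch "$WORKDIR/probe.txt"
recent=$(find "$WORKDIR" -type f -newermt '1 hour ago' | wc -l)
echo "files modified in the last hour: ${recent}"
```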

So I tried to adjust nthreads and omp_nthreads in test 2. I’ve also removed --sloppy.

Second test (5h of runtime):

singularity run --cleanenv \
    -B ${HOME}/EcriPark_Code:/code,${EcriPark}:/data,/scratch/lcorcos/EcriPark_QSIPREP/TEST:/out,${FREESURFER_HOME}/license.txt,/scratch/lcorcos/Temp_QSIPREP:/tmp \
    --nv ${HOME}/qsiprep-0.18.1.sif /data/ \
    /out/ participant --participant_label ${SUB} \
    --skip_bids_validation \
    --nthreads 32 \
    --omp_nthreads 16 \
    --fs_license_file ${FREESURFER_HOME}/license.txt \
    --work_dir /tmp/ \
    --output_resolution 1 \
    --eddy_config /code/eddy_params.json \
    --anatomical-template MNI152NLin2009cAsym \
    --verbose \
    --anat_modality T1w \
    --b0_threshold 50 \
    --dwi_denoise_window 5 \
    --denoise_method dwidenoise \
    --unringing_method mrdegibbs \
    --distortion_group_merge average \
    --b0_motion_corr_to iterative \
    --hmc_transform Affine \
    --hmc_model eddy \
    --skull_strip_template OASIS \
    --skull_strip_fixed_seed \
    --write_graph

→ Crash file for SynthSeg

log_test_1.txt (605.7 KB)

log_test_2.txt (82.1 KB)
crash_test_2.txt (3.3 KB)

I will continue to do more tests…

Thanks for your help!

That is exactly what I meant: you are indeed using eddy_cuda, since you use the option "use_cuda": true. All good there.

Yes, but looking at the logs I had the feeling that the resampling was not finishing, and I was wondering whether too much memory was being requested for this step, given the upsampling to 1 mm…

Good luck with your tests!

Last update: I ran further tests, modifying various optimization parameters for both QSIPREP and SLURM, without effect… I got crashes mostly for SynthSeg and eddy…

In the end, I reinstalled QSIPREP and FSL in my home directory on the cluster, using a GPU node. Since then, I’ve run 3 tests that have worked…

Final code I’m using:

#!/bin/bash

#SBATCH -J QSIPREP
#SBATCH -p volta
#SBATCH --gres=gpu:2
#SBATCH -A b356
#SBATCH -N 1
#SBATCH --mem=300gb
#SBATCH --cpus-per-task=16
#SBATCH -t 50:00:00
#SBATCH --output=/home/lcorcos/logs/QSIPREP/%j-stdout.txt
#SBATCH --error=/home/lcorcos/logs/QSIPREP/%j-stderr.txt
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT
#SBATCH --mail-user=ludovic.corcos@gmail.com

set -e

date

EcriPark="/scratch/lcorcos/EcriPark"
cd /home/lcorcos
source .bashrc

SUB="sub-JEU26"

singularity run --cleanenv \
    -B ${HOME}/EcriPark_Code:/code,${EcriPark}:/data,/scratch/lcorcos/EcriPark_QSIPREP/TESTV18:/out,/home/lcorcos/freesurfer/license.txt,/scratch/lcorcos/Temp_QSIPREP:/tmp \
    --nv ${HOME}/qsiprep-0.18.1.sif /data/ \
    /out/ participant --participant_label ${SUB} \
    --skip_bids_validation \
    --nthreads 24 \
    --omp_nthreads 12 \
    --fs_license_file /home/lcorcos/freesurfer/license.txt \
    --work_dir /tmp/ \
    --output_resolution 1.8 \
    --eddy_config /code/eddy_params.json \
    --anatomical-template MNI152NLin2009cAsym \
    --verbose \
    --anat_modality T1w \
    --b0_threshold 50 \
    --dwi_denoise_window 5 \
    --denoise_method dwidenoise \
    --unringing_method mrdegibbs \
    --distortion_group_merge average \
    --b0_motion_corr_to iterative \
    --hmc_transform Affine \
    --hmc_model eddy \
    --skull_strip_template OASIS \
    --skull_strip_fixed_seed \
    --write_graph

date
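
(A possible refinement of the script above, not from the original post: derive the thread counts from the SLURM allocation instead of hard-coding them, so that --nthreads never exceeds --cpus-per-task. SLURM_CPUS_PER_TASK is set by sbatch; the fallback of 16 is an arbitrary example:)

```shell
# Derive qsiprep thread counts from the SLURM allocation (fallback: 16 CPUs)
NTHREADS=${SLURM_CPUS_PER_TASK:-16}
# Cap per-process OpenMP threads at 8 so several workflow nodes can run in parallel
OMP_NTHREADS=$(( NTHREADS / 2 > 8 ? 8 : NTHREADS / 2 ))
echo "--nthreads ${NTHREADS} --omp_nthreads ${OMP_NTHREADS}"
```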

Glad that it appears to work! They recently released 0.19.0, which implements the SynthSeg update I mentioned earlier.
