Qsiprep might be stuck

Summary of what happened:

Hi experts,
I’m Haodong Wang, a graduate student from the Department of Psychology, Sun Yat-sen University. While running the QSIPrep Docker image, it seemed to get stuck after this step:

nipype.workflow INFO:
[Node] Finished “t1_mask_to_b0”

It stopped producing new log output for a long time after that. I wonder what went wrong. I would appreciate a reply!
Best wishes!

Wang

Command used (and if a helper script was used, a link to the helper script or the command generated):

PASTE CODE HERE

Version:

Environment (Docker, Singularity / Apptainer, custom installation):

Data formatted according to a validatable standard? Please provide the output of the validator:

PASTE VALIDATOR OUTPUT HERE

Relevant log outputs (up to 20 lines):

PASTE LOG OUTPUT HERE

Screenshots / relevant information:


Hi @OWL,

For future reference, please open software support issues under the Software Support category, which provides a post template that prompts you for important information. I have added the template for you; you can edit your post to fill in the information.

Beyond the information in the template, I am curious what kind of resources (memory and CPUs) you are giving to the job, and how long you waited.

Best,
Steven

Thanks for the reminder, @Steven!
The script:

#!/bin/bash
sudo docker run -ti --rm \
    -v $HOME/input:/data \
    -v $HOME/dockerout:/output \
    -v $HOME/working:/work_dir \
    -v ${FREESURFER_HOME}/license.txt:/opt/freesurfer/license.txt \
    pennbbl/qsiprep:latest \
    /data /output participant \
    -w /work_dir \
    --output-resolution 1.2 \
    --fs-license-file /opt/freesurfer/license.txt

The qsiprep docker version: latest
The validator: BIDS validator 1.8.4

Thanks @OWL, but what about memory/cpu resources?

Best,
Steven

Hi @Steven ,
I restarted the QSIPrep Docker container and processed only one subject (previously it was processing two), and it still gets stuck at a certain step. After the lines below, the program no longer displays anything new (I’m not sure exactly how long it has been stuck):

240311-17:47:47,662 nipype.workflow INFO:
[Node] Finished “dwi_rpt”, elapsed time 23.483261s.

Two processes related to QSIPrep were found, with the following information:

PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20 0 25.4g 24.2g 5400 R 800.0 12.9 3868:05 eddy_open+
20 0 1134796 317328 101260 S 7.4 0.2 38:20.89 qsiprep

Hi @OWL,

What memory/cpus are you giving to the task?

Best,
Steven

I’m not quite sure what you mean… The process with the command eddy_open+ has a %CPU of 800.0 and a %MEM of 12.9; the process with the command qsiprep has a %CPU of 7.4 and a %MEM of 0.2.

Those are percentages of CPU and memory. I am wondering what total amount of memory and how many CPUs you are providing Docker with, or what is available on your machine.
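
For example, on a Linux host you could check with something like this (these are standard commands; docker info reports what the Docker daemon itself can see):

# CPUs on the host
nproc

# Total and available memory
free -h

# What the Docker daemon reports (CPUs / Total Memory)
docker info | grep -iE 'cpus|total memory'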

Best,
Steven

Thanks @Steven, the number of CPUs available on the machine is 36 and the total amount of memory available is 187 GB.

Can you confirm in your Docker settings that you’re giving Docker enough memory to run the task? Also, based on your process information, it looks like eddy is the job taking a while, which is typical. How long have you tried waiting, and how big are your data (e.g., how many volumes)?

Thank you for your reply @Steven! I’m not sure how much memory Docker needs for this; can you recommend an amount? Also, the data is from one subject of the WU-Minn Human Connectome Project (converted to BIDS format), as shown below:

[screenshots of the BIDS dataset structure]

The size of the dwi file is 2.8 GB.
I’m not sure exactly how long the wait was, but it was at least a morning.

Ah, HCP is a lot of data. I would try to give Docker at least 64 GB if you can, and then adjust downward for other subjects if that’s too much.
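
For example, a sketch based on your script above (the 64g and 16 values are placeholders to adjust for your machine; note that on Docker Desktop the memory limit is raised in the GUI settings rather than on the command line):

sudo docker run -ti --rm \
    --memory=64g \
    --cpus=16 \
    -v $HOME/input:/data \
    -v $HOME/dockerout:/output \
    -v $HOME/working:/work_dir \
    -v ${FREESURFER_HOME}/license.txt:/opt/freesurfer/license.txt \
    pennbbl/qsiprep:latest \
    /data /output participant \
    -w /work_dir \
    --output-resolution 1.2 \
    --fs-license-file /opt/freesurfer/license.txt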

Ok, thanks @Steven! And I’m curious: is it reasonable for the eddy step to take a whole morning?

How long is a morning by your definition? HCP is multiple runs of dense images, so I would give it up to 8 hours, depending on how many CPUs are being used.
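
You can also control parallelism from the QSIPrep side; a sketch of the relevant flags, appended to the qsiprep arguments (double-check qsiprep --help in your image for the exact spellings, since they have changed across versions):

--nthreads 16      # total threads available to the workflow
--omp-nthreads 8   # maximum threads per single process (e.g., one eddy call)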

Thanks! This step took at least four hours to complete. After it finished, the terminal displayed the following error message:

240312-04:57:35,989 nipype.workflow INFO:
	 [Node] Finished "eddy", elapsed time 40265.75546s.
240312-04:57:35,989 nipype.workflow WARNING:
	 Storing result file without outputs
240312-04:57:36,8 nipype.workflow WARNING:
	 [Node] Error on "qsiprep_wf.single_subject_100307_wf.dwi_preproc_wf.hmc_sdc_wf.eddy" (/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/hmc_sdc_wf/eddy)
240312-04:57:36,664 nipype.workflow ERROR:
	 Node eddy failed to run on host 61d94d9a3711.

And the crash info is as follows:

Node: qsiprep_wf.single_subject_100307_wf.dwi_preproc_wf.hmc_sdc_wf.eddy
Working directory: /work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/hmc_sdc_wf/eddy

Node inputs:

args = 
cnr_maps = True
dont_peas = False
dont_sep_offs_move = False
environ = {'FSLOUTPUTTYPE': 'NIFTI_GZ', 'OMP_NUM_THREADS': '8'}
estimate_move_by_susceptibility = <undefined>
fep = False
field = <undefined>
field_mat = <undefined>
flm = linear
fudge_factor = 10.0
fwhm = <undefined>
in_acqp = <undefined>
in_bval = <undefined>
in_bvec = <undefined>
in_file = <undefined>
in_index = <undefined>
in_mask = <undefined>
in_topup_fieldcoef = <undefined>
in_topup_movpar = <undefined>
initrand = <undefined>
interp = spline
is_shelled = True
json = <undefined>
mbs_ksp = <undefined>
mbs_lambda = <undefined>
mbs_niter = <undefined>
method = jac
mporder = <undefined>
multiband_factor = <undefined>
multiband_offset = <undefined>
niter = 5
num_threads = 8
nvoxhp = 1000
out_base = eddy_corrected
outlier_nstd = <undefined>
outlier_nvox = <undefined>
outlier_pos = <undefined>
outlier_sqr = <undefined>
outlier_type = <undefined>
output_type = NIFTI_GZ
repol = True
residuals = False
session = <undefined>
slice2vol_interp = <undefined>
slice2vol_lambda = <undefined>
slice2vol_niter = <undefined>
slice_order = <undefined>
slm = linear
use_cuda = False

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node eddy.

Cmdline:
	eddy_openmp  --cnr_maps --field=/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/hmc_sdc_wf/topup/fieldmap_HZ --field_mat=/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/hmc_sdc_wf/topup_to_eddy_reg/topup_reg_image_flirt.mat --flm=linear --ff=10.0 --acqp=/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/hmc_sdc_wf/gather_inputs/eddy_acqp.txt --bvals=/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/pre_hmc_wf/rpe_concat/merge__merged.bval --bvecs=/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/pre_hmc_wf/rpe_concat/merge__merged.bvec --imain=/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/pre_hmc_wf/rpe_concat/merge__merged.nii.gz --index=/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/hmc_sdc_wf/gather_inputs/eddy_index.txt --mask=/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/hmc_sdc_wf/pre_eddy_b0_ref_wf/synthstrip_wf/mask_to_original_grid/topup_imain_corrected_avg_trans_mask_trans.nii.gz --interp=spline --data_is_shelled --resamp=jac --niter=5 --nvoxhp=1000 --out=/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/hmc_sdc_wf/eddy/eddy_corrected --repol --slm=linear
Stdout:

Stderr:
	Killed
Traceback:
	Traceback (most recent call last):
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 453, in aggregate_outputs
	    setattr(outputs, key, val)
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 330, in validate
	    value = super(File, self).validate(objekt, name, value, return_pathlike=True)
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 135, in validate
	    self.error(objekt, name, str(value))
	  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/base_trait_handler.py", line 74, in error
	    raise TraitError(
	traits.trait_errors.TraitError: The 'out_corrected' trait of an ExtendedEddyOutputSpec instance must be a pathlike object or string representing an existing file, but a value of '/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/hmc_sdc_wf/eddy/eddy_corrected.nii.gz' <class 'str'> was specified.

	During handling of the above exception, another exception occurred:

	Traceback (most recent call last):
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 400, in run
	    outputs = self.aggregate_outputs(runtime)
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 460, in aggregate_outputs
	    raise FileNotFoundError(msg)
	FileNotFoundError: No such file or directory '/work_dir/qsiprep_wf/single_subject_100307_wf/dwi_preproc_wf/hmc_sdc_wf/eddy/eddy_corrected.nii.gz' for output 'out_corrected' of a ExtendedEddy interface

May I ask what I should do to solve this problem?

Hi @OWL,

The error message Killed means that the job ran out of memory. Have you checked your Docker settings to increase the amount of memory Docker is allowed to use? If that doesn’t work, you can use a --bids-filter-file to run subsets of the data at a time, although that will get very tedious.
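
If you go the filter-file route, here is a minimal sketch, assuming QSIPrep uses the same pybids-filter JSON convention as fMRIPrep (the key name and the run value here are assumptions; adjust them to the subset you want):

# Write a filter file into the dataset directory you already mount at /data
cat > $HOME/input/filter.json <<'EOF'
{
  "dwi": {"datatype": "dwi", "suffix": "dwi", "run": "1"}
}
EOF

Then add --bids-filter-file /data/filter.json to the qsiprep arguments in your docker run command.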

Best,
Steven

Thanks! I’ll try it.