fMRIPrep hanging on BrokenProcessPool

Summary of what happened:

As part of my research internship, I am reanalyzing an old fMRI dataset with fMRIPrep on a machine with an M2 Max chip and 32 GB of RAM. The pipeline ran successfully at first, but adding Susceptibility Distortion Correction (SDC) led to a persistent BrokenProcessPool error. fMRIPrep does not crash after the error; it hangs, and CPU usage drops to almost zero. I see the same problem with versions 24.0.0 and 23.2.3.

Initially, I ran the fMRIPrep pipeline successfully without SDC. However, after specifying the B0FieldIdentifiers and IntendedFor fields for my field maps (I removed B0FieldIdentifiers for this run, but the problem remains when they are included) and adding TotalReadoutTime and EffectiveEchoSpacing to the metadata JSON files, I encountered the BrokenProcessPool error. Since I am new to fMRIPrep and to fMRI analysis in general, these parameters may be set incorrectly, so I am open to suggestions. I also added a nipype.yaml file:

plugin: LegacyMultiProc
plugin_args: 
  maxtasksperchild: 1
  memory_gb: 32
  n_procs: 12
  raise_insufficient: false
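
The plugin settings above declare 32 GB, while the Docker command further down caps the container at 30 GB with `-m 30g`. The following is a minimal sketch for comparing the two from inside the container. It assumes the memory cap is exposed through the usual cgroup files (v2 `memory.max` or v1 `memory.limit_in_bytes`), which may not hold on every Docker setup. The function names are made up for illustration and are not part of fMRIPrep or nipype.

```python
from pathlib import Path

# Candidate cgroup files holding the container's memory cap (v2 first, then v1).
# Assumption: one of these exists inside the container.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def container_memory_limit_gb(candidates=CGROUP_LIMIT_FILES):
    """Return the container memory cap in GiB, or None if no cap is found."""
    for path in candidates:
        try:
            raw = Path(path).read_text().strip()
        except OSError:
            continue  # file missing or unreadable: try the next candidate
        if raw == "max":  # cgroup v2 spelling of "no limit"
            return None
        return int(raw) / 1024**3
    return None


def plugin_memory_fits(memory_gb, limit_gb):
    """True when the plugin's memory_gb does not exceed the container cap."""
    return limit_gb is None or memory_gb <= limit_gb


if __name__ == "__main__":
    limit = container_memory_limit_gb()
    print("container cap (GiB):", limit)
    print("memory_gb=32 fits:", plugin_memory_fits(32, limit))
```

If the reported cap is below `memory_gb`, the scheduler may plan for more memory than the container can actually provide.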

The code with which I calculated TRT and effective echo spacing is provided below.

What seems to be the problem? Could it be in any way related to the TRT or IntendedFor parameters, or is it something else I am missing?

Command used (and if a helper script was used, a link to the helper script or the command generated):

docker run --rm -ti \
    --platform linux/amd64 \
    --cpus 12 \
    -m 30g \
    -v /Users/urbansirca/PycharmProjects/fMRI_project/fmri_dataset_single_subject:/data:ro \
    -v /Users/urbansirca/PycharmProjects/fMRI_project/output:/out \
    -v /Users/urbansirca/PycharmProjects/fMRI_project/work:/work \
    -v /Users/urbansirca/PycharmProjects/fMRI_project/freesurfer_license.txt:/opt/freesurfer/license.txt \
    nipreps/fmriprep /data /out participant \
    --participant-label 01 \
    --work-dir /work \
    --fs-license-file /opt/freesurfer/license.txt \
    --skull-strip-t1w skip \
    --output-spaces MNI152NLin2009cAsym:res-2

Version:

24.0.1

Environment (Docker, Singularity / Apptainer, custom installation):

Docker 4.31.0

Data formatted according to a validatable standard? Please provide the output of the validator:

bids-validator@1.14.6
(node:9) Warning: Closing directory handle on garbage collection
(Use `node --trace-warnings ...` to show where the warning was created)
This dataset appears to be BIDS compatible.
        Summary:                  Available Tasks:        Available Modalities: 
        702 Files, 12.04GB        localizer               MRI                   
        14 - Subjects             yesno                                         
        3 - Sessions                  

Relevant log outputs (up to 20 lines):

concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
exception calling callback for <Future at 0x7fffe190c890 state=finished raised BrokenProcessPool>
Traceback (most recent call last):
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
             ^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
             ^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
             ^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
             ^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
             ^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
             ^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
             ^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
             ^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Screenshots / relevant information:

import json


def calculate_total_readout_time(file):
    # Derive the echo spacing and total readout time (in seconds) from a JSON sidecar.
    with open(file) as f:
        data = json.load(f)

    # print(data.keys())
    WaterFatShift = data["WaterFatShift"]
    EPI_Factor = data["EpiFactor"] # effective train length
    sense_factor = data["SenseFactor"]
    EchoTime = data["EchoTime"] # 27.63 ms (0.02763)
    effective_number_of_echoes = EPI_Factor / sense_factor # 11.67
    # round to the nearest whole number
    effective_number_of_echoes_rounded = round(effective_number_of_echoes) # 12

    ETL = EPI_Factor + 1 # echo train length

    echo_spacing = WaterFatShift / (434.215*(ETL/sense_factor))
    echo_spacing = round(echo_spacing, 8)
    print(f"Echo spacing: {echo_spacing}")

    total_readout_time = echo_spacing * EPI_Factor # 0.08216355
    # EffectiveEchoSpacing = 0.005
    # ReconMatrixPE = 127 # y-dimension of the image
    # total_readout_time = EffectiveEchoSpacing * (ReconMatrixPE - 1)

    total_readout_time = round(total_readout_time, 8)
    print(f"Total readout time: {total_readout_time}")

    return echo_spacing, total_readout_time

task_yesno_bold.json

{
    "EchoTime": 0.02763,
    "EpiFactor": 35,
    "SenseFactor": 3,
    "PhaseEncodingDirection": "j",
    "SliceEncodingDirection": "k",
    "RepetitionTime": 2.0,
    "MultiBandFactor": 1,
    "NumberDummyScans": 8,
    "PhysiologySampleRate": 496,
    "WaterFatShift": 12.232,
    "SliceTiming": [
        0.0,
        0.05714286,
        0.11428571,
        0.17142857,
        0.22857143,
        0.28571429,
        0.34285714,
        0.4,
        0.45714286,
        0.51428571,
        0.57142857,
        0.62857143,
        0.68571429,
        0.74285714,
        0.8,
        0.85714286,
        0.91428571,
        0.97142857,
        1.02857143,
        1.08571429,
        1.14285714,
        1.2,
        1.25714286,
        1.31428571,
        1.37142857,
        1.42857143,
        1.48571429,
        1.54285714,
        1.6,
        1.65714286,
        1.71428571,
        1.77142857,
        1.82857143,
        1.88571429,
        1.94285714
    ],
    "SliceOrder": "ascending",
    "TaskName": "yesno",
    "EffectiveEchoSpacing": 0.00234753,
    "TotalReadoutTime": 0.08216355
}
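
As a sanity check, the two derived values in the JSON above can be reproduced from `WaterFatShift`, `EpiFactor`, and `SenseFactor` with the same arithmetic as the function earlier in this post. This is only a sketch: the helper name `derive_timing` and the constant name are made up for illustration, and the check confirms the arithmetic, not that the formula is appropriate for this scanner.

```python
# Value copied from the function in this post; the name is hypothetical.
FAT_WATER_FACTOR = 434.215


def derive_timing(water_fat_shift, epi_factor, sense_factor):
    """Return (echo_spacing, total_readout_time), both rounded to 8 decimals."""
    etl = epi_factor + 1  # echo train length, as defined in the post
    echo_spacing = round(
        water_fat_shift / (FAT_WATER_FACTOR * (etl / sense_factor)), 8
    )
    total_readout_time = round(echo_spacing * epi_factor, 8)
    return echo_spacing, total_readout_time


if __name__ == "__main__":
    # Inputs taken from task_yesno_bold.json above.
    spacing, readout = derive_timing(12.232, 35, 3)
    print(f"EffectiveEchoSpacing: {spacing}")
    print(f"TotalReadoutTime: {readout}")
```

With the sidecar's inputs this gives the same EffectiveEchoSpacing and TotalReadoutTime that are stored in the JSON, so the file is at least consistent with the function.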

sub-01_ses-01_phasediff.json

{
    "EchoTime1": 0.003,
    "EchoTime2": 0.008,
    "IntendedFor": [
        "ses-01/func/sub-01_ses-01_task-yesno_run-2_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-yesno_run-1_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-localizer_run-1_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-yesno_run-4_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-yesno_run-3_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-yesno_run-6_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-yesno_run-5_bold.nii.gz"
    ]
}

sub-01_ses-01_magnitude1.json

{
    "IntendedFor": [
        "ses-01/func/sub-01_ses-01_task-yesno_run-2_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-yesno_run-1_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-localizer_run-1_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-yesno_run-4_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-yesno_run-3_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-yesno_run-6_bold.nii.gz",
        "ses-01/func/sub-01_ses-01_task-yesno_run-5_bold.nii.gz"
    ]
}
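
Because IntendedFor entries in this relative form are resolved against the subject directory, one quick check is whether each entry points to a file that actually exists. The sketch below does that. The function name and the host paths in the usage section are hypothetical, and it does not handle BIDS URIs (entries starting with `bids:`).

```python
import json
from pathlib import Path


def missing_intended_for(fmap_json, subject_dir):
    """List IntendedFor entries that do not resolve to a file under subject_dir."""
    entries = json.loads(Path(fmap_json).read_text()).get("IntendedFor", [])
    if isinstance(entries, str):  # a single string is allowed as well as a list
        entries = [entries]
    return [e for e in entries if not (Path(subject_dir) / e).is_file()]


if __name__ == "__main__":
    # Hypothetical host paths; adjust to the real dataset location.
    subject_dir = Path("bids_dataset/sub-01")
    sidecar = subject_dir / "ses-01/fmap/sub-01_ses-01_phasediff.json"
    if sidecar.is_file():
        print("Unresolved entries:", missing_intended_for(sidecar, subject_dir))
    else:
        print("Sidecar not found:", sidecar)
```

An empty list means every listed run was found. Any entry that is returned would be worth checking for a typo in the session, task, or run label.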

BrokenProcessPool means that a subprocess was killed by the OS in a way that Python is unable to recover from, typically because a memory allocation request could not be fulfilled. We are limited by our tools here, and have spent a great deal of time trying to reduce how often this error occurs, but I do not think it can be eliminated in principle.
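
To illustrate the mechanism, here is a minimal sketch using only the Python standard library. It is not fMRIPrep or nipype code. The worker is terminated abruptly with `os._exit`, which stands in for the OS killing the process, and the function name is made up for the example.

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def pool_breaks_when_worker_dies():
    """Return True if an abruptly terminated worker surfaces as BrokenProcessPool."""
    # Prefer the "fork" start method where it exists so the sketch also works
    # when it is imported rather than run as a script.
    ctx = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else None
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        # os._exit ends the worker immediately, without cleanup or a result,
        # which is similar in effect to the process being killed.
        future = pool.submit(os._exit, 1)
        try:
            future.result()
        except BrokenProcessPool:
            return True
    return False


if __name__ == "__main__":
    print("BrokenProcessPool raised:", pool_breaks_when_worker_dies())
```

The parent process only learns that a worker disappeared, not why, which is why the traceback itself does not name the failing node.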

You may want to try --nthreads 1 --omp-nthreads 8. It will continue to multithread jobs that are inherently multithreaded, but will disable nipype’s internal multiprocess scheduling. If that fails, then there is a fundamental problem in some node, and I would guess it’s related to the M2 architecture. At the very least, you should see exactly what node is failing. Try adding -vv to increase the logging verbosity as well.

Thank you for the quick response. I tried to use --nthreads 1 --omp-nthreads 8.

Here is the new command:

docker run -it --rm \
  --platform linux/amd64 \
  -v /Users/urbansirca/PycharmProjects/fMRI_project/freesurfer_license.txt:/opt/freesurfer/license.txt:ro \
  -v /Users/urbansirca/PycharmProjects/fMRI_project/bids_dataset:/data:ro \
  -v /Users/urbansirca/PycharmProjects/fMRI_project/output_sub1_v5/output:/out \
  -v /Users/urbansirca/PycharmProjects/fMRI_project/output_sub1_v5/work:/work \
  -v /Users/urbansirca/PycharmProjects/fMRI_project/nipype.yaml:/nipype.yaml:ro \
  nipreps/fmriprep:24.0.1 /data /out participant \
  --participant-label 01 \
  --fs-license-file /opt/freesurfer/license.txt \
  --skull-strip-t1w skip \
  --work-dir /work \
  --verbose \
  --nthreads 1 \
  --omp-nthreads 8 \
  --mem_mb 32000 \
  -vv

However, I now get a new error. Could it be a problem with FreeSurfer? What debugging steps should I try next? Thank you.

nipype.workflow CRITICAL:
	 fMRIPrep failed: Traceback (most recent call last):
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
    return self._run_command(execute)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node _autorecon_surfs1.

Cmdline:
	recon-all -autorecon-hemi rh -noparcstats -noparcstats2 -noparcstats3 -nohyporelabel -nobalabels -rh-only -openmp 8 -subjid sub-01 -sd /out/sourcedata/freesurfer 
Stdout:
	fs-check-version --s sub-01 --o /tmp/tmp.bX9JSN
	Thu Jul 18 20:01:09 UTC 2024

	setenv SUBJECTS_DIR /out/sourcedata/freesurfer
	cd /work/fmriprep_24_0_wf/sub_01_wf/anat_fit_wf/surface_recon_wf/autorecon_resume_wf/autorecon_surfs/mapflow/_autorecon_surfs1
	/opt/freesurfer/bin/fs-check-version --s sub-01 --o /tmp/tmp.bX9JSN
	-rwxrwxr-x 1 root root 18565 Aug  4  2022 /opt/freesurfer/bin/fs-check-version

	freesurfer-linux-ubuntu22_x86_64-7.3.2-20220804-6354275
	$Id$
	Linux 5916bd41db18 6.6.31-linuxkit #1 SMP Thu May 23 08:36:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
	pid 4910
	Current FS Version freesurfer-linux-ubuntu22_x86_64-7.3.2-20220804-6354275
	bstampfile exists /out/sourcedata/freesurfer/sub-01/scripts/build-stamp.txt
	Subject FS Version: freesurfer-linux-ubuntu22_x86_64-7.3.2-20220804-6354275
	No constraints on version because REQ=UnSet and FsVerFile=NotThere
	#@#% fs-check-version match = 1
	fs-check-version Done
	INFO: SUBJECTS_DIR is /out/sourcedata/freesurfer
	Actual FREESURFER_HOME /opt/freesurfer
	Linux 5916bd41db18 6.6.31-linuxkit #1 SMP Thu May 23 08:36:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
	/out/sourcedata/freesurfer/sub-01/mri/transforms /out/sourcedata/freesurfer/sub-01 
	/out/sourcedata/freesurfer/sub-01 
	#--------------------------------------------
	#@# Tessellate rh Thu Jul 18 20:01:13 UTC 2024
	/out/sourcedata/freesurfer/sub-01/scripts

	 mri_pretess ../mri/filled.mgz 127 ../mri/norm.mgz ../mri/filled-pretess127.mgz