Diagnosing Nipype crashes that only occur in Docker containers

Summary of what happened:

I’m able to run a pipeline from start to finish locally using nipype==1.8.6 and niworkflows==1.7.8. However, as soon as I run the same code inside a Docker container I start hitting FileNotFoundError or FileExistsError exceptions. Running the container as root or as a regular user doesn’t seem to make much difference, so what I’m trying to determine is how the Docker environment differs from my “local” machine.

For my next steps I’m going to track down which files are touched or created by the pipeline and then compare that across a virtual machine/container and a native Linux host. It might be pessimistic, but it always seems to be the case that containerizing this sort of software (not just nipype or nipreps specifically) leads to this sort of mount/copy whack-a-mole when it’s done after the fact.
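A rough sketch of that comparison, assuming it is enough to snapshot a directory tree before and after a run and diff the two. The helper names `snapshot` and `diff_snapshots` are made up for illustration and are not part of nipype or niworkflows:

```python
import os
from pathlib import Path


def snapshot(root):
    """Map each path under root (relative to root) to (uid, gid, permission bits)."""
    root = Path(root)
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            st = path.lstat()  # lstat so symlinks are recorded, not followed
            entries[str(path.relative_to(root))] = (
                st.st_uid,
                st.st_gid,
                oct(st.st_mode & 0o777),
            )
    return entries


def diff_snapshots(before, after):
    """Return (created, removed) relative paths between two snapshots."""
    created = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    return created, removed
```

Taking one snapshot of the work directory (and of the cache directory) before the pipeline starts and one after it finishes, on both the host and in the container, gives two lists that can be compared directly. The recorded uid/gid and permission bits help spot ownership differences caused by `--user`.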

Anyway, let me know if you would rather this be moved to one of the linked issues below in the relevant information section.

Thanks.

Command used (and if a helper script was used, a link to the helper script or the command generated):

Below is the command that correctly runs locally:

python3 run.py --bids_dir $BIDS_DIR --output_dir $OUTPUT_DIR --wm --n_procs 6 --petprep_hmc

And here is the dockerized version of that command, which leads into the entrypoint python3 run.py:

docker run --user=331122:1104 -a stderr -a stdout --rm -v /home/galassiae/Data/sharing/run_docker_on_this/sharing/test2:/bids_dir -v /home/galassiae/Data/sharing/run_docker_on_this/sharing/test2/derivatives/petprep_extract_tacs:/output_dir -v /home/galassiae/Projects/petprep_extract_tacs:/workdir -v /home/galassiae/Projects/petprep_extract_tacs:/petprep_extract_tacs -v /home/galassiae/freesurfer/license.txt:/opt/freesurfer/license.txt --platform linux/amd64 petprep_extract_tacs:latest --bids_dir /bids_dir --output_dir /output_dir --analysis_level participant --n_procs 1 --wm  --petprep_hmc  system_platform=Linux

Version:

nipype==1.8.6
niworkflows==1.7.8

Environment (Docker, Singularity / Apptainer, custom installation):

Docker version 24.0.5, build ced0996
linux/x86_64 python:3.9

Locally:

python3.9, RHEL 8.7 x86

Relevant log outputs (up to 20 lines):

Traceback:
	Traceback (most recent call last):
	  File "/usr/local/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 397, in run
	    runtime = self._run_interface(runtime)
	  File "/usr/local/lib/python3.9/site-packages/nipype/interfaces/utility/wrappers.py", line 142, in _run_interface
	    out = function_handle(**args)
	  File "<string>", line 4, in create_weighted_average_pet
	  File "/usr/local/lib/python3.9/site-packages/niworkflows/interfaces/bids.py", line 52, in <module>
	    import templateflow as tf
	  File "/usr/local/lib/python3.9/site-packages/templateflow/__init__.py", line 18, in <module>
	    from . import api
	  File "/usr/local/lib/python3.9/site-packages/templateflow/api.py", line 8, in <module>
	    from .conf import TF_LAYOUT, TF_S3_ROOT, TF_USE_DATALAD, requires_layout
	  File "/usr/local/lib/python3.9/site-packages/templateflow/conf/__init__.py", line 53, in <module>
	    _init_cache()
	  File "/usr/local/lib/python3.9/site-packages/templateflow/conf/__init__.py", line 50, in _init_cache
	    _update_s3(TF_HOME, local=True, overwrite=True)
	  File "/usr/local/lib/python3.9/site-packages/templateflow/conf/_s3.py", line 19, in update
	    retval = _update_skeleton(skel_file, dest, overwrite=overwrite, silent=silent)
	  File "/usr/local/lib/python3.9/site-packages/templateflow/conf/_s3.py", line 54, in _update_skeleton
	    zipref.extractall(str(dest))
	  File "/usr/local/lib/python3.9/zipfile.py", line 1642, in extractall
	    self._extract_member(zipinfo, path, pwd)
	  File "/usr/local/lib/python3.9/zipfile.py", line 1692, in _extract_member
	    os.mkdir(targetpath)
	FileExistsError: [Errno 17] File exists: '/.cache/templateflow/tpl-OASIS30ANTs'

Screenshots / relevant information:

I found this bug mentioned here → Unable to use custom template fMRIPrep 22.1.1 - #15 by effigies

This seems to indicate it might be fixable by mounting a local version of fmriprep, but that’s not great if we want this to run as a standalone image. I’m also curious whether there’s a single fix that might apply both to this and to the similar issue referenced on nipreps/fmriprep here → FileExistsError · Issue #2505 · nipreps/fmriprep · GitHub


Hi @bendhouseart,

You did not mount /.cache/templateflow so Docker cannot find it.

You can try something like what you did, but adding -e TEMPLATEFLOW_HOME=/templateflow and -v /.cache/templateflow:/templateflow to your Docker preamble. Please let me know if that solution is not clear to you.

Best,
Steven

Hi Steven,

Are you sure about that? It’s a FileExistsError, not a FileNotFoundError. Also, I would rather not have to mount a TemplateFlow folder, but instead include any necessary files in the image. Do you know what command I would use to collect all the templates normally stored in .cache/templateflow/?

Thanks,

Anthony

Hi @bendhouseart,

You can use something like datalad get -r /.cache/templateflow/* to download all the template files, or replace that wildcard with whatever templates you need.

Best,
Steven

Alright…it’s not an issue of the templates not being there. This is indeed the same issue that was referenced in issue 2505 above.

When running nipype with n_procs=1 this doesn’t occur; for n_procs > 1 I can reproduce the same error message reliably.

So I guess that’s a start.

Okay, so import templateflow is not multiprocessing-safe, since it can lead to processes racing to populate the template skeleton.
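To make the failure mode concrete, here is a small, deterministic sketch of what happens when a second worker reaches a non-tolerant directory creation after the first one has already done it. It does not reproduce templateflow's actual code: `naive_populate`, `tolerant_populate`, and `simulate_second_worker` are hypothetical names, and the "second worker" is simulated by simply calling the function twice in one process.

```python
import os
import tempfile


def naive_populate(dest):
    """Create dest unconditionally, as zipfile's extraction ends up doing via os.mkdir."""
    os.mkdir(dest)  # raises FileExistsError if another process got there first


def tolerant_populate(dest):
    """Create dest, treating 'already exists' as success."""
    os.makedirs(dest, exist_ok=True)


def simulate_second_worker(populate):
    """Call populate twice on the same target, as two racing workers would."""
    with tempfile.TemporaryDirectory() as cache:
        target = os.path.join(cache, "tpl-OASIS30ANTs")
        populate(target)  # first worker wins the race
        try:
            populate(target)  # second worker arrives a moment later
        except FileExistsError:
            return "FileExistsError"
        return "ok"
```

With `naive_populate` the second call fails with the same error class seen in the traceback, while `tolerant_populate` succeeds. This also fits the observation that the error only shows up for n_procs > 1.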

I think the short-term fix is to add an import templateflow during workflow build time to ensure that it’s done before you start multiprocessing. The long-term fix is probably rethinking how templateflow starts up.
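A minimal sketch of that short-term workaround, assuming the pipeline builds a nipype workflow and runs it with the MultiProc plugin. The helper names `warm_up` and `build_and_run` and the `preload` parameter are hypothetical. The idea is only that the import (and therefore any import-time cache setup) happens once in the parent process before any workers start:

```python
import importlib


def warm_up(module_name="templateflow"):
    """Import module_name once in the current process.

    Returns True if the import succeeded, False if the module is not installed.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def build_and_run(workflow, n_procs, preload="templateflow"):
    """Trigger the import-time setup serially, then run the workflow in parallel."""
    warm_up(preload)  # done before any worker process exists
    return workflow.run(plugin="MultiProc", plugin_args={"n_procs": n_procs})
```

Whether this fully avoids the race depends on templateflow skipping its skeleton extraction once the cache is populated, which is an assumption here rather than something verified in this thread.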

cc @oesteban

There might be a fix already for this @effigies, see FIX: Avoid directory clobber during zip extraction by mgxd · Pull Request #131 · templateflow/python-client · GitHub

I’m trying to test if it works on my end, but running into some pybids version conflicts. Presently failing to track down why the ~0.9.1 version of pybids keeps popping up despite me forking over to my own templateflow and niworkflows repos and specifying a much more recent version. But I digress.

Although I’ll take whatever someone is able to push through at either niworkflows or templateflow.

I tested the PR to templateflow created by mgxd (Pull Request #131 on templateflow/python-client, linked above) and confirmed that it fixes this bug. Future users should be sure to specify a version of templateflow greater than or equal to one containing that PR.

Note: as of this writing I don’t believe this change has been rolled into a release. This user is currently using the GitHub URL in his setup.py/requirements.