Issue with QSIprep templateflow on HPC host with no internet access

Summary of what happened

When attempting to run QSIPrep on an HPC setup without internet access, the process failed during the get_template_image node. The error occurred while trying to fetch a required template file from templateflow.s3.amazonaws.com. The absence of pre-cached templates or clear instructions on managing required files locally seems to be the root cause.

Command used

apptainer run --cleanenv --containall \
    -B ${DERIVATIVES_DIR_TMP}:/derivatives:ro,${QSIPREP_DIR_TMP}:/out,${WORK_DIR}:${WORK_DIR},${FS_LICENSE}:/opt/freesurfer/license.txt \
    -B $BIDS_FILTER_FILE:$BIDS_FILTER_FILE \
    /path/to/qsiprep_<VERSION>.sif \
    /derivatives/nii /out participant \
    --fs-license-file /opt/freesurfer/license.txt \
    --output-resolution 1.7 \
    --work-dir ${WORK_DIR} \
    --participant-label ${subjectID} \
    --separate-all-dwis \
    --skip-anat-based-spatial-normalization \
    --nthreads ${ncpus} \
    --omp-nthreads $((ncpus - 2)) \
    --bids-filter-file ${BIDS_FILTER_FILE} \
    --stop-on-first-crash \
    -v -v

Version

QSIPrep version 1.0.0rc2.dev0+g789be41.d20241119

Environment (Docker, Singularity / Apptainer, custom installation)

Apptainer container running on an HPC system with no internet access.

Data formatted according to a validatable standard? Please provide the output of the validator

bids-validator@1.8.4

1: [WARN] The recommended file /README is missing. (code: 101)
2: [WARN] The Authors field of dataset_description.json should contain an array of fields. (code: 102)

Relevant log outputs

241220-13:55:48,228 nipype.workflow ERROR:
    Node get_template_image failed to run on host XXX

Traceback:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='templateflow.s3.amazonaws.com', port=443): Max retries exceeded with url: /tpl-MNI152NLin2009cAsym/tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at ...>: Failed to resolve 'templateflow.s3.amazonaws.com' ([Errno -3] Temporary failure in name resolution)"))

Screenshots / relevant information:

Problem:
- The HPC environment lacks internet access, causing QSIPrep to fail when downloading necessary template files.
- No clear documentation on pre-fetching templates or caching them locally.

Suggestions:
1. Enhance documentation to include:
- A list of all required templates/files for specific QSIPrep workflows.
- Instructions on pre-fetching and configuring the TEMPLATEFLOW_HOME directory for offline use.
2. Allow QSIPrep to gracefully handle missing templates by providing actionable error messages.
3. Option to initialize QSIPrep locally and pre-download necessary files to cache them for offline execution.

Example path for desired caching:

$HOME/.cache/templateflow/

This would enable running jobs on an internet-restricted HPC without interruptions.
What do people think?

Happy holiday season!

Hi @pierre-nedelec,

fMRIPrep already has some good documentation on this (FAQ - Frequently Asked Questions, fmriprep version documentation), and there have been a few threads here with those steps too. If you are still struggling after taking those steps, let us know.

I suppose we could add links to that fmriprep documentation page in the qsiprep documentation, but any PRs updating docs are always appreciated!

Best,
Steven

Hi, thanks Steven for pointing to these fmriprep docs. Happy to help update QSIprep docs with a PR once I figure this out!

A couple of things I've understood, and the current point of failure:

  • running the apptainer command with --cleanenv --containall (or either of the two) removes $TEMPLATEFLOW_HOME from the container's environment variables.
  • I managed to follow the steps in that link to download what I need to a $TEMPLATEFLOW_HOME directory. I passed this to apptainer with:
apptainer exec --cleanenv --containall \
  -B ${TEMPLATEFLOW_HOME}:${TEMPLATEFLOW_HOME} \
  --env "TEMPLATEFLOW_HOME=$TEMPLATEFLOW_HOME" \
  path/to/qsiprep_${VERSION}.sif

I checked that the container knows about that env variable, and I could list the files within it. However, when running the actual qsiprep command, it still fails. I notice a few issues:

241223-11:51:37,746 nipype.workflow VERBOSE:
	 QSIPrep config:
		[execution]
		templateflow_home = "my_home_dir (-> $HOME)"
  1. It's different from $TEMPLATEFLOW_HOME, and it doesn't tell me exactly where it's looking for the data (~/.cache/templateflow or just ~/.templateflow; I'm guessing the latter, but it's unclear from the logs). How is qsiprep filling in that variable if not from the environment variables?
  2. Even if I change my config, download the templateflow data to ~/.cache/templateflow, and bind that folder with -B $HOME/.cache/templateflow:$HOME/.cache/templateflow, qsiprep/nipype still seems to try to download things. And I can't see in the logs whether it's even trying to access the cached data! See the full log below.
QSIprep log for templateflow
apptainer run --cleanenv --containall \
  -B ${TEMPLATEFLOW_HOME}:${TEMPLATEFLOW_HOME} \
  -B $HOME/.templateflow:$HOME/.templateflow \
  -B $HOME/.cache/templateflow:$HOME/.cache/templateflow \
  --env "TEMPLATEFLOW_HOME=$TEMPLATEFLOW_HOME" \
  path/to/qsiprep_${VERSION}.sif \

(trying to bind everything I can here...)

Logs:

241223-12:13:24,809 nipype.workflow INFO:
	 QSIPrep workflow graph with 250 nodes built successfully.
241223-12:13:43,243 nipype.workflow VERBOSE:
	 QSIPrep config:
		[environment]
		cpu_count = 56
		exec_env = "apptainer"
		free_mem = 225.0
		overcommit_policy = "heuristic"
		overcommit_limit = "50%"
		nipype_version = "1.9.1"
		templateflow_version = "23.1.0"
		version = "1.0.0rc2.dev0+g789be41.d20241119"
		
		[execution]
		bids_dir = "/derivatives/nii"
		bids_database_dir = "/scratch/1318934.1.member.q/qsiprep_sub-02_ses-01/20241223-121258_74ad198e-cf28-4219-852b-55a08adc8362/bids_db"
		bids_description_hash = "420d026fd3cc6788dd08e5ad18db0019ceda974786fb657b87a07097520b778d"
		boilerplate_only = false
		sloppy = false
		debug = []
		layout = "BIDS Layout: .../derivatives/nii | Subjects: 1 | Sessions: 1 | Runs: 0"
		log_dir = "/out/logs"
		log_level = 15
		low_mem = false
		notrack = true
		output_dir = "/out"
		reports_only = false
		run_uuid = "20241223-121258_74ad198e-cf28-4219-852b-55a08adc8362"
		participant_label = [ "02",]
		processing_list = [ "02:01",]
		skip_anat_based_spatial_normalization = true
		templateflow_home = "$HOME"
		work_dir = "/scratch/1318934.1.member.q/qsiprep_sub-02_ses-01"
		write_graph = false
		
		[workflow]
		anat_modality = "T1w"
		anat_only = false
		anatomical_template = "MNI152NLin2009cAsym"
		b0_threshold = 100
		b0_motion_corr_to = "iterative"
		b0_to_t1w_transform = "Rigid"
		b1_biascorrect_stage = "final"
		denoise_after_combining = false
		denoise_method = "dwidenoise"
		distortion_group_merge = "none"
		dwi_denoise_window = "auto"
		dwi_no_biascorr = false
		dwi_only = false
		fmap_bspline = false
		force_syn = false
		hmc_model = "eddy"
		hmc_transform = "Affine"
		ignore = []
		infant = false
		intramodal_template_iters = 0
		intramodal_template_transform = "BSplineSyN"
		subject_anatomical_reference = "first-alphabetically"
		longitudinal = false
		no_b0_harmonization = false
		output_resolution = 1.7
		pepolar_method = "TOPUP"
		separate_all_dwis = true
		shoreline_iters = 2
		use_syn_sdc = false
		spaces = "MNI152NLin2009cAsym"
		
		[nipype]
		crashfile_format = "txt"
		get_linked_libs = false
		nprocs = 8
		omp_nthreads = 6
		plugin = "MultiProc"
		remove_unnecessary_outputs = true
		resource_monitor = false
		stop_on_first_crash = true
		
		[seeds]
		master = 2797
		ants = 2218
		numpy = 42410
		
		[execution.derivatives]
		
		[execution.dataset_links]
		raw = "/derivatives/nii"
		templateflow = "$HOME"
		
		[nipype.plugin_args]
		maxtasksperchild = 1
		raise_insufficient = false
		
		[execution.bids_filters.t1w]
		acquisition = "precontrast"
241223-12:13:43,244 nipype.workflow IMPORTANT:
	 QSIPrep started!
241223-12:13:43,381 nipype.workflow INFO:
	 Workflow qsiprep_1_0_wf settings: ['check', 'execution', 'logging', 'monitoring']
241223-12:13:43,495 nipype.workflow INFO:
	 Running in parallel.
241223-12:13:43,508 nipype.workflow INFO:
	 [MultiProc] Running 0 tasks, and 8 jobs ready. Free memory (GB): 226.59/226.59, Free processors: 8/8.
241223-12:13:43,644 nipype.workflow INFO:
	 [Node] Setting-up "qsiprep_1_0_wf.sub_02_ses_01_wf.about" in "/scratch/1318934.1.member.q/qsiprep_sub-02_ses-01/qsiprep_1_0_wf/sub_02_ses_01_wf/about".
241223-12:13:43,646 nipype.workflow INFO:
	 [Node] Executing "about" <qsiprep.interfaces.reports.AboutSummary>
241223-12:13:43,648 nipype.workflow INFO:
	 [Node] Finished "about", elapsed time 0.000269s.
241223-12:13:43,651 nipype.workflow INFO:
	 [Job 1] Completed (qsiprep_1_0_wf.sub_02_ses_01_wf.about).
241223-12:13:45,507 nipype.workflow INFO:
	 [MultiProc] Running 7 tasks, and 0 jobs ready. Free memory (GB): 225.19/226.59, Free processors: 1/8.
                     Currently running:
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_finalize_ses_01_acq_ABCD_wf.merged_sidecar
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_preproc_ses_01_acq_ABCD_wf.pre_hmc_wf.merge_and_denoise_wf.bias_images
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_preproc_ses_01_acq_ABCD_wf.pre_hmc_wf.merge_and_denoise_wf.noise_images
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_preproc_ses_01_acq_ABCD_wf.pre_hmc_wf.merge_and_denoise_wf.conform_dwis01
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.anat_preproc_wf.output_grid_wf.voxel_size_chooser
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.anat_preproc_wf.get_template_image
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.bidssrc
241223-12:13:47,791 nipype.workflow INFO:
	 [Node] Setting-up "qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_preproc_ses_01_acq_ABCD_wf.pre_hmc_wf.merge_and_denoise_wf.noise_images" in "/scratch/1318934.1.member.q/qsiprep_sub-02_ses-01/qsiprep_1_0_wf/sub_02_ses_01_wf/dwi_preproc_ses_01_acq_ABCD_wf/pre_hmc_wf/merge_and_denoise_wf/noise_images".
241223-12:13:47,794 nipype.workflow INFO:
	 [Node] Executing "noise_images" <nipype.interfaces.utility.base.Merge>
241223-12:13:47,799 nipype.workflow INFO:
	 [Node] Finished "noise_images", elapsed time 0.000234s.
241223-12:13:48,471 nipype.workflow INFO:
	 [Node] Setting-up "qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_preproc_ses_01_acq_ABCD_wf.pre_hmc_wf.merge_and_denoise_wf.bias_images" in "/scratch/1318934.1.member.q/qsiprep_sub-02_ses-01/qsiprep_1_0_wf/sub_02_ses_01_wf/dwi_preproc_ses_01_acq_ABCD_wf/pre_hmc_wf/merge_and_denoise_wf/bias_images".
241223-12:13:48,474 nipype.workflow INFO:
	 [Node] Executing "bias_images" <nipype.interfaces.utility.base.Merge>
241223-12:13:48,475 nipype.workflow INFO:
	 [Node] Finished "bias_images", elapsed time 0.0002s.
241223-12:13:49,508 nipype.workflow INFO:
	 [Job 5] Completed (qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_preproc_ses_01_acq_ABCD_wf.pre_hmc_wf.merge_and_denoise_wf.noise_images).
241223-12:13:49,510 nipype.workflow INFO:
	 [Job 6] Completed (qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_preproc_ses_01_acq_ABCD_wf.pre_hmc_wf.merge_and_denoise_wf.bias_images).
241223-12:13:49,512 nipype.workflow INFO:
	 [MultiProc] Running 5 tasks, and 0 jobs ready. Free memory (GB): 225.59/226.59, Free processors: 3/8.
                     Currently running:
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_finalize_ses_01_acq_ABCD_wf.merged_sidecar
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_preproc_ses_01_acq_ABCD_wf.pre_hmc_wf.merge_and_denoise_wf.conform_dwis01
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.anat_preproc_wf.output_grid_wf.voxel_size_chooser
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.anat_preproc_wf.get_template_image
                       * qsiprep_1_0_wf.sub_02_ses_01_wf.bidssrc
241223-12:13:51,362 nipype.workflow INFO:
	 [Node] Setting-up "qsiprep_1_0_wf.sub_02_ses_01_wf.anat_preproc_wf.get_template_image" in "/scratch/1318934.1.member.q/qsiprep_sub-02_ses-01/qsiprep_1_0_wf/sub_02_ses_01_wf/anat_preproc_wf/get_template_image".
241223-12:13:51,365 nipype.workflow INFO:
	 [Node] Executing "get_template_image" <qsiprep.interfaces.anatomical.GetTemplate>
Downloading https://templateflow.s3.amazonaws.com/tpl-MNI152NLin2009cAsym/tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz
241223-12:13:51,409 nipype.workflow INFO:
	 [Node] Finished "get_template_image", elapsed time 0.026386s.
241223-12:13:51,409 nipype.workflow WARNING:
	 Storing result file without outputs
241223-12:13:51,411 nipype.workflow WARNING:
	 [Node] Error on "qsiprep_1_0_wf.sub_02_ses_01_wf.anat_preproc_wf.get_template_image" (/scratch/1318934.1.member.q/qsiprep_sub-02_ses-01/qsiprep_1_0_wf/sub_02_ses_01_wf/anat_preproc_wf/get_template_image)
241223-12:13:51,440 nipype.workflow INFO:
	 [Node] Setting-up "qsiprep_1_0_wf.sub_02_ses_01_wf.anat_preproc_wf.output_grid_wf.voxel_size_chooser" in "/scratch/1318934.1.member.q/qsiprep_sub-02_ses-01/qsiprep_1_0_wf/sub_02_ses_01_wf/anat_preproc_wf/output_grid_wf/voxel_size_chooser".
241223-12:13:51,442 nipype.workflow INFO:
	 [Node] Executing "voxel_size_chooser" <qsiprep.interfaces.anatomical.VoxelSizeChooser>
241223-12:13:51,448 nipype.workflow INFO:
	 [Node] Finished "voxel_size_chooser", elapsed time 0.000165s.
241223-12:13:51,510 nipype.workflow ERROR:
	 Node get_template_image failed to run on host XXX.
241223-12:13:51,512 nipype.workflow ERROR:
	 Saving crash info to /out/sub-02/log/20241223-121258_74ad198e-cf28-4219-852b-55a08adc8362/crash-20241223-121351-$USER-get_template_image-b83f54ef-2c74-46c8-99fa-7982e3cbe451.txt
Traceback (most recent call last):
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 66, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 525, in run
    result = self._run_interface(execute=True)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 643, in _run_interface
    return self._run_command(execute)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 769, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node get_template_image.

Traceback:
	Traceback (most recent call last):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connection.py", line 196, in _new_conn
	    sock = connection.create_connection(
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
	    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/socket.py", line 955, in getaddrinfo
	    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
	socket.gaierror: [Errno -3] Temporary failure in name resolution

	The above exception was the direct cause of the following exception:

	Traceback (most recent call last):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
	    response = self._make_request(
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connectionpool.py", line 490, in _make_request
	    raise new_e
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
	    self._validate_conn(conn)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
	    conn.connect()
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connection.py", line 615, in connect
	    self.sock = sock = self._new_conn()
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connection.py", line 203, in _new_conn
	    raise NameResolutionError(self.host, self, e) from e
	urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x14c5954f08e0>: Failed to resolve 'templateflow.s3.amazonaws.com' ([Errno -3] Temporary failure in name resolution)

	The above exception was the direct cause of the following exception:

	Traceback (most recent call last):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
	    resp = conn.urlopen(
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
	    retries = retries.increment(
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
	    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
	urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='templateflow.s3.amazonaws.com', port=443): Max retries exceeded with url: /tpl-MNI152NLin2009cAsym/tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x14c5954f08e0>: Failed to resolve 'templateflow.s3.amazonaws.com' ([Errno -3] Temporary failure in name resolution)"))

	During handling of the above exception, another exception occurred:

	Traceback (most recent call last):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 401, in run
	    runtime = self._run_interface(runtime)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/qsiprep/interfaces/anatomical.py", line 228, in _run_interface
	    template_path = get_template(
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/templateflow/conf/__init__.py", line 69, in wrapper
	    return func(*args, **kwargs)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/templateflow/api.py", line 145, in get
	    _s3_get(filepath)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/templateflow/api.py", line 299, in _s3_get
	    r = requests.get(url, stream=True)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/api.py", line 73, in get
	    return request("get", url, params=params, **kwargs)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/api.py", line 59, in request
	    return session.request(method=method, url=url, **kwargs)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
	    resp = self.send(prep, **send_kwargs)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
	    r = adapter.send(request, **kwargs)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
	    raise ConnectionError(e, request=request)
	requests.exceptions.ConnectionError: HTTPSConnectionPool(host='templateflow.s3.amazonaws.com', port=443): Max retries exceeded with url: /tpl-MNI152NLin2009cAsym/tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x14c5954f08e0>: Failed to resolve 'templateflow.s3.amazonaws.com' ([Errno -3] Temporary failure in name resolution)"))


241223-12:13:51,513 nipype.workflow CRITICAL:
	 QSIPrep failed: Traceback (most recent call last):
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 66, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 525, in run
    result = self._run_interface(execute=True)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 643, in _run_interface
    return self._run_command(execute)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 769, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node get_template_image.

Traceback:
	Traceback (most recent call last):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connection.py", line 196, in _new_conn
	    sock = connection.create_connection(
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
	    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/socket.py", line 955, in getaddrinfo
	    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
	socket.gaierror: [Errno -3] Temporary failure in name resolution

	The above exception was the direct cause of the following exception:

	Traceback (most recent call last):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
	    response = self._make_request(
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connectionpool.py", line 490, in _make_request
	    raise new_e
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
	    self._validate_conn(conn)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
	    conn.connect()
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connection.py", line 615, in connect
	    self.sock = sock = self._new_conn()
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connection.py", line 203, in _new_conn
	    raise NameResolutionError(self.host, self, e) from e
	urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x14c5954f08e0>: Failed to resolve 'templateflow.s3.amazonaws.com' ([Errno -3] Temporary failure in name resolution)

	The above exception was the direct cause of the following exception:

	Traceback (most recent call last):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
	    resp = conn.urlopen(
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
	    retries = retries.increment(
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
	    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
	urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='templateflow.s3.amazonaws.com', port=443): Max retries exceeded with url: /tpl-MNI152NLin2009cAsym/tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x14c5954f08e0>: Failed to resolve 'templateflow.s3.amazonaws.com' ([Errno -3] Temporary failure in name resolution)"))

	During handling of the above exception, another exception occurred:

	Traceback (most recent call last):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 401, in run
	    runtime = self._run_interface(runtime)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/qsiprep/interfaces/anatomical.py", line 228, in _run_interface
	    template_path = get_template(
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/templateflow/conf/__init__.py", line 69, in wrapper
	    return func(*args, **kwargs)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/templateflow/api.py", line 145, in get
	    _s3_get(filepath)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/templateflow/api.py", line 299, in _s3_get
	    r = requests.get(url, stream=True)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/api.py", line 73, in get
	    return request("get", url, params=params, **kwargs)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/api.py", line 59, in request
	    return session.request(method=method, url=url, **kwargs)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
	    resp = self.send(prep, **send_kwargs)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
	    r = adapter.send(request, **kwargs)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
	    raise ConnectionError(e, request=request)
	requests.exceptions.ConnectionError: HTTPSConnectionPool(host='templateflow.s3.amazonaws.com', port=443): Max retries exceeded with url: /tpl-MNI152NLin2009cAsym/tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x14c5954f08e0>: Failed to resolve 'templateflow.s3.amazonaws.com' ([Errno -3] Temporary failure in name resolution)"))


241223-12:13:51,591 nipype.workflow INFO:
	 [Node] Setting-up "qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_preproc_ses_01_acq_ABCD_wf.pre_hmc_wf.merge_and_denoise_wf.conform_dwis01" in "/scratch/1318934.1.member.q/qsiprep_sub-02_ses-01/qsiprep_1_0_wf/sub_02_ses_01_wf/dwi_preproc_ses_01_acq_ABCD_wf/pre_hmc_wf/merge_and_denoise_wf/conform_dwis01".
241223-12:13:51,593 nipype.workflow INFO:
	 [Node] Setting-up "qsiprep_1_0_wf.sub_02_ses_01_wf.bidssrc" in "/scratch/1318934.1.member.q/qsiprep_sub-02_ses-01/qsiprep_1_0_wf/sub_02_ses_01_wf/bidssrc".
241223-12:13:51,595 nipype.workflow INFO:
	 [Node] Executing "conform_dwis01" <qsiprep.interfaces.images.ConformDwi>
241223-12:13:51,596 nipype.workflow INFO:
	 [Node] Executing "bidssrc" <qsiprep.interfaces.bids.BIDSDataGrabber>
241223-12:13:51,601 nipype.interface WARNING:
	 No 'flair' images found for sub-<undefined>
241223-12:13:51,601 nipype.interface WARNING:
	 No 'sbref' images found for sub-<undefined>
241223-12:13:51,602 nipype.interface WARNING:
	 No 'roi' images found for sub-<undefined>
241223-12:13:51,604 nipype.workflow INFO:
	 [Node] Finished "bidssrc", elapsed time 0.00246s.
241223-12:13:52,318 nipype.workflow INFO:
	 [Node] Setting-up "qsiprep_1_0_wf.sub_02_ses_01_wf.dwi_finalize_ses_01_acq_ABCD_wf.merged_sidecar" in "/scratch/1318934.1.member.q/qsiprep_sub-02_ses-01/qsiprep_1_0_wf/sub_02_ses_01_wf/dwi_finalize_ses_01_acq_ABCD_wf/merged_sidecar".
241223-12:13:52,325 nipype.workflow INFO:
	 [Node] Executing "merged_sidecar" <qsiprep.interfaces.bids.DerivativesSidecar>
241223-12:13:52,328 nipype.workflow INFO:
	 [Node] Finished "merged_sidecar", elapsed time 0.001117s.
241223-12:14:02,461 nipype.interface INFO:
	 Not applying reorientation to /derivatives/nii/sub-02/ses-01/dwi/sub-02_ses-01_acq-ABCD_dwi.nii.gz: already in LAS
241223-12:14:02,462 nipype.workflow INFO:
	 [Node] Finished "conform_dwis01", elapsed time 10.861479s.
qsiprep failed. Exiting...

Any further insights would be really appreciated! Also, I think this issue appeared with a relatively recent version of qsiprep, as I used to run things without issues.

Hi @pierre-nedelec

You should be setting APPTAINERENV_TEMPLATEFLOW_HOME for it to be carried into the container. But the --env "TEMPLATEFLOW_HOME=$TEMPLATEFLOW_HOME" should accomplish the same goal.

You have to make sure you download the files into templateflow (with datalad get) before running qsiprep. You can see here (qsiprep/qsiprep/interfaces/anatomical.py at master · PennLINC/qsiprep · GitHub) that the files you need are the MNI152NLin2009cAsym T1w image and brain mask (resolution index of 1), assuming you are using a T1w image as the anatomical reference.

Templateflow was introduced in a more recent update for consistency with other Nipreps software.

Best,
Steven

Thanks for the prompt reply!

I had also tried the APPTAINERENV_TEMPLATEFLOW_HOME (as per this), but with no success either.

And regarding files needed by qsiprep, I've run it on a development node that does have internet access on the same cluster, so my $HOME/.cache/templateflow should have everything in it...

āÆ ls ~/.cache/templateflow
tpl-Fischer344  tpl-MNI152NLin2009aAsym  tpl-MNI152NLin2009cAsym  tpl-MNI305            tpl-MouseIn      tpl-onavg      tpl-VALiDATe29
tpl-fsaverage   tpl-MNI152NLin2009aSym   tpl-MNI152NLin2009cSym   tpl-MNIColin27        tpl-NKI          tpl-PNC        tpl-WHS
tpl-fsLR        tpl-MNI152NLin2009bAsym  tpl-MNI152NLin6Asym      tpl-MNIInfant         tpl-NMT31Sym     tpl-RESILIENT
tpl-MNI152Lin   tpl-MNI152NLin2009bSym   tpl-MNI152NLin6Sym       tpl-MNIPediatricAsym  tpl-OASIS30ANTs  tpl-UNCInfant
A bunch of the above directories were created in 2022, so I think qsiprep has been using them for a while. What seems new is the inability of templateflow to check that folder.

Hi @pierre-nedelec,

Have you confirmed that the files have been retrieved? That is, they are present and full files (file size is large and can be viewed in an image viewer)?
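A minimal sketch of such a check, assuming the files needed are the MNI152NLin2009cAsym T1w image and brain mask at resolution index 1, stored under a tpl-MNI152NLin2009cAsym subfolder of the TemplateFlow directory. The function name check_templates is hypothetical. The -s test fails for missing files, zero-byte files, and dangling symlinks, which is how files that were never fetched would be expected to show up.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of qsiprep or templateflow): verify that the
# two template files discussed in this thread exist and are not empty.
check_templates() {
    local tf_home="$1"
    local tpl="tpl-MNI152NLin2009cAsym"
    local missing=0
    local name path
    for name in "${tpl}_res-01_T1w.nii.gz" "${tpl}_res-01_desc-brain_mask.nii.gz"; do
        path="${tf_home}/${tpl}/${name}"
        # -s is true only for an existing file with a size greater than zero.
        if [ -s "$path" ]; then
            echo "OK       $path"
        else
            echo "MISSING  $path"
            missing=1
        fi
    done
    # Return 0 when both files are present, 1 otherwise.
    return "$missing"
}
# Usage: check_templates "$TEMPLATEFLOW_HOME" || echo "fetch the templates first"
```

This only confirms that the files are non-empty; opening them in an image viewer is still the way to confirm they are valid images.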

The use of --containall may preclude your home directory from being used.

Best,
Steven

QSIprep ran to completion successfully on that node today. I also tried rerunning the job without --cleanenv --containall to no avail, same exact error message.

Sorry, I'm confused then.

Using the latest version? Which command was used in that case? What difference was there between that completion and the invocation with the error?

Perhaps it would be easier for me to troubleshoot if you start from scratch.

  1. Somewhere (maybe try outside your home directory), clone the templateflow repository:
datalad clone https://github.com/templateflow/templateflow.git
  2. Get the required files:
cd templateflow
datalad get tpl-MNI152NLin2009cAsym/tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz
datalad get tpl-MNI152NLin2009cAsym/tpl-MNI152NLin2009cAsym_res-01_desc-brain_mask.nii.gz
  3. Run export APPTAINERENV_TEMPLATEFLOW_HOME=/path/to/your/templateflow and mount that directory in your apptainer run command, making sure not to rename it (bind it at the same path inside the container).
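The mount-and-export step can be sketched as below. This variant uses the --env flag seen earlier in the thread instead of APPTAINERENV_TEMPLATEFLOW_HOME, and the helper name templateflow_opts is hypothetical. It resolves the directory to an absolute path and reuses that one path for the bind source, the bind target, and the variable, so the three cannot drift apart.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, one per line, the apptainer options that bind a
# TemplateFlow directory at the same path inside the container and export it.
templateflow_opts() {
    local tf_home
    # Resolve to an absolute path; fail if the directory does not exist.
    tf_home="$(cd -- "$1" && pwd -P)" || return 1
    printf '%s\n' -B "${tf_home}:${tf_home}" --env "TEMPLATEFLOW_HOME=${tf_home}"
}
# Usage (mapfile needs bash 4+; paths are placeholders):
#   mapfile -t tf_opts < <(templateflow_opts /path/to/your/templateflow)
#   apptainer run --cleanenv "${tf_opts[@]}" /path/to/qsiprep_<VERSION>.sif ...
```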

Best,
Steven

Ah, I figured it out! Sorry about all the confusion. This was an issue with --cleanenv --containall and environment variables not being set properly...

TL;DR

The following script works:

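# Must be set in the submission script itself, not only in the interactive shell.
export TEMPLATEFLOW_HOME=/path/to/.templateflow
# Bind the directory at the same path and pass the variable through --cleanenv.
apptainer run --cleanenv --containall \
  -B ${TEMPLATEFLOW_HOME}:${TEMPLATEFLOW_HOME} \
  --env "TEMPLATEFLOW_HOME=$TEMPLATEFLOW_HOME" \
  /path/to/qsiprep_${VERSION}.sif  # followed by the remaining qsiprep arguments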
  • run it once on a node with internet until downloads finish,
  • then it can be submitted as a job to node without internet.

Details

My $TEMPLATEFLOW_HOME was not set in my submission script. When I ran the command from a terminal on the dev node, the variable was already in the environment, so it worked. In the script it was not set, so qsiprep fell back to the default home directory.

To your earlier point @Steven, I think my $HOME/.cache/templateflow folder was out of date. I used to run this script without the flags --cleanenv --containall, so templateflow would download to $HOME/.cache/templateflow automatically, which would be available both to the dev node (with internet) and the compute nodes (without).

But since I started using these flags, dev node runs were downloading the templates into their own qsiprep working directory, which is not available elsewhere on the cluster. This led job runs to not find the files, try to download them, and fail.
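One way to catch this earlier is a guard at the top of the submission script that stops the job when the variable is missing, rather than letting qsiprep fall back silently. A minimal sketch; require_templateflow_home is a hypothetical name, not something provided by qsiprep or templateflow.

```shell
#!/usr/bin/env bash
# Hypothetical guard for a job submission script: fail fast when
# TEMPLATEFLOW_HOME is unset, empty, or does not point at a directory.
require_templateflow_home() {
    if [ -z "${TEMPLATEFLOW_HOME:-}" ]; then
        echo "TEMPLATEFLOW_HOME is not set in this job environment" >&2
        return 1
    fi
    if [ ! -d "$TEMPLATEFLOW_HOME" ]; then
        echo "TEMPLATEFLOW_HOME does not exist: $TEMPLATEFLOW_HOME" >&2
        return 1
    fi
    echo "Using TEMPLATEFLOW_HOME=$TEMPLATEFLOW_HOME"
}
# Usage in a job script, before calling apptainer:
#   require_templateflow_home || exit 1
```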

Thank you Steven for your help, really appreciate it! I'll look into submitting some more info in the qsiprep docs regarding this.

And here's the corresponding PR. Hope this will help others in the future!
