XCP-D DatasetType KeyError

Summary of what happened:

Hi all,

I am trying to run the XCP-D container with Apptainer on an HPC cluster, using the output of fMRIPrep 23.1.3 (which ran through successfully), and I keep running into the same error regardless of what I do.

I’m hoping this is a simple fix, but the documentation on the website isn’t very helpful for troubleshooting this, and I can’t find the problem here, on Google, or on other forums.

Thank you in advance, and let me know if you need any further information or clarification.

Best,

Brittany

Command used (and if a helper script was used, a link to the helper script or the command generated):

apptainer run --cleanenv xcp_d.sif CCNA/derivatives/fmriprep/ CCNA/derivatives/ participant --participant-label BCT2991 --input-type fmriprep -w CCNA/work/

Version:

0.6.0

Environment (Docker, Singularity / Apptainer, custom installation):

Singularity / Apptainer on HPC cluster (Compute Canada)

Data formatted according to a validatable standard? Please provide the output of the validator:

Not provided.

Relevant log outputs (up to 20 lines):

I receive the following error:

240228-18:12:45,307 nipype.workflow IMPORTANT:
         Running xcp_d version 0.6.0:
    * fMRI directory path: /lustre07/scratch/bnintzan/CCNA/derivatives/fmriprep.
    * Participant list: ['BCT2991'].
    * Run identifier: 20240228-181206_6f11fde0-2003-4ea0-bbf4-daa6bff540ab.


Process Process-2:
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/miniconda/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/cli/run.py", line 977, in build_workflow
    retval["workflow"] = init_xcpd_wf(
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/workflows/base.py", line 196, in init_xcpd_wf
    write_dataset_description(fmri_dir, os.path.join(output_dir, "xcp_d"))
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/utils/bids.py", line 676, in write_dataset_description
    assert dset_desc["DatasetType"] == "derivative"
KeyError: 'DatasetType'

Screenshots / relevant information:


Your preprocessing pipeline should write out a dataset_description.json file that includes a field named "DatasetType" with a value of "derivative". It’s odd that this would be missing from the dataset_description.json of an fMRIPrep dataset.
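The check that fails here (line 676 of xcp_d/utils/bids.py in the traceback) reads the key directly and compares it to the exact string "derivative". A missing key therefore gives a KeyError, and any other value gives an AssertionError. Below is a minimal sketch of a pre-flight check that reports both cases for a dataset_description.json before you submit the job. It is not XCP-D’s own code, and the function names are hypothetical.

```python
import json
from typing import Optional


def dataset_type_problem(desc: dict) -> Optional[str]:
    """Describe what is wrong with DatasetType, or return None if it is valid.

    Mirrors the two failure modes seen in the xcp_d 0.6.0 tracebacks:
    a missing key (KeyError) and a value other than "derivative" (AssertionError).
    """
    if "DatasetType" not in desc:
        return "DatasetType is missing (xcp_d 0.6.0 raises KeyError)"
    if desc["DatasetType"] != "derivative":
        return (
            f"DatasetType is {desc['DatasetType']!r}, expected 'derivative' "
            "(xcp_d 0.6.0 raises AssertionError)"
        )
    return None


def check_description_file(path: str) -> Optional[str]:
    """Load a dataset_description.json from disk and run the check above."""
    with open(path, encoding="utf-8") as f:
        return dataset_type_problem(json.load(f))
```

For example, `check_description_file("CCNA/derivatives/fmriprep/dataset_description.json")` returns None when the field is correct and a short message otherwise.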

Hi @bnintzan,

I also notice you do not have any bind paths (-B) to mount your directories. Can you confirm that CCNA is mounted? Also, I do not recommend putting your work directory in the BIDS root directory.
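The advice about the work directory can also be checked before submitting a job. A minimal sketch, assuming `CCNA` is the BIDS root as in the first command; `is_inside` is a hypothetical helper. It compares resolved path components rather than string prefixes, so a sibling such as `CCNA_work` is not mistaken for a subdirectory of `CCNA`.

```python
from pathlib import Path


def is_inside(child: str, parent: str) -> bool:
    """Return True if `child` resolves to `parent` or to a path beneath it."""
    child_path = Path(child).resolve()    # non-strict: works for paths that don't exist yet
    parent_path = Path(parent).resolve()
    try:
        child_path.relative_to(parent_path)  # raises ValueError if not a sub-path
    except ValueError:
        return False
    return True
```

With the paths from this thread, `is_inside("/lustre07/scratch/bnintzan/CCNA/work", "/lustre07/scratch/bnintzan/CCNA")` is True (the original layout), while the later `/lustre07/scratch/bnintzan/work` is outside the root.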

Best,
Steven

Hi both,

Thanks for your quick responses.
@tsalo, this is what confuses me as well. I had assumed this field was part of the fMRIPrep output; the main reason I’m using these two pipelines is that they seem to work so well together. Do you have any suggestions? Should I re-run one participant and see whether the field is written automatically? Or is it perhaps in a different location?

@Steven, yes, it is mounted. I reran it with explicit bind mounts and moved the work directory, as you can see below, and got exactly the same error:

apptainer run --cleanenv -B /lustre07/scratch/bnintzan/CCNA/derivatives/fmriprep/:/input -B /lustre07/scratch/bnintzan/CCNA/derivatives:/output -B /lustre07/scratch/bnintzan/work:/work xcp_d.sif /input /output participant --participant-label BCT2991 -w /work/
240228-19:40:12,256 nipype.workflow IMPORTANT:
         Running xcp_d version 0.6.0:
    * fMRI directory path: /input.
    * Participant list: ['BCT2991'].
    * Run identifier: 20240228-193745_fea1e00b-455d-417f-b1d7-e7491e65305f.


Process Process-2:
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/miniconda/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/cli/run.py", line 977, in build_workflow
    retval["workflow"] = init_xcpd_wf(
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/workflows/base.py", line 196, in init_xcpd_wf
    write_dataset_description(fmri_dir, os.path.join(output_dir, "xcp_d"))
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/utils/bids.py", line 676, in write_dataset_description
    assert dset_desc["DatasetType"] == "derivative"
KeyError: 'DatasetType'

Hi @bnintzan

What files are in your fMRIPrep directory besides the subject folders and HTML reports? If you have a dataset_description.json file, what are its contents?

Best,
Steven

Hi @Steven ,

Apart from the subject folders and HTML reports, there is the following:

-rw-r-----. 1 bnintzan bnintzan   634 Feb 28 10:51 dataset_description.json
-rw-r-----. 1 bnintzan bnintzan 49926 Feb 27 22:22 desc-aseg_dseg.tsv
-rw-r-----. 1 bnintzan bnintzan 49926 Feb 27 22:22 desc-aparcaseg_dseg.tsv
drwxr-x---. 2 bnintzan bnintzan 25600 Feb 27 06:16 logs
drwxr-x---. 3 bnintzan bnintzan 25600 Feb 14 10:40 sourcedata

Hi @bnintzan,

And what are the contents of the dataset description? If the field is not there, you could add the key-value pair "DatasetType": "derivative" to the JSON, and that should avoid the error.
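A minimal sketch of that edit in Python, assuming the file is plain JSON and writable by you; `set_dataset_type` is a hypothetical helper. It keeps every other field, saves a `.bak` copy first, and writes the singular value "derivative", which is the exact string the XCP-D 0.6.0 assertion compares against.

```python
import json
import shutil
from pathlib import Path


def set_dataset_type(path: str) -> dict:
    """Add or correct "DatasetType": "derivative" in a dataset_description.json.

    All other fields are kept. A backup is written next to the file as
    <name>.bak, and the JSON is rewritten with 4-space indentation.
    """
    desc_path = Path(path)
    shutil.copy2(desc_path, desc_path.with_name(desc_path.name + ".bak"))
    desc = json.loads(desc_path.read_text(encoding="utf-8"))
    desc["DatasetType"] = "derivative"  # singular: "derivatives" still fails the check
    desc_path.write_text(json.dumps(desc, indent=4, ensure_ascii=False) + "\n", encoding="utf-8")
    return desc
```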

Hi @Steven,

I changed the dataset_description.json to this:

  "BIDSVersion": "1.4.1",
  "DatasetDOI": "TODO: eventually a DOI for the dataset",
  "DatasetType": "derivatives",

and now I get the following error:

240228-20:39:15,828 nipype.workflow IMPORTANT:
         Running xcp_d version 0.6.0:
    * fMRI directory path: /input.
    * Participant list: ['BCT2991'].
    * Run identifier: 20240228-203853_f94e898c-8e23-43c4-9c8a-b02ad8178c78.


Process Process-2:
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/miniconda/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/cli/run.py", line 977, in build_workflow
    retval["workflow"] = init_xcpd_wf(
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/workflows/base.py", line 196, in init_xcpd_wf
    write_dataset_description(fmri_dir, os.path.join(output_dir, "xcp_d"))
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/utils/bids.py", line 676, in write_dataset_description
    assert dset_desc["DatasetType"] == "derivative"
AssertionError

Hi @bnintzan,

If you look at the error, you can see it expects the value to be “derivative”, not “derivatives”.

Best,
Steven

Hi @Steven ,

At least your eyes are working … I did this and it worked! Unfortunately now I have a new error:

apptainer run --cleanenv -B /lustre07/scratch/bnintzan/CCNA/derivatives/fmriprep/:/input -B /lustre07/scratch/bnintzan/CCNA/derivatives:/output -B /lustre07/scratch/bnintzan/work:/work xcp_d.sif /input /output participant --participant-label BCT2991 -w /work/
240228-21:13:51,659 nipype.workflow IMPORTANT:
         Running xcp_d version 0.6.0:
    * fMRI directory path: /input.
    * Participant list: ['BCT2991'].
    * Run identifier: 20240228-211325_e36b207e-d575-43c0-9449-a05b96097a63.


240228-21:13:52,252 nipype.utils IMPORTANT:
         Collected data:
anat_brainmask: /input/sub-BCT2991/anat/sub-BCT2991_desc-brain_mask.nii.gz
anat_dseg: /input/sub-BCT2991/anat/sub-BCT2991_dseg.nii.gz
anat_to_template_xfm: /input/sub-BCT2991/anat/sub-BCT2991_from-T1w_to-MNI152NLin2009cAsym_mode-image_xfm.h5
bold:
- /input/sub-BCT2991/func/sub-BCT2991_task-rest_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz
t1w: /input/sub-BCT2991/anat/sub-BCT2991_desc-preproc_T1w.nii.gz
t2w: null
template_to_anat_xfm: /input/sub-BCT2991/anat/sub-BCT2991_from-MNI152NLin2009cAsym_to-T1w_mode-image_xfm.h5

240228-21:13:52,475 nipype.utils IMPORTANT:
         Collected mesh files:
lh_pial_surf: /input/sub-BCT2991/anat/sub-BCT2991_hemi-L_pial.surf.gii
lh_wm_surf: /input/sub-BCT2991/anat/sub-BCT2991_hemi-L_white.surf.gii
rh_pial_surf: /input/sub-BCT2991/anat/sub-BCT2991_hemi-R_pial.surf.gii
rh_wm_surf: /input/sub-BCT2991/anat/sub-BCT2991_hemi-R_white.surf.gii

240228-21:13:52,637 nipype.utils IMPORTANT:
         Collected morphometry files:
cortical_thickness: null
cortical_thickness_corr: null
myelin: null
myelin_smoothed: null
sulcal_curv: null
sulcal_depth: null

Process Process-2:
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/miniconda/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/cli/run.py", line 977, in build_workflow
    retval["workflow"] = init_xcpd_wf(
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/workflows/base.py", line 199, in init_xcpd_wf
    single_subj_wf = init_subject_wf(
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/workflows/base.py", line 446, in init_subject_wf
    info_dict = get_preproc_pipeline_info(input_type=input_type, fmri_dir=fmri_dir)
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/utils/bids.py", line 739, in get_preproc_pipeline_info
    "name": dataset_dict["GeneratedBy"][0]["Name"],
KeyError: 'GeneratedBy'

So I added a “GeneratedBy” field to the dataset description, and now I only get this:

Process Process-2:
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/miniconda/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/cli/run.py", line 977, in build_workflow
    retval["workflow"] = init_xcpd_wf(
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/workflows/base.py", line 196, in init_xcpd_wf
    write_dataset_description(fmri_dir, os.path.join(output_dir, "xcp_d"))
  File "/usr/local/miniconda/lib/python3.8/site-packages/xcp_d/utils/bids.py", line 681, in write_dataset_description
    generated_by.insert(
AttributeError: 'str' object has no attribute 'insert'
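The two tracebacks above show how XCP-D 0.6.0 uses this field: it reads `GeneratedBy[0]["Name"]` and calls `.insert()` on `GeneratedBy`. The field must therefore be a JSON array of objects, which is also how the BIDS specification defines it, and a plain string fails with the AttributeError. Below is a hypothetical helper sketching that normalization, assuming the value is missing, a string, a single object, or already a list. XCP-D may read other keys in that entry as well (for example a version), so filling in the full entry, as in the reply below, is the safer fix.

```python
from typing import Any, Dict


def normalize_generated_by(desc: Dict[str, Any], fallback_name: str) -> Dict[str, Any]:
    """Return a shallow copy of `desc` whose "GeneratedBy" is a list of objects.

    - missing        -> [{"Name": fallback_name}]
    - a plain string -> [{"Name": <that string>}]
    - a single dict  -> [<that dict>]
    - a list         -> left unchanged
    """
    fixed = dict(desc)
    value = fixed.get("GeneratedBy")
    if value is None:
        fixed["GeneratedBy"] = [{"Name": fallback_name}]
    elif isinstance(value, str):
        fixed["GeneratedBy"] = [{"Name": value}]
    elif isinstance(value, dict):
        fixed["GeneratedBy"] = [value]
    return fixed
```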

Hi @bnintzan,

Just put this in (adapted from my 23.2.0 run):

{
    "Name": "fMRIPrep - fMRI PREProcessing workflow",
    "BIDSVersion": "1.4.1",
    "DatasetType": "derivative",
    "GeneratedBy": [
        {
            "Name": "fMRIPrep",
            "Version": "23.1.3",
            "CodeURL": "https://github.com/nipreps/fmriprep/archive/23.1.3.tar.gz"
        }
    ],
    "HowToAcknowledge": "Please cite our paper (https://doi.org/10.1038/s41592-018-0235-4), and include the generated citation boilerplate within the Methods section of the text.",
    "License": "CC0"
}

Best,
Steven

It is now working!

Thank you for the easy fixes as always.

Best,

Brittany

I’ve merged a PR to make the DatasetType check into a warning instead of an error, so the same error won’t occur starting in the next release (0.6.3).
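The PR itself isn’t linked here, so the following is only a hypothetical sketch of what turning the check into a warning means in practice, not the code that ships in 0.6.3. Instead of indexing the key and asserting, the value is read with `dict.get`, and a mismatch triggers a `UserWarning` while processing continues. The real release may report it differently, for example through a logger.

```python
import warnings
from typing import Any, Dict


def check_dataset_type(desc: Dict[str, Any]) -> None:
    """Warn, rather than fail, when DatasetType is missing or not "derivative"."""
    dataset_type = desc.get("DatasetType")  # None instead of KeyError when absent
    if dataset_type != "derivative":
        warnings.warn(
            f"DatasetType is {dataset_type!r}, expected 'derivative'; continuing anyway.",
            UserWarning,
        )
```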
