Minimal XCP-D inputs for HCP YA

I have very limited disk space, so if possible I would prefer not to download the entire output folder for each subject. It would be great if I could download only the files required for XCP-D.

The minimal inputs for fMRIPrep are listed here, but not for the HCP preprocessing pipeline.

Hi @gerardyu,

While we don’t have an explicit list yet, you can see the ingestion code, which lists the files XCP-D uses (line 115): xcp_d/xcp_d/ingression/hcpya.py at main · PennLINC/xcp_d · GitHub
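In the meantime, one unofficial way to get a rough list is to scrape the literal file names out of that module. A sketch (the regex, the helper name, and the local checkout path in the usage note are my assumptions, not anything XCP-D provides):

```python
import re
from pathlib import Path

# Matches names ending in the extensions that appear in the ingestion code;
# the character class also allows the *_<TASK_ID> wildcard placeholders.
FILENAME_RE = re.compile(r"[A-Za-z0-9_.+*<>-]+\.(?:nii\.gz|nii|txt|gii)")

def candidate_filenames(text):
    """Return the sorted, de-duplicated file names mentioned in source text."""
    return sorted(set(FILENAME_RE.findall(text)))

# Example on a snippet resembling the module's docstring tree:
snippet = """
│   ├── brainmask_fs.2.0.nii.gz
│   ├── Movement_Regressors.txt
│   └── rfMRI_REST1_LR_Atlas_MSMAll.dtseries.nii
"""
print(candidate_filenames(snippet))
```

On a local checkout you could feed it `candidate_filenames(Path("xcp_d/ingression/hcpya.py").read_text())`; the result is a starting point, not a guaranteed-complete manifest.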

Best,
Steven

I tried XCP-D on a single HCP YA subject with the following downloaded files:

sub-100206/
└── files
    └── MNINonLinear
        ├── aparc+aseg.nii.gz
        ├── Brainmask_fs.nii.gz
        ├── fsaverage_LR32k
        │   ├── 100206.L.corrThickness.32k_fs_LR.shape.gii
        │   ├── 100206.L.curvature.32k_fs_LR.shape.gii
        │   ├── 100206.L.MyelinMap.32k_fs_LR.func.gii
        │   ├── 100206.L.pial.32k_fs_LR.surf.gii
        │   ├── 100206.L.SmoothedMyelinMap.32k_fs_LR.func.gii
        │   ├── 100206.L.sulc.32k_fs_LR.shape.gii
        │   ├── 100206.L.thickness.32k_fs_LR.shape.gii
        │   ├── 100206.L.white.32k_fs_LR.surf.gii
        │   ├── 100206.R.corrThickness.32k_fs_LR.shape.gii
        │   ├── 100206.R.curvature.32k_fs_LR.shape.gii
        │   ├── 100206.R.MyelinMap.32k_fs_LR.func.gii
        │   ├── 100206.R.pial.32k_fs_LR.surf.gii
        │   ├── 100206.R.SmoothedMyelinMap.32k_fs_LR.func.gii
        │   ├── 100206.R.sulc.32k_fs_LR.shape.gii
        │   ├── 100206.R.thickness.32k_fs_LR.shape.gii
        │   └── 100206.R.white.32k_fs_LR.surf.gii
        ├── Results
        │   ├── rfMRI_REST1_LR
        │   │   ├── brainmask_fs.2.0.nii.gz
        │   │   ├── Movement_AbsoluteRMS.txt
        │   │   ├── Movement_Regressors.txt
        │   │   ├── rfMRI_REST1_LR_Atlas_MSMAll.dtseries.nii
        │   │   ├── rfMRI_REST1_LR.nii.gz
        │   │   └── SBRef_dc.nii.gz
        │   ├── rfMRI_REST1_RL
        │   │   ├── brainmask_fs.2.0.nii.gz
        │   │   ├── Movement_AbsoluteRMS.txt
        │   │   ├── Movement_Regressors.txt
        │   │   ├── rfMRI_REST1_RL_Atlas_MSMAll.dtseries.nii
        │   │   ├── rfMRI_REST1_RL.nii.gz
        │   │   └── SBRef_dc.nii.gz
        │   ├── rfMRI_REST2_LR
        │   │   ├── brainmask_fs.2.0.nii.gz
        │   │   ├── Movement_AbsoluteRMS.txt
        │   │   ├── Movement_Regressors.txt
        │   │   ├── rfMRI_REST2_LR_Atlas_MSMAll.dtseries.nii
        │   │   ├── rfMRI_REST2_LR.nii.gz
        │   │   └── SBRef_dc.nii.gz
        │   └── rfMRI_REST2_RL
        │       ├── brainmask_fs.2.0.nii.gz
        │       ├── Movement_AbsoluteRMS.txt
        │       ├── Movement_Regressors.txt
        │       ├── rfMRI_REST2_RL_Atlas_MSMAll.dtseries.nii
        │       ├── rfMRI_REST2_RL.nii.gz
        │       └── SBRef_dc.nii.gz
        ├── ribbon.nii.gz
        └── T1w.nii.gz

and I ran into the following error:

A valid FreeSurfer license file is recommended. Set the FS_LICENSE environment variable or use the '--fs-license-file' flag.
Framewise displacement-based scrubbing is disabled. The following parameters will have no effect:
	--min-time
250602-23:07:07,903 nipype.utils WARNING:
	 convert_hcp2bids is an experimental function.
250602-23:07:07,903 nipype.utils INFO:
	 Converting 100206
250602-23:07:07,904 nipype.utils INFO:
	 Converted dataset already exists. Skipping conversion.
250602-23:07:21,150 nipype.workflow IMPORTANT:
	 Running XCP-D version 0.10.7.dev16+g4d40272
250602-23:07:21,197 nipype.workflow IMPORTANT:
	 Building XCP-D's workflow:
           * Preprocessing derivatives path: /scratch/junhong.yu/HCP/work/dset_bids/derivatives/hcp.
           * Participant list: ['100206'].
           * Run identifier: 20250602-230702_009b3d90-5ebd-4f7e-a715-28421363d8c2.
Process Process-2:
Traceback (most recent call last):
  File "/home/junhong.yu/XCPDenv/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/junhong.yu/XCPDenv/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/junhong.yu/XCPDenv/lib/python3.10/site-packages/xcp_d/cli/workflow.py", line 100, in build_workflow
    retval['workflow'] = init_xcpd_wf()
  File "/home/junhong.yu/XCPDenv/lib/python3.10/site-packages/xcp_d/workflows/base.py", line 81, in init_xcpd_wf
    single_subject_wf = init_single_subject_wf(subject_id)
  File "/home/junhong.yu/XCPDenv/lib/python3.10/site-packages/xcp_d/workflows/base.py", line 127, in init_single_subject_wf
    subj_data = collect_data(
  File "/home/junhong.yu/XCPDenv/lib/python3.10/site-packages/xcp_d/utils/bids.py", line 211, in collect_data
    raise FileNotFoundError(
FileNotFoundError: No BOLD data found in allowed spaces (fsLR).

Query: {'datatype': 'func', 'desc': ['preproc', None], 'suffix': 'bold', 'extension': '.dtseries.nii', 'space': 'fsLR'}

Found files:
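
For reference, the failing query is looking for a converted fsLR-space dtseries in func/. A rough sketch of the entity matching it implies (the example filename is a guess at what the converter would produce, and XCP-D actually uses pybids for this, not this code):

```python
def parse_entities(filename):
    """Split a BIDS-style filename into an entity dict plus suffix/extension."""
    name, _, extension = filename.partition(".")
    *pairs, suffix = name.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    entities["extension"] = "." + extension
    return entities

def matches(filename, query):
    """True if every queried entity is satisfied (None means 'entity absent')."""
    entities = parse_entities(filename)
    return all(
        entities.get(key) in (value if isinstance(value, list) else [value])
        for key, value in query.items()
    )

# The query from the traceback, minus 'datatype', which pybids derives from
# the containing func/ directory rather than the filename.
query = {
    "desc": ["preproc", None],
    "suffix": "bold",
    "extension": ".dtseries.nii",
    "space": "fsLR",
}

# Hypothetical name the converter might write; the empty func/ directory
# means no such file existed, hence the FileNotFoundError.
fname = "sub-100206_task-rest_acq-LR_run-1_space-fsLR_den-91k_bold.dtseries.nii"
print(matches(fname, query))  # -> True
```

So any bold dtseries in fsLR space under func/ would have satisfied the query; the conversion step simply produced nothing there.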

Here are the files that were generated in dset_bids within the working directory:

dset_bids/
└── derivatives
    └── hcp
        ├── dataset_description.json
        └── sub-100206
            ├── anat
            │   ├── sub-100206_from-MNI152NLin6Asym_to-T1w_mode-image_xfm.txt
            │   └── sub-100206_from-T1w_to-MNI152NLin6Asym_mode-image_xfm.txt
            ├── func
            ├── sub-100206_scans.tsv
            └── work

Hi @gerardyu,

Your command would be helpful to know.

Are you sure these files were downloaded properly? E.g., if you are using the DataLad distribution, you have to datalad get them. Do these files open properly in an image viewer (e.g., Connectome Workbench)?

Best,
Steven

I downloaded the files using a bash script modified from this.

The XCP-D command:

xcp_d $HCP_dir \
$output_dir \
participant \
--input-type hcp \
--mode linc \
--participant-label $d \
-w $work_dir \
--linc-qc n \
--combine-runs n \
--smoothing 0 \
--skip-parcellation \
--omp-nthreads 2 \
--nthreads 4 \
--md-only-boilerplate

wb_view doesn’t work for me because of missing libraries, but I’m able to run wb_command -nifti-information on all the files except the brain masks brainmask_fs.2.0.nii.gz and Brainmask_fs.nii.gz, for which I get the error:
ERROR: error reading NIfTI file brainmask_fs.2.0.nii.gz: brainmask_fs.2.0.nii.gz is not a valid NIfTI file
None of these files are 0 bytes, though; they all have the same file size of 208 KB.

The URL for one of the brain masks that I use in my bash script:
https://db.humanconnectome.org/data/archive/projects/HCP_1200/subjects//experiments/_CREST/resources/_CREST/files/MNINonLinear/Results/rfMRI_REST1_LR/Brainmask_fs.2.nii.gz

Hi @gerardyu,

Yes, it looks like all your images are still symlinks that need to be retrieved with datalad get or something equivalent. I have not accessed HCP the way you referenced; I have only used the DataLad dataset version: GitHub - datalad-datasets/human-connectome-project-openaccess: WU-Minn HCP1200 Data: 3T/7T MR scans from young healthy adults twins and non-twin siblings (ages 22-35) [T1w, T2w, resting-state and task fMRI, high angular resolution dMRI]

Best,
Steven

Looking at the file sizes, I don’t think they are symlinks:

./files/MNINonLinear/Results:
total 226K
drwxr-x---. 2 junhong.yu hpc_junhongyu_group 293 Jun  3 00:14 rfMRI_REST2_RL
drwxr-x---. 2 junhong.yu hpc_junhongyu_group 293 Jun  3 00:14 rfMRI_REST2_LR
drwxr-x---. 2 junhong.yu hpc_junhongyu_group 293 Jun  3 00:14 rfMRI_REST1_RL
drwxr-x---. 2 junhong.yu hpc_junhongyu_group 293 Jun  3 00:14 rfMRI_REST1_LR

./files/MNINonLinear/Results/rfMRI_REST2_RL:
total 1.6G
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 209K Jun  3 00:14 Brainmask_fs.2.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 209K Jun  2 23:15 brainmask_fs.2.0.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group  11K Jun  2 23:01 Movement_AbsoluteRMS.txt
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 156K Jun  2 23:01 Movement_Regressors.txt
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 419M Jun  2 23:00 rfMRI_REST2_RL_Atlas_MSMAll.dtseries.nii
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 960M Jun  2 23:00 rfMRI_REST2_RL.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 2.4M Jun  2 22:58 SBRef_dc.nii.gz

./files/MNINonLinear/Results/rfMRI_REST2_LR:
total 1.6G
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 209K Jun  3 00:14 Brainmask_fs.2.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 209K Jun  2 23:15 brainmask_fs.2.0.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 9.8K Jun  2 22:58 Movement_AbsoluteRMS.txt
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 156K Jun  2 22:58 Movement_Regressors.txt
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 419M Jun  2 22:58 rfMRI_REST2_LR_Atlas_MSMAll.dtseries.nii
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 958M Jun  2 22:57 rfMRI_REST2_LR.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 2.4M Jun  2 22:56 SBRef_dc.nii.gz

./files/MNINonLinear/Results/rfMRI_REST1_RL:
total 1.6G
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 209K Jun  3 00:14 Brainmask_fs.2.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 209K Jun  2 23:15 brainmask_fs.2.0.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group  11K Jun  2 22:55 Movement_AbsoluteRMS.txt
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 156K Jun  2 22:55 Movement_Regressors.txt
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 419M Jun  2 22:55 rfMRI_REST1_RL_Atlas_MSMAll.dtseries.nii
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 961M Jun  2 22:54 rfMRI_REST1_RL.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 2.5M Jun  2 22:53 SBRef_dc.nii.gz

./files/MNINonLinear/Results/rfMRI_REST1_LR:
total 1.6G
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 209K Jun  3 00:14 Brainmask_fs.2.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 209K Jun  2 23:15 brainmask_fs.2.0.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group  11K Jun  2 22:52 Movement_AbsoluteRMS.txt
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 156K Jun  2 22:52 Movement_Regressors.txt
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 419M Jun  2 22:52 rfMRI_REST1_LR_Atlas_MSMAll.dtseries.nii
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 958M Jun  2 22:52 rfMRI_REST1_LR.nii.gz
-rw-rw-r--. 1 junhong.yu hpc_junhongyu_group 2.5M Jun  2 22:50 SBRef_dc.nii.gz

Oh sorry, I misread your message; I thought you were saying all files were 208 KB except for the mask.

Can you please provide your command?

I see your work dir is inside your derivatives directory. I don’t recommend that (unless it is something XCP-D created under the hood).

dset_bids is actually within a parent work dir:

work/
├── 20250602-235806_d9e851f2-2deb-4211-bc05-9d4c6bb18961
│   ├── bids_db
│   │   └── layout_index.sqlite
│   └── config.toml
└── dset_bids
    └── derivatives
        └── hcp
            ├── dataset_description.json
            └── sub-100206
                ├── anat
                │   ├── sub-100206_from-MNI152NLin6Asym_to-T1w_mode-image_xfm.txt
                │   └── sub-100206_from-T1w_to-MNI152NLin6Asym_mode-image_xfm.txt
                ├── func
                ├── sub-100206_scans.tsv
                └── work

Are you using Apptainer/Docker?

No. I’ll reattempt this using a Singularity container later.

But I suppose the issue has something to do with the brain masks?

In hcpya.py the mask is spelled brainmask_fs.2.0.nii.gz, but according to this it is spelled Brainmask_fs.2.nii.gz.

I modified the URLs and downloaded both anyway, but neither could be read using wb_command -nifti-information.

Unfortunately there are too many unknowns for me to describe an exact solution. I do not know what the variables in your command point to, what is in those folders, whether the behavior will persist in a container, or how the HCP data were downloaded.

If space is an issue, DataLad is a good option, since you can get the necessary files before running and remove them afterward.

I got this figured out. There were two problems.

First, the brain masks were indeed invalid because the filenames I got from the HCP reference manual were incorrect. For instance, it should have been brainmask_fs.2.nii.gz instead of Brainmask_fs.2.nii.gz (as listed in the HCP reference manual).

Second, the directory structure described in hcpya.py is incorrect.

It should have been

<sub_id>
└── MNINonLinear
    ├── Results
    │   ├── *_<TASK_ID><RUN_ID>_<DIR_ID>
    │   │   ├── SBRef_dc.nii.gz
    │   │   ├── *_<TASK_ID><RUN_ID>_<DIR_ID>.nii.gz
    │   │   ├── *_<TASK_ID><RUN_ID>_<DIR_ID>_Atlas_MSMAll.dtseries.nii
    │   │   ├── Movement_Regressors.txt
    │   │   ├── Movement_AbsoluteRMS.txt
    │   │   └── brainmask_fs.2.nii.gz
    ├── T1w.nii.gz
    ├── aparc+aseg.nii.gz
    ├── brainmask_fs.nii.gz
    └── ribbon.nii.gz

instead of

## from hcpya.py
sub-<sub_id>
└── files
    └── MNINonLinear
        ├── Results
        │   ├── *_<TASK_ID><RUN_ID>_<DIR_ID>
        │   │   ├── SBRef_dc.nii.gz
        │   │   ├── *_<TASK_ID><RUN_ID>_<DIR_ID>.nii.gz
        │   │   ├── *_<TASK_ID><RUN_ID>_<DIR_ID>_Atlas_MSMAll.dtseries.nii
        │   │   ├── Movement_Regressors.txt
        │   │   ├── Movement_AbsoluteRMS.txt
        │   │   └── brainmask_fs.2.0.nii.gz
        ├── T1w.nii.gz
        ├── aparc+aseg.nii.gz
        ├── brainmask_fs.nii.gz
        └── ribbon.nii.gz

Note: I’m leaving out the entire fsaverage_LR32k directory, because I realized those files are not required if the --warp-surfaces-native2std flag is not used.

Also, the ALFF processing is taking several hours for each subject. I understand that there isn’t an option to turn it off at the moment, so I went ahead and commented out all the ALFF sections in cifti.py and outputs.py.