How do you BIDS-ify HCP1200 data?

I’m having difficulty converting HCP1200 data to BIDS.

The directory layout of each participant is of the form

100206
    /.xdlm
    /MNINonLinear
    /T1w
    /release-notes
    /unprocessed

I’m unsure what subdirectories to focus on and how to go forward at all, actually. Any hints or pointers - or even detailed instructions - would be appreciated!

Darcy

Hi @dlq and welcome to neurostars!

https://micapipe.readthedocs.io/en/latest/pages/04.start2end/index.html

Best,
Steven

So far, I’ve tried hcp2bids but it creates BIDS directory structures that seem way to small and don’t validate.

BTW, I’m not new to BIDS-ifying data - I’ve bulk BIDS-ified UK Biobank brain imaging data with various tools.

I haven’t tried any of the micapipe tools. I’ll try and get back to you if it worked for me. The hpc2bids I tried was independent of micapipe, I think.

Looking at that function I doubt it will get you all the way there, since I don’t see any mention of fieldmaps or linking them to their EPIs. Are you looking to BIDSify the raw or processed data?

Hi @dlq,

ANAT
T1w: HCP1200/${SUB}/unprocessed/3T/T1w_MPR1/${SUB}_3T_T1w_MPR1.nii.gz
BIDS example name: sub-X/ses-3T/dwi/sub-X_ses-3T_T1w.nii.gz
T2w: HCP1200/${SUB}/unprocessed/3T/T1w_SPC1/100307_3T_T2w_SPC1.nii.gz
BIDS example name: sub-X/ses-3T/dwi/sub-X_ses-3T_T2w.nii.gz
jsons not required for anatomical files
DWI:
HCP1200/${SUB}/unprocessed/3T/Diffusion
files are fairly self explanatory within there. The bias files will probably not be of use to you (and do not have a BIDS valid name yet, but can go in a .bidsignore)
BIDS example name: sub-X/ses-3T/dwi/sub-X_ses-3T_acq-${ACQ}_dir-${DIR}_dwi.nii.gz / bval / bvec
fMRI (rest)
HCP1200/${SUB}/unprocessed/3T/rfMRI_REST_${DIR}
SBREF file: ${SUB}_3T_rfMRI_REST1_*_SBRef.nii.gz
BOLD file: 1${SUB}_3T_rfMRI_REST1_*.nii.gz
fMRI (task)
HCP1200/${SUB}/unprocessed/3T/tfMRI_${TASK}_${DIR}
SBREF file: ${SUB}_3T_rfMRI_${TASK}_${DIR}_SBRef.nii.gz
BIDS example name: sub-X/ses-3T/func/sub-X_ses-3T_task-${TASK}_sbref.nii.gz
BOLD file: ${SUB}_3T_rfMRI_${TASK}_${DIR}.nii.gz
BIDS example name: sub-X/ses-3T/func/sub-X_ses-3T_task-${TASK}_bold.nii.gz
Fieldmaps
HCP1200/${SUB}/unprocessed/3T/${BOLD_FOLDER}
Phase1: ${SUB}_3T_SpinEchoFieldMap_RL
Phase2: ${SUB}_3T_SpinEchoFieldMap_LR
BIDS example name: sub-X/ses-3T/fmap/sub-X_ses-3T_acq-BOLD${TASK}Fmap_dir-${DIR}_epi.nii.gz

Unfortunately there are no JSONS, so these will need to be produced. At a minimum you will need to add RepetitionTime, TotalReadoutTime (I think for HCP people just use a standard value of 0.05 throughout), PhaseEncodingDirection. Beyond this, the fieldmaps/epis will need the proper IntendedFor/B0FieldSource/B0FieldIdentifier tags attached to them, the BOLD jsons will need a TaskName json field (EDITED TO ADD also need SliceTiming).

RepetitionTime you can get from the nifti header (4th dimension)
TotalReadoutTime you can just fake, as long as consistent throughout (unless you find that info somewhere)
PhaseEncodingDirection you get from file name / folder
TaskName you get from file name / folder
IntendedFor/B0FieldSource/B0FieldIdentifier you would have to define those pairings.

ChatGPT or other AI tools would probably be good to start with making an automation script for this.

Best,
Steven

1 Like

Good question. I’m actually unsure if raw or processed is needed. I’ll start with the raw data.

unprocessed in the HCP1200 context is raw, correct?

Ok, I’m confused about what’s in T1W vs what’s in unprocessed/3T/T1w_MPR1.

Hi @dlq I edited my previous post to tell you all the relevant data for unprocessed. The preprocessed structural / diffusion / subject space bold is in the T1w folder, where as the MNI standard space BOLD results are in the MNINonLinear folder.

If you get something to work I am happy to test it on my end and I’m sure it would be valuable to people.

Best,
Steven

1 Like

Hi @dlq,

Here’s a start via ChatGPT. Untested because I don’t have a subject cloned locally. But at cursory glance looks almost there. The phase encoding direction will need to be corrected to the i/i- though, based on LR / LR. Should be a pretty trivial fix.

#!/usr/bin/env python3
"""
BIDS-ify HCP Data with Fieldmap, DWI SBRef, and Dynamic IntendedFor Paths

This script processes an HCP subject’s 3T data (anatomical, diffusion, resting fMRI,
task fMRI, and corresponding fieldmaps). It copies files into a BIDS‑compliant folder
structure and creates JSON sidecars. For fMRI runs, the script:
  - Reads the NIfTI header to extract RepetitionTime.
  - Sets a TotalReadoutTime (here fixed to 0.05).
  - Derives PhaseEncodingDirection from the file or folder name.
  - Extracts the TaskName from the folder (or uses "rest" for resting fMRI).
  - Generates a unique fieldmap identifier for the run.
  - Updates the BOLD JSON to include a "B0FieldSource" key (with that identifier).
  - Processes fieldmaps by copying them into the BIDS “fmap” folder and generating JSON
    sidecars that include an "IntendedFor" field pointing to the corresponding fMRI file.
    The IntendedFor path is relative to the subject folder (e.g., "ses-3T/func/...").
    
For DWI data, the script also processes diffusion SBRef files (if present).

Usage:
    python bidsify_hcp.py --input_dir /path/to/HCP1200 \
        --subject 100307 --output_dir /path/to/BIDS_dataset
"""

import os
import glob
import re
import argparse
import shutil
import json
import nibabel as nib

# Standard value for TotalReadoutTime (modify as needed)
TOTAL_READOUT_TIME = 0.05

def get_repetition_time(nifti_path):
    """Extract the RepetitionTime from the NIfTI header (4th dimension voxel size)."""
    try:
        img = nib.load(nifti_path)
        zooms = img.header.get_zooms()
        if len(zooms) >= 4:
            return zooms[3]
        else:
            return None
    except Exception as e:
        print(f"Error reading header from {nifti_path}: {e}")
        return None

def create_json_file(json_path, data):
    """Write a JSON sidecar file."""
    with open(json_path, 'w') as f:
        json.dump(data, f, indent=4)
    print(f"Created JSON sidecar: {json_path}")

def process_anat(subject, input_base, output_base):
    """Process anatomical scans (T1w and T2w)."""
    anat_dir = os.path.join(output_base, f"sub-{subject}", "ses-3T", "anat")
    os.makedirs(anat_dir, exist_ok=True)

    # Process T1w
    t1w_src = os.path.join(input_base, subject, "unprocessed", "3T", "T1w_MPR1", f"{subject}_3T_T1w_MPR1.nii.gz")
    if os.path.exists(t1w_src):
        t1w_dest = os.path.join(anat_dir, f"sub-{subject}_ses-3T_T1w.nii.gz")
        shutil.copy2(t1w_src, t1w_dest)
        print(f"Copied T1w: {t1w_src} -> {t1w_dest}")
    else:
        print(f"T1w source not found: {t1w_src}")

    # Process T2w
    t2w_src = os.path.join(input_base, subject, "unprocessed", "3T", "T1w_SPC1", f"{subject}_3T_T2w_SPC1.nii.gz")
    if os.path.exists(t2w_src):
        t2w_dest = os.path.join(anat_dir, f"sub-{subject}_ses-3T_T2w.nii.gz")
        shutil.copy2(t2w_src, t2w_dest)
        print(f"Copied T2w: {t2w_src} -> {t2w_dest}")
    else:
        print(f"T2w source not found: {t2w_src}")

def process_dwi(subject, input_base, output_base):
    """Process diffusion-weighted imaging (DWI) including SBRef files."""
    dwi_input = os.path.join(input_base, subject, "unprocessed", "3T", "Diffusion")
    dwi_output = os.path.join(output_base, f"sub-{subject}", "ses-3T", "dwi")
    os.makedirs(dwi_output, exist_ok=True)

    # Process main DWI files (skip bias and SBRef files)
    nifti_files = glob.glob(os.path.join(dwi_input, "*.nii.gz"))
    for nifti in nifti_files:
        basename = os.path.basename(nifti)
        if "BIAS" in basename or "SBRef" in basename:
            continue

        m = re.search(r"DWI_dir(?P<acq>\d+)_?(?P<dir>[A-Z]{2})", basename)
        if m:
            acq = m.group("acq")
            dir_val = m.group("dir")
        else:
            acq = "unknown"
            dir_val = "unknown"

        bids_name = f"sub-{subject}_ses-3T_acq-{acq}_dir-{dir_val}_dwi.nii.gz"
        dest_path = os.path.join(dwi_output, bids_name)
        shutil.copy2(nifti, dest_path)
        print(f"Copied DWI: {nifti} -> {dest_path}")

        for ext in [".bval", ".bvec"]:
            src_ext = nifti.replace(".nii.gz", ext)
            if os.path.exists(src_ext):
                dest_ext = dest_path.replace(".nii.gz", ext)
                shutil.copy2(src_ext, dest_ext)
                print(f"Copied {ext} file: {src_ext} -> {dest_ext}")

    # Process DWI SBRef files
    sbref_files = glob.glob(os.path.join(dwi_input, "*SBRef.nii.gz"))
    for sbref in sbref_files:
        basename = os.path.basename(sbref)
        m = re.search(r"DWI_dir(?P<acq>\d+)_?(?P<dir>[A-Z]{2})", basename)
        if m:
            acq = m.group("acq")
            dir_val = m.group("dir")
        else:
            acq = "unknown"
            dir_val = "unknown"
        bids_name = f"sub-{subject}_ses-3T_acq-{acq}_dir-{dir_val}_sbref.nii.gz"
        dest_path = os.path.join(dwi_output, bids_name)
        shutil.copy2(sbref, dest_path)
        print(f"Copied DWI SBRef: {sbref} -> {dest_path}")

def process_func_rest(subject, input_base, output_base, total_readout=TOTAL_READOUT_TIME):
    """Process resting-state fMRI scans and their fieldmaps."""
    for pe_dir in ["LR", "RL"]:
        func_input = os.path.join(input_base, subject, "unprocessed", "3T", f"rfMRI_REST_{pe_dir}")
        if not os.path.exists(func_input):
            print(f"Resting fMRI folder not found: {func_input}")
            continue

        func_output = os.path.join(output_base, f"sub-{subject}", "ses-3T", "func")
        os.makedirs(func_output, exist_ok=True)

        # Unique identifier for this run's fieldmap pairing
        identifier = f"fmap_{subject}_rest_{pe_dir.lower()}"

        # Use the phase encoding direction as part of the acquisition label for fMRI files.
        sbref_bids = f"sub-{subject}_ses-3T_task-rest_acq-{pe_dir.lower()}_sbref.nii.gz"
        bold_bids  = f"sub-{subject}_ses-3T_task-rest_acq-{pe_dir.lower()}_bold.nii.gz"

        for sbref in glob.glob(os.path.join(func_input, f"{subject}_3T_rfMRI_REST1_*_SBRef.nii.gz")):
            dest_path = os.path.join(func_output, sbref_bids)
            shutil.copy2(sbref, dest_path)
            print(f"Copied Rest SBRef: {sbref} -> {dest_path}")
            rt = get_repetition_time(sbref)
            json_data = {
                "RepetitionTime": rt,
                "TotalReadoutTime": total_readout,
                "PhaseEncodingDirection": pe_dir.lower(),
                "TaskName": "rest"
            }
            create_json_file(dest_path.replace(".nii.gz", ".json"), json_data)

        for bold in glob.glob(os.path.join(func_input, f"1{subject}_3T_rfMRI_REST1_*.nii.gz")):
            dest_path = os.path.join(func_output, bold_bids)
            shutil.copy2(bold, dest_path)
            print(f"Copied Rest BOLD: {bold} -> {dest_path}")
            rt = get_repetition_time(bold)
            json_data = {
                "RepetitionTime": rt,
                "TotalReadoutTime": total_readout,
                "PhaseEncodingDirection": pe_dir.lower(),
                "TaskName": "rest",
                "B0FieldSource": identifier
            }
            create_json_file(dest_path.replace(".nii.gz", ".json"), json_data)

        fmap_output = os.path.join(output_base, f"sub-{subject}", "ses-3T", "fmap")
        os.makedirs(fmap_output, exist_ok=True)
        for fmap in glob.glob(os.path.join(func_input, "*SpinEchoFieldMap*.nii.gz")):
            basename = os.path.basename(fmap)
            m = re.search(r"SpinEchoFieldMap_(?P<pe>[A-Z]{2})", basename)
            fmap_pe = m.group("pe").lower() if m else "unknown"
            # Fieldmap file name for resting is built using acq-rest
            bids_name = f"sub-{subject}_ses-3T_acq-rest_dir-{fmap_pe}_epi.nii.gz"
            dest_fmap = os.path.join(fmap_output, bids_name)
            shutil.copy2(fmap, dest_fmap)
            print(f"Copied Rest Fieldmap: {fmap} -> {dest_fmap}")
            rt = get_repetition_time(fmap)
            json_data = {
                "RepetitionTime": rt,
                "TotalReadoutTime": total_readout,
                "PhaseEncodingDirection": fmap_pe,
                "IntendedFor": [f"ses-3T/func/{bold_bids}"],
                "B0FieldIdentifier": identifier
            }
            create_json_file(dest_fmap.replace(".nii.gz", ".json"), json_data)

def process_func_task(subject, input_base, output_base, total_readout=TOTAL_READOUT_TIME):
    """Process task fMRI scans and their fieldmaps."""
    tfmri_base = os.path.join(input_base, subject, "unprocessed", "3T")
    task_folders = [d for d in os.listdir(tfmri_base) if d.startswith("tfMRI_")]
    for folder in task_folders:
        m = re.match(r"tfMRI_(?P<task>[A-Z]+)_(?P<dir>[A-Z]{2})", folder)
        if not m:
            print(f"Could not parse task folder: {folder}")
            continue
        task = m.group("task").lower()
        pe_dir = m.group("dir")
        func_input = os.path.join(tfmri_base, folder)
        func_output = os.path.join(output_base, f"sub-{subject}", "ses-3T", "func")
        os.makedirs(func_output, exist_ok=True)

        identifier = f"fmap_{subject}_{task}_{pe_dir.lower()}"

        sbref_bids = f"sub-{subject}_ses-3T_task-{task}_acq-{pe_dir.lower()}_sbref.nii.gz"
        bold_bids  = f"sub-{subject}_ses-3T_task-{task}_acq-{pe_dir.lower()}_bold.nii.gz"

        for sbref in glob.glob(os.path.join(func_input, f"{subject}_3T_rfMRI_{m.group('task')}_{pe_dir}_SBRef.nii.gz")):
            dest_path = os.path.join(func_output, sbref_bids)
            shutil.copy2(sbref, dest_path)
            print(f"Copied Task SBRef: {sbref} -> {dest_path}")
            rt = get_repetition_time(sbref)
            json_data = {
                "RepetitionTime": rt,
                "TotalReadoutTime": total_readout,
                "PhaseEncodingDirection": pe_dir.lower(),
                "TaskName": task
            }
            create_json_file(dest_path.replace(".nii.gz", ".json"), json_data)

        for bold in glob.glob(os.path.join(func_input, f"{subject}_3T_rfMRI_{m.group('task')}_{pe_dir}.nii.gz")):
            dest_path = os.path.join(func_output, bold_bids)
            shutil.copy2(bold, dest_path)
            print(f"Copied Task BOLD: {bold} -> {dest_path}")
            rt = get_repetition_time(bold)
            json_data = {
                "RepetitionTime": rt,
                "TotalReadoutTime": total_readout,
                "PhaseEncodingDirection": pe_dir.lower(),
                "TaskName": task,
                "B0FieldSource": identifier
            }
            create_json_file(dest_path.replace(".nii.gz", ".json"), json_data)

        fmap_output = os.path.join(output_base, f"sub-{subject}", "ses-3T", "fmap")
        os.makedirs(fmap_output, exist_ok=True)
        for fmap in glob.glob(os.path.join(func_input, "*SpinEchoFieldMap*.nii.gz")):
            basename = os.path.basename(fmap)
            m_fmap = re.search(r"SpinEchoFieldMap_(?P<pe>[A-Z]{2})", basename)
            fmap_pe = m_fmap.group("pe").lower() if m_fmap else "unknown"
            # Fieldmap file name now does NOT include a "task-" label.
            # The task info is captured in the acq- label (e.g., acq-emotionfmap)
            bids_name = f"sub-{subject}_ses-3T_acq-{task}fmap_dir-{fmap_pe}_epi.nii.gz"
            dest_fmap = os.path.join(fmap_output, bids_name)
            shutil.copy2(fmap, dest_fmap)
            print(f"Copied Task Fieldmap: {fmap} -> {dest_fmap}")
            rt = get_repetition_time(fmap)
            json_data = {
                "RepetitionTime": rt,
                "TotalReadoutTime": total_readout,
                "PhaseEncodingDirection": fmap_pe,
                "IntendedFor": [f"ses-3T/func/{bold_bids}"],
                "B0FieldIdentifier": identifier
            }
            create_json_file(dest_fmap.replace(".nii.gz", ".json"), json_data)

def main():
    parser = argparse.ArgumentParser(description="BIDS-ify HCP 3T data for one subject with fMRI fieldmap pairing and DWI SBRef processing")
    parser.add_argument("--input_dir", required=True,
                        help="Path to the HCP1200 directory")
    parser.add_argument("--subject", required=True,
                        help="Subject ID (e.g., 100307)")
    parser.add_argument("--output_dir", required=True,
                        help="Path to output BIDS directory")
    args = parser.parse_args()

    print(f"Processing subject {args.subject} ...")
    process_anat(args.subject, args.input_dir, args.output_dir)
    process_dwi(args.subject, args.input_dir, args.output_dir)
    process_func_rest(args.subject, args.input_dir, args.output_dir)
    process_func_task(args.subject, args.input_dir, args.output_dir)
    print("BIDS-ification complete.")

if __name__ == "__main__":
    main()

Best,
Steven

Wow!

That actually gets me a good part of the way there and generates

└── ses-3T
    ├── anat
    │   └── sub-100206_ses-3T_T1w.nii.gz
    ├── dwi
    │   ├── sub-100206_ses-3T_acq-95_dir-LR_dwi.bval
    │   ├── sub-100206_ses-3T_acq-95_dir-LR_dwi.bvec
    │   ├── sub-100206_ses-3T_acq-95_dir-LR_dwi.nii.gz
    │   ├── sub-100206_ses-3T_acq-95_dir-LR_sbref.nii.gz
    │   ├── sub-100206_ses-3T_acq-95_dir-RL_dwi.bval
    │   ├── sub-100206_ses-3T_acq-95_dir-RL_dwi.bvec
    │   ├── sub-100206_ses-3T_acq-95_dir-RL_dwi.nii.gz
    │   ├── sub-100206_ses-3T_acq-95_dir-RL_sbref.nii.gz
    │   ├── sub-100206_ses-3T_acq-96_dir-LR_dwi.bval
    │   ├── sub-100206_ses-3T_acq-96_dir-LR_dwi.bvec
    │   ├── sub-100206_ses-3T_acq-96_dir-LR_dwi.nii.gz
    │   ├── sub-100206_ses-3T_acq-96_dir-LR_sbref.nii.gz
    │   ├── sub-100206_ses-3T_acq-96_dir-RL_dwi.bval
    │   ├── sub-100206_ses-3T_acq-96_dir-RL_dwi.bvec
    │   ├── sub-100206_ses-3T_acq-96_dir-RL_dwi.nii.gz
    │   ├── sub-100206_ses-3T_acq-96_dir-RL_sbref.nii.gz
    │   ├── sub-100206_ses-3T_acq-97_dir-LR_dwi.bval
    │   ├── sub-100206_ses-3T_acq-97_dir-LR_dwi.bvec
    │   ├── sub-100206_ses-3T_acq-97_dir-LR_dwi.nii.gz
    │   ├── sub-100206_ses-3T_acq-97_dir-LR_sbref.nii.gz
    │   ├── sub-100206_ses-3T_acq-97_dir-RL_dwi.bval
    │   ├── sub-100206_ses-3T_acq-97_dir-RL_dwi.bvec
    │   ├── sub-100206_ses-3T_acq-97_dir-RL_dwi.nii.gz
    │   └── sub-100206_ses-3T_acq-97_dir-RL_sbref.nii.gz
    ├── fmap
    │   ├── sub-100206_ses-3T_acq-relationalfmap_dir-lr_epi.json
    │   └── sub-100206_ses-3T_acq-relationalfmap_dir-lr_epi.nii.gz
    └── func

This is incomplete because of the error below but it looks right?

Now, of course, on a ChatGPT first pass for this it give me an error

  File "/lustre04/scratch/dlq/bidsify_hcf.py", line 280, in <module>
    main()
    ~~~~^^
  File "/lustre04/scratch/dlq/bidsify_hcf.py", line 276, in main
    process_func_task(args.subject, args.input_dir, args.output_dir)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre04/scratch/dlq/bidsify_hcf.py", line 260, in process_func_task
    create_json_file(dest_fmap.replace(".nii.gz", ".json"), json_data)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre04/scratch/dlq/bidsify_hcf.py", line 52, in create_json_file
    json.dump(data, f, indent=4)
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v4/Compiler/gcccore/python/3.13.2/lib/python3.13/json/__init__.py", line 179, in dump
    for chunk in iterable:
                 ^^^^^^^^
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v4/Compiler/gcccore/python/3.13.2/lib/python3.13/json/encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v4/Compiler/gcccore/python/3.13.2/lib/python3.13/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v4/Compiler/gcccore/python/3.13.2/lib/python3.13/json/encoder.py", line 439, in _iterencode
    o = _default(o)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v4/Compiler/gcccore/python/3.13.2/lib/python3.13/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
                    f'is not JSON serializable')
TypeError: Object of type float32 is not JSON serializable

But I think I can now fool around with this and the micapipe script and get something working tomorrow.

Forgot to mention that SliceTiming will be important for fMRI jsons.