Building a BIDS-compliant subdataset with DataLad containing only specific MRI sequences

Hello everyone,

I am new to neuroimaging and am trying to establish a Python-based workflow like the following, with DataLad as the version control system:

  1. raw data

  2. BIDS dataset (created with HeuDiConv)

  3. BIDS subdataset / derivative with specific MRI sequences (resting-state fMRI)

  4. further analysis

Steps 1 and 2 contain anatomical, functional (task and rest), and DWI images. For my analysis, I would like to tailor the dataset to only the resting-state and anatomical data.
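To make the intended selection concrete, here is a minimal sketch using only the Python standard library. It assumes the resting-state runs carry the label `task-rest` in their filenames (an assumption; the actual label depends on the conversion heuristic), and `select_anat_and_rest` is a hypothetical helper name, not part of any BIDS tool.

```python
from pathlib import PurePosixPath


def select_anat_and_rest(relative_paths, rest_label="task-rest"):
    """Keep anatomical files and resting-state functional files.

    A sidecar JSON shares the filename stem of its image, so it is
    selected together with the image it describes.
    """
    selected = []
    for rel in relative_paths:
        path = PurePosixPath(rel)
        folders = path.parts[:-1]  # directory components only
        if "anat" in folders:
            selected.append(rel)
        elif "func" in folders and f"_{rest_label}_" in path.name:
            selected.append(rel)
    return selected


# Hypothetical file listing, relative to the BIDS root.
example = [
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-01/anat/sub-01_T1w.json",
    "sub-01/func/sub-01_task-rest_bold.nii.gz",
    "sub-01/func/sub-01_task-rest_bold.json",
    "sub-01/func/sub-01_task-nback_bold.nii.gz",
    "sub-01/dwi/sub-01_dwi.nii.gz",
]
for kept in select_anat_and_rest(example):
    print(kept)
```

Because the selection works on whole files, the per-image JSON sidecars should not need manual editing; top-level files such as dataset_description.json are not covered by this sketch.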

Could somebody give me a hint as to which tool would be best to achieve this? fMRIPrep, BIDSonym, and others need to be integrated into the workflow, but I did not see a smart way to perform further analysis with only parts of the initial data. Is there a DataLad (YODA-compliant) way of doing this, maybe via pybids or one of the nipy packages (nistats, nibabel, etc.)? Does one need to clean up the JSON files manually afterwards, or is there an automatic solution?

Kind regards

For 3., you don’t want derivatives to be a subdataset of BIDS, but rather the other way around. Maybe ReproNim/containers (a containers "distribution" for reproducible neuroimaging) could be of help? Overall, for YODA style, I think what would work is:

  • a directory with all those datasets, starting with the BIDS one (so it is kind of “not YODA” here); I guess this directory could also be a dataset
  • create a new derivatives (sub)dataset, install its sources within it in --reckless=ephemeral mode, produce results, and uninstall the source datasets
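The second point could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: `plan_derivative_step` is a hypothetical helper, the dataset names are placeholders, and the commands are only assembled as argument lists so they can be inspected before being passed to `subprocess.run`. As far as I know, `datalad uninstall` is deprecated in recent DataLad releases in favour of `datalad drop`, so check which one your version expects.

```python
import shlex


def plan_derivative_step(step, source, source_dir="source"):
    """Return the DataLad commands for one processing step as argv lists.

    All paths are relative to the directory that holds the datasets.
    """
    target = f"{step}/{source_dir}/{source}"
    return [
        # 1. Create the dataset that will hold this step's results
        #    (add "-d ." if the parent directory is itself a dataset).
        ["datalad", "create", step],
        # 2. Register the input dataset inside it as an ephemeral clone.
        ["datalad", "clone", "-d", step, "--reckless", "ephemeral", source, target],
        # 3. The pipeline itself would run here, e.g. wrapped in `datalad run`.
        # 4. Remove the local copy of the input once the results are saved.
        ["datalad", "uninstall", "-d", step, target],
    ]


# Placeholder names: a hypothetical "fMRIprep" step consuming "BIDS".
for argv in plan_derivative_step("fMRIprep", "BIDS"):
    print(shlex.join(argv))
```

Keeping the plan as data makes it easy to review or log each command before anything touches the datasets.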

Thanks for the hints. Let me see if I understood you correctly: I create the BIDS dataset in its own directory (BIDS/). Then, for each step, I create a new DataLad dataset and register the previous step as its source dataset via datalad clone --reckless=ephemeral. This would result in the following structure:


  • BIDS/
  • BIDSonym/
      • source/
          • BIDS/
  • fMRIprep/
      • source/
          • BIDSonym/
  • Analysis/
      • source/
          • fMRIprep/

What is the main reason for the --reckless=ephemeral mode? Is it to avoid duplicating file structures and annex content? What would be the disadvantage of a “normal” local clone of the “parent” dataset?

Regarding ReproNim/containers: I plan on using them, but I thought they don't automatically manage my data structure and only ensure that there is BIDS input and output?

Kind regards