Raw and derivatives separation

Hi all,

I’m wondering whether separating types of data higher up in the hierarchy would be considered BIDS compatible, or whether it would be likely to create any problems. My raw_bids folder is fully compatible (it passes the validator), but it is not clear to me whether the project as a whole would be compatible.
IMO it keeps the principle of separating datatypes and makes navigation within the project easier.

The layout I have in mind is below:

Project Name/
raw_bids/
sub-0001/…
sub-0002/…

derivatives/
pipeline1/
sub-0001/…
sub-0002/…

sourcedata/

Looks like the formatting was lost when posting…
The idea is that there are three folders under project_name, representing the three major types of data: source, raw, and derivatives. Each of them follows the spec.

Thanks,
nir

Hi @nirjac

You can use triple-backticks as follows:

```
<pre-formatted text>
```

This will preserve the spaces. I’m guessing you had something like:

Project Name/
  raw_bids/
    sub-0001/…
    sub-0002/…
    …
  derivatives/
    pipeline1/
      sub-0001/…
      sub-0002/…
    …
  sourcedata/

This is perfectly fine. In this case your BIDS directory would be Project Name/raw_bids/, which is a valid name, and derivatives datasets may be standalone. derivatives/ and sourcedata/ are only required directory names when you want to nest datasets, but even here there’s some flexibility. My preferred arrangement is actually the inverse of the prototype from the spec:

Project/
  derivatives/
    fmriprep/
      sourcedata/  # original dataset as a datalad submodule
        sub-01/
        ...
      code/
        fmriprep_<version>.simg
      sub-01/
      ...

This allows the specific version of the source data and the preprocessing pipeline to be stored with the derivatives dataset.
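For anyone who wants to try the side-by-side layout from the original question, here is a minimal sketch that scaffolds it with the standard library. The function name `scaffold` and its parameters are made up for illustration. It only creates empty directories, so the result will not pass the validator until real data and metadata are added.

```python
import tempfile
from pathlib import Path


def scaffold(project_root, subjects, pipeline="pipeline1"):
    """Create empty raw, derivatives and sourcedata trees side by side.

    Hypothetical helper: folder names follow the layout in the question.
    """
    project = Path(project_root)
    raw = project / "raw_bids"
    deriv = project / "derivatives" / pipeline
    for sub in subjects:
        # One subject folder in the raw dataset, one in the pipeline output.
        (raw / f"sub-{sub}").mkdir(parents=True, exist_ok=True)
        (deriv / f"sub-{sub}").mkdir(parents=True, exist_ok=True)
    (project / "sourcedata").mkdir(parents=True, exist_ok=True)
    return raw, deriv


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    raw, deriv = scaffold(tmp, ["0001", "0002"])
    print(sorted(p.name for p in raw.iterdir()))
```

The returned `raw` path is what you would hand to a BIDS App as the input dataset, and `deriv` is where that pipeline's outputs would live.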

To clarify,

Yes, Project/raw_bids is indeed my BIDS folder (the input for a BIDS App).
Project/derivatives is my output folder (when calling fMRIPrep, for example).
Should that be OK under the BIDS derivatives extension? And should it work with FitLins and future derivatives apps?

Yes, this should be fine.

So, do I have this right: if I have a standalone derivatives dataset, I don’t need the derivatives folder? At the moment having just the derivatives in the root with no subject directories is giving me the following error.

Error 1: [Code 45] SUBJECT_FOLDERS

There are no subject folders (labeled “sub-*”) in the root of this dataset.

Hi @dprice80,

Thank you for your question. I believe you do need to name the folder derivatives, so that the validator sees it and skips it, because derivatives have not been officially merged into the specification yet. Within derivatives you may not need subject directories, because that has not been specified yet.

Reopening this thread (sorry for previous post! Accidentally hit the hotkey to submit post before I was finished):

Since this original thread, have there been updates to the recommended BIDS raw vs. derivatives structure? I’m currently working on following this tutorial to run FSL GLMs on BIDS-compliant fmriprep-ped data. The tutorial depends on pybids, so I’ve been taking a look at that package.

I notice that in the newest version of pybids, once you’ve initialized a BIDSLayout object pointing to a raw BIDS directory, it lets you automatically get entities in the derivatives folder, but only if the derivatives folder is inside the raw BIDS folder. Otherwise, it appears that you can manually specify the location of the derivatives folder using BIDSLayout.add_derivatives(path="/absolute/path/to/derivatives"), but that’s more verbose. (I tried it with a path relative to the raw BIDS folder, but no dice.)
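In case it helps anyone with the same side-by-side layout, here is a rough sketch of the workaround: turn the relative path into an absolute one before handing it to add_derivatives. The helper names `resolve_derivatives` and `load_layout` are my own, not part of pybids, and the exact behavior of BIDSLayout and add_derivatives may differ between pybids versions. As far as I know, pybids also expects the derivatives folder to carry its own dataset_description.json, so treat this as a starting point rather than a recipe.

```python
import os


def resolve_derivatives(raw_root, derivatives):
    """Return an absolute, normalized derivatives path.

    Relative paths are interpreted against the raw BIDS root rather than
    the current working directory (hypothetical helper, not a pybids API).
    """
    if not os.path.isabs(derivatives):
        derivatives = os.path.join(raw_root, derivatives)
    return os.path.abspath(derivatives)


def load_layout(raw_root, derivatives):
    """Index the raw dataset, then attach a derivatives dataset stored elsewhere."""
    # Imported lazily so the path helper above works without pybids installed.
    from bids import BIDSLayout

    layout = BIDSLayout(raw_root)
    layout.add_derivatives(resolve_derivatives(raw_root, derivatives))
    return layout
```

With an absolute raw root such as /data/project/raw_bids (a made-up location), calling load_layout with "../derivatives/pipeline1" would point pybids at /data/project/derivatives/pipeline1.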

Does this indicate that the recommended raw + derivatives setup is as below?

project/
  raw_bids/
    derivatives/
      pipeline1/
        sub-01/...
        sub-02/...
        ...
    sub-01/...
    sub-02/...
    ...

Thanks for your help/recommendations!