Figuring out whether this directory is BIDS compliant, any help much appreciated!

Hi all,

We are learning to use BIDS and could use your input for the questions in the attached slide-image (sorry it is an image, but the visual structure really helps with the directory intuitively, so did not want to change that). Any suggestions welcome! This will be used to make material that we share online for others (e.g., a manual on resting-state analysis), so we want to make it easy for others to be BIDS compliant and your suggestions will help many others as well.

Thanks,

Ann*

ps. the image says ‘see next slide’ but new users are only allowed to upload 1 image so I could not share the whole presentation I had

Hi,
Thanks for reaching out! The best way to check if a given dataset is BIDS compliant is to run bids-validator on the folder. It runs on the command line and in the browser so should be easy to use.

Looking at your hierarchy the first thing that jumps out is mixing raw and derived data. This is something that is not supported by BIDS (see https://github.com/bids-standard/bids-specification/blob/master/src/02-common-principles.md#sourcevs-raw-vs-derived-data for more details). Raw data should be separate from derivatives to protect them from accidental overwriting and make it easier to share only raw data or to recover disk space by removing some derivatives that can be recreated given that the raw data exist.

Going through your example it should look like this:

README
dataset_description.json
participants.tsv
sub-kim001/
   anat/
      sub-kim001_T1w.nii.gz
   func/
      sub-kim001_task-rest_bold.nii.gz
      sub-kim001_task-rest_bold.json
   fmap/
      sub-kim001_fieldmap.nii.gz
      sub-kim001_fieldmap.json
      sub-kim001_magnitude.nii.gz

You’ll notice that I did not include derived data (more about this later) and included additional JSON files. They will need to include scanning parameters and metadata. You can read what need and can be in sub-kim001_task-rest_bold.json here, similarly for sub-kim001_fieldmap.json here. I would adjust your pipelines/tutorials to rely on this metadata when performing operations such as field unwarping (this is one of the reasons why BIDS exists - so the parameters crucial for processing are part of the dataset).

Now for the derived data. Currently BIDS only specifies that derivatives should go into the derivatives/<pipeline_name> subfolder, but there is no naming convention. This is how it would look like

README
dataset_description.json
participants.tsv
sub-kim001/
sub-kim002/
derivatives/
   rs_processing/
      sub-kim001/
          <pipeline outputs go here>

You will notice that each pipeline has its own directory. This enables creation of many different types and versions of derived data for each dataset.

It’s worth noting that we are working on adding support for processed data into BIDS. This effort is quite advanced and you can read about it in BIDS Derivatives RC1. That spec is not set in stone and some details might change in the future. However, if you were to follow it this is how your outputs would look like:

README
dataset_description.json
participants.tsv
sub-kim001/
   anat/
      sub-kim001_T1w.nii.gz
   func/
      sub-kim001_task-rest_bold.nii.gz
      sub-kim001_task-rest_bold.json
   fmap/
      sub-kim001_fieldmap.nii.gz
      sub-kim001_fieldmap.json
      sub-kim001_magnitude.nii.gz
derivatives/
   rs_preprocessing/
      sub-kim001/
         anat/
            sub-001_desc-biascorr_T1w.nii.gz
            sub-001_desc-biascorr_T1w.json
            sub-001_desc-skullstripped_T1w.nii.gz
            sub-001_desc-skullstripped_T1w.json

I’m not sure what is exacly inside the FEAT folder so I cannot comment on the details.

I hope this helps!

Hi Chris,

Thanks so much for your thorough reply! I hope many people will find it helpful for their BIDS-related questions.
In reply to our specific folder hierarchy, we are indeed mixing raw and derived data, but rest assured, we keep our raw data somewhere else as well.

The specific files we are working with here are meant for cluster analysis. For that same reason, we have a data folder with subfolders for participants (e.g., all subfolders are called sub-00x, etc.), but no other folders/files in this directory, so looping through with scripts is easy. Although I am a proponent of sharing data (and what have you) and I believe that this should be done with excellent documentation (because the data is only worth something if it can actually be understood), and although I wholeheartedly agree that having standards for this is a good thing, I fear that for the current purposes (a manual for a resting-state pipeline) the BIDS format is (not yet) a good fit.
I am going to read up on making json files though (never had them aside from when converting dicoms), do you happen to have a BIDS-manual for that (one that is really in concrete easy to follow steps for the nitwits, because I must admit the BIDS spec file is too technical for me and perhaps for most researchers who are trying to do the best they can in the time they have, but who are ultimately your intended BIDS compliers)? Anything that would make doing this more accessible is welcome.

Can you help me with one last thing: I wonder whether making a ‘mix’ where some of our foldernames are BIDS-compliant and some things are not (like we don’t have json files) is in this case is a good or a bad thing: will it give the appearance of being compliant only then to disappoint? Basically what we are working with needs to be moved into a folder named derivatives/rs_preprocessing, only we would then leave the BIDS format within the sub-kim001 folder:

derivatives/
rs_processing/
sub-kim001/
sub-kim001_task-rest_bold.nii.gz
sub-kim001_T1w.nii.gz
anat/
sub-001_desc-biascorr_T1w.nii.gz
sub-kim001_task-rest_preprocessing1.feat
sub-kim001_task-rest_preprocessing2.feat
sub-kim002/
etc.
26
As you can see, feat makes it’s own folders which you can name, and because lower level analysis are at the participant level we have placed them in the participant folder. I am curious to see how the BIDS support for processed data will develop, please let me know if I can be of any assistance.

Cheers,

Anna
ps. added a screenshot because tabs did not stay in the post (I will read up on embedding like you did in your posts as well!)

I understand, but I would consider iterating over participants rather than individual files when submitting jobs. This strategy leads to fewer and longer jobs (less time spend by the scheduler setting up the jobs). This scheme is well supported by BIDS Apps.

In most cases you might not have to create JSON files manually - if you are using (directly or through a tool like heudiconv) dcm2niix those can be automatically created if a specific command line flag is passed. However, if you are looking for a more down to earth introduction to BIDS check out https://github.com/bids-standard/bids-starter-kit which includes tutorials, examples and many other useful resources.

It depends. From a human readability perspective adhering to your own fork of a well adopted standard might be better than coming up with something completely new if you describe what the differences are. From a machine readability (think analysis pipelines or scripts taking BIDS datasets as inputs) strictly adhering to standards is the only way to go.

Have a look at https://docs.google.com/document/d/17ebopupQxuRwp7U7TFvS6BH03ALJOgGHufxK8ToAvyI/edit and leave a comment on the document when something is not clear or needs changing. Thanks!