Uploading multiple BIDS datasets into a single project

datalad
#1

Hi,

We have created a multi-site quantitative MRI dataset, part of which is currently hosted on OSF. There is a single folder, which encloses 32 sub-folders (one for each site). Each sub-folder is a BIDS-compatible dataset.

We would like to use the nice features of datalad with this dataset (e.g., being able to fetch individual subjects, track changes, etc.), but it is unclear how we can combine all those BIDS datasets into a single project. We need easy “one-liner” commands so that our users can download all the datasets at once. Our original plan was to upload these data to OpenNeuro (and reference them through datalad), but there doesn’t seem to be a feature for uploading multiple BIDS datasets into a single project (see below for a copy/paste of the ticket opened at OpenNeuro).

If anyone has some tips for us, that would be appreciated,

Thanks!
Julien

Copy/paste of the ticket with OpenNeuro (sorry for the messy format, but pdf upload is not possible here, and OpenNeuro tickets are private…)

Hi Alexandru,
Thank you for your quick reply and information. If there is a natural way to split the datasets up (if applicable, potentially under a common domain investigated for each or some other defining feature that separates the different sets). The datasets can all reference each other in the references and links field of the dataset_description.json file (https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#dataset_descriptionjson). That's correct, OpenNeuro expects the datasets are in the BIDS format and pass validation before uploading.
Regarding the datalad perspective for aggregation, I am not sure if this is possible. It may be good to raise it with Datalad.
Thank you for submitting a feature upvote suggestion for this!
Thank you, Franklin
Ticket: https://openneuro.freshdesk.com/helpdesk/tickets/246

On Mon, 6 May at 2:28 PM, Alexandru Foias <alexandru.foias@polymtl.ca> wrote:
Hi Franklin,
Our specific study includes multiple datasets with different subjects. The idea would be to aggregate all the datasets under a single project, but my understanding is that the OpenNeuro platform expects a BIDS dataset when you upload it. Please correct me if I'm wrong.
Is there another solution to aggregate the datasets in a single project from the datalad perspective?
Thanks,
Alexandru

From: "OpenNeuro" <support@openneuro.freshdesk.com> To: "alexandru foias" <alexandru.foias@polymtl.ca>
Sent: Monday, May 6, 2019 1:19:13 PM
Subject: Re: multiple dataset in a single project
Hi Alexandru,
Thank you for your message. Unfortunately, we do not currently have this feature supported. May I ask what your use case is? Are you trying to download multiple datasets at the same time?
Thank you, Franklin

On Mon, 6 May at 12:31 PM, Alexandru Foias <alexandru.foias@polymtl.ca> wrote:
Hi,
I would like to know if it's possible to compile multiple datasets in a single project.
Regards,
Alexandru
#2

This is not related to the datalad issue, but perhaps you should consider representing this data as a single BIDS dataset. The BIDS specification describes this option at https://bids-specification.readthedocs.io/en/stable/05-longitudinal-and-multi-site-studies.html#option-2-combining-sitescenters-into-one-dataset:

Alternatively you can combine data from all sites into one dataset. To identify which site each subject comes from you can add a site column in the participants.tsv file indicating the source site. This solution allows you to analyze all of the subjects together in one dataset. One caveat is that subjects from all sites will have to have unique labels. To enforce that and improve readability you can use a subject label prefix identifying the site. For example sub-NUY001, sub-MIT002, sub-MPG002 etc. Remember that hyphens and underscores are not allowed in subject labels.

After all, this dataset is only 2.4 GB and will be presented in a single paper (and I assume you will analyze data from all sites together). This approach will also help with the OpenNeuro upload.
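As a rough sketch of the relabelling described in the quoted passage, the shell helpers below build a site-prefixed subject label and merge per-site participants.tsv files into one table with an added site column. The function names (site_label, merge_participants) and the folder layout are hypothetical, not part of BIDS or datalad. The sketch assumes that participant_id is the first column and that every site's file has the same columns in the same order. It only rewrites the table; renaming the sub-* folders and files themselves is not covered.

```shell
# Hypothetical helper: build a site-prefixed BIDS subject label.
# Hyphens, underscores and other non-alphanumeric characters are stripped,
# because they are not allowed inside a subject label.
site_label() (
    site=$(printf '%s' "$1" | tr -cd '[:alnum:]')
    sub=$(printf '%s' "${2#sub-}" | tr -cd '[:alnum:]')
    printf 'sub-%s%s\n' "$site" "$sub"
)

# Hypothetical helper: merge <root>/<site>/participants.tsv into one table.
# Assumes participant_id is the first column and all sites share one header.
merge_participants() (
    root=$1
    out=$2
    first=1
    for site_dir in "$root"/*/; do
        [ -f "${site_dir}participants.tsv" ] || continue
        site=$(basename "$site_dir" | tr -cd '[:alnum:]')
        awk -F'\t' -v OFS='\t' -v site="$site" -v first="$first" '
            # Keep the header from the first site only, with a new "site" column.
            NR == 1 { if (first) print $0, "site"; next }
            # Prefix the subject label with the site and append the site value.
            { sub(/^sub-/, "", $1); $1 = "sub-" site $1; print $0, site }
        ' "${site_dir}participants.tsv"
        first=0
    done > "$out"
)
```

For example, `merge_participants ./sites ./participants.tsv` would read `./sites/<site>/participants.tsv` for every site folder and write the combined table to `./participants.tsv`.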

#3

If you do decide to go the multi-dataset route, then datalad can help you here.

Supposing you have a set of datasets with URLs $DS1, $DS2, and $DS3:

datalad create dsmeta
cd dsmeta
datalad install -d . $DS1 $DS2 $DS3

This will not be a BIDS dataset (though I think we do need to consider meta-datasets in the near future), but it does allow you to aggregate groups of BIDS datasets in a version-controlled manner.
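With 32 sites, the same idea could be scripted. The sketch below is only an illustration: sites.txt is a hypothetical file with one "<site-name> <url>" pair per line, and run and install_sites are made-up helper names. By default (DRY_RUN=1) it prints the datalad commands instead of running them. Setting DRY_RUN=0 from inside the superdataset created with `datalad create` would execute them.

```shell
# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: register one subdataset per line of a site list.
# Each line holds "<site-name> <url>"; blank lines and "#" comments are skipped.
# Must be called from inside the superdataset, since it uses "-d .".
install_sites() (
    list=$1
    # Read the list on fd 3 so the executed commands keep their own stdin.
    while read -r name url <&3; do
        case "$name" in ''|'#'*) continue ;; esac
        run datalad install -d . -s "$url" "$name"
    done 3< "$list"
)
```

Once such a superdataset is published, users should be able to fetch every site in one command with `datalad install -r -g <superdataset-url>`, where the URL is a placeholder: `-r` recurses into the subdatasets and `-g` also retrieves the file content.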

#4

Thanks a lot for your quick and helpful replies @ChrisGorgolewski and @effigies.

We did consider the single-BIDS-dataset approach, but one limitation was the (slight) difficulty in sharing a subset of data coming from a single site only (easier if it’s already an isolated standalone BIDS dataset). But from your answers, it seems that the benefits of having a single multi-site BIDS dataset outweigh the disadvantages. We will go that route then :)

Again, thanks a lot for your help, and for maintaining these awesome technologies!
Julien
