Adding failed-QC scans to a BIDS dataset

We are converting to BIDS a dataset that was QCed several years ago. Initially, we did not include the failed-QC scans, because we thought they were useless anyway. But we noticed that some failed-QC scans are still useful for some purposes (the structure of interest is sharp and well defined). So, following the example of a BIDS dataset provided by one of our colleagues, we would like to add the failed-QC data but mark them with a BAD* prefix, e.g., sub-control01_BADT1w.nii.gz.

We know these pseudo-modalities need to be added to .bidsignore. But we wanted to ask the community whether the various pipelines (e.g., fMRIPrep, QSIPrep, FreeSurfer) are smart enough not to pick up these bad files during standard processing. This could happen if a pipeline uses globbing to find, e.g., all *T1w* files in a folder. We plan to do this for all the modalities we have, including bold and dwi data, so the question is fairly general: can this approach of including failed-QC scans work without compromising BIDS apps?
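To make the concern concrete, here is a minimal sketch (the filenames and the *_BAD* ignore pattern are just illustrative, following our colleague's example) of how a greedy substring match would pick up a BAD scan while a strict suffix match would not:

```python
import re

files = [
    "sub-control01_T1w.nii.gz",
    "sub-control01_BADT1w.nii.gz",   # failed-QC scan, listed in .bidsignore as *_BAD*
]

# Greedy substring matching (what a naive glob like *T1w* would do) picks up both:
greedy = [f for f in files if "T1w" in f]

# Suffix matching requires "_" before T1w and "." after,
# which excludes the BAD pseudo-modality:
strict = [f for f in files if re.search(r"_T1w\.(nii|nii\.gz)$", f)]

print(greedy)  # ['sub-control01_T1w.nii.gz', 'sub-control01_BADT1w.nii.gz']
print(strict)  # ['sub-control01_T1w.nii.gz']
```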

Thank you.

The usual approach for BIDS apps is to specifically ask for the T1w suffix, which means it will expect _ before and . after. I can confidently say that fMRIPrep will not use BADT1w, and I strongly suspect QSIPrep will be the same (@mattcieslak?) since they reused a lot of fMRIPrep code. Looking at the FreeSurfer BIDS App, every “T1w” is of the form _T1w.nii*, which should be safe as well.
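As a rough illustration (not the exact fMRIPrep code; the dataset path is a placeholder, and exact indexing behavior may vary by pybids version), a suffix-based pybids query looks like this and will not return the BAD-prefixed files:

```python
from bids import BIDSLayout

# Placeholder path; validate=False to be permissive about the non-standard
# BAD* files (how strictly they are handled depends on the pybids version).
layout = BIDSLayout("/data/my_bids_dataset", validate=False)

# Files are selected by their parsed suffix, not by substring matching, so
# sub-control01_BADT1w.nii.gz (parsed suffix "BADT1w") is not returned here.
t1w_files = layout.get(suffix="T1w", extension=[".nii", ".nii.gz"])
print([f.path for f in t1w_files])
```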

Sounds great. Thanks @effigies

QSIPrep will also honor the .bidsignore, but this seems like a risky strategy. Our group has recently been using datalad to keep a branch with all the bad data in it, and a main branch with just the good, fully BIDS-valid data. If anyone really wants the bad data, they can check out that branch.


To be clear, fMRIPrep does not honor .bidsignore. It simply will not find these images because they don’t match the patterns we look for.

We use datalad too, and keeping the whole dataset, including failed-QC scans, in a different branch is quite a good idea. But the concept of branches in datalad was a bit unclear to us; if I remember correctly, we ran into errors when merging remotes from different places. On the other hand, we are preparing data mostly for shipping out; we don’t use the data much ourselves. I tried using datalad to export a 1.3 TB dataset to a USB drive and it took nearly a month (this happened in August). So, using datalad branches can be useful locally, but I don’t plan to ship datalad datasets to collaborators so they can switch branches as needed. Maybe datalad will get better in the future, but for now the main use would be local. Still, keeping a “clean” and a “dirty” branch from a QC perspective is a good idea.

@mattcieslak just to understand your workflow better.

If you change something in the “clean” branch, then you would need to merge those changes into the “dirty” branch each time, right? Otherwise the two branches would differ not just in the presence of failed-QC data but also in other things. So, keeping two branches will require some maintenance to keep them synced on the portion of data they have in common, correct?

In our group’s workflow we create a version of the BIDS data that includes every image, even the ones we don’t plan on using, and commit it to a branch. Then we check out the main branch, delete all the images that won’t be used, commit, and never change the BIDS data again. The outputs go in separate datalad datasets.
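For what it’s worth, one way to script that layout might look like the sketch below; the dataset path, branch name, and *_BAD* pattern are placeholders, and it assumes an existing datalad dataset with the failed-QC files already in the working tree:

```python
import subprocess
from pathlib import Path

DS_PATH = Path("/data/study_bids")   # hypothetical dataset location

def run(*cmd):
    subprocess.run(cmd, cwd=DS_PATH, check=True)

# 1. Save everything, failed-QC scans included, on the current branch.
run("datalad", "save", "-m", "BIDS conversion including failed-QC scans")

# 2. Keep a branch pointing at this state so the bad scans remain retrievable.
run("git", "branch", "with-failed-qc")

# 3. On the main branch, drop the failed-QC files and save the clean state.
for bad in sorted(DS_PATH.rglob("sub-*_BAD*")):   # placeholder pattern
    if not bad.is_dir():
        bad.unlink()
run("datalad", "save", "-m", "Keep only QC-passing scans on main")
```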


I guess our use case is a bit different, then. We are in the process of cleaning and improving these datasets, and until that is done we keep datalad-saving. I think once we have a good final dataset, including bad scans, we might create a “clean” branch.
Pinging @ins0mniac2