Exclude a specific scan using regexp in BIDS filter file?

A collaborator recently sent us a BIDS dataset with 3 diffusion series. Unfortunately, we cannot process the third due to its reduced FOV:

  1. sub-01_dwi.nii.gz
  2. sub-01_acq-single_dwi.nii.gz
  3. sub-01_acq-multib1000b2000_dwi.nii.gz

I would like to keep all 3 in the raw BIDS dataset, but exclude the 3rd scan from processing (QSIprep).

I have tried the following BIDS filter files, but they either fail to exclude the 3rd series or select only the second. How might I improve these to select only the 1st and 2nd series, specifically with regex (I'm still new to regex)?

I thought a negative lookahead would work here, but it selected all 3 series.

{
    "dwi": {
        "acquisition": "(?!multib1000b2000)",
        "regex_search": true
    }
}

I thought including "single" and null for acquisition would select 1 and 2, but it only selected 2.

{
    "dwi": {
        "acquisition": ["single", null]
    }
}
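
For reference, running the same filters through pybids directly makes the behavior easier to see; a minimal sketch (the dataset path matches the commands later in this thread):

from bids import BIDSLayout

# Build a layout over the raw dataset
layout = BIDSLayout("/home/user/BIDS", validate=False)

# The first filter above: with regex_search, pybids uses re.search, so
# an unanchored lookahead succeeds somewhere in every string and all
# three series come back.
print(layout.get(suffix="dwi", extension=".nii.gz",
                 acquisition="(?!multib1000b2000)", regex_search=True))

# Anchoring the lookahead does exclude the multi series, but a string
# filter still cannot match series 1, which has no acq- entity at all;
# that is what the null in the second filter file is for.
print(layout.get(suffix="dwi", extension=".nii.gz",
                 acquisition="^(?!multib1000b2000)", regex_search=True))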

You can try adding */dwi/*multi* to a .bidsignore file, assuming “multi” appears only in the files you want to ignore.

Sadly, this seems to only affect the BIDS validator. QSIprep still reads in the *multi* acquisition.

  1. Install pybids
  2. Create a pybids database: pybids layout <bids_root> <database_dir> --no-validate --index-metadata, making sure the .bidsignore from before (the one that ignores those multi scans) is still present
  3. Pass the <database_dir> into QSIPrep with the --bids-database-dir argument

Unfortunately, the pybids database option did not exclude the *multi* file.

.bidsignore:

*/dwi/*multi*

pybids call:

pybids layout /home/user/BIDS/ \
    /home/user/BIDSdb/ \
    --no-validate --index-metadata

qsiprep was called using the --bids-database-dir option:

singularity run /home/user/Images/qsiprep_0.14.3.sif \
    /home/user/BIDS/ \
    /home/user/BIDS/derivatives \
    participant --participant-label sub-01 \
    --bids-filter-file /home/user/BIDS/code/filters/bids_filter.json \
    --bids-database-dir /home/user/BIDSdb/ \
    --output-resolution 1.7

Might there be an issue with where I saved the generated SQLite database?
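
For reference, reloading the saved database directly might show whether the multi scan was indexed at all; a minimal sketch (as I understand it, pybids loads an existing database when only database_path is given):

from bids import BIDSLayout

# Reload the saved SQLite database rather than re-indexing the dataset
layout = BIDSLayout(database_path="/home/user/BIDSdb")

# If the .bidsignore was honored at indexing time, this should be empty
print(layout.get(suffix="dwi", acquisition="multib1000b2000"))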

I’m not totally sure why those files aren’t being excluded, but I can tell you how I’d go about solving this problem.

In our group we typically process each subject on their own and combine the results after all the subjects have finished. The BIDS data lives on a network drive; when a job is sent to the scheduler, it copies a single subject to a local scratch directory and runs qsiprep on the local copy. If you do something like this, you can simply delete the *multi* scans from your local BIDS copy before you run qsiprep. This also avoids the huge time drain of letting pybids index everything at the beginning, because qsiprep will see a BIDS input with a single subject.
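
Roughly, that staging step might look like the following (a sketch only; the network and scratch paths are hypothetical):

import glob
import os
import shutil

src = "/network/BIDS"          # hypothetical shared BIDS root
scratch = "/scratch/job/BIDS"  # hypothetical node-local scratch copy
sub = "sub-01"

# Copy just this subject plus the top-level metadata to local scratch
os.makedirs(scratch, exist_ok=True)
shutil.copytree(os.path.join(src, sub), os.path.join(scratch, sub))
for name in ("dataset_description.json", "participants.tsv"):
    if os.path.exists(os.path.join(src, name)):
        shutil.copy(os.path.join(src, name), scratch)

# Delete the reduced-FOV acquisition before qsiprep ever sees it
for path in glob.glob(os.path.join(scratch, sub, "dwi", "*acq-multib1000b2000*")):
    os.remove(path)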

This solution seems the simplest, thank you!

Hi,

I was hoping to follow up on your comment about copying subjects to a local scratch directory for processing. Currently, we create a temporary BIDS directory with a symlink pointing to the subject's raw directory. For derivative datasets, I realize that some pipelines keep data files alongside the subject directories that they use for some of their operations (e.g. pyAFQ creating a tract-profiles spreadsheet).

I was wondering if your group has come across any solutions you could recommend for making sure the temporary BIDS directory you create for each subject also points to the relevant derivative datasets?
My naive approach, sketched below, is:

  1. Recreate the base directory tree of derivatives
  2. Symlink subject-specific directory for each pipeline directory
  3. Symlink any file or directory whose name does not match '^sub-.*$'
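
In Python, the three steps might look roughly like this (paths are hypothetical, and the sketch assumes each pipeline directory holds per-subject folders plus shared, non-subject files):

import os
import re

deriv_src = "/home/user/BIDS/derivatives"   # hypothetical real derivatives
deriv_tmp = "/tmp/bids-sub-01/derivatives"  # hypothetical temp BIDS dir
sub = "sub-01"

for pipeline in os.listdir(deriv_src):      # e.g. "qsiprep", "afq"
    src_dir = os.path.join(deriv_src, pipeline)
    if not os.path.isdir(src_dir):
        continue
    tmp_dir = os.path.join(deriv_tmp, pipeline)
    os.makedirs(tmp_dir, exist_ok=True)     # step 1: recreate the tree
    for entry in os.listdir(src_dir):
        # step 2: link this subject's own directory;
        # step 3: link anything whose name is not subject-specific
        if entry == sub or not re.match(r"^sub-", entry):
            os.symlink(os.path.join(src_dir, entry),
                       os.path.join(tmp_dir, entry))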

Hi,

This is a bit vague for me to understand completely, but…

You can create a derivatives folder in the temporary BIDS folder that contains just the preprocessed derivatives for that subject. I have done this with qsiprep/pyAFQ, as well as with post-processing of fMRIPrep data, for what it's worth. pyAFQ in particular has a ParticipantAFQ API that is meant for single-subject processing.

Best,
Steven

Thank you for the suggestion! For files such as derivatives/afq/tract_profiles.csv, do you manually merge these after processing?

I haven't done a full analysis with pyAFQ, so I cannot say for sure whether a merge would make sense with its group analysis API.

I guess the other option would be to just run single-subject analyses first, then run a group analysis after all subjects finish.
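
If a manual merge does turn out to be necessary, a simple pandas sketch might look like this (the per-subject layout below is an assumption, not pyAFQ's documented output structure):

import glob
import pandas as pd

# Assumed layout: one tract_profiles.csv per subject after single-subject runs
frames = []
for path in sorted(glob.glob("derivatives/afq/sub-*/tract_profiles.csv")):
    df = pd.read_csv(path)
    df["subject"] = path.split("/")[-2]  # tag rows with the subject ID
    frames.append(df)

pd.concat(frames, ignore_index=True).to_csv(
    "derivatives/afq/tract_profiles.csv", index=False)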