Recommendations for Generating PyBIDS Layout on a Large-ish Dataset

Hello,

I’m working on a longitudinal study and our dataset is growing quickly (hooray!). However, I’ve noticed that as we acquire more data, BIDS Apps like fmriprep take longer and longer to get going. My guess is that fmriprep and many other programs start by creating a layout of the BIDS directory (using PyBIDS or a similar tool).

For fmriprep, we are using version 22.0.2 in a singularity image (we strongly prefer to keep this version because of the longitudinal nature of our data). I’m using the flags --skip-bids-validation and --bids-filter-file to try to speed up the BIDS directory layout creation, but there doesn’t seem to be any speedup.

My next idea was to try generating the default fmriprep layout as a file using the PyBIDS command: pybids layout /PATH/TO/BIDS/DIRECTORY/ /PATH/TO/DATABASE/FILE --no-validate --index-metadata. Unfortunately, I’m several hours in and have yet to generate a file. I was thinking I could automate a procedure to create one of these files every night as we acquire more data. But now I’m skeptical that it could even finish in a few hours overnight.
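For the nightly-job idea, here is a minimal sketch of wrapping that same command in Python so each run can be timed and logged. The function names and paths are placeholders invented for illustration, and the only `pybids layout` options used are the two from the command above.

```python
import subprocess
import time


def build_index_command(bids_dir, db_dir, index_metadata=True):
    """Return the argv list for `pybids layout` (hypothetical helper)."""
    cmd = ["pybids", "layout", str(bids_dir), str(db_dir), "--no-validate"]
    if index_metadata:
        # Metadata indexing has to open many files, so it is likely the slow part.
        cmd.append("--index-metadata")
    return cmd


def run_nightly_index(bids_dir, db_dir, index_metadata=True):
    """Run the indexing command and return the elapsed time in seconds."""
    start = time.perf_counter()
    subprocess.run(build_index_command(bids_dir, db_dir, index_metadata), check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder paths: replace with the real dataset and database locations.
    print(build_index_command("/PATH/TO/BIDS/DIRECTORY", "/PATH/TO/DATABASE/FILE"))
```

Logging the elapsed time from `run_nightly_index` over a few nights would show whether indexing actually fits in the overnight window as the dataset grows.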

Our database consists of ~150 subjects who have each completed 1-3 multimodal imaging sessions (modality folders for anat, func, fmap, pet, etc.). This is a large dataset by many standards, but I assumed BIDS apps could run on much larger datasets (e.g. HCP). I’m eager to hear from others who have had similar problems. Perhaps I’m missing a simple solution? Thanks!


Right away, one thing I can say is that in PyBIDS v0.16.2 we implemented some speedups for indexing. It’s not clear to me whether your version of fMRIPrep uses this version. Can you check?
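One way to check is to ask Python for the installed package version from inside the container, for example by running a small script with the image's own interpreter. The sketch below uses only the standard library; the helper names are made up for illustration, and the version parsing assumes plain numeric release strings such as 0.15.1.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.16.2' into (0, 16, 2); assumes plain numeric release versions."""
    return tuple(int(part) for part in text.split(".")[:3])


def installed_pybids_version():
    """Return the installed PyBIDS version string, or None if it is absent."""
    try:
        return metadata.version("pybids")
    except metadata.PackageNotFoundError:
        return None


def has_indexing_speedups(version_text, minimum="0.16.2"):
    """True if the version is at least the release mentioned in this thread."""
    return parse_version(version_text) >= parse_version(minimum)


if __name__ == "__main__":
    found = installed_pybids_version()
    if found is None:
        print("PyBIDS is not installed in this environment")
    else:
        print(found, "has speedups:", has_indexing_speedups(found))
```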

That said, it’s still a known issue that PyBIDS can get slow with large datasets.

We have somewhat active plans to replace the PyBIDS indexing engine with a more performant one, but there hasn’t been much bandwidth to work on it.

Indexing metadata definitely slows down PyBIDS, simply because many files have to be opened.

@effigies may have some fMRIPrep specific guidance.

Perhaps you can run fMRIPrep just on the new subjects, and limit the layout creation to those subjects?
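A rough sketch of what limiting the layout to a few subjects could look like with the PyBIDS Python API: build a regular expression that matches every `sub-*` path except the wanted ones, and hand it to the indexer as an ignore pattern. The helper names are hypothetical, and this assumes `BIDSLayoutIndexer` accepts compiled patterns in `ignore` and that `BIDSLayout` accepts `database_path` and `indexer`, which I believe holds for recent PyBIDS releases but is worth confirming for the installed version.

```python
import re


def subject_ignore_pattern(keep_labels):
    """Compile a regex matching any 'sub-<label>' NOT in keep_labels."""
    if not keep_labels:
        raise ValueError("keep_labels must contain at least one subject label")
    labels = "|".join(re.escape(label) for label in keep_labels)
    # Negative lookahead: 'sub-' not followed by a kept label that ends at a
    # word boundary or an underscore (so keeping '01' does not keep '010').
    return re.compile(r"sub-(?!(?:" + labels + r")(?:\b|_))")


def index_selected_subjects(bids_dir, db_dir, keep_labels):
    """Index only the chosen subjects (assumes PyBIDS is installed)."""
    # Imported lazily so the pattern helper can be tried without PyBIDS.
    from bids.layout import BIDSLayout, BIDSLayoutIndexer

    indexer = BIDSLayoutIndexer(
        validate=False,
        index_metadata=True,
        # Note: passing `ignore` replaces PyBIDS's default ignore list.
        ignore=[subject_ignore_pattern(keep_labels)],
    )
    return BIDSLayout(
        str(bids_dir),
        database_path=str(db_dir),
        reset_database=True,
        indexer=indexer,
    )
```

For example, `subject_ignore_pattern(["01", "07"])` matches paths under `sub-02` or `sub-010` but not those under `sub-01` or `sub-07`, so only the two kept subjects would be indexed.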


Thanks for your reply! My version of fmriprep (22.0.2) is using PyBIDS v0.15.1.

I am using the --participant-label flag in fmriprep and filtering for a specific subject and session using the --bids-filter-file flag. So I think I should be limiting analysis to just the new subjects. But it still appears to create the layout for the entire dataset (judging simply by how long it takes).
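For reference, a filter file is a JSON mapping from query names to PyBIDS entity filters. The sketch below writes one that restricts a few queries to a single session; the query names (`t1w`, `bold`, `fmap`) follow the fMRIPrep documentation as far as I recall, and the helper names and session label are placeholders. My understanding is that these filters are applied when the layout is queried, that is, after indexing has already happened, which would be consistent with seeing no speedup from this flag.

```python
import json


def make_session_filter(session_label):
    """Build a filter dict restricting common queries to one session."""
    return {
        "t1w": {"datatype": "anat", "session": session_label, "suffix": "T1w"},
        "bold": {"datatype": "func", "session": session_label, "suffix": "bold"},
        "fmap": {"datatype": "fmap", "session": session_label},
    }


def write_filter_file(path, session_label):
    """Write the filter dict as JSON for use with --bids-filter-file."""
    with open(path, "w") as handle:
        json.dump(make_session_filter(session_label), handle, indent=2)


if __name__ == "__main__":
    # "02" is a placeholder session label.
    print(json.dumps(make_session_filter("02"), indent=2))
```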

Sorry, I mean PyBIDS 0.16.2! That’s probably a big reason alone.