I’m currently analyzing a dataset, and I would like every step of the analysis to be completely automated, so that I can publish the dataset and have the analysis replicated exactly. (I’m not editing FreeSurfer surfaces, so I shouldn’t actually need much interaction with the pipeline.)
The BIDS framework and all of its apps make that pretty simple: I run heudiconv on a list of subjects to get BIDS directories, I run MRIQC and fmriprep to check data quality and do preprocessing, then I have a nipype modeling script, et cetera.
The thing I’m struggling with is this: in most large projects with many subjects, you’ve got some one-off subjects or runs that you need to exclude. Maybe there’s an excessive amount of motion and the data are garbage. Maybe there’s a run where the projector turned off midway through. Maybe there’s one subject with a really unfortunate slice prescription whom you don’t want to include in the group analysis, because the intersection of their mask with the other masks excludes too much data.
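That last point — one subject’s coverage dragging down the group mask — can be illustrated with a quick NumPy toy (all masks and coverage numbers here are fabricated):

```python
import numpy as np

rng = np.random.default_rng(0)

# Five subjects with typical brain-mask coverage, plus one whose
# slice prescription clipped a large chunk of the volume.
masks = [rng.random(1000) < 0.95 for _ in range(5)]
masks.append(rng.random(1000) < 0.60)

group_mask = np.logical_and.reduce(masks)

# How much of the volume does the intersection recover if we drop each subject?
full_coverage = group_mask.mean()
for i in range(len(masks)):
    rest = np.logical_and.reduce(masks[:i] + masks[i + 1:])
    print(f"without subject {i}: +{rest.mean() - full_coverage:.1%} coverage")
```

Dropping the subject with the clipped prescription recovers far more voxels than dropping any of the others, which is exactly the kind of judgment call you end up wanting to document.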
How is this documented and managed? Is there a BIDS standard for this? I’d like to keep the data in the dataset that we upload, and even if I didn’t, I wouldn’t want to deal with this in the heudiconv heuristics file (“if TRs = 128 and task = ‘faces’, process it, unless it’s the 2nd run of subject 8, or the 3rd run of subject 10, or …”)
Ideally, there’d be something like an “excluded runs” file somewhere, so that the bad runs are documented in a standardized place, and so that by the time modeling scripts run, they can automatically skip fitting first-level models to garbage data.
The core of this problem is that “exclusion of runs” is your particular interpretation of the quality of the data, which depends on what tools you used to assess it and what you are planning to use the data for. So the answer to which runs to keep will differ from one person to another and from one analysis to another (a T1w scan with some motion could be fine as an intermediate coregistration target, but not for cortical thickness measurements).
At the moment the spec does not specify how to do this, but you can do the following:
Add a “Known issues” section to the README describing what you found problematic about specific runs.
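For instance, such a section might look something like this (the subject/run labels and reasons below are invented):

```
## Known issues

- sub-08, task-faces, run-02: projector turned off midway through the run.
- sub-10, task-faces, run-03: excessive head motion; functional data unusable.
- sub-13: unfortunate slice prescription; brain mask intersects poorly with the group.
```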
Is this something that has been implemented/could be accessed when running FMRIPREP? For example, I have some subjects with two T1w scans, and one has much more motion distortion than the other, so I’d like FMRIPREP to use only the good T1w rather than averaging the two together. How could I point FMRIPREP to that column so it can decide which T1w to use?
I know I could just put a number of specific file names into the .bidsignore file, but that doesn’t seem like the best long-term solution. We have a similar situation with some rest scans, where we want to exclude one of two rest scans because the subject fell asleep.
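For reference, .bidsignore lives at the top level of the dataset and takes gitignore-style patterns, one per line, which the BIDS validator will skip (the filenames below are invented, and individual BIDS apps are not guaranteed to honor the file):

```
sub-05/func/sub-05_task-rest_run-02_bold.nii.gz
sub-05/func/sub-05_task-rest_run-02_bold.json
extra_data/
```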
This would be an interesting new feature. However, FMRIPREP currently does a robust averaging of T1w images that excludes outlier voxels. Check whether the output volume looks good; manual exclusion may not be necessary.
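For the curious, the idea behind outlier-excluding averaging can be sketched in plain NumPy (a toy illustration only — FMRIPREP delegates this step to FreeSurfer’s mri_robust_template, which is considerably more sophisticated and operates on registered 3D volumes):

```python
import numpy as np

def robust_average(vols, k=3.0):
    """Average aligned volumes voxelwise, ignoring values far from the median.

    Toy sketch of outlier-excluding averaging; needs three or more inputs
    for the median/MAD statistics to be meaningful.
    """
    stack = np.stack(vols).astype(float)
    med = np.median(stack, axis=0)
    mad = np.median(np.abs(stack - med), axis=0) + 1e-6  # avoid divide-by-zero
    keep = np.abs(stack - med) <= k * mad  # flag in-range values only
    return (stack * keep).sum(axis=0) / np.maximum(keep.sum(axis=0), 1)

# Three fake 4x4 "T1w slices"; the third has a motion-corrupted voxel.
vols = [np.full((4, 4), 100.0) for _ in range(3)]
vols[2][0, 0] = 500.0
avg = robust_average(vols)
print(avg[0, 0])  # the corrupted value is excluded, so this stays 100.0
```

The corrupted voxel is simply left out of the average rather than pulling it toward 500, which is why the merged T1w can look fine even when one input is poor.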
I agree (and I’m not sure if FMRIPREP uses .bidsignore anyway). A solution based on a specific column in _scans.tsv would probably be best.
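Sketching what that might look like: _scans.tsv files permit extra columns beyond filename and acq_time, so a modeling script could filter on a custom column before fitting first-level models. The exclude/exclude_reason column names below are invented, not part of the spec:

```python
import csv
import io

# A hypothetical sub-08_ses-01_scans.tsv with a custom "exclude" column.
scans_tsv = """\
filename\tacq_time\texclude\texclude_reason
func/sub-08_task-faces_run-01_bold.nii.gz\t2018-05-01T10:00:00\t0\tn/a
func/sub-08_task-faces_run-02_bold.nii.gz\t2018-05-01T10:12:00\t1\tprojector off
anat/sub-08_T1w.nii.gz\t2018-05-01T09:45:00\t0\tn/a
"""

reader = csv.DictReader(io.StringIO(scans_tsv), delimiter="\t")
usable = [row["filename"] for row in reader if row["exclude"] == "0"]
print(usable)
# ['func/sub-08_task-faces_run-01_bold.nii.gz', 'anat/sub-08_T1w.nii.gz']
```

In a real dataset you would open the _scans.tsv file from disk instead of a string, but the filtering logic is the same: the bad run never reaches the model.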
Thanks, @ChrisGorgolewski! I’ll check how it runs on a few people with that situation.
As for the rest scans (wanting to ignore specific scans during which subjects fell asleep), I’ll see if the .bidsignore workaround does anything at all. I think it is consulted at some point, since I used it to ignore fieldmap scans that I didn’t want to use (before I realized there was a flag for that). But the column idea seems a much better long-term solution.
I wanted to post a question here as it relates to the original post. I have a BIDS dataset that includes multiple runs of multiple tasks and multiple rest scans. At the moment there’s only one task I’d like fmriprep to preprocess, but it’s still a little fuzzy to me how to get fmriprep to ignore specific scans. I could move the files I’m not interested in, or copy and then pare down the root BIDS directory to contain only the scans I’d like to process, but I’d like to know what other options are out there before proceeding. Any thoughts would be greatly appreciated.
Hi @tsalo, please forgive my ignorance on this issue, but I have to be missing something here. If I use the --task-id flag with MID as the argument, I don’t get any of the expected output: no processing of functional data, though I do get _desc-preproc_T1w .json and .nii.gz files in the anat folder of the fmriprep/sub- output directory. This run took about 30 minutes to complete, so I’m sure something isn’t right.
If I remove the --task-id flag, then I get the expected output: all anatomical and functional scans are processed, normalized, etc. The problem, though, is that I have rest data with multiple runs and three different tasks with two runs each, and I’m only interested in processing one task. So processing the whole enchilada takes quite a bit longer and eats up unnecessary disk space.
It seems as if the FreeSurfer step isn’t running, since that’s one that takes up quite a bit of time. I’ve looked at the usage notes for the version I’m using, and I’m not seeing anything that I might be missing. I see there’s an --anat-only flag, but that doesn’t sound right to include. What am I missing here?
MID data path:
Each MID task has two runs, and each *_bold.nii.gz file is accompanied by a name-matched .json file. I’m starting to wonder if it has something to do with how the func, anat, and fmap folders are nested in a session directory.
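One stdlib way to sanity-check the layout is to list which BOLD files actually carry the task-MID entity where a BIDS app would look for them, session subdirectories included (bold_files_for_task is just an illustrative helper name, not part of any package):

```python
from pathlib import Path

def bold_files_for_task(bids_root, task):
    """List *_bold.nii.gz files whose names contain the given task entity.

    The ** glob also descends into ses-* subdirectories, so both
    sub-XX/func/ and sub-XX/ses-YY/func/ layouts are covered.
    """
    pattern = f"sub-*/**/func/*_task-{task}_*bold.nii.gz"
    return sorted(str(p) for p in Path(bids_root).glob(pattern))
```

If the expected MID runs don’t show up in this listing, the filenames (or how they’re nested) are more likely the problem than the command-line flag.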
Thanks @tsalo! I ran that same subject with the --sloppy flag and it ran just fine. I was hoping it would just run quicker, but what actually happened is that it didn’t stop after about 30 minutes like before. I then tried eliminating the final backslash, and that worked too. When I ran it with the --sloppy flag there was no backslash at the end, so my guess at this point is that the backslash was the culprit. I had originally included the backslash as a way to double-check the command in the terminal before running it, but I guess it throws things off.
Thanks again for the help.