Bids-database-dir ValueError

ColleenHughes · January 4, 2022, 5:56pm

My labmate just brought up the good point that pybids might be indexing the non-BIDSified folders for each subject, which hold many QSM DICOMs and nii files for scan types not in BIDS. So I wonder if there’s a way to tell pybids to ignore those files (they are already in .bidsignore), which might speed up the process.

Steven · January 4, 2022, 6:30pm

pybids should ignore files listed in .bidsignore too. With all of those non-BIDS-compliant files, I think it would be okay to try a single-subject BIDS directory. It would go something like this (adapt to your needs):

workdir=/path/to/scratch # Set this to your scratch directory
mkdir -p "$workdir/${subj}_db/${subj}/func" # Make the single-subject directory in scratch space
mkdir -p "$workdir/${subj}_db/${subj}/anat"
cp "$bids_root_dir/dataset_description.json" "$workdir/${subj}_db/" # Copy dataset_description
cp "$bids_root_dir/$subj/func/"* "$workdir/${subj}_db/${subj}/func" # Copy functional files
cp "$bids_root_dir/$subj/anat/"* "$workdir/${subj}_db/${subj}/anat" # Copy anatomical files

Then, in the fMRIPrep command, simply substitute the input directory argument (previously occupied by $bids_root_dir) with the single subject temporary directory ($workdir/${subj}_db/).
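If you need to do this for many subjects, the same steps could be wrapped in a small function. This is only a sketch: make_subject_bids is a made-up helper name (not part of fMRIPrep or pybids), and it assumes a dataset without session folders that only needs anat and func.

```shell
# Sketch: build a throwaway single-subject BIDS directory in scratch space.
# make_subject_bids is a hypothetical helper, not an fMRIPrep or pybids tool.
make_subject_bids() {
    local bids_root_dir="$1" subj="$2" workdir="$3"
    local dest="$workdir/${subj}_db"
    local modality

    # dataset_description.json must sit at the root of the new dataset
    mkdir -p "$dest/$subj" || return 1
    cp "$bids_root_dir/dataset_description.json" "$dest/" || return 1

    # Copy only anat and func; any other subject folders are left behind
    for modality in anat func; do
        if [ -d "$bids_root_dir/$subj/$modality" ]; then
            mkdir -p "$dest/$subj/$modality" || return 1
            cp -R "$bids_root_dir/$subj/$modality/." "$dest/$subj/$modality/" || return 1
        fi
    done

    # Print the new input directory so the caller can hand it to fMRIPrep
    printf '%s\n' "$dest"
}
```

You would call it as input_dir=$(make_subject_bids "$bids_root_dir" "$subj" "$workdir") and then use "$input_dir" wherever $bids_root_dir appeared in the fMRIPrep command.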

Does this make sense?

ColleenHughes · January 4, 2022, 7:07pm

Yes that makes sense. Will give that a shot on a few subjects.

Steven · January 4, 2022, 7:17pm

Also make sure that the scratch directory is mounted in the container, and that you do not pass in the pybids database (--bids-database-dir).
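For illustration, here is roughly what such a call could look like with Singularity. Treat it as a sketch: fmriprep_cmd is a hypothetical helper that only prints the command, and the image path and output directory are placeholders you would supply. The point is that the scratch directory is bound with -B and --bids-database-dir is left out.

```shell
# Sketch: print a Singularity call that binds the scratch directory and
# omits --bids-database-dir. fmriprep_cmd is a hypothetical helper; the
# image path and output directory are placeholders chosen by the caller.
fmriprep_cmd() {
    local image="$1" workdir="$2" subj="$3" out_dir="$4"

    # Stop early if the single-subject directory was never created
    if [ ! -d "$workdir/${subj}_db" ]; then
        echo "missing $workdir/${subj}_db" >&2
        return 1
    fi

    # Bind each path to the same location inside the container
    echo singularity run --cleanenv \
        -B "$workdir:$workdir" -B "$out_dir:$out_dir" \
        "$image" "$workdir/${subj}_db" "$out_dir" participant \
        --participant-label "${subj#sub-}" -w "$workdir/${subj}_work"
}
```

Printing the command first makes it easy to check the bind paths before anything is submitted to a cluster.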

ColleenHughes · January 6, 2022, 10:59pm

Just following up to say that treating individual subjects as their own BIDS directories works as a solution to the issue of indexing on a large dataset taking a long time. Thanks for your help, @Steven

ColleenHughes · January 7, 2022, 6:00pm

Hi, a question resulting from this suggestion. I removed --fs-no-reconall and specified pre-run freesurfer outputs using --fs-subjects-dir. I see how that changed some of the registration methods, which is fine. However, the output reports from the same command list freesurfer reconstruction as “pre-existing directory” or “run by fmriprep” inconsistently across subjects.

As far as I can tell, there are no new freesurfer outputs in derivatives or in the pre-run directory for the latter case. And I didn’t see anything in the log suggesting recon-all was run (each took ~1 hour). I was not keeping the tmp files in a working directory, but am testing that now with subjects from each case. Last, the pre-run freesurfer file trees for subjects from both cases look similar except in the freesurfer tmp dir (tree_fs_sub-0005.txt (13.3 KB) is “pre-exist” and tree_fs_sub-0010.txt (15.7 KB) is “run by fmriprep”). This thread had a similar question. All in all, safe to assume the pre-run freesurfer outputs were being used for all subjects regardless of what the report states?

Steven · January 7, 2022, 6:21pm

Hmm, that is strange. I don’t know what is causing the difference, but I think it’s safe to assume FS outputs are being used given the short run times. Just to be sure, check whether a sourcedata folder was created in those single-subject scratch directories. In 21.0.0+ that is the new default location where FS outputs are written. If nothing was created, then I’d be more confident FS was not rerun and the old outputs were used as intended.
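A quick way to run that check over several directories might look like the sketch below. fs_was_rerun is a hypothetical helper, and the sourcedata/freesurfer subfolder name is my understanding of the 21.0.0+ layout, so adjust it if your version differs.

```shell
# Sketch: report whether a directory contains fresh FreeSurfer outputs.
# fs_was_rerun is a hypothetical helper; sourcedata/freesurfer is assumed
# to be where fMRIPrep 21.0.0+ writes FreeSurfer results by default.
fs_was_rerun() {
    local dir="$1"
    if [ -d "$dir/sourcedata/freesurfer" ]; then
        echo "sourcedata/freesurfer found in $dir: FreeSurfer outputs may be new"
        return 0
    fi
    echo "no sourcedata/freesurfer in $dir: pre-run outputs were likely used"
    return 1
}
```

Point it at whichever directory you want to inspect, for example the single-subject scratch directory or the fMRIPrep output directory.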

ajschadler · January 7, 2022, 7:02pm

Hate to hijack this thread:
what is the reasoning behind the convention of mounting a path to the docker root dir?
e.g. docker run --rm -v /home/hamilton/my_dataset/:/data -v /home/hamilton/my_dataset/derivatives:/out <some_image>

why not:
docker run --rm -v /home/hamilton/my_dataset:/home/hamilton/my_dataset <some_img>?

Steven · January 7, 2022, 7:35pm

No problem! Part of it is that it makes for easier documentation on the website. The names “data” and “out” are pretty clear indicators of what should go there. Second, and I’ve only seen this once, some commands will fail if the path is too long (I don’t know why), so renaming it as in your first example is a way of preventing that. Finally, some people might want their outputs on a different drive (although I wouldn’t recommend it since that’s likely not BIDS compliant). Whenever I run Singularity commands, I lean towards just mounting the BIDS root and not renaming it, which as discussed earlier in the thread is better for pybids databases.

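Also, in your second command, you don’t need the :/…. after the path you want mounted if it keeps the same name inside the container. That holds for Singularity’s -B option; as far as I know, docker’s -v still needs both the source and the destination for a bind mount.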
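To make the difference between the two runtimes concrete, here is a small sketch. same_path_mount is a hypothetical helper name. It reflects my understanding that Singularity's -B defaults the destination to the source path, while docker's -v given a single path creates an anonymous volume rather than a bind mount, so the path has to be written twice.

```shell
# Sketch: build a mount argument that keeps the host path unchanged inside
# the container. same_path_mount is a hypothetical helper name.
same_path_mount() {
    local runtime="$1" path="$2"
    case "$runtime" in
        # docker -v needs source:destination for a bind mount
        docker)      printf '%s\n' "-v $path:$path" ;;
        # singularity -B uses the source path as the destination by default
        singularity) printf '%s\n' "-B $path" ;;
        *)           echo "unknown runtime: $runtime" >&2; return 1 ;;
    esac
}
```

For example, same_path_mount docker /home/hamilton/my_dataset gives the -v argument from the second command above.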

ajschadler · January 7, 2022, 7:42pm

Good thing to know. I’ve been moving to full paths with docker ever since I started the move to singularity (I often use both packages on different workstations), so thank you for the reassurance that doing so is not bad form!

True, though it got confusing since /out should be /data/derivatives in BIDS (unless the assumption is that researchers will do processing in some scratch directory then merge outputs into a full BIDS-dataset).

ColleenHughes · January 7, 2022, 9:18pm

Interesting update, I re-ran those two subjects (sub-0005 and sub-0010) and both now say “pre-existing directory” for the freesurfer reconstruction. No “sourcedata” directory and the jsons in the working directories have the right path to the pre-run freesurfer data, as would be expected. Oddly, this is the second time I’ve observed that re-running fmriprep changes the freesurfer status (no prior working directory, new output location for derivatives). Will try to run a few more folks to see if I can get a working directory on someone who has the “run by fmriprep” status, as it would not be ideal to re-run the ~1/4-1/3 of our 200 subjects for whom this occurs.

ColleenHughes · January 11, 2022, 5:50pm

Just a quick update that I am not seeing the “freesurfer reconstruction: run by fmriprep” status on new subjects - it correctly identifies the status as “pre-existing directory”. The changes I made were to reset my derivatives directory and save the working directory. I don’t see a sourcedata directory or other freesurfer outputs in either and the runtime was ~1 hour so I feel okay saying that fmriprep is using the pre-run freesurfer outputs correctly.
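For anyone who wants to audit a large batch the same way, a sketch like the following could list the subjects whose report shows the “run by fmriprep” status. list_rerun_subjects is a hypothetical helper, and it assumes one sub-*.html report per subject directly under the derivatives directory, with the status text on a single line.

```shell
# Sketch: list subjects whose report says FreeSurfer was "run by fmriprep".
# list_rerun_subjects is a hypothetical helper. It assumes one sub-*.html
# report per subject directly under the given derivatives directory.
list_rerun_subjects() {
    local deriv_dir="$1" report
    for report in "$deriv_dir"/sub-*.html; do
        # Skip the unexpanded glob when no reports exist
        [ -f "$report" ] || continue
        # Case-insensitive match on the status text quoted in this thread
        if grep -qi 'run by fmriprep' "$report"; then
            basename "$report" .html
        fi
    done
}
```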