--bids-database-dir ValueError

Actually… I just added the pybids database back in and it threw the ‘missing dataset_description.json’ error. Going to come back to this tomorrow.

docker run -ti --rm --user $(id -u) \
-v $bids_root_dir:/data \
-v $bids_deriv_dir:/out \
-v $fs_dir:/fssubjects \
-v $pybids_dir:/pybids_db \
-v $FS_LICENSE:/license.txt \
nipreps/fmriprep:21.0.0 \
/data /out \
participant \
--participant-label $subj \
--skip-bids-validation \
--bids-database-dir /pybids_db \
--fs-license-file /license.txt \
--md-only-boilerplate \
--fs-subjects-dir /fssubjects/sub-${subj} \
--output-spaces T1w \
--ignore slicetiming \
--dummy-scans 4 \
--skull-strip-t1w force \
--me-output-echos \
--low-mem

More things:

  1. fs-subjects-dir should point to the folder where all FreeSurfer subject folders live. That is, it should contain a folder called $SUBJECTNAME holding that subject’s FreeSurfer outputs. So in your case, --fs-subjects-dir /fssubjects should be appropriate, leaving out the subject-specific part.
  2. Again, if you are using a pybids database, do not rename the mounts in the -v flags.

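To make point 1 concrete, here is a minimal sketch of the layout --fs-subjects-dir expects. The /tmp path and subject ID are hypothetical placeholders standing in for your real $fs_dir and subject:

```shell
#!/bin/sh
# Hypothetical stand-in for the real FreeSurfer subjects directory ($fs_dir).
fs_dir=/tmp/fs_layout_demo
subj=0196
# --fs-subjects-dir should point at the PARENT directory, which holds one
# recon-all output folder per subject (containing mri/, surf/, label/, ...):
mkdir -p "$fs_dir/sub-$subj/surf" "$fs_dir/sub-$subj/mri"
# So the mount and flag become:
#   -v $fs_dir:/fssubjects ... --fs-subjects-dir /fssubjects
ls "$fs_dir"   # prints: sub-0196
```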
Make sure $bids_root_dir points to the full BIDS root directory, that $pybids_dir was built with the same pybids version fMRIPrep uses, and that it was built on that same full BIDS root directory. Then try the following:

docker run -ti --rm --user $(id -u) \
-v $bids_root_dir:$bids_root_dir \
-v $bids_deriv_dir:$bids_deriv_dir \
-v $fs_dir:$fs_dir \
-v $pybids_dir:$pybids_dir \
-v $FS_LICENSE:/license.txt \
nipreps/fmriprep:21.0.0 \
$bids_root_dir $bids_deriv_dir \
participant \
--participant-label $subj \
--skip-bids-validation \
--bids-database-dir $pybids_dir \
--fs-license-file /license.txt \
--md-only-boilerplate \
--fs-subjects-dir $fs_dir \
--output-spaces T1w \
--ignore slicetiming \
--dummy-scans 4 \
--skull-strip-t1w force \
--me-output-echos \
--low-mem

Okay, I will re-index the full BIDS directory with pybids 0.14.0 (will take a couple of hours), then try this solution, ensuring that $bids_root_dir and $pybids_dir are based on the same full directory. Thank you very much for the quick and helpful replies!


Same ‘missing dataset_description.json’ error when I used the 0.14.0 pybids full database and direct calls to my paths. The dataset is BIDS-valid, but it contains many different scan types, some not yet covered by BIDS (listed in the bidsignore file). So if I don’t pre-index, it hangs at that stage for a very long time - in fact, it has never started fMRIPrep when I run just sub-0196 against the full directory path. I’ll let the non-pre-indexed run go a full 24 hours just to see if it ever gets going, but if that happens every time, it’s not practical. I suppose I can use temporary single-subject directories as a workaround, if needed.

Can you confirm the dataset description is not in the bidsignore file? Can you also print the output of the tree command run on the BIDS data directory root, and then on a single subject?

Yes, the dataset description file is not in bidsignore.

Here is the file tree for sub-0196 (using the -I flag to ignore a directory of non-BIDS-formatted files, e.g., QSM dicoms, that is in bidsignore).

collhugh@albany:/lbc/lbc1/PREVENT_AD$ tree PREVENT_AD_BIDS/sub-0196 -I 'nonBIDSified'
PREVENT_AD_BIDS/sub-0196
└── ses-01
    ├── anat
    │   ├── sub-0196_ses-01_acq-MPRAGE_T1w.json
    │   ├── sub-0196_ses-01_acq-MPRAGE_T1w.nii
    │   ├── sub-0196_ses-01_acq-space_T2w.json
    │   ├── sub-0196_ses-01_acq-space_T2w.nii
    │   ├── sub-0196_ses-01_FLAIR.json
    │   ├── sub-0196_ses-01_FLAIR.nii
    │   ├── sub-0196_ses-01_inv-1_MP2RAGE.json
    │   ├── sub-0196_ses-01_inv-1_MP2RAGE.nii
    │   ├── sub-0196_ses-01_inv-2_MP2RAGE.json
    │   ├── sub-0196_ses-01_inv-2_MP2RAGE.nii
    │   ├── sub-0196_ses-01_T1map.json
    │   ├── sub-0196_ses-01_T1map.nii
    │   ├── sub-0196_ses-01_UNIT1.json
    │   └── sub-0196_ses-01_UNIT1.nii
    ├── dwi
    │   ├── sub-0196_ses-01_dir-PA_dwi.bval
    │   ├── sub-0196_ses-01_dir-PA_dwi.bvec
    │   ├── sub-0196_ses-01_dir-PA_dwi.json
    │   └── sub-0196_ses-01_dir-PA_dwi.nii
    ├── fmap
    │   ├── sub-0196_ses-01_dir-AP_epi.json
    │   └── sub-0196_ses-01_dir-AP_epi.nii
    └── func
        ├── sub-0196_ses-01_task-rest_run-01_echo-01_bold.json
        ├── sub-0196_ses-01_task-rest_run-01_echo-01_bold.nii
        ├── sub-0196_ses-01_task-rest_run-01_echo-02_bold.json
        ├── sub-0196_ses-01_task-rest_run-01_echo-02_bold.nii
        ├── sub-0196_ses-01_task-rest_run-01_echo-03_bold.json
        └── sub-0196_ses-01_task-rest_run-01_echo-03_bold.nii

5 directories, 26 files

And here is the file tree one level deep for the bids data dir:

collhugh@albany:/lbc/lbc1/PREVENT_AD$ tree PREVENT_AD_BIDS -I 'nonBIDSified' -L 1
PREVENT_AD_BIDS
├── allsubs.txt
├── dataset_description.json
├── fsissues
├── ignore_derivatives
├── README.md
├── scripts
├── sub-0002
├── sub-0005
├── sub-0010
├── sub-0015
├── sub-0018
├── sub-0020
├── sub-0026
├── sub-0032
├── sub-0033
├── sub-0037
├── sub-0039
├── sub-0040
├── sub-0041
├── sub-0047
├── sub-0051
├── sub-0054
├── sub-0062
├── sub-0079
├── sub-0080
├── sub-0082
├── sub-0085
├── sub-0086
├── sub-0101
├── sub-0108
├── sub-0112
├── sub-0125
├── sub-0128
├── sub-0139
├── sub-0157
├── sub-0159
├── sub-0160
├── sub-0165
├── sub-0166
├── sub-0172
├── sub-0173
├── sub-0174
├── sub-0177
├── sub-0179
├── sub-0181
├── sub-0184
├── sub-0186
├── sub-0187
├── sub-0190
├── sub-0191
├── sub-0194
├── sub-0196
├── sub-0198
├── sub-0203
├── sub-0212
├── sub-0225
├── sub-0228
├── sub-0230
├── sub-0235
├── sub-0243
├── sub-0249
├── sub-0252
├── sub-0254
├── sub-0263
├── sub-0264
├── sub-0268
├── sub-0271
├── sub-0276
├── sub-0278
├── sub-0279
├── sub-0282
├── sub-0284
├── sub-0287
├── sub-0292
├── sub-0297
├── sub-0304
├── sub-0308
├── sub-0311
├── sub-0317
├── sub-0319
├── sub-0333
├── sub-0336
├── sub-0343
├── sub-0347
├── sub-0348
├── sub-0349
├── sub-0350
├── sub-0353
├── sub-0354
├── sub-0360
├── sub-0363
├── sub-0365
├── sub-0366
├── sub-0371
├── sub-0374
├── sub-0375
├── sub-0376
├── sub-0377
├── sub-0380
├── sub-0381
├── sub-0384
├── sub-0387
├── sub-0390
├── sub-0391
├── sub-0392
├── sub-0393
├── sub-0394
├── sub-0395
├── sub-0403
├── sub-0404
├── sub-0409
├── sub-0414
├── sub-0425
├── sub-0427
├── sub-0431
├── sub-0432
├── sub-0442
├── sub-0448
├── sub-0453
├── sub-0457
├── sub-0458
├── sub-0461
├── sub-0462
├── sub-0469
├── sub-0473
├── sub-0476
├── sub-0488
├── sub-0492
├── sub-0494
├── sub-0501
├── sub-0504
├── sub-0505
├── sub-0509
├── sub-0510
├── sub-0520
├── sub-0521
├── sub-0527
├── sub-0528
├── sub-0529
├── sub-0531
├── sub-0534
├── sub-0537
├── sub-0538
├── sub-0541
├── sub-0544
├── sub-0546
├── sub-0550
├── sub-0552
├── sub-0555
├── sub-0569
├── sub-0572
├── sub-0574
├── sub-0580
├── sub-0587
├── sub-0588
├── sub-0589
├── sub-0595
├── sub-0598
├── sub-0599
├── sub-0602
├── sub-0603
├── sub-0606
├── sub-0607
├── sub-0608
├── sub-0609
├── sub-0611
├── sub-0622
├── sub-0631
├── sub-0632
├── sub-0633
├── sub-0634
├── sub-0648
├── sub-0649
├── sub-0653
├── sub-0663
├── sub-0665
├── sub-0668
├── sub-0669
├── sub-0672
├── sub-0674
├── sub-0681
├── sub-0688
├── sub-0694
├── sub-0696
├── sub-0703
├── sub-0706
├── sub-0709
├── temp
└── weirdsubjs

186 directories, 3 files

And the contents of bidsignore:

/sub-*/nonBIDSified/**
/scripts/**
/temp/**
/weirdsubjs/**
/ignore_derivatives/**
/fsissues/**
/allsubs.txt/

My labmate just brought up the good point that pybids might be indexing the nonBIDSified folders for each subject that have many QSM dicoms and nii files for scan types not in BIDS. So I wonder if there’s a way to tell pybids to ignore those files (they are already in bidsignore), which might speed up the process.

pybids should ignore files in bidsignore too. With all of those non-BIDS-compliant files, I think it would be okay to try the single-subject BIDS directory. It would go something like this (adapt to your needs):

workdir=/path/to/scratch # Set this to your scratch directory
mkdir -p $workdir/${subj}_db/${subj}/ses-01/func # Make the single-subject directory in scratch space
mkdir -p $workdir/${subj}_db/${subj}/ses-01/anat
mkdir -p $workdir/${subj}_db/${subj}/ses-01/fmap
cp $bids_root_dir/dataset_description.json $workdir/${subj}_db/ # Copy dataset_description
cp $bids_root_dir/$subj/ses-01/func/* $workdir/${subj}_db/${subj}/ses-01/func # Copy functional files
cp $bids_root_dir/$subj/ses-01/anat/* $workdir/${subj}_db/${subj}/ses-01/anat # Copy anatomical files
cp $bids_root_dir/$subj/ses-01/fmap/* $workdir/${subj}_db/${subj}/ses-01/fmap # Copy fieldmap files (needed for distortion correction)

Then, in the fMRIPrep command, simply substitute the input directory argument (previously occupied by $bids_root_dir) with the single-subject temporary directory ($workdir/${subj}_db/).

Does this make sense?

Yes, that makes sense. Will give that a shot on a few subjects.

Also make sure that the scratch directory is mounted, and that you do not pass in the pybids database.


Just following up to say that treating individual subjects as their own BIDS directories works as a solution to the slow indexing of a large dataset. Thanks for your help, @Steven

Hi, a question resulting from this suggestion. I removed --fs-no-reconall and specified pre-run FreeSurfer outputs using --fs-subjects-dir. I see how that changed some of the registration methods, which is fine. However, the output reports from the same command list the FreeSurfer reconstruction as “pre-existing directory” or “run by fMRIPrep” inconsistently across subjects.

As far as I can tell, there are no new FreeSurfer outputs in derivatives or in the pre-run directory for the latter case. And I didn’t see anything in the log suggesting recon-all was run (each subject took ~1 hour). I was not keeping the tmp files in a working directory, but I’m testing that now with subjects from each case. Last, the pre-run FreeSurfer file trees for subjects from both cases look similar except in the FreeSurfer tmp dir (tree_fs_sub-0005.txt (13.3 KB) is “pre-exist” and tree_fs_sub-0010.txt (15.7 KB) is “run by fmriprep”). This thread had a similar question. All in all, is it safe to assume the pre-run FreeSurfer outputs were being used for all subjects regardless of what the report states?

Hmm, that is strange. I don’t know what is causing the difference, but I think it’s safe to assume FS outputs are being used, given the short run times. Just to be sure, check whether a sourcedata folder is created in those single-subject scratch directories. In 21.0.0+, that is the new default location where FS outputs are written. If nothing is created there, I’d be more confident FS was not rerun and the old outputs were used as intended.
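One way to run that sourcedata check, sketched with a hypothetical output path standing in for your real fMRIPrep output directory:

```shell
#!/bin/sh
# Sketch: detect whether fMRIPrep (21.0.0+) wrote fresh FreeSurfer outputs.
# /tmp/fmriprep_out_demo is a hypothetical stand-in for your output directory.
outdir=/tmp/fmriprep_out_demo
mkdir -p "$outdir"   # pretend this is an existing fMRIPrep output tree
if [ -d "$outdir/sourcedata/freesurfer" ]; then
  echo "sourcedata/freesurfer exists: FreeSurfer was (re)run by fMRIPrep"
else
  echo "no sourcedata/freesurfer: pre-run FreeSurfer outputs were used"
fi
```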

Hate to hijack this thread:
What is the reasoning behind the convention of mounting host paths to short names at the Docker container root?
i.e. docker run --rm -v /home/hamilton/my_dataset/:/data -v /home/hamilton/my_dataset/derivatives:/out <some_image>

why not:
docker run --rm -v /home/hamilton/my_dataset:/home/hamilton/my_dataset <some_img>?

No problem! Part of it is that it makes for easier documentation on the website: the names “data” and “out” are pretty clear indicators of what should go where. Second, and I’ve only seen this once, some commands will fail if the path is too long (for no obvious reason), so renaming as in the first way prevents that. Finally, some people might want their outputs on a different drive (although I wouldn’t recommend it, since that’s likely not BIDS-compliant). Whenever I run Singularity commands, I lean toward just mounting the BIDS root without renaming it, which, as discussed earlier in the thread, is better for pybids databases.

Also, about your second example: with Docker you still need both sides of the colon even when they match (i.e., -v /path:/path), because a bare -v /path creates an anonymous volume rather than a bind mount of the host directory. With Singularity, by contrast, -B /path alone does bind the same path.
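To illustrate the two mount styles under discussion, here is a sketch; the paths are hypothetical and nothing here actually invokes Docker:

```shell
#!/bin/sh
# Style 1: rename inside the container -- the host dir shows up as /data:
style1='docker run --rm -v /home/hamilton/my_dataset:/data <some_image> /data /out participant'
# Style 2: same path on both sides -- the container sees the identical host path:
style2='docker run --rm -v /home/hamilton/my_dataset:/home/hamilton/my_dataset <some_img>'
# Caveat: a bare "-v /path" (no colon) makes Docker create an anonymous,
# empty volume at that path rather than bind-mounting the host directory.
echo "$style1"
echo "$style2"
```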

Good to know. I’ve been moving to using full paths with Docker ever since I started the move to Singularity (I often use both on different workstations), so thank you for the reassurance that doing so is not bad form!

True, though it got confusing, since /out should be /data/derivatives in BIDS (unless the assumption is that researchers will do processing in some scratch directory and then merge outputs into a full BIDS dataset).


Interesting update: I re-ran those two subjects (sub-0005 and sub-0010), and both now say “pre-existing directory” for the FreeSurfer reconstruction. No “sourcedata” directory, and the JSONs in the working directories have the right path to the pre-run FreeSurfer data, as expected. Oddly, this is the second time I’ve observed that re-running fMRIPrep changes the FreeSurfer status (no prior working directory, new output location for derivatives). I’ll try running a few more subjects to see if I can capture a working directory for someone with the “run by fmriprep” status, as it would not be ideal to re-run the ~1/4-1/3 of our 200 subjects for whom this occurs.

Just a quick update that I am no longer seeing the “freesurfer reconstruction: run by fmriprep” status on new subjects - it correctly identifies the status as “pre-existing directory”. The changes I made were to reset my derivatives directory and save the working directory. I don’t see a sourcedata directory or other FreeSurfer outputs in either, and the runtime was ~1 hour, so I feel okay saying that fMRIPrep is using the pre-run FreeSurfer outputs correctly.
