Clarifying file reading in BIDS-validator vs. fMRIprep

For my own understanding, because this seems to have happened on every single subject in a large dataset, yet the fMRIprep outputs look fine: does the bids-validator require something different of files than fMRIprep does? I initially thought this might be related to this or this, but it looks like those were resolved long before I ran into this, and none of these files are symlinked. This is using fMRIprep 1.4.1rc5, for continuity with other parts of this large dataset processed with that version.

Edit: after combing through the logs a bit more, I noticed that this did not happen for the first several hundred subjects in a given BIDS dir, but then occurred consistently for every subject afterwards. I wonder whether it actually is related to the issue above about not closing files, because my logs do end abruptly with OSError: handle is closed, and ulimit -Hn gives 4096 as expected.

I get an error during BIDS validation for every subject about

	1: [ERR] We were unable to read this file. Make sure it contains data (file size > 0 kB) and is not corrupted, incorrectly named, or incorrectly symlinked. (code: 44 - FILE_READ)
		... and 8 more files having this issue (Use --verbose to see them all).

However, fMRIprep continues and runs fine, and I get all the expected outputs. The files were converted with Heudiconv. This happens both with files that have lenient permissions and with read-only files converted by another user. Is it something weird with Singularity and symlinks?

File permissions:

total 8408
-rwxrwxr-x 1 utooley mackey_group   24138 Jul 22 19:03 sub-NDARINVGBXJFCXY_acq-dMRI_dir-AP_run-04_epi.json
-rwxrwxr-x 1 utooley mackey_group 1518323 Jul 18 09:23 sub-NDARINVGBXJFCXY_acq-dMRI_dir-AP_run-04_epi.nii.gz
-rwxrwxr-x 1 utooley mackey_group   24135 Jul 22 19:03 sub-NDARINVGBXJFCXY_acq-dMRI_dir-PA_run-03_epi.json
-rwxrwxr-x 1 utooley mackey_group 1509001 Jul 18 09:23 sub-NDARINVGBXJFCXY_acq-dMRI_dir-PA_run-03_epi.nii.gz
-rwxrwxr-x 1 utooley mackey_group   18530 Jul 22 19:03 sub-NDARINVGBXJFCXY_acq-fMRI_dir-AP_run-02_epi.json
-rwxrwxr-x 1 utooley mackey_group  534005 Jul 18 09:23 sub-NDARINVGBXJFCXY_acq-fMRI_dir-AP_run-02_epi.nii.gz
-rwxrwxr-x 1 utooley mackey_group   18534 Jul 22 19:03 sub-NDARINVGBXJFCXY_acq-fMRI_dir-AP_run-06_epi.json

Log file attached.

log.txt (909.3 KB)


@rwblair any thoughts?


From #675:

Yes, it is indeed hitting the user open file limit. And I managed to resolve the issue. Once I increased my user file limit from the default 4096 to 10000, the validator works fine.

Is there any way that you can attempt to increase this limit, or are you on an HPC? I’ve personally had to bump this limit when testing the validator locally.
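For anyone who can change the limit, a minimal shell sketch (the soft limit can be raised up to the hard limit without root; anything above the hard limit needs an administrator):

```shell
# Show the current soft and hard limits on open file descriptors
ulimit -Sn
ulimit -Hn
# Raise the soft limit to the hard limit for this shell and its children;
# run the validator from this same session afterwards
ulimit -Sn "$(ulimit -Hn)"
ulimit -Sn
```

On an HPC the hard limit is typically set by the admins (e.g. in /etc/security/limits.conf on Linux), so this only helps if the hard limit is already high enough.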

Hi @rwblair,

I’m on an HPC :disappointed_relieved: . BUT, it doesn’t seem to be affecting the downstream running of fMRIprep…so I think I’m fine ignoring this error (correct me if I’m wrong, folks).

My underlying question is why the BIDS validator doesn’t close files afterwards, given the previous GitHub issue about it. Or is it simply that the BIDS dir has gotten too big at that point (once I start writing files in the loop) for the several jobs running on the HPC to validate it simultaneously?
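On Linux, one rough way to check whether descriptors are actually piling up is to count a process’s entries under /proc while the validator runs; here $$ (this shell’s PID) is just a stand-in for the validator’s PID:

```shell
# Count open file descriptors held by a process; substitute the
# validator's PID for $$ to watch the validator itself
ls /proc/$$/fd | wc -l
# Compare against the per-process soft limit
ulimit -Sn
```

If the first number climbs toward the second as the validator walks the dataset, files are being opened faster than they are closed.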

I’m not positive on the exact why; I’m going to look at it a bit closer. We thought the earlier fix would take care of this class of issue. No promises on a quick solution though.

Out of curiosity, how many files are in the dataset? One way to see this is by running find . -type f | wc -l in the dataset directory. I can try to find a similarly large dataset for testing.

It’s a large dataset…running find . -type f | wc -l is hanging too long at the moment, but I can make a guesstimate of the approximate number of files: 700 subjects x (2 anats + associated .jsons + 4 funcs + associated .jsons + ~8 fmaps + .jsons = 28), so about 19600 files? Not sure that’s exactly right, but somewhere in the ballpark.

But that doesn’t include the fmriprep outputs that are now in the derivatives folder.

Actually, find just finished in one folder and gave 278801, including the fmriprep outputs that were written.
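If the derivatives outputs are inflating that number, find can prune them; this assumes fMRIprep is writing under derivatives/ inside the BIDS dir:

```shell
# Count files in the raw dataset only, skipping the derivatives tree
find . -path ./derivatives -prune -o -type f -print | wc -l
```

Run from the top of the BIDS directory; the -prune branch stops find from descending into derivatives/, and only the -type f branch prints.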

I’ve also hit this error (code: 44 - FILE_READ) with bids-validator (v1.4.2) when working with large datasets (no symlinks, files seem fine). In the past, I was able to circumvent this by mounting the server volume locally using sshfs, then increasing my local ulimit -n (or using launchctl limit maxfiles on macOS) to something like 10000. I came back to this post because that stopped working recently; not sure why… maybe just because my dataset has been growing? However, including the --ignoreNiftiHeaders flag in the bids-validator call seems to circumvent the problem for now.

Thanks for the tip about --ignoreNiftiHeaders, Sam! I just posted on that Github issue (here) last week, because I ran into this issue again despite its supposedly having been fixed. Some discussion on the issue implies that how you run the bids-validator might make a difference; not sure whether that yields insight into why it stopped working recently for you.