Using BIDS schema to parse datasets

mkoculak · February 17, 2025, 3:04pm

I am writing code to parse BIDS datasets and noticed that there is a shift towards more “deterministic” parsing based on the YAML/JSON schema (at least that is how I understood information around the new major version of BIDS-validator).
So I was looking into using the JSON schema to “generate the internals” of my library, but it seems that it is not possible to get all of the structure described in the specification from the schema?

My particular example concerns metadata for tabular entities: my original code was based on version 1.8.0, but since then the Delimiter field was added. However, I cannot find in the schema which fields are allowed in these metadata files.

Am I thinking correctly about the purpose of the schema, or is it mostly designed for validating the elements rather than encoding the whole structure in detail?

tsalo · February 17, 2025, 3:41pm

The long-term goal is to encode everything in the specification in the schema. There are, however, some elements that are harder to translate into the schema than others, so the main focuses have been on (1) generating tables and filename patterns in the specification from the schema and (2) validating BIDS datasets using the schema.

There have been attempts to use the schema to query datasets (e.g., ancp-bids). The only one that I think is actively maintained is bids2table, but I haven’t actually used it, so I don’t know if it uses everything in the schema or not.

It does look like this is missing from the schema. The Delimiter field is defined as a metadata field (see here), but there’s currently no rule specifying where it can be used. It looks like the schema doesn’t yet cover metadata fields for columns in tabular files, but I think that’s just something the maintainers haven’t gotten around to yet, rather than something that isn’t planned for the schema.
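As a rough way to check this kind of gap yourself, here is a minimal sketch that reports whether a field is defined and where rules mention it. It assumes the schema has already been loaded into a dict (for example with `json.load`) and that field definitions sit under `objects` → `metadata` while usage rules sit under `rules`. The helper names and the toy schema are made up for illustration and are not part of any BIDS tool.

```python
from typing import Any, Iterator


def find_key_paths(node: Any, key: str, path: tuple = ()) -> Iterator[tuple]:
    """Yield the path to every dict entry named `key` in a nested structure."""
    if isinstance(node, dict):
        for name, value in node.items():
            here = path + (name,)
            if name == key:
                yield here
            # Keep descending: the same key may also appear deeper down.
            yield from find_key_paths(value, key, here)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_key_paths(value, key, path + (index,))


def field_usage(schema: dict, field: str) -> dict:
    """Report whether `field` is defined and which rules reference it by key.

    Assumes definitions live under schema["objects"]["metadata"] and usage
    rules under schema["rules"]; adjust if the real layout differs.
    """
    return {
        "defined": field in schema.get("objects", {}).get("metadata", {}),
        "rule_paths": list(find_key_paths(schema.get("rules", {}), field)),
    }


# Toy stand-in for the real schema (hypothetical content, for illustration only).
toy_schema = {
    "objects": {
        "metadata": {
            "Delimiter": {"type": "string"},
            "RepetitionTime": {"type": "number"},
        }
    },
    "rules": {
        "sidecars": {
            "mri": {"MRITiming": {"fields": {"RepetitionTime": "required"}}}
        }
    },
}

delimiter_report = field_usage(toy_schema, "Delimiter")
repetition_time_report = field_usage(toy_schema, "RepetitionTime")
```

In the toy data, `Delimiter` comes back as defined with an empty `rule_paths` list, which is the situation described above: the field exists, but no rule says where it may be used. Running the same check against the real schema would require loading the published JSON first.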

mkoculak · February 17, 2025, 4:18pm

Thanks for the response! This matches my impression, but I haven’t been up to date with BIDS developments, so I couldn’t be sure (and didn’t want to rewrite everything if I were reading the schema wrong).
The direction looks great, but I guess for now I will still have to set up the structure manually. As you point out, though, the schema will be helpful for creating and validating datasets.