BIDS: getting the list of metadata fields for a tsv file from the schema?

Hi!

I’m trying to use the BIDS schema for generating BIDS datasets, with a good amount of metadata stored in an electronic lab notebook… For this, I need to know “in which BIDS tsv file(s) should a given metadata field go?”, and I’d like to do this with the BIDS schema…

After some (heavy :wink: ) thinking, I think this would be solved by answering the following question. In the BIDS specification on the website (for example, the page for the data summary files: Data summary files - Brain Imaging Data Structure 1.10.1), each tsv file comes with a table listing the metadata fields (~columns) that go into it. How could I access (or reconstruct) such a table from the BIDS schema?

(I’ve looked here and there in the schema, as well as in bidsschematools, but I didn’t manage to get the answer)

Thanks,

Sylvain

For TSV files, the column definitions can be found in objects.columns and the rules are in rules.tabular_data.<category>.<table>.columns.
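As a minimal sketch of how those two locations fit together, the snippet below joins the rules for one table with the matching column definitions. It uses small stand-in dictionaries rather than a loaded schema: the nesting follows the paths above, the description strings are placeholders, and the helper name table_columns is made up for illustration.

```python
# Stand-in for objects.columns: one definition per column key.
# The description strings are placeholders, not text from the schema.
objects_columns = {
    "participant_id": {"description": "(definition of participant_id)"},
    "age": {"description": "(definition of age)"},
    "sex": {"description": "(definition of sex)"},
}

# Stand-in for rules.tabular_data.<category>.<table>, reduced to its columns.
tabular_data = {
    "modality_agnostic": {
        "Participants": {
            "columns": {
                "participant_id": {"level": "required"},
                "age": "recommended",
                "sex": "recommended",
            }
        }
    }
}


def table_columns(tabular_data, objects_columns, category, table):
    """Return (column, rule, definition) tuples for one table rule."""
    columns = tabular_data[category][table]["columns"]
    return [
        (name, rule, objects_columns.get(name, {}))
        for name, rule in columns.items()
    ]


rows = table_columns(
    tabular_data, objects_columns, "modality_agnostic", "Participants"
)
for name, rule, definition in rows:
    print(name, rule, definition.get("description"))
```

Columns that have a rule but no entry in the stand-in definitions come back with an empty dict, so the loop does not fail on them.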

The relevant doc is here: BIDS Schema description - Valid fields for definitions

Incidentally, I would be glad to get some feedback on what the sticking points were in learning to use the schema, what docs you wish existed, etc.

Thanks! I missed that…

I guess the overall challenge in using the BIDS schema is its intrinsic complexity, which makes the information somewhat difficult to find; the docs you all wrote seem really good, they’re just vast! In this particular case, I think that:

  • in the schema itself, I could have found the info myself by looking a bit better…
  • in the doc (that you linked), I have to admit that the title “Valid fields for definitions” may be a bit too vague… but I’m not sure I can provide a good suggestion at this point (I have to digest its content :wink: )

Other than that, I have one question: why is there a capital letter in the schema at the beginning of the category in rules.tabular_data.<category> (Participants, Samples, Scans etc.), whereas everything seems to be fully in lowercase everywhere else…? Is there a purpose / a function / a match with something else where this capitalization exists?

Anyhow, thanks for your help!

Cheers,

Sylvain

Thanks for the feedback! Please feel free to open issues/PRs with any suggestions you have and tag them with schema. I’ll also see updates to this thread.

It’s probably time to break the doc into separate pages. It’s currently just a rendering of bids-specification/src/schema/README.md (master branch of bids-standard/bids-specification on GitHub).

Fair, but the headings are: Rule files - Sidecar and tabular data rules - Valid fields for definitions.

Generally, under rules, you have <path>.<rulename>.<rule>. path is snake case, rulename is CamelCase, and rule contents are (I think) snake case.

I think this arose because the schema started as what is now objects.metadata, which are JSON fields that BIDS uses CamelCase for. So when we started writing other files, we kept doing the same thing. This isn’t universal, though; since columns are snake_case in BIDS, the entries in objects.columns are, too.

That said, you shouldn’t attach a semantic meaning to the capitalization. It’s just a name for the rule object.

Hi @effigies ! I’m following up on this, and in particular coming back to your request for suggestions on how to improve the docs of BidsSchemaTools… It would be great to provide code snippets that answer practical questions using the schema… This is actually what I’m trying to do at the moment, and it is fairly difficult (probably because of my lack of skills :wink: )… Anyhow, for my initial question (the title of the thread), I came up (fairly easily) with:

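import bidsschematools as bst
import bidsschematools.schema  # makes bst.schema available

# Path to a local copy of the schema
local_schema_path = '/XXX/YYY/schema.json'
bs = bst.schema.load_schema(local_schema_path)

# Column rules for participants.tsv, keyed by column name
bs.rules.tabular_data.modality_agnostic.Participants.columns.to_dict().keys()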

This gives me the list of columns in the participants.tsv file:

dict_keys(['participant_id', 'species', 'age', 'sex', 'handedness', 'strain', 'strain_rrid', 'HED'])

If I want to know which of these columns is required / recommended / optional, the information is clearly present in what follows, but is not at the same level in the schema for participant_id vs. the other columns…

bs.rules.tabular_data.modality_agnostic.Participants.columns.to_dict()

gives…

{'participant_id': {'level': 'required',
  'description_addendum': 'There MUST be exactly one row for each participant.\n'},
 'species': 'recommended',
 'age': 'recommended',
 'sex': 'recommended',
 'handedness': 'recommended',
 'strain': 'recommended',
 'strain_rrid': 'recommended',
 'HED': 'optional'}

Is this intended? I find this neither very logical nor practical :wink:, but there’s probably a good reason for it… Anyhow, how would you get the info easily (I guess this is a Python question :wink: )?
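On the Python side, one way to get a uniform view is to treat a bare string as shorthand for a dict that only carries a level. This is a minimal sketch working on the dictionary printed above; the helper name column_levels is made up, and it assumes every dict-valued entry has a level key, as participant_id does here.

```python
def column_levels(columns):
    """Map each column name to its requirement level.

    A rule is either a bare level string ('recommended') or a dict
    with a 'level' key plus extra fields such as 'description_addendum'.
    """
    levels = {}
    for name, rule in columns.items():
        if isinstance(rule, str):
            levels[name] = rule
        else:
            levels[name] = rule["level"]
    return levels


# Copied from the output above. With a loaded schema this would be the
# result of ...Participants.columns.to_dict().
participants_columns = {
    "participant_id": {
        "level": "required",
        "description_addendum": "There MUST be exactly one row for each participant.\n",
    },
    "species": "recommended",
    "age": "recommended",
    "sex": "recommended",
    "handedness": "recommended",
    "strain": "recommended",
    "strain_rrid": "recommended",
    "HED": "optional",
}

levels = column_levels(participants_columns)
required = [name for name, level in levels.items() if level == "required"]
print(required)  # ['participant_id']
```

The same levels dict can then be filtered for 'recommended' or 'optional' in the same way.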

Other questions I will start working on include:

  • for a given data modality (e.g. microscopy), what is the list of files (tsv, json, sidecars) that can/should be included next to the microscopy data files? (bs.objects.files.to_dict().keys() gives me part of the answer for the top-level files and directories; I still need to select only the files, and find the ones that are not at the top level.) And what is the status (mandatory or not) of each file?

  • for a given metadata item (e.g. participant_id or NumericalAperture, which don’t have exactly the same status…), in what file(s) should it go?
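
For the tsv half of that last question, a rough sketch is to invert the rules: walk every <category>.<table> under rules.tabular_data and collect the tables whose columns include the item. The dictionary below is a hand-written stand-in with the same nesting (the Samples entry is illustrative, not copied from the schema), and tables_for_column is a hypothetical helper. JSON sidecar fields such as NumericalAperture are not covered by this.

```python
def tables_for_column(tabular_rules, column):
    """List '<category>.<table>' rule paths whose columns include `column`."""
    matches = []
    for category, tables in tabular_rules.items():
        for table_name, rule in tables.items():
            # Some rule objects might not define columns; treat that as empty.
            if column in rule.get("columns", {}):
                matches.append(f"{category}.{table_name}")
    return matches


# Stand-in for rules.tabular_data (assumption: with a loaded schema this
# would be obtained by calling .to_dict() on bs.rules.tabular_data).
tabular_rules = {
    "modality_agnostic": {
        "Participants": {
            "columns": {
                "participant_id": {"level": "required"},
                "age": "recommended",
            }
        },
        # Illustrative entry only, not copied from the schema.
        "Samples": {
            "columns": {
                "sample_id": "required",
                "participant_id": "required",
            }
        },
    }
}

print(tables_for_column(tabular_rules, "participant_id"))
# ['modality_agnostic.Participants', 'modality_agnostic.Samples']
print(tables_for_column(tabular_rules, "age"))
# ['modality_agnostic.Participants']
```

This returns rule names rather than file names, so a further step would still be needed to map each rule to the tsv file it applies to.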