How to write complex validator rules/checks?

I want to write a validator rule that expresses the following:

if electrodes.tsv exists, and it has a column “coordinate_system”, make sure that every unique value in that column exists as a top-level key in the coordsystem.json file

I’ve read through the BIDS schema description (in the BIDS Schema Tools documentation) and searched through the codebase for examples, but so far haven’t found any checks that make reference to values in a tabular column. Is this possible? Is there a reference (other than the schema description page linked to) for this validation syntax? Is there a way, during validation, to enter a debugger and query interactively the properties of the available objects?

NB: this relates to BEP042 and I’m totally open to the possibility that our proposed (ab)use of coordsystem.json either may not be accepted, or if accepted, may not be possible to validate fully. I’m not interested in reviews of that design choice here; my goal is to learn more about how to write the rules/checks (even if the rule/check I end up with doesn’t end up getting utilized).

You can’t do it right now, so you’ll need to add to the schema.

The check will need to happen either on a coordsystem.json or an electrodes.tsv. If this is anything like iEEG, then I think there’s supposed to be a 1-to-1 correspondence, so it makes relatively little difference. It’s a question of which is easier and whether you consider the error to be in coordsystem.json or electrodes.tsv.

If you are working from electrodes.tsv, you already have access to a columns variable (see src/schema/meta/context.yaml on the master branch of bids-standard/bids-specification on GitHub), so columns.coordinate_system would get you those values. You also have access to associations.coordsystem, but that only has a path attribute. You would need to expand it to have an attribute involving the keys, such as:

       coordsystem:
         description: 'Coordinate system file'
         type: object
         required: [path]
         additionalProperties: false
         properties:
           path:
             description: 'Path to associated coordsystem file'
             type: string
+          keys:
+            description: 'Keys in coordsystem.json'
+            type: array
+            items:
+              type: string

Then you need some kind of value check. intersects(columns.coordinate_system, associations.coordsystem.keys) would get you the values that appear in both, but it is not currently specified to guarantee unique values. We might need a new function, unique(). Then we could write:

length(intersects(
  unique(columns.coordinate_system),
  associations.coordsystem.keys
)) == length(unique(columns.coordinate_system))
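
To make the intent of that expression concrete, here is a small TypeScript sketch of the subset test it encodes. The unique and intersects helpers below are illustrative stand-ins written for this example, not the validator's implementations (in particular, this intersects always returns an array), and the key and column values are made up.

```typescript
// Illustrative stand-ins, NOT the validator's own functions.
function unique<T>(values: T[]): T[] {
  // A Set keeps the first occurrence of each value, in order.
  return [...new Set(values)];
}

// Simplified assumption: return the elements of `a` that also occur in `b`.
// Repeats in `a` are preserved, which is why unique() matters below.
function intersects<T>(a: T[], b: T[]): T[] {
  const inB = new Set(b);
  return a.filter((value) => inB.has(value));
}

// True when every distinct column value is also one of the keys.
function allValuesAreKeys(column: string[], keys: string[]): boolean {
  const distinct = unique(column);
  return intersects(distinct, keys).length === distinct.length;
}

// Made-up sample data, for illustration only.
const keys = ["left_arm", "right_arm"];
const okColumn = ["left_arm", "left_arm", "right_arm"];
const badColumn = ["left_arm", "torso"];

console.log(allValuesAreKeys(okColumn, keys));
console.log(allValuesAreKeys(badColumn, keys));
```

Without unique(), the repeated value in okColumn would make the intersection longer than the list of distinct values, so comparing lengths would not be a reliable subset test.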

cc @rwblair for thoughts

In general this makes sense to me. This need for unique is something that has come up a few times and I’d be happy to see it added to the schema language. This pattern is verbose for what is essentially an enum type check, but every way to simplify the rules I’ve thought of punts complexity out of the schema and into the validators and isn’t as general as what’s recommended here.

OK, after banging my head against this for a few days, I’m back with more questions. Here’s what I’m doing:

  • I’ve done what @effigies suggested above (you can see in this commit it’s basically verbatim from above, except for adding keys to the list of required fields)
  • I’ve re-exported the schema like this: uv run bst export > src/schema.json (from root of bids-specification clone)
  • I’m running the local version of the validator like this:
    /opt/bids/validator/local-run --schema file:///opt/bids/spec/src/schema.json /opt/bids/examples/emg_concurrentIndependentUnits
    (partly so that I could benefit from bids-standard/bids-validator#243, and partly so I could try my hand at adding the unique function; but things are failing without unique, when I’m on the main branch of my validator clone)
  • here’s the coordsystem.json file from the dataset I’m running the validator on. You’ll see it has 4 top-level keys, so associations.coordsystem.keys ought to be a 4-item list (not null)
  • I have a new file src/schema/rules/checks/emg.yaml that looks like this:
    EMGCheckThatPassesButShouldnt:
      issue:
        code: EMG_XFAIL
        message: Shows that associations.coordsystem.keys is null.
        level: error
      selectors:
        - datatype == "emg"
        - suffix == "electrodes"
      checks:
        - associations.coordsystem.keys == null
    
    This check passes. Dummy checks (like 1 == 2) placed after it in the file fail as expected.

Any idea what I’m doing wrong here? I’ll also repeat my prior question: is there a way, during validation, to enter a debugger and query interactively the properties of the available objects in the context? Having that would make this so much easier.

For a quick and dirty look at the context I add a conditional and console.log in src/validators/bids.ts around line 126, after the await context.asyncLoads() and after the loop that runs the per context checks but before the summary update. Might look something like this:

    if (context.file.path.endsWith("sub-01_recording-highDensity_electrodes.tsv")) {
      console.log(context)
    }

It’s a mess that’ll show a representation of the schema itself, the dataset-level context, and so on. But if you know what you’re looking for, you can narrow down what gets printed.
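
One way to narrow the output is to print only a few named fields instead of the whole object. The sketch below uses a hypothetical pickFields helper and a made-up stand-in object; neither exists in the validator, and a real context would likely need a cast to a plain record before being passed in.

```typescript
// Hypothetical helper: copy only the named top-level fields of an object.
function pickFields(
  source: Record<string, unknown>,
  fields: string[],
): Record<string, unknown> {
  const picked: Record<string, unknown> = {};
  for (const field of fields) {
    // Skip names that are not present rather than printing `undefined`.
    if (field in source) {
      picked[field] = source[field];
    }
  }
  return picked;
}

// Made-up stand-in for a validator context, for illustration only.
const fakeContext: Record<string, unknown> = {
  suffix: "electrodes",
  sidecar: {},
  associations: {
    coordsystem: {
      path: "/sub-01/emg/sub-01_recording-highDensity_coordsystem.json",
    },
  },
  schema: { placeholder: "imagine a very large object here" },
};

const narrowed = pickFields(fakeContext, ["suffix", "associations"]);
console.log(JSON.stringify(narrowed, null, 2));
```

In the validator itself, the same idea would amount to something like logging context.associations rather than context inside the conditional.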

I saw:

  entities: { sub: "01", recording: "highDensity" },
  datatype: "emg",
  suffix: "electrodes",
  extension: ".tsv",
  modality: "emg",
  sidecar: {},
  associations: {
    coordsystem: {
      path: "/sub-01/emg/sub-01_recording-highDensity_coordsystem.json"
    }
  },

By default, association objects that are configured only in the schema are populated with just a path. To add more properties to the association, an entry in associationLookup inside src/schema/associations.ts needs to be filled out.
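
This also suggests why the keys == null check passed. The sketch below assumes the check is evaluated with JavaScript-like loose equality, under which a property that was never set compares equal to null. The CoordsystemAssociation interface and the key names are made up for illustration.

```typescript
// Illustrative shape only: `keys` is optional because a path-only object
// is what you get when no loader fills in anything else.
interface CoordsystemAssociation {
  path: string;
  keys?: string[];
}

// What the printed context above contained: a path and nothing else.
const pathOnly: CoordsystemAssociation = {
  path: "/sub-01/emg/sub-01_recording-highDensity_coordsystem.json",
};

// What the rule author expected: a path plus the file's top-level keys.
const withKeys: CoordsystemAssociation = {
  path: pathOnly.path,
  keys: ["key_a", "key_b", "key_c", "key_d"], // placeholder key names
};

// Loose equality treats a missing property (undefined) as equal to null.
function keysLookNull(assoc: CoordsystemAssociation): boolean {
  return assoc.keys == null;
}

console.log(keysLookNull(pathOnly));
console.log(keysLookNull(withKeys));
```

So a rule that tests keys against null cannot distinguish "the file has no keys" from "nothing ever loaded the keys", which is the situation described here.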

In my current repo associations.ts wasn’t importing loadJSON so I had to add that:

import { loadJSON } from '../files/json.ts'

Then I added something like this after the channels entry:

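  coordsystem: async (file: BIDSFile, options: { maxRows: number }): Promise<Channels> => {
    // Load the JSON file; if loading fails, fall back to an empty object
    // so that keys ends up as an empty list.
    const keys = Object.keys(await loadJSON(file, options.maxRows)
      .catch((e) => {
        return {}
      }))
    // Expose the top-level keys alongside the default path attribute.
    return {
      path: file.path,
      keys: keys,
    }
  },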

And that got EMG_XFAIL to start showing up for me.

So your rule was good! What got in your way was a poorly documented (really, undocumented) procedure for amending the validator.
