Continuous BIDS multi-site protocol compliance

Hi neurostars!

I am investigating the possibility of writing a simple tool to check MRI protocol compliance for continuous integration of new MRI sessions in a multi-site/multi-vendor BIDS dataset.

I looked into GitHub - Open-Minds-Lab/mrQA (tools for quality assurance in medical imaging datasets, including protocol compliance), but it doesn't seem to exactly fit our needs.
Are there existing tools with a similar function that I am not aware of?
I thought about integrating this into the heudiconv heuristic, but I wouldn't have access to all tags there, and it would make the heuristic more complex.

I would like a tool, much like the BIDS validator, that would check which scanner was used for a new session (using Manufacturer/ManufacturersModelName/StationName/SoftwareVersions) and then apply a set of rules to a subset of the fields in the sidecar JSON of each series from the new session.
It should also check that all required sequences are present, while allowing some sequences to be marked as optional.

Research led me to JSON Schema and pydantic for schema validation. The idea: given a set of sessions from each scanner in the study, one could run a config tool that pre-generates a JSON Schema (with conditional sub-schemas dependent on site/model/manufacturer). This tool would select common JSON tags that should be validated (logically, sequence-related ones) and create a set of rules, allowing approximate numerical matching if necessary. The resulting schema JSON could then be edited to tweak details before being pushed to "production". For instance, one might want to manually set the minimum number of TRs for an fMRI run (this would require dcmstack fields).
Additional tags could configure the required/optional sequences and/or how many of these are required (for instance, if one wants 3 fmap runs, one for each task).
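
To make the idea concrete, here is a minimal sketch of what such a pre-generated schema could look like, assuming the jsonschema library; the if/then conditioning on Manufacturer and the +/-5% tolerance bounds are illustrative assumptions, not the output of any existing tool:

# Minimal sketch with made-up values: a draft 2020-12 schema that applies
# scanner-specific rules via if/then and tolerates small numerical deviations.
import jsonschema

schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "if": {"properties": {"Manufacturer": {"const": "Siemens"}},
           "required": ["Manufacturer"]},
    "then": {
        "properties": {
            # exact match for a categorical tag
            "ManufacturersModelName": {"const": "Prisma_fit"},
            # approximate numerical match: EchoTime within +/-5% of 0.03 s
            "EchoTime": {"type": "number", "minimum": 0.0285, "maximum": 0.0315},
        },
        "required": ["ManufacturersModelName", "EchoTime"],
    },
}

sidecar = {"Manufacturer": "Siemens",
           "ManufacturersModelName": "Prisma_fit",
           "EchoTime": 0.0301}
jsonschema.validate(sidecar, schema)  # raises ValidationError on mismatch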

This would result in an ignored hidden folder within the BIDS structure (e.g. .bids-check/) that would follow the regular BIDS layout, with JSON files containing the schemas (e.g. .bids-check/sub-ref/ses-{pre,post}/{anat,func}/{**entities}_{suffix}.json). Ideally the schemas could be queried with pybids.

Then a new BIDS session could be validated against the schema, producing a report of mismatches and returning an error code if any occur. For each new sidecar, the tool would find the corresponding schema via BIDS entities and validate against it.
The idea would be to integrate this tool as a CI action that runs on PRs adding new sessions to a BIDS datalad dataset (alongside bids-validator or other tests), as sketched below.
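
For the CI side, a rough sketch of what the per-sidecar check could look like; the paths and the fixed sub-ref pseudo-subject mirror the hypothetical folder layout above, and parse_file_entities comes from pybids:

# Rough sketch, hypothetical layout: validate new sidecars against the
# pre-generated schemas matching their BIDS entities; exit non-zero on failure.
import json
import sys
from pathlib import Path

import jsonschema
from bids.layout import parse_file_entities  # pybids

def check_sidecar(sidecar_path, schema_root=".bids-check"):
    entities = parse_file_entities(sidecar_path)  # e.g. {'suffix': 'bold', 'datatype': 'func', ...}
    # assumption: one schema per datatype/suffix under a fixed sub-ref pseudo-subject
    schema_path = (Path(schema_root) / "sub-ref" / entities["datatype"]
                   / f"sub-ref_{entities['suffix']}.json")
    schema = json.loads(schema_path.read_text())
    sidecar = json.loads(Path(sidecar_path).read_text())
    try:
        jsonschema.validate(sidecar, schema)
    except jsonschema.ValidationError as err:
        print(f"{sidecar_path}: {err.message}")
        return False
    return True

if __name__ == "__main__":
    results = [check_sidecar(p) for p in sys.argv[1:]]  # report all mismatches
    sys.exit(0 if all(results) else 1)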

Before going further, I wanted to make sure this makes sense: are there any design flaws, or is it missing anything? (e.g. this doesn't cover bvecs/bvals compliance, which would require separate validation.)

Thanks for your valuable feedback!

Cheers,

Basile

Hi @bpinsard,

Perhaps check out GitHub - PennLINC/CuBIDS: Curation of BIDS (CuBIDS), a sanity-preserving software package for processing BIDS datasets. (The repo has links to the documentation and the associated paper.)

Best,
Steven

Hi @Steven,
Thanks for adding this important reference! I had indeed found it in my searches and completely forgot to list it here, my bad.
The group subcommand seems to take the same approach as mrQA, which is to assess dataset heterogeneity; that is related, but not exactly what I am looking for, unless there are ways to use the API that I am not aware of.
The main feature I am looking for is continuous integration, rather than retrospectively checking an existing dataset.
However, CuBIDS's integration with datalad is a very nice feature.

Best,
b_

Another related project I have heard of is GitHub - harvard-nrg/mrverify (MRI scan parameter verification), but I don't think it would fit the bill here either.

I kept thinking about the fact that BIDS has a schema now, and kept wondering whether/how it could be extended or augmented to provide such additional checks, so that a standard bids-validator, given an additional BIDS schema (see the recent Formalize concept/specification of the "BIDS Extensions" · Issue #74 · bids-standard/bids-2-devel · GitHub) or just a patched one, could perform those additional validations… You mention pybids, but I wondered if the selection mechanisms already present in the bids-specification schema (e.g. selector expressions like suffix == "bold") might suffice to provide "selectors". See e.g. the examples of selectors in bids-specification/src/schema/README.md at 0950f6df9e9c6d598fcc50e269f350066d78b7a0 · bids-standard/bids-specification · GitHub, and there are more in the standard itself.

Thanks Yarik for these pointers to very interesting discussions, and for mrverify, which I was not aware of. It is relevant, though not exactly what I am trying to achieve (and some of it seems somewhat specific to the MRI models present at that particular MR unit).

I started to craft a quick-n-dirty proof-of-concept tool here: GitHub - UNFmontreal/forbids, mainly to check whether it can generate and use JSON Schema to validate BIDS metadata values.
The advantage of JSON Schema is that validation is available in many languages/libraries (maybe even in a browser in JS). The type of validation it can do is pretty basic, but should be sufficient, except maybe for things like checking FoV/orientations/…
There are a bunch of sequence tags that can easily be identified as stable across sessions; these can be extracted to automagically generate a schema that can then be tweaked by hand (e.g. to allow BOLD duration to range from x to y TRs).
These auto-extracted tags can be preset in a general config file (e.g. forbids/src/forbids/config/mri_tags.json at main · UNFmontreal/forbids · GitHub for a basic one).
These config files could preset some tags for approximate numerical matching, or other non-exact matching defaults (not implemented yet).
One would generate that schema at the beginning of a study (say, from final pilot data at each site), and then integrate it into the datalad dataset.

Here is an example of a schema (.forbids/sub-ref/anat/sub-ref_UNIT1.json) that the very limited config auto-generates for now:

{
  "type": "object",
  "properties": {
    "EchoTime": {
      "type": "number",
      "const": 0.00151
    },
    "ManufacturersModelName": {
      "type": "string",
      "const": "Prisma_fit"
    },
    "PixelBandwidth": {
      "type": "integer",
      "const": 745
    },
    "RepetitionTime": {
      "type": "integer",
      "const": 4
    },
    "dcmmeta_shape": {
      "type": "array",
      "prefixItems": [
        {
          "type": "integer",
          "const": 176
        },
        {
          "type": "integer",
          "const": 192
        },
        {
          "type": "integer",
          "const": 192
        }
      ],
      "items": false,
      "minItems": 3,
      "maxItems": 3
    },
    "global__": {
      "type": "object",
      "properties": {
        "const": {
          "type": "object",
          "properties": {
            "SequenceName": {
              "type": "string",
              "const": "*tfl3d1_16"
            }
          },
          "required": [
            "SequenceName"
          ]
        }
      },
      "required": [
        "const"
      ]
    }
  },
  "required": [
    "EchoTime",
    "ManufacturersModelName",
    "PixelBandwidth",
    "RepetitionTime",
    "dcmmeta_shape",
    "global__"
  ],
  "$schema": "http://json-schema.org/draft/2020-12/schema#"
}

The idea is that these regular JSON Schemas (which can be used to validate the JSON sidecars with matching entities in filenames) would also contain additional rules (sketched after the list below) to validate:

  • how many (min/max) runs are expected (from a minimum of 0 for optional series up to any value for required ones)
  • the ordering of the series/runs in a session (which can be critical for tasks in functional runs), through an after parameter using AcquisitionTime (but this needs to be smarter for scans spanning midnight, maybe using _scans.tsv)
  • in theory this should work not only for MRI but also for other BIDS modalities with instrument metadata that is meaningful to validate
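
As a sketch of what the first two rules above could look like on top of pybids; the min/max rules dict and the naive AcquisitionTime sort are assumptions, since neither is implemented yet:

# Rough sketch, hypothetical rules: run counts per task and series ordering
# derived from sidecar AcquisitionTime (naive, valid within a single day).
from bids import BIDSLayout

def check_run_counts(layout, subject, session, rules):
    """rules: {task: (min_runs, max_runs)}; a minimum of 0 marks a task optional."""
    ok = True
    for task, (lo, hi) in rules.items():
        n = len(layout.get(subject=subject, session=session, task=task,
                           suffix="bold", extension=".nii.gz"))
        if not lo <= n <= hi:
            print(f"task-{task}: expected between {lo} and {hi} runs, got {n}")
            ok = False
    return ok

def check_run_order(layout, subject, session, expected_task_order):
    """Compare the task order of acquired runs, sorted by AcquisitionTime
    (HH:MM:SS strings, so scans spanning midnight would need _scans.tsv)."""
    bolds = layout.get(subject=subject, session=session,
                       suffix="bold", extension=".nii.gz")
    acquired = sorted(bolds, key=lambda b: b.get_metadata()["AcquisitionTime"])
    return [b.get_entities()["task"] for b in acquired] == expected_task_order

layout = BIDSLayout("/path/to/bids")  # hypothetical dataset path
check_run_counts(layout, "01", "pre", {"rest": (1, 1), "memory": (0, 2)})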

I went with pybids as it is what I am most familiar with, but any other tool that can query BIDS should be fine.

Of course, if that type of validation could be covered by future BIDS specs and/or the validator, that would be awesome, and I would rather not develop and maintain yet another tool.

Quick related follow-up: consider https://linkml.io/ to describe the model, since from it you can produce JSON Schema, pydantic models, and more…

Hi @bpinsard, from what I understand of your original question, your needs are straightforward to achieve with MRdataset (recently published in JOSS) by applying your desired constraints to the intuitive data structure it returns, which makes complex audits of the underlying dataset very easy. Happy to talk more and help you as needed.

Paper PDF with some insight: Journal of Open Source Software: MRdataset: A unified and user-friendly interface to medical imaging datasets

It may not be apparent to you, but mrQA internally depends on MRdataset.

Thanks @raamana for your input!

MRdataset is indeed an interesting library for accessing MRI data stored in different formats. However, for the current use case, I want to focus on BIDS, and MRdataset has limited documentation/API reference for integrating it into another tool.

The decision to focus on BIDS is to cover not only MRI data but any data (currently or prospectively) covered by BIDS: M/EEG, NIRS, PET, physio, eye-tracking, …
By focusing on BIDS, I was able to code the base of the tool in a few hundred lines with 3 dependencies (pybids, jsonschema, apischema), all maintained by large communities.

In the end, unless I am mistaken, MRdataset would give access to a subset of BIDS MRI-specific metadata (which pybids can do too, with standardized BIDS querying), while the validation logic has to be handled separately anyway (which you are doing in mrQA). My goal here is to avoid coding the bulk of the validation logic (comparing numbers, strings, regexes, ranges, …) in Python (or another programming language), and instead rely on a well-supported data validation standard that can be read and reused by any tool/language that relies on the BIDS schema => higher interoperability and future-proofing.

The fact that all metadata in BIDS are JSON makes it really easy to apply a standard like JSON Schema to validate them without requiring a format-specific loader such as MRdataset. The logic for selecting the schema is based on BIDS entities.
The multi-site/multi-scanner/multi-vendor/scanner-upgrade case can easily be covered by JSON Schema itself, with schema unions and discriminators.
Relying on such a supported and evolving standard also means benefiting from fixes and potential new features without much effort.
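
For illustration, a minimal sketch (with made-up parameter values) of such a union, where the ManufacturersModelName const acts as the discriminator selecting the per-scanner branch:

# Minimal sketch with made-up values: one sub-schema per scanner; oneOf
# ensures exactly one branch (the matching scanner's rules) applies.
import jsonschema

schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "oneOf": [
        {"properties": {"ManufacturersModelName": {"const": "Prisma_fit"},
                        "EchoTime": {"const": 0.00151}},
         "required": ["ManufacturersModelName", "EchoTime"]},
        {"properties": {"ManufacturersModelName": {"const": "Ingenia"},
                        "EchoTime": {"const": 0.0016}},
         "required": ["ManufacturersModelName", "EchoTime"]},
    ],
}

jsonschema.validate(
    {"ManufacturersModelName": "Ingenia", "EchoTime": 0.0016}, schema)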

Of course this tool is still at a draft stage, and I hope community engagement, such as this discussion, can help make better choices than what I can envision.

Best,

I follow you. The reasons we had to develop mrQA and MRdataset from scratch were that we realized early on that 1) BIDS won't be available for many imaging studies until much later (often months if not years), 2) we needed to work on DICOMs straight off the scanner to catch issues immediately, and 3) we wanted the most comprehensive information about imaging studies, which NIfTI and BIDS JSON won't always encode or retain.

While our software seems simple at a high level, we had to develop a lot of validation logic to deal with many idiosyncratic differences in sequences, vendors, modalities, etc. MRdataset was developed partly to convert parameter units between different vendors, which is tricky (a subtle issue that can be overlooked in basic parameter compliance checks).

That said, MRdataset offered support for BIDS from the beginning (happy to walk you through the codebase); our paper on protocol compliance, about to appear in Neuroinformatics, was validated on over 20 BIDS datasets.

Anyway, if our design doesn't help/work for your specific needs, I understand, and I wish you the best.

As I understand it, most medical imaging data and modalities start out as DICOM, which makes it the most inclusive format across vendors, sequences, and modalities, especially for continuous monitoring and reducing data loss. Perhaps there are other good reasons to focus on BIDS for protocol compliance.