Cheat sheet needed for understanding BIDS validation rules

I am trying to implement a heudiconv heuristic for converting a diverse set of MRI DICOMs. The only way to test that the output is valid BIDS is with the bids-validator, and its error messages about directory and file naming conventions give no details. That is, they say only that the file name is not valid BIDS and leave you to work out the fix.

The JSON file pasted below includes the regexes that do the testing, and is the most detailed way to understand exactly what BIDS requires and prohibits. The BIDS spec document should have a prose description of that, but it’s spread out over multiple sections and not sufficiently prescriptive.

Would it be possible to develop a cheat sheet that states precisely what the rules for directory and file naming are? It doesn’t need to include the reasons those rules exist. It should just show the rules as a guide for developers (who may or may not have any background in MRI and neuroimaging) working with this new format.

Note!: This is not meant as a criticism of anyone or any work that has been done. It is a request and an offer to contribute to a new resource that can be of use to the BIDS community.

The file below and these other files contain the rules of the bids-validator: https://github.com/bids-standard/bids-validator/blob/cff50f9c8e5e4afcba3543bb2e733591148f6b9c/bids_validator/rules/top_level_rules.json

{
  "func_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:recording-[a-zA-Z0-9]+_)?task-[a-zA-Z0-9]+(?:_acq-[a-zA-Z0-9]+)?(?:_rec-[a-zA-Z0-9]+)?(?:_run-[0-9]+)?(?:_echo-[0-9]+)?(@@@_func_top_ext_@@@)$",
    "tokens": {
      "@@@_func_top_ext_@@@": [
        "_bold.json",
        "_sbref.json",
        "_events.json",
        "_events.tsv",
        "_physio.json",
        "_stim.json",
        "_beh.json"
      ]
    }
  },

  "anat_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:acq-[a-zA-Z0-9]+_)?(?:rec-[a-zA-Z0-9]+_)?(?:run-[0-9]+_)?(@@@_anat_suffixes_@@@).json$",
    "tokens": {
      "@@@_anat_suffixes_@@@": [
        "T1w",
        "T2w",
        "T1map",
        "T2map",
        "T1rho",
        "FLAIR",
        "PD",
        "PDT2",
        "inplaneT1",
        "inplaneT2",
        "angio",
        "SWImagandphase",
        "T2star",
        "FLASH",
        "PDmap",
        "photo"
      ]
    }
  },

  "dwi_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:acq-[a-zA-Z0-9]+_)?(?:rec-[a-zA-Z0-9]+_)?(?:run-[0-9]+_)?dwi.(?:@@@_dwi_top_ext_@@@)$",
    "tokens": {
      "@@@_dwi_top_ext_@@@": ["json", "bval", "bvec"]
    }
  },
  "eeg_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?task-[a-zA-Z0-9]+(?:_acq-[a-zA-Z0-9]+)?(?:_proc-[a-zA-Z0-9]+)?(?:@@@_eeg_top_ext_@@@)$",
    "tokens": {
      "@@@_eeg_top_ext_@@@": [
        "_eeg.json",
        "_channels.tsv",
        "_photo.jpg",
        "_coordsystem.json"
      ]
    }
  },
  "ieeg_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?task-[a-zA-Z0-9]+(?:_acq-[a-zA-Z0-9]+)?(?:_proc-[a-zA-Z0-9]+)?(?:@@@_ieeg_top_ext_@@@)$",
    "tokens": {
      "@@@_ieeg_top_ext_@@@": [
        "_ieeg.json",
        "_channels.tsv",
        "_electrodes.tsv",
        "_photo.jpg",
        "_coordsystem.json"
      ]
    }
  },
  "meg_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?task-[a-zA-Z0-9]+(?:_acq-[a-zA-Z0-9]+)?(?:_proc-[a-zA-Z0-9]+)?(?:@@@_meg_top_ext_@@@)$",
    "tokens": {
      "@@@_meg_top_ext_@@@": [
        "_meg.json",
        "_channels.tsv",
        "_photo.jpg",
        "_coordsystem.json"
      ]
    }
  },
  "multi_dir_fieldmap": {
    "regexp": "^\\/(?:acq-[a-zA-Z0-9]+_)?(?:dir-[a-zA-Z0-9]+_)epi.json$"
  },

  "other_top_files": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:recording-[a-zA-Z0-9]+_)?(?:task-[a-zA-Z0-9]+_)?(?:acq-[a-zA-Z0-9]+_)?(?:rec-[a-zA-Z0-9]+_)?(?:run-[0-9]+_)?(@@@_other_top_files_ext_@@@)$",
    "tokens": {
      "@@@_other_top_files_ext_@@@": ["physio.json", "stim.json"]
    }
  }
}
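For anyone who wants to try these rules outside the validator, here is a minimal Python sketch. It assumes that each @@@_..._@@@ token is expanded by joining its values with "|" before the pattern is compiled; that is my reading of the JSON, not something confirmed from the validator source. The function names compile_rule and matching_rules are made up for this example, and only two of the rules above are copied in.

```python
import json
import re

# Two rules copied verbatim from the JSON above.
RULES_JSON = r'''
{
  "dwi_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:acq-[a-zA-Z0-9]+_)?(?:rec-[a-zA-Z0-9]+_)?(?:run-[0-9]+_)?dwi.(?:@@@_dwi_top_ext_@@@)$",
    "tokens": {
      "@@@_dwi_top_ext_@@@": ["json", "bval", "bvec"]
    }
  },
  "multi_dir_fieldmap": {
    "regexp": "^\\/(?:acq-[a-zA-Z0-9]+_)?(?:dir-[a-zA-Z0-9]+_)epi.json$"
  }
}
'''


def compile_rule(rule):
    """Expand each token into an alternation and compile the pattern.

    Assumption: a token stands for its values joined with "|".
    """
    pattern = rule["regexp"]
    for token, values in rule.get("tokens", {}).items():
        pattern = pattern.replace(token, "|".join(values))
    return re.compile(pattern)


def matching_rules(path, rules):
    """Return the names of every rule whose pattern matches the path."""
    return [name for name, rule in rules.items()
            if compile_rule(rule).search(path)]


rules = json.loads(RULES_JSON)
print(matching_rules("/ses-01_acq-highres_dwi.bval", rules))  # ['dwi_top']
print(matching_rules("/dir-AP_epi.json", rules))              # ['multi_dir_fieldmap']
print(matching_rules("/dwi_ses-01.bval", rules))              # []
```

The last call shows the kind of failure the validator reports without explanation: the entities are all legal, but they come after the suffix instead of before it.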

Interesting idea - how would such a cheat sheet differ from a list of regular expressions?

More human readable.

It’s a good idea, but I think it would benefit from some examples. BTW have you seen the path templates in the spec? (such as https://github.com/bids-standard/bids-specification/blob/master/src/04-modality-specific-files/01-magnetic-resonance-imaging-data.md#anatomy-imaging-data)

I had not seen that, and yes that format is just right for a cheat sheet.

Copying that example here (source):

sub-<participant_label>/[ses-<session_label>/]
    anat/
        sub-<participant_label>[_ses-<session_label>][_acq-<label>][_ce-<label>][_rec-<label>][_run-<index>]_<modality_label>.nii[.gz]
        sub-<participant_label>[_ses-<session_label>][_acq-<label>][_ce-<label>][_rec-<label>][_run-<index>][_mod-<label>]_defacemask.nii[.gz]
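
In a heuristic, a template like this can be turned into a small path builder so the entity order is fixed in one place. The sketch below is only an illustration: anat_path is a hypothetical helper, not part of heudiconv or any BIDS library, and the checks assume labels are alphanumeric and indices are digits, as in the validator regexes quoted in this thread. It covers the first template line only (not the defacemask variant).

```python
import re

LABEL = re.compile(r"[a-zA-Z0-9]+")  # assumed: labels are alphanumeric only
INDEX = re.compile(r"[0-9]+")        # assumed: indices are digits only


def anat_path(sub, suffix, ses=None, acq=None, ce=None, rec=None, run=None,
              ext=".nii.gz"):
    """Build a relative anat path following the template above."""
    entities = [("sub", sub), ("ses", ses), ("acq", acq), ("ce", ce), ("rec", rec)]
    for key, value in entities:
        if value is not None and not LABEL.fullmatch(str(value)):
            raise ValueError(f"{key} label must be alphanumeric: {value!r}")
    if run is not None and not INDEX.fullmatch(str(run)):
        raise ValueError(f"run index must be digits: {run!r}")
    entities.append(("run", run))

    # Entities appear in the file name in the order the template lists them.
    stem = "_".join(f"{key}-{value}" for key, value in entities
                    if value is not None)
    folders = [f"sub-{sub}"]
    if ses is not None:
        folders.append(f"ses-{ses}")
    folders.append("anat")
    return "/".join(folders + [f"{stem}_{suffix}{ext}"])


print(anat_path("01", "T1w"))
# sub-01/anat/sub-01_T1w.nii.gz
print(anat_path("01", "T2w", ses="pre", acq="highres", run=2))
# sub-01/ses-pre/anat/sub-01_ses-pre_acq-highres_run-2_T2w.nii.gz
```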

Now I will try to square that with the corresponding code in the bids-validator, to determine whether the above covers all allowable cases or whether something is missing.

1

This regex in file_level_rules.json defines the rules for files in the bottom-level folder for the anat modality.

"anat": {
    "regexp": "^\\/(sub-[a-zA-Z0-9]+)\\/(?:(ses-[a-zA-Z0-9]+)\\/)?anat\\/\\1(_\\2)?(?:_acq-[a-zA-Z0-9]+)?(?:_rec-[a-zA-Z0-9]+)?(?:_run-[0-9]+)?_(?:@@@_anat_suffixes_@@@).(@@@_anat_ext_@@@)$",
    "tokens": {
      "@@@_anat_suffixes_@@@": [
        "T1w",
        "T2w",
        "T1map",
        "T2map",
        "T1rho",
        "FLAIR",
        "PD",
        "PDT2",
        "inplaneT1",
        "inplaneT2",
        "angio",
        "SWImagandphase",
        "T2star",
        "FLASH",
        "PDmap",
        "photo"
      ],
      "@@@_anat_ext_@@@": ["nii.gz", "nii", "json"]
    }
  },

This can be re-written as:

/sub-<participant_label>/[ses-<session_label>/]anat/1[_2][_acq-<acquisition_label>][_rec-<rec_label>][_run-<run_label>][_](T1w|T2w|T1map|T2map|T1rho|FLAIR|PD|PDT2|inplaneT1|inplaneT2|angio|SWImagandphase|T2star|FLASH|PDmap|photo).(nii.gz|nii|json)

This path would pass the regex test:

/sub-21/ses-1/anat/1_acq-21_rec-3_run-1_T1w.json

@ChrisGorgolewski It seems there are some inconsistencies between this version and the version above. Have I re-written it accurately?

2

This regex in top_level_rules.json defines rules of the directory hierarchy:

 "anat_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:acq-[a-zA-Z0-9]+_)?(?:rec-[a-zA-Z0-9]+_)?(?:run-[0-9]+_)?(@@@_anat_suffixes_@@@).json$",
    "tokens": {
      "@@@_anat_suffixes_@@@": [
        "T1w",
        "T2w",
        "T1map",
        "T2map",
        "T1rho",
        "FLAIR",
        "PD",
        "PDT2",
        "inplaneT1",
        "inplaneT2",
        "angio",
        "SWImagandphase",
        "T2star",
        "FLASH",
        "PDmap",
        "photo"
      ]
    }
  },

This can be re-written as:

(coming soon)

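It seems references to subexpressions in the original regexp tripped up your translation a little: \1 and \2 are backreferences to the first and second captured groups (the sub- and ses- parts of the directory path), not the literal characters 1 and 2. This is the correct answer: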

/sub-<participant_label>/[ses-<session_label>/]anat/sub-<participant_label>[_ses-<session_label>][_acq-<acquisition_label>][_rec-<reconstruction_label>][_run-<run_index>]_(T1w|T2w|T1map|T2map|T1rho|FLAIR|PD|PDT2|inplaneT1|inplaneT2|angio|SWImagandphase|T2star|FLASH|PDmap|photo).(nii.gz|nii|json)
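
A quick way to check a translation like this is to run candidate paths through the regexp itself. Here is a rough Python sketch using the "anat" rule quoted earlier in the thread; it assumes the tokens expand to "|"-joined alternations, and is_valid_anat is a name invented for the example. One caveat: the validator runs on JavaScript, where a backreference to an unmatched group matches the empty string, while in Python it fails to match, so edge cases around (_\2)? may behave differently.

```python
import re

ANAT_SUFFIXES = [
    "T1w", "T2w", "T1map", "T2map", "T1rho", "FLAIR", "PD", "PDT2",
    "inplaneT1", "inplaneT2", "angio", "SWImagandphase", "T2star",
    "FLASH", "PDmap", "photo",
]
ANAT_EXTS = ["nii.gz", "nii", "json"]

# The "anat" regexp from file_level_rules.json, after JSON unescaping.
ANAT_RULE = (
    r"^\/(sub-[a-zA-Z0-9]+)\/(?:(ses-[a-zA-Z0-9]+)\/)?anat\/\1(_\2)?"
    r"(?:_acq-[a-zA-Z0-9]+)?(?:_rec-[a-zA-Z0-9]+)?(?:_run-[0-9]+)?"
    r"_(?:@@@_anat_suffixes_@@@).(@@@_anat_ext_@@@)$"
)

# Assumption: each token is replaced by its values joined with "|".
ANAT_RE = re.compile(
    ANAT_RULE
    .replace("@@@_anat_suffixes_@@@", "|".join(ANAT_SUFFIXES))
    .replace("@@@_anat_ext_@@@", "|".join(ANAT_EXTS))
)


def is_valid_anat(path):
    """Return True if the path matches the anat file-level rule."""
    return ANAT_RE.search(path) is not None


# \1 repeats the sub- directory name, so the file name must start with it.
print(is_valid_anat("/sub-21/ses-1/anat/sub-21_ses-1_acq-21_rec-3_run-1_T1w.json"))  # True
print(is_valid_anat("/sub-21/ses-1/anat/1_acq-21_rec-3_run-1_T1w.json"))             # False
# The spec template allows _ce-<label>, but this regexp has no slot for it.
print(is_valid_anat("/sub-21/anat/sub-21_ce-gad_T1w.nii.gz"))                        # False
```

The third case is one concrete inconsistency between the spec template and this version of the rule: the template lists [_ce-<label>], while the regexp as quoted here rejects it.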

Would it also be helpful to replace some of the regexp character ranges with POSIX character classes? E.g., [a-zA-Z0-9] would become [[:alnum:]].

Similarly, [a-zA-Z] would be [[:alpha:]]; [0-9] would be [[:digit:]]; etc.

This would improve readability, I think, although it is an extra layer of obfuscation for those not familiar with those definitions.
