Cheat sheet needed for understanding BIDS validation rules

tedstrauss · October 19, 2018, 8:17pm

I am trying to implement a heudiconv heuristic for converting a diverse set of MRI DICOMs. The only way to test that the output is valid BIDS is to run the bids-validator, and its error messages about directory and file naming conventions do not provide details. That is, the messages say that a file name is not valid BIDS and leave you to work out the fix.

The JSON file pasted below includes the regexes that do the testing, and it is the most detailed way to understand exactly what BIDS requires and prohibits. The BIDS spec document should have a prose description of that, but it’s spread out over multiple sections and not sufficiently prescriptive.

Would it be possible to develop a cheat sheet that states precisely what the rules of directory and file naming are? It doesn’t need to include the reasons those rules exist. It would just show the rules as a guide for developers (who may or may not have any background in MRI and neuroimaging) working with this new format.

Note!: This is not meant as a criticism of anyone or any work that has been done. It is a request and an offer to contribute to a new resource that can be of use to the BIDS community.

The file below and other files like it contain the rules of the bids-validator: https://github.com/bids-standard/bids-validator/blob/cff50f9c8e5e4afcba3543bb2e733591148f6b9c/bids_validator/rules/top_level_rules.json

{
  "func_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:recording-[a-zA-Z0-9]+_)?task-[a-zA-Z0-9]+(?:_acq-[a-zA-Z0-9]+)?(?:_rec-[a-zA-Z0-9]+)?(?:_run-[0-9]+)?(?:_echo-[0-9]+)?(@@@_func_top_ext_@@@)$",
    "tokens": {
      "@@@_func_top_ext_@@@": [
        "_bold.json",
        "_sbref.json",
        "_events.json",
        "_events.tsv",
        "_physio.json",
        "_stim.json",
        "_beh.json"
      ]
    }
  },

  "anat_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:acq-[a-zA-Z0-9]+_)?(?:rec-[a-zA-Z0-9]+_)?(?:run-[0-9]+_)?(@@@_anat_suffixes_@@@).json$",
    "tokens": {
      "@@@_anat_suffixes_@@@": [
        "T1w",
        "T2w",
        "T1map",
        "T2map",
        "T1rho",
        "FLAIR",
        "PD",
        "PDT2",
        "inplaneT1",
        "inplaneT2",
        "angio",
        "SWImagandphase",
        "T2star",
        "FLASH",
        "PDmap",
        "photo"
      ]
    }
  },

  "dwi_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:acq-[a-zA-Z0-9]+_)?(?:rec-[a-zA-Z0-9]+_)?(?:run-[0-9]+_)?dwi.(?:@@@_dwi_top_ext_@@@)$",
    "tokens": {
      "@@@_dwi_top_ext_@@@": ["json", "bval", "bvec"]
    }
  },
  "eeg_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?task-[a-zA-Z0-9]+(?:_acq-[a-zA-Z0-9]+)?(?:_proc-[a-zA-Z0-9]+)?(?:@@@_eeg_top_ext_@@@)$",
    "tokens": {
      "@@@_eeg_top_ext_@@@": [
        "_eeg.json",
        "_channels.tsv",
        "_photo.jpg",
        "_coordsystem.json"
      ]
    }
  },
  "ieeg_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?task-[a-zA-Z0-9]+(?:_acq-[a-zA-Z0-9]+)?(?:_proc-[a-zA-Z0-9]+)?(?:@@@_ieeg_top_ext_@@@)$",
    "tokens": {
      "@@@_ieeg_top_ext_@@@": [
        "_ieeg.json",
        "_channels.tsv",
        "_electrodes.tsv",
        "_photo.jpg",
        "_coordsystem.json"
      ]
    }
  },
  "meg_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?task-[a-zA-Z0-9]+(?:_acq-[a-zA-Z0-9]+)?(?:_proc-[a-zA-Z0-9]+)?(?:@@@_meg_top_ext_@@@)$",
    "tokens": {
      "@@@_meg_top_ext_@@@": [
        "_meg.json",
        "_channels.tsv",
        "_photo.jpg",
        "_coordsystem.json"
      ]
    }
  },
  "multi_dir_fieldmap": {
    "regexp": "^\\/(?:acq-[a-zA-Z0-9]+_)?(?:dir-[a-zA-Z0-9]+_)epi.json$"
  },

  "other_top_files": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:recording-[a-zA-Z0-9]+_)?(?:task-[a-zA-Z0-9]+_)?(?:acq-[a-zA-Z0-9]+_)?(?:rec-[a-zA-Z0-9]+_)?(?:run-[0-9]+_)?(@@@_other_top_files_ext_@@@)$",
    "tokens": {
      "@@@_other_top_files_ext_@@@": ["physio.json", "stim.json"]
    }
  }
}
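For readers who want to try a file name against one of these rules without running the full validator, here is a minimal Python sketch. It assumes that each `@@@...@@@` token is replaced by a `|`-joined alternation of its listed values before the pattern is compiled; that is an assumption about how the validator expands tokens, not something confirmed in this thread. The helper names `compile_rule` and `matches` are hypothetical, and only the `dwi_top` rule is copied in to keep the example short.

```python
import json
import re

# Trimmed copy of the "dwi_top" rule from the JSON pasted above.
RULES_JSON = r'''
{
  "dwi_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:acq-[a-zA-Z0-9]+_)?(?:rec-[a-zA-Z0-9]+_)?(?:run-[0-9]+_)?dwi.(?:@@@_dwi_top_ext_@@@)$",
    "tokens": {
      "@@@_dwi_top_ext_@@@": ["json", "bval", "bvec"]
    }
  }
}
'''


def compile_rule(rule):
    """Expand each token into an alternation and compile the pattern.

    Assumption: tokens are substituted verbatim, joined with '|'.
    """
    pattern = rule["regexp"]
    for token, values in rule.get("tokens", {}).items():
        pattern = pattern.replace(token, "|".join(values))
    return re.compile(pattern)


# Map each rule name to its compiled regular expression.
rules = {name: compile_rule(rule) for name, rule in json.loads(RULES_JSON).items()}


def matches(rule_name, path):
    """Return True if the path satisfies the named rule."""
    return rules[rule_name].match(path) is not None


print(matches("dwi_top", "/ses-01_acq-hires_dwi.bval"))
print(matches("dwi_top", "/acq-hires_ses-01_dwi.bval"))
```

The second path uses the same entities in a different order and is rejected, which shows that the rule fixes the order of the key-value pairs as well as their spelling.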

ChrisGorgolewski · October 19, 2018, 9:25pm

Interesting idea - how would such a cheat sheet differ from a list of regular expressions?

tedstrauss · October 22, 2018, 1:46pm

1

This regex in file_level_rules.json defines the rules for files in the bottom-level folder for the anat modality.

"anat": {
    "regexp": "^\\/(sub-[a-zA-Z0-9]+)\\/(?:(ses-[a-zA-Z0-9]+)\\/)?anat\\/\\1(_\\2)?(?:_acq-[a-zA-Z0-9]+)?(?:_rec-[a-zA-Z0-9]+)?(?:_run-[0-9]+)?_(?:@@@_anat_suffixes_@@@).(@@@_anat_ext_@@@)$",
    "tokens": {
      "@@@_anat_suffixes_@@@": [
        "T1w",
        "T2w",
        "T1map",
        "T2map",
        "T1rho",
        "FLAIR",
        "PD",
        "PDT2",
        "inplaneT1",
        "inplaneT2",
        "angio",
        "SWImagandphase",
        "T2star",
        "FLASH",
        "PDmap",
        "photo"
      ],
      "@@@_anat_ext_@@@": ["nii.gz", "nii", "json"]
    }
  },

This can be re-written as:

/sub-<participant_label>/[ses-<session_label>/]anat/1[_2][_acq-<acquisition_label>][_rec-<rec_label>][_run-<run_label>][_](T1w|T2w|T1map|T2map|T1rho|FLAIR|PD|PDT2|inplaneT1|inplaneT2|angio|SWImagandphase|T2star|FLASH|PDmap|photo).(nii.gz|nii|json)

This path would pass the regex test:

/sub-21/ses-1/anat/1_acq-21_rec-3_run-1_T1w.json

@ChrisGorgolewski It seems there are some inconsistencies between this version and the version above. Have I re-written it accurately?

2

This regex in top_level_rules.json defines rules of the directory hierarchy:

 "anat_top": {
    "regexp": "^\\/(?:ses-[a-zA-Z0-9]+_)?(?:acq-[a-zA-Z0-9]+_)?(?:rec-[a-zA-Z0-9]+_)?(?:run-[0-9]+_)?(@@@_anat_suffixes_@@@).json$",
    "tokens": {
      "@@@_anat_suffixes_@@@": [
        "T1w",
        "T2w",
        "T1map",
        "T2map",
        "T1rho",
        "FLAIR",
        "PD",
        "PDT2",
        "inplaneT1",
        "inplaneT2",
        "angio",
        "SWImagandphase",
        "T2star",
        "FLASH",
        "PDmap",
        "photo"
      ]
    }
  },

This can be re-written as:

(coming soon)

ChrisGorgolewski · October 23, 2018, 6:03pm

tedstrauss:

@ChrisGorgolewski It seems there are some inconsistencies between this version and the version above. Have I re-written it accurately?

It seems references to subexpressions in the original regexp tripped your translation a little. This is the correct answer:

/sub-<participant_label>/[ses-<session_label>/]anat/sub-<participant_label>[_ses-<session_label>][_acq-<acquisition_label>][_rec-<reconstruction_label>][_run-<run_index>]_(T1w|T2w|T1map|T2map|T1rho|FLAIR|PD|PDT2|inplaneT1|inplaneT2|angio|SWImagandphase|T2star|FLASH|PDmap|photo).(nii.gz|nii|json)
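The role of the backreferences can be checked directly. The sketch below rebuilds the `anat` pattern in Python with the tokens expanded by hand (assuming a plain `|`-joined alternation) and tests both the path from the earlier attempt and a path that follows the corrected template. The function name `is_valid_anat` is hypothetical. The sketch runs under Python's `re` module, whereas the validator itself is JavaScript, so treat it as an illustration and not as a replacement for the validator.

```python
import re

ANAT_SUFFIXES = [
    "T1w", "T2w", "T1map", "T2map", "T1rho", "FLAIR", "PD", "PDT2",
    "inplaneT1", "inplaneT2", "angio", "SWImagandphase", "T2star",
    "FLASH", "PDmap", "photo",
]
ANAT_EXTS = ["nii.gz", "nii", "json"]

# \1 repeats the captured "sub-<label>" and \2 the captured "ses-<label>",
# so the file name has to carry the same labels as its parent directories.
ANAT_RE = re.compile(
    r"^\/(sub-[a-zA-Z0-9]+)\/(?:(ses-[a-zA-Z0-9]+)\/)?anat\/\1(_\2)?"
    r"(?:_acq-[a-zA-Z0-9]+)?(?:_rec-[a-zA-Z0-9]+)?(?:_run-[0-9]+)?"
    r"_(?:" + "|".join(ANAT_SUFFIXES) + r").(" + "|".join(ANAT_EXTS) + r")$"
)


def is_valid_anat(path):
    """Return True if the path matches the anat file-level pattern."""
    return ANAT_RE.match(path) is not None


# Follows the corrected template: labels are repeated in the file name.
print(is_valid_anat("/sub-21/ses-1/anat/sub-21_ses-1_acq-21_rec-3_run-1_T1w.json"))
# The earlier attempt: "1" was read literally instead of as a backreference.
print(is_valid_anat("/sub-21/ses-1/anat/1_acq-21_rec-3_run-1_T1w.json"))
```

The first call returns True and the second returns False, because `\1` must match the exact text captured for the subject directory, not the character `1`.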

cwatson · November 5, 2018, 1:24am

Would it also be helpful to replace some of the character ranges in the regexps with POSIX character classes? E.g., [a-zA-Z0-9] would become [[:alnum:]].

Similarly, [a-zA-Z] would be [[:alpha:]]; [0-9] would be [[:digit:]]; etc.

This would improve readability I think, although it is an extra layer of obfuscation for those not familiar with those definitions.
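One caveat: as far as I know, neither JavaScript's RegExp nor Python's `re` module supports POSIX bracket classes, so they would not work in the rule files as they stand. A different way to get a similar readability gain is to name the repeated fragments and assemble the pattern from them. The sketch below does this for a shortened version of the `anat_top` rule. The names `LABEL`, `INDEX`, and `opt_entity` are hypothetical, the suffix list is cut down to two entries, and the dot before `json` is escaped here although the original leaves it unescaped.

```python
import re

LABEL = r"[a-zA-Z0-9]+"  # alphanumeric label, what POSIX calls [[:alnum:]]+
INDEX = r"[0-9]+"        # numeric index, what POSIX calls [[:digit:]]+


def opt_entity(key, value_pattern):
    """Build an optional 'key-value_' entity, e.g. (?:acq-[a-zA-Z0-9]+_)?"""
    return rf"(?:{key}-{value_pattern}_)?"


# Shortened anat_top rule assembled from named pieces.
ANAT_TOP = re.compile(
    r"^\/"
    + opt_entity("ses", LABEL)
    + opt_entity("acq", LABEL)
    + opt_entity("rec", LABEL)
    + opt_entity("run", INDEX)
    + r"(T1w|T2w)\.json$"
)

print(ANAT_TOP.pattern)
print(ANAT_TOP.match("/ses-01_acq-mprage_run-2_T1w.json") is not None)
print(ANAT_TOP.match("/run-two_T1w.json") is not None)
```

The printed pattern is the same flat regex as before, so the named fragments only change how the rule is written, not what it accepts.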