BIDS "continuous integration"

We are looking into implementing some tests on the BIDS datasets we acquire, to be run after each session's acquisition and automatic conversion to BIDS.
The session would be converted using heudiconv+datalad on a separate branch and a PR would be opened; other commits would then add behavioral files (or physio, eyetracking,…) to that branch.

The idea would be to then have some GH actions such as:

  • running the BIDS validator
  • running dataset-specific basic tests (checking MRI parameters, checking that the events file contains the expected info, e.g. number of events…, maybe raising errors if there is a high number of missing participant responses…)
  • one could imagine having automatic QC on the images themselves (e.g. checking brain coverage of fMRI, gross artifacts), and being able to check these images in the browser (likely on a self-hosted CI service).

  • And then have the dataset maintainer review these PRs and fix/settle the errors raised by the tests.
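
To make the "dataset-specific basic tests" idea a bit more concrete, here is a minimal sketch of what one such check on an events file could look like. The function name `check_events`, the expected event count and the missing-response threshold are all hypothetical; the only things taken from BIDS are the tab-separated format, the optional `response_time` column and `n/a` as the marker for missing values.

```python
import csv
import io


def check_events(tsv_text, expected_n_events, max_missing_fraction=0.2,
                 response_column="response_time"):
    """Return a list of human-readable problems found in one events file.

    Hypothetical helper: the expected count and the threshold would be
    defined per dataset/task, they are not part of BIDS.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    problems = []
    if len(rows) != expected_n_events:
        problems.append(
            f"expected {expected_n_events} events, found {len(rows)}")
    if rows and response_column not in rows[0]:
        problems.append(f"missing column {response_column!r}")
    elif rows:
        # BIDS marks missing values as "n/a"; empty or absent cells are
        # counted as missing too.
        n_missing = sum(
            row[response_column] in ("n/a", "", None) for row in rows)
        fraction = n_missing / len(rows)
        if fraction > max_missing_fraction:
            problems.append(
                f"{n_missing}/{len(rows)} responses missing "
                f"(limit {max_missing_fraction:.0%})")
    return problems


example = (
    "onset\tduration\ttrial_type\tresponse_time\n"
    "0.0\t1.0\tgo\t0.45\n"
    "2.0\t1.0\tgo\tn/a\n"
    "4.0\t1.0\tnogo\tn/a\n"
    "6.0\t1.0\tgo\t0.52\n"
)
for problem in check_events(example, expected_n_events=4,
                            max_missing_fraction=0.25):
    print(problem)
```

A CI job could run something like this over every `*_events.tsv` added in the PR and fail if any problems are returned.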

Is that something that was already explored by others? @eknahm
I think there could be a set of generic tests that could be deployed for any BIDS dataset acquisition to help flag errors as early as possible in the acquisition process and also track the fixes applied.
I never played with CI on GH, so my knowledge on this is superficial.



Overall sounds “fun”, but I'm not sure how easy it would be to set up something flexible etc. Some aspects to keep in mind came to mind right away; I might comment more later on:

  • conflicts are unavoidable in some files such as participants.tsv upon merging any new subject PR, which would be a bit annoying to fix. Might want to have some helper bot which would automagically update the PR, providing conflict resolution for such files
  • github and its CI might be “suboptimal” since it would
    • impose resource limits on how long processing could run
    • make access to data tricky etc
      So might be worth looking into establishing a local gitlab instance and its CI for the mission?
  • in principle, if we follow the YODA principles, all QA results etc. should be produced in a separate dataset which would use the heudiconv-ed one. So it might be a bit “tricky” to set up such tandems of PRs across datasets if going for a github/gitlab setup
  • some actions I foresee the bot might need to understand/do on that PR
    • drop or remove some files (e.g. __dup ones from reproin)
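
As a rough illustration of the conflict-resolution helper for participants.tsv mentioned above, here is a minimal sketch that takes the two conflicting versions of the file and returns their union, keyed on `participant_id`. The function name `merge_participants` is hypothetical, and the choices it makes (the second version wins for duplicated participants, missing cells become `n/a`, rows are sorted by ID) are assumptions, not an established convention.

```python
import csv
import io


def merge_participants(ours, theirs):
    """Union two versions of participants.tsv, keyed on participant_id."""
    merged = {}
    fieldnames = []
    for text in (ours, theirs):
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        # Keep the union of columns, in order of first appearance.
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        for row in reader:
            # If both sides list the same participant, the row from
            # `theirs` replaces the one from `ours` (an assumption).
            merged[row["participant_id"]] = row
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            restval="n/a", lineterminator="\n")
    writer.writeheader()
    for participant_id in sorted(merged):
        writer.writerow(merged[participant_id])
    return out.getvalue()


ours = "participant_id\tage\nsub-01\t25\nsub-02\t30\n"
theirs = "participant_id\tage\nsub-01\t25\nsub-03\t41\n"
print(merge_participants(ours, theirs))
```

A bot could run something like this on the two sides of the conflict and push the result to the PR branch; git's custom merge drivers (configured via .gitattributes) might be another place to hook it in.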