Recommendations for Dog Behavior BIDS Dataset Structure

Hello everyone, I am a master’s student seeking support in the creation of a dataset structure for storing dog behavioral data in BIDS format.

At the moment in my university lab we have built the software for acquiring the following kinds of data for each dog subject:

  1. ecg data: tsv.gz format.
  2. gyroscope data: tsv format.
  3. video data: 4 video files per session in mp4 format. Each video is obtained by a camera located in a different corner of the experiment room.
  4. audio data: 2 audio files per session in wav format. Each audio is obtained by a different microphone. One is attached to the dog and the other to the owner.
  5. annotation from veterinarians and owner: tsv format. This file has 3 columns: timestamp, valence, arousal.

We are planning to add eeg measurements of the dogs in the near future, so it would be great to already put all our data into a BIDS-ready dataset.

The problem is that even after having read through all the relevant documentation I still don’t know where to put audio, video and ecg data.

As far as I have understood from the documentation, and by using the BIDS validator at this link, audio and video data unfortunately cannot be put in beh/.
Putting audio and video data in sourcedata/ does not seem right to me since we will be actively using them as part of some of our analysis using neural networks.

At the same time, putting the veterinary and owner labels (events) and ecg data (physio) in beh/ without the video data does not look right to me. The reason is that the events have been labeled using the video data, not the ecg data, and I fear it could cause confusion.
Also putting ecg data and events in motion/ with the gyroscope data does not look right to me for the same reason.

Since I don’t think that putting the ecg data in beh/ or motion/ would be correct, and I don’t know where to put video and audio data, the only option I see is to create a BIDS-like dataset that follows the BIDS style, but is not compatible with it, using the following structure:

DOG_BEHAVIOR_DATASET/
├── .bidsignore
├── dataset_description.json
├── README.md
├── CITATION.cff
├── LICENSE
├── CHANGES
├── participants.tsv (I have added the species column as per documentation)
├── participants.json
├── acq-owner_events.json
├── acq-expert01_events.json
├── acq-expert02_events.json
├── acq-expert03_events.json
└── sub-01/
    └── ses-scen01/
        ├── sub-01_ses-scen01_scans.tsv
        ├── sub-01_ses-scen01_scans.json
        ├── video/
        │   ├── sub-01_ses-scen01_task-treat_acq-owner_events.tsv
        │   ├── sub-01_ses-scen01_task-treat_acq-expert01_events.tsv
        │   ├── sub-01_ses-scen01_task-treat_acq-expert02_events.tsv
        │   ├── sub-01_ses-scen01_task-treat_acq-expert03_events.tsv
        │   ├── sub-01_ses-scen01_task-treat_recording-cam1_video.mp4
        │   ├── sub-01_ses-scen01_task-treat_recording-cam1_video.json
        │   ├── sub-01_ses-scen01_task-treat_recording-cam2_video.mp4
        │   ├── sub-01_ses-scen01_task-treat_recording-cam2_video.json
        │   ├── sub-01_ses-scen01_task-treat_recording-cam3_video.mp4
        │   ├── sub-01_ses-scen01_task-treat_recording-cam3_video.json
        │   ├── sub-01_ses-scen01_task-treat_recording-cam4_video.mp4
        │   └── sub-01_ses-scen01_task-treat_recording-cam4_video.json
        ├── audio/
        │   ├── sub-01_ses-scen01_task-treat_recording-dogmic_audio.wav
        │   ├── sub-01_ses-scen01_task-treat_recording-dogmic_audio.json
        │   ├── sub-01_ses-scen01_task-treat_recording-ownermic_audio.wav
        │   └── sub-01_ses-scen01_task-treat_recording-ownermic_audio.json
        ├── ecg/
        │   ├── sub-01_ses-scen01_task-treat_recording-ecg_physio.tsv.gz
        │   └── sub-01_ses-scen01_task-treat_recording-ecg_physio.json 
        └── motion/
            ├── sub-01_ses-scen01_task-treat_tracksys-gyro_motion.tsv
            ├── sub-01_ses-scen01_task-treat_tracksys-gyro_motion.json
            ├── sub-01_ses-scen01_task-treat_tracksys-gyro_channel.tsv
            └── sub-01_ses-scen01_task-treat_tracksys-gyro_channel.json
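
To make the naming pattern in the tree above easier to discuss, here is a minimal Python sketch that assembles the same file names from key-value entities. The function names `bids_like_name` and `bids_like_path` and the `ENTITY_ORDER` tuple are hypothetical helpers invented for illustration; the entity order simply mirrors the listing above and is not checked against the BIDS schema.

```python
from pathlib import PurePosixPath

# Entity order as it appears in the listing above (an assumption of this
# sketch, not something validated against the BIDS schema).
ENTITY_ORDER = ("sub", "ses", "task", "tracksys", "acq", "recording")


def bids_like_name(entities, suffix, extension):
    """Join key-value entities, a suffix and an extension into one file name."""
    unknown = set(entities) - set(ENTITY_ORDER)
    if unknown:
        raise ValueError(f"unexpected entities: {sorted(unknown)}")
    parts = [f"{key}-{entities[key]}" for key in ENTITY_ORDER if key in entities]
    return "_".join(parts + [suffix]) + extension


def bids_like_path(entities, folder, suffix, extension):
    """Place the file under sub-XX/ses-YY/<folder>/ as in the tree above."""
    name = bids_like_name(entities, suffix, extension)
    return PurePosixPath(
        f"sub-{entities['sub']}", f"ses-{entities['ses']}", folder, name
    )


if __name__ == "__main__":
    common = {"sub": "01", "ses": "scen01", "task": "treat"}
    # One video file per camera, named with the recording entity.
    for cam in ("cam1", "cam2", "cam3", "cam4"):
        print(bids_like_path({**common, "recording": cam}, "video", "video", ".mp4"))
```

Switching the camera label from `recording-` to `acq-` would only require changing the key passed in the dictionary, so the choice between the two entities stays a one-line decision.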

What do you think, is this a solid structure? Am I missing some detail that would make it possible to organize this data in a valid BIDS dataset? Did I perhaps miss a new BEP? Would putting everything in beh/ regardless be a good idea?

Also, for multi-camera setups, is ‘recording-cam1’ the preferred entity, or should I use ‘acq-’?

I would like to get this right since other lab members count on me, so please share your feedback if you can.

Also, please feel free to edit the dataset structure to give visual feedback.
I created it with https://tree.nathanfriend.com/; pasting the input below into the website reproduces the tree so that you can edit it.

DOG_BEHAVIOR_DATASET
  .bidsignore
  dataset_description.json
  README.md
  CITATION.cff
  LICENSE
  CHANGES
  participants.tsv
  participants.json
  acq-owner_events.json
  acq-expert01_events.json
  acq-expert02_events.json
  acq-expert03_events.json
  sub-01
    ses-scen01
      sub-01_ses-scen01_scans.tsv
      sub-01_ses-scen01_scans.json
      video
        sub-01_ses-scen01_task-treat_acq-owner_events.tsv
        sub-01_ses-scen01_task-treat_acq-expert01_events.tsv
        sub-01_ses-scen01_task-treat_acq-expert02_events.tsv
        sub-01_ses-scen01_task-treat_acq-expert03_events.tsv
        
        sub-01_ses-scen01_task-treat_recording-cam1_video.mp4
        sub-01_ses-scen01_task-treat_recording-cam1_video.json
        sub-01_ses-scen01_task-treat_recording-cam2_video.mp4
        sub-01_ses-scen01_task-treat_recording-cam2_video.json
        sub-01_ses-scen01_task-treat_recording-cam3_video.mp4
        sub-01_ses-scen01_task-treat_recording-cam3_video.json
        sub-01_ses-scen01_task-treat_recording-cam4_video.mp4
        sub-01_ses-scen01_task-treat_recording-cam4_video.json
      audio
        sub-01_ses-scen01_task-treat_recording-dogmic_audio.wav
        sub-01_ses-scen01_task-treat_recording-dogmic_audio.json
        sub-01_ses-scen01_task-treat_recording-ownermic_audio.wav
        sub-01_ses-scen01_task-treat_recording-ownermic_audio.json
      ecg
        sub-01_ses-scen01_task-treat_recording-ecg_physio.tsv.gz
        sub-01_ses-scen01_task-treat_recording-ecg_physio.json 
      motion
        sub-01_ses-scen01_task-treat_tracksys-gyro_motion.tsv
        sub-01_ses-scen01_task-treat_tracksys-gyro_motion.json
        sub-01_ses-scen01_task-treat_tracksys-gyro_channel.tsv
        sub-01_ses-scen01_task-treat_tracksys-gyro_channel.json

Hello everyone, I’m writing this post to ask for feedback on a dataset structure that I intend to adopt.

Context
In our lab we are doing a study on dog behavior. In this study each dog is a subject, and each session corresponds to a different scenario. One session could be the owner talking to the dog and petting it for 5 minutes, another could be the owner giving a treat to the dog, etc.

At the moment we have built an experimental setup for acquiring the following kinds of data for each dog subject:

  1. ecg + accelerometer + gyroscope data: saved in a single tsv file with the timestamp as the first column.
  2. video data: 4 video files per session in mp4 format. Each video is obtained by a camera located in a different corner of the experiment room.
  3. audio data: 2 audio files per session in wav format. Each audio is obtained by a different microphone. One is on the dog and the other one is on the owner.
  4. annotation from veterinarians and owner: each person's annotations are saved in one tsv file with 3 columns: timestamp, valence, arousal. Since several people do the annotation, there are multiple annotation files per session.
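
As a side note on the annotation files in item 4: BIDS `_events.tsv` files require `onset` and `duration` columns, so the three-column annotation format would need a small conversion before it can be stored as events. The sketch below is one possible way to do it in Python. It assumes the `timestamp` values are seconds from the start of the recording and that annotators did not record a duration (hence `n/a`); the function names are hypothetical.

```python
import csv
import io

EVENT_COLUMNS = ["onset", "duration", "valence", "arousal"]


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def annotations_to_events(rows, duration="n/a"):
    """Map (timestamp, valence, arousal) rows to events-style rows."""
    return [
        {
            "onset": row["timestamp"],  # assumed: seconds from recording start
            "duration": duration,       # assumed: not recorded by the annotator
            "valence": row["valence"],
            "arousal": row["arousal"],
        }
        for row in rows
    ]


def write_tsv(events):
    """Serialize events-style rows back to tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=EVENT_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(events)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up example values, only to show the column layout.
    raw = "timestamp\tvalence\tarousal\n0.0\t3\t2\n5.0\t4\t4\n"
    print(write_tsv(annotations_to_events(read_tsv(raw))))
```

Running the conversion once per annotator keeps one events file per person, matching the multiple annotation files described above.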

Some lab members are planning to add eeg measurements of the dogs in the future, so I think it would be great to start using the BIDS standard as soon as possible. Because of this I went through all the documentation online with the goal of creating an appropriate dataset structure for our data.

Proposed dataset
The conclusion I reached is that all the data we are currently gathering should be put inside the /beh folder.
The reason is that this page states the following: “In addition to logs from behavioral experiments performed alongside imaging data acquisitions, one MAY also include data from experiments performed with no neural recordings. The results of those experiments MAY be stored in the beh”

Based on this I have come up with the following dataset structure (for simplicity the json sidecars for some of the files are not listed here):

DOG_BEHAVIOR_DATASET/
├── .bidsignore
├── dataset_description.json
├── README.md
├── CITATION.cff
├── LICENSE
├── CHANGES
├── participants.tsv
├── participants.json
└── sub-01/
    ├── sub-01_sessions.tsv
    ├── sub-01_sessions.json
    └── ses-scen01/
        ├── sub-01_ses-scen01_scans.tsv
        ├── sub-01_ses-scen01_scans.json
        └── beh/
            ├── sub-01_ses-scen01_task-treat_acq-owner01_events.tsv
            ├── sub-01_ses-scen01_task-treat_acq-expert01_events.tsv
            ├── sub-01_ses-scen01_task-treat_acq-expert02_events.tsv
            ├── sub-01_ses-scen01_task-treat_acq-expert03_events.tsv
            ├── ...
            ├── sub-01_ses-scen01_task-treat_recording-cam1_video.mp4
            ├── sub-01_ses-scen01_task-treat_recording-cam2_video.mp4
            ├── sub-01_ses-scen01_task-treat_recording-cam3_video.mp4
            ├── sub-01_ses-scen01_task-treat_recording-cam4_video.mp4
            ├── sub-01_ses-scen01_task-treat_recording-dogmic_audio.wav
            ├── sub-01_ses-scen01_task-treat_recording-ownermic_audio.wav
            └── sub-01_ses-scen01_task-treat_acq-ecg+acc+gyro_beh.tsv

Questions

  1. Do you think this is a good structure? Am I missing some details, or did I perhaps miss a new BEP?
  2. Video and audio data cannot be put in beh/. Should I make the validator ignore them using the .bidsignore file? (Putting audio and video data in sourcedata/ does not seem appropriate to me, since some of our neural network analyses will use them directly rather than as source data that still has to be processed.)
  3. The ecg+motion tsv file has the timestamp in the first column, and related data in the other columns. This file should have the suffix _beh right?
  4. Since we are working with a multi-camera setup, is ‘recording-cam1’ the preferred entity, or should I use ‘acq-’?
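
To make question 2 concrete, the following is a rough Python sketch of what the `.bidsignore` approach could look like. It assumes that `.bidsignore` accepts `.gitignore`-style patterns and that the four patterns below are enough to cover the video and audio files and their sidecars; neither assumption has been checked against the validator. The preview uses `fnmatch`, which only approximates that matching, and the helper names are hypothetical.

```python
import fnmatch

# Hypothetical patterns covering the video/audio files and their sidecars.
IGNORE_PATTERNS = [
    "*_video.mp4",
    "*_video.json",
    "*_audio.wav",
    "*_audio.json",
]


def bidsignore_text(patterns):
    """Return the content of a .bidsignore file, one pattern per line."""
    return "\n".join(patterns) + "\n"


def is_ignored(filename, patterns):
    """Approximate check of whether a file name matches any pattern."""
    return any(fnmatch.fnmatchcase(filename, pattern) for pattern in patterns)


if __name__ == "__main__":
    print(bidsignore_text(IGNORE_PATTERNS), end="")
    for name in (
        "sub-01_ses-scen01_task-treat_recording-cam1_video.mp4",
        "sub-01_ses-scen01_task-treat_recording-dogmic_audio.wav",
        "sub-01_ses-scen01_task-treat_acq-expert01_events.tsv",
    ):
        print(name, "->", "ignored" if is_ignored(name, IGNORE_PATTERNS) else "validated")
```

With this setup the events files would still be validated, while the video and audio files would stay in beh/ without being checked.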