Hello everyone, I am a master’s student seeking support in the creation of a dataset structure for storing dog behavioral data in BIDS format.
At the moment in my university lab we have built the software for acquiring the following kinds of data for each dog subject:
- ecg data: tsv.gz format.
- gyroscope data: tsv format.
- video data: 4 video files per session in mp4 format. Each video is obtained by a camera located in a different corner of the experiment room.
- audio data: 2 audio files per session in wav format. Each audio is obtained by a different microphone. One is attached to the dog and the other to the owner.
- annotations from veterinarians and the owner: tsv format. This file has 3 columns: timestamp, valence, arousal.
We are planning to add EEG measurements of the dogs in the near future, so it would be great to already put all our data into a BIDS-ready dataset.
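To make the annotation format concrete, here is a minimal sketch of how one of these files could be read in Python. It assumes a header row with exactly the column names timestamp, valence and arousal, and numeric values in every column; the function name `read_annotations` and the sample values are made up for illustration and are not real data.

```python
import csv
import io


def read_annotations(handle):
    """Parse a tab-separated annotation file into a list of dicts of floats."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        rows.append(
            {
                "timestamp": float(row["timestamp"]),
                "valence": float(row["valence"]),
                "arousal": float(row["arousal"]),
            }
        )
    return rows


# Hypothetical sample content standing in for one annotation file.
sample = "timestamp\tvalence\tarousal\n0.0\t0.5\t0.2\n1.5\t-0.1\t0.8\n"
annotations = read_annotations(io.StringIO(sample))
print(annotations)
```

For a real file, the same function could be given an open file handle instead of the in-memory string.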
The problem is that even after having read through all the relevant documentation I still don’t know where to put audio, video and ecg data.
As far as I have understood from the documentation, and by using the BIDS validator at this link, audio and video data unfortunately cannot be put in beh/.
Putting audio and video data in sourcedata/ does not seem right to me since we will be actively using them as part of some of our analysis using neural networks.
At the same time, putting the veterinary and owner labels (events) and ecg data (physio) in beh/ without the video data does not look right to me. The reason is that the events have been labeled using the video data, not the ecg data, and I fear it could cause confusion.
Also putting ecg data and events in motion/ with the gyroscope data does not look right to me for the same reason.
Since I don’t think that putting the ecg data in beh/ or motion/ would be correct, and I don’t know where to put video and audio data, the only option I see is to create a BIDS-like dataset that follows the BIDS style, but is not compatible with it, using the following structure:
DOG_BEHAVIOR_DATASET/
├── .bidsignore
├── dataset_description.json
├── README.md
├── CITATION.cff
├── LICENSE
├── CHANGES
├── participants.tsv (I have added the species column as per documentation)
├── participants.json
├── acq-owner_events.json
├── acq-expert01_events.json
├── acq-expert02_events.json
├── acq-expert03_events.json
└── sub-01/
    └── ses-scen01/
        ├── sub-01_ses-scen01_scans.tsv
        ├── sub-01_ses-scen01_scans.json
        ├── video/
        │   ├── sub-01_ses-scen01_task-treat_acq-owner_events.tsv
        │   ├── sub-01_ses-scen01_task-treat_acq-expert01_events.tsv
        │   ├── sub-01_ses-scen01_task-treat_acq-expert02_events.tsv
        │   ├── sub-01_ses-scen01_task-treat_acq-expert03_events.tsv
        │   ├── sub-01_ses-scen01_task-treat_recording-cam1_video.mp4
        │   ├── sub-01_ses-scen01_task-treat_recording-cam1_video.json
        │   ├── sub-01_ses-scen01_task-treat_recording-cam2_video.mp4
        │   ├── sub-01_ses-scen01_task-treat_recording-cam2_video.json
        │   ├── sub-01_ses-scen01_task-treat_recording-cam3_video.mp4
        │   ├── sub-01_ses-scen01_task-treat_recording-cam3_video.json
        │   ├── sub-01_ses-scen01_task-treat_recording-cam4_video.mp4
        │   └── sub-01_ses-scen01_task-treat_recording-cam4_video.json
        ├── audio/
        │   ├── sub-01_ses-scen01_task-treat_recording-dogmic_audio.wav
        │   ├── sub-01_ses-scen01_task-treat_recording-dogmic_audio.json
        │   ├── sub-01_ses-scen01_task-treat_recording-ownermic_audio.wav
        │   └── sub-01_ses-scen01_task-treat_recording-ownermic_audio.json
        ├── ecg/
        │   ├── sub-01_ses-scen01_task-treat_recording-ecg_physio.tsv.gz
        │   └── sub-01_ses-scen01_task-treat_recording-ecg_physio.json
        └── motion/
            ├── sub-01_ses-scen01_task-treat_tracksys-gyro_motion.tsv
            ├── sub-01_ses-scen01_task-treat_tracksys-gyro_motion.json
            ├── sub-01_ses-scen01_task-treat_tracksys-gyro_channel.tsv
            └── sub-01_ses-scen01_task-treat_tracksys-gyro_channel.json
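To make the naming scheme in the tree explicit, here is a rough Python sketch of how the file paths could be generated. The helper `bids_like_path` and its parameters are hypothetical names introduced for illustration; it simply joins key-value entities in the order they are passed and does not check them against the BIDS entity table, so it reproduces this proposed BIDS-like layout rather than anything the validator is guaranteed to accept.

```python
from pathlib import PurePosixPath


def bids_like_path(sub, ses, task, folder, suffix, ext, **entities):
    """Build a relative path following the BIDS-like layout shown in the tree.

    Extra entities (e.g. recording="cam1") are appended after task- in the
    order given; no validation against the BIDS specification is performed.
    """
    parts = [f"sub-{sub}", f"ses-{ses}", f"task-{task}"]
    parts += [f"{key}-{value}" for key, value in entities.items()]
    filename = "_".join(parts) + f"_{suffix}{ext}"
    return PurePosixPath(f"sub-{sub}") / f"ses-{ses}" / folder / filename


# The four camera recordings of one session.
video_paths = [
    bids_like_path("01", "scen01", "treat", "video", "video", ".mp4",
                   recording=f"cam{i}")
    for i in range(1, 5)
]

# The ECG recording of the same session.
ecg_path = bids_like_path("01", "scen01", "treat", "ecg", "physio", ".tsv.gz",
                          recording="ecg")

for path in video_paths + [ecg_path]:
    print(path)
```

Switching the camera label from recording- to acq- would only mean passing `acq=f"cam{i}"` instead, so the choice of entity does not affect the rest of the layout.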
What do you think, is this a solid structure? Am I missing something that would make it possible to organize this data as a valid BIDS dataset? Did I perhaps miss a new BEP? Would putting everything in beh/ regardless be a good idea?
Also, for multi-camera setups, is ‘recording-cam1’ the preferred entity, or should I use ‘acq-’?
I would like to get this right since other lab members count on me, so please share your feedback if you can.
Also, please feel free to edit the dataset structure to give visual feedback.
I created the tree with https://tree.nathanfriend.com/. Pasting the input below into that website reproduces the tree so you can edit it.
DOG_BEHAVIOR_DATASET
  .bidsignore
  dataset_description.json
  README.md
  CITATION.cff
  LICENSE
  CHANGES
  participants.tsv
  participants.json
  acq-owner_events.json
  acq-expert01_events.json
  acq-expert02_events.json
  acq-expert03_events.json
  sub-01
    ses-scen01
      sub-01_ses-scen01_scans.tsv
      sub-01_ses-scen01_scans.json
      video
        sub-01_ses-scen01_task-treat_acq-owner_events.tsv
        sub-01_ses-scen01_task-treat_acq-expert01_events.tsv
        sub-01_ses-scen01_task-treat_acq-expert02_events.tsv
        sub-01_ses-scen01_task-treat_acq-expert03_events.tsv
        sub-01_ses-scen01_task-treat_recording-cam1_video.mp4
        sub-01_ses-scen01_task-treat_recording-cam1_video.json
        sub-01_ses-scen01_task-treat_recording-cam2_video.mp4
        sub-01_ses-scen01_task-treat_recording-cam2_video.json
        sub-01_ses-scen01_task-treat_recording-cam3_video.mp4
        sub-01_ses-scen01_task-treat_recording-cam3_video.json
        sub-01_ses-scen01_task-treat_recording-cam4_video.mp4
        sub-01_ses-scen01_task-treat_recording-cam4_video.json
      audio
        sub-01_ses-scen01_task-treat_recording-dogmic_audio.wav
        sub-01_ses-scen01_task-treat_recording-dogmic_audio.json
        sub-01_ses-scen01_task-treat_recording-ownermic_audio.wav
        sub-01_ses-scen01_task-treat_recording-ownermic_audio.json
      ecg
        sub-01_ses-scen01_task-treat_recording-ecg_physio.tsv.gz
        sub-01_ses-scen01_task-treat_recording-ecg_physio.json
      motion
        sub-01_ses-scen01_task-treat_tracksys-gyro_motion.tsv
        sub-01_ses-scen01_task-treat_tracksys-gyro_motion.json
        sub-01_ses-scen01_task-treat_tracksys-gyro_channel.tsv
        sub-01_ses-scen01_task-treat_tracksys-gyro_channel.json