Is it BIDS valid to merge raw EEG files?

We have a large EEG data set that includes (among other things) resting-state data recorded before and after two hours of neurocognitive testing, all within one recording session. The data set comprises about 800 recordings. For clarity, I think it would be helpful to simply combine the eyes-closed and eyes-open segments, or the segments before and after the experimental blocks, into one file and mark them with triggers. Given the number of files, this would make things easier for anyone who wants to work with the data set. But does this contradict the BIDS principles?

Let me ping some BIDS folks that have more EEG experience than me.

  • I would leave the files separate, and I would find that clearer than having to describe that you “merged” (concatenated) several raw data files that weren’t continuous before.
  • You’d also be introducing a discontinuity (a “break”) in the data at the concatenation point, which would of course be fine if documented …
  • but I don’t see the reduction in “to be shared files” as a big benefit (it wouldn’t reduce the size of the dataset either).

I’m not sure about that.

I think you’d be sacrificing clarity for saving dataset users a few lines of code (loading the data and concatenating it themselves) – IMHO it wouldn’t be worth it.

I agree with Stefan and would also not merge them.

The BrainVision data format is the only format that has an explicit, natural way to describe segments of different lengths: the “New Segment” marker (their recording system allows writing to disk to be paused). In all other file formats the segment boundary and the jump in the data would have to be documented with a numerical trigger of some sort. Of course, besides the representation in the EEG file itself, you could also document the segment boundary in the events.tsv, but then you limit proper processing to BIDS-aware software, whereas a nice feature of BIDS is that it still allows old software to work on the data.
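To make the events.tsv option concrete, here is a minimal sketch of what documenting the join would involve if two recordings were concatenated anyway. It uses plain Python lists as stand-ins for single-channel data; the helper names, the 250 Hz sampling rate, and the "boundary" label in `trial_type` are assumptions for illustration and are not prescribed by BIDS.

```python
def concatenate_with_boundary(first, second, sfreq):
    """Concatenate two single-channel sample lists and describe the join.

    Returns the merged samples plus one events.tsv-style row that marks
    where the second recording starts in the merged data.
    """
    merged = list(first) + list(second)
    boundary_sample = len(first)  # index of the first sample of `second`
    row = {
        "onset": boundary_sample / sfreq,  # seconds from start of merged data
        "duration": 0,
        "trial_type": "boundary",  # hypothetical label, not defined by BIDS
        "sample": boundary_sample,
    }
    return merged, row


def to_tsv(rows):
    """Render a list of dict rows as tab-separated text with a header line."""
    columns = list(rows[0])
    lines = ["\t".join(columns)]
    lines += ["\t".join(str(row[col]) for col in columns) for row in rows]
    return "\n".join(lines) + "\n"


# Stand-in data: 2 s "before" and 1 s "after" at an assumed 250 Hz.
pre = [0.0] * 500
post = [1.0] * 250
merged, boundary = concatenate_with_boundary(pre, post, sfreq=250)
print(to_tsv([boundary]))
```

Only software that reads the events.tsv would see this row, which is the limitation described above; a legacy tool that opens the EEG file directly would treat the merged data as continuous.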

In your specific case, I think I would also use the task entity to document in each filename what the relation of the files is, like task-eyesopen and task-eyesclosed, or task-before and task-after.
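As a rough illustration of that naming scheme, the sketch below assembles EEG filenames with the task entity. The function name `bids_eeg_filename`, the subject label "001", and the `.vhdr` extension are assumptions chosen for the example; the entity order (sub, ses, task, run) and the alphanumeric-only labels follow the usual BIDS filename pattern.

```python
import re


def bids_eeg_filename(sub, task, ses=None, run=None, extension=".vhdr"):
    """Build a BIDS-style EEG filename such as sub-001_task-eyesopen_eeg.vhdr.

    Hypothetical helper for illustration; it is not part of any BIDS library.
    """
    parts = [("sub", sub)]
    if ses is not None:
        parts.append(("ses", ses))
    parts.append(("task", task))
    if run is not None:
        parts.append(("run", run))

    for key, label in parts:
        # Labels may only contain letters and digits (no "-" or "_").
        if not re.fullmatch(r"[A-Za-z0-9]+", str(label)):
            raise ValueError(f"invalid label for {key}: {label!r}")

    entities = "_".join(f"{key}-{label}" for key, label in parts)
    return f"{entities}_eeg{extension}"


for task in ("eyesopen", "eyesclosed"):
    print(bids_eeg_filename("001", task))
```

A label like "eyes-open" would be rejected by the check, which is why the task names are written as single alphanumeric words.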


Thanks a lot for your answers.
I will go for the separate files to make the data as accessible as possible.

One more question: if a file is missing for a subject, should we skip the entire subject or write out the data that are available?
This is particularly important since we have two waves of data collection: 650 participants were recorded in the first wave and only 150 in the second (after a 5-year interval). If completeness of the data set plays a role, one could make two data sets: the first with the first wave, the second with those subjects that have a complete recording so far.

The more complex a (longitudinal) design, the more likely it is that you will have missing data somewhere. Better share what you have and have the recipient of the data figure out which subjects can be included for a follow-up analysis. Perhaps the data that is missing for you is not needed for the follow up analysis, and then it would be a shame if the whole subject were missing.

In your case the two waves can probably be represented as two sessions. If I were interested in the longitudinal effect, I would select subjects that have ses-01 and ses-02, but for other analyses I might restrict myself to ses-01, or ses-02, or even simply pool all recordings from ses-01 and ses-02.
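Selecting subjects by available sessions takes only a few lines on the recipient's side. The sketch below works on a list of relative file paths; the helper names and the example paths are made up for illustration, and in practice the paths would come from listing the data set directory.

```python
import re
from collections import defaultdict


def sessions_by_subject(paths):
    """Map each subject label to the set of session labels found in `paths`."""
    found = defaultdict(set)
    for path in paths:
        sub = re.search(r"sub-([A-Za-z0-9]+)", path)
        ses = re.search(r"ses-([A-Za-z0-9]+)", path)
        if sub and ses:
            found[sub.group(1)].add(ses.group(1))
    return dict(found)


def subjects_with(paths, required):
    """Return sorted subject labels that have every session in `required`."""
    required = set(required)
    return sorted(
        sub for sub, sessions in sessions_by_subject(paths).items()
        if required <= sessions
    )


# Made-up example: subject 002 took part in the first wave only.
example_paths = [
    "sub-001/ses-01/eeg/sub-001_ses-01_task-eyesopen_eeg.vhdr",
    "sub-001/ses-02/eeg/sub-001_ses-02_task-eyesopen_eeg.vhdr",
    "sub-002/ses-01/eeg/sub-002_ses-01_task-eyesopen_eeg.vhdr",
]
print(subjects_with(example_paths, ["01", "02"]))  # longitudinal subset
print(subjects_with(example_paths, ["01"]))        # everyone in the first wave
```

Because subjects with missing sessions stay in the data set, the same files serve both the longitudinal analysis and a first-wave-only analysis.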
