Benchmarking transfer across unpaired neural spike-train datasets

How should we benchmark transfer between unpaired neural spike-train datasets?

I am looking for advice and feedback on how to benchmark transfer between multineuronal spike-train datasets recorded under different experimental conditions.

Neuroscience recordings often differ in preparation, species, brain region, recording technology, behavioral state, or experimental protocol, and there is usually no one-to-one correspondence between neurons across datasets. Together, these differences make it difficult to compare or transfer neural population activity directly across experiments.

We recently studied one concrete example of this broader problem: bidirectional transfer between unpaired in vitro and in vivo multineuronal spike trains. We formulated this as a time-resolved neural-domain transfer task between sparse binary population spike-train sequences.
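To make "sparse binary population spike-train sequences" concrete, here is a minimal sketch of how spike times might be binned into a binary neurons-by-time array. This is only an illustration: the function name `bin_spike_trains`, the bin width, and the array layout are my assumptions for this post, not necessarily the preprocessing used in the study.

```python
import numpy as np

def bin_spike_trains(spike_times, duration, bin_size):
    """Bin per-neuron spike times (seconds) into a binary (n_neurons, n_bins) array.

    spike_times: list of 1-D arrays, one per neuron (hypothetical input format).
    A bin is 1 if the neuron fired at least once in it, else 0.
    """
    n_bins = int(np.ceil(duration / bin_size))
    binary = np.zeros((len(spike_times), n_bins), dtype=np.uint8)
    for i, times in enumerate(spike_times):
        times = np.asarray(times, dtype=float)
        # Keep only spikes inside the recording window [0, duration).
        times = times[(times >= 0) & (times < duration)]
        idx = np.floor(times / bin_size).astype(int)
        # Guard against floating-point rounding at the right edge.
        idx = np.minimum(idx, n_bins - 1)
        binary[i, idx] = 1
    return binary

# Toy example: three neurons, 2 s of recording, 0.5 s bins.
toy_spikes = [np.array([0.1, 0.2, 1.6]), np.array([]), np.array([0.5, 2.5])]
toy_binary = bin_spike_trains(toy_spikes, duration=2.0, bin_size=0.5)
```

With small bins, most entries of such an array are zero, which is why sparsity matters for both the loss and the evaluation metrics.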

The associated study uses an autoregressive Transformer with Dice loss for sparse neural event generation and evaluates performance using ROC-AUC, Precision–Recall curves, and PR-AUC / average precision.
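For readers less familiar with these terms, the following is a minimal NumPy sketch of a soft Dice loss and of average precision on binary spike targets. It is not the implementation from the study; the function names, the smoothing term `eps`, and the flattening over neurons and time bins are assumptions made for illustration, and ties between scores are not handled specially.

```python
import numpy as np

def soft_dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss: 1 - 2 * sum(p * y) / (sum(p) + sum(y))."""
    probs = np.asarray(probs, dtype=float).ravel()
    targets = np.asarray(targets, dtype=float).ravel()
    intersection = np.sum(probs * targets)
    denom = np.sum(probs) + np.sum(targets)
    # eps keeps the ratio defined when both sums are zero.
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

def average_precision(scores, targets):
    """Mean of precision@k taken at the rank of each true spike."""
    scores = np.asarray(scores, dtype=float).ravel()
    targets = np.asarray(targets, dtype=int).ravel()
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = targets[order]
    n_pos = hits.sum()
    if n_pos == 0:
        return float("nan")  # undefined without any positive bins
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(np.sum(precision_at_k * hits) / n_pos)

# Toy example: one spike in four bins.
y = np.array([0, 0, 1, 0])
loss_perfect = soft_dice_loss(y, y)           # predicting the target exactly
loss_silent = soft_dice_loss(np.zeros(4), y)  # always predicting silence
ap = average_precision([0.9, 0.1, 0.8, 0.2], [1, 0, 0, 1])
```

In the toy example, always predicting silence yields a Dice loss close to 1 even though three of four bins are classified correctly, which illustrates why an overlap-based loss can be attractive for sparse events.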

Peer-reviewed article:

https://doi.org/10.3390/a19040305

Code:

Archived software release:

Hugging Face paper page:

I would be interested in feedback on:

  1. Suitable evaluation metrics for sparse neural event generation beyond ROC-AUC and PR-AUC.
  2. Benchmark designs for transfer between unpaired neural spike-train datasets.
  3. How to define successful transfer when neurons and recording conditions are not matched.
  4. Appropriate baselines, such as latent dynamics models (e.g., LFADS-like models), Neural Data Transformer-type models, or neural foundation models.
  5. Public datasets that could test transfer across preparations, species, brain regions, or behavioral states.
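To illustrate the kind of answer I have in mind for points 1 and 3, here is a rough sketch of one candidate statistic that does not depend on neuron identity: the distribution of the number of active neurons per time bin, compared between two datasets by total variation distance. This is only an assumption-laden starting point for discussion, not a metric used in the study; the function names and the `max_count` cap are hypothetical, and datasets with different numbers of neurons would need some normalization (for example, the fraction of active neurons) before the comparison is meaningful.

```python
import numpy as np

def population_count_distribution(spikes, max_count):
    """Empirical distribution of active-neuron counts per time bin.

    spikes: binary array of shape (n_neurons, n_bins).
    Counts above max_count are clipped into the last histogram entry.
    """
    counts = np.asarray(spikes).sum(axis=0).astype(int)
    counts = np.clip(counts, 0, max_count)
    hist = np.bincount(counts, minlength=max_count + 1).astype(float)
    return hist / hist.sum()

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())

# Toy example: permuting neurons leaves the statistic unchanged.
real = np.array([[1, 0, 0, 1],
                 [1, 0, 1, 0]])
shuffled = real[::-1]           # same data, neuron order reversed
silent = np.zeros_like(real)    # degenerate "always silent" output

p_real = population_count_distribution(real, max_count=2)
tv_shuffled = total_variation(p_real, population_count_distribution(shuffled, 2))
tv_silent = total_variation(p_real, population_count_distribution(silent, 2))
```

Because the statistic sums over neurons, it is invariant to how neurons are ordered or matched, which is the property I am looking for; I would welcome pointers to better-established alternatives.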

Any comments, suggestions, or related references would be very welcome.