How should we benchmark transfer between unpaired neural spike-train datasets?
I'm looking for advice and feedback on how to benchmark transfer between multineuronal spike-train datasets recorded under different experimental conditions.
In many neuroscience datasets, recordings differ in preparation, species, brain region, recording technology, behavioral state, or experimental protocol. There is usually no one-to-one correspondence between neurons across datasets. This makes it difficult to directly compare or transfer neural population activity across experiments.
We recently studied one concrete example of this broader problem: bidirectional transfer between unpaired in vitro and in vivo multineuronal spike trains. We formulated this as a time-resolved neural-domain transfer task between sparse binary population spike-train sequences.
The associated study uses an autoregressive Transformer with Dice loss for sparse neural event generation and evaluates performance using ROC-AUC, Precision–Recall curves, and PR-AUC / average precision.
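For concreteness, here is a minimal pure-Python sketch (my own illustration, not the paper's code) of the two quantities above on sparse binary spike data: a soft Dice loss over per-bin spike probabilities, and PR-AUC computed as average precision:

```python
# Illustrative sketch (not the study's implementation). `probs` are per-bin
# predicted spike probabilities, `targets` are 0/1 spike indicators.

def dice_loss(probs, targets, eps=1e-8):
    """Soft Dice loss: small when predicted mass overlaps observed spikes."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(targets) + eps)

def average_precision(probs, targets):
    """PR-AUC as average precision: mean precision at each true spike,
    ranking bins by predicted probability (ties broken arbitrarily)."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    tp, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if targets[i] == 1:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / max(tp, 1)

# Example: a sparse target with a well-ranked prediction
targets = [0, 0, 1, 0, 0, 0, 1, 0]
probs   = [0.05, 0.1, 0.9, 0.05, 0.1, 0.05, 0.7, 0.1]
print(round(dice_loss(probs, targets), 3))
print(average_precision(probs, targets))  # perfect ranking -> 1.0
```

With ~1-5% of bins containing spikes, accuracy-style metrics saturate, which is why the ranking-based PR view matters here.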
Links:
- Peer-reviewed article: https://doi.org/10.3390/a19040305
- Code:
- Archived software release:
- Hugging Face paper page:
I would be interested in feedback on:
- Suitable evaluation metrics for sparse neural event generation beyond ROC-AUC and PR-AUC.
- Benchmark designs for transfer between unpaired neural spike-train datasets.
- How to define successful transfer when neurons and recording conditions are not matched.
- Appropriate baselines, such as latent dynamics models, neural foundation models, LFADS-like models, or Neural Data Transformer-type models.
- Public datasets that could test transfer across preparations, species, brain regions, or behavioral states.
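On the third point, one operational definition we have considered (my sketch, not from the paper) is a permutation-style null: circularly shift each neuron's spike train by a random offset, which preserves firing rates and autocorrelation but destroys temporal alignment, and call transfer "successful" when the model's score exceeds this null distribution:

```python
import random

# Hypothetical control for defining successful transfer (illustration only).
# Circular shifts preserve each neuron's rate and autocorrelation structure
# while destroying temporal alignment with the target spike train.

def circular_shift(spikes, offset):
    """Rotate a binary spike train by `offset` bins."""
    offset %= len(spikes)
    return spikes[-offset:] + spikes[:-offset]

def match_score(pred, target):
    """Toy per-bin agreement score standing in for PR-AUC or Dice overlap."""
    hits = sum(p * t for p, t in zip(pred, target))
    return hits / max(sum(target), 1)

random.seed(1)
target = [1 if random.random() < 0.05 else 0 for _ in range(500)]
pred = target[:]  # pretend the transfer model predicts well

observed = match_score(pred, target)
null = [match_score(circular_shift(pred, random.randrange(1, 500)), target)
        for _ in range(200)]
# Empirical p-value: fraction of null scores at least as large as observed
p = sum(s >= observed for s in null) / len(null)
print(observed, p)
```

The same scheme applies per neuron, so it does not require matched units across datasets, only that the model's temporal predictions beat rate-matched chance.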
Any comments, suggestions, or related references would be very welcome.