Kilosort cluster labels

I am interested in comparing the number of Kilosort good units across different datasets. Is the Kilosort label present in the IBL data? So far I have only seen the label in the clusters.metrics.pqt file, which, I believe, indicates how many of the three metrics criteria the cluster satisfies.

I was assuming that the total number of clusters in the clusters.metrics.pqt file was the total number of units output by Kilosort, but I suppose it could instead be the number of Kilosort good units.

Thanks!

Hello,

Yes, the good units are those that have a value of 1 in the labels column of the clusters.metrics.pqt file. A value of 1 indicates that the unit passes all 3 of our unit QC metrics.

The total number of clusters is indeed the total output by kilosort regardless of quality.
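As a minimal sketch of the counting described above, assuming the metrics table is loaded into a pandas DataFrame: the column name `label`, the `cluster_id` column, and the toy values are assumptions for illustration only, so adjust them to whatever the real file contains.

```python
import pandas as pd

# Toy stand-in for the clusters metrics table; all values are made up.
# With real data this would be something like:
#   metrics = pd.read_parquet("clusters.metrics.pqt")
metrics = pd.DataFrame({
    "cluster_id": [0, 1, 2, 3, 4, 5],
    "label": [1.0, 0.0, 0.5, 1.0, 0.0, 1.0],
})

LABEL_COL = "label"  # assumed column name; check metrics.columns

# Every row is one cluster output by the sorter, regardless of quality.
n_total = len(metrics)

# Good units are the rows whose label equals 1 (all QC metrics passed).
n_good = int((metrics[LABEL_COL] == 1).sum())

print(f"{n_good} good units out of {n_total} clusters")
```

Comparing datasets then comes down to computing `n_good` and `n_total` for each metrics table.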

Let me know if that isn’t clear 🙂

Hello,

Adding to what Mayo just wrote: we have finished a full reprocessing of the spike sorting for the brain-wide map dataset, and one of the new additions is a ks2_label column in the metrics table.

It is not available as of today, but it should be a matter of a couple of weeks before this data shows up on the public database. I suggest that you join the mailing list if you want to be notified as soon as the data is available!

Best,
Olivier

Perfect, thanks to you both! I’ve joined the mailing list and will look forward to the ks2_label column being added to the metrics table.

Hello, regarding the to-be-released datasets, we are considering whether to split the spikes from good units and those from bad units into two distinct data files. Given that many spikes belong to bad units, and that most users prefer to work with good units only, putting the spikes from good units into a separate (smaller) file would make data loading faster.
If we proceed with this new format, the change will be explained in the data release notes, but I wanted to make sure you keep an eye out for it, since you are specifically interested in units other than just the good ones. Cheers
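To make the idea concrete, here is a rough sketch of what such a split amounts to, not the actual release format. The array names (`spike_times`, `spike_clusters`, `good_cluster_ids`) and the toy values are hypothetical.

```python
import numpy as np

# Toy per-spike arrays; values are made up for illustration.
spike_times = np.array([0.01, 0.02, 0.05, 0.07, 0.09, 0.12])
spike_clusters = np.array([0, 2, 1, 2, 0, 3])  # cluster id of each spike

# Hypothetical list of cluster ids that passed QC.
good_cluster_ids = np.array([0, 3])

# Boolean mask over spikes: True where the spike belongs to a good unit.
is_good = np.isin(spike_clusters, good_cluster_ids)

# Two disjoint sets of spikes that could be written to separate files.
good = {"times": spike_times[is_good], "clusters": spike_clusters[is_good]}
bad = {"times": spike_times[~is_good], "clusters": spike_clusters[~is_good]}
```

Anyone who needs all units would load both sets and concatenate them, then re-sort by spike time if the order matters.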

we are considering whether to split the spikes from good units versus the bad ones into two distinct data files. Given that there are numerous spikes belonging to bad units, and that most users prefer to work with good units only, it will enable faster data loading if we split the spikes from good units into a separate (smaller) file.

This is an excellent idea. I do want to point out, however, that for certain analyses the bad units are still useful. For instance, some decoding methods can use channels as features, and their users may not want the current spike sorting method to add biases to the analysis. Being able to readily relate the units back to their original channel would be useful, for instance with a groupby() on “peakchannel”.
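As a sketch of the kind of channel-level grouping I mean, assuming a per-cluster table in pandas: the column names `peakchannel`, `cluster_id`, `label`, and `n_spikes` are placeholders and the values are invented.

```python
import pandas as pd

# Toy per-cluster table containing both good and bad units.
clusters = pd.DataFrame({
    "cluster_id": [0, 1, 2, 3, 4],
    "peakchannel": [10, 10, 11, 12, 12],   # channel with the largest waveform
    "label": [1.0, 0.0, 1.0, 0.0, 0.0],    # 1 = good unit
    "n_spikes": [500, 120, 300, 80, 40],
})

# Pool every unit on a channel, good or bad, so the channel is the feature.
per_channel = clusters.groupby("peakchannel").agg(
    n_units=("cluster_id", "count"),
    n_spikes=("n_spikes", "sum"),
)

print(per_channel)
```

This only works if the bad units remain easy to load alongside the good ones, which is why I am raising it.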

I did see this preprint: Zhang, Y., He, T., Boussard, J., Windolf, C., Winter, O., Trautmann, E., … & Paninski, L. (2024). Bypassing spike sorting: Density-based decoding using spike localization from dense multielectrode probes. Advances in Neural Information Processing Systems, 36. https://www.biorxiv.org/content/10.1101/2023.09.21.558869v1, which seems to resolve many problems with current pipelines. Will you be using this or another method for the new labels?