# Determining the most relevant subpopulations for encoding a certain behavior

Say you have spike rasters for a very large population of neurons and a behavioral variable, and you want to find out which neurons (not individually, but as a population) are responsible for encoding that particular variable. How would you go about determining which sub-populations are “useful”?

An unsupervised approach would be PCA/FA: look at the loadings of the components/factors and see which neurons are highly represented. But this will explain the variability of the entire activity, not just the part related to the behavioral variable of interest. One option I can think of is a supervised alternative like PLS, or a linear model with an L1 penalty, again looking at the loadings. Are there any better ways? Also, everything I mentioned considers only linear relationships. How can we include non-linearities?
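A minimal sketch of the L1 idea mentioned above, using scikit-learn's `LassoCV`. The synthetic data, variable names, and the choice of cross-validated Lasso are all my own illustrative assumptions, not a prescribed pipeline:

```python
# Toy sketch: rank neurons by their Lasso weights when decoding a
# continuous behavioral variable from population activity.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_trials, n_neurons = 500, 50

# Synthetic rates: only neurons 0-4 actually carry the behavioral signal.
behavior = rng.normal(size=n_trials)
rates = rng.normal(size=(n_trials, n_neurons))
rates[:, :5] += behavior[:, None] * 2.0

# LassoCV picks the L1 penalty by cross-validation; the L1 term drives
# the weights of uninformative neurons toward exactly zero.
model = LassoCV(cv=5).fit(rates, behavior)
ranking = np.argsort(-np.abs(model.coef_))
print(sorted(ranking[:5]))  # should recover neurons 0-4
```

The point of the L1 penalty here is sparsity: the surviving nonzero weights directly nominate a candidate sub-population.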


I think dPCA could be a good tool. The method has two steps: 1) a supervised part where you define the parameters of your task, and 2) an unsupervised part that is like PCA. This is the link: https://machenslab.org/resources-code/
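To make the two-step idea concrete, here is a toy sketch of just the marginalization step behind dPCA (split the data into time-only, stimulus-only, and interaction parts, then run PCA on each part). It omits the regularized reduced-rank regression of the real method, and the data shapes and names are my assumptions:

```python
# Toy marginalization sketch (NOT the full dPCA algorithm).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_neurons, n_stim, n_time = 30, 4, 100
X = rng.normal(scale=0.1, size=(n_neurons, n_stim, n_time))

# Plant time-only structure in neurons 0-9, stimulus-only in 10-19.
t = np.linspace(0, 1, n_time)
X[:10] += np.sin(2 * np.pi * t)[None, None, :]
X[10:20] += np.arange(n_stim)[None, :, None] * 0.5

X = X - X.mean(axis=(1, 2), keepdims=True)         # center each neuron
X_time = X.mean(axis=1, keepdims=True)             # varies with time only
X_stim = (X - X_time).mean(axis=2, keepdims=True)  # varies with stimulus only
X_mix = X - X_time - X_stim                        # interaction + noise

# PCA loadings on each marginalization show which neurons drive it.
loadings_time = np.abs(PCA(n_components=1).fit(X_time[:, 0, :].T).components_[0])
loadings_stim = np.abs(PCA(n_components=1).fit(X_stim[:, :, 0].T).components_[0])
```

In this toy example the time loadings peak on neurons 0-9 and the stimulus loadings on neurons 10-19, which is the kind of per-variable neuron ranking dPCA gives you.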

It depends what you mean by “are responsible”. If you mean that they have a causal role in a given behaviour, that’s a tricky question that’s going to be addressed on the causality day in week 3 (W3D3); it’s still an active subject of research.

If you mean that a neuron carries information about behaviour that is not fully redundant with respect to other neurons, then a linear model with an L1 penalty will help you. Unlike PCA (W1D5), which gives multiple sets of components that carry variance, you will get only one set of components. To extend to nonlinear scenarios, you can in particular use a deep net with an L1 penalty on the weights (W3D4). With a deep net, you end up having multiple sets of components (filters) instead of just one; these filters can all lie in a low-rank subspace, so your first layer ends up having a dimensionality-reduction flavor.
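A small sketch of the deep-net-with-L1 suggestion, written in plain numpy so the penalty is explicit. The architecture (one ReLU hidden layer), the synthetic data, and all hyperparameters are my own illustrative choices:

```python
# Toy sketch: tiny network trained by gradient descent with an L1
# subgradient penalty on the input weights W1.
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_neurons, n_hidden = 400, 20, 8

# Synthetic data: behavior depends (nonlinearly) on neurons 0-2 only.
X = rng.normal(size=(n_trials, n_neurons))
y = np.tanh(X[:, 0] + X[:, 1] - X[:, 2])

W1 = rng.normal(scale=0.1, size=(n_neurons, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=n_hidden)
b2 = 0.0
lr, lam = 0.05, 1e-3          # learning rate, L1 strength

for _ in range(2000):
    h = np.maximum(X @ W1 + b1, 0)                 # ReLU hidden layer
    err = (h @ w2 + b2) - y                        # squared-loss residual
    gw2 = h.T @ err / n_trials
    gh = np.outer(err, w2) * (h > 0)
    gW1 = X.T @ gh / n_trials + lam * np.sign(W1)  # L1 subgradient on W1
    W1 -= lr * gW1; b1 -= lr * gh.mean(0)
    w2 -= lr * gw2; b2 -= lr * err.mean()

# Total absolute input weight per neuron: a nonlinear "loading".
importance = np.abs(W1).sum(axis=1)
```

The L1 term shrinks the input weights of uninformative neurons toward zero, so `importance` concentrates on the neurons the net actually uses, which is the population-selection signal described above.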


I’ve read the eLife paper. But dPCA is good for finding latent variables w.r.t. different classes of tasks. The behavior I’m interested in is a continuous variable, e.g. motor output. Is there a similar method available for continuous data?

If I remember correctly, it is possible to use continuous data (discretized) with dPCA. However, I think another option is LDA. With LDA you can decode the population activity based on the variables of your task. I hope this helps.
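A quick sketch of the LDA suggestion, with the continuous behavior already discretized into bins. Everything here (data, bin count, the idea of reading `coef_` as a neuron ranking) is my own illustration:

```python
# Toy sketch: decode discretized behavior with LDA and inspect weights.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_trials, n_neurons = 300, 40
labels = rng.integers(0, 3, size=n_trials)   # 3 bins of a continuous variable

rates = rng.normal(size=(n_trials, n_neurons))
rates[:, :6] += labels[:, None] * 1.0        # only neurons 0-5 are tuned

lda = LinearDiscriminantAnalysis()
acc = cross_val_score(lda, rates, labels, cv=5).mean()

# The discriminant weights again point at which neurons drive decoding.
lda.fit(rates, labels)
importance = np.abs(lda.coef_).sum(axis=0)
```

High cross-validated accuracy tells you the population carries the variable; the weight magnitudes nominate the sub-population carrying it.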


It is common in both neuroscience and machine learning to run PCA/FA/ICA first to get a low-dimensional set of neural traces, and then use your favorite decoding framework to predict your variable of interest from these traces. You would then need to link these two steps to determine how a single neuron relates to the variable of interest, or you can skip that step altogether and just look at how population activities (e.g. PCA/ICA components) relate to the variable of interest. We know that single neurons don’t really impact behavior: it’s only populations of neurons working together that have an impact, and dimensionality reduction can get that for you.
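The two-stage recipe above can be sketched as a single scikit-learn pipeline. The synthetic low-dimensional data and the particular choices (10 PCs, ridge regression) are just my assumptions for illustration:

```python
# Toy sketch: dimensionality reduction followed by decoding, in one pipeline.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
n_trials, n_neurons = 400, 60

# One shared latent signal loaded onto all neurons, plus private noise.
latent = rng.normal(size=n_trials)
rates = rng.normal(scale=0.5, size=(n_trials, n_neurons))
rates += latent[:, None] * rng.normal(size=n_neurons)
behavior = latent + 0.1 * rng.normal(size=n_trials)

model = make_pipeline(PCA(n_components=10), Ridge(alpha=1.0))
r2 = cross_val_score(model, rates, behavior, cv=5).mean()
# To link back to single neurons, chain pca.components_ with ridge.coef_.
```

Cross-validated R² tells you how much of the behavior the low-dimensional population activity explains, without ever fitting one weight per neuron directly.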

This approach also suggests a simple nonlinear extension: instead of PCA, try using t-SNE to reduce the dimensionality of your variables. You will then have to choose your decoder carefully, because the neural traces can no longer be assumed to be linearly related to behavior. So instead of linear classifiers, you may want to consider some simple nonlinear classifiers, like k-nearest neighbors.
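A sketch of that nonlinear variant, with one important caveat baked in as a comment: scikit-learn's t-SNE has no out-of-sample transform, so the cross-validation here is only illustrative. The data and parameter choices are my assumptions:

```python
# Toy sketch: t-SNE embedding followed by a k-nearest-neighbors decoder.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
n_trials, n_neurons = 300, 30
labels = rng.integers(0, 2, size=n_trials)
rates = rng.normal(size=(n_trials, n_neurons))
rates[:, :4] += labels[:, None] * 1.5        # class-dependent firing

# Caveat: t-SNE cannot embed held-out trials, so it is fit on ALL trials
# and the cross-validation below is illustrative rather than rigorous.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(rates)
knn = KNeighborsClassifier(n_neighbors=5)
acc = cross_val_score(knn, embedding, labels, cv=5).mean()
```

If you need a nonlinear embedding that does transform new data, a parametric method (e.g. an autoencoder, or UMAP with its `transform` method) avoids this leakage issue.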


Could you share a working dPCA code example? The GitHub demo was applied to noise, so it is hard to compare against. When I tried to apply dPCA to my data, I got stimulus, time, and mixing components with very similar shapes. Is this OK, or have I made a mistake somewhere?