Say you have spike rasters for a very large population of neurons along with a behavioral variable, and you want to find out which neurons (not individually, but as a population) are responsible for encoding that variable. How would you go about determining which sub-populations are “useful”?
An unsupervised approach would be to run PCA/FA and look at the loadings of the components/factors to see which neurons are heavily represented, but this explains variability in the overall activity, not specifically the variability related to the behavioral variable of interest. One option I can think of is a supervised alternative, such as PLS or a linear model with an L1 penalty, and then looking at the loadings (or the nonzero coefficients). Are there any better ways? Also, everything I mentioned considers only linear relationships. How can we include non-linearities?