Feature selection for ROI analysis at the individual level

I am studying individual differences by running MVPA on each ROI of each individual subject, and I would like some suggestions on feature selection. For now, I am just using univariate feature selection with cross-validation. I have 3 related questions:

  1. I know there are several different feature selection strategies. What are the pros and cons of each strategy in my case?

  2. In one of the previous posts, it was recommended to select the same number of voxels in each ROI to make the ROIs easier to compare. How necessary is that? If I do have to, where in the pipeline should this step be added? Could anyone give me a script example?

  3. How necessary is it to use regularization after feature selection?

  1. It depends on what your final goal is, but univariate selection is the most advisable procedure: it is fast and yields good prediction performance.

  2. I think that recommendation comes from wanting to compare prediction accuracy across ROIs. If you select different numbers of voxels, you may trivially find that the regions with more selected voxels yield better predictions. The number of voxels then becomes a confound when you interpret your results.

  3. Yes. Regularization, even a small amount, is always better than no regularization.
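A minimal sketch of where a fixed number of voxels fits into a per-ROI analysis, assuming scikit-learn and one (n_samples, n_voxels) array per ROI. The selection step sits inside a Pipeline, so it is refit on each training fold only, and the same k is used for every ROI. The ROI names, the synthetic data, k=50 and the use of LogisticRegression (L2-regularized by default, where C is the inverse regularization strength) are illustrative assumptions, not a prescribed setup.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def roi_accuracy(X_roi, y, k, C=1.0, n_splits=5, seed=0):
    """Cross-validated accuracy for one ROI of one subject.

    X_roi: (n_samples, n_voxels) array; y: class labels.
    k must not exceed the number of voxels in the ROI.
    """
    pipe = make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=k),             # univariate selection, fit on training folds only
        LogisticRegression(C=C, max_iter=1000),  # L2 penalty by default; larger C = weaker penalty
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(pipe, X_roi, y, cv=cv).mean()


# Hypothetical data for one subject: two ROIs of different sizes, random signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
rois = {"roi_a": rng.normal(size=(80, 300)), "roi_b": rng.normal(size=(80, 120))}

# Use one k for every ROI, no larger than the smallest ROI.
k_common = min(50, min(X.shape[1] for X in rois.values()))
scores = {name: roi_accuracy(X, y, k=k_common) for name, X in rois.items()}
print(scores)
```

Putting SelectKBest inside the pipeline (rather than selecting voxels on the full dataset first) keeps the test folds out of the selection step, which would otherwise inflate accuracy.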

Hi Bertrand,

Thank you for your answer! I have two follow-up questions.

  1. As I’ve mentioned, I want to look at individual differences across subjects. Should I select the same number of voxels for every ROI for every subject? If so, how should I decide on a suitable number of voxels?

  2. What do you mean by a ‘small’ regularization?


  1. If you don’t compare subjects to each other, I don’t see a strong reason to impose the same number of voxels across subjects. Still, fixing this number simplifies your analysis and makes it easier to describe.
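  2. A small amount of regularization, i.e. a small value of the alpha parameter in scikit-learn estimators such as Ridge. For estimators parameterized by C instead, such as LogisticRegression or LinearSVC, C is the inverse of the regularization strength, so a small regularization corresponds to a large C.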
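If you would rather not hand-pick alpha, one option is to let an inner cross-validation loop choose it, with an outer loop reporting accuracy (nested cross-validation). This is a minimal sketch assuming scikit-learn, a fixed k, and RidgeClassifier; the alpha grid, k=50 and the synthetic data are illustrative assumptions. You could also add "selectkbest__k" to the grid, but then the number of voxels may differ across folds and subjects, which gives up the simplicity of a fixed k.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_accuracy(X_roi, y, k, alphas=(0.01, 0.1, 1.0, 10.0), seed=0):
    """Outer-CV accuracy with alpha tuned by an inner CV loop (k stays fixed)."""
    pipe = make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=k),
        RidgeClassifier(),  # alpha = regularization strength (smaller = weaker)
    )
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # make_pipeline names steps after their lowercased class names.
    search = GridSearchCV(pipe, {"ridgeclassifier__alpha": list(alphas)}, cv=inner)
    return cross_val_score(search, X_roi, y, cv=outer).mean()


# Hypothetical single-ROI data with random signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 200))
acc = nested_cv_accuracy(X, y, k=50)
print(f"nested-CV accuracy: {acc:.2f}")
```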

Thanks for the clarification, Bertrand!

It is important to establish correspondence across subjects in the prediction task: if you pick p ROIs (or the statistics derived from them), you must feed the ML model features in exactly the same order for every subject, which requires the same number of features. You can have a different number of voxels per ROI, as long as the statistics you derive from each ROI have the same dimensionality across subjects. Keep that in mind.
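A minimal sketch of that idea, assuming each subject's data is a dict mapping ROI name to a 1-D array of voxel values (for example, per-voxel contrast estimates) whose length can vary by subject. A fixed ROI order plus fixed-size summaries (here mean and standard deviation, chosen only for illustration) gives every subject a feature vector of the same length and layout. ROI_ORDER, the ROI names and the random data are hypothetical.

```python
import numpy as np

# Hypothetical ROI names; fix this order once and reuse it for every subject.
ROI_ORDER = ("roi_a", "roi_b", "roi_c")


def subject_features(roi_data):
    """Return [mean, std] for each ROI, concatenated in ROI_ORDER.

    roi_data: dict ROI name -> 1-D array of voxel values (length may differ
    between subjects). Output length is always 2 * len(ROI_ORDER).
    """
    feats = []
    for name in ROI_ORDER:  # iterate in the fixed order, not dict order
        v = np.asarray(roi_data[name], dtype=float)
        feats.extend([v.mean(), v.std()])
    return np.array(feats)


rng = np.random.default_rng(0)
subjects = [
    {"roi_a": rng.normal(size=310), "roi_b": rng.normal(size=95), "roi_c": rng.normal(size=150)},
    # Different voxel counts and a different dict order for the second subject.
    {"roi_c": rng.normal(size=140), "roi_a": rng.normal(size=280), "roi_b": rng.normal(size=102)},
]
X_group = np.vstack([subject_features(s) for s in subjects])
print(X_group.shape)  # (2, 6): one row per subject, same columns for everyone
```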