Feature selection for classification with ROI masks

Hello everyone,
I am doing classification with the nilearn.decoding.Decoder() function,

cv_motor = LeaveOneGroupOut()
decoder_motor = Decoder(estimator='svc', mask=mask_img, cv=cv_motor,
                        screening_percentile=20, standardize='zscore_sample',
                        scoring='accuracy')
decoder_motor.fit(X_train, y_train, groups=groups)

I wish to apply feature selection for the decoder, to select the voxels within ROI masks (mean n = 800 voxels) with the highest scores. The documentation says the screening_percentile argument is "corrected according to volume of mask, relative to the volume of standard brain", which I took to mean that it accounts for the mask and selects the given percentile within it.
However, between screening_percentile = 20 and screening_percentile = 1, I got the same decoder.coef_.shape and the same number of non-zero entries in decoder.coef_; the values changed only minimally, after the decimal point.

I tried to find another function to do this feature selection, e.g., sklearn.feature_selection.SelectPercentile.
I need to do the classification while applying:

  1. ROI mask (I can do this before fitting the decoder)
  2. LeaveOneGroupOut() cross-validation (I have 6 groups; one image for each condition (n=2) per group)
  3. z-score standardization
  4. feature selection keeping either the top 20 percent of features or a fixed number of voxels (e.g., 150)

Question:

  • A. How can I check the effect of different screening_percentile values with Decoder()?
  • B. Does "applying feature selection for the decoder to select the voxels within ROI masks with the highest scores" sound like a reasonable procedure?
  • C. Is there any way to do this with a function other than Decoder() while meeting requirements 1-4 above? Or how can I apply something similar to SelectPercentile() after defining the decoder with Decoder()?

Any idea or suggestion please?
Many thanks.

Thanks for putting this topic forward.
First, if you have as few as 800 voxels, you probably don't need feature selection…
A. screening_percentile in Decoder is computed with respect to the MNI brain volume, irrespective of the mask you provided.
B. Yes, although I conjecture that feature selection will not bring important benefits given the number of voxels you have.
C. You can set up a scikit-learn pipeline: simply use the NiftiMasker to extract the data; you then have NumPy arrays and can use a standard sklearn pipeline.
Does that address your questions?
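For concreteness, a minimal sketch of such a pipeline covering points 1-4 (all parameter values here are assumptions to adapt; in practice the first step would be X_masked = NiftiMasker(mask_img=mask_img).fit_transform(imgs), which is simulated below as a plain array):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# 1. ROI mask: in practice, extract the masked data with NiftiMasker.
# Here we simulate the result: 6 groups x 2 conditions, 800 voxels.
rng = np.random.default_rng(0)
X_masked = rng.standard_normal((12, 800))
y = np.tile([0, 1], 6)                # 2 conditions per group
groups = np.repeat(np.arange(6), 2)   # 6 groups

pipeline = Pipeline([
    ("scaler", StandardScaler()),                           # 3. z-scoring
    ("anova", SelectPercentile(f_classif, percentile=20)),  # 4. top 20% of voxels
    ("svc", LinearSVC()),                                   # the classifier
])

# 2. LeaveOneGroupOut CV; scaling and selection are refit on each training fold
cv = LeaveOneGroupOut()
scores = cross_val_score(pipeline, X_masked, y, groups=groups, cv=cv,
                         scoring="accuracy")
print(scores.shape)  # one accuracy per left-out group
```

Because the scaler and the ANOVA selection sit inside the pipeline, they are re-estimated on the training data of each fold, which avoids leaking test data into the feature selection.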
Best,
Bertrand

Dear Bertrand,

Thank you very much for your reply.
Before closing this topic, I would like to double-check some points again, if I may.

A. Indeed, you mentioned that screening_percentile computes values relative to the volume of the standard brain; however, the phrase "corrected according to volume of mask" in the documentation makes me doubt it again.
B. I read about a method in Rezk et al. (2020, Current Biology): "An ANOVA-based feature selection was implemented on the training data in each cross-validation fold to identify the most informative 150 voxels". Given that my voxel size is 2.69 x 2.69 x 2.7 mm, I was wondering whether it is possible to do a similar screening, e.g., keeping the top 20% of voxels within the ROI with the highest scores.
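If it helps, a fixed-voxel-count selection of this kind can be expressed with sklearn's SelectKBest (ANOVA F-test) as a pipeline step; inside a pipeline the selection is refit on the training data of each cross-validation fold, as in the quoted paper. The data and parameter values below are stand-ins, not your actual setup:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import LinearSVC

# Select exactly the 150 most informative voxels (ANOVA F-test) at each fit
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("anova", SelectKBest(f_classif, k=150)),
    ("svc", LinearSVC()),
])

# Simulated masked data: 12 images x 800 voxels (stand-in for real data)
rng = np.random.default_rng(0)
X = rng.standard_normal((12, 800))
y = np.tile([0, 1], 6)

pipe.fit(X, y)
print(pipe.named_steps["anova"].get_support().sum())  # 150 voxels kept
```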
The ROIs were defined from thresholded t-maps of a functional motor localizer contrast, so they are supposed to retain some individual information.
I think another possibility is to use a 10 mm sphere ROI around the peak. This also seems like a more classical approach.

Many thanks,
Tzuyi