Feature selection strategy

Mathieu · March 15, 2019, 8:08am

Hi everyone,

I have a question about which feature selection strategy to adopt.

We ran an fMRI experiment with young adults (25-35 years) and adolescents (14-17 years). In this experiment, subjects were asked to observe social and transitive actions; we were interested in the development of the action observation network, especially during adolescence.

We work with several a priori (functional) ROIs derived from a previous meta-analysis run by our team. We then conducted univariate and multivariate analyses in those ROIs.

Each ROI is transformed from MNI space to the native space of each subject in order to perform MVPA classification. However, my ROIs contain different numbers of voxels, and I think this could bias the results.

First, I masked each ROI with a grey matter mask (obtained from my CAT12 segmentation) so that only voxels containing grey matter are considered in the classification. But now I want to make my analyses comparable across subjects. Is it important to have the same number of voxels for each ROI? For each subject?
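A minimal sketch of the grey-matter masking step, assuming the ROI mask and the CAT12 grey-matter probability map have already been brought onto the same native-space voxel grid (nilearn's `image.resample_to_img` is one way to do this) and loaded as NumPy arrays (e.g. via nibabel's `get_fdata()`). The synthetic arrays and the 0.5 probability threshold are assumptions for illustration only.

```python
import numpy as np

# Hypothetical inputs: a binary ROI mask and a grey-matter probability map,
# both assumed to be in the subject's native space on the same voxel grid.
rng = np.random.default_rng(0)
shape = (10, 10, 10)
roi_mask = np.zeros(shape, dtype=bool)
roi_mask[2:6, 2:6, 2:6] = True               # 4 x 4 x 4 = 64-voxel cubic ROI
gm_prob = rng.uniform(0.0, 1.0, size=shape)   # stand-in for the CAT12 GM map

GM_THRESHOLD = 0.5  # assumption: the threshold is a free analysis choice


def gm_masked_roi(roi, gm, threshold=GM_THRESHOLD):
    """Keep only ROI voxels whose grey-matter probability exceeds `threshold`."""
    return roi & (gm > threshold)


masked = gm_masked_roi(roi_mask, gm_prob)
print("voxels before GM masking:", int(roi_mask.sum()))
print("voxels after GM masking: ", int(masked.sum()))
```

Running the count for every ROI and subject shows how far the voxel numbers diverge after masking, which is the quantity that any equalization strategy has to deal with.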

My question is: what is usually done in the literature to compare ROIs with comparable numbers of voxels across participants?

Do I have to arbitrarily select the same number of voxels for each subject and ROI, at the risk of suppressing valuable information that is more widespread in certain ROIs (e.g., Lateral Occipital Cortex)?

Is there a way to select the same number of voxels in each ROI based on their “importance” to the classifier, and how can I do that with nilearn?

Thank you very much for your helpful answers!

Mathieu

jAchtzehn · June 4, 2019, 1:13pm

Hello Mathieu,

Even though it has been a while, I hope I can still help! I am actually in exactly the same boat as you right now, so I cannot answer all of your questions, as I am not very experienced with MVPA.

First, I masked each ROI with a grey matter mask (obtained from my CAT12 segmentation) so that only voxels containing grey matter are considered in the classification. But now I want to make my analyses comparable across subjects. Is it important to have the same number of voxels for each ROI? For each subject?

Yes, I think it is good practice to have the same number of voxels, at least at the individual level. Otherwise, one could argue that the classifier performs better in certain brain regions simply because it has more features (voxels) available.
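One common way to equalize voxel counts is to randomly subsample every ROI down to the size of the smallest ROI and average the cross-validated accuracy over many random draws, so that no single arbitrary subset drives the result. A minimal sketch with scikit-learn, assuming each ROI's data has already been extracted as a trials × voxels array (e.g. with nilearn's `NiftiMasker`); the ROI names, synthetic data, number of draws and the linear SVM are hypothetical choices. Because the subsampling never looks at the labels, it can be done outside the cross-validation loop without leakage.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC


def subsampled_accuracy(X, y, k, n_draws=20, seed=0):
    """Mean CV accuracy over `n_draws` random subsets of `k` voxels (columns of X)."""
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = []
    for _ in range(n_draws):
        # Label-independent random choice of k voxels, drawn without replacement.
        cols = rng.choice(X.shape[1], size=k, replace=False)
        clf = LinearSVC(dual=True, max_iter=10_000)
        scores.append(cross_val_score(clf, X[:, cols], y, cv=cv).mean())
    return float(np.mean(scores))


# Synthetic example: two hypothetical ROIs of different sizes, 40 trials, 2 conditions.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 20)
rois = {
    "small_roi": rng.normal(size=(40, 80)),
    "large_roi": rng.normal(size=(40, 300)),
}
# Add a weak condition effect to every voxel so there is something to decode.
for X in rois.values():
    X[y == 1] += 0.3

k = min(X.shape[1] for X in rois.values())  # common voxel count = smallest ROI
results = {name: subsampled_accuracy(X, y, k) for name, X in rois.items()}
print("common voxel count:", k)
print(results)
```

The trade-off raised in the question remains: using the smallest ROI as the common size discards the most voxels from large ROIs such as lateral occipital cortex, although averaging over draws at least samples all of them.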

Do I have to arbitrarily select the same number of voxels for each subject and ROI, at the risk of suppressing valuable information that is more widespread in certain ROIs (e.g., Lateral Occipital Cortex)?

Currently, I am evaluating different feature selection methods. A good overview with implementation examples can be found here: https://scikit-learn.org/0.18/modules/feature_selection.html#feature-selection
For example, you could use recursive feature elimination (RFE) to find the optimal number of voxels for each ROI. You can also predefine a number of voxels, and RFE will iteratively remove the voxels with the lowest importance for the classifier until only that number remains. Just be aware that you need to adapt your cross-validation scheme when using these methods: the feature selection must be fit only on the training data of each fold. If the selection also sees the data you later test on, your classifier's performance will be artificially inflated.
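A minimal sketch of RFE with a predefined number of voxels, using scikit-learn's `RFE` around a linear SVM. It runs on pure-noise data, so honest accuracy should hover around chance (0.5) and anything clearly higher comes from leakage. Wrapping RFE in a pipeline refits the selection on the training part of each fold; fitting RFE on all trials first reproduces the inflation described above. `N_KEEP`, the data sizes and `step=0.1` are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

N_KEEP = 10  # assumption: predefined number of voxels to keep per ROI


def make_rfe(n_keep=N_KEEP):
    # RFE ranks voxels by the absolute weights of a linear SVM and, with step=0.1,
    # removes 10% of the original voxel count per iteration until n_keep remain.
    return RFE(LinearSVC(dual=True, max_iter=10_000), n_features_to_select=n_keep, step=0.1)


# Pure-noise data: no voxel carries information about the labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2000))
y = np.repeat([0, 1], 20)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Correct: RFE is a pipeline step, so it is refit on the training trials of each fold.
proper = make_pipeline(make_rfe(), LinearSVC(dual=True, max_iter=10_000))
proper_acc = cross_val_score(proper, X, y, cv=cv).mean()

# Leaky: RFE sees all trials (including the later test trials) before cross-validation.
selected = make_rfe().fit(X, y).support_
leaky_acc = cross_val_score(
    LinearSVC(dual=True, max_iter=10_000), X[:, selected], y, cv=cv
).mean()

print(f"proper CV accuracy: {proper_acc:.2f}")
print(f"leaky CV accuracy:  {leaky_acc:.2f}")
```

To let the number of voxels be chosen by an inner cross-validation instead of fixing it, scikit-learn also provides `RFECV`. For a cheaper ranking that does not depend on the classifier, `SelectKBest(f_classif, k=...)` can replace `RFE` in the same pipeline. On the nilearn side, `Decoder` has a `screening_percentile` parameter, but it keeps a percentage of voxels rather than a fixed count.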