I am working with the new NiLearn decoding.Decoder wrapper function, and I cannot figure out when/how the ANOVA feature selection function is being implemented. My question is if the ANOVA is being run on the full data set first, BEFORE any of the cross-validation folds, OR is it being run on each cross-validation run separately? My understanding is that if the ANOVA is being run on all the data initially, this would introduce a peeking bias and inflate the accuracy, whereas if it is run on each fold individually this would avoid the bias of looking at the hold-out data. Thanks!
Hello, it uses SelectPercentile from scikit-learn. You are entirely right that performing the feature selection on the whole dataset would be a mistake: the held-out data would influence which features are kept, which inflates the accuracy estimate. The Decoder object performs the feature selection separately for each fold, using only the training data.
The feature selection happens in the _parallel_fit function, where the selector is fitted using only X_train and y_train. The selector is a sklearn.feature_selection.SelectPercentile, and its score_func parameter is sklearn.feature_selection.f_classif or sklearn.feature_selection.f_regression, depending on whether the decoding task is a classification or regression problem.
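To make the difference concrete, here is a minimal sketch using plain scikit-learn rather than nilearn's actual code. The synthetic noise data, the LogisticRegression classifier, the 1% percentile, and the helper name fold_accuracy are all assumptions chosen for illustration. Because the features are pure noise, an unbiased estimate should sit near chance, so any gap between the two scores comes from where the ANOVA selection is fitted.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Pure-noise data (an assumption for this demo): no feature is truly
# related to the labels, so honest accuracy should be close to 0.5.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5000))
y = np.repeat([0, 1], 30)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def fold_accuracy(X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit a classifier on one fold and score it."""
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return clf.score(X_test, y_test)


# Biased variant: the ANOVA selector is fitted once on ALL samples,
# so the held-out samples help choose the features ("peeking").
X_peeked = SelectPercentile(f_classif, percentile=1).fit_transform(X, y)
leaky_scores = [
    fold_accuracy(X_peeked[train], y[train], X_peeked[test], y[test])
    for train, test in cv.split(X_peeked, y)
]

# Per-fold variant: the selector is fitted on the training part of each
# fold only, then applied unchanged to that fold's test part.
honest_scores = []
for train, test in cv.split(X, y):
    selector = SelectPercentile(f_classif, percentile=1)
    selector.fit(X[train], y[train])
    honest_scores.append(
        fold_accuracy(
            selector.transform(X[train]), y[train],
            selector.transform(X[test]), y[test],
        )
    )

leaky_score = float(np.mean(leaky_scores))
honest_score = float(np.mean(honest_scores))
print(f"selection on all data:  {leaky_score:.2f}")
print(f"selection inside folds: {honest_score:.2f}")
```

The second loop follows the same pattern described above, with the selector seeing only the training samples of each fold. In your own scikit-learn code, wrapping the selector and the classifier in a Pipeline and passing that to cross_val_score gives the same per-fold behavior.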
Perfect, thank you so much!