What's the correct way to implement nested cross-validation in nilearn across both decoder and estimator parameters?

In nilearn, one can cross-validate over the estimator's parameters by setting the param_grid argument.

For instance, here’s a brief example trying different regularization parameters for the SVR regressor:

from nilearn.decoding import DecoderRegressor

regressor = DecoderRegressor(
    estimator="svr",
    param_grid={"C": [0.04, 0.2, 1]},  # candidate values for the SVR's C
    standardize=True, cv=cv_inner, scoring="r2",
)

If you want to do nested cross-validation, then you can wrap this within a train/test split like:

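for train, test in cv.split(...):
    train_X, train_y = X[train], y[train]  # pseudocode: select the training samples
    test_X, test_y = X[test], y[test]      # pseudocode: select the held-out samples
    regressor.fit(train_X, train_y)        # the inner CV over param_grid runs here
    test_score = regressor.score(test_X, test_y)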

This gives an unbiased test estimate across the whole dataset while avoiding double-dipping when selecting the parameter (I may need to add one extra fit at the end across the training and test data, with the best selected C, to get a final model).

DecoderRegressor is a wrapper for estimators, but DecoderRegressor itself has parameters that can be tested, for instance, screening_percentile.

These cannot simply be passed to param_grid, because param_grid values are passed on to the estimator itself; DecoderRegressor does not apply them to its own parameters.

To search hyperparameters for the DecoderRegressor, typically one can do something like this Haxby example: Nilearn: Statistical Analysis for NeuroImaging in Python — Machine learning for NeuroImaging

Applying that example to my pseudocode above might look like:

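for train, test in cv.split(...):
    train_X, train_y = X[train], y[train]  # pseudocode: select the training samples
    test_X, test_y = X[test], y[test]      # pseudocode: select the held-out samples
    for sp in [1, 10, 100]:
        regressor = DecoderRegressor(
            estimator="svr", param_grid={"C": [0.04, 0.2, 1]},
            standardize=True, cv=cv_inner, scoring="r2",
            screening_percentile=sp)
        regressor.fit(train_X, train_y)
        test_score = regressor.score(test_X, test_y)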

My question is, how to integrate these two approaches so you’re not doing more cross-validation than necessary?

Now the above is optimizing over both C and screening_percentile, but not in a consistent manner. The regressor object as defined above would include the search over three different values of C, and if passed through the above loop, is also trying three different values of sp.

But I’m not treating them entirely consistently. I’m getting a new test_score on the test data for each sp. In contrast, what’s happening to C is ‘under the hood’ so to speak, and it doesn’t use the outer loop test group to estimate its value. Rather, if I understand DecoderRegressor correctly, there’s an inner train/test split that is used to select among the different values. screening_percentile_range should be treated the same way, but I don’t know how to do that without adding a third level of train/test split. So there are inconsistent approaches for C and for sp, and both can’t be right.

Or possibly my understanding of what’s going on inside DecoderRegressor is wrong. If it iterates through different values of param_grid, and within each, fits across the entire set, then perhaps treatment of screening_percentile and C are equivalent. But I don’t think that’s right because there’s some CV inside DecoderRegressor as well.

Anything I’m missing here, and if what I’ve done above is not correct, what is the correct way to do this?

Thx for raising the point.
The solution to your concern would be to rely on a Pipeline object (from sklearn.pipeline) and tune the k parameter of ANOVA-based feature selection.
The problem is that the current implementation of the Decoder does not support that. This is a bug imho.
Can you open an issue? Thx,
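
A minimal sketch of the Pipeline idea, written with plain scikit-learn on synthetic data rather than with the Decoder object. It assumes the voxel data are already in a 2D array; SelectPercentile with f_regression stands in for the ANOVA-based screening (SelectKBest and its k parameter could be tuned the same way), and the linear kernel, grid values, and fold counts are illustrative assumptions, not nilearn defaults confirmed here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectPercentile, f_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for masked voxel data: (n_samples, n_voxels).
X, y = make_regression(n_samples=60, n_features=200, n_informative=10,
                       noise=5.0, random_state=0)

# Scaling, ANOVA-based feature selection, and the regressor in one estimator,
# so every step is refit on each training fold only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectPercentile(f_regression)),
    ("svr", SVR(kernel="linear")),
])

# Both parameters are searched by the same inner cross-validation.
param_grid = {
    "select__percentile": [1, 10, 100],
    "svr__C": [0.04, 0.2, 1],
}

cv_inner = KFold(n_splits=3, shuffle=True, random_state=0)
cv_outer = KFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: choose (percentile, C). Outer loop: score the chosen model.
search = GridSearchCV(pipe, param_grid, cv=cv_inner, scoring="r2")
outer_scores = cross_val_score(search, X, y, cv=cv_outer, scoring="r2")
print("outer-fold r2 scores:", outer_scores)
```

Because GridSearchCV is itself an estimator, cross_val_score reruns the whole search inside each outer training fold. The percentile and C are then selected in exactly the same way, and the outer test folds are used only for scoring, so no third level of splitting is needed.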

I’ve just seen #2883 thx

Hi bthirion,

I reported #2883 narrowly as a problem with the documentation. But as I said in passing in #2883, as well as above, it would be helpful to be able to search over DecoderRegressor's own parameters, and to compare regressors, in addition to the estimator parameters set via param_grid.

Should I open another issue as a feature request, or as a second bug?

Another solution might be just to avoid the Decoder object altogether, and access sklearn estimators directly.

Does nilearn have any accessible method that will do the pre-processing that the Decoder object does (e.g., transform from nifti into an array; standardize; feature selection) but then just return the raw data so that I could put it in an sklearn function myself?

Yes, you can always use a NiftiMasker to obtain your data, then run sklearn’s objects and nested cross-val “manually”. This was how it was done before we introduced the Decoder object.

No need. We have identified the problem I think.