When using nilearn DecoderRegressor, I get a ConvergenceWarning: "Solver terminated early (max_iter=10000). Consider pre-processing your data with StandardScaler"

I have a nifti object train_X of shape (91, 109, 91, 2947) (i.e., 2947 images).

Each image in the nifti object has a numeric value associated with it in a list train_y, and I am training a nilearn.decoding.DecoderRegressor predictor on this image set.

from sklearn.model_selection import GroupKFold
from nilearn.decoding import DecoderRegressor

cv_inner = GroupKFold(3)
regressor = DecoderRegressor(standardize=True, cv=cv_inner, scoring="r2")
regressor.fit(y=train_y, X=train_X, groups=train_groups)

When running, I get the following warning:

/home/bsmith16/.conda/envs/neuralsignature/lib/python3.8/site-packages/sklearn/svm/_base.py:255: ConvergenceWarning: Solver terminated early (max_iter=10000).  Consider pre-processing your data with StandardScaler or MinMaxScaler.
warnings.warn('Solver terminated early (max_iter=%i).'
/home/bsmith16/.conda/envs/neuralsignature/lib/python3.8/site-packages/sklearn/svm/_base.py:255: ConvergenceWarning: Solver terminated early (max_iter=10000).  Consider pre-processing your data with StandardScaler or MinMaxScaler.
warnings.warn('Solver terminated early (max_iter=%i).'
/home/bsmith16/.conda/envs/neuralsignature/lib/python3.8/site-packages/sklearn/svm/_base.py:255: ConvergenceWarning: Solver terminated early (max_iter=10000).  Consider pre-processing your data with StandardScaler or MinMaxScaler.
warnings.warn('Solver terminated early (max_iter=%i).'

It’s puzzling because I am already passing standardize=True to the DecoderRegressor.

If I run only the first 500 images through the Decoder, I do not get the same warning, so perhaps it’s some kind of memory issue, although the system does manage to complete the run.

Any ideas what could be going on here? Is this an unusually large dataset (should I be masking it more than I am)?


It’s not related to memory but to the convergence of the optimization algorithm that fits the estimator’s parameters.
The optimization can stop for two reasons: (i) it reached its stopping criterion, i.e. it deems that it has reached the minimum of the function it is trying to minimize, or (ii) it has already performed a predefined maximum number of iterations (10,000 in this case), so it gives up rather than keep running forever. In case (ii) it warns you that it gave up (“terminated early”) without reaching its convergence criterion.
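
To make the distinction concrete, here is a minimal sketch (not from your setup; the random data and the tiny max_iter are just illustrative) that should trigger the same warning by giving sklearn’s SVR far too few iterations:

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))   # arbitrary random features
y = rng.normal(size=200)         # arbitrary random targets

# Cap the solver at far too few iterations: it gives up before reaching its
# convergence criterion (case (ii) above) and emits the same ConvergenceWarning.
SVR(kernel="linear", max_iter=5).fit(X, y)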

The StandardScaler mentioned in the warning is just a suggested way of preprocessing the data that may improve convergence.
Since you are already standardizing, there are a few more things you can try (a rough sketch of each is shown after the list):

  • use screening_percentile to start with a univariate feature selection, so that the problem fed to the estimator is better posed (the features have fewer dimensions)
  • increase the regularization so that the problem is better conditioned; you can use the param_grid of the Decoder to set the C of the SVR
  • increase the maximum number of iterations (e.g. 100,000 instead of 10,000), also via param_grid to set max_iter
  • use a different estimator; e.g. ridge regression doesn’t use an iterative solver
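
A rough, untested sketch of what those options could look like, reusing the same train_X, train_y, train_groups and cv_inner as above. The exact param_grid keys depend on the underlying sklearn estimator (the default 'svr' exposes C and max_iter), the screening_percentile value is only illustrative, and 'ridge_regressor' is the estimator name I would expect nilearn to accept; please double-check both against the nilearn version you have installed:

from nilearn.decoding import DecoderRegressor

# Suggestions 1-3: tighter univariate screening, stronger regularization
# (smaller C), and a higher iteration cap for the SVR solver.
regressor = DecoderRegressor(
    estimator="svr",
    standardize=True,
    cv=cv_inner,
    scoring="r2",
    screening_percentile=5,          # illustrative: keep only the top voxels from ANOVA screening
    param_grid={
        "C": [0.01, 0.1, 1.0],       # smaller C = stronger regularization
        "max_iter": [100_000],       # e.g. 100K instead of the default 10K
    },
)
regressor.fit(y=train_y, X=train_X, groups=train_groups)

# Suggestion 4: a ridge-based estimator, which does not rely on the
# iterative SVR solver and so cannot produce this warning.
ridge_decoder = DecoderRegressor(
    estimator="ridge_regressor",
    standardize=True,
    cv=cv_inner,
    scoring="r2",
)
ridge_decoder.fit(y=train_y, X=train_X, groups=train_groups)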

Thank you for those suggestions! I’ll have a good look at how to implement them.