Questions about MVPA with NiLearn

Hello

Over the weekend I applied MVPA to my data using NiLearn, and now I have some questions. I would greatly appreciate some feedback/further information/resources on these.

  1. I have more features than examples. I think this must be common in MVPA? What are some of the issues/considerations with this? Which algorithms are appropriate other than SVC (assuming SVC is)?

  2. What do we need to do about correlated features in MVPA? Voxel signals violate the independence assumption. Maybe if we use some data reduction technique this might also alleviate #1? But then it might render things less interpretable.

  3. Is it ok to use less interpretable algorithms for classification, sacrificing ability to visualise the results? I found that a linear SVC kernel performed at ~chance level but changing this to RBF and tuning with grid search gave me almost 70% accuracy.

Thanks!

Hello! Thanks for the great questions.

1. Yes, it's common, and it's part of why regularized ML models are usually used. To alleviate this problem, the usual approach is to keep only the most relevant voxels (e.g. via a univariate F-test; see the sketch below).
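
For instance, something along these lines with scikit-learn (just a sketch; X is assumed to be your trials-by-voxels matrix and y your condition labels):

# Keep only the voxels with the strongest univariate relation to the labels
from sklearn.feature_selection import SelectPercentile, f_classif

selector = SelectPercentile(f_classif, percentile=20)
X_reduced = selector.fit_transform(X, y)   # X: (n_trials, n_voxels)

In practice you'd put this step inside your cross-validation (e.g. in a Pipeline), so the selection is re-fit on each training split and doesn't leak information from the test set.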

2. Models usually do OK with correlated features; what they don't like is correlated samples (here, images). To tackle this, you're better off doing MVPA after fitting a GLM, or at least after detrending your data.
Beyond that, I don't have strong opinions on the interpretability and biases of decoding maps, other than that it's a somewhat contentious topic. From a machine-learning standpoint, decoding maps, although they often look good, lack clear interpretability in most cases: feature-importance estimation may be biased, as you mentioned, and decoders are often heavily constrained by an arbitrary regularization that acts as a kind of supplementary hypothesis, which should be accounted for if you want to interpret the maps.

3. This depends a lot on what you want your results to say. If you want to conclude that there is some signal enabling you to discriminate behavioral conditions in a certain region, a non-interpretable model makes this point just as well (provided the methodology avoids overfitting). Be careful, though: if you directly compare accuracies obtained from different ROIs (as is often done in the literature), these comparisons are not necessarily significant.
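
As a side note, if you want to check that decoding in a single ROI is above chance (a different question from comparing ROIs against each other), a permutation test is one simple option. A rough sketch with scikit-learn, assuming X and y as above:

# Compare the cross-validated score against scores obtained with shuffled labels
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.svm import SVC

cv = StratifiedKFold(n_splits=5)
score, perm_scores, pvalue = permutation_test_score(
    SVC(kernel='linear'), X, y, cv=cv, n_permutations=1000, scoring='roc_auc')
print(score, pvalue)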

HTH


Hi @tbazeille, thank you for your insight.

I have been exploring this further and I wonder if I could get some further feedback on what I’ve done?

As suggested in the Mumford 2012 paper I fitted a GLM with a regressor for each trial and modelled the movement parameters as regressors of no interest. This gave me 82 beta images/trials. I also made an anatomical mask for an ROI which contained 4126 voxels. With these I read in each beta, masked it with my mask using NiftiMasker(), and then stored each in an array X of shape (82, 4126), and used the behavioural data to create my y array, which contained 47 examples of my positive and 35 of my negative class. I think this is correct but would like to confirm this.
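
Roughly, the masking step looks like this (a simplified sketch; beta_files, roi_mask.nii and labels are placeholder names):

# Mask the 82 beta images with the anatomical ROI and stack them into X
import numpy as np
from nilearn.input_data import NiftiMasker

masker = NiftiMasker(mask_img='roi_mask.nii')
X = masker.fit_transform(beta_files)   # beta_files: list of 82 beta image paths -> (82, 4126)
y = np.array(labels)                   # behavioural labels: 47 positive, 35 negative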

After that I used the f-test feature selection as you suggested plus gridsearch CV. I did also try PCA-based feature selection, as well as no feature selection at all. Despite my efforts, after this I get strange results! I wonder if it’s to do with the small sample? I have experimented quite a bit with this but it’s always the same - gridsearch cross validation suggests that there is reasonable performance - usually between 58% and 76% AUC, but even when I take the best estimator from that result and re-run it I get trash, as can be seen with the ROC curve below. I pasted my code in case there’s something obvious I’m missing.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.svm import SVC
import matplotlib.pyplot as plt

# Some defs for model selection
parameters = [{'C': [10, 100, 1000], 'kernel': ['linear']},
              {'C': [10, 100, 1000], 'kernel': ['rbf'], 'gamma': [0.00001, 0.0001, 0.001, 0.01, 0.1]}]
metric = 'roc_auc'
test_size = 0.3

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=1)

# Feature scaling
sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)
X_test_sc = sc.transform(X_test)

# Selection
feature_selection = SelectPercentile(f_classif, percentile=5)
X_train_sc = feature_selection.fit_transform(X_train_sc, y_train)
X_test_sc = feature_selection.transform(X_test_sc)

# Alternative: PCA-based dimensionality reduction (tried, currently disabled)
# pca = PCA(n_components=4)
# X_train_sc = pca.fit_transform(X_train_sc)
# X_test_sc = pca.transform(X_test_sc)
# explained_variance = pca.explained_variance_ratio_


# Classifier
clf = SVC(kernel='rbf', random_state=1)
clf.fit(X_train_sc, y_train)

# Find best parameters
grid_search = GridSearchCV(estimator=clf,
                           param_grid=parameters,
                           scoring=metric,
                           cv=10,
                           n_jobs=-1)

grid_search = grid_search.fit(X_train_sc, y_train)
best_score = grid_search.best_score_
best_parameters = grid_search.best_params_
print(f"Best {metric}: {best_score:.2f}")
print("Best Parameters:", best_parameters)
print()
print(clf)
print()

# Re-run classification
clf = grid_search.best_estimator_
clf.fit(X_train_sc, y_train)
y_pred = clf.predict(X_test_sc)

# Evaluate
from sklearn import metrics
fpr, tpr, _ = metrics.roc_curve(y_test, y_pred)
print(metrics.roc_auc_score(y_test, y_pred))

# Plot
fig, ax = plt.subplots(figsize=(6,6))
ax.plot([0,1], [0,1], c='k', ls='dashed')
ax.plot(fpr, tpr, marker='o')
ax.set_xlabel('FPR')
ax.set_ylabel('TPR')
plt.show()

Of course this particular ROC curve is complete rubbish, and inspecting y_pred shows that it just predicted all positive. This is pretty representative of the results I'm getting, although it also shows a few inverted ROC curves, despite triple-checking the return variables from roc_curve() and making sure they're in the correct order. It's a bit weird that the grid search and the cross-validation are telling me that I'm getting reasonable performance when I'm clearly not.

I am left wondering whether this is simply due to my sample size, in which case I have at least learned a bit about MVPA in Python!

Thanks for the details!

As suggested in the Mumford 2012 paper I fitted a GLM with a regressor for each trial and modelled the movement parameters as regressors of no interest. This gave me 82 beta images/trials. I also made an anatomical mask for an ROI which contained 4126 voxels. With these I read in each beta, masked it with my mask using NiftiMasker(), and then stored each in an array X of shape (82, 4126), and used the behavioural data to create my y array, which contained 47 examples of my positive and 35 of my negative class. I think this is correct but would like to confirm this.

Sounds good, even if the number of images is a bit small! Did you standardize your data when masking these images? Usually this helps.
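
For instance, something like this (a sketch, assuming you mask all the betas in a single call; file names are placeholders):

# standardize=True z-scores each voxel across the 82 beta images while masking
from nilearn.input_data import NiftiMasker

masker = NiftiMasker(mask_img='roi_mask.nii', standardize=True)
X = masker.fit_transform(beta_files)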

After that I used the f-test feature selection as you suggested plus gridsearch CV. I did also try PCA-based feature selection, as well as no feature selection at all.

I saw in your code that you only kept 5% of the ~4000 voxels in your ROI. If the ROI is relevant, you may be filtering out too much information here, and keeping a higher percentage (20%?) might improve the results.

I didn't have time to try your code or look at it in detail, just a few general comments:

You split your 82 images into a 70% train set (57 images) and a 30% held-out test set (25 images).
Then you do a grid search (apparently using 10-fold stratified k-fold), so models are fit on 9/10 of the train set and each candidate parameter setting is evaluated on 1/10 of the train set, i.e. about 6 images. So I think what is happening here is:

  • results should have a lot of variability across folds (you can check this by plotting the CV scores);
  • overfitting will completely drive your parameter selection, so you'll end up choosing a model that doesn't generalize and has a validation score far lower than your training score.

Models with more expressivity are more prone to overfitting, and it's even worse with non-linearity, so with this few samples I'd just try simple linear options: an SVM with a linear kernel without tuning C (it's usually not that sensitive), or logistic regression, ridge, etc.
And if you do a grid search, choose options with bigger test sets, either with cv=4 for example, or by using StratifiedShuffleSplit!
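
Something like this, as a rough sketch (reusing your X and y; the percentile, C and split sizes are just illustrative choices):

# Simple linear decoder, with feature selection kept inside the CV via a pipeline
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score

pipe = make_pipeline(StandardScaler(),
                     SelectPercentile(f_classif, percentile=20),
                     SVC(kernel='linear', C=1.0))

# Many random splits with larger test sets give a more stable estimate
cv = StratifiedShuffleSplit(n_splits=20, test_size=0.25, random_state=1)
scores = cross_val_score(pipe, X, y, scoring='roc_auc', cv=cv)
print(scores.mean(), scores.std())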

Hope this helps!

Hi @tbazeille, thanks for the pointers, this is very helpful and makes a lot of sense.

Actually, no, I wasn't doing that. There are some issues with the masking, but it was going to be too much for a single post and I had forgotten about it. Here is the warning for a single beta.nii:

C:\Users\L\AppData\Roaming\Python\Python37\site-packages\nilearn\image\resampling.py:584: RuntimeWarning: NaNs or infinite values are present in the data passed to resample. This is a bad thing as they make resampling ill-defined and much slower.
  fill_value=fill_value)
C:\Users\L\AppData\Roaming\Python\Python37\site-packages\nilearn\image\resampling.py:591: RuntimeWarning: invalid value encountered in greater
  vmin = min(data.min(), 0)
C:\Users\L\AppData\Roaming\Python\Python37\site-packages\nilearn\image\resampling.py:592: RuntimeWarning: invalid value encountered in less
  vmax = max(data.max(), 0)
C:\Users\L\AppData\Roaming\Python\Python37\site-packages\nilearn\signal.py:61: UserWarning: Standardization of 3D signal has been requested but would lead to zero values. Skipping.
  warnings.warn('Standardization of 3D signal has been requested but '

There is something in here about standardization, but even without standardization I was receiving these warnings. I actually don't know what this could be: the data plots correctly with the glass brains. I tried loading the data with nibabel, and when I view it as a numpy array it is all NaN, which is quite confusing given that in my previous code I am extracting X and the data are there… I had assumed that it was just NaN values around the outside of the image, which would of course not contain any betas since there is no signal for those voxels. The betas themselves seemed totally fine. Do you have any ideas what might be causing that?

Can you open an issue on the Nilearn GitHub for this? I have no specific idea about it off the top of my head.

If you do, please try to provide a minimal example (a few lines of code that make it possible for us to reproduce it), and a few of your images, since it seems to be something data-specific.

Thanks

Ok, I've made a post here. I wasn't able to attach any of the betas because GitHub doesn't accept those file types, but I uploaded a notebook with an example of some outputs and the error.


Normally, if you run nilearn.image.clean_img() on the list of all images, this is equivalent to running it on a 4D image. This should do the trick. Can you confirm?

I ran this and it seemed to work, but then when I used the get_fdata() method I was left with a bunch of NaNs once again, like before. I checked the shape and it definitely processed all 82 files. First part of the output:

array([[[[nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan],
         ...,
         [nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan]],

        [[nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan],
         ...,
         [nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan]],

But it definitely ran on all 82 images:

[In]: p.get_fdata().shape
[Out]: (79, 95, 79, 82)

I'm pretty sure that the NaNs are simply where SPM marks the space around the head/brain; when I call numpy.nanmean(), for example:

[In]: np.nanmean(p.get_fdata())
[Out]: -3.3625808977264304e-12

There is definitely data there; it just looks like it's padded with NaNs.


Thanks a lot for the replication, and sorry for the bad advice, LCJ. @bthirion, here is the corresponding thread for the Nilearn issue you asked to close. Off the top of my head, I don't have a clue about this behavior, which seems weird.

No problem! I'm happy to email a couple of the beta files if it helps, although looking at this, it seems that any SPM12 beta would replicate it.

Sorry for the back-and-forth between Neurostars and the Nilearn issue tracker.

Could you please post a couple of images together with a snippet of the code that you are using? This will be much more efficient.
Best,

I made this for the GitHub thread; maybe it's easier to see everything there (these are the images/code I'd post anyway).

Use

p = image.clean_img(imgs=all_betas, ensure_finite=True)

Thank you! I was sure ensure_finite was True by default so I overlooked it. This now works smoothly with no errors.