Regularized Ensemble of models

ouerfelliclub · February 28, 2020, 2:20pm

Hello,
Where can I find the code for this article?
https://hal.archives-ouvertes.fr/hal-01615015/document @GaelVaroquaux @bthirion
I am asking because I don't understand the algorithm.
I have a matrix of shape (nb_subjects, nb_parcels) and a vector y.
I want to apply the method used in the article to fit my model with ensembling.
Thank you very much

bthirion · February 28, 2020, 9:59pm

Hi, this is now partly implemented in the Nilearn Decoder object (only partly because, if I'm not mistaken, it does not include the clustering part).
It is in Nilearn master but not yet released.

https://github.com/nilearn/nilearn/blob/master/examples/02_decoding/plot_haxby_full_analysis.py

"""
ROI-based decoding analysis in Haxby et al. dataset
=====================================================

In this script we reproduce the data analysis conducted by
Haxby et al. in "Distributed and Overlapping Representations of Faces and
Objects in Ventral Temporal Cortex".

Specifically, we look at decoding accuracy for different objects in
three different masks: the full ventral stream (mask_vt), the house
selective areas (mask_house) and the face selective areas (mask_face),
that have been defined via a standard GLM-based analysis.

"""

##########################################################################
# Load and prepare the data
# -----------------------------------

# Fetch data using nilearn dataset fetcher

(excerpt truncated)

Best,

Bertrand

ouerfelliclub · February 29, 2020, 10:02am

Thank you very much. But how can I build the feature-grouping matrix Φ(j) using FeatureAgglomeration?
FeatureAgglomeration returns a matrix of shape (nb_subjects, nb_parcels), but the feature-grouping matrix must have shape (nb_voxels, nb_parcels)…

How can I do this, please?
Thank you

ouerfelliclub · February 29, 2020, 11:27am

I tried to implement the function described in the article. Could you please tell me whether it is correct?

X_red: the output of FeatureAgglomeration, of shape (nb_subjects, nb_parcels)
X: the data, of shape (nb_subjects, nb_voxels)

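import numpy as np
from sklearn import preprocessing
from sklearn.feature_selection import SelectPercentile, f_regression
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

def _ensembling(X, X_red, y):
    liste = []
    Φ = X.T @ X_red  # feature-grouping matrix, shape (nb_voxels, nb_parcels)
    Φ = preprocessing.normalize(Φ, norm='l2')  # note: normalizes rows (axis=1 by default)

    ridge = BayesianRidge()
    feature_selection = SelectPercentile(f_regression)
    anova_LR = Pipeline([('anova', feature_selection), ('ridge', ridge)])

    # choose the ANOVA percentile by 3-fold CV; the best pipeline is refit on all of X_red
    grid = GridSearchCV(anova_LR, param_grid={'anova__percentile': [5, 10, 20]},
                        verbose=1, cv=3, n_jobs=1)
    grid.fit(X_red, y)

    # parcel weights, mapped back to all parcels (zeros for parcels removed by the ANOVA)
    coef_ = grid.best_estimator_.steps[-1][1].coef_
    w_best_ = grid.best_estimator_.steps[0][1].inverse_transform(coef_.reshape(1, -1))
    # back-project to voxel space: (1, nb_parcels) @ (nb_parcels, nb_voxels)
    w_aprox = w_best_ @ Φ.T
    liste.append(w_aprox)
    return liste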

Each call returns one map of shape (1, nb_voxels); stacking b of them gives an array of shape (b, nb_voxels).

We can then call this function b times and compute the average over the b estimators.
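Written as a formula (notation introduced here for illustration, not copied from the article): if Φ^(b) is the (nb_voxels, nb_parcels) grouping matrix of the b-th clustering and ŵ^(b) the parcel-level weights fitted on X Φ^(b), the averaged voxel-level map described above is

```latex
\hat{w} \;=\; \frac{1}{B} \sum_{b=1}^{B} \Phi^{(b)} \, \hat{w}^{(b)},
\qquad \Phi^{(b)} \in \mathbb{R}^{p \times k_b},\quad
\hat{w}^{(b)} \in \mathbb{R}^{k_b},\quad
\hat{w} \in \mathbb{R}^{p}
```

where p = nb_voxels and k_b is the number of parcels of the b-th clustering. In the code above, `w_aprox = w_best_ @ Φ.T` is the row-vector form of one term Φ^(b) ŵ^(b).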

So the output is an array of shape (1, nb_voxels).
Is this correct?
Thank you very much @bthirion
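A minimal end-to-end sketch of that averaging loop on random data. Several choices here are assumptions, not taken from the article: each member clusters and fits on a random 80% subset of the subjects, Φ is built from `FeatureAgglomeration.labels_` with columns divided by parcel size, and a plain RidgeCV replaces the ANOVA + BayesianRidge pipeline to keep it short. `grouping_matrix` and `ensemble_weights` are hypothetical helper names.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.linear_model import RidgeCV


def grouping_matrix(labels, n_parcels):
    """Hypothetical helper: dense one-hot (n_voxels, n_parcels) matrix, columns divided by parcel size."""
    phi = np.zeros((labels.size, n_parcels))
    phi[np.arange(labels.size), labels] = 1.0
    return phi / phi.sum(axis=0, keepdims=True)


def ensemble_weights(X, y, n_parcels=10, n_estimators=5, subsample=0.8, seed=0):
    """Hypothetical helper: average of back-projected parcel-level ridge maps, shape (1, n_voxels)."""
    rng = np.random.default_rng(seed)
    n_subjects = X.shape[0]
    maps = []
    for _ in range(n_estimators):
        # Assumption: each member clusters and fits on a random subset of the subjects
        idx = rng.choice(n_subjects, size=int(subsample * n_subjects), replace=False)
        agglo = FeatureAgglomeration(n_clusters=n_parcels).fit(X[idx])
        phi = grouping_matrix(agglo.labels_, n_parcels)
        X_red = X[idx] @ phi                                   # (n_sub, n_parcels)
        ridge = RidgeCV(alphas=(0.1, 1.0, 10.0)).fit(X_red, y[idx])
        maps.append(ridge.coef_.reshape(1, -1) @ phi.T)        # back-projection, (1, n_voxels)
    return np.vstack(maps).mean(axis=0, keepdims=True)         # average over members, (1, n_voxels)


# Usage on random toy data: 40 subjects, 100 voxels
rng = np.random.default_rng(42)
X = rng.standard_normal((40, 100))
y = X[:, :10].mean(axis=1) + 0.1 * rng.standard_normal(40)
w = ensemble_weights(X, y)
```

The result has one weight per voxel, which matches the (1, nb_voxels) shape discussed above.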

bthirion · March 1, 2020, 10:39pm

To generate Φ, I'd rather rely on the labeling of the voxels: expand it into a one-hot encoding matrix, and then normalize the columns. Your formula will be inaccurate if the number of samples is small.
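A minimal sketch of that suggestion on small random data (the sizes and `n_clusters=5` are made up for illustration). It relies on scikit-learn's `FeatureAgglomeration`, whose fitted `labels_` attribute holds one parcel index per voxel; this is what bridges the (nb_subjects, nb_parcels) output and the (nb_voxels, nb_parcels) matrix. Dividing each column by its parcel size is one possible normalization; with it, `X @ Phi` reproduces the parcel means returned by `transform` (mean pooling is the default).

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

# Toy data standing in for the (nb_subjects, nb_voxels) matrix
rng = np.random.default_rng(0)
n_subjects, n_voxels, n_parcels = 20, 50, 5
X = rng.standard_normal((n_subjects, n_voxels))

agglo = FeatureAgglomeration(n_clusters=n_parcels).fit(X)
labels = agglo.labels_  # parcel index of each voxel, shape (n_voxels,)

# One-hot encoding: Phi[j, k] = 1 if voxel j belongs to parcel k
Phi = np.zeros((n_voxels, n_parcels))
Phi[np.arange(n_voxels), labels] = 1.0

# Divide each column by its parcel size, so X @ Phi averages the voxels of each parcel
Phi /= Phi.sum(axis=0, keepdims=True)

X_red = X @ Phi  # shape (n_subjects, n_parcels), same values as agglo.transform(X)
```

For whole-brain data the same matrix is better stored as a `scipy.sparse` matrix, since each row has a single nonzero entry.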

Also, I'd use RidgeCV rather than BayesianRidge, as it is more reliable numerically.
Otherwise, I think you got the point.
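A sketch of the parcel-level fit with RidgeCV in place of BayesianRidge, assuming scikit-learn. RidgeCV only selects the ridge penalty `alpha` (by efficient leave-one-out cross-validation when `cv` is left at its default), while the ANOVA percentile is a separate hyperparameter, so the GridSearchCV over `anova__percentile` is kept. `fit_parcel_model` and its alpha grid are hypothetical choices.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


def fit_parcel_model(X_red, y, alphas=(0.1, 1.0, 10.0, 100.0)):
    """Hypothetical helper: ANOVA screening + RidgeCV on (n_subjects, n_parcels) data.

    Returns parcel weights of shape (1, n_parcels), with zeros for screened-out parcels.
    """
    pipe = Pipeline([
        ('anova', SelectPercentile(f_regression)),
        ('ridge', RidgeCV(alphas=alphas)),  # alpha picked by leave-one-out CV inside fit
    ])
    # RidgeCV does not tune the percentile, so it is still searched here
    grid = GridSearchCV(pipe, param_grid={'anova__percentile': [5, 10, 20]}, cv=3)
    grid.fit(X_red, y)
    best = grid.best_estimator_
    coef = best.named_steps['ridge'].coef_.reshape(1, -1)
    return best.named_steps['anova'].inverse_transform(coef)


# Usage on random toy data: 60 subjects, 40 parcels
rng = np.random.default_rng(0)
X_red = rng.standard_normal((60, 40))
y = X_red[:, :3].sum(axis=1) + 0.1 * rng.standard_normal(60)
w_red = fit_parcel_model(X_red, y)
```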

We should expose such a method in Nilearn, but this is not going to happen in the near future.
Best,

Bertrand

ouerfelliclub · March 2, 2020, 2:50pm

Thank you very, very much.
I tried this:
label is a list of n_parcellations arrays: label[0] contains the labels of the first parcellation, label[1] the labels of the second parcellation, and so on (each array has shape (1, 170006)).

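import numpy as np

def Φ(label, n_parcellations):
    Phi = []
    for i in range(n_parcellations):
        a = np.ravel(label[i])                # labels of parcellation i, shape (170006,)
        b = np.zeros((a.size, a.max() + 1))   # dense float64 one-hot matrix (n_voxels, n_parcels)
        b[np.arange(a.size), a] = 1
        Phi.append(b)                         # keep the Φ of each parcellation
    return Phi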

But I get a memory error as soon as I execute this function (in the first iteration).
Where is the problem?
Thank you @bthirion
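A quick size check of why the dense version fails. The thread does not give the number of parcels, so 1000 is a hypothetical value: a dense float64 array of shape (170006, 1000) takes 170006 × 1000 × 8 = 1,360,048,000 bytes (about 1.36 GB) per parcellation, whereas a sparse matrix stores only one nonzero per voxel.

```python
import numpy as np
from scipy.sparse import coo_matrix

n_voxels = 170006
n_parcels = 1000                             # hypothetical parcel count
labels = np.arange(n_voxels) % n_parcels     # hypothetical labels, every parcel non-empty

# Memory of the dense one-hot matrix created with np.zeros (float64)
dense_bytes = n_voxels * n_parcels * np.dtype(np.float64).itemsize

# Same one-hot information as a sparse matrix: one stored value per voxel
incidence = coo_matrix(
    (np.ones(n_voxels, dtype=np.float32), (labels, np.arange(n_voxels))),
    shape=(n_parcels, n_voxels)).tocsc()
sparse_bytes = incidence.data.nbytes + incidence.indices.nbytes + incidence.indptr.nbytes

print(f"dense: {dense_bytes / 1e9:.2f} GB, sparse: {sparse_bytes / 1e6:.2f} MB")
```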

bthirion · March 4, 2020, 8:47am

You should use a sparse matrix. Here is some code borrowed from https://github.com/nilearn/nilearn/blob/master/nilearn/regions/rena_clustering.py

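import numpy as np
from scipy.sparse import coo_matrix, dia_matrix

# labels: 1D integer array with the parcel index of each voxel (0 .. n_components - 1)
n_voxels = len(labels)
n_components = labels.max() + 1  # assumption: every label in 0 .. max is used

# one-hot incidence matrix, shape (n_components, n_voxels): entry (k, j) = 1 if voxel j is in parcel k
incidence = coo_matrix(
        (np.ones(n_voxels), (labels, np.arange(n_voxels))),
        shape=(n_components, n_voxels), dtype=np.float32).tocsc()

# diagonal matrix holding 1 / (number of voxels) of each parcel
inv_sum_col = dia_matrix(
        (np.array(1. / incidence.sum(axis=1)).squeeze(), 0),
        shape=(n_components, n_components))

incidence = inv_sum_col @ incidence  # each row now sums to 1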

This incidence matrix is Phi, in transposed form: its shape is (n_components, n_voxels).
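A sketch of how the normalized incidence matrix plays the role of Φ, on toy data (the labels, data and sizes are made up; `incidence_matrix` is a hypothetical wrapper around the code above). It checks that `incidence @ X.T` gives the parcel means, and that back-projecting parcel weights with `incidence.T` gives the same predictions as the reduced model.

```python
import numpy as np
from scipy.sparse import coo_matrix, dia_matrix


def incidence_matrix(labels, n_components):
    """Hypothetical wrapper: sparse (n_components, n_voxels) matrix; row k averages parcel k."""
    n_voxels = len(labels)
    incidence = coo_matrix(
        (np.ones(n_voxels), (labels, np.arange(n_voxels))),
        shape=(n_components, n_voxels), dtype=np.float32).tocsc()
    inv_sum_col = dia_matrix(
        (np.array(1. / incidence.sum(axis=1)).squeeze(), 0),
        shape=(n_components, n_components))
    return inv_sum_col @ incidence


rng = np.random.default_rng(0)
n_subjects, n_voxels, n_components = 15, 200, 8
labels = rng.permutation(np.arange(n_voxels) % n_components)  # every parcel non-empty
X = rng.standard_normal((n_subjects, n_voxels))

incidence = incidence_matrix(labels, n_components)
X_red = (incidence @ X.T).T                  # parcel means, shape (n_subjects, n_components)
w_red = rng.standard_normal(n_components)    # stand-in for fitted parcel-level weights
w_vox = incidence.T @ w_red                  # back-projected voxel weights, shape (n_voxels,)
```

Because `X_red = X @ incidence.T`, the identity `X_red @ w_red == X @ w_vox` holds exactly, which is why averaging the back-projected maps is meaningful.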

ouerfelliclub · March 7, 2020, 7:16pm

Thank you very much.
One question, please: can RidgeCV(alphas=(…)) replace lines 7, 8 and 9 to find the best model?
Thank you