Replicating Nilearn Decoder Accuracy in Sklearn

NordicNarwhal · September 21, 2024, 2:41am

Hi All I’ve been stuck on this problem for a while. I can’t match the accuracy of the Nilearn Decoder in Sklearn no matter how hard I try. Is there something obvious I’m doing wrong? Nilean gives much higher accuracy scores.
Code below:


import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import accuracy_score

from nilearn.maskers import NiftiMasker

masker = NiftiMasker(
    mask_img=ffa_img,
    runs=final_df['run_value'],
    standardize="zscore_sample",
    memory="nilearn_cache",
    memory_level=1,
)
fmri_masked = masker.fit_transform(final_df['0_x'])

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

svc = LinearSVC(penalty="l1", C=c, dual=False)  # SVC with L1 regularization

fmri_masked.shape

# Step 1: Standardization (z-score standardization across samples)
# Masked fMRI data is already provided in `fmri_masked`, assumed to be of shape (n_samples, n_features)
X = fmri_masked  # The masked fMRI data (n_samples, n_features)
y = final_df['0_y'].values  # Classification labels (n_samples,)

# Step 2: Define the classifier (L1-penalized SVM)
svc = LinearSVC(penalty="l1", C=0.1, dual=False)

# Step 3: Define the cross-validation strategy (Leave-One-Out Cross-Validation)
cv = LeaveOneOut()  # This aligns with the original step

# Step 4: Manual cross-validation loop
# Initialize an array to hold results
accuracy_scores = []

# Step 5: Leave-One-Out Cross-Validation
for train_idx, test_idx in cv.split(X):
    # Standardize within the fold (on training data only)
    scaler = StandardScaler()
    
    # Extract training and test sets using the indices from the cross-validation split
    X_train = scaler.fit_transform(X[train_idx])  # Fit the scaler on training data
    X_test = scaler.transform(X[test_idx])        # Transform the test data using the same scaler

    y_train = y[train_idx]  # Extract the corresponding labels for training
    y_test = y[test_idx]    # Extract the corresponding labels for test
    
    # Step 6: Train the model on training data
    svc.fit(X_train, y_train)

    # Step 7: Predict the test set and evaluate accuracy
    y_pred = svc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    
    # Append the accuracy score to the list
    accuracy_scores.append(accuracy)

# Step 8: Calculate the mean accuracy across all folds
mean_accuracy = np.mean(accuracy_scores)

# Output the results
print(f"Mean accuracy using Leave-One-Out CV: {mean_accuracy:.3f}")

bthirion · September 21, 2024, 5:43pm

This is tricky indeed. Have you seen [DOC] Explaining how Decoder works by man-shu · Pull Request #4437 · nilearn/nilearn · GitHub ?
The added script does exactly what you wish.
Best,
Bertrand

NordicNarwhal · September 24, 2024, 4:54am

I still get wildly different results in Nilearn vs Sklearn.

Nilearn:
Decoder results with standardize=zscore_sample
Mean decoding accuracy: 0.705
Chance level: 0.5

Decoder results with standardize=False
Mean decoding accuracy: 0.675
Chance level: 0.5

Sklearn:
Mean score in mask with standardize=zscore_sample: 0.49
Mean score in mask: 0.515

NordicNarwhal · September 24, 2024, 5:11am

This wouldn’t be such a big deal but Nilearn’s Decoder also gives very high permutation test scores (aka when I permute the labels I get pretty much the same decoding accuracy. Seems wrong, but it’s impossible to debug because I can’t recreate the pipeline outside of Nilearn)

bthirion · September 24, 2024, 5:55am

Thx for the feedback.
Can you share the script you used to obtain this result ? This will help us better figure out what’s wrong.
Best,
Bertrand

NordicNarwhal · September 24, 2024, 11:45am


from sklearn.linear_model import LogisticRegressionCV
from nilearn import datasets
import numpy as np
import pandas as pd
from nilearn.image import index_img
from sklearn.model_selection import LeaveOneGroupOut
from nilearn.decoding import Decoder
from nilearn.maskers import NiftiMasker
from sklearn.preprocessing import LabelBinarizer
import itertools
from sklearn.metrics import roc_auc_score, accuracy_score
from tqdm import tqdm

# haxby_dataset = datasets.fetch_haxby()
# fmri_filename = haxby_dataset.func[0]
# mask_filename = haxby_dataset.mask_vt[0]

###############################################################################

# Load behavioral information
# behavioral = pd.read_csv(haxby_dataset.session_target[0], delimiter=" ")
conditions = final_df["0_y"]
runs = final_df["run_value"]
# condition_mask = conditions.isin(
#     ["bottle", "cat", "chair", "face", "house", "scissors", "shoe"]
# )

# Apply mask to chunks, runs and func data
fmri_niimgs = final_df['0_x']


# print len and shape
print(len(conditions))
print(fmri_niimgs.shape[-1])
print(len(runs))
svc = LinearSVC(penalty="l1", C=.1, dual=False)  # dual=False when using 'l1' penalty with LinearSVC

####### NILEARN DECODING ####################################
Xs_nilearn = []
decoders_nilearn = []
cv = LeaveOneOut()
for standardize_true_false in ["zscore_sample", False]:
    # explicitly avoid screening even though providing a small mask has the same effect
    screening_percentile = 100
    decoder_nilearn = Decoder(
        estimator=svc,
        mask=ffa_mask,
        cv=cv,
        scoring="accuracy",
        standardize=standardize_true_false,
        n_jobs=None,  # parallelizing
        screening_percentile=screening_percentile,
        verbose=11,
    )
    decoder_nilearn.fit(fmri_niimgs, conditions, groups=runs)
    print("\n\n**************************")
    print(f"Nilearn decoding standardize={standardize_true_false}")
    print(
        "Mean decoding AU-ROC:",
        np.mean(list(decoder_nilearn.cv_scores_.values())),
    )
    print("**************************\n\n")

    Xs_nilearn.append(
        (
            decoder_nilearn.masker_.fit_transform(fmri_niimgs),
            standardize_true_false,
        )
    )
    decoders_nilearn.append(decoder_nilearn)

####### NILEARN MASKING WITH/WITHOUT STANDARDIZATION AND SKLEARN DECODING ####################################



# returns True for all splits


from sklearn.model_selection import cross_validate
Xs_sklearn = []
decoders_sklearn = []
for standardize_true_false in ["zscore_sample", False]:
    # mask data before decoding
    # using the standardize parameter (on/off) to match nilearn's behavior
    masker = NiftiMasker(
        mask_img=ffa_mask, standardize=standardize_true_false
    )
    X = masker.fit_transform(fmri_niimgs)
    y = np.array(conditions)
    groups = np.array(runs)

    logo = LeaveOneOut()

    # using one-vs-rest to closely match nilearn's behavior


    # Not using StandardScaler because it standaridizes across features
    # nilearn's zscore_sample standardizes across samples
    # scaler = StandardScaler()
    # if standardize_true_false:
    #     pipeline = Pipeline([("scaler", scaler), ("clf", clf)])
    # else:
    #     pipeline = Pipeline([("clf", clf)])

    decoder_sklearn = cross_validate(
        svc,
        X=X,
        y=y,
        groups=groups,
        cv=logo,
        n_jobs=12,
        scoring="accuracy",
        return_indices=True,
        return_estimator=True,
    )

    print(
        f"Mean score in mask {ffa_mask} with standardize={standardize_true_false}:",
        np.mean(decoder_sklearn["test_score"]),
    )

    X_to_append = X.copy()
    Xs_sklearn.append((X_to_append, standardize_true_false))
    decoders_sklearn.append(decoder_sklearn)


# check if the two standardization (or lack of them) gives similar results
for (X_nilearn, _), (X_sklearn, _) in zip(Xs_nilearn, Xs_sklearn):
    print(np.allclose(X_nilearn, X_sklearn))
    print(np.array_equal(X_nilearn, X_sklearn))
# returns True
#         True


# check cv fold indices
for decoder_nilearn, decoder_sklearn in zip(
    decoders_nilearn, decoders_sklearn
):
    for i in range(11):
        print("Split", i + 1)
        print(
            "Test indices equal?",
            np.array_equal(
                decoder_nilearn.cv_[i][1],
                decoder_sklearn["indices"]["test"][i],
            ),
        )
        print(
            "Train indices equal?",
            np.array_equal(
                decoder_nilearn.cv_[i][0],
                decoder_sklearn["indices"]["train"][i],
            ),
        )
# returns True for all splits

NordicNarwhal · September 27, 2024, 3:54am

Any help on this? It is a bit frustrating that the decoder can’t be replicated using sklearn.

bthirion · September 27, 2024, 8:08am

I can’t reproduce you script unless you share final_df.
Best,
Bertrand

NordicNarwhal · September 27, 2024, 8:55pm

Is there a place I could send that?

Also if it possible the issue has something to do with the high number of cross vals I am making?

bthirion · September 29, 2024, 9:15pm

You can send to me privately: bertrand.thirion@inria.fr
Best,
Bertrand

NordicNarwhal · October 1, 2024, 1:42am

Just sent to you! Thank you so much

bthirion · October 24, 2024, 7:54pm

See [BUG] `plot_haxby_understand_decoder.py` does not show that nilearn and sklearn coincincde when using penalty='l1' · Issue #4695 · nilearn/nilearn · GitHub
Best,
Bertrand

Remi-Gau · February 3, 2025, 9:58am

Note that this should have been fixed by this PR

github.com/nilearn/nilearn

[FIX] Do not set `score=0` if all `decoder.coef_ ==0`

nilearn:main ← man-shu:fix/decoder_baseline_score

opened 12:45PM - 27 Jan 25 UTC

man-shu

+186 -38

- Closes #4695  Changes proposed in this pull request: - [x] Do not set score=0 if all d ecoder.coef_ ==0 - [ ] throw deprecation message?? - [x] write test that does the comparison between nilearn and sklearn? - [x] another test to compare the regressors? - [x] don't expose adjust_screening_percentile in doc example