Decoding from beta maps (Decoder with LeaveOneGroupOut)

Hi everyone,

I am currently working on a decoding script and trying to see whether I can improve my decoding performance by training on betas from a first-level GLM rather than on raw data.

My approach is based on this tutorial section:
https://nilearn.github.io/dev/auto_examples/02_decoding/plot_haxby_glm_decoding.html#instantiate-and-run-firstlevelmodel

Here is the decoder definition:

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from nilearn.decoding import Decoder

decoder = Decoder(
    estimator='logistic_l2',
    # mask=masks_scan1[mask],  # a mask for various ROIs based on fMRIPrep
    # screening_percentile=5,
    cv=LeaveOneGroupOut(),
    scoring=scoring,
    standardize=standardize,
    n_jobs=n_jobs,  # -1 means all CPUs
)

decoder.fit(stat_maps_per_run_manually_masked, conditions_per_run, groups=runs_per_run)
classification_accuracy = np.mean(list(decoder.cv_scores_.values()))
chance_level = 1.0 / len(np.unique(conditions_per_run))

stat_maps_per_run_manually_masked:
A list of 102 beta maps: one map per stimulus per run (6 runs × 17 stimuli).

conditions_per_run:
A list of 102 strings giving the label names of the beta maps.

runs_per_run:
A list of 102 entries with the run information: 17 ones, 17 twos, and so on.
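
For reference, the beta maps come from a per-run first-level GLM along the lines of the tutorial linked above. A minimal sketch of how the three lists are built (fmri_imgs_per_run, events_per_run, and the t_r value are placeholders, not my exact code, and I am assuming the trial_type names map directly to design-matrix columns):

from nilearn.glm.first_level import FirstLevelModel

stat_maps_per_run_manually_masked = []
conditions_per_run = []
runs_per_run = []

for run_idx, (fmri_img, events) in enumerate(zip(fmri_imgs_per_run, events_per_run), start=1):
    glm = FirstLevelModel(t_r=2.0)  # t_r is a placeholder value
    glm.fit(fmri_img, events=events)
    for condition in events["trial_type"].unique():
        # one beta (effect-size) map per stimulus per run
        stat_maps_per_run_manually_masked.append(
            glm.compute_contrast(condition, output_type="effect_size")
        )
        conditions_per_run.append(condition)
        runs_per_run.append(run_idx)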

I get surprisingly high decoding results, which makes me wonder whether there is information leakage in the cross-validation or something similar.
Furthermore, my results differ considerably depending on whether standardization is set to True or False.

As a sanity check, I implemented the cross-validation procedure manually. There I get “normal” decoding results, with good decoding from visual cortex (e.g., ~58% correct against a ~6% chance level).

Thank you for any input!

I don’t see anything spurious in what you do: the cross-validation is indeed done in a leave-one-run-out fashion.
I would guess that standardize should be set to True to enhance accuracy.
When you compare with your manual cross-validation, do you also use the Decoder object with consistent options?
Best,
Bertrand

Dear Bertrand,

Thanks a lot for your reply.
Interestingly (and a little puzzlingly), depending on whether standardize is True or False, I either get very high decoding results (standardize = False) or decode at chance level (standardize = True).
When standardize is set to False, some aspects remain plausible: for example, decoding in visual cortex works much better than in areas such as the hippocampus. However, the overall results seem inflated.

Re manual cross validation:

No, I did not use the Decoder object, but I tried to keep everything similar. Here is my manual cross-validation code:

############################
# manually set up train and test data cross val
############################
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.preprocessing import StandardScaler

def extract_features_from_nifti_images(nifti_images):
    """
    Extract features from a list of Nifti1Image objects.
    Each image's data is flattened into a 1D array to serve as a feature vector;
    NaN voxels (outside the manual mask) are dropped.
    """
    features = []
    for img in nifti_images:
        data = img.get_fdata()
        flattened_data = data.reshape(-1)
        flattened_data = flattened_data[~np.isnan(flattened_data)]
        features.append(flattened_data)
    return np.vstack(features)

X = extract_features_from_nifti_images(stat_maps_per_run_manually_masked)
y = np.array(conditions_per_run)
groups = np.array(runs_per_run)

logo = LeaveOneGroupOut()
clf = LogisticRegression(solver='liblinear', penalty='l2')
scaler = StandardScaler()
scores = []

for train_index, test_index in logo.split(X, y, groups):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # fit the scaler on the training folds only, apply it to the held-out run
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    clf.fit(X_train_scaled, y_train)

    score = clf.score(X_test_scaled, y_test)
    scores.append(score)

print("Cross-validation scores:", scores)
print(f"Mean score in mask {mask}:", np.mean(scores))

############################
# END manually set up train and test data cross val
############################

Re standardization:

In my manual implementation I fit the scaler on the training set and apply the same transform to the test set. However, when trying to relate this to the Decoder object, I was not sure how it standardizes the data: all data together, per run, or train and test separately? All in all, I do not think this explains the differences described above.
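
If it helps for the comparison, here is a rough probe I considered (a sketch only; my_mask_img is a placeholder for the ROI mask, and I am assuming the Decoder standardizes via its internal masker):

import numpy as np
from sklearn.preprocessing import StandardScaler
from nilearn.maskers import NiftiMasker

# masked data without any standardization
X_raw = NiftiMasker(mask_img=my_mask_img, standardize=False).fit_transform(
    stat_maps_per_run_manually_masked
)
# masker-level standardization, which I assume is what the Decoder applies
X_std = NiftiMasker(mask_img=my_mask_img, standardize=True).fit_transform(
    stat_maps_per_run_manually_masked
)
# StandardScaler fitted on *all* samples at once
X_scaled_all = StandardScaler().fit_transform(X_raw)

# If True, the masker standardizes all samples together,
# i.e. not per run and not separately for train/test
print(np.allclose(X_std, X_scaled_all))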

Thanks a lot for taking the time to help!

Best wishes
Fabian

The doc says: “If standardize is True, the data are centered and normed: their mean is put to 0 and their variance is put to 1 in the time dimension.”
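In array terms, that means (a minimal numpy sketch, with X of shape (n_timepoints, n_voxels)):

import numpy as np

# per-voxel mean 0 and variance 1 across the time dimension
X_std = (X - X.mean(axis=0)) / X.std(axis=0)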
I think that you reproduce well what is done in the Decoder.
I find it a bit problematic that you get such high values with decoding. If you run your code on the nilearn data, do you reproduce the same behavior?
@man-shu can you try and reproduce the decoding pipeline on Nilearn data ?
Best,
Bertrand


Dear Bertrand,

I tried to reproduce the behavior on the Haxby dataset that ships with nilearn. However, I could not reproduce the same behavior (leaving me with the question of where I go wrong in my own analyses).

My code:

from nilearn import datasets
from nilearn import plotting
from nilearn.image import mean_img
import numpy as np
import pandas as pd
from nilearn.image import index_img
from sklearn.model_selection import LeaveOneGroupOut
from nilearn.decoding import Decoder

haxby_dataset = datasets.fetch_haxby()
fmri_filename = haxby_dataset.func[0]
mask_filename = haxby_dataset.mask_vt[0]

###############################################################################

# Load behavioral information
behavioral = pd.read_csv(haxby_dataset.session_target[0], delimiter=" ")
conditions = behavioral["labels"]
runs = behavioral["chunks"]
condition_mask = conditions.isin(['bottle', 'cat', 'chair', 'face', 'house', 'scissors', 'shoe'])

# Apply mask to chunks, runs and func data
fmri_niimgs = index_img(fmri_filename, condition_mask)
conditions = conditions[condition_mask]
runs = runs[condition_mask]

# print len and shape
print(len(conditions))
print(fmri_niimgs.shape[-1])
print(len(runs))

###############################################################################
cv = LeaveOneGroupOut()
for standardize_true_false in [True, False]:
    decoder = Decoder(
        estimator='logistic_l2',
        mask=mask_filename,
        cv=cv,
        scoring='accuracy',
        standardize=standardize_true_false,
    )
    decoder.fit(fmri_niimgs, conditions, groups=runs)
    print(f'Decoder results with standardize={standardize_true_false}')
    print("Mean decoding accuracy:", np.mean(list(decoder.cv_scores_.values())))
    print("Chance level:", 1.0 / len(np.unique(conditions)))

Output of the code:

UserWarning: The provided image has no sform in its header. Please check the provided file. Results may not be as expected.
warnings.warn(
Decoder results with standardize=True
Mean decoding accuracy: 0.6645880574452003
Chance level: 0.14285714285714285

UserWarning: The provided image has no sform in its header. Please check the provided file. Results may not be as expected.
warnings.warn(
Decoder results with standardize=False
Mean decoding accuracy: 0.878684807256236
Chance level: 0.14285714285714285

Well, data are not born equal, and something that works on one dataset may fail on another.
On our side, we need to check that using the Decoder is indeed equivalent to using sklearn’s methods.
Best,
Bertrand

Dear Bertrand,

I applied the manual cross-validation to the Haxby dataset and get results different from those of the Decoder object. Standardization had only a small impact on the decoding:
With standardization: score = 0.2248677248
Without standardization: score = 0.210317460

It would be great if you could post here if you have an intuition about where the differences emerge.
Thanks a lot for taking the time to help!

Code

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from tqdm import tqdm

# Bring the data into the same format --> a list of single 3D volumes
fmri_data = fmri_niimgs.get_fdata()
stat_maps_per_run_manually_masked = [fmri_data[..., i] for i in range(fmri_data.shape[-1])]

def extract_features_from_nifti_images(nifti_images):
    """
    Extract features from a list of single volumes.
    Each volume's data is flattened into a 1D array to serve as a feature vector.
    """
    features = []
    for img in nifti_images:
        data = img  # already a numpy array here; use img.get_fdata() for Nifti1Image objects
        flattened_data = data.reshape(-1)
        flattened_data = flattened_data[~np.isnan(flattened_data)]
        features.append(flattened_data)
    return np.vstack(features)

for standardize_true_false in [True, False]:

    X = extract_features_from_nifti_images(stat_maps_per_run_manually_masked)
    y = np.array(conditions)
    groups = np.array(runs)

    logo = LeaveOneGroupOut()
    clf = LogisticRegression(solver='liblinear', penalty='l2')
    scaler = StandardScaler()
    scores = []

    for train_index, test_index in tqdm(logo.split(X, y, groups)):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        if standardize_true_false:
            X_train_scaled = scaler.fit_transform(X_train)
            X_test_scaled = scaler.transform(X_test)
            clf.fit(X_train_scaled, y_train)
            score = clf.score(X_test_scaled, y_test)
        else:
            clf.fit(X_train, y_train)
            score = clf.score(X_test, y_test)

        scores.append(score)

    print("Cross-validation scores:", scores)
    print(f"Mean score in mask {mask_filename} with standardize={standardize_true_false}:", np.mean(scores))

Thx! We definitely need to look into that.
@man-shu LMK if you can.
Bertrand

Sorry, I missed the first mention. On it now!

Hi @FabiR

I found that you are not applying the haxby_dataset.mask_vt[0] mask to your fMRI data in your sklearn approach. This masking is done automatically in the nilearn approach when you provide the mask image to the Decoder class (its mask parameter).

The lack of masking in the sklearn approach is also apparent from its long run time (more voxels → more time to fit the decoder). It is still running for me: it has been going for >40 minutes even though I parallelize it across the CV splits, while the nilearn one finishes in a minute or two.

So, the large accuracy difference between nilearn and sklearn (0.6 vs. 0.2) may simply be due to that.

When I applied the mask, I got similar accuracies with standardization for both approaches:
Nilearn, with standardization: 0.6636432350718064
Sklearn, with standardization: 0.6640211640211641
But without standardization, it is quite weird:
Nilearn, without std.: 0.8784958427815571
Sklearn, without std.: 0.6031746031746033

So, the weird thing is why the nilearn Decoder gives such a high accuracy without standardization. I do not have an answer yet; still looking into it…

Here’s my code for your ref:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from nilearn import datasets
from nilearn.decoding import Decoder
from nilearn.image import index_img
from nilearn.maskers import NiftiMasker

haxby_dataset = datasets.fetch_haxby()
fmri_filename = haxby_dataset.func[0]
mask_filename = haxby_dataset.mask_vt[0]

###############################################################################

# Load behavioral information
behavioral = pd.read_csv(haxby_dataset.session_target[0], delimiter=" ")
conditions = behavioral["labels"]
runs = behavioral["chunks"]
condition_mask = conditions.isin(
    ["bottle", "cat", "chair", "face", "house", "scissors", "shoe"]
)

# Apply mask to chunks, runs and func data
fmri_niimgs = index_img(fmri_filename, condition_mask)
conditions = conditions[condition_mask]
runs = runs[condition_mask]

# print len and shape
print(len(conditions))
print(fmri_niimgs.shape[-1])
print(len(runs))

####### NILEARN DECODING ####################################
Xs_nilearn = []
cv = LeaveOneGroupOut()
for standardize_true_false in [True, False]:
    if standardize_true_false:
        standardize_true_false = "zscore_sample"
    # explicitly avoid screening even though providing a small mask has the same effect
    screening_percentile = 100
    decoder_nilearn = Decoder(
        estimator="logistic_l2",
        mask=mask_filename,
        cv=cv,
        scoring="accuracy",
        standardize=standardize_true_false,
        n_jobs=10,  # parallelizing
        screening_percentile=screening_percentile,
    )
    decoder_nilearn.fit(fmri_niimgs, conditions, groups=runs)
    print(f"Decoder results with standardize={standardize_true_false}")
    print(
        "Mean decoding accuracy:",
        np.mean(list(decoder_nilearn.cv_scores_.values())),
    )
    print("Chance level:", 1.0 / len(np.unique(conditions)))

    Xs_nilearn.append(
        (
            decoder_nilearn.masker_.fit_transform(fmri_niimgs),
            standardize_true_false,
        )
    )

####### NILEARN MASKING WITHOUT STANDARDIZATION AND SKLEARN DECODING, STANDARD SCALING ####################################

Xs_sklearn = []
for standardize_true_false in [True, False]:
    # mask data before decoding
    masker = NiftiMasker(mask_img=mask_filename, standardize=False)
    X = masker.fit_transform(fmri_niimgs)
    y = np.array(conditions)
    groups = np.array(runs)

    logo = LeaveOneGroupOut()
    clf = LogisticRegression(solver="liblinear", penalty="l2")
    scaler = StandardScaler()

    if standardize_true_false:
        pipeline = Pipeline([("scaler", scaler), ("clf", clf)])
    else:
        pipeline = Pipeline([("clf", clf)])

    decoder_sklearn = cross_validate(
        pipeline,
        X=X,
        y=y,
        groups=groups,
        cv=logo,
        n_jobs=8,
        scoring="accuracy",
    )

    print(
        f"Mean score in mask {mask_filename} with standardize={standardize_true_false}:",
        np.mean(decoder_sklearn["test_score"]),
    )

    if standardize_true_false:
        X_to_append = scaler.fit_transform(X)
    else:
        X_to_append = X.copy()
    Xs_sklearn.append((X_to_append, standardize_true_false))


# check whether the two standardizations (or lack thereof) give similar results
for (X_nilearn, _), (X_sklearn, _) in zip(Xs_nilearn, Xs_sklearn):
    print(np.allclose(X_nilearn, X_sklearn))
# returns False
#         True


Hi,

sorry about missing the mask (and the confusion that caused)!
Thanks a lot for taking the time to look into it!

Best wishes,
Fabian

Hi @FabiR ,

I am still not done with my investigation, but I opened an issue about this on nilearn’s GitHub with a few more observations. We can post updates about our findings there from now on. Here’s the issue: [BUG] Discrepancy between decoding accuracies from nilearn's `Decoder` and pure sklearn implementation · Issue #4344 · nilearn/nilearn · GitHub

Himanshu
