MVPA-ROI + Permutation Testing

Hi all, I have a perfectly balanced data set: 50% condition 1 and 50% condition 2.

I am using an L1-penalized linear SVM with leave-one-out cross-validation as my model. I am using an FFA mask which has 223 voxels, and I have 4000 observations.

My accuracy is ~67.3%, which is great; but when I shuffle the y values for permutation testing, I get permutation-test accuracies that are very close to 67%.

I'm a bit confused about how this is happening, given that 223 voxels for 4000 observations seems unlikely to overfit to this degree.

Model code below; is there something I'm doing wrong?


import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import LinearSVC
from nilearn.decoding import Decoder

n_samples = len(df)  # 4000 observations

c = 0.01
# Group labels: the run each sample belongs to
# (note: LeaveOneOut ignores the groups argument when splitting)
groups = df['run_value']
cv = LeaveOneOut()
svc = LinearSVC(penalty="l1", C=c, dual=False)  # dual=False is required with the 'l1' penalty in LinearSVC

decoder = Decoder(
    estimator=svc,
    mask=ffa_mask,
    standardize="zscore_sample",
    cv=cv,
    scoring="accuracy",
)

# Fit the decoder
decoder.fit(df[0].values, df['0_y'].values, groups=groups)

# Output the results
print(f"C={c}")
# Accessing the cross-validation scores for each class
cv_scores_class_0 = decoder.cv_scores_[0]
cv_scores_class_1 = decoder.cv_scores_[1]

# Calculating the mean cross-validation accuracy for both classes
mean_score_class_0 = np.mean(cv_scores_class_0)
mean_score_class_1 = np.mean(cv_scores_class_1)
mean_score = np.mean([mean_score_class_0, mean_score_class_1])
print(f"Mean CV score for class 0: {mean_score_class_0}")
print(f"Mean CV score for class 1: {mean_score_class_1}")
print(f"Mean CV score: {mean_score}")

I simulated random data with the same 223 x 4000 structure and got an accuracy of ~50%, as expected; and yet when I permute my actual data I get accuracies around 67%, or occasionally even higher. How can that be?
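
For reference, the random-data check was along these lines (a minimal sketch, not my exact simulation code; the shapes, C value, and CV scheme mirror my real setup):

import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_samples, n_voxels = 4000, 223

# Random features and perfectly balanced random labels, mimicking 223 FFA voxels x 4000 observations
X_sim = rng.standard_normal((n_samples, n_voxels))
y_sim = rng.permutation(np.repeat([0, 1], n_samples // 2))

svc_sim = LinearSVC(penalty="l1", C=0.01, dual=False)
# Leave-one-out CV as in the real analysis (slow at 4000 samples; shrink n_samples for a quick check)
sim_scores = cross_val_score(svc_sim, X_sim, y_sim, cv=LeaveOneOut(), scoring="accuracy")
print(f"Simulated-data accuracy: {sim_scores.mean():.3f}")  # comes out near 0.50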

My understanding is that multicollinearity would not affect model performance, so I'm confused about why there is such a large difference between my simulated-data accuracy and my permuted-data accuracy.