I’m having trouble setting up cross-validation for a surface-based searchlight SVM (search_light) in nilearn. In each fold of a leave-one-run-out CV, I’d like to train on one set of data (A1 vs B1) and then test on a different set of data (A2 vs B2) from the held-out run. Can anyone offer advice, or point me to an example from nilearn/sklearn that works with search_light?
I’ve tried writing my own CV (see below). It works when I run it once, but if I call it a second time (e.g. in a loop, or when the searchlight moves to the next sphere) I get a ‘list index out of range’ error:
```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score

group = np.array([1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,8,8,8,8,9,9,9,9,10,10,10,10])
tmpX = np.random.sample((40,100))
y = [1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2]

class CustomCrossValidation:

    @classmethod
    def split(cls,
              X: np.ndarray = None,
              y: np.ndarray = None,
              groups: np.ndarray = None):
        assert len(X) == len(groups), (
            "Length of the predictors is not "
            "matching with the groups.")
        for group_idx in range(groups.min(), groups.max()+1):
            if group_idx <= groups.max()/2:
                # train on the first half (groups 1-5) minus the held-out run;
                # test on the matching run in the second half (group_idx + 5)
                training_indices = np.where(
                    groups[groups<=groups.max()/2] != group_idx)[0]
                test_indices = np.where(groups == group_idx)[0] + np.floor_divide(X.shape[0],2)
            else:
                # these training indices come out negative (-20..-1);
                # NumPy indexing wraps them round to the second half (20..39)
                training_indices = np.where(
                    groups[groups>groups.max()/2] != group_idx)[0] - np.floor_divide(X.shape[0],2)
                test_indices = np.where(groups == group_idx)[0]
            if len(test_indices) > 0:
                yield training_indices, test_indices

## this cv gives the correct train/test splits
for train_index, test_index in CustomCrossValidation.split(tmpX, y, group):
    print("TRAIN:", train_index, "TEST:", test_index)

## example using cross_val_score fails, as does search_light, with the same error
cv = CustomCrossValidation.split(tmpX, y, group)
acc_parcel = np.zeros(2)
for i in range(2):
    clfa = svm.SVC(kernel='linear', C=1)
    scores = cross_val_score(clfa, tmpX, y, cv=cv)
    acc_parcel[i] = scores.mean()
```
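A likely explanation, assuming a reasonably recent scikit-learn: `split` contains `yield`, so each call returns a one-shot generator. When `cross_val_score` receives a `cv` without a `split` method, it passes it through `sklearn.model_selection.check_cv`, which copies the iterable into a list and thereby exhausts the generator. The sketch below uses a simplified, hypothetical `leave_one_run_out` stand-in (not the class above) to show that a second pass sees no splits at all; an empty set of fold results is plausibly what ends in the ‘list index out of range’ error.

```python
import numpy as np
from sklearn.model_selection import check_cv


def leave_one_run_out(X, y=None, groups=None):
    # Hypothetical stand-in for CustomCrossValidation.split: any function
    # containing `yield` returns a one-shot generator object.
    for run in np.unique(groups):
        yield np.where(groups != run)[0], np.where(groups == run)[0]


group = np.repeat(np.arange(1, 11), 4)
X = np.random.sample((40, 100))

cv = leave_one_run_out(X, None, group)

# check_cv is how cross_val_score interprets `cv`; an iterable without a
# `split` method is copied into a list, which consumes the generator.
n_first = check_cv(cv).get_n_splits()
n_second = check_cv(cv).get_n_splits()
print(n_first, n_second)  # 10 0
```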
EDIT/UPDATE
(NB: I fixed a couple of typos.) If I do the following in a loop, I don’t get the ‘list index out of range’ error:
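```python
cv = CustomCrossValidation()
acc_parcel = np.zeros(2)
for i in range(2):
    clfa = svm.SVC(kernel='linear', C=1)
    # cv.split(...) is called again here, so each iteration gets a fresh generator
    scores = cross_val_score(clfa, tmpX, y, cv=cv.split(tmpX, y, group), groups=group)
    acc_parcel[i] = scores.mean()
```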
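One way to avoid relying on a fresh generator each time may be to pass a splitter *object* instead, i.e. anything with `split` and `get_n_splits` methods. `check_cv` passes such objects through unchanged, and `cross_val_score` then calls `split` itself on every evaluation. The class below is a hypothetical sketch (`CrossSetLeaveOneRunOut` is not part of sklearn or nilearn) that implements only the A1/B1 to A2/B2 direction, assuming the same layout as the data above: runs 1 to 5 hold set 1, runs 6 to 10 hold set 2, and run r is paired with run r + 5.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score


class CrossSetLeaveOneRunOut:
    """Hypothetical splitter (not part of sklearn or nilearn).

    Trains on set 1 (A1 vs B1) from every run except one and tests on
    set 2 (A2 vs B2) from the paired held-out run. Assumes `groups` holds
    runs 1..n for set 1 followed by runs n+1..2n for set 2.
    """

    def split(self, X, y=None, groups=None):
        groups = np.asarray(groups)
        n_runs = int(groups.max()) // 2
        for run in range(1, n_runs + 1):
            # Set-1 samples from all runs except the held-out one.
            train = np.where((groups <= n_runs) & (groups != run))[0]
            # Set-2 samples from the run paired with the held-out one.
            test = np.where(groups == run + n_runs)[0]
            yield train, test

    def get_n_splits(self, X=None, y=None, groups=None):
        if groups is None:
            raise ValueError("groups is required to count the splits")
        return int(np.max(groups)) // 2


rng = np.random.default_rng(0)
group = np.repeat(np.arange(1, 11), 4)
y = np.tile([1, 1, 2, 2], 10)
X = rng.random((40, 100))

cv = CrossSetLeaveOneRunOut()
acc_parcel = np.zeros(2)
for i in range(2):
    clfa = svm.SVC(kernel="linear", C=1)
    # cross_val_score calls cv.split(X, y, groups) itself, so every
    # iteration gets a fresh generator from the same object.
    scores = cross_val_score(clfa, X, y, cv=cv, groups=group)
    acc_parcel[i] = scores.mean()
print(len(scores), acc_parcel.shape)  # 5 (2,)
```

In search_light the same object would presumably be passed as `cv=CrossSetLeaveOneRunOut(), groups=group`, assuming nilearn forwards `groups` to `cross_val_score` for each sphere.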
but it still doesn’t work with search_light:
```python
scores = search_light(X, y, estimator, adjacency, cv=cv.split(tmpX, y, group), groups=group, n_jobs=1)
```
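For search_light, as far as I can tell nilearn calls `cross_val_score` once per sphere with the same `cv` argument, so a generator would be used up by the first sphere. A minimal sketch of a workaround is to materialize the splits once as a list of `(train, test)` index pairs, which can be iterated any number of times. The code below imitates the per-sphere loop with plain sklearn, using hypothetical blocks of adjacent columns as "spheres" and a hypothetical `cross_set_split` generator; with nilearn the analogous call would presumably be `search_light(X, y, estimator, adjacency, cv=cv_splits, n_jobs=1)`.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score


def cross_set_split(X, y=None, groups=None):
    # Hypothetical generator: train on set-1 runs except the held-out one,
    # test on the paired set-2 run (assumes runs 1..n then n+1..2n).
    groups = np.asarray(groups)
    n_runs = int(groups.max()) // 2
    for run in range(1, n_runs + 1):
        train = np.where((groups <= n_runs) & (groups != run))[0]
        test = np.where(groups == run + n_runs)[0]
        yield train, test


rng = np.random.default_rng(0)
group = np.repeat(np.arange(1, 11), 4)
y = np.tile([1, 1, 2, 2], 10)
X = rng.random((40, 100))

# Materialize the splits once: unlike the generator, a list can be
# iterated again for every sphere.
cv_splits = list(cross_set_split(X, y, group))

# Rough imitation of a searchlight: one cross_val_score call per "sphere"
# (here, hypothetical blocks of 5 adjacent columns), all reusing cv_splits.
spheres = [np.arange(start, start + 5) for start in range(0, 100, 5)]
sphere_scores = np.array([
    cross_val_score(svm.SVC(kernel="linear", C=1), X[:, cols], y, cv=cv_splits).mean()
    for cols in spheres
])
print(len(cv_splits), sphere_scores.shape)  # 5 (20,)
```

A list of index arrays is also picklable, unlike a generator, which presumably matters once `n_jobs` is greater than 1.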