Passing 4D Niimg-like object to scikit-learn pipeline leads to ValueError: Found input variables with inconsistent numbers of samples

JohannesWiesner · May 14, 2019, 12:03pm

I have a list of file paths to 3D NIfTI files. I use nilearn’s resample_img to resample the images to a target affine. resample_img returns the files as a single 4D image. I then want to pass this object and the labels to my scikit-learn pipeline, but I get the error:

ValueError: Found input variables with inconsistent numbers of samples

I know why this error happens: scikit-learn (or, more precisely, check_consistent_length()) compares the length of the first dimension of X (the length of the x-axis, which is of course the wrong axis) with the length of y (the number of samples) and throws the error because they are unequal. Is there any way to make the pipeline accept my 4D object? I guess that if I could bypass this ‘safety function’, everything would still work. My first transformer contains a NiftiMasker, which automatically transforms the 4D image into the right format (n_samples, n_features), so once the 4D file is transformed, X and y do have the right shape.
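A minimal sketch of the mismatch described above, using a plain NumPy array as a stand-in for the 4D image data. The array shape and the variable names are assumptions made for illustration, and the comparison is written out by hand rather than calling scikit-learn, so it only mirrors the idea of the length check:

```python
import numpy as np

n_samples = 5
# Stand-in for the data of a 4D image: (x, y, z, n_samples). Shape is made up.
data_4d = np.zeros((4, 6, 3, n_samples))
y = np.arange(n_samples)

# The length check looks at the first axis (x), not the last one (samples).
first_axis_len = data_4d.shape[0]
lengths_match_4d = first_axis_len == len(y)

# Splitting along the last axis gives one 3D volume per sample, so the
# length of the list equals the number of labels.
volumes = [data_4d[..., i] for i in range(data_4d.shape[-1])]
lengths_match_list = len(volumes) == len(y)

print(lengths_match_4d, lengths_match_list)
```

With real images, one option that might avoid the problem is to pass a list of 3D images instead of one 4D image, for example via nilearn.image.iter_img, since a list has one entry per sample. Whether every step of a given pipeline accepts such a list is not verified here.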

gilbertcane · June 28, 2021, 4:53pm

It sounds like the shapes of your inputs and labels are not aligned. I faced a similar problem while fitting a linear regression model. In my case, the number of rows in X was not equal to the number of rows in y. In most cases, x holds your features and y is the target you want to predict. The feature array should not be 1D, so check the shape of x and, if it is 1D, convert it to 2D:

x = x.reshape(-1, 1)
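As a small illustration of the reshape above (the values are made up): reshape returns a new array rather than modifying x in place, so the result has to be assigned.

```python
import numpy as np

# A 1D feature vector with three samples (illustrative values).
x = np.array([1.0, 2.0, 3.0])

# -1 lets NumPy infer the number of rows; 1 is the number of feature columns.
X = x.reshape(-1, 1)

print(x.shape, X.shape)
```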

Also, you are likely to run into problems if you remove rows containing nulls from X_train and y_train independently of each other. y_train probably has few or no nulls, and X_train probably has some. If you remove a row from X_train and the same row is not removed from y_train, your data will be out of sync and the two will have different lengths. Instead, you should remove nulls before you separate X and y.
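A minimal sketch of that order of operations, assuming the features and the target sit together in one NumPy array with the target in the last column (the layout and the values are made up for illustration):

```python
import numpy as np

# Combined table: one feature column followed by the target column.
data = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],  # this row has a missing feature value
    [3.0, 30.0],
])

# Drop incomplete rows first, while features and target are still together.
complete_rows = ~np.isnan(data).any(axis=1)
clean = data[complete_rows]

# Only then separate X and y, so both keep exactly the same rows.
X = clean[:, :-1]
y = clean[:, -1]

print(X.shape, y.shape)
```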