I want to apply feature selection using a binary mask. The mask is built from a template file derived from a prior meta-analysis: I apply a threshold so that the ones and zeros in the mask correspond to z-values in the template that lie above or below this threshold.
In the next step I want to use this mask to ‘cut out’ features from the data set and pass this ‘feature-selected subset’ on to the subsequent pipeline steps. Finally, I want to run an estimator such as an SVM to fit a model to the data and predict the outcome.
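To make the masking step concrete, here is a minimal sketch with NumPy. The array shapes, the random data, and the names `template`, `threshold`, and `X_selected` are placeholders I made up for illustration; the real template would be loaded from the meta-analysis file.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 20 samples with 100 features each, and a template
# holding one z-value per feature (assumed to align with the columns of X).
X = rng.normal(size=(20, 100))
template = rng.normal(size=100)

# The threshold is the value I would like to treat as a hyperparameter.
threshold = 1.0

# Binary mask: 1 where the template z-value is above the threshold, else 0.
mask = (template > threshold).astype(int)

# 'Cut out' the selected features: keep only the columns where the mask is 1.
X_selected = X[:, mask == 1]

print(X.shape, "->", X_selected.shape)
```

The number of columns that survive depends on the threshold, which is why I would like to tune it together with the rest of the pipeline.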
Both the mask-building procedure and the subsequent pipeline steps have parameters that can be treated as hyperparameters and could therefore be optimized using nested cross-validation. For example, one could vary the threshold value mentioned above.
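For the ordinary pipeline steps, nested cross-validation can be sketched roughly as follows. This is only an illustration on synthetic data: the `StandardScaler` step, the linear kernel, and the grid over `C` are assumptions of mine, and the mask threshold is deliberately left out because it depends on the template.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder data standing in for the real feature set and labels.
X, y = make_classification(n_samples=60, n_features=20, random_state=0)

# A pipeline of standard steps; the mask step from above is not part of it.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC(kernel="linear")),
])

# Hyperparameters of pipeline steps are addressed as <step name>__<parameter>.
param_grid = {"svm__C": [0.1, 1.0, 10.0]}

# Inner loop: hyperparameter search. Outer loop: performance estimate.
inner_search = GridSearchCV(pipe, param_grid, cv=3)
outer_scores = cross_val_score(inner_search, X, y, cv=5)

print(outer_scores.mean())
```

What I am missing is a way to add the threshold to `param_grid` in the same manner.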
The problem: scikit-learn’s built-in functions such as GridSearchCV only accept two arguments (a feature set X and a label list y), but I want a model-building procedure that accepts three arguments, namely X, y and the template file.
How can I (or rather, is it possible to) implement the optimization of both the mask-building procedure and the subsequent pipeline steps in one pipeline? In other words: does scikit-learn contain built-in options for building a pipeline that takes more than just the feature set X and the label list y?