I was wondering if anyone could give me some insight into the difference between fit and fit_transform for ConnectivityMeasure.
When fitting an SVM as in the example, the connectivity matrices have to be computed each time the SVM is run. If I go through all of my subjects' files, compute the connectivity matrices with fit_transform and store them with Python's pickle, the results are drastically different from when I use fit_transform on the training set and transform on the test set. https://nilearn.github.io/auto_examples/03_connectivity/plot_group_level_connectivity.html#sphx-glr-auto-examples-03-connectivity-plot-group-level-connectivity-py
Does anyone know of a way of storing these so that they do not need to be recalculated?
Some connectivity measures depend on the whole sample, so fitting on subjects separately and fitting on the whole group won't give the same result (in particular for the tangent method).
You probably want to pass all of the data to the fit_transform method.
An alternative is to fit only on a portion of the data (using .fit()), and then transform all other subjects (using .transform()) based on the fitted model.
Does that make sense?
Thank you for your reply.
Based on your answer, I used fit_transform on all of the data for the correlation and covariance measures. Since tangent is a group-level measure, I had to use fit_transform on the training set and only transform on the test set.
For any future readers trying to store their matrices for machine learning, I would recommend pickling the output of your masker and then recalculating the matrices for each run of your classifier. The actual calculation of the matrices is fast compared to the masking process.
For result caching, use joblib (the masker has a memory argument).
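With a real masker this would look like `NiftiMasker(memory="nilearn_cache", memory_level=1)`. The sketch below shows the same joblib mechanism on its own, assuming a hypothetical `extract_timeseries` function in place of the slow masking step and a temporary directory as the cache location:

```python
import tempfile

import numpy as np
from joblib import Memory

# On-disk cache; in practice point this at a persistent directory so results
# survive between Python sessions.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

calls = {"n": 0}  # counts how often the function body actually runs


@memory.cache
def extract_timeseries(subject_id, n_timepoints=100, n_regions=10):
    """Hypothetical stand-in for the slow masking step."""
    calls["n"] += 1
    rng = np.random.default_rng(subject_id)
    return rng.standard_normal((n_timepoints, n_regions))


first = extract_timeseries(3)   # computed and written to the cache
second = extract_timeseries(3)  # same arguments: loaded from the cache
print(calls["n"], np.array_equal(first, second))
```

The second call should not run the function body at all, so only the first extraction pays the masking cost.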