Canonical Correlation Analysis and New Data

Paul_Dhami · April 4, 2022, 9:27pm

Greetings Neurostars!

For some context, I am trying to follow the methods outlined in this paper (https://www.nature.com/articles/nm.4246?TB_iframe=true&width=921.6&height=921.6).

Regarding my data, I have 2 datasets: a training dataset and a test dataset.

In my training dataset, I have 100 samples (people). Each person has clinical data (15 values) and biological data (4950 connectome values). I performed CCA on these 2 matrices, and then used the 15 canonical variates with ridge regression in a nested cross-validation framework. I get a performance I am happy with and would now like to test this on an external test dataset.

My question is: how can I get my test dataset into the same canonical variate space as the training dataset? Can the canonical coefficients be used in some way? According to the paper, they did the following: “The 133 patients (n = 333 – 220 = 133) left out of the cluster-discovery set were assigned to one of the four clusters in a two-step process. First, the canonical coefficients estimated in the cluster-discovery set were used to calculate canonical variate (component) scores for the left-out subjects.” So it seems doable, but I am not sure how to do it. Any insight would be greatly appreciated!

Thank you.