Running PCA on structural lesion data

yizhwan · November 29, 2021, 9:19am

Hi everyone,

I have 283 subjects with tumours manually delineated on their T1 scans. I have the binary masks of these tumours and the scans in the same space (SRI24).

I would like to run a PCA on the lesions to determine whether there are any underlying components of variance in the spatial locations of the lesions.

I am using Python. So far I have loaded each patient’s lesion and flattened each scan into a 1-dimensional vector (scan_data_1D), i.e. converting a 240 x 240 x 155 scan into 1 x 8928001.
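
For reference, the flattening step looks roughly like the sketch below. It assumes NumPy and uses a small synthetic array in place of a real mask (loading the image file is omitted); the toy shape and the name restored are illustrative only. It also shows that reshaping with the original shape recovers the 3D volume, which would be needed to view any component as a map in SRI24 space.

```python
import numpy as np

# Small synthetic stand-in for one binary lesion mask (toy shape, not SRI24).
shape = (4, 5, 6)
mask_3d = np.zeros(shape, dtype=np.uint8)
mask_3d[1:3, 2:4, 0:2] = 1  # a 2 x 2 x 2 block of "lesion" voxels

# Flatten to a 1-dimensional vector (C order, the NumPy default).
scan_data_1D = mask_3d.ravel()

# Reshaping with the same shape and order recovers the original volume.
restored = scan_data_1D.reshape(shape)

print(scan_data_1D.shape, int(scan_data_1D.sum()))
```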

I then constructed a pandas DataFrame of all the subjects' data, resulting in a DataFrame of 283 rows x 8928001 columns.
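
As a rough check on the size of this table, the arithmetic below estimates the memory needed for a dense 283 x 8928001 array at a few element sizes. The dtypes listed are assumptions, since I have not checked which one my DataFrame actually uses. At 8 bytes per value (float64) the array alone comes to about 20 GB, and any copy made while centring the data would add to that.

```python
# Dimensions as reported above.
n_subjects = 283
n_columns = 8_928_001

# Bytes per element for a few candidate dtypes (assumed, not measured).
bytes_per_value = {"float64": 8, "float32": 4, "uint8": 1}

n_values = n_subjects * n_columns

# Size of one dense copy of the matrix, in GiB, for each dtype.
sizes_gib = {
    dtype: n_values * nbytes / 1024**3
    for dtype, nbytes in bytes_per_value.items()
}

for dtype, gib in sizes_gib.items():
    print(f"{dtype}: {gib:.1f} GiB")
```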

I then try to fit the PCA using scikit-learn:

from sklearn.decomposition import PCA

# Keep enough components to explain 80% of the variance
x = scan_data_df_reshaped.iloc[:, 0:len(scan_data_1D)].values
TC_pca = PCA(n_components=0.8)
TC_pca.fit(x)

Unfortunately, this does not work: the kernel dies with no warning messages every time I run it.

Could this be due to a memory problem? If so, or otherwise, are there more effective approaches to doing this?
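
One alternative I am considering is sketched below, on synthetic data only. It assumes NumPy, keeps the masks as uint8, drops voxels that are never lesioned in any subject, and computes the PCA with a thin SVD of the centred matrix. With far fewer subjects than voxels this yields at most as many components as subjects. The threshold min_subjects, the toy sizes, and the variable names are all hypothetical, and I do not know whether this is the recommended way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 20 "subjects" x 1000 "voxels",
# stored as uint8 rather than float64 to save memory.
n_subjects, n_voxels = 20, 1000
masks = np.zeros((n_subjects, n_voxels), dtype=np.uint8)
for i in range(n_subjects):
    start = rng.integers(0, 200)       # toy lesions fall in the first 300 voxels
    masks[i, start:start + 100] = 1

# 1. Keep only voxels lesioned in at least `min_subjects` subjects.
min_subjects = 1                        # hypothetical threshold
keep = masks.sum(axis=0) >= min_subjects
reduced = masks[:, keep].astype(np.float32)

# 2. Centre each voxel (column) across subjects.
centred = reduced - reduced.mean(axis=0)

# 3. PCA via thin SVD: rows of vt are the principal axes.
u, s, vt = np.linalg.svd(centred, full_matrices=False)
explained = s**2 / np.sum(s**2)

# Smallest number of components whose cumulative variance reaches 80%.
n_keep = int(np.searchsorted(np.cumsum(explained), 0.8) + 1)

# 4. Put the retained components back into the full voxel space.
components_full = np.zeros((n_keep, n_voxels), dtype=np.float32)
components_full[:, keep] = vt[:n_keep]

print(int(keep.sum()), n_keep, components_full.shape)
```

Each row of components_full could then be reshaped to the original 3D grid to inspect it as a spatial map.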

Many thanks,

Yizhou