I’ll make a start today by pointing out a few things as suggested in previous discussions:
if there is a typo, raise an issue on GitHub directly;
while our discussion can be very technical (which is brilliant!), remember that we as TAs are trying to clear things up so that we can serve students with more confidence.
For people who wish to understand where we can apply t-SNE and how to interpret the results, I found this article really helpful: https://distill.pub/2016/misread-tsne/
Have to say the materials for today’s tutorials are really outstanding, kudos to the creators.
The only very small imperfection I’ve found is in the first video of T4: when talking about the example of what a 1 would look like, Alex says 0, but it’s clear from context that she’s talking about 1, so it’s fine.
I have a more technical question about PCA. In the tutorials it is shown that we can obtain the PCA by computing the eigenvalues and eigenvectors. When you compute the eigenvalues of a matrix, you can end up with an eigenvalue whose associated eigenspace has more than one dimension (i.e. a degenerate eigenvalue; not sure about the translation from French).
My question is: is it possible to obtain the same eigenvalue twice (or more) while computing PCA? To me it sounds very unlikely because this is experimental data, but I am still asking and curious.
Yes, you can get two eigenvalues of the same value. That means that those two dimensions capture equal amounts of variance in the data. The degeneracy here would simply be in the ordering of components - the corresponding eigenvectors would still be distinct orthogonal axes.
Here’s one way to think about it: In the eigenvector basis, the covariance matrix is now diagonal. This diagonal matrix has variances for each component along the diagonal and zero off-diagonal covariance. Hence, the component scores are uncorrelated. Now ordering the components by the diagonal value is ordering by variance captured.
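Here’s a minimal NumPy sketch of that idea (toy data and variable names are just made up for illustration, not from the tutorial notebook): eigendecompose the sample covariance, sort components by eigenvalue, and check that the projected scores are uncorrelated.

```python
import numpy as np

# Toy data: 500 samples, 3 features (any samples-by-features array works)
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0],
                            [[3.0, 1.0, 0.5],
                             [1.0, 2.0, 0.3],
                             [0.5, 0.3, 1.0]],
                            size=500)

# Center the data and compute the sample covariance matrix
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (Xc.shape[0] - 1)

# Eigendecomposition of the (symmetric) covariance matrix
evals, evecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]         # reorder by variance captured (descending)
evals, evecs = evals[order], evecs[:, order]

# Project onto the eigenvector basis: the score covariance is (numerically) diagonal
scores = Xc @ evecs
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 3))           # off-diagonal entries ~ 0, diagonal ~ evals
```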
We were wondering about the fact that projX_noisy is asked to be computed using evectors and not evectors_noisy. However, in a “realistic” situation we are not supposed to know the actual eigenvectors without noise, because we wouldn’t have access to the original data. When we compute with the noisy eigenvectors it works almost as well, and actually makes more sense… am I wrong?
In the last interactive demo of W1D5 Tutorial 2, is it correct to say that we can also have unequal eigenvalues in the zero-correlation-coefficient condition (when the data distribution is elongated along the x axis)?
In other words, there is no relation between the difference in eigenvalues and the correlation coefficient, right?
What is “realistic” depends on what the imagined use case is. Consider that you have a large set of samples (clean or noisy) and you want to construct your denoising filters from that, which you can then take out into the wild to denoise new noisy samples, even one by one. As an example with MNIST, you have many cleanly written numbers in your training dataset, but now you want a denoising filter you can apply to a single new image of a smudged address number on a new piece of mail. So this is a case where you have a large separate training set, which you use to define your PCA weights as future denoising filters, and then later you have a smaller (even single-data-point) test sample you want to denoise using the pre-defined filters.
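If it helps, here’s a rough NumPy sketch of that train/test workflow (function and variable names are hypothetical, not the ones from the notebook): fit the PCA basis on the large training set once, then reuse it to denoise a single new sample.

```python
import numpy as np

def fit_pca_basis(X_train, n_components):
    """Return the top-n eigenvectors (columns) of the training covariance, plus the mean."""
    Xc = X_train - X_train.mean(axis=0)
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    return evecs[:, order[:n_components]], X_train.mean(axis=0)

def denoise(x_new, basis, mean):
    """Project a single (noisy) sample onto the pre-fit basis and reconstruct it."""
    return (x_new - mean) @ basis @ basis.T + mean

# Hypothetical usage: a large training set with low-dimensional structure,
# then one new noisy sample denoised with the pre-defined filters
rng = np.random.default_rng(1)
X_train = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 64))
basis, mu = fit_pca_basis(X_train, n_components=10)
x_noisy = X_train[0] + rng.normal(scale=0.5, size=64)
x_denoised = denoise(x_noisy, basis, mu)
```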
If I’m understanding your question, then yes – you can have a zero correlation coefficient but unequal eigenvalues. Consider if your 2D data were generated from a multivariate Gaussian that has a diagonal covariance matrix [[var1, 0], [0, var2]] – so the variance in x (var1, which is eigenvalue 1) can be different from the variance in y (var2, which is eigenvalue 2).
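A quick numerical check of this (toy example, my own variable names):

```python
import numpy as np

# Sample 2D data with a diagonal covariance: zero correlation, unequal variances
rng = np.random.default_rng(2)
var1, var2 = 4.0, 1.0                      # eigenvalue 1 != eigenvalue 2
X = rng.multivariate_normal([0, 0], [[var1, 0], [0, var2]], size=5000)

corr = np.corrcoef(X, rowvar=False)[0, 1]             # correlation coefficient ~ 0
evals = np.linalg.eigvalsh(np.cov(X, rowvar=False))   # eigenvalues near [1, 4]
print(corr, evals)
```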
I agree that in theory we can end up with an eigenvalue whose eigenspace has more than one dimension (and then we have two or more orthogonal eigenvectors that span this subspace). If we want to play a little bit with the math and consider the diagonalization of the covariance matrix $C$ (which is symmetric and thus diagonalizable), we would have $C = P D P^\top$,
where $P$ is the orthogonal matrix containing the eigenvectors and $D$ is the diagonal matrix containing the eigenvalues. We can interpret these eigenvalues as the proportion of variance aligned with each direction, and therefore a degenerate eigenvalue would mean that two orthogonal directions have exactly the same explained variance (this is exactly what bothers me here). Indeed, if we consider that the explained proportion of variance along a given direction follows some continuous unknown distribution between 0 and 1, which I don’t think is too restrictive, it would be very unlikely that $\lambda_i = \lambda_j$ for $i \neq j$,
where $\lambda_i$ and $\lambda_j$ are two solutions of the characteristic polynomial. If the data is obtained experimentally, it sounds very unlikely to me.
Have you already observed such degeneracy in experimental data? What kind of data/measurements could produce such an amazing result?
OK, the problem is actually worse than my prior post suggests, so please allow me to update… It’s not just that the order of the eigenvectors is unclear. It’s that if two eigenvalues are the same, then the two associated eigenvectors are undetermined up to a rotation within the 2D plane they span, because any two orthonormal vectors in that plane form an eigenbasis of it.
In practice this is a problem if you want to interpret or compare a specific eigenvector itself, and it’s not just about the special case of exactly equal eigenvalues in an experimental dataset. Consider that in your “true” or generative underlying data structure you have two principal axes (eigenvectors) with similar variance (eigenvalues). Then generate a finite sample of points via sampling (potentially adding noise as well, but finite sampling alone can suffice). Now you do PCA on your data sample, trying to recover those generative axes from your experimental dataset. But the noise in data generation could result in eigenvectors that capture a similar plane/subspace as the generative axes, yet only up to an arbitrary rotation of the PCA axes within that plane.
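If anyone wants to see this effect, here’s a small toy sketch (my own example, not from the tutorials): two generative axes have equal variance, and PCA on two independent finite samples recovers essentially the same 2D subspace, but the individual leading eigenvectors can land anywhere within that plane.

```python
import numpy as np

rng = np.random.default_rng(3)
true_cov = np.diag([2.0, 2.0, 0.1])   # the first two generative variances are equal

def leading_axes(n_samples):
    """PCA on one finite sample; return the two leading eigenvectors (columns)."""
    X = rng.multivariate_normal(np.zeros(3), true_cov, size=n_samples)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return evecs[:, np.argsort(evals)[::-1][:2]]

V1, V2 = leading_axes(500), leading_axes(500)

# Both samples recover (almost) the same 2D subspace...
print(np.linalg.svd(V1.T @ V2, compute_uv=False))  # singular values ~ [1, 1]
# ...but the individual leading eigenvectors need not line up with each other
print(np.abs(V1[:, 0] @ V2[:, 0]))                 # can be anywhere in [0, 1]
```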