I’ll make a start today by pointing out a few things as suggested in previous discussions:
if there is a typo, raise an issue on GitHub directly;
while our discussion can be very technical (which is brilliant!), remember that we as TAs are trying to clear things up so that we can serve students with more confidence.
For people who wish to understand where we can apply t-SNE and how to interpret the results, I found this article really helpful: https://distill.pub/2016/misread-tsne/
Have to say the materials for today’s tutorials are really outstanding, kudos to the creators.
The only very small imperfection I’ve found is in the first video of T4: when talking about the example of what a 1 would look like, Alex says 0, but it’s clear from context that she’s talking about 1, so it’s fine.
I have a more technical question about PCA. In the tutorials it is shown that we can obtain the PCA by computing the eigenvalues and eigenvectors. When you compute the eigenvalues of a matrix, you can end up with an eigenvalue whose associated eigenspace has more than one dimension (i.e. a degenerate eigenvalue; not sure about the translation from French).
My question is: is it possible to obtain the same eigenvalue twice (or more) while computing PCA? To me it sounds very unlikely because this is experimental data, but I am still asking and curious.
Yes, you can get two eigenvalues of the same value. That means that those two dimensions capture equal amounts of variance in the data. The degeneracy here would simply be in the ordering of components - the corresponding eigenvectors would still be distinct orthogonal axes.
Here’s one way to think about it: In the eigenvector basis, the covariance matrix is now diagonal. This diagonal matrix has variances for each component along the diagonal and zero off-diagonal covariance. Hence, the component scores are uncorrelated. Now ordering the components by the diagonal value is ordering by variance captured.
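Here’s a minimal NumPy sketch of that idea (toy data and variable names are just made up for illustration, not from the tutorial notebook): eigendecompose the sample covariance, sort components by eigenvalue, and check that the projected scores are uncorrelated.

```python
import numpy as np

# Toy data: 500 samples, 3 features (any samples-by-features array works)
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0],
                            [[3.0, 1.0, 0.5],
                             [1.0, 2.0, 0.3],
                             [0.5, 0.3, 1.0]],
                            size=500)

# Center the data and compute the sample covariance matrix
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (Xc.shape[0] - 1)

# Eigendecomposition of the (symmetric) covariance matrix
evals, evecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]         # reorder by variance captured (descending)
evals, evecs = evals[order], evecs[:, order]

# Project onto the eigenvector basis: the score covariance is (numerically) diagonal
scores = Xc @ evecs
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 3))           # off-diagonal entries ~ 0, diagonal ~ evals
```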
We were wondering about the fact that projX_noisy is asked to be computed using evectors and not evectors_noisy. However, in a “realistic” situation we are not supposed to know the actual eigenvectors without noise, because we wouldn’t have access to the original data. When we compute with the noisy eigenvectors it works almost as well, and actually makes more sense… am I wrong?
In the last interactive demo of W1D5 Tutorial 2, is it correct to say that we can also have unequal eigenvalues in the zero-correlation-coefficient condition (when the data distribution is elongated along the x axis)?
In other words, there is no relation between the difference in eigenvalues and the correlation coefficient, right?
What is “realistic” depends on what the imagined use case is. Consider that you have a large set of samples (clean or noisy) and you want to construct your denoising filters from that, which you can then take out into the wild to denoise new noisy samples, even one by one. As an example with MNIST, you have many cleanly written numbers in your training dataset, but now you want a denoising filter you can apply to a single new image of a smudged address number on a new piece of mail. So this is a case where you have a large separate training set, which you use to define your PCA weights as future denoising filters, and then later you have a smaller (even single-data-point) test sample you want to denoise using the pre-defined filters.
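If it helps, here’s a rough NumPy sketch of that train/test workflow (function and variable names are hypothetical, not the ones from the notebook): fit the PCA basis on the large training set once, then reuse it to denoise a single new sample.

```python
import numpy as np

def fit_pca_basis(X_train, n_components):
    """Return the top-n eigenvectors (columns) of the training covariance, plus the mean."""
    Xc = X_train - X_train.mean(axis=0)
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    return evecs[:, order[:n_components]], X_train.mean(axis=0)

def denoise(x_new, basis, mean):
    """Project a single (noisy) sample onto the pre-fit basis and reconstruct it."""
    return (x_new - mean) @ basis @ basis.T + mean

# Hypothetical usage: a large training set with low-dimensional structure,
# then one new noisy sample denoised with the pre-defined filters
rng = np.random.default_rng(1)
X_train = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 64))
basis, mu = fit_pca_basis(X_train, n_components=10)
x_noisy = X_train[0] + rng.normal(scale=0.5, size=64)
x_denoised = denoise(x_noisy, basis, mu)
```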
If I’m understanding your question, then yes – you can have a zero correlation coefficient but unequal eigenvalues. Consider if your 2D data were generated from a multivariate Gaussian that has a diagonal covariance matrix [[var1, 0], [0, var2]] – so the variance in x (var1, which is eigenvalue 1) can be different from the variance in y (var2, which is eigenvalue 2).
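A quick numerical check of this (toy example, my own variable names):

```python
import numpy as np

# Sample 2D data with a diagonal covariance: zero correlation, unequal variances
rng = np.random.default_rng(2)
var1, var2 = 4.0, 1.0                      # eigenvalue 1 != eigenvalue 2
X = rng.multivariate_normal([0, 0], [[var1, 0], [0, var2]], size=5000)

corr = np.corrcoef(X, rowvar=False)[0, 1]             # correlation coefficient ~ 0
evals = np.linalg.eigvalsh(np.cov(X, rowvar=False))   # eigenvalues near [1, 4]
print(corr, evals)
```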
I agree that in theory we can end up with an eigenvalue whose eigenspace has more than one dimension (and then we have two or more orthogonal eigenvectors that span this subspace). If we want to play a little bit with the math and consider the diagonalization of the covariance matrix $C$ (which is symmetric and thus diagonalizable), we would have $C = P D P^\top$,
where $P$ is the orthogonal matrix containing the eigenvectors and $D$ is the diagonal matrix containing the eigenvalues. We can interpret these eigenvalues as the proportion of variance aligned with each direction, and therefore a degenerate eigenvalue would mean that two orthogonal directions have exactly the same explained variance (this is exactly what bothers me here). Indeed, if we consider that the explained proportion of variance along a given direction follows some continuous unknown distribution between 0 and 1, which I don’t think is too restrictive, it would be very unlikely that $\lambda_i = \lambda_j$ for $i \neq j$,
where $\lambda_i$ and $\lambda_j$ are two solutions of the characteristic polynomial. If the data is obtained experimentally, it sounds very unlikely to me.
Have you already observed such degeneracy in experimental data? What kind of data/measurements could produce such an amazing result?
OK, the problem is actually worse than my prior post suggests, so please allow me to update… It’s not just that the order of the eigenvectors is unclear. It’s that if two eigenvalues are the same, then the two associated eigenvectors are undetermined up to a rotation within the 2D plane they span, because any two orthonormal vectors in that plane form an eigenbasis of it.
In practice this is a problem if you want to interpret or compare a specific eigenvector itself, and it’s not just about the special case of exactly equal eigenvalues in an experimental dataset. Consider that in your “true” or generative underlying data structure you have two principal axes (eigenvectors) with similar variance (eigenvalues). Then generate a finite sample of points via sampling (potentially adding noise as well, but finite sampling alone can suffice). Now you do PCA on your data sample, trying to recover those generative axes from your experimental dataset. But the noise in data generation could result in eigenvectors that capture a similar plane/subspace as the generative axes, yet only up to an arbitrary rotation of the PCA axes within that plane.
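If anyone wants to see this effect, here’s a small toy sketch (my own example, not from the tutorials): two generative axes have equal variance, and PCA on two independent finite samples recovers essentially the same 2D subspace, but the individual leading eigenvectors can land anywhere within that plane.

```python
import numpy as np

rng = np.random.default_rng(3)
true_cov = np.diag([2.0, 2.0, 0.1])   # the first two generative variances are equal

def leading_axes(n_samples):
    """PCA on one finite sample; return the two leading eigenvectors (columns)."""
    X = rng.multivariate_normal(np.zeros(3), true_cov, size=n_samples)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return evecs[:, np.argsort(evals)[::-1][:2]]

V1, V2 = leading_axes(500), leading_axes(500)

# Both samples recover (almost) the same 2D subspace...
print(np.linalg.svd(V1.T @ V2, compute_uv=False))  # singular values ~ [1, 1]
# ...but the individual leading eigenvectors need not line up with each other
print(np.abs(V1[:, 0] @ V2[:, 0]))                 # can be anywhere in [0, 1]
```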