W1D3 tutorial content discussion

I was also really confused by the statement that the last term is a constant! But I think this is how it goes:

I believe sigma^2 here is also estimated via MLE, so sigma^2 = \sum [(y - y_tilde)^2] / N (this is exactly what appears in the next paragraph). If we plug this estimate back into the log-likelihood equation above, the \sum [(y - y_tilde)^2] in the numerator cancels with the same sum in the denominator, leaving -N/2. That's why the last term in the MLE is a constant.
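A quick numeric sanity check of that cancellation (the data and predictions below are made up, any residuals would give the same result):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observations and model predictions (stand-ins for y and y_tilde)
y = rng.normal(size=100)
y_tilde = y + rng.normal(scale=0.5, size=100)

N = len(y)
rss = np.sum((y - y_tilde) ** 2)  # sum of squared residuals

# MLE of the noise variance: sigma^2 = sum[(y - y_tilde)^2] / N
sigma2_hat = rss / N

# Last term of the Gaussian log-likelihood, evaluated at the MLE of sigma^2:
# -sum[(y - y_tilde)^2] / (2 * sigma^2)
last_term = -rss / (2 * sigma2_hat)

print(last_term)  # always -N/2, no matter what the residuals are
```

So the last term depends only on N, not on the model's fit.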

The first term is also not technically a constant, but since the number of data points N is the same across all models, and we only care about relative differences between models, we can indeed drop it.
