L is equal to p if we treat both as functions of x, y, and theta. Of course, as you nicely explained, L(theta|y,x) is not equal to p(theta|y,x). This is why the MLE is in general different from MAP estimators.
This is a nice derivation of how we end up maximising likelihood, given the specific conditions for the prior. But do you disagree that it is a matter of notation? They point to the same pdf, right? Or do you mean that only the optimal value is the same after maximising both?
yeah sorry I didn't address that they also said L vs p; in this case they are using them interchangeably I believe
no they are not the same thing?
p(theta | y,x) = p(y | x,theta) * p(theta) / p(y)
but for maximization purposes they are the same
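To spell that out: p(y) doesn't depend on theta, so if the prior p(theta) is flat (constant), it drops out of the argmax and

argmax_theta p(theta | y,x) = argmax_theta p(y | x,theta) * p(theta) / p(y) = argmax_theta p(y | x,theta)

which is why MAP with a flat prior reduces to the MLE.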
I was also really confused by the statement that the last term is a constant! But I think this is how it goes:
I believe here sigma^2 is also estimated via MLE, thus sigma^2 = \sum [(y - y_tilde)^2] / n (we get it immediately in the next paragraph), and if we plug this into the log-likelihood equation above, the two \sum [(y - y_tilde)^2] terms (in the numerator and denominator, respectively) cancel out with each other, so the last term in the MLE is a constant.
The first term is also not technically a constant, but since the number of data points n is the same across all models, and we only care about the relative difference, we can indeed drop it.
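Writing it out (I'm assuming the tutorial's Gaussian log-likelihood, split into three terms):

log L = -(n/2) log(2 pi) - (n/2) log(sigma^2) - \sum [(y - y_tilde)^2] / (2 sigma^2)

Plug sigma^2 = \sum [(y - y_tilde)^2] / n into the last term and the two sums cancel, leaving -n/2, a constant; the -(n/2) log(2 pi) term depends only on n, so the only piece that actually differs between models is -(n/2) log(sigma^2).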
I think the order in which the variables appear inside the parentheses is the confusing bit.
L(θ|x,y), a likelihood function, is not equal to p(θ|x,y), a posterior. They are related by Bayes rule as you presented it, but I think that L(θ|x,y) is equal to, i.e. the very same pdf as, p(y|θ,x).
I am assuming people wanted to write L() with θ as the first argument, as this is the parameter that you would usually optimise for, as we did today, but I still find it a bit confusing.
this question came up in my pod today, and this thread is very helpful for understanding! is there any way I can link my students to this, or is it restricted to TAs?
I don't think there will be any objections to opening this thread up to students as well? @carsen.stringer Maybe Carsen can do it?
i don't see a way to
maybe next time the discussion can happen in the main forum? also then maybe you'll get help from the people who've actually done the content
sorry I haven't been more helpful
I changed the sub-category, you can now :)
In the Tutorial 2 appendix, what is the stimulus likelihood function? How can x be decoded given y?
I found the answers
http://pillowlab.princeton.edu/teaching/statneuro2018/slides/slides07_encodingmodels.pdf
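For anyone else landing here: the idea (a toy sketch of my own, not the tutorial's code; `encoding_likelihood`, the grid, and all numbers are made up) is that the stimulus likelihood treats the encoding model p(y|x) as a function of the candidate stimulus x, with the observed response y held fixed, and decoding picks the x that maximises it:

```python
import numpy as np

def encoding_likelihood(y_obs, x, sigma=1.0):
    """p(y_obs | x): Gaussian response centred on the stimulus x."""
    return np.exp(-0.5 * ((y_obs - x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

y_obs = 1.3                               # hypothetical observed response
x_candidates = np.linspace(-5, 5, 1001)   # grid of candidate stimuli

likelihood = encoding_likelihood(y_obs, x_candidates)
x_decoded = x_candidates[np.argmax(likelihood)]  # maximum-likelihood decoding

# With a prior p(x) you could decode from the posterior instead:
# posterior ~ likelihood * prior, then take the argmax (MAP) or the mean.
print(f"decoded x: {x_decoded:.2f}")
```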
For the T2 Exercise 1 plot, the likelihood image won't follow changes you make to sigma in the likelihood function. The plotting code uses its own likelihood calculation with sigma=1. You can set sigma in the plotting function.
We were confused why the plots stayed the same with different sigma!
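Something like this is what the plot is doing under the hood (a minimal reimplementation of my own, not the tutorial's helper; the function name, grids, and numbers are all made up):

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_likelihood(x, y, sigma):
    """p(y | x) for a Gaussian centred on x with std sigma."""
    return np.exp(-0.5 * ((y - x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Grids of hypothetical stimulus values and measurements
x_grid = np.linspace(-8, 8, 201)
y_grid = np.linspace(-8, 8, 201)
xx, yy = np.meshgrid(x_grid, y_grid)

sigma = 2.0  # pass sigma in explicitly; if this were hard-coded to 1 inside
             # the plotting code, changing sigma elsewhere would do nothing
likelihood_image = gaussian_likelihood(xx, yy, sigma)

plt.imshow(likelihood_image, origin="lower",
           extent=[x_grid[0], x_grid[-1], y_grid[0], y_grid[-1]])
plt.xlabel("x (stimulus)")
plt.ylabel("y (measurement)")
plt.title(f"p(y|x) with sigma={sigma}")
plt.show()
```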
What are the Frequentist and Bayesian frameworks mentioned in the W1D3 outro video? Dr. Wei mentions that MLE is from the frequentist framework. Maximum a posteriori (MAP) estimation is MLE with the addition of a prior probability term (maximising p(data|params) * p(params) instead of just p(data|params)). Does this mean that the Frequentist framework is subsumed by the Bayesian framework? Are there good resources for understanding these "competing" frameworks?
In a frequentist framework, the unknown parameter (let's call it theta) is treated as a fixed but unknown quantity. Since it's fixed (deterministic), a statement such as "theta falls in the confidence interval 95% of the time" doesn't make any sense, because theta is either in the interval or not. The probability in a frequentist framework comes from sampling: imagine we repeat the experiment a few times (e.g., draw samples from a Gaussian); the observations we get will be different each time, so the MLE estimate and confidence interval will also jump around.
In a Bayesian framework, theta is a random variable (that's why we can assign a prior distribution to it!), and you can freely use expressions such as "the posterior distribution of theta" or "theta has a 95% probability of being within the interval...".
I wouldn't say one is subsumed by or better/worse than the other. There are contexts where one is more applicable (e.g., base rate neglect) though.
Hope this answers your question (at least a little bit) 
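If it helps to see the two views side by side in code, here's a toy sketch (the conjugate-Gaussian setup and all numbers are my own, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all numbers made up): data y ~ Normal(theta, sigma),
# with sigma known, and a Gaussian prior theta ~ Normal(mu0, tau).
sigma, mu0, tau = 1.0, 0.0, 0.5
theta_true = 2.0
y = rng.normal(theta_true, sigma, size=20)
n = len(y)

# Frequentist: theta is fixed; the MLE for a Gaussian mean is the sample mean.
theta_mle = y.mean()

# Bayesian: theta is a random variable; with a conjugate Gaussian prior the
# posterior mode (MAP) is a precision-weighted average of prior mean and data.
precision_prior = 1 / tau**2
precision_data = n / sigma**2
theta_map = (precision_prior * mu0 + precision_data * theta_mle) / (
    precision_prior + precision_data
)

print(f"MLE: {theta_mle:.3f}, MAP: {theta_map:.3f}")
# As tau -> infinity (a flat prior), the MAP estimate converges to the MLE.
```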
That was helpful! Thank you for clarifying.
