L is equal to p if we treat both as functions of x, y, and theta. Of course, as you nicely explained, L(theta|y,x) is not equal to p(theta|y,x). This is why the MLE is in general different from MAP estimators.
This is a nice derivation of how we end up maximising likelihood, given the specific conditions for the prior. But do you disagree that it is a matter of notation? They point to the same pdf, right? Or do you mean that only the optimal value is the same after maximising both?
yeah sorry, I didn't address that they also said L vs p; in this case they are using them interchangeably, I believe
no they are not the same thing?
p(theta | y, x) = p(y | x, theta) * p(theta) / p(y | x)
but for maximization purposes they behave the same: the denominator doesn't depend on theta, and with a flat prior the MAP estimate coincides with the MLE
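To make the argmax point concrete, here's a minimal sketch (assuming a 1D Gaussian likelihood with known variance and a flat prior on a grid; the names are illustrative, not from the tutorial):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=50)  # observed data

theta_grid = np.linspace(-5, 5, 1001)  # candidate values of theta

# log-likelihood log p(y | theta) for each theta on the grid
log_lik = np.array([norm.logpdf(y, loc=t, scale=1.0).sum() for t in theta_grid])

# flat prior -> log-posterior = log-likelihood + a constant
log_post = log_lik + np.log(1.0 / (theta_grid[-1] - theta_grid[0]))

# same argmax: the MLE and the MAP estimate coincide under a flat prior
print(theta_grid[np.argmax(log_lik)], theta_grid[np.argmax(log_post)])
```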
I was also really confused by the statement that the last term is a constant! But I think this is how it goes:
I believe sigma^2 here is also estimated via MLE, i.e. sigma^2 = \sum (y - y_tilde)^2 / N (we have it immediately in the next paragraph). If we plug this into the log-likelihood equation above, the two \sum (y - y_tilde)^2 terms (in the numerator and denominator, respectively) cancel out, so the last term of the maximized log-likelihood is the constant -N/2.
The first term is also not technically a constant, but since the number of data points N is the same across all models, and we only care about relative differences, we can indeed drop it.
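A quick numerical check (a minimal sketch, not tutorial code) that plugging the MLE sigma^2 back in makes the last term exactly -N/2:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
y = rng.normal(size=N)
y_tilde = np.zeros(N)  # predictions from some model (zeros, just for illustration)

residual_ss = np.sum((y - y_tilde) ** 2)
sigma2_mle = residual_ss / N  # MLE of the noise variance

# Gaussian log-likelihood: -N/2 * log(2*pi*sigma^2) - residual_ss / (2*sigma^2)
last_term = -residual_ss / (2 * sigma2_mle)
print(last_term, -N / 2)  # identical: the sums cancel, leaving -N/2
```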
I think the order in which the variables appear inside the parentheses is the confusing bit.
L(θ|x,y), a likelihood function, is not equal to p(θ|x,y), a posterior. They are related by Bayes' rule as you presented it, but I think that L(θ|x,y) is equal to, i.e. is the very same pdf as, p(y|θ,x).
I am assuming people wanted to write L() with θ as the first argument, since that is the parameter you would usually optimise for, as we did today, but I still find it a bit confusing.
this question came up in my pod today, and this thread is very helpful for understanding! is there any way I can link my students to this, or is it restricted to TAs?
I don't think there will be any objections to opening up this thread to students as well? @carsen.stringer Maybe Carsen can do it?
i don't see a way to. maybe next time the discussion can happen in the main forum? also then maybe you'll get help from the people who've actually done the content. sorry I haven't been more helpful
I changed the sub-category, you can now :)
In the Tutorial 2 appendix, what is the stimulus likelihood function? How can x be decoded given y?
I found the answers
http://pillowlab.princeton.edu/teaching/statneuro2018/slides/slides07_encodingmodels.pdf
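For anyone else looking: the decoding direction just applies Bayes' rule the other way, p(x|y) ∝ p(y|x) p(x). A minimal grid sketch with a made-up Gaussian encoding model (the numbers and names are illustrative, not from the tutorial):

```python
import numpy as np
from scipy.stats import norm

x_grid = np.linspace(-10, 10, 2001)        # candidate stimulus values
prior = norm.pdf(x_grid, loc=0, scale=3)   # prior over the stimulus p(x)

y_obs = 1.5  # observed neural response

# encoding model: y | x ~ N(x, 1), so the stimulus likelihood is p(y_obs | x)
likelihood = norm.pdf(y_obs, loc=x_grid, scale=1.0)

# posterior over the stimulus: p(x | y) ∝ p(y | x) p(x)
posterior = likelihood * prior
posterior /= np.trapz(posterior, x_grid)  # normalize

x_map = x_grid[np.argmax(posterior)]  # decoded stimulus (MAP estimate)
print(x_map)
```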
For the T2 Exercise 1 plot, the likelihood image won't follow changes you make to sigma in the likelihood function. The plotting code uses its own likelihood calculation with sigma=1. You can set sigma in the plotting function instead. We were confused about why the plots stayed the same with different sigma!
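In other words, the pattern to watch out for looks like this (a hypothetical sketch; the actual tutorial helpers have different names and signatures):

```python
import numpy as np

def likelihood(y, y_pred, sigma=1.0):
    # Gaussian likelihood of the data given predictions and noise sigma
    return np.exp(-0.5 * ((y - y_pred) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def plot_likelihood(y, y_pred, sigma=1.0):
    # the plotting helper recomputes the likelihood internally, so it only
    # sees the sigma passed *here*, not the one you used in your own call
    img = likelihood(y, y_pred, sigma=sigma)
    ...  # placeholder: imshow(img) etc.

# changing sigma here has no effect on the plot...
likelihood(y=np.zeros(3), y_pred=np.ones(3), sigma=2.0)
# ...you have to pass it to the plotting function instead:
plot_likelihood(y=np.zeros(3), y_pred=np.ones(3), sigma=2.0)
```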
What are the Frequentist and Bayesian frameworks mentioned in the W1D3 outro video? Dr. Wei mentions that MLE is from the frequentist framework. Maximum a posteriori (MAP) estimation is MLE with the addition of a prior probability term (maximizing p(params|data) ∝ p(data|params) * p(params)). Does this mean that the Frequentist framework is subsumed by the Bayesian framework? Are there good resources for understanding these "competing" frameworks?
In a frequentist framework, the unknown parameter (let's call it theta) is treated as a fixed but unknown quantity. Since it's fixed (deterministic), statements such as "theta falls in the confidence interval 95% of the time" don't make any sense, because theta is either in the interval or not. The probability in a frequentist framework comes from sampling: imagine we repeat the experiment a few times (e.g., draw samples from a Gaussian); the observations we get will be different each time, so the MLE estimate and confidence interval will also jump around.
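For example, a quick simulation (a minimal sketch, assuming a Gaussian with known variance) shows the fixed theta staying put while the MLE and its 95% confidence interval jump around across repeated experiments:

```python
import numpy as np

rng = np.random.default_rng(42)
theta_true = 2.0   # fixed; unknown in practice
sigma, n = 1.0, 20

for experiment in range(5):
    y = rng.normal(theta_true, sigma, size=n)
    mle = y.mean()                       # MLE of theta
    half_width = 1.96 * sigma / np.sqrt(n)
    lo, hi = mle - half_width, mle + half_width
    covered = lo <= theta_true <= hi     # true in ~95% of repeated experiments
    print(f"MLE={mle:.3f}, CI=({lo:.3f}, {hi:.3f}), covers theta: {covered}")
```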
In a Bayesian framework, theta is a random variable (that's why we can assign a prior distribution to it!), and you can freely use expressions such as "the posterior distribution of theta" or "theta has a 95% probability of being within the interval…".
I wouldn't say one is subsumed by, or better/worse than, the other. There are contexts where one is more applicable (e.g., base rate neglect), though.
Hope this answers your question (at least a little bit)
That was helpful! Thank you for clarifying.