L is equal to p if we treat both as functions of x, y, and theta. Of course, as you nicely explained, L(theta|y,x) is not equal to p(theta|y,x). This is why the MLE is in general different from MAP estimators.
This is a nice derivation of how we end up maximising likelihood, given the specific conditions for the prior. But do you disagree that it is a matter of notation? They point to the same pdf, right? Or do you mean that only the optimal value is the same after maximising both?
yeah sorry I didn't address that they also said L vs p; in this case they are using them interchangeably I believe
no they are not the same thing?
p(theta | y,x) = p(y | x,theta) * p(theta) / p(y)
but for maximization purposes they are the same
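To spell that out: p(y) doesn't depend on theta, so if the prior p(theta) is flat (constant), it drops out of the argmax and

argmax_theta p(theta | y,x) = argmax_theta p(y | x,theta) * p(theta) / p(y) = argmax_theta p(y | x,theta)

which is why MAP with a flat prior reduces to the MLE.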
I was also really confused by the statement that the last term is a constant! But I think this is how it goes:
I believe here sigma^2 is also estimated via MLE, thus sigma^2 = \sum [(y - y_tilde)^2] / n (we get it immediately in the next paragraph), and if we plug this into the log-likelihood equation above, the two \sum [(y - y_tilde)^2] terms (in the numerator and denominator, respectively) cancel out with each other, so the last term in the MLE is a constant.
The first term is also not technically a constant, but since the number of data points n is the same across all models, and we only care about the relative difference, we can indeed drop it.
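Writing it out (I'm assuming the tutorial's Gaussian log-likelihood, split into three terms):

log L = -(n/2) log(2 pi) - (n/2) log(sigma^2) - \sum [(y - y_tilde)^2] / (2 sigma^2)

Plug sigma^2 = \sum [(y - y_tilde)^2] / n into the last term and the two sums cancel, leaving -n/2, a constant; the -(n/2) log(2 pi) term depends only on n, so the only piece that actually differs between models is -(n/2) log(sigma^2).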
I think the order in which the variables appear inside the parentheses is the confusing bit.
L(θ|x,y), a likelihood function, is not equal to p(θ|x,y), a posterior. They are related by Bayes rule as you presented it, but I think that L(θ|x,y) is equal to, i.e. the very same pdf as, p(y|θ,x).
I am assuming people wanted to write L() with θ as the first argument, as this is the parameter that you would usually optimise for, as we did today, but I still find it a bit confusing.
this question came up in my pod today, and this thread is very helpful for understanding! is there any way I can link my students to this, or is it restricted to TAs?
I don't think there will be any objections to opening this thread up to students as well? @carsen.stringer Maybe Carsen can do it?
i don't see a way to
maybe next time the discussion can happen in the main forum? also then maybe you'll get help from the people who've actually done the content
sorry I haven't been more helpful
I changed the sub-category, you can now :)
In the Tutorial 2 appendix, what is the stimulus likelihood function? How can x be decoded given y?
I found the answers
http://pillowlab.princeton.edu/teaching/statneuro2018/slides/slides07_encodingmodels.pdf
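For anyone else landing here: the idea (a toy sketch of my own, not the tutorial's code; `encoding_likelihood`, the grid, and all numbers are made up) is that the stimulus likelihood treats the encoding model p(y|x) as a function of the candidate stimulus x, with the observed response y held fixed, and decoding picks the x that maximises it:

```python
import numpy as np

def encoding_likelihood(y_obs, x, sigma=1.0):
    """p(y_obs | x): Gaussian response centred on the stimulus x."""
    return np.exp(-0.5 * ((y_obs - x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

y_obs = 1.3                               # hypothetical observed response
x_candidates = np.linspace(-5, 5, 1001)   # grid of candidate stimuli

likelihood = encoding_likelihood(y_obs, x_candidates)
x_decoded = x_candidates[np.argmax(likelihood)]  # maximum-likelihood decoding

# With a prior p(x) you could decode from the posterior instead:
# posterior ~ likelihood * prior, then take the argmax (MAP) or the mean.
print(f"decoded x: {x_decoded:.2f}")
```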
For the T2 Exercise 1 plot, the likelihood image won't follow changes you make to sigma in the likelihood function. The plotting code uses its own likelihood calculation with sigma=1. You can set sigma in the plotting function.
We were confused why the plots stayed the same with different sigma!
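Something like this is what the plot is doing under the hood (a minimal reimplementation of my own, not the tutorial's helper; the function name, grids, and numbers are all made up):

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_likelihood(x, y, sigma):
    """p(y | x) for a Gaussian centred on x with std sigma."""
    return np.exp(-0.5 * ((y - x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Grids of hypothetical stimulus values and measurements
x_grid = np.linspace(-8, 8, 201)
y_grid = np.linspace(-8, 8, 201)
xx, yy = np.meshgrid(x_grid, y_grid)

sigma = 2.0  # pass sigma in explicitly; if this were hard-coded to 1 inside
             # the plotting code, changing sigma elsewhere would do nothing
likelihood_image = gaussian_likelihood(xx, yy, sigma)

plt.imshow(likelihood_image, origin="lower",
           extent=[x_grid[0], x_grid[-1], y_grid[0], y_grid[-1]])
plt.xlabel("x (stimulus)")
plt.ylabel("y (measurement)")
plt.title(f"p(y|x) with sigma={sigma}")
plt.show()
```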
What are the Frequentist and Bayesian frameworks mentioned in the W1D3 outro video? Dr. Wei mentions that MLE is from the frequentist framework. Maximum a posteriori (MAP) estimation is MLE with the addition of a prior probability term (maximising p(data|params) * p(params) instead of just p(data|params)). Does this mean that the Frequentist framework is subsumed by the Bayesian framework? Are there good resources for understanding these "competing" frameworks?
In a frequentist framework, the unknown parameter (let's call it theta) is treated as a fixed but unknown quantity. Since it's fixed (deterministic), a statement such as "theta falls in the confidence interval 95% of the time" doesn't make any sense, because theta is either in the interval or not. The probability in a frequentist framework comes from sampling: imagine we repeat the experiment a few times (e.g., draw samples from a Gaussian); the observations we get will be different each time, so the MLE estimate and confidence interval will also jump around.
In a Bayesian framework, theta is a random variable (that's why we can assign a prior distribution to it!), and you can freely use expressions such as "the posterior distribution of theta" or "theta has a 95% probability of being within the interval...".
I wouldn't say one is subsumed by or better/worse than the other. There are contexts where one is more applicable (e.g., base rate neglect) though.
Hope this answers your question (at least a little bit) 
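If it helps to see the two views side by side in code, here's a toy sketch (the conjugate-Gaussian setup and all numbers are my own, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all numbers made up): data y ~ Normal(theta, sigma),
# with sigma known, and a Gaussian prior theta ~ Normal(mu0, tau).
sigma, mu0, tau = 1.0, 0.0, 0.5
theta_true = 2.0
y = rng.normal(theta_true, sigma, size=20)
n = len(y)

# Frequentist: theta is fixed; the MLE for a Gaussian mean is the sample mean.
theta_mle = y.mean()

# Bayesian: theta is a random variable; with a conjugate Gaussian prior the
# posterior mode (MAP) is a precision-weighted average of prior mean and data.
precision_prior = 1 / tau**2
precision_data = n / sigma**2
theta_map = (precision_prior * mu0 + precision_data * theta_mle) / (
    precision_prior + precision_data
)

print(f"MLE: {theta_mle:.3f}, MAP: {theta_map:.3f}")
# As tau -> infinity (a flat prior), the MAP estimate converges to the MLE.
```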
That was helpful! Thank you for clarifying.
