These are different contexts. In exercise 2 we estimate the coefficients theta of the linear regression model with Gaussian noise (the MLE solution, which coincides with the MSE-minimizing solution), using the output data. In exercise 3 we apply the coefficients theta to compute the predicted output.
Dimensionally, if X has n rows (number of data points) and m columns (number of explanatory variables), theta has m rows and 1 column, so X @ theta has n rows and 1 column, i.e. as many elements as data points.
X.T @ X has shape m×m, and since X.T is m×n and y is n×1, the product X.T @ y is m×1, so the resulting theta has m elements, one per coefficient.
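To make the shapes concrete, here is a minimal sketch of both steps (X, y, and the sizes below are random placeholders, not the notebook's actual data):

import numpy as np

n, m = 50, 3                               # n data points, m explanatory variables
X = np.random.randn(n, m)                  # placeholder design matrix
y = np.random.randn(n)                     # placeholder outputs

# exercise 2 step: solve the normal equations (X.T @ X) theta = X.T @ y,
# the MLE under Gaussian noise
theta = np.linalg.solve(X.T @ X, X.T @ y)  # shape (m,): one entry per coefficient

# exercise 3 step: apply theta to get one prediction per data point
y_hat = X @ theta                          # shape (n,): one entry per data point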
In exercise 3, the equation written in the notebook is
y⊤ log(λ) − 1⊤λ, with rate λ = exp(X⊤θ)
But in the code, it was implemented as
rate = np.exp(X @ theta)
log_lik = y @ np.log(rate) - rate.sum()
So it seems there is a discrepancy between the equation and the code. Am I misreading one of them?
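For concreteness, a quick check with made-up data (X, y, theta below are random placeholders) suggests that rate.sum() does match the 1⊤λ term, so the remaining difference seems to be X⊤θ versus X @ theta:

import numpy as np

n, m = 50, 3
X = np.random.randn(n, m)                           # placeholder design matrix
theta = np.random.randn(m)                          # placeholder coefficients
y = np.random.poisson(lam=2.0, size=n)              # placeholder count outputs

rate = np.exp(X @ theta)                            # lambda, as in the notebook code

ll_equation = y @ np.log(rate) - np.ones(n) @ rate  # y^T log(lambda) - 1^T lambda
ll_code = y @ np.log(rate) - rate.sum()             # the notebook's version
print(np.isclose(ll_equation, ll_code))             # True: 1^T lambda == rate.sum()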
Another related question: on p.26 of the tutorial slides, the lambda in the log-likelihood equation is f(Xθ), while it is exp(X⊤θ) in exercise 3 of the notebook. Why is there a transpose in the notebook's exercise 3 but not in the tutorial slides?
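One possible reading (an assumption on my part, not something stated in the materials): the transpose usually appears when the formula is written per sample, λ_i = exp(x_i⊤θ) with x_i ∈ ℝ^m, and stacking the samples as rows of X turns all of those inner products into the single product X @ theta:

import numpy as np

n, m = 5, 3
X = np.random.randn(n, m)                          # rows are the samples x_i
theta = np.random.randn(m)

per_sample = np.array([x_i @ theta for x_i in X])  # each x_i^T theta in turn
stacked = X @ theta                                # X @ theta, all rows at once
print(np.allclose(per_sample, stacked))            # True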