I noticed that during the maximum likelihood computation, the function likelihood(theta_hat, x, y) computes the likelihood for each data point. Do we sum them up to get a composite score?
You multiply the per-point likelihoods rather than summing them. Under the assumption that noise affects each output independently, the joint probability of all the data is the product of the individual likelihoods, and that product is the aggregate likelihood for all data points. In practice, we work with the log of the likelihoods, which turns the product into a sum and has better numerical properties: a product of many small values can underflow to zero, while a sum of their logs stays well within floating-point range.
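Here is a rough sketch of the product-versus-log-sum point. I don't know how likelihood(theta_hat, x, y) is defined in your notebook, so gaussian_likelihood below is a hypothetical stand-in: it assumes a linear model y = theta * x with independent Gaussian noise of standard deviation sigma, and the data values are made up for illustration.

```python
import math

def gaussian_likelihood(theta_hat, x, y, sigma=1.0):
    """Hypothetical stand-in: per-point likelihood p(y_i | x_i, theta_hat)
    assuming y = theta_hat * x + Gaussian noise with std sigma."""
    norm = sigma * math.sqrt(2 * math.pi)
    return [
        math.exp(-((yi - theta_hat * xi) ** 2) / (2 * sigma ** 2)) / norm
        for xi, yi in zip(x, y)
    ]

# Made-up data, for illustration only.
x = [0.5, 1.0, 1.5, 2.0]
y = [0.6, 1.1, 1.4, 2.1]
theta_hat = 1.0

per_point = gaussian_likelihood(theta_hat, x, y)

# Joint likelihood: product over independent data points.
joint = math.prod(per_point)

# Log-likelihood: sum of per-point logs, equal to log(joint).
log_joint = sum(math.log(p) for p in per_point)

# Why logs help: 200 likelihoods of 1e-3 multiply to 1e-600,
# which is below the smallest representable double and becomes 0.0.
many = [1e-3] * 200
product_many = math.prod(many)
log_sum_many = sum(math.log(p) for p in many)
```

With only four points the product and the exponentiated log-sum agree, so either works. The difference shows up in the last three lines, where the product has collapsed to zero but the log-sum is still a usable finite number that you can compare across values of theta_hat.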