Bayesian Brain Models Discussions - W2D1

Let’s use this topic as Q/A for W2D1-W2D5

Happy learning!

1 Like

W2D1, Tutorial 3, Exercise 6:
Are the axis labels for the marginalisation array swapped?? Unless I’m missing something, these can’t be both correct:



1 Like

W2D1 T1

In the text above exercise 2B, it is asked to compute the mean of the posterior distribution. I think there might be a x missing in the continous form :

\int_x p(x)dx \vspace{0.5cm}\rightarrow \int_x x p(x)dx

Without this x, the integral gives 1 since we are integrating the pdf over its domain.

Question about W2D1_T3: I am confused by the mathematical notations here. How is the equation in Video 6: p(x_hat | x) = integration( p(x_tilde | x) f(x_tilde) dx_tilde) related to the illustration and explanation in Introduction (between Video 1 and Section 1)?

For exercise 1 of W2D1, the solution has px = np.exp(- 1/2/sigma**2 * (mu - x_points) ** 2), which is subsequently normalized. Why does the solution leave out the coefficient 1/sqrt(2pivariance) shown in the general form for the Gaussian?

I find that exercises in tutorial 3 are presented and solved in a confusing and inconsistent way.

There are three variables of interest: x, x_tilde, x_hat.

  • x is the real stimulus value
  • x_tilde is the perceived stimulus value( brain representation of x)
  • x_hat is the subject’s guess about the true stimulus value

Here’s what I find confusing or misleading:

  • in tutorial 1 the prior, likelihood and posterior are plotted as if they were the same kind of functions, without (as far as I could see) making it clear that the prior is a function of x alone, while the likelihood and posterior are functions of x and x_tilde, and therefore plotting them as functions of x implies fixing a value of x_tilde
  • in tutorial 3 the likelihood_array has dimensions #possible values for x X #possible values for x_tilde, with likelihood_array[i,j]= p(x_tilde_j| x_i)
  • prior array has the same shape, and therefore prior[i,j] should be p(x_i), so independent of j, but here it seems to be flipped
  • posterior_array has the same dimensions, so it should be the case that posterior_array[i,j] is p(x_i|x_tilde_j), but instead the text says “For each row of the prior and likelihood (i.e. each hypothetical 𝑥 ), fill in the posterior matrix so that every row of the posterior array represents the posterior density for a different hypothetical 𝑥 .” which seems contradictory to me, since it implies that now the columns index the possible values of x.
  • the binary_decision matrix has the same shape, but now columns index not the values of x_tilde, but the values of x_hat.

So far we’re computing the way the subject generates the response.

Next the marginalisation should be so that we can get p(x_hat|x), which depends on the parameters of subject’s priors, so that we can then infer these. So marginalisation should happen over x_tilde, but I’m very confused as to how exactly that’s implemented.


whether you leave the 1/sqrt(2pivar) out or not in this case is the same, since it will dissappear when you normalize: it will appear in each entry of px, and therefore also in the np.sum(px), so it gets cancelled in the normalization.

I agree… so far my head have been jumping between “ok i think i understand it now” and “wait it doesn’t seem to be what I thought”…


This question was posted to Slack and has been forwarded to content creators, but I wanted to repost here to keep q’s about W2D1 all in the same place.

Question about W2D1_T3 Ex. 6: Should we be summing “marginalization_array” over axis=1, not axis=0, and thus summing over x? If correct, this would also require changing 1st input when plotting “marginal”. If that’s wrong, could someone explain why p(x_hat) should peak close to 0, as shown in the output for Ex. 6? Thanks

Regarding W2D1T3, in the video and explanations, it seems like the matrix should have each row representing encoding position (x_tilde) and each column representing the stimulus location (X), but it is flipped for most of the matrices we built with the code.

Furthermore, the flip doesn’t seem to be consistently correct. Such as for the prior matrix, since the notebook is using “hypothetical true position” as X. then, the high probability stripe should probably be horizontal instead of vertical, if we are using the flipped labels.

Also second some of the questions @elena_zamfir mentioned.

1 Like

:)) yes, there doesn’t seem to be a consistent way to imbue it with meaning… :stuck_out_tongue:

Not sure if this is helpful since I am as confused as you are, but I feel for the last point that you mentioned (changing from x_tilde to x_hat) is probably because we are estimating the optimal participants response (x_hat) from their brain encoding (x_tilde). Therefore, after we take the optimization (take the mean here), the x_tilde become x_hat, which is our estimate of participants response.

So the coefficient would be necessary if we were instead integrating the Gaussian from -inf to inf, but in this case it is not, because we normalize manually. Got it. I thought this was the case, but wanted to be sure in case my students take a look at the solutions and wonder why it was left out. Thanks!

that’s indeed what we are doing, setting x_hat as the mean of the posterior p(x|x_tilde), but then the variables in the matrix should be x_tilde and x_hat. indeed p(x_hat|x_tilde) has all the weight on one entry, the mean of p(x|x_tilde). but we haven’t yet computed p(x_hat|x)

1 Like

exactly, the stripe should be horizontal for the prior, if we’re consistent, and given the way matrixes are multiplied to get the posterior

actually… you can use the coefficient term for normalisation, but you need to *dt as well. And I think that’s the “proper” way of normalising the Gaussian, as in the solution he said /sum() is proper only if we include similar number of points on both sides of mu.

Thanks! Yes, that makes perfect sense. I had noticed before that pre- and post-normalization, the curve was scaled by a factor of 0.1. Now I realize that this is just the dt that was used in the arange function.

W2D1 tutorial 3, section 5 onwards:
can someone explain conceptually why we need to further apply a gaussian over the true stimulus on each given trial? Do we think of this just as noise in the stimulus itself?

1 Like

Yes! Nice catch. Will try to push out a patch for this shortly.

I think you are right and the ax-labels need to be swapped in the second one