Derivation for marginalization in W2D1 Tutorial 3

Hey everyone,

If you are still curious about the relationships between x, \widetilde{x} and \hat{x}, or are wondering how to arrive at the marginalization formula at the end of Tutorial 3 for W2D1, I wrote up a derivation based on steps suggested by Roman Pogodin and thought I’d share it here. Feel free to point out any errors!

We’re interested in the relationships between the true stimulus position x, the latent representation in the brain \widetilde{x} and the produced response \hat{x}.

Recall the definition of conditional probability:

P(A|B) = \frac{P(A,B)}{P(B)}
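To make the definition concrete, here’s a tiny Python/NumPy sketch with a made-up 2x2 joint table (the numbers are arbitrary and not from the tutorial). Dividing each column of P(A,B) by P(B) gives P(A|B):

```python
import numpy as np

# Made-up joint distribution P(A, B): rows index A, columns index B.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

# Marginal P(B): sum the joint over A (the rows).
p_B = joint.sum(axis=0)

# Conditional P(A | B) = P(A, B) / P(B); broadcasting divides each column by P(B).
p_A_given_B = joint / p_B

# Each column of P(A | B) is a distribution over A, so it sums to 1.
print(p_A_given_B.sum(axis=0))
```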

The objective of this exercise is to derive a formula for p(\hat{x}|x), which describes how responses are distributed given a stimulus at a particular position:

p(\hat{x}|x) = ?

From the definition of conditional probabilities, we have:

p(\hat{x}|x) = \frac{p(\hat{x},x)}{p(x)}

Now let’s use what we know about how \widetilde{x} relates to these variables, and write the joint probability p(\hat{x},x) by marginalizing over \widetilde{x}. Using the definition of marginalization, and noting that p(x) does not depend on \widetilde{x} (so the denominator can be moved inside the integral):

p(\hat{x}|x) = \frac{\int_{\widetilde{x}} p(\hat{x},x|\widetilde{x})p(\widetilde{x}) d\widetilde{x}}{p(x)} = \int_{\widetilde{x}}\frac{ p(\hat{x},x|\widetilde{x})p(\widetilde{x})} {p(x)}d\widetilde{x}
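If it helps to see this numerically: on a discrete grid the integral becomes a sum over \widetilde{x}. The sketch below (Python/NumPy, with a random made-up joint distribution rather than the tutorial’s model) checks that marginalizing out \widetilde{x} recovers p(\hat{x},x), and that dividing by p(x) inside or outside the sum gives the same p(\hat{x}|x):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up discrete joint p(x, x_tilde, x_hat) on small grids.
# Axis 0: x (stimulus), axis 1: x_tilde (latent), axis 2: x_hat (response).
joint = rng.random((4, 5, 3))
joint /= joint.sum()

p_xt = joint.sum(axis=(0, 2))  # p(x_tilde)
p_x = joint.sum(axis=(1, 2))   # p(x)

# p(x_hat, x | x_tilde) = p(x, x_tilde, x_hat) / p(x_tilde)
p_xh_x_given_xt = joint / p_xt[None, :, None]

# Marginalize over x_tilde: sum of p(x_hat, x | x_tilde) p(x_tilde) gives p(x, x_hat).
p_x_xh = (p_xh_x_given_xt * p_xt[None, :, None]).sum(axis=1)  # shape (x, x_hat)

# Divide by p(x) outside the sum...
cond_outside = p_x_xh / p_x[:, None]
# ...or inside it; p(x) does not depend on x_tilde, so both agree.
cond_inside = (p_xh_x_given_xt * p_xt[None, :, None] / p_x[:, None, None]).sum(axis=1)

print(np.allclose(cond_outside, cond_inside))
print(np.allclose(cond_outside.sum(axis=1), 1.0))  # each row is p(x_hat | x)
```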

Keep the integrand on the right in mind, because we will see it again. Now let’s try to rewrite it in terms of something more approachable, preferably quantities we already defined in the previous exercises. We’ll start by writing down a few identities, including the full joint distribution over all three variables. From the definition of conditional probability, note that

p(\hat{x},x|\widetilde{x}) = \frac{p(x,\widetilde{x},\hat{x})}{ p(\widetilde{x})}

Save this for later; we need a few more pieces first. Separately, it also follows from the definition of conditional probability that

p(x,\widetilde{x},\hat{x}) = p(\hat{x} |\widetilde{x},x) p(x,\widetilde{x})

Since p(x,\widetilde{x}) = p(\widetilde{x}|x)p(x), this is the same as

p(x,\widetilde{x},\hat{x}) = p(\hat{x}|\widetilde{x},x)p(\widetilde{x}|x)p(x)

Here we assume that once \widetilde{x} is known, x carries no additional information about \hat{x}; that is, \hat{x} is conditionally independent of x given \widetilde{x}, so p(\hat{x}|\widetilde{x},x) = p(\hat{x}|\widetilde{x}). This comes from our generative model, in which behavior is generated entirely from the latent variable \widetilde{x} inside the brain. Therefore, we can simplify the equation to:

p(x,\widetilde{x},\hat{x}) = p(\hat{x}|\widetilde{x})p(\widetilde{x}|x)p(x)
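Here’s a small sanity check of what that conditional-independence step means for the tables, again a sketch with random made-up distributions: build the joint from the factorization p(x)p(\widetilde{x}|x)p(\hat{x}|\widetilde{x}) and confirm that p(\hat{x}|\widetilde{x},x) comes out identical for every x. The helper random_rows and the grid sizes are hypothetical, just for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_xt, n_xh = 4, 5, 3  # made-up grid sizes for x, x_tilde, x_hat

def random_rows(n_rows, n_cols):
    """Hypothetical helper: random table whose rows are probability distributions."""
    table = rng.random((n_rows, n_cols))
    return table / table.sum(axis=1, keepdims=True)

p_x = random_rows(1, n_x)[0]             # p(x)
p_xt_given_x = random_rows(n_x, n_xt)    # p(x_tilde | x), rows indexed by x
p_xh_given_xt = random_rows(n_xt, n_xh)  # p(x_hat | x_tilde), rows indexed by x_tilde

# Joint from the generative model: p(x) p(x_tilde | x) p(x_hat | x_tilde).
# Axes: 0 = x, 1 = x_tilde, 2 = x_hat.
joint = p_x[:, None, None] * p_xt_given_x[:, :, None] * p_xh_given_xt[None, :, :]

# p(x_hat | x_tilde, x) = p(x, x_tilde, x_hat) / p(x, x_tilde)
p_x_xt = joint.sum(axis=2)
p_xh_given_xt_x = joint / p_x_xt[:, :, None]

# Same table for every x: knowing x adds nothing once x_tilde is known.
print(np.allclose(p_xh_given_xt_x, p_xh_given_xt[None, :, :]))
```

The independence holds here by construction, since the joint was built from that factorization; the check just shows what the assumption says about the numbers.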

Now combine this with the identity we saved earlier, which rearranges to p(x,\widetilde{x},\hat{x}) = p(\hat{x},x|\widetilde{x})p(\widetilde{x}):

p(\hat{x}|\widetilde{x})p(\widetilde{x}|x)p(x) = p(\hat{x},x|\widetilde{x})p(\widetilde{x})

Dividing both sides by p(x):

p(\hat{x}|\widetilde{x})p(\widetilde{x}|x) = \frac{p(\hat{x},x|\widetilde{x})p(\widetilde{x})}{p(x)}

The right-hand side is exactly the integrand from before!
So we can rewrite the integral as:

p(\hat{x}|x) = \int_{\widetilde{x}} p(\hat{x}|\widetilde{x})p(\widetilde{x}|x) d\widetilde{x}
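For anyone who wants to double-check the last two steps numerically, here’s a sketch with random made-up tables (not the tutorial’s model; random_rows is a hypothetical helper). It verifies the integrand identity element by element, then checks that summing over \widetilde{x} reproduces p(\hat{x}|x) computed directly from the joint:

```python
import numpy as np

rng = np.random.default_rng(2)
n_x, n_xt, n_xh = 4, 5, 3  # made-up grid sizes for x, x_tilde, x_hat

def random_rows(n_rows, n_cols):
    """Hypothetical helper: random table whose rows are probability distributions."""
    table = rng.random((n_rows, n_cols))
    return table / table.sum(axis=1, keepdims=True)

p_x = random_rows(1, n_x)[0]             # p(x)
p_xt_given_x = random_rows(n_x, n_xt)    # p(x_tilde | x)
p_xh_given_xt = random_rows(n_xt, n_xh)  # p(x_hat | x_tilde)

# Full joint p(x, x_tilde, x_hat) under the generative model (axes: x, x_tilde, x_hat).
joint = p_x[:, None, None] * p_xt_given_x[:, :, None] * p_xh_given_xt[None, :, :]

# p(x_hat | x) computed directly from the joint.
p_xh_given_x_direct = joint.sum(axis=1) / p_x[:, None]

# Integrand identity: p(x_hat|x_tilde) p(x_tilde|x) == p(x_hat, x | x_tilde) p(x_tilde) / p(x)
p_xt = joint.sum(axis=(0, 2))
lhs = p_xt_given_x[:, :, None] * p_xh_given_xt[None, :, :]
rhs = (joint / p_xt[None, :, None]) * p_xt[None, :, None] / p_x[:, None, None]
print(np.allclose(lhs, rhs))

# Final formula: summing the integrand over x_tilde gives p(x_hat | x).
p_xh_given_x = lhs.sum(axis=1)
print(np.allclose(p_xh_given_x, p_xh_given_x_direct))
```

Note that lhs.sum(axis=1) is the same as the matrix product p_xt_given_x @ p_xh_given_xt.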

So this reduces the problem to quantities we already have access to: p(\hat{x}|\widetilde{x}) is the relationship between the internal estimate and the produced response, which we determined to be the binary response function in the exercise. Similarly, p(\widetilde{x}|x) is the distribution of the internal estimate given a stimulus position, which is the input matrix we also defined in the exercise. Now it’s all about how best to compute these quantities numerically and integrate them!
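As a rough idea of the numerical part, here’s a Python/NumPy sketch on a discretized grid. The grid, the noise level sigma and the decision rule (shrinking \widetilde{x} halfway toward 0) are all made-up assumptions, not the tutorial’s actual settings. With rows indexed by the conditioning variable, the integral over \widetilde{x} becomes a matrix product of the input matrix and the decision matrix:

```python
import numpy as np

# Hypothetical discretized position grid (same grid for x, x_tilde and x_hat).
grid = np.linspace(-8, 8, 161)
sigma = 1.0  # assumed sensory noise level

# Input matrix: row i is p(x_tilde | x = grid[i]), a Gaussian centred on x,
# renormalized over the grid so each row sums to 1.
input_matrix = np.exp(-0.5 * ((grid[None, :] - grid[:, None]) / sigma) ** 2)
input_matrix /= input_matrix.sum(axis=1, keepdims=True)

# Decision matrix: row j is p(x_hat | x_tilde = grid[j]). For a deterministic
# (binary) response each row is one-hot. The rule here, shrinking x_tilde
# halfway toward 0, is only a placeholder for whatever estimator the model uses.
estimates = 0.5 * grid
nearest = np.abs(grid[None, :] - estimates[:, None]).argmin(axis=1)
decision_matrix = np.zeros((grid.size, grid.size))
decision_matrix[np.arange(grid.size), nearest] = 1.0

# Marginalization: p(x_hat | x) = sum over x_tilde of p(x_hat | x_tilde) p(x_tilde | x),
# which on a grid is a matrix product.
marginal = input_matrix @ decision_matrix

print(np.allclose(marginal.sum(axis=1), 1.0))  # each row is a distribution over x_hat
```

The matrix product is just a compact way of writing the sum over \widetilde{x}; an explicit loop that multiplies and accumulates over \widetilde{x} would give the same result.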


Thanks so much! I was doing the derivation and got f(x_tilde) = p(x_hat | x, x_tilde), then couldn’t figure out how that equals the decision array, which is p(x_hat | x_tilde)… Your explanation helps a lot!!
