However, when I run entropy on
lmf = [1/2, 1/3, 1/6], I get entropy(lmf) = 1.45914.
When I do entropy([1/2, 1/2]) + 1/2 * entropy([1/3, 1/6]), I get 1.47957.
Are my calculations wrong or is there something else going on?
Your calculations seem fine, but there’s a typo (I think) in the problem statement.
It should be (I think):
H(1/2,1/3,1/6) = H(1/2,1/2) + 0.5*H(2/3,2/6)
You should check that this works; don’t just take my word for it.
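One quick way to check it is a few lines of Python. This is a minimal sketch that assumes, like the entropy function in the question, that entropy is in bits and does not renormalize its input (it just sums -p*log2(p)). The numbers in the question are consistent with that assumption. The function below is a hypothetical stand-in, not the asker's actual code.

```python
import math

def entropy(probs):
    """Shannon entropy in bits. Assumption: the input is NOT renormalized,
    so this simply sums -p * log2(p) over the given probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

lhs = entropy([1/2, 1/3, 1/6])                              # about 1.45915
asked = entropy([1/2, 1/2]) + 1/2 * entropy([1/3, 1/6])     # about 1.47957 (as quoted)
fixed = entropy([1/2, 1/2]) + 1/2 * entropy([2/3, 2/6])     # about 1.45915 (corrected)

print(f"{lhs:.5f} {asked:.5f} {fixed:.5f}")
print(math.isclose(lhs, fixed))   # True: corrected identity holds
print(math.isclose(lhs, asked))   # False: quoted version does not
```

Note that [1/3, 1/6] sums to 1/2, not 1, which is why a non-normalizing entropy gives a different answer for the quoted version.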
The typo is in the numerators: it should read 2/3 and 2/6, with "2"s where the text you quoted has "1"s (1/3 and 1/6).
That’s because, in the 1/2 of the time we do not sample the first point, the probabilities of getting the other two points are 2/3 and 2/6 (that is, 1/3 and 1/6 each divided by the 1/2 chance of not getting the first point). Over all samples, including the ones where we do get the first point (which happens 1/2 the time), those probabilities are 1/3 and 1/6. The conditional probabilities (i.e., the probabilities of the second and third points, given that we don’t sample the first point) need to sum to one for them to form a probability distribution. Does that make sense?
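The same reasoning works for any distribution, not just [1/2, 1/3, 1/6]. The sketch below is a hypothetical helper (grouped_entropy is not from the original post) that splits "first outcome vs. the rest", rescales the remaining probabilities by the probability of "the rest" so they sum to one, and checks that the weighted sum matches the direct entropy. It assumes entropy in bits and that the first probability is strictly less than 1 (otherwise there is no "rest" to condition on).

```python
import math

def entropy(probs):
    """Shannon entropy in bits (no renormalization of the input)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def grouped_entropy(probs):
    """Hypothetical helper: H(probs) computed in two stages.
    Stage 1: did we get the first outcome or not?
    Stage 2 (only when we did not): which of the remaining outcomes?
    Assumes probs[0] < 1 so the division below is well defined."""
    p_first = probs[0]
    p_rest = 1 - p_first
    # Conditional probabilities given we did NOT get the first outcome;
    # dividing by p_rest makes them sum to one (1/3 -> 2/3, 1/6 -> 2/6 here).
    conditional = [p / p_rest for p in probs[1:]]
    return entropy([p_first, p_rest]) + p_rest * entropy(conditional)

p = [1/2, 1/3, 1/6]
print([q / (1 - p[0]) for q in p[1:]])               # approximately [2/3, 1/3]
print(math.isclose(grouped_entropy(p), entropy(p)))  # True
```

The rescaling step is the whole point: without it, the second-stage "distribution" sums to p_rest instead of one, and the identity breaks.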