However, when I run entropy on
lmf = [1/2, 1/3, 1/6] I get entropy(lmf) = 1.45914,
but when I do entropy([1/2, 1/2]) + 1/2 * entropy([1/3, 1/6]) I get 1.47957.

Are my calculations wrong or is there something else going on?

Your calculations seem fine, but there’s a typo (I think) in the problem statement.

It should be (I think)

H(1/2,1/3,1/6) = H(1/2,1/2) + 0.5*H(2/3,2/6)

You should check that this works; don’t just take my word for it.
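Here is one way to run that check, as a rough sketch in Python. The `entropy` helper below is my own stand-in (Shannon entropy in bits, with no renormalization of its input); it is an assumption, not necessarily the function you are using.

```python
import math

def entropy(probs):
    """Shannon entropy in bits. Assumes the caller passes probabilities
    as-is; nothing is renormalized here."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Left-hand side: entropy of the full three-outcome distribution.
lhs = entropy([1/2, 1/3, 1/6])

# Right-hand side with the corrected conditional probabilities
# (2/6 is written as 1/3 here).
rhs = entropy([1/2, 1/2]) + 0.5 * entropy([2/3, 1/3])

# The version from the quoted text, where the second argument sums to 1/2.
uncorrected = entropy([1/2, 1/2]) + 0.5 * entropy([1/3, 1/6])

print(round(lhs, 5), round(rhs, 5), round(uncorrected, 5))
```

The first two printed values should agree. The third uses the pair from the quoted text, whose entries sum to 1/2 rather than 1, which is why it comes out different.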

The typo is that the "1"s in 1/3 and 1/6 in the text you quoted should be "2"s, giving 2/3 and 2/6.

That’s because, in the 1/2 of the time that we do not sample the first point, the probabilities of getting the other two points are 2/3 and 2/6 (= 1/3), whereas across all samples (including the half where we do get the first point) those probabilities are 1/3 and 1/6. The conditional probabilities (the probabilities of the second and third points, given that we don’t sample the first point) need to sum to one for this to be a probability distribution. Does that make sense?
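If it helps, the renormalization can be done with exact fractions. This is just an illustrative sketch; the variable names (`p`, `p_rest`, `conditional`) are mine.

```python
from fractions import Fraction

# The full distribution over the three points.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

# Probability of NOT sampling the first point.
p_rest = 1 - p[0]

# Condition on that event: divide the remaining probabilities by p_rest
# so that they form a proper distribution.
conditional = [q / p_rest for q in p[1:]]

print(conditional)       # the renormalized probabilities
print(sum(conditional))  # sums to one if the renormalization is right
```

Dividing by `p_rest` is what turns 1/3 and 1/6 into 2/3 and 2/6.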