Mutual information is the idea that learning something about one variable might tell you something about another. For example, learning that it’s daytime might give you information about whether the sun is shining. It could still be cloudy, but you can be more sure that it’s sunny than before you learned it was daytime.
Mathematically, mutual information is represented using the concept of entropy. The information gained about a variable X, assuming you learn Y, is given by: I(X;Y) = H(X) - H(X|Y)
Here H(.) is the entropy, given by H(X) = \sum_x p(x) \log_2(\frac{1}{p(x)}), and H(X|Y) is the conditional entropy: the entropy of X that remains once Y is known, averaged over the possible values of Y.
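As a rough sketch of the entropy formula above, here is one way to write it in Python. The function name `entropy` and the two example distributions are my own choices for illustration, not part of the original argument.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: sum over x of p(x) * log2(1 / p(x))."""
    # Outcomes with p == 0 are skipped; they contribute nothing to the sum.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

fair_coin = entropy([0.5, 0.5])  # a fair coin carries exactly one bit
certain = entropy([1.0])         # a certain outcome carries no information
```

The base-2 logarithm means the result is measured in bits.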
Mutual information is symmetric (I(X;Y) = I(Y;X)), but I’m interested in how that works in a causal context, where the influence only runs one way.
Let’s say you have a lightbulb that can be turned on from either of two light switches. If either light switch is on, then the bulb is on. Learning that one light switch is on tells you the bulb is on, but learning that the bulb is on does *not* tell you that one specific light switch is on. It tells you that at least one is on (but not which one).
Let’s assume for the sake of argument that each light switch has a probability p(on) = 0.25 of being turned on (and so a probability p(off) = 0.75 of being off). Assume also that they’re independent.
The entropy of switch one is
H(S1) = p(on)\log_2(\frac{1}{p(on)}) + p(off)\log_2(\frac{1}{p(off)})
H(S1) = 1/4 * \log_2(4) + 3/4 * \log_2(\frac{4}{3})
H(S1) = 0.811
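Since each switch has a probability of 0.25 of being on, and they’re independent, the bulb is off only when both switches are off, which happens with probability 3/4 * 3/4 = 9/16. The bulb itself therefore has a probability of 7/16 of being on.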
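If you'd rather check the 7/16 by brute force, a small sketch like the following enumerates all four switch combinations. The names `P_ON`, `switch_prob` and `p_bulb_on` are just illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

P_ON = Fraction(1, 4)  # per-switch probability of being on, as assumed in the text

def switch_prob(state):
    """Probability that a single switch is in the given state (True = on)."""
    return P_ON if state else 1 - P_ON

# Add up the probability of every (s1, s2) combination that lights the bulb.
p_bulb_on = sum(
    switch_prob(s1) * switch_prob(s2)
    for s1, s2 in product([True, False], repeat=2)
    if s1 or s2
)
print(p_bulb_on)  # 7/16
```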
The entropy of the bulb is
H(B) = p(on)\log_2(\frac{1}{p(on)}) + p(off)\log_2(\frac{1}{p(off)})
H(B) = 7/16 * \log_2(\frac{16}{7}) + 9/16 * \log_2(\frac{16}{9})
H(B) = 0.989
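Both entropy values can be checked numerically. This is only a sanity check under the same assumptions as above; `binary_entropy` is a helper name I made up for a two-outcome variable.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-outcome variable with P(on) = p."""
    return sum(q * math.log2(1 / q) for q in (p, 1 - p) if q > 0)

h_s1 = binary_entropy(1 / 4)   # one switch
h_b = binary_entropy(7 / 16)   # the bulb
print(f"{h_s1:.3f} {h_b:.3f}")  # 0.811 0.989
```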
If you know switch 1’s state, then the information you have about the bulb is given by
I(B;S1) = H(B) - H(B|S1)
I(B;S1) = H(B) - (3/4*H(B|S1=off) + 1/4*H(B|S1=on))
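Here H(B|S1=on) = 0 because the bulb is certainly on, and H(B|S1=off) = H(S2) = 0.811 because the bulb then simply follows switch 2.
I(B;S1) = 0.989 - (3/4*0.811 + 1/4*0) = 0.380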
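The same calculation can be sketched in code. It assumes the setup from the text (independent switches, each on with probability 0.25); the variable names are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-outcome variable with P(on) = p."""
    return sum(q * math.log2(1 / q) for q in (p, 1 - p) if q > 0)

p_on = 0.25
p_bulb_on = 1 - (1 - p_on) ** 2  # 7/16

# If S1 is on, the bulb is certainly on; if S1 is off, the bulb mirrors S2.
h_b_given_s1_on = binary_entropy(1.0)
h_b_given_s1_off = binary_entropy(p_on)
h_b_given_s1 = p_on * h_b_given_s1_on + (1 - p_on) * h_b_given_s1_off

i_b_s1 = binary_entropy(p_bulb_on) - h_b_given_s1
print(f"{i_b_s1:.3f}")  # 0.380
```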
If instead you know the bulb’s state, then the information you have about switch 1 is given by
I(S1;B) = H(S1) - H(S1|B)
I(S1;B) = H(S1) - (9/16*H(S1|B=off) + 7/16*H(S1|B=on))
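Here H(S1|B=off) = 0 because both switches must be off. If the bulb is on, Bayes’ rule gives p(S1=on|B=on) = (1/4)/(7/16) = 4/7, so H(S1|B=on) = 4/7 * \log_2(\frac{7}{4}) + 3/7 * \log_2(\frac{7}{3}) = 0.985.
I(S1;B) = 0.811 - (9/16*0 + 7/16 * 0.985) = 0.380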
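Going in this direction needs the probability that switch 1 is on given that the bulb is on, which comes from Bayes' rule. Here is a minimal sketch of that step, again with illustrative names and the same assumed probabilities.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-outcome variable with P(on) = p."""
    return sum(q * math.log2(1 / q) for q in (p, 1 - p) if q > 0)

p_on = 0.25
p_bulb_on = 1 - (1 - p_on) ** 2  # 7/16

# Bayes' rule with P(B=on | S1=on) = 1:
# P(S1=on | B=on) = P(S1=on) / P(B=on) = 4/7
p_s1_on_given_b_on = p_on / p_bulb_on

# If the bulb is off, S1 is certainly off, so that branch has zero entropy.
h_s1_given_b = (
    p_bulb_on * binary_entropy(p_s1_on_given_b_on)
    + (1 - p_bulb_on) * binary_entropy(0.0)
)

i_s1_b = binary_entropy(p_on) - h_s1_given_b
print(f"{i_s1_b:.3f}")  # 0.380
```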
So even in a causal case the mutual information is still symmetric.
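One way to see why this has to happen is the equivalent definition I(X;Y) = \sum_{x,y} p(x,y) \log_2(\frac{p(x,y)}{p(x)p(y)}), which treats X and Y identically. The sketch below builds the joint distribution of (S1, B) for this example and evaluates that sum in both orders. The helper `mutual_information` is my own illustration, not a library function.

```python
import math
from itertools import product

P_ON = 0.25  # per-switch probability of being on, as assumed in the text

# Joint distribution over (s1, bulb), built by summing over switch 2.
joint = {}
for s1, s2 in product([0, 1], repeat=2):
    p = (P_ON if s1 else 1 - P_ON) * (P_ON if s2 else 1 - P_ON)
    key = (s1, int(s1 or s2))
    joint[key] = joint.get(key, 0.0) + p

def mutual_information(joint):
    """I(X;Y) in bits from a dict mapping (x, y) to p(x, y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )

# Swapping the roles of the two variables leaves the sum unchanged.
swapped = {(y, x): p for (x, y), p in joint.items()}
mi = mutual_information(joint)
mi_swapped = mutual_information(swapped)
print(f"{mi:.3f} {mi_swapped:.3f}")  # 0.380 0.380
```

Nothing in the formula knows which variable is the cause and which is the effect, so the direction of causation cannot show up in the result.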
What gives me an intuitive sense of this is that if you know S1 is on, you know the bulb is on. Symmetrically, if you know the bulb is off, you know that S1 is off.