Part 4/8:
Mathematically Defining Surprisal
The surprisal of an event \(e\) is defined as the negative logarithm of its probability:
\[ I(e) = -\log_2 P(e) \]
In information theory, a higher probability corresponds to lower surprisal: the more probable an event, the less surprising it is. Taking the logarithm base 2 measures surprisal in bits and turns multiplicative probabilities into additive quantities, which makes it possible to quantify varying degrees of uncertainty on a common scale.
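As a minimal sketch, the definition translates directly into code. The function name `surprisal_bits` is a hypothetical helper introduced here for illustration, not something from the text:

```python
import math

def surprisal_bits(p: float) -> float:
    """Surprisal I(e) = -log2(P(e)) of an event with probability p, in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)
```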
For example, when flipping a single fair coin, the probability of heads is 1/2, so the calculation yields:
\[ I(\text{heads}) = -\log_2\left(\tfrac{1}{2}\right) = 1 \text{ bit} \]
When flipping two coins, the probability of correctly predicting both outcomes is \(1/2 \times 1/2 = 1/4\), so the surprisal is \(-\log_2(1/4) = 2\) bits: each independent flip contributes one bit.
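Evaluated on the two coin examples, the hypothetical `surprisal_bits` helper sketched above reproduces both values:

```python
# One fair coin: P(heads) = 1/2 gives 1 bit of surprisal
print(surprisal_bits(1 / 2))  # 1.0

# Two fair coins: P(predicting both flips) = 1/4 gives 2 bits
print(surprisal_bits(1 / 4))  # 2.0
```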