Learning Parameters: Case Study (cont.)
Comparing two distributions: P(x) (the true model) vs. Q(x) (the learned distribution) -- measure their KL divergence
- KL(P||Q) = Σ_x P(x) log (P(x) / Q(x))
- A KL divergence of 1 (when logs are in base 2) means:
- The probability Q assigns to an instance will be, on average, half the probability P assigns to it
- KL(P||Q) ≥ 0
- KL(P||Q) = 0 iff P and Q are equal (both properties are checked numerically in the sketch below)
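To make the definition concrete, here is a minimal sketch of the computation; Python with NumPy and the function name `kl_divergence` are illustrative choices, not part of the original slide. It evaluates KL(P||Q) in bits (base-2 logs) for two discrete distributions and verifies the two properties above.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P||Q) in bits for discrete distributions given as probability arrays.

    Assumes Q(x) > 0 wherever P(x) > 0; otherwise the divergence is infinite.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Sum only over outcomes with P(x) > 0; terms with P(x) = 0 contribute 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Example: P is the true model, Q a learned approximation.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]
print(kl_divergence(p, q))  # 0.25 bits: positive, since P != Q
print(kl_divergence(p, p))  # 0.0: KL(P||Q) = 0 iff P and Q are equal
```

Note that KL(P||Q) is asymmetric: swapping p and q in the call above generally gives a different value, which is why the direction (true model P first, learned model Q second) matters when evaluating a learned distribution.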