Learning Parameters: Case Study (cont.)
Comparing two distributions: P(x) (the true model) vs. Q(x) (the learned distribution) -- measure their KL divergence
- KL(P||Q) = Σ_x P(x) log (P(x) / Q(x))
- A KL divergence of 1 (when logs are in base 2) means:
- The probability Q assigns to an instance will be, on average, half the probability P assigns to it
- KL(P||Q) ≥ 0
- KL(P||Q) = 0 iff P and Q are equal (both properties are checked numerically in the sketch below)
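To make the definition concrete, here is a minimal sketch of the computation; Python with NumPy and the function name `kl_divergence` are illustrative choices, not part of the original slide. It evaluates KL(P||Q) in bits (base-2 logs) for two discrete distributions and verifies the two properties above.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P||Q) in bits for discrete distributions given as probability arrays.

    Assumes Q(x) > 0 wherever P(x) > 0; otherwise the divergence is infinite.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Sum only over outcomes with P(x) > 0; terms with P(x) = 0 contribute 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Example: P is the true model, Q a learned approximation.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]
print(kl_divergence(p, q))  # 0.25 bits: positive, since P != Q
print(kl_divergence(p, p))  # 0.0: KL(P||Q) = 0 iff P and Q are equal
```

Note that KL(P||Q) is asymmetric: swapping p and q in the call above generally gives a different value, which is why the direction (true model P first, learned model Q second) matters when evaluating a learned distribution.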