What is the problem?
Objective function
- Learning of arbitrary Bayesian networks optimizes the joint likelihood P(C, F1,...,Fn)
- It may learn a network that models the feature marginals P(Fi,...,Fj) well but does a poor job on P(C | F1,...,Fn) (given enough data this is no problem, since the true joint would be recovered — but data is rarely enough)
- We want to optimize classification accuracy or at least the conditional likelihood P(C | F1,...,Fn)
- Scores based on this likelihood do not decompose → learning is computationally expensive!
- There is controversy as to the correct form for these scores
Naive Bayes, TAN, etc. circumvent the problem by forcing a structure in which all features are connected to the class
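To make the gap between the two objectives concrete, here is a minimal sketch (toy data and helper names are hypothetical) that fits a Naive Bayes model and evaluates both the generative score, the sum of log P(C, F1,...,Fn), and the discriminative score, the sum of log P(C | F1,...,Fn), on the same data:

```python
import math

# Hypothetical toy dataset: each row is (class c, binary features f1, f2).
data = [(0, 0, 0), (0, 0, 1), (0, 1, 0),
        (1, 1, 1), (1, 1, 0), (1, 0, 1)]

def nb_params(rows, alpha=1.0):
    """Estimate Naive Bayes parameters with Laplace smoothing."""
    classes = {0, 1}
    prior = {c: (sum(r[0] == c for r in rows) + alpha) / (len(rows) + 2 * alpha)
             for c in classes}
    # cond[c][i] = P(F_i = 1 | C = c)
    cond = {c: [(sum(r[i + 1] == 1 for r in rows if r[0] == c) + alpha)
                / (sum(r[0] == c for r in rows) + 2 * alpha)
                for i in range(2)]
            for c in classes}
    return prior, cond

def log_joint(prior, cond, c, feats):
    """log P(C = c, F = feats) under the Naive Bayes factorisation."""
    lp = math.log(prior[c])
    for i, f in enumerate(feats):
        p = cond[c][i] if f == 1 else 1.0 - cond[c][i]
        lp += math.log(p)
    return lp

def log_conditional(prior, cond, c, feats):
    """log P(C = c | F = feats): joint minus log-marginal over all classes."""
    joints = {k: log_joint(prior, cond, k, feats) for k in prior}
    log_marginal = math.log(sum(math.exp(v) for v in joints.values()))
    return joints[c] - log_marginal

prior, cond = nb_params(data)
# Generative objective: sum of log P(c, f) over the data.
gen = sum(log_joint(prior, cond, c, (f1, f2)) for c, f1, f2 in data)
# Discriminative objective: sum of log P(c | f) over the data.
dis = sum(log_conditional(prior, cond, c, (f1, f2)) for c, f1, f2 in data)
print("joint log-likelihood:", gen)
print("conditional log-likelihood:", dis)
```

A structure search guided by the first sum can look good while the second sum, the one classification actually cares about, stays poor; and because the conditional score divides by the marginal P(F1,...,Fn), it does not decompose into per-family terms the way the joint score does.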