Learning Bayesian Networks from Data
8/11/98
Click here to start
Authors:
Nir Friedman, UC Berkeley
Moises Goldszmidt, SRI International
Table of Contents
Learning Bayesian Networks from Data
Outline
Learning (in this context)
Why learning?
Why learn a Bayesian network?
What will I get out of this tutorial?
Outline
Probability 101
Representing the Uncertainty in a Domain
Probabilistic Independence: a Key for Representation and Reasoning
Probabilistic Independence: a Key for Representation and Reasoning
Bayesian Networks
Bayesian Networks
Bayesian Network Semantics
Monitoring Intensive-Care Patients
Qualitative part
d-separation
Example
I-Equivalent Bayesian Networks
Quantitative Part
What Can We Do with Bayesian Networks?
Bayesian Networks: Summary
Learning Bayesian networks (reminder)
The Learning Problem
Learning Problem
Learning Problem
Learning Problem
Learning Problem
Outline
Example: Binomial Experiment (Statistics 101)
Statistical parameter fitting
The Likelihood Function
Sufficient Statistics
Maximum Likelihood Estimation
Maximum Likelihood Estimation (Cont.)
Example: MLE in Binomial Data
Learning Parameters for the Burglary Story
General Bayesian Networks
General Bayesian Networks (Cont.)
From Binomial to Multinomial
Likelihood for Multinomial Networks
Is MLE all we need?
Bayesian Inference
Bayesian Inference (cont.)
Bayesian Inference (cont.)
Bayesian Inference (cont.)
Example: Binomial Data Revisited
Bayesian Inference and MLE
Dirichlet Priors
Dirichlet Priors (cont.)
Priors Intuition
Effect of Priors
Effect of Priors (cont.)
Conjugate Families
Bayesian Networks and Bayesian Prediction
Bayesian Networks and Bayesian Prediction (Cont.)
Bayesian Prediction(cont.)
Bayesian Prediction(cont.)
Assessing Priors for Bayesian Networks
Learning Parameters: Case Study (cont.)
Learning Parameters: Case Study (cont.)
Learning Parameters: Case Study (cont.)
Learning Parameters: Summary
Outline
Incomplete Data
Missing Values
Missing Values (cont.)
Missing Values (cont.)
Hidden (Latent) Variables
Hidden Variables (cont.)
Learning Parameters from Incomplete Data
Example
Learning Parameters from Incomplete Data (cont.).
MLE from Incomplete Data
Gradient Ascent
Expectation Maximization (EM)
EM (cont.)
EM (cont.)
Example: EM in clustering
EM in Practice
Error on training set (Alarm)
Test set error (alarm)
Parameter value (Alarm)
Parameter value (Alarm)
Parameter value (Alarm)
Bayesian Inference with Incomplete Data
MAP Approximation
Stochastic Approximations
Stochastic Approximations (cont.)
Stochastic Approximations: Gibbs Sampling
Parameter Learning from Incomplete Data: Summary
Outline
Benefits of Learning Structure
Why Struggle for Accurate Structure
Approaches to Learning Structure
Constraints versus Scores
Likelihood Score for Structures
Likelihood Score for Structure (cont.)
Likelihood Score for Structure (cont.)
Avoiding Overfitting
Avoiding Overfitting (cont..)
Minimum Description Length
Minimum Description Length (cont.)
Minimum Description: Complexity Penalization
Minimum Description: Example
Minimum Description: Example (cont.)
Consistency of the MDL Score
Bayesian Inference
Marginal Likelihood: Binomial case
Marginal Likelihood: Binomials (cont.)
Binomial Likelihood: Example
Marginal Likelihood: Example (cont.)
Marginal Likelihood: Multinomials
Marginal Likelihood: Bayesian Networks
Marginal Likelihood (cont.)
Priors and BDe score
Bayesian Score: Asymptotic Behavior
Bayesian Score: Asymptotic Behavior
Scores -- Summary
Outline
Optimization Problem
Learning Trees
Learning Trees (cont.)
Learning Trees (cont)
Learning Trees: Example
Beyond Trees
Heuristic Search
Heuristic Search (cont.)
Exploiting Decomposability in Local Search
Greedy Hill-Climbing
Greedy Hill-Climbing (cont.)
Greedy Hill-Climbing (cont.)
Greedy Hill-Climbing
Other Local Search Heuristics
I-Equivalence Class Search
I-Equivalence Class Search (cont.)
Search and Statistics
Learning in Practice: Time & Statistics
Learning in Practice: Alarm domain
Model Averaging
Model Averaging (cont.)
Model Averaging (cont.)
Search: Summary
Outline
Local and Global Structure
Local structure: Decision trees
Learning decision trees
Effects on learning
Local Structure ? More Accurate Global Structure
Local structure: Noisy Or
Local structure: Noise-Or decomposition
Other Types of Local Structure
Outline
The Classification Problem
Examples
Approaches
Generative Models
Optimality of the decision rule Minimizing the error rate...
Advantages of the Generative Model Approach
Advantages of Using a Bayesian Network
The Naive Bayesian Classifier
The Naive Bayesian Classifier (cont.)
Improving Naive Bayes
Tree Augmented Naive Bayes (TAN)
Evaluating the performance of a classifier: n-fold cross validation
Performance: TAN vs. Naive Bayes
Performance: TAN vs C4.5
Beyond TAN
Performance: TAN vs. Bayesian Networks
What is the problem?
Classification: Summary
Outline
Learning Causal Relations (Thanks to David Heckerman and Peter Spirtes for the slides)
Causal Discovery by Experiment
What is 'Cause' Anyway?
Probabilistic vs. Causal Models
To Predict the Effects of Actions: Modify the Causal Graph
Causal Model
Ideal Interventions
How Can We Learn Cause and Effect from Observational Data?
Learning Cause from Observations: Constraint-Based Approach
Causal Markov assumption
Faithfulness
Other assumptions
All models under consideration are causal
Learning Cause from Observations: Constraint-based method
Learning Cause from Observations: Constraint-based method
Learning Cause from Observations: Constraint-based method
Learning Cause from Observations: Constraint-based method
Learning Cause from Observations: Constraint-Based Method
Cannot Always Learn Cause
But with four (or more) variables . . .
Constraint-Based Approach
The Bayesian Approach
The Bayesian approach
Assumptions
Definition of Model Hypothesis G
Faithfulness
Causes of publishing productivity Rodgers and Maranto 1989
Causes of publishing productivity
Results of Greedy Search...
Other Models
Bayesian Model Averaging
Challenges for the Bayesian Approach
Benefits of the Two Approaches
Summary
Outline
Learning Structure for Incomplete Data
Incomplete Data : Structure Scores
Incomplete Data : Structure Scores (cont.)
PPT Slide
PPT Slide
Problem
PPT Slide
Structural EM
Structural EM
Expected scores
How do we choose Q(H)?
Structural EM for MDL
PPT Slide
Structural EM in Practice
The Structural EM Procedure
Structural EM: Convergence Properties
Learning Structure from Incomplete Data: Summary
Outline
Summary: Learning Bayesian Networks
Untouched issues
Untouched Issues (Cont.)
Some Applications
Systems
Systems (Cont.)
Current Topics
Perspective: What's Old and What's New
The Future...
Many thanks to...