Slide 1: Advances in Bayesian Learning: Learning and Inference in Bayesian Networks
Irina Rish, IBM T.J. Watson Research Center, rish@us.ibm.com
Slide 2: “Road map”
- Introduction and motivation: what are Bayesian networks and why use them?
- How to use them: probabilistic inference
- How to learn them: learning parameters, learning graph structure
- Summary
Slide 3: Bayesian Networks
A motivating query over a network of Smoking, lung Cancer, Bronchitis, X-ray, and Dyspnoea:
P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?
Slide 4: Bayesian Networks: Representation
Nodes: Smoking (S), lung Cancer (C), Bronchitis (B), X-ray (X), Dyspnoea (D). Each node stores a conditional probability distribution (CPD) given its parents, e.g., a table for P(D|C,B) with columns D=0 and D=1 indexed by the values of (C, B).
The joint distribution factorizes along the graph:
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
The graph encodes conditional independencies and yields an efficient representation (a sketch in Python follows below).
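To make the factorization concrete, here is a minimal sketch in Python; the graph and the factorization come from the slide, while every probability value is invented for illustration:

```python
from itertools import product

# CPDs of the slide's network; every probability value is invented.
P_S = {1: 0.2, 0: 0.8}                                      # P(S): prior on Smoking
P_C = {1: {1: 0.10, 0: 0.90}, 0: {1: 0.01, 0: 0.99}}        # P(C|S): Cancer given Smoking
P_B = {1: {1: 0.30, 0: 0.70}, 0: {1: 0.05, 0: 0.95}}        # P(B|S): Bronchitis given Smoking
P_X = {(1, 1): {1: 0.9, 0: 0.1}, (1, 0): {1: 0.9, 0: 0.1},
       (0, 1): {1: 0.2, 0: 0.8}, (0, 0): {1: 0.1, 0: 0.9}}  # P(X|C,S), keyed by (c, s)
P_D = {(1, 1): {1: 0.9, 0: 0.1}, (1, 0): {1: 0.7, 0: 0.3},
       (0, 1): {1: 0.6, 0: 0.4}, (0, 0): {1: 0.1, 0: 0.9}}  # P(D|C,B), keyed by (c, b)

def joint(s, c, b, x, d):
    """P(S,C,B,X,D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)."""
    return P_S[s] * P_C[s][c] * P_B[s][b] * P_X[(c, s)][x] * P_D[(c, b)][d]

# Sanity check: the factorized joint sums to 1 over all 2**5 assignments.
assert abs(sum(joint(*v) for v in product((0, 1), repeat=5)) - 1.0) < 1e-9
print(joint(1, 1, 0, 1, 1))
```

Storing five small CPDs (1 + 2 + 2 + 4 + 4 = 13 free parameters) instead of one unstructured table over 2^5 states is exactly the “efficient representation” the slide points to.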
Slide 5: What are they good for?
- Diagnosis: P(cause | symptom) = ?
- Prediction: P(symptom | cause) = ?
- Classification: P(class | data)
- Decision-making (given a cost function)
Application areas: medicine, bio-informatics, computer troubleshooting, stock market, text classification, speech recognition.
Slide 6: Example: Printer Troubleshooting
[Figure: printer-troubleshooting Bayesian network; the diagram is not recoverable from this transcript.]
Slide 7: Bayesian networks: inference
Query: P(X | evidence) = ?, for example P(s | d=1).
Variable elimination pushes sums into the factorization over the “moral” graph:
P(s, d=1) = sum_{c,b,x} P(s) P(c|s) P(b|s) P(x|c,s) P(d=1|c,b)
          = P(s) sum_b P(b|s) sum_c P(c|s) P(d=1|c,b) sum_x P(x|c,s)
Complexity: O(n · exp(w*)), where w* is the “induced width” (max clique size) of the ordering; here w* = 4.
Efficient inference: good variable orderings, conditioning, approximations (a sketch follows below).
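A minimal sketch of this elimination in Python, computing P(s | d=1); it repeats the invented CPDs from the representation sketch so that it runs standalone:

```python
# Invented CPDs (same illustrative numbers as in the representation sketch).
P_S = {1: 0.2, 0: 0.8}
P_C = {1: {1: 0.10, 0: 0.90}, 0: {1: 0.01, 0: 0.99}}        # P(C|S)
P_B = {1: {1: 0.30, 0: 0.70}, 0: {1: 0.05, 0: 0.95}}        # P(B|S)
P_X = {(1, 1): {1: 0.9, 0: 0.1}, (1, 0): {1: 0.9, 0: 0.1},
       (0, 1): {1: 0.2, 0: 0.8}, (0, 0): {1: 0.1, 0: 0.9}}  # P(X|C,S)
P_D = {(1, 1): {1: 0.9, 0: 0.1}, (1, 0): {1: 0.7, 0: 0.3},
       (0, 1): {1: 0.6, 0: 0.4}, (0, 0): {1: 0.1, 0: 0.9}}  # P(D|C,B)

def p_s_d1(s):
    """P(s, d=1) = P(s) * sum_b P(b|s) * sum_c P(c|s) P(d=1|c,b) * sum_x P(x|c,s)."""
    total = 0.0
    for b in (0, 1):                                      # eliminate B
        for c in (0, 1):                                  # eliminate C
            sum_x = sum(P_X[(c, s)][x] for x in (0, 1))   # eliminate X
            total += P_B[s][b] * P_C[s][c] * P_D[(c, b)][1] * sum_x
    return P_S[s] * total

z = p_s_d1(0) + p_s_d1(1)                # normalizing constant P(d=1)
for s in (0, 1):
    print(f"P(S={s} | D=1) = {p_s_d1(s) / z:.4f}")
```

Note that the inner sum over x equals 1, so the unobserved leaf X drops out of the computation entirely; exploiting such structure is part of what a good elimination ordering buys.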
Slide 8: “Road map”
- What are Bayesian networks and why use them?
- How to use them: probabilistic inference
- Why and how to learn them: learning parameters, learning graph structure
- Summary
Slide 9: Why learn Bayesian networks?
- Combining domain expert knowledge with data
- Efficient representation and inference
- Incremental learning: update the prior P(H) as new data arrive
- Handling missing data: records may contain unobserved entries (marked “?”)
- Learning causal relationships (e.g., whether S causes C)
Slide 10: Learning Bayesian Networks
- Known graph – learn the parameters P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B):
  - Complete data: parameter estimation (ML, MAP)
  - Incomplete data: non-linear parametric optimization (gradient descent, EM)
- Unknown graph – learn graph and parameters:
  - Complete data: optimization (search in the space of graphs)
  - Incomplete data: structural EM, mixture models
Slide 11: Learning Parameters: complete data
ML-estimate: maximize the likelihood P(Data | Θ), which is decomposable: it splits into one term per node family. With multinomial counts N(x, pa(x)),
  θ_{x|pa(x)} = N(x, pa(x)) / N(pa(x))
MAP-estimate (Bayesian statistics): use conjugate priors – Dirichlet with pseudo-counts α(x, pa(x)), giving
  θ_{x|pa(x)} = (N(x, pa(x)) + α(x, pa(x))) / (N(pa(x)) + α(pa(x)))
The α’s play the role of an equivalent sample size (prior knowledge). A sketch follows below.
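A minimal sketch of both estimates for a single family, P(D | C, B), from complete data; the records are invented, and the MAP line uses the common smoothed (Dirichlet posterior-mean) form:

```python
from collections import Counter

# Complete-data records as (c, b, d) triples; values invented for illustration.
data = [(1, 1, 1), (1, 1, 1), (1, 0, 1), (0, 1, 0),
        (0, 0, 0), (0, 0, 0), (1, 0, 0), (0, 1, 1)]

n_dcb = Counter((d, c, b) for c, b, d in data)   # multinomial counts N(d, c, b)
n_cb = Counter((c, b) for c, b, _ in data)       # parent-configuration counts N(c, b)

def ml(d, c, b):
    """ML estimate: theta_{d|c,b} = N(d, c, b) / N(c, b)."""
    return n_dcb[(d, c, b)] / n_cb[(c, b)]

def map_est(d, c, b, alpha=1.0):
    """Dirichlet-smoothed (posterior-mean) estimate; the pseudo-counts act as
    an equivalent sample size of 2*alpha per parent configuration (D is binary)."""
    return (n_dcb[(d, c, b)] + alpha) / (n_cb[(c, b)] + 2 * alpha)

print(ml(1, 1, 1), map_est(1, 1, 1))   # -> 1.0 0.75
```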
Slide 12: Learning Parameters: incomplete data
With hidden nodes or missing entries the marginal likelihood is non-decomposable. Data records over (S, X, D, C, B) may contain unobserved values, e.g., <1 1 ? 0 1> or <? ? 0 ? 1>.
EM-algorithm (iterate until convergence, starting from initial parameters):
- Expectation: under the current model, run inference (e.g., P(S | X=0, D=1, C=0, B=1)) to compute expected counts for the unobserved values.
- Maximization: update the parameters from the expected counts (ML, MAP).
A sketch follows below.
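A minimal sketch of the EM loop on a deliberately tiny two-node network S -> D, so the E-step inference is a one-liner instead of a full network query; the records (None marks a missing value) and the initial parameters are invented:

```python
# EM for a two-node network S -> D; records are (s, d) pairs and s may be
# missing (None). All records and starting values are invented.
data = [(1, 1), (None, 1), (0, 0), (None, 0), (1, 1), (None, 1), (0, 1)]

p_s = 0.5                                # initial parameter P(S=1)
p_d = {1: 0.5, 0: 0.5}                   # initial parameters P(D=1 | S=s)

for _ in range(50):                      # iterate until convergence
    # E-step: expected counts; a missing S is filled in fractionally by
    # inference P(S | d) under the current model.
    n_s = {0: 0.0, 1: 0.0}               # expected N(S=s)
    n_ds = {0: 0.0, 1: 0.0}              # expected N(D=1, S=s)
    for s, d in data:
        if s is None:
            lik1 = p_s * (p_d[1] if d else 1 - p_d[1])
            lik0 = (1 - p_s) * (p_d[0] if d else 1 - p_d[0])
            w1 = lik1 / (lik1 + lik0)    # P(S=1 | D=d)
        else:
            w1 = float(s)
        for sv, w in ((1, w1), (0, 1.0 - w1)):
            n_s[sv] += w
            if d == 1:
                n_ds[sv] += w
    # M-step: ML update of the parameters from the expected counts.
    p_s = n_s[1] / len(data)
    p_d = {sv: n_ds[sv] / n_s[sv] for sv in (0, 1)}

print(p_s, p_d)
```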
Slide 13: Learning graph structure
Finding the highest-scoring graph is an NP-hard optimization, hence heuristic search:
- Greedy local search over single-edge moves: add S->B, delete S->B, reverse S->B (a sketch follows below)
- Best-first search
- Simulated annealing
Complete data – the score decomposes, so each move needs only local computations. Incomplete data (score non-decomposable) – Structural EM.
Constraint-based methods: independence relations found in the data impose constraints on the graph.
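A minimal sketch of greedy local search in Python over three binary variables, scored with the decomposable BIC score (= -MDL, see the next slide); the data set and the empty-graph starting point are invented for illustration:

```python
import math
from itertools import product
from collections import Counter

NAMES = ("S", "C", "B")
DATA = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 0, 0),   # invented complete data
        (0, 0, 0), (0, 1, 0), (1, 1, 1), (0, 0, 1)]

def family_score(child, parents):
    """Decomposable BIC family score: max log-likelihood of the family
    minus (log N / 2) * number of free parameters."""
    n = len(DATA)
    n_pa = Counter(tuple(r[p] for p in parents) for r in DATA)
    n_cpa = Counter((r[child],) + tuple(r[p] for p in parents) for r in DATA)
    loglik = sum(c * math.log(c / n_pa[key[1:]]) for key, c in n_cpa.items())
    n_free = 2 ** len(parents)           # binary child: 1 free param per parent config
    return loglik - 0.5 * math.log(n) * n_free

def total_score(graph):
    """graph[i] = set of parents of node i; the score sums over families."""
    return sum(family_score(i, sorted(graph[i])) for i in graph)

def is_acyclic(graph):
    state = {}                           # absent=new, 1=in progress, 2=done
    def visit(v):
        if state.get(v) == 1:
            return False                 # back edge: found a cycle
        if state.get(v) == 2:
            return True
        state[v] = 1
        ok = all(visit(p) for p in graph[v])
        state[v] = 2
        return ok
    return all(visit(v) for v in graph)

def neighbors(graph):
    """All graphs one move away: add, delete, or reverse a single edge."""
    for i, j in product(graph, repeat=2):
        if i == j:
            continue
        g = {v: set(ps) for v, ps in graph.items()}
        if i in g[j]:
            g[j].discard(i); yield g                     # delete i -> j
            g2 = {v: set(ps) for v, ps in g.items()}
            g2[i].add(j); yield g2                       # reverse i -> j
        else:
            g[j].add(i); yield g                         # add i -> j

graph = {i: set() for i in range(len(NAMES))}            # start from the empty graph
best = total_score(graph)
improved = True
while improved:                                          # greedy hill climbing
    improved = False
    for g in neighbors(graph):
        if is_acyclic(g) and total_score(g) > best + 1e-9:
            graph, best, improved = g, total_score(g), True
            break

print({NAMES[i]: sorted(NAMES[p] for p in ps) for i, ps in graph.items()})
```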
Slide 14: Scoring functions: Minimum Description Length (MDL)
Learning as data compression: the score is the total description length,
  MDL(Model | Data) = DL(Data | Model) + DL(Model),
the bits needed to encode the data given the model plus the bits needed to encode the model itself. MDL = -BIC (Bayesian Information Criterion).
Other scores: the Bayesian score (BDe) is asymptotically equivalent to MDL. A sketch follows below.
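A minimal sketch of the two description-length terms and the MDL = -BIC identity; the sample size, parameter count, and log-likelihood below are invented numbers:

```python
import math

# Invented numbers: N records, k free parameters, log-likelihood at the ML fit.
N, k, loglik = 1000, 13, -2750.0

dl_data = -loglik                        # DL(Data | Model): cost of encoding the data
dl_model = 0.5 * math.log(N) * k         # DL(Model): about (log N)/2 nats per parameter
mdl = dl_data + dl_model                 # the score to minimize

bic = loglik - 0.5 * math.log(N) * k     # Bayesian Information Criterion
assert abs(mdl + bic) < 1e-9             # MDL = -BIC, as the slide states
print(f"MDL = {mdl:.1f}, BIC = {bic:.1f}")
```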
Slide 15: Summary
- Bayesian networks are graphical probabilistic models
- Efficient representation and inference
- Expert knowledge + learning from data
- Learning: parameters (parameter estimation, EM) and structure (optimization with score functions, e.g., MDL)
- Applications/systems: collaborative filtering (MSBN), fraud detection (AT&T), classification (AutoClass (NASA), TAN-BLT (SRI))
- Future directions: causality, time, model evaluation criteria, approximate inference/learning, on-line learning, etc.