1
Advances in Bayesian Learning
Learning and Inference in Bayesian Networks
Irina Rish, IBM T.J. Watson Research Center
rish@us.ibm.com
2
“Road map”
- Introduction and motivation: what are Bayesian networks and why use them?
- How to use them: probabilistic inference
- How to learn them: learning parameters; learning graph structure
- Summary
3
Bayesian Networks
[Figure: a network over Smoking, lung Cancer, Bronchitis, X-ray, and Dyspnoea]
P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?
4
What are they good for?
- Diagnosis: P(cause | symptom) = ?
- Prediction: P(symptom | cause) = ?
- Classification: P(class | data)
- Decision-making (given a cost function)
Application areas: medicine, bioinformatics, computer troubleshooting, stock market, text classification, speech recognition.
5
Bayesian Networks: Representation
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
Conditional independencies yield an efficient representation: the joint over Smoking, lung Cancer, Bronchitis, X-ray, and Dyspnoea factorizes into one conditional probability distribution (CPD) per node.
CPD for P(D | C, B):
C B | D=0 D=1
0 0 | 0.1  0.9
0 1 | 0.7  0.3
1 0 | 0.8  0.2
1 1 | 0.9  0.1
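To make the factorization concrete, here is a minimal Python sketch (not from the slides) that stores one CPD per node as a plain dict and multiplies them to get the joint. Only the P(D|C,B) table above comes from the slide; the numbers for P(S), P(C|S), P(B|S), and P(X|C,S) are made-up placeholders.

```python
# A minimal sketch of the factored representation, using plain dicts.
# Only the P(D|C,B) table is from the slide; all other numbers are
# illustrative placeholders.
from itertools import product

p_s1 = 0.3                                    # P(S=1), assumed
p_c1 = {0: 0.05, 1: 0.20}                     # P(C=1 | S), assumed
p_b1 = {0: 0.10, 1: 0.30}                     # P(B=1 | S), assumed
p_x1 = {(0, 0): 0.05, (0, 1): 0.20,           # P(X=1 | C, S), assumed
        (1, 0): 0.90, (1, 1): 0.95}
p_d1 = {(0, 0): 0.9, (0, 1): 0.3,             # P(D=1 | C, B), from the
        (1, 0): 0.2, (1, 1): 0.1}             # slide's CPD table

def bern(p1, v):
    """P(V=v) for a binary variable with P(V=1) = p1."""
    return p1 if v == 1 else 1.0 - p1

def joint(s, c, b, x, d):
    """P(S,C,B,X,D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)."""
    return (bern(p_s1, s) * bern(p_c1[s], c) * bern(p_b1[s], b) *
            bern(p_x1[(c, s)], x) * bern(p_d1[(c, b)], d))

# Sanity check: the joint sums to 1 over all 2^5 assignments.
assert abs(sum(joint(*v) for v in product((0, 1), repeat=5)) - 1.0) < 1e-9
```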
6
Example: Printer Troubleshooting
7
Bayesian networks: inference
Task: P(X | evidence) = ?
Variable elimination:
P(s | d=1) ∝ Σ_b Σ_c Σ_x P(s) P(c|s) P(b|s) P(x|c,s) P(d=1|c,b)
           = P(s) Σ_b P(b|s) Σ_c P(c|s) P(d=1|c,b) Σ_x P(x|c,s)
Complexity is exponential in w*, the “induced width” (max clique size) of the “moral” graph along the elimination ordering; here w* = 4.
Efficient inference: variable orderings, conditioning, approximations.
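A small continuation of the sketch above, reusing its CPDs: it computes P(s | d=1) with the sums pushed in, mirroring the elimination ordering on the slide. Since Σ_x P(x|c,s) = 1, the unobserved leaf X drops out for free.

```python
# Variable-elimination sketch for P(s | d=1), reusing the CPDs above:
# P(s|d=1) ∝ P(s) Σ_b P(b|s) Σ_c P(c|s) P(d=1|c,b) Σ_x P(x|c,s).
def posterior_s_given_d1():
    unnorm = {}
    for s in (0, 1):
        total = 0.0
        for b in (0, 1):
            for c in (0, 1):
                # The innermost factor Σ_x P(x|c,s) equals 1, so the
                # "barren" node X contributes nothing to the sum.
                total += bern(p_b1[s], b) * bern(p_c1[s], c) * p_d1[(c, b)]
        unnorm[s] = bern(p_s1, s) * total
    z = unnorm[0] + unnorm[1]        # normalizing constant, equals P(d=1)
    return {s: unnorm[s] / z for s in (0, 1)}

print(posterior_s_given_d1())        # {0: P(S=0|d=1), 1: P(S=1|d=1)}
```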
8
“Road map”
- Introduction and motivation: what are Bayesian networks and why use them?
- How to use them: probabilistic inference
- Why and how to learn them: learning parameters; learning graph structure
- Summary
9
Why learn Bayesian networks?
- Combining domain expert knowledge with data
- Efficient representation and inference
- Incremental learning: update the model (e.g., P(H), or an S→C edge) as new data arrive
- Learning causal relationships
- Handling missing data
10
Learning Bayesian Networks
Known graph → learn parameters:
- Complete data: parameter estimation (ML, MAP)
- Incomplete data: non-linear parametric optimization (gradient descent, EM)
Unknown graph → learn graph and parameters:
- Complete data: optimization (search in the space of graphs)
- Incomplete data: structural EM, mixture models
11
Learning Parameters: complete data
ML estimate (decomposable!): θ_ijk = N_ijk / Σ_k' N_ijk', where N_ijk is the multinomial count of cases with X_i = k and parent configuration j.
MAP estimate (Bayesian statistics), using the conjugate Dirichlet prior: θ_ijk = (N_ijk + α_ijk) / Σ_k' (N_ijk' + α_ijk'), where the α's encode an equivalent sample size (prior knowledge).
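As a sketch of the two estimators (NumPy assumed), take one node with four parent configurations and made-up counts; rows are parent configurations j, columns are values k. The Dirichlet update shown is the posterior mean, which is the form usually quoted on slides for the Bayesian/MAP estimate.

```python
import numpy as np

# counts[j, k] = N_jk: number of cases with X = k under parent config j
# (made-up numbers for illustration).
counts = np.array([[10., 90.], [70., 30.], [80., 20.], [90., 10.]])

# ML estimate, decomposable per family: theta_jk = N_jk / sum_k' N_jk'.
theta_ml = counts / counts.sum(axis=1, keepdims=True)

# Dirichlet posterior-mean ("Bayesian") estimate:
# theta_jk = (N_jk + alpha_jk) / sum_k' (N_jk' + alpha_jk').
alpha = np.ones_like(counts)   # uniform prior (assumed); alpha.sum()
                               # plays the role of the equivalent sample size
theta_map = (counts + alpha) / (counts + alpha).sum(axis=1, keepdims=True)
```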
12
Learning Parameters: incomplete data
With hidden nodes, the marginal likelihood is non-decomposable, so use the EM algorithm: start from initial parameters and iterate the two steps below until convergence.
- Expectation: run inference in the current model to compute expected counts for the missing values, e.g. P(S | X=0, D=1, C=0, B=1).
- Maximization: update the parameters from the expected counts (ML, MAP).
Data (one row per case; hidden entries are filled in by the E-step):
S X D C B
1 0 1 0 1
1 1 1 0 1
0 0 0 0 0
1 0 0 0 1
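Here is a sketch of one E-step for a record whose S value is hidden, reusing `joint` from the first sketch; the posterior over S becomes a fractional count.

```python
# E-step for one record with S hidden: the expected count for S is the
# posterior P(S | x=0, d=1, c=0, b=1), obtained by enumerating S in the
# joint from the first sketch.
def expected_s(x=0, d=1, c=0, b=1):
    w = {s: joint(s, c, b, x, d) for s in (0, 1)}   # unnormalized weights
    z = w[0] + w[1]
    return {s: w[s] / z for s in (0, 1)}            # fractional counts

# M-step: pool such fractional counts over all records, then re-estimate
# every CPD with the ML/MAP formulas above; iterate E and M to convergence.
```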
13
Learning graph structure
NP-hard optimization → heuristic search (a code sketch follows below):
- Greedy local search: from the current graph, take the best single-edge move, e.g. add S→B, delete S→B, or reverse S→B
- Best-first search
- Simulated annealing
Complete data: the score decomposes, so each move needs only local computations. Incomplete data (score non-decomposable): structural EM.
Constraint-based methods: the data impose independence relations (constraints) that determine the graph.
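Below is an illustrative skeleton of greedy local search over edge sets, not the tutorial's actual code: `score` is a caller-supplied graph score (e.g., the MDL score from the next slide), and the move generator and acyclicity check are assumptions made for the sketch.

```python
def is_dag(edges, nodes):
    """Reject graphs with directed cycles (DFS three-coloring)."""
    adj = {n: [v for (u, v) in edges if u == n] for n in nodes}
    color = {n: 0 for n in nodes}             # 0=new, 1=in progress, 2=done
    def visit(n):
        color[n] = 1
        for m in adj[n]:
            if color[m] == 1 or (color[m] == 0 and not visit(m)):
                return False                  # back edge => cycle
        color[n] = 2
        return True
    return all(visit(n) for n in nodes if color[n] == 0)

def neighbors(edges, nodes):
    """All graphs one move away: add, delete, or reverse a single edge."""
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            if (u, v) in edges:
                yield edges - {(u, v)}                  # delete u->v
                yield (edges - {(u, v)}) | {(v, u)}     # reverse u->v
            elif (v, u) not in edges:
                yield edges | {(u, v)}                  # add u->v

def greedy_search(nodes, score, max_steps=1000):
    """Hill-climb: take the best legal move until no move improves."""
    current = frozenset()                               # start: empty graph
    for _ in range(max_steps):
        candidates = [g for g in neighbors(current, nodes) if is_dag(g, nodes)]
        best = max(candidates, key=score, default=current)
        if score(best) <= score(current):
            return current                              # local optimum
        current = best
    return current
```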
14
Scoring functions: Minimum Description Length (MDL)
Learning as data compression: minimize DL(Model) + DL(Data | Model), where DL(Model) = (log N / 2) · (# parameters) penalizes complexity and DL(Data | Model) = −log P(D | θ, G) measures fit.
Other scores: MDL = −BIC (Bayesian Information Criterion); the Bayesian score (BDe) is asymptotically equivalent to MDL.
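A sketch of the MDL/BIC computation under ML parameters; the `families` (node → parent tuple) and `num_states` encodings and the dict-of-rows data format are assumptions made for illustration. This is the kind of `score` function the greedy search above expects (negated, since greedy search maximizes).

```python
import math

def mdl_score(data, families, num_states):
    """MDL = DL(Model) + DL(Data|Model) = (log N / 2) * #params - loglik.
    Lower is better; this value equals -BIC."""
    N = len(data)
    # DL(Model): (log N)/2 per free parameter of the network.
    n_params = sum((num_states[x] - 1) * math.prod(num_states[p] for p in ps)
                   for x, ps in families.items())
    dl_model = 0.5 * math.log(N) * n_params
    # DL(Data|Model): negative log-likelihood at the ML parameters,
    # accumulated family by family from the counts (decomposable).
    loglik = 0.0
    for x, ps in families.items():
        counts, parent_tot = {}, {}
        for row in data:
            j, k = tuple(row[p] for p in ps), row[x]
            counts[(j, k)] = counts.get((j, k), 0) + 1
            parent_tot[j] = parent_tot.get(j, 0) + 1
        loglik += sum(n * math.log(n / parent_tot[j])
                      for (j, k), n in counts.items())
    return dl_model - loglik

# Hypothetical usage: score the slide's graph over binary variables.
families = {"S": (), "C": ("S",), "B": ("S",), "X": ("C", "S"), "D": ("C", "B")}
num_states = {n: 2 for n in families}
```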
15
Summary
- Bayesian Networks are graphical probabilistic models: efficient representation and inference; expert knowledge + learning from data.
- Learning: parameters (parameter estimation, EM) and structure (optimization with score functions, e.g., MDL).
- Applications/systems: collaborative filtering (MSBN), fraud detection (AT&T), classification (AutoClass (NASA), TAN-BLT (SRI)).
- Future directions: causality, time, model evaluation criteria, approximate inference/learning, on-line learning, etc.