Advances in Bayesian Learning Learning and Inference in Bayesian Networks Irina Rish IBM T.J.Watson Research Center


1 Advances in Bayesian Learning: Learning and Inference in Bayesian Networks. Irina Rish, IBM T.J. Watson Research Center, rish@us.ibm.com

2 “Road map”
- Introduction and motivation: what are Bayesian networks and why use them?
- How to use them: probabilistic inference
- How to learn them: learning parameters; learning graph structure
- Summary

3 Bayesian Networks
Example network over five variables: Smoking, lung Cancer, Bronchitis, X-ray, Dyspnoea.
Query: P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?

4 What are they good for?
- Diagnosis: P(cause | symptom) = ?
- Prediction: P(symptom | cause) = ?
- Classification: P(class | data)
- Decision-making (given a cost function)
Application areas: medicine, bioinformatics, computer troubleshooting, stock market, text classification, speech recognition.

5 Bayesian Networks: Representation
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
(S = Smoking, C = lung Cancer, B = Bronchitis, X = X-ray, D = Dyspnoea)
The conditional independencies encoded by the graph give an efficient representation: each node stores only a conditional probability distribution (CPD) given its parents.
Example CPD, P(D | C, B):
C B | D=0 D=1
0 0 | 0.1  0.9
0 1 | 0.7  0.3
1 0 | 0.8  0.2
1 1 | 0.9  0.1
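The factored joint from this slide can be sketched directly in code. In this illustration the P(D|C,B) table is the one shown above; all other CPD numbers are made up for the example:

```python
from itertools import product

# CPDs as dictionaries keyed by parent values.
# All numbers are hypothetical EXCEPT P_D_given_CB, which is the
# CPD table shown on slide 5.
P_S = {0: 0.7, 1: 0.3}                        # P(Smoking)
P_C_given_S = {0: {0: 0.95, 1: 0.05},         # P(Cancer | Smoking)
               1: {0: 0.80, 1: 0.20}}
P_B_given_S = {0: {0: 0.90, 1: 0.10},         # P(Bronchitis | Smoking)
               1: {0: 0.60, 1: 0.40}}
P_X_given_CS = {(0, 0): {0: 0.9, 1: 0.1},     # P(X-ray | Cancer, Smoking)
                (0, 1): {0: 0.8, 1: 0.2},
                (1, 0): {0: 0.2, 1: 0.8},
                (1, 1): {0: 0.1, 1: 0.9}}
P_D_given_CB = {(0, 0): {0: 0.1, 1: 0.9},     # P(Dyspnoea | Cancer, Bronchitis)
                (0, 1): {0: 0.7, 1: 0.3},
                (1, 0): {0: 0.8, 1: 0.2},
                (1, 1): {0: 0.9, 1: 0.1}}

def joint(s, c, b, x, d):
    """P(S=s, C=c, B=b, X=x, D=d) as the product of local CPDs."""
    return (P_S[s]
            * P_C_given_S[s][c]
            * P_B_given_S[s][b]
            * P_X_given_CS[(c, s)][x]
            * P_D_given_CB[(c, b)][d])

# Sanity check: the joint sums to 1 over all 2^5 assignments.
total = sum(joint(*v) for v in product([0, 1], repeat=5))
```

Note how the five small tables replace a single table of 2^5 entries; that gap widens quickly with more variables.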

6 Example: Printer Troubleshooting

7 Bayesian networks: inference
Query: P(X | evidence) = ?
Variable elimination, e.g. for P(s | d=1):
P(s | d=1) ∝ Σ_{c,b,x} P(s) P(c|s) P(b|s) P(x|c,s) P(d=1|c,b)
           = P(s) Σ_b P(b|s) Σ_c P(c|s) P(d=1|c,b) Σ_x P(x|c,s)
Complexity is exponential in w*, the "induced width" (max clique size) of the "moral" graph under the chosen ordering; here w* = 4.
Efficient inference: good variable orderings, conditioning, approximations.
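The elimination order the slide applies to P(s | d=1) can be sketched as nested sums, pushing each CPD inside the innermost sum that mentions its variables. The CPD numbers are hypothetical, except P(D|C,B), which follows the table on slide 5:

```python
# P(s | d=1) ∝ P(s) Σ_b P(b|s) Σ_c P(c|s) P(d=1|c,b) Σ_x P(x|c,s)
# Hypothetical CPDs (P_D_given_CB is the slide-5 table).
P_S = {0: 0.7, 1: 0.3}
P_C_given_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}
P_B_given_S = {0: {0: 0.90, 1: 0.10}, 1: {0: 0.60, 1: 0.40}}
P_X_given_CS = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.8, 1: 0.2},
                (1, 0): {0: 0.2, 1: 0.8}, (1, 1): {0: 0.1, 1: 0.9}}
P_D_given_CB = {(0, 0): {0: 0.1, 1: 0.9}, (0, 1): {0: 0.7, 1: 0.3},
                (1, 0): {0: 0.8, 1: 0.2}, (1, 1): {0: 0.9, 1: 0.1}}

def posterior_S_given_d1():
    """P(S | D=1), eliminating variables innermost-first: X, then C, then B."""
    unnorm = {}
    for s in (0, 1):
        total_b = 0.0
        for b in (0, 1):
            total_c = 0.0
            for c in (0, 1):
                # Σ_x P(x|c,s) = 1: with no evidence on X, the leaf
                # marginalizes away ("barren node").
                sum_x = sum(P_X_given_CS[(c, s)][x] for x in (0, 1))
                total_c += P_C_given_S[s][c] * P_D_given_CB[(c, b)][1] * sum_x
            total_b += P_B_given_S[s][b] * total_c
        unnorm[s] = P_S[s] * total_b
    z = unnorm[0] + unnorm[1]
    return {s: unnorm[s] / z for s in (0, 1)}

post = posterior_S_given_d1()
```

The nesting mirrors the ordering on the slide; a general variable-elimination routine would instead multiply and sum factor tables so the same code works for any ordering.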

8 “Road map”
- Introduction and motivation: what are Bayesian networks and why use them?
- How to use them: probabilistic inference
- Why and how to learn them: learning parameters; learning graph structure
- Summary

9 Why learn Bayesian networks?
- Combining domain expert knowledge with data
- Handling missing data
- Learning causal relationships
- Efficient representation and inference
- Incremental learning: updating the prior P(H) as new data arrive

10 Learning Bayesian Networks
Known graph – learn parameters P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B):
- Complete data: parameter estimation (ML, MAP)
- Incomplete data: non-linear parametric optimization (gradient descent, EM)
Unknown graph – learn graph and parameters:
- Complete data: optimization (search in the space of graphs)
- Incomplete data: structural EM, mixture models

11 Learning Parameters: complete data
ML estimate (from multinomial counts N): theta(x|pa) = N(x, pa) / N(pa) – the likelihood is decomposable, so each CPD is estimated independently.
MAP estimate (Bayesian statistics): with conjugate Dirichlet priors, theta(x|pa) = (N(x, pa) + alpha(x, pa)) / (N(pa) + alpha(pa)), where the alpha's act as an equivalent sample size encoding prior knowledge.
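A minimal sketch of ML and MAP (Dirichlet-smoothed) estimation for a single CPD P(C|S) from counts; the data and the choice of a symmetric prior are illustrative:

```python
from collections import Counter

# Made-up complete-data samples of (S, C) pairs; estimate P(C | S).
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

def estimate_cpd(data, alpha=0.0):
    """ML estimate when alpha = 0; MAP under a symmetric Dirichlet(alpha + 1)
    prior otherwise (alpha acts as a pseudo-count per table cell)."""
    joint_counts = Counter(data)                  # N(s, c)
    parent_counts = Counter(s for s, _ in data)   # N(s)
    k = 2  # number of values C can take
    return {s: {c: (joint_counts[(s, c)] + alpha) /
                   (parent_counts[s] + alpha * k)
                for c in range(k)}
            for s in range(2)}

ml = estimate_cpd(data)            # theta = N(s, c) / N(s)
map_est = estimate_cpd(data, 1.0)  # counts smoothed by pseudo-counts
```

Decomposability is what makes this cheap: the same counting is repeated per family, never over the full joint.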

12 Learning Parameters: incomplete data
With hidden nodes, the marginal likelihood is non-decomposable.
EM algorithm: start from initial parameters and iterate until convergence:
- Expectation: inference in the current model fills in expected counts for missing values, e.g. P(S | X=0, D=1, C=0, B=1).
- Maximization: update the parameters (ML, MAP) from the expected counts.
Example data (one row per case, columns S X D C B):
S X D C B
1 0 1 0 1
1 1 1 0 1
0 0 0 0 0
1 0 0 0 1
...
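The E/M loop can be sketched on a toy case: S hidden, one observed child D with a fixed, assumed-known CPD, and only P(S) being learned. All numbers and data here are made up purely to illustrate expected counts:

```python
# Toy EM: the Smoking node S is hidden; only Dyspnoea D is observed.
# P(D|S) is held fixed (hypothetical values); EM learns P(S=1).
P_D_given_S = {0: {0: 0.8, 1: 0.2},   # P(D | S=0)
               1: {0: 0.3, 1: 0.7}}   # P(D | S=1)
observed_d = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]   # made-up observations

p_s1 = 0.5  # initial parameter P(S=1)
for _ in range(50):  # iterate until (near) convergence
    # E-step: expected count of S=1 via Bayes rule, P(S=1 | D=d).
    expected = 0.0
    for d in observed_d:
        num = p_s1 * P_D_given_S[1][d]
        den = num + (1 - p_s1) * P_D_given_S[0][d]
        expected += num / den
    # M-step: ML update of P(S=1) from the expected (soft) counts.
    p_s1 = expected / len(observed_d)
```

In a full network the E-step would be the same idea run through a general inference routine, accumulating expected counts for every CPD cell touched by missing values.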

13 Learning graph structure
Finding the optimal structure is an NP-hard optimization problem.
Heuristic search:
- Greedy local search: from the current graph, add, delete, or reverse a single edge (e.g. add S→B, delete S→B, reverse S→B)
- Best-first search
- Simulated annealing
Complete data: the score decomposes, so each move needs only local computations.
Incomplete data (score non-decomposable): structural EM.
Constraint-based methods: the data impose independence relations (constraints) that restrict the candidate graphs.
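The add/delete/reverse move set of greedy local search can be sketched as follows; the acyclicity check (Kahn's topological sort) and the three-node example are illustrative:

```python
def is_dag(nodes, edges):
    """Kahn's algorithm: a graph is a DAG iff it can be fully topo-sorted."""
    edges = set(edges)
    indeg = {v: 0 for v in nodes}
    for _, v in edges:
        indeg[v] += 1
    frontier = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while frontier:
        u = frontier.pop()
        seen += 1
        for (a, b) in list(edges):
            if a == u:
                edges.remove((a, b))
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return seen == len(nodes)

def neighbors(nodes, edges):
    """All DAGs one add/delete/reverse move away from the current DAG."""
    edges = set(edges)
    out = []
    for e in edges:                       # delete an edge (always a DAG)
        out.append(edges - {e})
    for (u, v) in edges:                  # reverse an edge
        cand = (edges - {(u, v)}) | {(v, u)}
        if is_dag(nodes, cand):
            out.append(cand)
    for u in nodes:                       # add an edge
        for v in nodes:
            if u != v and (u, v) not in edges:
                cand = edges | {(u, v)}
                if is_dag(nodes, cand):
                    out.append(cand)
    return out

nodes = ["S", "C", "B"]
current = {("S", "C"), ("S", "B")}
moves = neighbors(nodes, current)
```

Greedy search would score every candidate in `moves` (only the families whose parents changed, when the score decomposes) and keep the best, repeating until no move improves.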

14 Scoring functions: Minimum Description Length (MDL)
Learning as data compression: MDL = DL(Model) + DL(Data | Model).
MDL = -BIC (Bayesian Information Criterion).
Other scores: the Bayesian score (BDe) is asymptotically equivalent to MDL.
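A sketch of the BIC score for one node's family (log-likelihood minus (log N / 2) times the number of free parameters; MDL is its negation, as the slide notes). The counts below are hypothetical:

```python
from math import log

def bic_score_node(joint_counts, parent_counts, n_child_vals, n_samples):
    """BIC for one family: sum_{x,pa} N(x,pa) log[N(x,pa)/N(pa)]
    minus (log n / 2) * number of free parameters.  MDL = -BIC."""
    loglik = sum(n * log(n / parent_counts[pa])
                 for (pa, x), n in joint_counts.items() if n > 0)
    n_free = len(parent_counts) * (n_child_vals - 1)
    return loglik - 0.5 * log(n_samples) * n_free

# Made-up counts for D given parent configuration (C, B), 80 samples.
joint_counts = {((0, 0), 0): 2,  ((0, 0), 1): 18,
                ((0, 1), 0): 14, ((0, 1), 1): 6,
                ((1, 0), 0): 16, ((1, 0), 1): 4,
                ((1, 1), 0): 18, ((1, 1), 1): 2}
parent_counts = {(0, 0): 20, (0, 1): 20, (1, 0): 20, (1, 1): 20}
score = bic_score_node(joint_counts, parent_counts, 2, 80)
```

Because the score is a sum over families, a structure-search move that changes one node's parents only requires rescoring that one family.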

15 Summary
- Bayesian networks are graphical probabilistic models offering efficient representation and inference.
- They combine expert knowledge with learning from data.
- Learning: parameters (parameter estimation, EM) and structure (optimization with scoring functions, e.g. MDL).
- Applications/systems: collaborative filtering (MSBN), fraud detection (AT&T), classification (AutoClass (NASA), TAN-BLT (SRI)).
- Future directions: causality, time, model evaluation criteria, approximate inference/learning, on-line learning, etc.

