Advances in Bayesian Learning Learning and Inference in Bayesian Networks Irina Rish IBM T.J.Watson Research Center


1 Advances in Bayesian Learning: Learning and Inference in Bayesian Networks. Irina Rish, IBM T.J. Watson Research Center, rish@us.ibm.com

2 “Road map”
- Introduction and motivation: what are Bayesian networks and why use them?
- How to use them: probabilistic inference
- How to learn them: learning parameters; learning graph structure
- Summary

3 Bayesian Networks
Example network over five variables: Smoking, lung Cancer, Bronchitis, X-ray, Dyspnoea.
Query: P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?

4 What are they good for?
- Diagnosis: P(cause | symptom) = ?
- Prediction: P(symptom | cause) = ?
- Classification: P(class | data)
- Decision-making (given a cost function)
Application areas: medicine, bioinformatics, computer troubleshooting, stock market, text classification, speech recognition.

5 Bayesian Networks: Representation
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
(S = Smoking, C = lung Cancer, B = Bronchitis, X = X-ray, D = Dyspnoea)
The conditional independencies encoded by the graph give an efficient representation: each node stores only a conditional probability distribution (CPD) given its parents.
Example CPD, P(D | C, B):
C B | D=0 D=1
0 0 | 0.1  0.9
0 1 | 0.7  0.3
1 0 | 0.8  0.2
1 1 | 0.9  0.1
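The factored joint from this slide can be sketched directly in code. In this illustration the P(D|C,B) table is the one shown above; all other CPD numbers are made up for the example:

```python
from itertools import product

# CPDs as dictionaries keyed by parent values.
# All numbers are hypothetical EXCEPT P_D_given_CB, which is the
# CPD table shown on slide 5.
P_S = {0: 0.7, 1: 0.3}                        # P(Smoking)
P_C_given_S = {0: {0: 0.95, 1: 0.05},         # P(Cancer | Smoking)
               1: {0: 0.80, 1: 0.20}}
P_B_given_S = {0: {0: 0.90, 1: 0.10},         # P(Bronchitis | Smoking)
               1: {0: 0.60, 1: 0.40}}
P_X_given_CS = {(0, 0): {0: 0.9, 1: 0.1},     # P(X-ray | Cancer, Smoking)
                (0, 1): {0: 0.8, 1: 0.2},
                (1, 0): {0: 0.2, 1: 0.8},
                (1, 1): {0: 0.1, 1: 0.9}}
P_D_given_CB = {(0, 0): {0: 0.1, 1: 0.9},     # P(Dyspnoea | Cancer, Bronchitis)
                (0, 1): {0: 0.7, 1: 0.3},
                (1, 0): {0: 0.8, 1: 0.2},
                (1, 1): {0: 0.9, 1: 0.1}}

def joint(s, c, b, x, d):
    """P(S=s, C=c, B=b, X=x, D=d) as the product of local CPDs."""
    return (P_S[s]
            * P_C_given_S[s][c]
            * P_B_given_S[s][b]
            * P_X_given_CS[(c, s)][x]
            * P_D_given_CB[(c, b)][d])

# Sanity check: the joint sums to 1 over all 2^5 assignments.
total = sum(joint(*v) for v in product([0, 1], repeat=5))
```

Note how the five small tables replace a single table of 2^5 entries; that gap widens quickly with more variables.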

6 Example: Printer Troubleshooting

7 Bayesian networks: inference
Query: P(X | evidence) = ?
Variable elimination, e.g. for P(s | d=1):
P(s | d=1) ∝ Σ_{c,b,x} P(s) P(c|s) P(b|s) P(x|c,s) P(d=1|c,b)
           = P(s) Σ_b P(b|s) Σ_c P(c|s) P(d=1|c,b) Σ_x P(x|c,s)
Complexity is exponential in w*, the "induced width" (max clique size) of the "moral" graph under the chosen ordering; here w* = 4.
Efficient inference: good variable orderings, conditioning, approximations.
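The elimination order the slide applies to P(s | d=1) can be sketched as nested sums, pushing each CPD inside the innermost sum that mentions its variables. The CPD numbers are hypothetical, except P(D|C,B), which follows the table on slide 5:

```python
# P(s | d=1) ∝ P(s) Σ_b P(b|s) Σ_c P(c|s) P(d=1|c,b) Σ_x P(x|c,s)
# Hypothetical CPDs (P_D_given_CB is the slide-5 table).
P_S = {0: 0.7, 1: 0.3}
P_C_given_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}
P_B_given_S = {0: {0: 0.90, 1: 0.10}, 1: {0: 0.60, 1: 0.40}}
P_X_given_CS = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.8, 1: 0.2},
                (1, 0): {0: 0.2, 1: 0.8}, (1, 1): {0: 0.1, 1: 0.9}}
P_D_given_CB = {(0, 0): {0: 0.1, 1: 0.9}, (0, 1): {0: 0.7, 1: 0.3},
                (1, 0): {0: 0.8, 1: 0.2}, (1, 1): {0: 0.9, 1: 0.1}}

def posterior_S_given_d1():
    """P(S | D=1), eliminating variables innermost-first: X, then C, then B."""
    unnorm = {}
    for s in (0, 1):
        total_b = 0.0
        for b in (0, 1):
            total_c = 0.0
            for c in (0, 1):
                # Σ_x P(x|c,s) = 1: with no evidence on X, the leaf
                # marginalizes away ("barren node").
                sum_x = sum(P_X_given_CS[(c, s)][x] for x in (0, 1))
                total_c += P_C_given_S[s][c] * P_D_given_CB[(c, b)][1] * sum_x
            total_b += P_B_given_S[s][b] * total_c
        unnorm[s] = P_S[s] * total_b
    z = unnorm[0] + unnorm[1]
    return {s: unnorm[s] / z for s in (0, 1)}

post = posterior_S_given_d1()
```

The nesting mirrors the ordering on the slide; a general variable-elimination routine would instead multiply and sum factor tables so the same code works for any ordering.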

8 “Road map”
- Introduction and motivation: what are Bayesian networks and why use them?
- How to use them: probabilistic inference
- Why and how to learn them: learning parameters; learning graph structure
- Summary

9 Why learn Bayesian networks?
- Combining domain expert knowledge with data
- Handling missing data
- Learning causal relationships
- Efficient representation and inference
- Incremental learning: updating the prior P(H) as new data arrive

10 Learning Bayesian Networks
Known graph – learn parameters P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B):
- Complete data: parameter estimation (ML, MAP)
- Incomplete data: non-linear parametric optimization (gradient descent, EM)
Unknown graph – learn graph and parameters:
- Complete data: optimization (search in the space of graphs)
- Incomplete data: structural EM, mixture models

11 Learning Parameters: complete data
ML estimate (from multinomial counts N): theta(x|pa) = N(x, pa) / N(pa) – the likelihood is decomposable, so each CPD is estimated independently.
MAP estimate (Bayesian statistics): with conjugate Dirichlet priors, theta(x|pa) = (N(x, pa) + alpha(x, pa)) / (N(pa) + alpha(pa)), where the alpha's act as an equivalent sample size encoding prior knowledge.
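A minimal sketch of ML and MAP (Dirichlet-smoothed) estimation for a single CPD P(C|S) from counts; the data and the choice of a symmetric prior are illustrative:

```python
from collections import Counter

# Made-up complete-data samples of (S, C) pairs; estimate P(C | S).
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

def estimate_cpd(data, alpha=0.0):
    """ML estimate when alpha = 0; MAP under a symmetric Dirichlet(alpha + 1)
    prior otherwise (alpha acts as a pseudo-count per table cell)."""
    joint_counts = Counter(data)                  # N(s, c)
    parent_counts = Counter(s for s, _ in data)   # N(s)
    k = 2  # number of values C can take
    return {s: {c: (joint_counts[(s, c)] + alpha) /
                   (parent_counts[s] + alpha * k)
                for c in range(k)}
            for s in range(2)}

ml = estimate_cpd(data)            # theta = N(s, c) / N(s)
map_est = estimate_cpd(data, 1.0)  # counts smoothed by pseudo-counts
```

Decomposability is what makes this cheap: the same counting is repeated per family, never over the full joint.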

12 Learning Parameters: incomplete data
With hidden nodes, the marginal likelihood is non-decomposable.
EM algorithm: start from initial parameters and iterate until convergence:
- Expectation: inference in the current model fills in expected counts for missing values, e.g. P(S | X=0, D=1, C=0, B=1).
- Maximization: update the parameters (ML, MAP) from the expected counts.
Example data (one row per case, columns S X D C B):
S X D C B
1 0 1 0 1
1 1 1 0 1
0 0 0 0 0
1 0 0 0 1
...
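The E/M loop can be sketched on a toy case: S hidden, one observed child D with a fixed, assumed-known CPD, and only P(S) being learned. All numbers and data here are made up purely to illustrate expected counts:

```python
# Toy EM: the Smoking node S is hidden; only Dyspnoea D is observed.
# P(D|S) is held fixed (hypothetical values); EM learns P(S=1).
P_D_given_S = {0: {0: 0.8, 1: 0.2},   # P(D | S=0)
               1: {0: 0.3, 1: 0.7}}   # P(D | S=1)
observed_d = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]   # made-up observations

p_s1 = 0.5  # initial parameter P(S=1)
for _ in range(50):  # iterate until (near) convergence
    # E-step: expected count of S=1 via Bayes rule, P(S=1 | D=d).
    expected = 0.0
    for d in observed_d:
        num = p_s1 * P_D_given_S[1][d]
        den = num + (1 - p_s1) * P_D_given_S[0][d]
        expected += num / den
    # M-step: ML update of P(S=1) from the expected (soft) counts.
    p_s1 = expected / len(observed_d)
```

In a full network the E-step would be the same idea run through a general inference routine, accumulating expected counts for every CPD cell touched by missing values.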

13 Learning graph structure
Finding the optimal structure is an NP-hard optimization problem.
Heuristic search:
- Greedy local search: from the current graph, add, delete, or reverse a single edge (e.g. add S→B, delete S→B, reverse S→B)
- Best-first search
- Simulated annealing
Complete data: the score decomposes, so each move needs only local computations.
Incomplete data (score non-decomposable): structural EM.
Constraint-based methods: the data impose independence relations (constraints) that restrict the candidate graphs.
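The add/delete/reverse move set of greedy local search can be sketched as follows; the acyclicity check (Kahn's topological sort) and the three-node example are illustrative:

```python
def is_dag(nodes, edges):
    """Kahn's algorithm: a graph is a DAG iff it can be fully topo-sorted."""
    edges = set(edges)
    indeg = {v: 0 for v in nodes}
    for _, v in edges:
        indeg[v] += 1
    frontier = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while frontier:
        u = frontier.pop()
        seen += 1
        for (a, b) in list(edges):
            if a == u:
                edges.remove((a, b))
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return seen == len(nodes)

def neighbors(nodes, edges):
    """All DAGs one add/delete/reverse move away from the current DAG."""
    edges = set(edges)
    out = []
    for e in edges:                       # delete an edge (always a DAG)
        out.append(edges - {e})
    for (u, v) in edges:                  # reverse an edge
        cand = (edges - {(u, v)}) | {(v, u)}
        if is_dag(nodes, cand):
            out.append(cand)
    for u in nodes:                       # add an edge
        for v in nodes:
            if u != v and (u, v) not in edges:
                cand = edges | {(u, v)}
                if is_dag(nodes, cand):
                    out.append(cand)
    return out

nodes = ["S", "C", "B"]
current = {("S", "C"), ("S", "B")}
moves = neighbors(nodes, current)
```

Greedy search would score every candidate in `moves` (only the families whose parents changed, when the score decomposes) and keep the best, repeating until no move improves.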

14 Scoring functions: Minimum Description Length (MDL)
Learning as data compression: MDL = DL(Model) + DL(Data | Model).
MDL = -BIC (Bayesian Information Criterion).
Other scores: the Bayesian score (BDe) is asymptotically equivalent to MDL.
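A sketch of the BIC score for one node's family (log-likelihood minus (log N / 2) times the number of free parameters; MDL is its negation, as the slide notes). The counts below are hypothetical:

```python
from math import log

def bic_score_node(joint_counts, parent_counts, n_child_vals, n_samples):
    """BIC for one family: sum_{x,pa} N(x,pa) log[N(x,pa)/N(pa)]
    minus (log n / 2) * number of free parameters.  MDL = -BIC."""
    loglik = sum(n * log(n / parent_counts[pa])
                 for (pa, x), n in joint_counts.items() if n > 0)
    n_free = len(parent_counts) * (n_child_vals - 1)
    return loglik - 0.5 * log(n_samples) * n_free

# Made-up counts for D given parent configuration (C, B), 80 samples.
joint_counts = {((0, 0), 0): 2,  ((0, 0), 1): 18,
                ((0, 1), 0): 14, ((0, 1), 1): 6,
                ((1, 0), 0): 16, ((1, 0), 1): 4,
                ((1, 1), 0): 18, ((1, 1), 1): 2}
parent_counts = {(0, 0): 20, (0, 1): 20, (1, 0): 20, (1, 1): 20}
score = bic_score_node(joint_counts, parent_counts, 2, 80)
```

Because the score is a sum over families, a structure-search move that changes one node's parents only requires rescoring that one family.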

15 Summary
- Bayesian networks are graphical probabilistic models offering efficient representation and inference.
- They combine expert knowledge with learning from data.
- Learning: parameters (parameter estimation, EM) and structure (optimization with scoring functions, e.g. MDL).
- Applications/systems: collaborative filtering (MSBN), fraud detection (AT&T), classification (AutoClass (NASA), TAN-BLT (SRI)).
- Future directions: causality, time, model evaluation criteria, approximate inference/learning, on-line learning, etc.

