Advances in Bayesian Learning: Learning and Inference in Bayesian Networks
Irina Rish, IBM T.J. Watson Research Center, rish@us.ibm.com

“Road map”
- Introduction and motivation: what are Bayesian networks, and why use them?
- How to use them: probabilistic inference
- How to learn them: learning parameters; learning graph structure
- Summary

Bayesian Networks
[Figure: a network over Smoking, Lung Cancer, Bronchitis, X-ray, and Dyspnoea.]
Query: P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?

Bayesian Networks: Representation
[Figure: the same network, annotated with its CPDs P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B).]
Joint distribution: P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
Example CPD, P(D | C, B):
  C  B | D=0  D=1
  0  0 | 0.1  0.9
  0  1 | 0.7  0.3
  1  0 | 0.8  0.2
  1  1 | 0.9  0.1
Conditional independencies give an efficient (factored) representation.
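A minimal Python sketch of this factorization (hypothetical, not from the original slides): the P(D|C,B) table is taken from the slide above, while every other CPD value is a made-up placeholder.

```python
# A hypothetical rendering of the slide's factorization. P(D|C,B) uses the
# CPD table from the slide; every other number is a made-up placeholder.

P_S = {0: 0.8, 1: 0.2}                       # P(S): placeholder
P_C_given_S = {0: {0: 0.95, 1: 0.05},        # P(C|S): placeholder
               1: {0: 0.80, 1: 0.20}}
P_B_given_S = {0: {0: 0.9, 1: 0.1},          # P(B|S): placeholder
               1: {0: 0.7, 1: 0.3}}
P_X_given_CS = {(0, 0): {0: 0.9, 1: 0.1},    # P(X|C,S): placeholder
                (0, 1): {0: 0.8, 1: 0.2},
                (1, 0): {0: 0.2, 1: 0.8},
                (1, 1): {0: 0.1, 1: 0.9}}
P_D_given_CB = {(0, 0): {0: 0.1, 1: 0.9},    # P(D|C,B): from the slide
                (0, 1): {0: 0.7, 1: 0.3},
                (1, 0): {0: 0.8, 1: 0.2},
                (1, 1): {0: 0.9, 1: 0.1}}

def joint(s, c, b, x, d):
    """P(S,C,B,X,D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)."""
    return (P_S[s] * P_C_given_S[s][c] * P_B_given_S[s][b]
            * P_X_given_CS[(c, s)][x] * P_D_given_CB[(c, b)][d])

# The full joint over 5 binary variables has 2^5 = 32 entries; the factored
# form above has only 1 + 2 + 2 + 4 + 4 = 13 free parameters.
print(joint(s=0, c=1, b=0, x=1, d=1))
```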

What are they good for?
- Diagnosis: P(cause | symptom) = ?
- Prediction: P(symptom | cause) = ?
- Classification: P(class | data)
- Decision-making (given a cost function)
Application domains: medicine, bioinformatics, computer troubleshooting, stock market, text classification, speech recognition.

Example: Printer Troubleshooting

Bayesian networks: inference
Query: P(X | evidence) = ?, e.g. P(s | d=1).
Variable elimination pushes summations inside the factored joint (over the “moral” graph):
  P(s, d=1) = Σ_{c,b,x} P(s) P(c|s) P(b|s) P(x|c,s) P(d=1|c,b)
            = P(s) Σ_b P(b|s) Σ_c P(c|s) P(d=1|c,b) Σ_x P(x|c,s)
Complexity: exponential in w*, the “induced width” (max clique size) of the moral graph along the elimination ordering; here w* = 4.
Efficient inference: good variable orderings, conditioning, approximations.
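A minimal sketch of this elimination for the query P(s | d=1), under the same assumptions as the previous snippet (P(D|C,B) from the slide, all other CPD values made-up placeholders); X drops out of this query because its CPD sums to one.

```python
# Variable elimination for P(S | D=1) in the example network.
# P(D|C,B) is from the slide; the other CPDs are placeholders.

P_S = {0: 0.8, 1: 0.2}
P_C_given_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}
P_B_given_S = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.7, 1: 0.3}}
P_D_given_CB = {(0, 0): {0: 0.1, 1: 0.9}, (0, 1): {0: 0.7, 1: 0.3},
                (1, 0): {0: 0.8, 1: 0.2}, (1, 1): {0: 0.9, 1: 0.1}}
# X is "barren" for this query: sum_x P(x|c,s) = 1, so it is pruned here.

def p_s_given_d1():
    """P(S | D=1) by eliminating X (pruned), then C, then B."""
    unnormalized = {}
    for s in (0, 1):
        total = 0.0
        for b in (0, 1):                       # eliminate B
            msg_c = sum(P_C_given_S[s][c] * P_D_given_CB[(c, b)][1]
                        for c in (0, 1))       # eliminate C
            total += P_B_given_S[s][b] * msg_c
        unnormalized[s] = P_S[s] * total       # P(S=s, D=1)
    z = sum(unnormalized.values())             # P(D=1), normalizing constant
    return {s: p / z for s, p in unnormalized.items()}

print(p_s_given_d1())
```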

“Road map”
- Introduction and motivation: what are Bayesian networks, and why use them?
- How to use them: probabilistic inference
- Why and how to learn them: learning parameters; learning graph structure
- Summary

Why learn Bayesian networks?
Example data records (?? marks a missing value):
<9.7 0.6 8 14 18>
<0.2 1.3 5 ?? ??>
<1.3 2.8 ?? 0 1>
<?? 5.6 0 10 ??>
- Combining domain expert knowledge with data
- Efficient representation and inference
- Incremental learning: updating the model (e.g., a prior P(H)) as new data arrive
- Handling missing data, e.g. <1.3 2.8 ?? 0 1>
- Learning causal relationships (e.g., S -> C)

Learning Bayesian Networks
Known graph (learn the parameters P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B)):
- Complete data: parameter estimation (ML, MAP)
- Incomplete data: non-linear parametric optimization (gradient descent, EM)
Unknown graph (learn graph and parameters):
- Complete data: optimization (search in the space of graphs)
- Incomplete data: structural EM, mixture models

Learning Parameters: complete data
ML estimate: the log-likelihood is decomposable over families (a node and its parents); with multinomial counts N(x, pa), the estimate is
  θ(x | pa) = N(x, pa) / N(pa)
MAP estimate (Bayesian statistics): conjugate Dirichlet priors, whose equivalent sample size encodes prior knowledge:
  θ(x | pa) = (N(x, pa) + α(x, pa)) / (N(pa) + α(pa))
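A minimal sketch of both estimates for a single CPD, P(D | C, B), from complete records; the data tuples and the symmetric Dirichlet prior are made up for illustration.

```python
# ML and Bayesian (Dirichlet) estimates of one CPD, P(D|C,B), from complete
# data. Records are made-up (s, c, b, x, d) tuples.
from collections import Counter

data = [(0, 0, 1, 0, 1), (1, 1, 0, 1, 1), (0, 0, 0, 0, 0),
        (1, 1, 1, 0, 0), (0, 1, 1, 1, 1), (1, 0, 0, 0, 0)]

n_joint = Counter((c, b, d) for (_, c, b, _, d) in data)   # N(c, b, d)
n_parent = Counter((c, b) for (_, c, b, _, _) in data)     # N(c, b)

def ml(c, b, d):
    """ML estimate N(c,b,d) / N(c,b); assumes N(c,b) > 0."""
    return n_joint[(c, b, d)] / n_parent[(c, b)]

def bayes(c, b, d, alpha=1.0, k=2):
    """Posterior mean under a symmetric Dirichlet(alpha) prior over the
    k states of D; alpha acts as an equivalent sample size per state."""
    return (n_joint[(c, b, d)] + alpha) / (n_parent[(c, b)] + k * alpha)

print(ml(0, 1, 1), bayes(0, 1, 1))
```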

Learning Parameters: incomplete data
With hidden nodes or missing values, the marginal likelihood is non-decomposable.
Example data over S, X, D, C, B (? marks a missing value):
<? 0 1 0 1>
<1 1 ? 0 1>
<0 0 0 ? ?>
<? ? 0 ? 1>
EM algorithm: start from initial parameters and iterate until convergence:
- Expectation: run inference in the current model, e.g. P(S | X=0, D=1, C=0, B=1), to fill in expected counts for the missing entries
- Maximization: update the parameters (ML, MAP) from the expected counts
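A minimal sketch of this loop for the simplest interesting case, a two-node network S -> C where S is sometimes unobserved; the data and the starting parameters are made up.

```python
# EM for a two-node network S -> C with S sometimes missing. Data records
# are made-up (s, c) pairs; s is None when unobserved.

data = [(0, 1), (None, 1), (1, 1), (None, 0), (0, 0), (1, 1), (None, 1)]

p_s1 = 0.5                        # P(S=1), arbitrary starting point
p_c1_given_s = {0: 0.5, 1: 0.5}   # P(C=1 | S=s), arbitrary starting point

for _ in range(50):               # iterate until (approximate) convergence
    # E-step: expected counts, filling in missing S via inference P(S | C).
    n_s = {0: 0.0, 1: 0.0}
    n_sc = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    for s, c in data:
        if s is None:
            # P(S=s | C=c) is proportional to P(S=s) P(C=c | S=s).
            w1 = p_s1 * (p_c1_given_s[1] if c else 1 - p_c1_given_s[1])
            w0 = (1 - p_s1) * (p_c1_given_s[0] if c else 1 - p_c1_given_s[0])
            weights = {0: w0 / (w0 + w1), 1: w1 / (w0 + w1)}
        else:
            weights = {s: 1.0, 1 - s: 0.0}
        for sv, w in weights.items():
            n_s[sv] += w
            n_sc[(sv, c)] += w
    # M-step: ML update of the parameters from the expected counts.
    p_s1 = n_s[1] / len(data)
    p_c1_given_s = {sv: n_sc[(sv, 1)] / n_s[sv] for sv in (0, 1)}

print(p_s1, p_c1_given_s)
```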

Learning graph structure
Finding the highest-scoring graph is an NP-hard optimization problem, so heuristic search is used:
- Greedy local search over single-edge moves: add S->B, delete S->B, reverse S->B
- Best-first search
- Simulated annealing
Complete data: the score decomposes, so each move needs only local computations.
Incomplete data (score non-decomposable): structural EM.
Constraint-based methods: independence relations observed in the data impose constraints on the graph.
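A minimal sketch of greedy local search over DAGs using these three edge moves; `score` is assumed to be any callable over (nodes, edges), for example the BIC/MDL score sketched after the next slide.

```python
# Greedy local search over DAG structures. Graphs are frozensets of directed
# edges over `nodes`; `score` is any callable score(nodes, edges), e.g. a
# BIC/MDL score. A simple DFS rejects cyclic candidates.

def is_acyclic(nodes, edges):
    children = {n: [v for (u, v) in edges if u == n] for n in nodes}
    visited, on_stack = set(), set()
    def dfs(u):
        visited.add(u); on_stack.add(u)
        for v in children[u]:
            if v in on_stack or (v not in visited and not dfs(v)):
                return False          # back edge found: a cycle
        on_stack.discard(u)
        return True
    return all(dfs(n) for n in nodes if n not in visited)

def neighbors(nodes, edges):
    """All graphs one add / delete / reverse move away."""
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            if (u, v) in edges:
                yield edges - {(u, v)}                 # delete u -> v
                yield (edges - {(u, v)}) | {(v, u)}    # reverse u -> v
            elif (v, u) not in edges:
                yield edges | {(u, v)}                 # add u -> v

def greedy_search(nodes, score, edges=frozenset()):
    """Hill-climb until no single-edge move improves the score."""
    while True:
        best, best_score = None, score(nodes, edges)
        for cand in neighbors(nodes, edges):
            if is_acyclic(nodes, cand) and score(nodes, cand) > best_score:
                best, best_score = cand, score(nodes, cand)
        if best is None:
            return edges              # local optimum reached
        edges = best
```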

Scoring functions: Minimum Description Length (MDL)
Learning as data compression: minimize DL(Model) + DL(Data | Model), the bits needed to describe the model plus the bits needed to describe the data given the model.
MDL = -BIC (Bayesian Information Criterion).
Other scores: the Bayesian score (BDe) is asymptotically equivalent to MDL.
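A minimal sketch of the BIC score (the negative of MDL), assuming all variables are binary and ML parameters; the graph representation matches the greedy-search sketch above, and the example rows are made up.

```python
# BIC score (the negative of MDL) for a discrete network, assuming every
# variable is binary. Rows are dicts {node: value}; edges as in the
# greedy-search sketch above.
import math
from collections import Counter

def bic_score(nodes, edges, rows):
    parents = {n: sorted(u for (u, v) in edges if v == n) for n in nodes}
    total = 0.0
    for n in nodes:
        pa = parents[n]
        n_joint = Counter((tuple(r[p] for p in pa), r[n]) for r in rows)
        n_pa = Counter(tuple(r[p] for p in pa) for r in rows)
        # Log-likelihood at the ML parameters N(x, pa) / N(pa).
        for (pa_val, _x), count in n_joint.items():
            total += count * math.log(count / n_pa[pa_val])
        # Penalty: one free parameter per parent configuration (binary node).
        total -= 0.5 * (2 ** len(pa)) * math.log(len(rows))
    return total

rows = [{"S": 0, "C": 1}, {"S": 1, "C": 1}, {"S": 0, "C": 0}, {"S": 1, "C": 1}]
print(bic_score(["S", "C"], frozenset({("S", "C")}), rows))
# With the search sketch above:
# greedy_search(["S", "C"], lambda n, e: bic_score(n, e, rows))
```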

Summary
- Bayesian Networks: graphical probabilistic models
- Efficient representation and inference
- Expert knowledge + learning from data
- Learning: parameters (parameter estimation, EM) and structure (optimization with scoring functions, e.g., MDL)
- Applications/systems: collaborative filtering (MSBN), fraud detection (AT&T), classification (AutoClass (NASA), TAN-BLT (SRI))
- Future directions: causality, time, model evaluation criteria, approximate inference/learning, on-line learning, etc.