Advances in Bayesian Learning Learning and Inference in Bayesian Networks Irina Rish IBM T.J.Watson Research Center


“Road map”
- Introduction and motivation: What are Bayesian networks and why use them?
- How to use them: probabilistic inference
- How to learn them: learning parameters, learning graph structure
- Summary

Bayesian Networks
[Figure: a directed graph in which Smoking points to lung Cancer, Bronchitis, and X-ray; lung Cancer points to X-ray and Dyspnoea; Bronchitis points to Dyspnoea]
Example query: P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?

What are they good for?
- Diagnosis: P(cause | symptom) = ?
- Prediction: P(symptom | cause) = ?
- Classification: P(class | data)
- Decision-making (given a cost function)
Application domains: medicine, bioinformatics, computer troubleshooting, stock market, text classification, speech recognition.

Bayesian Networks: Representation
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
The graph encodes conditional independencies, which yield an efficient representation: each node stores only a conditional probability distribution (CPD) given its parents, e.g., a table for P(D | C, B) with one row per parent configuration (C, B) and one column per value D=0, D=1.
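
To make the factorization concrete, here is a minimal Python sketch of this network; the CPT numbers are invented for illustration and are not taken from the slides.

```python
import itertools

# Invented CPTs for illustration: each maps a tuple of parent values to P(var=1).
parents = {"S": (), "C": ("S",), "B": ("S",), "X": ("C", "S"), "D": ("C", "B")}
cpts = {
    "S": {(): 0.3},                                                  # P(S=1)
    "C": {(0,): 0.01, (1,): 0.10},                                   # P(C=1 | S)
    "B": {(0,): 0.05, (1,): 0.30},                                   # P(B=1 | S)
    "X": {(0, 0): 0.02, (0, 1): 0.20, (1, 0): 0.85, (1, 1): 0.90},   # P(X=1 | C,S)
    "D": {(0, 0): 0.10, (0, 1): 0.60, (1, 0): 0.70, (1, 1): 0.90},   # P(D=1 | C,B)
}

def joint(assign, cpts, parents):
    """P(S,C,B,X,D) as the product of the local CPDs, per the factorization above."""
    p = 1.0
    for var, pa in parents.items():
        p1 = cpts[var][tuple(assign[u] for u in pa)]
        p *= p1 if assign[var] == 1 else 1.0 - p1
    return p

def query(target, evidence, cpts, parents):
    """P(target=1 | evidence) by brute-force enumeration (fine for 5 variables)."""
    free = [v for v in parents if v not in evidence and v != target]
    num = den = 0.0
    for t in (0, 1):
        for vals in itertools.product((0, 1), repeat=len(free)):
            assign = {target: t, **evidence, **dict(zip(free, vals))}
            p = joint(assign, cpts, parents)
            den += p
            if t == 1:
                num += p
    return num / den

# The query from the earlier slide: P(cancer=1 | smoking=0, dyspnoea=1)
print(query("C", {"S": 0, "D": 1}, cpts, parents))
```

Brute-force enumeration is exponential in the number of variables; the inference slide below shows how variable elimination avoids that blow-up.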

Example: Printer Troubleshooting

Bayesian networks: inference
Task: P(X | evidence) = ?
Variable elimination, e.g., for P(s | d=1):
P(s | d=1) ∝ Σ_b Σ_c Σ_x P(s) P(c|s) P(b|s) P(x|c,s) P(d=1|c,b)
            = P(s) Σ_b P(b|s) Σ_c P(c|s) P(d=1|c,b) Σ_x P(x|c,s)
Complexity is exponential in w*, the "induced width" (maximum clique size) of the "moral" graph under the chosen elimination ordering (w* = 4 here).
Efficient inference exploits good variable orderings, conditioning, and approximations; a variable-elimination sketch follows below.
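
Below is a minimal sketch of table-factor variable elimination (my own implementation, not code from the tutorial); evidence is handled by restricting a factor before elimination, and the elimination order controls the induced width and hence the cost.

```python
import itertools
from functools import reduce

class Factor:
    """A table factor over binary variables: `table` maps value tuples to numbers."""
    def __init__(self, vars, table):
        self.vars, self.table = list(vars), dict(table)

def multiply(f, g):
    vars = list(dict.fromkeys(f.vars + g.vars))
    table = {}
    for vals in itertools.product((0, 1), repeat=len(vars)):
        a = dict(zip(vars, vals))
        table[vals] = (f.table[tuple(a[v] for v in f.vars)]
                       * g.table[tuple(a[v] for v in g.vars)])
    return Factor(vars, table)

def sum_out(f, var):
    i = f.vars.index(var)
    table = {}
    for vals, p in f.table.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(f.vars[:i] + f.vars[i + 1:], table)

def eliminate(factors, order):
    """Sum out the variables in `order`, bucket by bucket; assumes at least one
    factor over the query variable(s) remains at the end."""
    for var in order:
        bucket = [f for f in factors if var in f.vars]
        factors = [f for f in factors if var not in f.vars]
        if bucket:
            factors.append(sum_out(reduce(multiply, bucket), var))
    return reduce(multiply, factors)
```

For P(s | d=1) in the network above, restrict the P(D|C,B) factor to D=1, call eliminate() with order ["X", "C", "B"], and normalize the resulting factor over S.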

“Road map”
- Introduction and motivation: What are Bayesian networks and why use them?
- How to use them: probabilistic inference
- Why and how to learn them: learning parameters, learning graph structure
- Summary

Why learn Bayesian networks?
- Combining domain expert knowledge with data
- Efficient representation and inference
- Incremental learning: update the model as new data arrive
- Learning causal relationships
- Handling missing data

Learning Bayesian Networks
Known graph: learn the parameters P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B)
- Complete data: parameter estimation (ML, MAP)
- Incomplete data: non-linear parametric optimization (gradient descent, EM)
Unknown graph: learn both graph and parameters
- Complete data: optimization (search in the space of graphs)
- Incomplete data: structural EM, mixture models

Learning Parameters: complete data
ML-estimate: maximize the log-likelihood log P(D | θ). With complete data the likelihood decomposes over families, so each CPT is estimated from local multinomial counts: θ_{x|pa} = N(x, pa) / N(pa).
MAP-estimate (Bayesian statistics): with conjugate Dirichlet priors over the multinomial counts, θ_{x|pa} = (N(x, pa) + α(x, pa)) / (N(pa) + α(pa)); the prior counts α encode an equivalent sample size (prior knowledge).
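
As a sketch of this decomposed counting for binary variables (my notation; the pseudo-count alpha plays the role of the equivalent sample size):

```python
def estimate_cpt(data, child, pa, alpha=1.0):
    """Estimate P(child=1 | parents) from complete data (a list of {var: 0/1}
    dicts). alpha=0 gives the ML estimate N(x,pa)/N(pa); alpha>0 gives the
    MAP estimate under a symmetric Beta(alpha, alpha) prior."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in pa)
        n1, n = counts.get(key, (0, 0))
        counts[key] = (n1 + row[child], n + 1)
    return {key: (n1 + alpha) / (n + 2 * alpha)
            for key, (n1, n) in counts.items()}

# Hypothetical usage: re-estimate P(D=1 | C, B) from a complete dataset `data`
# cpts["D"] = estimate_cpt(data, "D", ("C", "B"), alpha=1.0)
```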

Learning Parameters: incomplete data
With hidden nodes or missing values, the marginal likelihood is non-decomposable.
EM-algorithm: start from initial parameters and iterate until convergence:
- Expectation: use inference in the current model (e.g., P(S | X=0, D=1, C=0, B=1)) to compute expected counts for the missing entries.
- Maximization: update the parameters (ML, MAP) from the expected counts.
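
A minimal EM sketch for this small network, reusing joint() and parents from the representation sketch earlier; rows mark missing values with None, and the E-step weights each completion of the hidden variables by its posterior probability:

```python
import itertools

def em_step(data, cpts, parents):
    """One EM iteration. E-step: weight every completion of the missing (None)
    entries in each row by its posterior under the current model. M-step: ML
    re-estimate of every CPT from the resulting expected counts."""
    counts = {v: {} for v in parents}            # parent values -> (n1, n)
    for row in data:
        hidden = [v for v in parents if row[v] is None]
        weighted, total = [], 0.0
        for vals in itertools.product((0, 1), repeat=len(hidden)):
            full = {**row, **dict(zip(hidden, vals))}
            w = joint(full, cpts, parents)       # joint() from the earlier sketch
            weighted.append((full, w))
            total += w
        for full, w in weighted:                 # assumes total > 0
            w /= total
            for v, pa in parents.items():
                key = tuple(full[u] for u in pa)
                n1, n = counts[v].get(key, (0.0, 0.0))
                counts[v][key] = (n1 + w * full[v], n + w)
    return {v: {k: n1 / n for k, (n1, n) in c.items()} for v, c in counts.items()}

# Iterate until the parameters stop changing, e.g.:
# for _ in range(50):
#     cpts = em_step(data, cpts, parents)
```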

Learning graph structure
Finding the best structure is an NP-hard optimization problem, so heuristic search is used:
- Greedy local search: from the current graph, apply the move that most improves the score: add an edge (e.g., S->B), delete an edge, or reverse an edge (see the sketch below)
- Best-first search
- Simulated annealing
With complete data the score decomposes, so each move needs only local computations; with incomplete data the score is non-decomposable and structural EM is used.
Constraint-based methods: the data impose independence relations (constraints) that restrict the candidate graphs.
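
A sketch of the greedy local search; score() is an assumed helper (e.g., the BIC score from the next slide) that must return float('-inf') for cyclic graphs so they are never selected:

```python
import itertools

def neighbors(edges, nodes):
    """All edge sets reachable by one add, delete, or reverse move."""
    for a, b in itertools.permutations(nodes, 2):
        if (a, b) in edges:
            yield edges - {(a, b)}                   # delete a->b
            yield (edges - {(a, b)}) | {(b, a)}      # reverse a->b
        elif (b, a) not in edges:
            yield edges | {(a, b)}                   # add a->b

def greedy_search(nodes, score, max_iters=100):
    """Hill-climb on a structure score, starting from the empty graph."""
    current = frozenset()
    current_score = score(current)
    for _ in range(max_iters):
        best = max(neighbors(current, nodes), key=score)
        if score(best) <= current_score:
            break                                    # local optimum reached
        current, current_score = frozenset(best), score(best)
    return current
```

In practice the search exploits score decomposability, re-scoring only the families touched by a move instead of the whole graph.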

Scoring functions: Minimum Description Length (MDL)
Learning as data compression: minimize DL(Model) + DL(Data | Model), the bits needed to describe the model plus the bits needed to describe the data given the model.
MDL = -BIC (Bayesian Information Criterion).
Other scores: the Bayesian score (BDe) is asymptotically equivalent to MDL.
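
For completeness, a sketch of the BIC score for binary variables with complete data, so that MDL = -BIC as stated above; the penalty term corresponds to DL(Model) and the likelihood term to -DL(Data | Model):

```python
import math

def bic_score(data, parents):
    """Log-likelihood at the ML parameters minus (log N / 2) * #free parameters,
    summed family by family; `data` is a list of {var: 0/1} dicts."""
    n = len(data)
    score = 0.0
    for var, pa in parents.items():
        counts = {}
        for row in data:
            key = (tuple(row[p] for p in pa), row[var])
            counts[key] = counts.get(key, 0) + 1
        for (pa_vals, _val), c in counts.items():
            c_pa = counts.get((pa_vals, 0), 0) + counts.get((pa_vals, 1), 0)
            score += c * math.log(c / c_pa)          # N(x,pa) * log ML parameter
        score -= (math.log(n) / 2) * (2 ** len(pa))  # one free param per parent config
    return score
```

This can be plugged into greedy_search above via a wrapper that converts an edge set into a parents map and returns float('-inf') for cyclic graphs.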

Summary
- Bayesian Networks: graphical probabilistic models with efficient representation and inference
- Combine expert knowledge with learning from data
- Learning: parameters (parameter estimation, EM) and structure (optimization with score functions, e.g., MDL)
- Applications/systems: collaborative filtering (MSBN), fraud detection (AT&T), classification (AutoClass (NASA), TAN-BLT (SRI))
- Future directions: causality, time, model evaluation criteria, approximate inference/learning, on-line learning, etc.