Learning With Bayesian Networks
Markus Kalisch, ETH Zürich

Inference in BNs - Review

Exact inference:
- Example query: P(Burglary | JohnCalls = true, MaryCalls = true)
- P(b | j, m) = alpha * Sum_e Sum_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)
- Deal with the sums in a clever way: variable elimination, message passing
- Singly connected networks: linear in space/time; multiply connected networks: exponential in space/time (worst case)

Approximate inference:
- Direct sampling
- Likelihood weighting
- MCMC methods
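As a concrete illustration of inference by enumeration, here is a minimal Python sketch for the burglary/alarm query above. The CPT values are the ones from the Russell and Norvig textbook example; the function and variable names are chosen for this sketch only.

```python
from itertools import product

# Burglary network with the textbook CPTs (Russell & Norvig example).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls = True | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls = True | Alarm)

def prob(p_true, value):
    """P(X = value) given P(X = True)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Full joint probability via the BN factorization."""
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a)
            * prob(P_J[a], j) * prob(P_M[a], m))

def query_burglary(j=True, m=True):
    """P(Burglary | JohnCalls = j, MaryCalls = m) by enumeration."""
    unnorm = {}
    for b in (True, False):
        unnorm[b] = sum(joint(b, e, a, j, m)
                        for e, a in product((True, False), repeat=2))
    return unnorm[True] / sum(unnorm.values())

print(round(query_burglary(), 3))  # roughly 0.284
```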

Learning BNs - Overview

- Brief summary of the Heckerman tutorial
- Recent provably correct search methods:
  - Greedy Equivalence Search (GES)
  - PC algorithm
- Discussion

Abstract and Introduction

Graphical modeling offers:
- Easy handling of missing data
- Easy modeling of causal relationships
- Easy combination of prior information and data
- Easy ways to avoid overfitting

Bayesian Approach

- Probability as degree of belief
- The rules of probability are a good tool for manipulating beliefs
- Probability assessment: precision and accuracy
- Running example: multinomial sampling with a Dirichlet prior
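The running example is easy to reproduce: with a Dirichlet(alpha_1, ..., alpha_K) prior on the multinomial parameters, the posterior after observing counts N_1, ..., N_K is Dirichlet(alpha_1 + N_1, ..., alpha_K + N_K). A minimal sketch, with hyperparameters and counts that are purely illustrative:

```python
import numpy as np

# Dirichlet prior hyperparameters for a 3-outcome multinomial (illustrative values).
alpha = np.array([1.0, 1.0, 1.0])

# Observed counts N_k from multinomial sampling.
counts = np.array([10, 3, 7])

# Conjugacy: the posterior is again Dirichlet with updated hyperparameters.
posterior = alpha + counts

# Posterior mean = predictive probability of each outcome on the next draw.
predictive = posterior / posterior.sum()
print(predictive)  # approx. [0.478, 0.174, 0.348]
```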

Bayesian Networks (BN)

A BN is defined by:
- a network structure (a DAG over the variables)
- local probability distributions (one per node, conditional on its parents)

To learn a BN, we have to:
- choose the variables of the model
- choose the structure of the model
- assess the local probability distributions
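The structure and the local distributions determine the joint distribution through the factorization P(x_1, ..., x_n) = prod_i P(x_i | parents(x_i)). A minimal sketch of that bookkeeping; the two-node example network and its numbers are made up for illustration:

```python
# A BN as a data structure: for each node, its parents and a CPT that maps
# (node value, tuple of parent values) to a probability.
bn = {
    "Rain":     {"parents": [],
                 "cpt": {(True, ()): 0.2, (False, ()): 0.8}},
    "WetGrass": {"parents": ["Rain"],
                 "cpt": {(True, (True,)): 0.9, (False, (True,)): 0.1,
                         (True, (False,)): 0.05, (False, (False,)): 0.95}},
}

def joint_probability(assignment):
    """P(x_1, ..., x_n) = prod_i P(x_i | parents(x_i))."""
    p = 1.0
    for node, spec in bn.items():
        parent_vals = tuple(assignment[par] for par in spec["parents"])
        p *= spec["cpt"][(assignment[node], parent_vals)]
    return p

print(joint_probability({"Rain": True, "WetGrass": True}))  # 0.2 * 0.9 = 0.18
```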

Inference

We have seen up to now:
- Book by Russell / Norvig: exact inference, variable elimination, approximate methods
- Talk by Prof. Loeliger: factor graphs / belief propagation / message passing

Probabilistic inference in BNs is NP-hard: approximations or special-case solutions are needed.

Learning Parameters (structure given)

Prof. Loeliger: trainable parameters can be added to the factor graph and thus be inferred.

Complete data:
- reduces to the one-variable case (independent updates per node and parent configuration; see the sketch below)

Incomplete data (missing at random):
- the formula for the posterior grows exponentially in the number of incomplete cases
- Gibbs sampling
- Gaussian approximation; get the MAP estimate by gradient-based optimization or the EM algorithm
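For complete discrete data the Bayesian update is just counting: with independent Dirichlet priors for each node and parent configuration, the posterior hyperparameters are the prior hyperparameters plus the observed counts. A minimal sketch, assuming binary variables; the two-node network, the data, and the Dirichlet(1, 1) prior are illustrative choices:

```python
from collections import defaultdict

# Structure is given: parents of each node (illustrative two-node network).
parents = {"Rain": [], "WetGrass": ["Rain"]}

# Complete data: one dict per observed case.
data = [
    {"Rain": True,  "WetGrass": True},
    {"Rain": True,  "WetGrass": False},
    {"Rain": False, "WetGrass": False},
    {"Rain": False, "WetGrass": False},
]

prior = 1.0  # Dirichlet(1, 1) prior for every (node, parent configuration)

def posterior_mean(node):
    """Posterior mean of P(node = True | parent configuration) from counts."""
    counts = defaultdict(lambda: [prior, prior])  # [pseudo-count False, pseudo-count True]
    for case in data:
        key = tuple(case[p] for p in parents[node])
        counts[key][int(case[node])] += 1
    return {key: c[1] / (c[0] + c[1]) for key, c in counts.items()}

print(posterior_mean("WetGrass"))
# {(True,): 0.5, (False,): 0.25} with these counts and this prior
```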

Learning Parameters AND Structure

- The structure can be learned only up to likelihood equivalence.
- Averaging over all structures is infeasible: the space of DAGs, and also of equivalence classes, grows super-exponentially in the number of nodes.

Model Selection

- Don't average over all structures, but select a good one (model selection).
- A good scoring criterion is the log posterior probability:
  log P(S, D) = log P(S) + log P(D | S)
- Priors: Dirichlet for the parameters, uniform over structures.
- Complete cases: this can be computed exactly.
- Incomplete cases: a Gaussian approximation and further simplification lead to the BIC score
  log P(D | S) ≈ log P(D | θ_ML, S) - (d/2) log N,
  which is what is usually used in practice (d = number of free parameters, N = sample size).
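A minimal sketch of how the BIC score can be computed for a discrete BN with complete data: maximize the likelihood per node (relative frequencies), then subtract the penalty (d/2) log N. The function signature and the way the structure is passed are assumptions made for this sketch, not part of the talk.

```python
import math
from collections import Counter

def bic_score(parents, data, arity):
    """BIC = log-likelihood at the ML parameters minus (d/2) * log(N).

    parents: dict node -> list of parent nodes (the candidate structure)
    data:    list of dicts, one complete case each
    arity:   dict node -> number of states of that node
    """
    n = len(data)
    loglik, d = 0.0, 0
    for node, pars in parents.items():
        # Counts of (parent configuration, node value) and of parent configurations.
        joint = Counter((tuple(c[p] for p in pars), c[node]) for c in data)
        margin = Counter(tuple(c[p] for p in pars) for c in data)
        loglik += sum(cnt * math.log(cnt / margin[cfg]) for (cfg, _), cnt in joint.items())
        # Free parameters: (arity - 1) per parent configuration.
        n_cfg = 1
        for p in pars:
            n_cfg *= arity[p]
        d += (arity[node] - 1) * n_cfg
    return loglik - 0.5 * d * math.log(n)
```

Because the score decomposes into one term per node, a local edge change only requires rescoring the node whose parent set changed; this is what makes the "separable search criterion" mentioned on the GES slides fast.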

Search Methods

- Learning BNs over discrete nodes (3 or more parents per node) is NP-hard (Heckerman 2004).
- There are provably (asymptotically) correct search methods:
  - Search-and-score methods: Greedy Equivalence Search (GES; Chickering 2002)
  - Constraint-based methods: PC algorithm (Spirtes et al. 2000)

GES – The Idea

- Restrict the search space to equivalence classes.
- Score: BIC, a "separable search criterion" => fast.
- Greedy search for the "best" equivalence class.
- In theory (asymptotically): the correct equivalence class is found.

GES – The Algorithm

GES is a two-stage greedy algorithm:
- Initialize with the equivalence class E containing the empty DAG.
- Stage 1 (forward): repeatedly replace E with the member of E+(E) (the classes reachable by a single edge addition) that has the highest score, until no such replacement increases the score.
- Stage 2 (backward): repeatedly replace E with the member of E-(E) (the classes reachable by a single edge deletion) that has the highest score, until no such replacement increases the score.
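The sketch below conveys only the two-stage greedy idea: for simplicity it searches over DAGs by adding and then deleting single edges, not over equivalence classes, so it is not a faithful GES implementation. It assumes a `bic_score(parents, data, arity)` function like the one sketched earlier; all other names are chosen for this sketch.

```python
from itertools import permutations

def is_acyclic(parents):
    """Check that the parent sets define a DAG (simple DFS cycle test)."""
    state = {}
    def visit(v):
        if state.get(v) == "done":
            return True
        if state.get(v) == "visiting":
            return False
        state[v] = "visiting"
        ok = all(visit(p) for p in parents[v])
        state[v] = "done"
        return ok
    return all(visit(v) for v in parents)

def greedy_search(nodes, data, arity, bic_score):
    """Two-stage greedy structure search over DAGs (simplified GES-style sketch)."""
    parents = {v: [] for v in nodes}          # start from the empty graph
    best = bic_score(parents, data, arity)

    def single_edge_moves(add):
        """All structures differing from the current one by one added/deleted edge."""
        moves = []
        for a, b in permutations(nodes, 2):
            if add == (a in parents[b]):
                continue                       # edge already present / already absent
            cand = {v: list(ps) for v, ps in parents.items()}
            if add:
                cand[b].append(a)
            else:
                cand[b].remove(a)
            if is_acyclic(cand):
                moves.append(cand)
        return moves

    for add in (True, False):                  # stage 1: additions, stage 2: deletions
        while True:
            moves = single_edge_moves(add)
            if not moves:
                break
            cand = max(moves, key=lambda c: bic_score(c, data, arity))
            score = bic_score(cand, data, arity)
            if score <= best:
                break
            parents, best = cand, score
    return parents
```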

PC – The Idea

- Start with the complete, undirected graph.
- Delete edges by recursive conditional independence tests.
- Afterwards: add arrowheads (orient edges).
- In theory (asymptotically): the correct equivalence class is found.

PC – The Algorithm

Form the complete, undirected graph G
l = -1
repeat
    l = l + 1
    repeat
        select an ordered pair of adjacent nodes A, B in G
        select a neighborhood N of A with size l (if possible)
        delete the edge A-B in G if A and B are cond. independent given N
    until all ordered pairs have been tested
until all neighborhoods are of size smaller than l
Add arrowheads by applying a couple of simple rules
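A minimal Python sketch of the edge-deletion (skeleton) phase described above; edge orientation is omitted. The conditional independence oracle `ci_test(a, b, cond_set)` is a hypothetical placeholder (a statistical test in practice, see the sample version below), and all names are chosen for this sketch.

```python
from itertools import combinations

def pc_skeleton(nodes, ci_test):
    """Skeleton phase of the PC algorithm.

    nodes:   list of variable names
    ci_test: callable (a, b, cond_set) -> True if a and b are conditionally
             independent given cond_set (hypothetical placeholder).
    """
    # Start with the complete undirected graph, stored as adjacency sets.
    adj = {v: set(nodes) - {v} for v in nodes}
    l = -1
    # Continue while some ordered pair still has a neighborhood large enough to test.
    while any(len(adj[a] - {b}) > l for a in nodes for b in adj[a]):
        l += 1
        for a in nodes:
            for b in list(adj[a]):
                if b not in adj[a]:
                    continue                      # edge removed earlier in this sweep
                neighbours = adj[a] - {b}
                if len(neighbours) < l:
                    continue
                # Test all conditioning sets of size l taken from adj(a) \ {b}.
                if any(ci_test(a, b, set(s)) for s in combinations(neighbours, l)):
                    adj[a].discard(b)
                    adj[b].discard(a)
    return adj
```

With a perfect independence oracle this recovers the true skeleton; with sample data, `ci_test` is replaced by a statistical test.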

Example

Conditional independencies:
- l = 0: none
- l = 1: (given on the slide)

[Slide figure: three graphs on the nodes A, B, C, D illustrating the steps of the PC algorithm; the PC algorithm recovers the correct skeleton.]

Sample Version of the PC Algorithm

- Real world: the conditional independence relations are not known.
- Instead: use a statistical test for conditional independence.
- Theory: using a statistical test instead of the true conditional independence relations is often OK.
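The slide does not name a particular test; for multivariate Gaussian data a common choice is Fisher's z-transform of the partial correlation, sketched below. The significance level and the function name are illustrative choices for this sketch.

```python
import numpy as np
from scipy import stats

def gauss_ci_test(data, i, j, cond, alpha=0.01):
    """Test X_i independent of X_j given X_cond, assuming Gaussian data.

    data: (n, p) array; i, j: column indices; cond: list of column indices.
    Returns True if the partial correlation is not significantly nonzero.
    """
    n = data.shape[0]
    cols = [i, j] + list(cond)
    corr = np.corrcoef(data[:, cols], rowvar=False)
    prec = np.linalg.pinv(corr)                          # precision of the submatrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])   # partial correlation
    r = min(max(r, -0.999999), 0.999999)
    z = 0.5 * np.log((1 + r) / (1 - r))                  # Fisher z-transform
    stat = np.sqrt(n - len(cond) - 3) * abs(z)
    return stat < stats.norm.ppf(1 - alpha / 2)
```

This plugs directly into the `ci_test` slot of the skeleton sketch above, e.g. `ci_test = lambda a, b, S: gauss_ci_test(X, a, b, list(S))` when nodes are column indices of a data matrix X.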

Comparing PC and GES

Compared to GES, the PC algorithm:
- finds fewer edges
- finds true edges with higher reliability
- is fast for sparse graphs (e.g. p = 100, n = 1000, E[N] = 3: T = 13 sec)

For p = 10, n = 50, E[N] = 0.9, 50 replicates: (results shown as a figure on the slide)

Learning Causal Relationships

- Causal Markov condition: if C is a causal graph for X, then C is also a Bayesian-network structure for the probability distribution of X.
- Use this to infer causal relationships from learned structures.

Conclusion

Using a BN: inference (NP-hard)
- exact inference, variable elimination, message passing (factor graphs)
- approximate methods

Learning a BN:
- Parameters: exact, factor graphs, Monte Carlo, Gaussian approximation
- Structure: GES, PC algorithm; NP-hard