Bayesian Networks CSE 473

© D. Weld and D. Fox 2 Bayes Nets
In general, a joint distribution P over a set of variables (X1 × ... × Xn) requires exponential space for representation & inference (2^n entries for n Boolean variables).
BNs provide a graphical representation of the conditional independence relations in P:
- usually quite compact
- requires assessment of fewer parameters, those being quite natural (e.g., causal)
- efficient (usually) inference: query answering and belief update

© D. Weld and D. Fox 3 An Example Bayes Net
[Figure: Earthquake and Burglary are parents of Alarm; Alarm is the parent of Nbr1Calls and Nbr2Calls.]
Pr(B=t)  Pr(B=f)
Pr(A | E, B):
  e,  b     0.9   (0.1)
  e,  ¬b    0.2   (0.8)
  ¬e, b     0.85  (0.15)
  ¬e, ¬b    0.01  (0.99)

© D. Weld and D. Fox 4 Earthquake Example (cont'd)
If we know Alarm, no other evidence influences our degree of belief in Nbr1Calls:
  P(N1 | N2, A, E, B) = P(N1 | A)
also: P(N2 | N1, A, E, B) = P(N2 | A)  and  P(E | B) = P(E)
By the chain rule we have
  P(N1, N2, A, E, B) = P(N1 | N2, A, E, B) · P(N2 | A, E, B) · P(A | E, B) · P(E | B) · P(B)
                     = P(N1 | A) · P(N2 | A) · P(A | B, E) · P(E) · P(B)
Full joint requires only 10 parameters (cf. 32)
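To make that parameter count concrete, here is a small Python sketch (ours, not from the slides) encoding the factored joint. Only the P(A|E,B) table comes from slide 3; the priors and the P(Ni|A) entries are placeholder values, since the transcript does not show them:

```python
from itertools import product

p_b, p_e = 0.01, 0.02                        # placeholder priors P(B=t), P(E=t)
p_a = {(True, True): 0.9,   (True, False): 0.2,    # P(A=t | E, B): (e,b), (e,~b),
       (False, True): 0.85, (False, False): 0.01}  # (~e,b), (~e,~b), from slide 3
p_n1 = {True: 0.9, False: 0.05}              # placeholder P(N1=t | A)
p_n2 = {True: 0.7, False: 0.01}              # placeholder P(N2=t | A)

def bern(p_true, value):
    """P(X = value) for a Boolean X with P(X=True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(n1, n2, a, e, b):
    """P(N1,N2,A,E,B) = P(N1|A) P(N2|A) P(A|B,E) P(E) P(B)."""
    return (bern(p_n1[a], n1) * bern(p_n2[a], n2) *
            bern(p_a[(e, b)], a) * bern(p_e, e) * bern(p_b, b))

# 10 parameters (1 + 1 + 4 + 2 + 2) define all 32 joint entries; sanity check:
assert abs(sum(joint(*v) for v in product([True, False], repeat=5)) - 1.0) < 1e-9
```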

© D. Weld and D. Fox 5 BNs: Qualitative Structure
The graphical structure of a BN reflects conditional independence among variables
- Each variable X is a node in the DAG
- Edges denote direct probabilistic influence, usually interpreted causally
- The parents of X are denoted Par(X)
X is conditionally independent of all non-descendants given its parents
A graphical test exists for more general independence: the "Markov blanket" (below)

© D. Weld and D. Fox 6 Given Parents, X is Independent of Non-Descendants
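As a concrete illustration of this property (our own sketch, not from the slides), the qualitative structure can be encoded as a map from each variable to its parents, and the non-descendants of a node enumerated directly; the variable names mirror the burglary network above:

```python
# Network structure as on the slides; this encoding is ours.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Nbr1Calls": ["Alarm"],
    "Nbr2Calls": ["Alarm"],
}

def nondescendants(x, parents):
    """All nodes that are neither x nor a descendant of x."""
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    desc, stack = set(), [x]
    while stack:
        for c in children[stack.pop()]:
            if c not in desc:
                desc.add(c)
                stack.append(c)
    return set(parents) - desc - {x}

print(nondescendants("Nbr1Calls", parents))
# -> {'Burglary', 'Earthquake', 'Alarm', 'Nbr2Calls'}
# Given its parent Alarm, Nbr1Calls is independent of Burglary,
# Earthquake, and Nbr2Calls.
```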

© D. Weld and D. Fox 7–11 For Example
[Five figure slides applying this rule to the network (Earthquake, Burglary, Alarm, Nbr1Calls, Nbr2Calls); slides 9–11 add a Radio node.]

© D. Weld and D. Fox 12 Given Markov Blanket, X is Independent of All Other Nodes
MB(X) = Par(X) ∪ Children(X) ∪ Par(Children(X))
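A minimal sketch (ours) of this definition, again using a parents map; the network is the one from the slides with the Radio node included:

```python
parents = {
    "Burglary": [], "Earthquake": [], "Radio": ["Earthquake"],
    "Alarm": ["Burglary", "Earthquake"],
    "Nbr1Calls": ["Alarm"], "Nbr2Calls": ["Alarm"],
}

def markov_blanket(x, parents):
    """MB(X) = Par(X) | Children(X) | Par(Children(X)), excluding X itself."""
    children = [v for v, ps in parents.items() if x in ps]
    mb = set(parents[x]) | set(children)
    for c in children:                 # co-parents of X's children
        mb |= set(parents[c])
    mb.discard(x)
    return mb

print(markov_blanket("Earthquake", parents))
# -> {'Burglary', 'Radio', 'Alarm'}
```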

© D. Weld and D. Fox 13–14 For Example
[Two figure slides showing the Markov blanket on the network with the Radio node.]

© D. Weld and D. Fox 15 Conditional Probability Tables
[Figure: the burglary network again, with its CPTs.]
Pr(B=t)  Pr(B=f)
Pr(A | E, B):
  e,  b     0.9   (0.1)
  e,  ¬b    0.2   (0.8)
  ¬e, b     0.85  (0.15)
  ¬e, ¬b    0.01  (0.99)

© D. Weld and D. Fox 16 Conditional Probability Tables
For a complete specification of the joint distribution, quantify the BN
For each variable X, specify a CPT: P(X | Par(X))
- the number of parameters is locally exponential in |Par(X)|
If X1, X2, ..., Xn is any topological sort of the network, then we are assured:
  P(Xn, Xn-1, ..., X1) = P(Xn | Xn-1, ..., X1) · P(Xn-1 | Xn-2, ..., X1) ··· P(X2 | X1) · P(X1)
                       = P(Xn | Par(Xn)) · P(Xn-1 | Par(Xn-1)) ··· P(X1)
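The "locally exponential" point is easy to check in code. This sketch (ours) counts one free parameter per CPT row for Boolean variables, i.e., 2^|Par(X)| per node:

```python
def num_parameters(parents):
    """Free CPT parameters for a Boolean network: sum of 2**|Par(X)|."""
    return sum(2 ** len(ps) for ps in parents.values())

burglary_net = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Nbr1Calls": ["Alarm"], "Nbr2Calls": ["Alarm"],
}
print(num_parameters(burglary_net))   # 1 + 1 + 4 + 2 + 2 = 10, vs 2**5 = 32
```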

© D. Weld and D. Fox 17–21 Bayes Net Construction Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)?  No
P(A | J, M) = P(A | J)?  No
P(A | J, M) = P(A)?  No
P(B | A, J, M) = P(B | A)?  Yes
P(B | A, J, M) = P(B)?  No
P(E | B, A, J, M) = P(E | A)?  No
P(E | B, A, J, M) = P(E | A, B)?  Yes

© D. Weld and D. Fox 22 Example contd.
Deciding conditional independence is hard in noncausal directions
(Causal models and conditional independence seem hardwired for humans!)
The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed

© D. Weld and D. Fox 23 Inference in BNs
The graphical independence representation yields efficient inference schemes
We generally want to compute Pr(X), or Pr(X | E) where E is (conjunctive) evidence
Computations are organized by network topology
One simple algorithm: variable elimination (VE)

© D. Weld and D. Fox 24 P(B | J=true, M=true)
[Figure: the burglary network with John and Mary as the calling neighbors.]
P(b | j, m) = α Σ_{e,a} P(b, j, m, e, a)

© D. Weld and D. Fox 25 P(B | J=true, M=true)
P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)
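A runnable sketch of this enumeration (ours). The CPT values follow the standard Russell & Norvig alarm example; the transcript does not show the numbers, so treat them as illustrative:

```python
from itertools import product

p_b, p_e = 0.001, 0.002                            # priors P(B=t), P(E=t)
p_a = {(True, True): 0.95, (True, False): 0.94,    # P(A=t | B, E)
       (False, True): 0.29, (False, False): 0.001}
p_j = {True: 0.90, False: 0.05}                    # P(J=t | A)
p_m = {True: 0.70, False: 0.01}                    # P(M=t | A)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    return (bern(p_b, b) * bern(p_e, e) * bern(p_a[(b, e)], a) *
            bern(p_j[a], j) * bern(p_m[a], m))

def query_b(j, m):
    """P(B=t | J=j, M=m): sum out E and A, then normalize (the alpha)."""
    unnorm = {b: sum(joint(b, e, a, j, m)
                     for e, a in product([True, False], repeat=2))
              for b in (True, False)}
    return unnorm[True] / (unnorm[True] + unnorm[False])

print(round(query_b(True, True), 4))   # 0.2842, the familiar ~0.284 answer
```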

© D. Weld and D. Fox 26 Structure of Computation
Dynamic programming

© D. Weld and D. Fox 27 Variable Elimination
A factor is a function over some set of variables, mapping each assignment to a value: e.g., f(E, A, N1)
- CPTs are factors, e.g., P(A | E, B) is a function of A, E, B
VE works by eliminating all variables in turn until there is a factor over only the query variable
To eliminate a variable:
- join all factors containing that variable (like a DB join)
- sum out the influence of the variable on the new factor
- this exploits the product form of the joint distribution

© D. Weld and D. Fox 28 Example of VE: P(N1)
[Figure: Earthquake (Earthqk), Burglary (Burgl), Alarm, N1, N2.]
P(N1) = Σ_{N2,A,B,E} P(N1, N2, A, B, E)
      = Σ_{N2,A,B,E} P(N1|A) P(N2|A) P(B) P(A|B,E) P(E)
      = Σ_A P(N1|A) Σ_{N2} P(N2|A) Σ_B P(B) Σ_E P(A|B,E) P(E)
      = Σ_A P(N1|A) Σ_{N2} P(N2|A) Σ_B P(B) f1(A,B)
      = Σ_A P(N1|A) Σ_{N2} P(N2|A) f2(A)
      = Σ_A P(N1|A) f3(A)
      = f4(N1)
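For completeness, here is a compact, self-contained VE sketch (ours) that reproduces the derivation above; only the P(A|E,B) table is from slide 3, the other numbers are placeholders:

```python
from itertools import product

def factor(vars_, fn):
    """Build a factor over vars_ from fn(assignment dict) -> value."""
    return (vars_, {vals: fn(dict(zip(vars_, vals)))
                    for vals in product([True, False], repeat=len(vars_))})

def multiply(f, g):
    """Pointwise product of two factors (the 'join' step)."""
    (fv, ft), (gv, gt) = f, g
    vars_ = fv + [v for v in gv if v not in fv]
    return factor(vars_, lambda a: ft[tuple(a[v] for v in fv)] *
                                   gt[tuple(a[v] for v in gv)])

def sum_out(var, f):
    """Marginalize var out of factor f."""
    fv, ft = f
    keep = [v for v in fv if v != var]
    table = {}
    for vals, p in ft.items():
        a = dict(zip(fv, vals))
        key = tuple(a[v] for v in keep)
        table[key] = table.get(key, 0.0) + p
    return (keep, table)

def bern(p, v):
    return p if v else 1.0 - p

p_a = {(True, True): 0.9, (True, False): 0.2,      # P(A=t | E, B), slide 3
       (False, True): 0.85, (False, False): 0.01}
fB = factor(["B"], lambda a: bern(0.01, a["B"]))   # placeholder prior
fE = factor(["E"], lambda a: bern(0.02, a["E"]))   # placeholder prior
fA = factor(["A", "E", "B"], lambda a: bern(p_a[(a["E"], a["B"])], a["A"]))
fN1 = factor(["N1", "A"], lambda a: bern(0.9 if a["A"] else 0.05, a["N1"]))
fN2 = factor(["N2", "A"], lambda a: bern(0.7 if a["A"] else 0.01, a["N2"]))

# Eliminate E, then B, then N2, then A: join, then sum out, as on the slide.
f1 = sum_out("E", multiply(fA, fE))                # f1(A, B)
f2 = sum_out("B", multiply(f1, fB))                # f2(A)
f3 = sum_out("N2", fN2)                            # f3(A), identically 1
f4 = sum_out("A", multiply(multiply(fN1, f3), f2)) # f4(N1)
print(f4)   # (['N1'], {(True,): P(N1=t), (False,): P(N1=f)})
```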

© D. Weld and D. Fox 29 Notes on VE
Each operation is a simple multiplication of factors followed by summing out a variable
Complexity is determined by the size of the largest factor
- in our example, 3 variables (not 5)
- linear in the number of variables, exponential in the largest factor
- the elimination ordering greatly impacts factor size
- finding an optimal elimination ordering is NP-hard
- heuristics and special structure (e.g., polytrees) help
Practically, inference is much more tractable when this kind of structure can be exploited