CPS 570: Artificial Intelligence Bayesian networks

Slides:

Advertisements

Similar presentations

Bayesian Networks CSE 473. © Daniel S. Weld 2 Last Time Basic notions Atomic events Probabilities Joint distribution Inference by enumeration Independence.

Advertisements

Lirong Xia Bayesian networks (2) Thursday, Feb 25, 2014.

Identifying Conditional Independencies in Bayes Nets Lecture 4.

Bayes Nets Rong Jin. Hidden Markov Model  Inferring from observations (o i ) to hidden variables (q i )  This is a general framework for representing.

. Approximate Inference Slides by Nir Friedman. When can we hope to approximate? Two situations: u Highly stochastic distributions “Far” evidence is discarded.

Announcements Homework 8 is out Final Contest (Optional)

CPSC 422, Lecture 18Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18 Feb, 25, 2015 Slide Sources Raymond J. Mooney University of.

1 Bayesian Networks Chapter ; 14.4 CS 63 Adapted from slides by Tim Finin and Marie desJardins. Some material borrowed from Lise Getoor.

. PGM 2002/3 – Tirgul6 Approximate Inference: Sampling.

CPS 270: Artificial Intelligence Bayesian networks Instructor: Vincent Conitzer.

Artificial Intelligence CS 165A Tuesday, November 27, 2007  Probabilistic Reasoning (Ch 14)

Undirected Models: Markov Networks David Page, Fall 2009 CS 731: Advanced Methods in Artificial Intelligence, with Biomedical Applications.

Artificial Intelligence CS 165A Thursday, November 29, 2007  Probabilistic Reasoning / Bayesian networks (Ch 14)

CPS 570: Artificial Intelligence Introduction to probability Instructor: Vincent Conitzer.

Instructor: Eyal Amir Grad TAs: Wen Pu, Yonatan Bisk Undergrad TAs: Sam Johnson, Nikhil Johri CS 440 / ECE 448 Introduction to Artificial Intelligence.

The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)

Marginalization & Conditioning Marginalization (summing out): for any sets of variables Y and Z: Conditioning(variant of marginalization):

CPSC 422, Lecture 11Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 11 Oct, 2, 2015.

Exact Inference in Bayes Nets. Notation U: set of nodes in a graph X i : random variable associated with node i π i : parents of node i Joint probability:

1 CMSC 671 Fall 2001 Class #20 – Thursday, November 8.

CHAPTER 3: BAYESIAN DECISION THEORY. Making Decision Under Uncertainty Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Belief Networks Kostas Kontogiannis E&CE 457. Belief Networks A belief network is a graph in which the following holds: –A set of random variables makes.

Conditional Independence As with absolute independence, the equivalent forms of X and Y being conditionally independent given Z can also be used: P(X|Y,

1 BN Semantics 1 Graphical Models – Carlos Guestrin Carnegie Mellon University September 15 th, 2006 Readings: K&F: 3.1, 3.2, 3.3.

Artificial Intelligence Bayes’ Nets: Independence Instructors: David Suter and Qince Li Course Harbin Institute of Technology [Many slides.

A Brief Introduction to Bayesian networks

CS 2750: Machine Learning Directed Graphical Models

Availability Availability - A(t)

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12

Inference in Bayesian Networks

Read R&N Ch Next lecture: Read R&N

Approximate Inference

Bayesian Networks: A Tutorial

Instructor: Vincent Conitzer

Artificial Intelligence

Bell & Coins Example Coin1 Bell Coin2

CS 4/527: Artificial Intelligence

CAP 5636 – Advanced Artificial Intelligence

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 11

Read R&N Ch Next lecture: Read R&N

CSCI 5822 Probabilistic Models of Human and Machine Learning

Inference Inference: calculating some useful quantity from a joint probability distribution Examples: Posterior probability: Most likely explanation: B.

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12

Propagation Algorithm in Bayesian Networks

Instructors: Fei Fang (This Lecture) and Dave Touretzky

CAP 5636 – Advanced Artificial Intelligence

Professor Marie desJardins,

CSCI 5822 Probabilistic Models of Human and Machine Learning

CS 188: Artificial Intelligence

CPS 570: Artificial Intelligence Introduction to probability

Class #19 – Tuesday, November 3

CS 188: Artificial Intelligence

Directed Graphical Probabilistic Models: the sequel

CS 188: Artificial Intelligence Fall 2008

Class #16 – Tuesday, October 26

Inference III: Approximate Inference

Approximate Inference by Sampling

Hankz Hankui Zhuo Bayesian Networks Hankz Hankui Zhuo

Artificial Intelligence Introduction to probability

Instructor: Vincent Conitzer

Lectures prepared by: Elchanan Mossel Yelena Shvets

Probabilistic Reasoning

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18

Read R&N Ch Next lecture: Read R&N

Instructor: Vincent Conitzer

Instructor: Vincent Conitzer

Bayesian networks (2) Lirong Xia. Bayesian networks (2) Lirong Xia.

Bayesian networks (2) Lirong Xia.

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12

Presentation transcript:

CPS 570: Artificial Intelligence Bayesian networks Instructor: Vincent Conitzer

Specifying probability distributions Specifying a probability for every atomic event is impractical P(X1, …, Xn) would need to be specified for every combination x1, …, xn of values for X1, …, Xn If there are k possible values per variable… … we need to specify kn - 1 probabilities! We have already seen it can be easier to specify probability distributions by using (conditional) independence Bayesian networks allow us to specify any distribution, to specify such distributions concisely if there is (conditional) independence, in a natural way

A general approach to specifying probability distributions Say the variables are X1, …, Xn P(X1, …, Xn) = P(X1)P(X2|X1)P(X3|X1,X2)…P(Xn|X1, …, Xn-1) or: P(X1, …, Xn) = P(Xn)P(Xn-1|Xn)P(Xn-2|Xn,Xn-1)…P(X1|Xn, …, X2) Can specify every component For every combination of values for the variables on the right of |, specify the probability over the values for the variable on the left If every variable can take k values, P(Xi|X1, …, Xi-1) requires (k-1)ki-1 values Σi={1,..,n}(k-1)ki-1 = Σi={1,..,n}ki-ki-1 = kn - 1 Same as specifying probabilities of all atomic events – of course, because we can specify any distribution!

Graphically representing influences X1 X2 X3 X4

Conditional independence to the rescue! Problem: P(Xi|X1, …, Xi-1) requires us to specify too many values Suppose X1, …, Xi-1 partition into two subsets, S and T, so that Xi is conditionally independent from T given S P(Xi|X1, …, Xi-1) = P(Xi|S, T) = P(Xi|S) Requires only (k-1)k|S| values instead of (k- 1)ki-1 values

Graphically representing influences … if X4 is conditionally independent from X2 given X1 and X3 X1 X2 X3 X4

Rain and sprinklers example sprinklers is independent of raining, so no edge between them sprinklers (Y) raining (X) P(X=1) = .3 P(Y=1) = .4 grass wet (Z) Each node has a conditional probability table (CPT) P(Z=1 | X=0, Y=0) = .1 P(Z=1 | X=0, Y=1) = .8 P(Z=1 | X=1, Y=0) = .7 P(Z=1 | X=1, Y=1) = .9

Rigged casino example casino rigged die 1 die 2 P(CR=1) = 1/2 P(D2=1|CR=0) = 1/6 … P(D2=5|CR=0) = 1/6 P(D2=1|CR=1) = 3/12 P(D2=5|CR=1) = 1/6 P(D1=1|CR=0) = 1/6 … P(D1=5|CR=0) = 1/6 P(D1=1|CR=1) = 3/12 P(D1=5|CR=1) = 1/6 die 1 die 2 die 2 is conditionally independent of die 1 given casino rigged, so no edge between them

Rigged casino example with poorly chosen order die 1 die 2 die 1 and die 2 are not independent casino rigged both the dice have relevant information for whether the casino is rigged need 36 probabilities here!

More elaborate rain and sprinklers example sprinklers were on P(+r) = .2 rained P(+s) = .6 neighbor walked dog P(+g|+r,+s) = .9 P(+g|+r,-s) = .7 P(+g|-r,+s) = .8 P(+g|-r,-s) = .2 grass wet P(+n|+r) = .3 P(+n|-r) = .4 P(+d|+n,+g) = .9 P(+d|+n,-g) = .4 P(+d|-n,+g) = .5 P(+d|-n,-g) = .3 dog wet

Atomic events sprinklers were on rained neighbor walked dog grass wet P(+s) = .6 neighbor walked dog P(+g|+r,+s) = .9 P(+g|+r,-s) = .7 P(+g|-r,+s) = .8 P(+g|-r,-s) = .2 grass wet P(+n|+r) = .3 P(+n|-r) = .4 P(+d|+n,+g) = .9 P(+d|+n,-g) = .4 P(+d|-n,+g) = .5 P(+d|-n,-g) = .3 dog wet Can easily calculate the probability of any atomic event P(+r,+s,+n,+g,+d) = P(+r)P(+s)P(+n|+r)P(+g|+r,+s)P(+d|+n,+g) Can also sample atomic events easily

Inference Want to know: P(+r|+d) = P(+r,+d)/P(+d) sprinklers were on P(+r) = .2 rained P(+s) = .6 neighbor walked dog P(+g|+r,+s) = .9 P(+g|+r,-s) = .7 P(+g|-r,+s) = .8 P(+g|-r,-s) = .2 grass wet P(+n|+r) = .3 P(+n|-r) = .4 P(+d|+n,+g) = .9 P(+d|+n,-g) = .4 P(+d|-n,+g) = .5 P(+d|-n,-g) = .3 dog wet Want to know: P(+r|+d) = P(+r,+d)/P(+d) Let’s compute P(+r,+d)

Inference… sprinklers were on rained neighbor walked dog grass wet P(+s) = .6 neighbor walked dog P(+g|+r,+s) = .9 P(+g|+r,-s) = .7 P(+g|-r,+s) = .8 P(+g|-r,-s) = .2 grass wet P(+n|+r) = .3 P(+n|-r) = .4 P(+d|+n,+g) = .9 P(+d|+n,-g) = .4 P(+d|-n,+g) = .5 P(+d|-n,-g) = .3 dog wet P(+r,+d)= ΣsΣgΣn P(+r)P(s)P(n|+r)P(g|+r,s)P(+d|n,g) = P(+r)ΣsP(s)ΣgP(g|+r,s)Σn P(n|+r)P(+d|n,g)

Variable elimination sprinklers were on rained neighbor walked dog P(+s) = .6 neighbor walked dog P(+g|+r,+s) = .9 P(+g|+r,-s) = .7 P(+g|-r,+s) = .8 P(+g|-r,-s) = .2 grass wet P(+n|+r) = .3 P(+n|-r) = .4 P(+d|+n,+g) = .9 P(+d|+n,-g) = .4 P(+d|-n,+g) = .5 P(+d|-n,-g) = .3 dog wet From the factor Σn P(n|+r)P(+d|n,g) we sum out n to obtain a factor only depending on g [Σn P(n|+r)P(+d|n,+g)] = P(+n|+r)P(+d|+n,+g) + P(-n|+r)P(+d|-n,+g) = .3*.9+.7*.5 = .62 [Σn P(n|+r)P(+d|n,-g)] = P(+n|+r)P(+d|+n,-g) + P(-n|+r)P(+d|-n,-g) = .3*.4+.7*.3 = .33 Continuing to the left, g will be summed out next, etc. (continued on board)

Elimination order matters sprinklers were on P(+r) = .2 rained P(+s) = .6 neighbor walked dog P(+g|+r,+s) = .9 P(+g|+r,-s) = .7 P(+g|-r,+s) = .8 P(+g|-r,-s) = .2 grass wet P(+n|+r) = .3 P(+n|-r) = .4 P(+d|+n,+g) = .9 P(+d|+n,-g) = .4 P(+d|-n,+g) = .5 P(+d|-n,-g) = .3 dog wet P(+r,+d)= ΣnΣsΣg P(+r)P(s)P(n|+r)P(g|+r,s)P(+d|n,g) = P(+r)ΣnP(n|+r)ΣsP(s)Σg P(g|+r,s)P(+d|n,g) Last factor will depend on two variables in this case!

Don’t always need to sum over all variables sprinklers were on P(+r) = .2 rained P(+s) = .6 neighbor walked dog P(+g|+r,+s) = .9 P(+g|+r,-s) = .7 P(+g|-r,+s) = .8 P(+g|-r,-s) = .2 grass wet P(+n|+r) = .3 P(+n|-r) = .4 P(+d|+n,+g) = .9 P(+d|+n,-g) = .4 P(+d|-n,+g) = .5 P(+d|-n,-g) = .3 dog wet Can drop parts of the network that are irrelevant P(+r, +s) = P(+r)P(+s) = .6*.2 = .12 P(+n, +s) = ΣrP(r, +n, +s) = ΣrP(r)P(+n|r)P(+s) = P(+s)ΣrP(r)P(+n|r) = P(+s)(P(+r)P(+n|+r) + P(-r)P(+n|-r)) = .6*(.2*.3 + .8*.4) = .6*.38 = .228 P(+d | +n, +g, +s) = P(+d | +n, +g) = .9

Trees are easy X1 X2 X3 X4 X5 X6 X7 X8 Choose an extreme variable to eliminate first Its probability is “absorbed” by its neighbor …Σx4 P(x4|x1,x2)…Σx5 P(x5|x4) = …Σx4 P(x4|x1,x2)[Σx5 P(x5|x4)]…

Clustering algorithms sprinklers were on sprinklers were on rained rained neighbor walked dog, grass wet neighbor walked dog grass wet dog wet dog wet Merge nodes into “meganodes” until we have a tree Then, can apply special-purpose algorithm for trees Merged node has values {+n+g,+n-g,-n+g,-n-g} Much larger CPT

Logic gates in Bayes nets Not everything needs to be random… AND gate OR gate X1 X2 X1 X2 Y P(+y|+x1,+x2) = 1 P(+y|-x1,+x2) = 0 P(+y|+x1,-x2) = 0 P(+y|-x1,-x2) = 0 Y P(+y|+x1,+x2) = 1 P(+y|-x1,+x2) = 1 P(+y|+x1,-x2) = 1 P(+y|-x1,-x2) = 0

Modeling satisfiability as a Bayes Net (+X1 OR -X2) AND (-X1 OR -X2 OR -X3) P(+x1) = ½ P(+x2) = ½ P(+x3) = ½ X1 X2 X3 Y = -X1 OR -X2 P(+c1|+x1,+x2) = 1 P(+c1|-x1,+x2) = 0 P(+c1|+x1,-x2) = 1 P(+c1|-x1,-x2) = 1 C1 P(+c2|+y,+x3) = 1 P(+c2|-y,+x3) = 0 P(+c2|+y,-x3) = 1 P(+c2|-y,-x3) = 1 P(+y|+x1,+x2) = 0 P(+y|-x1,+x2) = 1 P(+y|+x1,-x2) = 1 P(+y|-x1,-x2) = 1 C2 P(+f|+c1,+c2) = 1 P(+f|-c1,+c2) = 0 P(+f|+c1,-c2) = 0 P(+f|-c1,-c2) = 0 formula P(+f) > 0 iff formula is satisfiable, so inference is NP-hard P(+f) = (#satisfying assignments/2n), so inference is #P-hard (because counting number of satisfying assignments is)

More about conditional independence A node is conditionally independent of its non-descendants, given its parents A node is conditionally independent of everything else in the graph, given its parents, children, and children’s parents (its Markov blanket) sprinklers were on rained N is independent of G given R N is not independent of G given R and D N is independent of S given R, G, D neighbor walked dog grass wet Note: can’t know for sure that two nodes are not independent: edges may be dummy edges dog wet

General criterion: d-separation Sets of variables X and Y are conditionally independent given variables in Z if all paths between X and Y are blocked; a path is blocked if one of the following holds: it contains U -> V -> W or U <- V <- W or U <- V -> W, and V is in Z it contains U -> V <- W, and neither V nor any of its descendants are in Z sprinklers were on rained N is independent of G given R N is not independent of S given R and D neighbor walked dog grass wet dog wet