CPS 570: Artificial Intelligence Bayesian networks

Instructor: Vincent Conitzer

Specifying probability distributions
Specifying a probability for every atomic event is impractical: P(X1, …, Xn) would need to be specified for every combination x1, …, xn of values for X1, …, Xn. If there are k possible values per variable, we need to specify k^n - 1 probabilities! (For example, with k = 2 and n = 20 that is already 1,048,575 numbers.) We have already seen that it can be easier to specify probability distributions by using (conditional) independence. Bayesian networks allow us to specify any distribution, and to specify such distributions concisely, in a natural way, when there is (conditional) independence.

A general approach to specifying probability distributions
Say the variables are X1, …, Xn. By the chain rule,
P(X1, …, Xn) = P(X1)P(X2|X1)P(X3|X1,X2)…P(Xn|X1, …, Xn-1)
or: P(X1, …, Xn) = P(Xn)P(Xn-1|Xn)P(Xn-2|Xn,Xn-1)…P(X1|Xn, …, X2)
We can specify every component: for every combination of values for the variables on the right of the |, specify the probability distribution over the values of the variable on the left. If every variable can take k values, P(Xi|X1, …, Xi-1) requires (k-1)k^(i-1) values, and
Σ_{i=1..n} (k-1)k^(i-1) = Σ_{i=1..n} (k^i - k^(i-1)) = k^n - 1.
This is the same as specifying the probabilities of all atomic events: of course, because we can still specify any distribution!
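As a sanity check on the chain-rule identity above, here is a minimal sketch (not from the slides; names such as `joint` and `chain_rule_joint` are just illustrative). It builds an arbitrary joint distribution over three binary variables, computes the conditionals P(X1), P(X2|X1), P(X3|X1,X2) from it, and verifies that their product reproduces the joint.

```python
import itertools
import random

random.seed(0)

# An arbitrary joint distribution P(X1, X2, X3) over binary variables,
# represented as a dict mapping (x1, x2, x3) -> probability.
weights = {xs: random.random() for xs in itertools.product([0, 1], repeat=3)}
total = sum(weights.values())
joint = {xs: w / total for xs, w in weights.items()}

def marginal(assignment):
    """P(X1=a1, ..., Xk=ak) for a partial assignment of the first k variables."""
    k = len(assignment)
    return sum(p for xs, p in joint.items() if xs[:k] == assignment)

def chain_rule_joint(x1, x2, x3):
    """P(X1)P(X2|X1)P(X3|X1,X2), each conditional computed from the joint."""
    p1 = marginal((x1,))
    p2 = marginal((x1, x2)) / marginal((x1,))
    p3 = joint[(x1, x2, x3)] / marginal((x1, x2))
    return p1 * p2 * p3

for xs in itertools.product([0, 1], repeat=3):
    assert abs(joint[xs] - chain_rule_joint(*xs)) < 1e-12
print("chain rule reproduces the joint for all 8 atomic events")
```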

Graphically representing influences
(Figure: nodes X1, X2, X3, X4, with an arrow from each variable to every later variable in the ordering, one arrow per conditional in the chain-rule factorization.)

Conditional independence to the rescue!
Problem: P(Xi|X1, …, Xi-1) requires us to specify too many values. Suppose X1, …, Xi-1 can be partitioned into two subsets, S and T, such that Xi is conditionally independent of T given S. Then
P(Xi|X1, …, Xi-1) = P(Xi|S, T) = P(Xi|S),
which requires only (k-1)k^|S| values instead of (k-1)k^(i-1) values.

Graphically representing influences
… if X4 is conditionally independent of X2 given X1 and X3, the arrow from X2 to X4 can be dropped. (Figure: the same four nodes with that arrow removed.)

Rain and sprinklers example
Three nodes: raining (X), sprinklers (Y), grass wet (Z), with X and Y as parents of Z. Sprinklers is independent of raining, so there is no edge between them. Each node has a conditional probability table (CPT):
P(X=1) = .3
P(Y=1) = .4
P(Z=1 | X=0, Y=0) = .1
P(Z=1 | X=0, Y=1) = .8
P(Z=1 | X=1, Y=0) = .7
P(Z=1 | X=1, Y=1) = .9
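A minimal sketch of inference by enumeration in this three-node network (not from the slides; names such as `p_atomic` are just illustrative): it computes P(Z=1) and then P(X=1 | Z=1) directly from the CPTs above.

```python
# CPTs from the rain-and-sprinklers example.
p_x1 = 0.3                      # P(X=1), raining
p_y1 = 0.4                      # P(Y=1), sprinklers
p_z1 = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.9}  # P(Z=1 | X, Y)

def p_atomic(x, y, z):
    """P(X=x, Y=y, Z=z) = P(x) P(y) P(z | x, y)."""
    px = p_x1 if x == 1 else 1 - p_x1
    py = p_y1 if y == 1 else 1 - p_y1
    pz = p_z1[(x, y)] if z == 1 else 1 - p_z1[(x, y)]
    return px * py * pz

# P(Z=1): sum the joint over X and Y.
p_z = sum(p_atomic(x, y, 1) for x in (0, 1) for y in (0, 1))

# P(X=1 | Z=1) = P(X=1, Z=1) / P(Z=1).
p_x_and_z = sum(p_atomic(1, y, 1) for y in (0, 1))
print("P(Z=1) =", round(p_z, 4))
print("P(X=1 | Z=1) =", round(p_x_and_z / p_z, 4))
```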

Rigged casino example
Three nodes: casino rigged (CR), with die 1 (D1) and die 2 (D2) as its children. Die 2 is conditionally independent of die 1 given casino rigged, so there is no edge between them.
P(CR=1) = 1/2
P(D1=1|CR=0) = 1/6 … P(D1=5|CR=0) = 1/6
P(D1=1|CR=1) = 3/12 … P(D1=5|CR=1) = 1/6
P(D2=1|CR=0) = 1/6 … P(D2=5|CR=0) = 1/6
P(D2=1|CR=1) = 3/12 … P(D2=5|CR=1) = 1/6

Rigged casino example with poorly chosen order
Suppose we order the variables as die 1, die 2, casino rigged. Die 1 and die 2 are not independent (without observing whether the casino is rigged), and both dice carry relevant information about whether the casino is rigged. With this ordering, casino rigged has both dice as parents, so its CPT alone needs 36 probabilities (one for each of the 6 × 6 combinations of dice values)!

More elaborate rain and sprinklers example
Five nodes: rained (R), sprinklers were on (S), neighbor walked dog (N, child of R), grass wet (G, child of R and S), dog wet (D, child of N and G). CPTs:
P(+r) = .2
P(+s) = .6
P(+n|+r) = .3, P(+n|-r) = .4
P(+g|+r,+s) = .9, P(+g|+r,-s) = .7, P(+g|-r,+s) = .8, P(+g|-r,-s) = .2
P(+d|+n,+g) = .9, P(+d|+n,-g) = .4, P(+d|-n,+g) = .5, P(+d|-n,-g) = .3

Atomic events
(Same network and CPTs as above.) We can easily calculate the probability of any atomic event:
P(+r,+s,+n,+g,+d) = P(+r)P(+s)P(+n|+r)P(+g|+r,+s)P(+d|+n,+g)
We can also sample atomic events easily, by sampling each variable in turn given its parents' sampled values.
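The following sketch (not from the slides; names like `atomic_event_prob` and `sample_atomic_event` are just illustrative) encodes the CPTs above, evaluates the atomic event P(+r,+s,+n,+g,+d), and draws one sample by going through the variables in topological order.

```python
import random

# CPTs of the five-variable network: probability that each variable is "+",
# given its parents' values.
p_r = 0.2
p_s = 0.6
p_n = {'+r': 0.3, '-r': 0.4}                       # P(+n | R)
p_g = {('+r', '+s'): 0.9, ('+r', '-s'): 0.7,
       ('-r', '+s'): 0.8, ('-r', '-s'): 0.2}       # P(+g | R, S)
p_d = {('+n', '+g'): 0.9, ('+n', '-g'): 0.4,
       ('-n', '+g'): 0.5, ('-n', '-g'): 0.3}       # P(+d | N, G)

def prob(p_plus, value):
    """Probability of a +/- value, given the probability of the + value."""
    return p_plus if value.startswith('+') else 1 - p_plus

def atomic_event_prob(r, s, n, g, d):
    """P(r, s, n, g, d) as a product of CPT entries."""
    return (prob(p_r, r) * prob(p_s, s) * prob(p_n[r], n)
            * prob(p_g[(r, s)], g) * prob(p_d[(n, g)], d))

print("P(+r,+s,+n,+g,+d) =", atomic_event_prob('+r', '+s', '+n', '+g', '+d'))

def sample_atomic_event():
    """Sample each variable in topological order, given its sampled parents."""
    r = '+r' if random.random() < p_r else '-r'
    s = '+s' if random.random() < p_s else '-s'
    n = '+n' if random.random() < p_n[r] else '-n'
    g = '+g' if random.random() < p_g[(r, s)] else '-g'
    d = '+d' if random.random() < p_d[(n, g)] else '-d'
    return r, s, n, g, d

print("one sample:", sample_atomic_event())
```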

Inference
(Same network and CPTs as above.) We want to know P(+r|+d) = P(+r,+d)/P(+d). Let's compute P(+r,+d).

Inference…
P(+r,+d) = Σ_s Σ_g Σ_n P(+r) P(s) P(n|+r) P(g|+r,s) P(+d|n,g)
         = P(+r) Σ_s P(s) Σ_g P(g|+r,s) Σ_n P(n|+r) P(+d|n,g)
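A brute-force check of this sum by enumeration (a sketch, not from the slides; helper names such as `joint` and `val` are just illustrative): it sums the joint over s, g, n with R = +r and D = +d fixed, and also computes P(+d) to obtain P(+r|+d).

```python
from itertools import product

# CPT entries for the five-variable example: probability of "+" given parents.
p_r, p_s = 0.2, 0.6
p_n = {'+': 0.3, '-': 0.4}                              # P(+n | r)
p_g = {('+', '+'): 0.9, ('+', '-'): 0.7,
       ('-', '+'): 0.8, ('-', '-'): 0.2}                # P(+g | r, s)
p_d = {('+', '+'): 0.9, ('+', '-'): 0.4,
       ('-', '+'): 0.5, ('-', '-'): 0.3}                # P(+d | n, g)

def val(p_plus, sign):
    return p_plus if sign == '+' else 1 - p_plus

def joint(r, s, n, g, d):
    return (val(p_r, r) * val(p_s, s) * val(p_n[r], n)
            * val(p_g[(r, s)], g) * val(p_d[(n, g)], d))

# P(+r, +d): sum out S, N, G with R and D fixed.
p_rd = sum(joint('+', s, n, g, '+') for s, n, g in product('+-', repeat=3))

# P(+d): also sum over R.
p_d_marg = sum(joint(r, s, n, g, '+') for r, s, n, g in product('+-', repeat=4))

print("P(+r,+d) =", round(p_rd, 6))
print("P(+r|+d) =", round(p_rd / p_d_marg, 6))
```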

Variable elimination
From the factor Σ_n P(n|+r) P(+d|n,g) we sum out n to obtain a factor that depends only on g:
[Σ_n P(n|+r) P(+d|n,+g)] = P(+n|+r)P(+d|+n,+g) + P(-n|+r)P(+d|-n,+g) = .3*.9 + .7*.5 = .62
[Σ_n P(n|+r) P(+d|n,-g)] = P(+n|+r)P(+d|+n,-g) + P(-n|+r)P(+d|-n,-g) = .3*.4 + .7*.3 = .33
Continuing to the left, g will be summed out next, etc. (continued on board)
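A small sketch of this elimination step (not from the slides; `f_g` is a name chosen here for the new factor over g): it computes the two entries of the factor obtained by summing out n, using only the CPT entries involved.

```python
# CPT entries needed for this step.
p_n_given_plus_r = {'+n': 0.3, '-n': 0.7}
p_d_given_n_g = {('+n', '+g'): 0.9, ('+n', '-g'): 0.4,
                 ('-n', '+g'): 0.5, ('-n', '-g'): 0.3}

# Sum out n: the result f_g is a factor over g alone.
f_g = {
    g: sum(p_n_given_plus_r[n] * p_d_given_n_g[(n, g)] for n in ('+n', '-n'))
    for g in ('+g', '-g')
}
print(f_g)  # approximately {'+g': 0.62, '-g': 0.33}, matching the hand computation
```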

Elimination order matters
P(+r,+d) = Σ_n Σ_s Σ_g P(+r) P(s) P(n|+r) P(g|+r,s) P(+d|n,g)
         = P(+r) Σ_n P(n|+r) Σ_s P(s) Σ_g P(g|+r,s) P(+d|n,g)
The last factor (summing out g) will depend on two variables (s and n) in this case!

Don’t always need to sum over all variables sprinklers were on P(+r) = .2 rained P(+s) = .6 neighbor walked dog P(+g|+r,+s) = .9 P(+g|+r,-s) = .7 P(+g|-r,+s) = .8 P(+g|-r,-s) = .2 grass wet P(+n|+r) = .3 P(+n|-r) = .4 P(+d|+n,+g) = .9 P(+d|+n,-g) = .4 P(+d|-n,+g) = .5 P(+d|-n,-g) = .3 dog wet Can drop parts of the network that are irrelevant P(+r, +s) = P(+r)P(+s) = .6*.2 = .12 P(+n, +s) = ΣrP(r, +n, +s) = ΣrP(r)P(+n|r)P(+s) = P(+s)ΣrP(r)P(+n|r) = P(+s)(P(+r)P(+n|+r) + P(-r)P(+n|-r)) = .6*(.2*.3 + .8*.4) = .6*.38 = .228 P(+d | +n, +g, +s) = P(+d | +n, +g) = .9

Trees are easy
(Figure: a tree-structured network over X1, …, X8, in which X4 has parents X1 and X2 and X5 is a child of X4.) Choose an extreme variable (a leaf of the tree) to eliminate first; its probability is "absorbed" by its neighbor:
… Σ_x4 P(x4|x1,x2) … Σ_x5 P(x5|x4) … = … Σ_x4 P(x4|x1,x2) [Σ_x5 P(x5|x4)] …

Clustering algorithms
Merge nodes into "meganodes" until we have a tree; then we can apply the special-purpose algorithm for trees. (Figure: in the rain-and-sprinklers network, "neighbor walked dog" and "grass wet" are merged into a single node.) The merged node has values {+n+g, +n-g, -n+g, -n-g} and a much larger CPT.
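A sketch of how the merged node's CPT could be built (not from the slides; `meganode_cpt` is an illustrative name): since N depends on R and G depends on R and S, the meganode (N, G) has R and S as parents, with P(n,g | r,s) = P(n|r) P(g|r,s).

```python
from itertools import product

p_n = {'+r': 0.3, '-r': 0.4}                        # P(+n | R)
p_g = {('+r', '+s'): 0.9, ('+r', '-s'): 0.7,
       ('-r', '+s'): 0.8, ('-r', '-s'): 0.2}        # P(+g | R, S)

def val(p_plus, sign):
    return p_plus if sign == '+' else 1 - p_plus

# CPT of the meganode (N, G): 4 joint values for each of the 4 parent combinations.
meganode_cpt = {}
for r, s in product('+-', repeat=2):
    for n, g in product('+-', repeat=2):
        meganode_cpt[(n + 'n' + g + 'g', r + 'r', s + 's')] = (
            val(p_n[r + 'r'], n) * val(p_g[(r + 'r', s + 's')], g))

for key, p in sorted(meganode_cpt.items()):
    print(key, round(p, 3))
```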

Logic gates in Bayes nets
Not everything needs to be random… A node Y with parents X1 and X2 can have a deterministic CPT.
AND gate: P(+y|+x1,+x2) = 1, P(+y|-x1,+x2) = 0, P(+y|+x1,-x2) = 0, P(+y|-x1,-x2) = 0
OR gate: P(+y|+x1,+x2) = 1, P(+y|-x1,+x2) = 1, P(+y|+x1,-x2) = 1, P(+y|-x1,-x2) = 0

Modeling satisfiability as a Bayes net
Formula: (+X1 OR -X2) AND (-X1 OR -X2 OR -X3)
Variable nodes: P(+x1) = ½, P(+x2) = ½, P(+x3) = ½
Clause node C1 represents (+X1 OR -X2): P(+c1|+x1,+x2) = 1, P(+c1|-x1,+x2) = 0, P(+c1|+x1,-x2) = 1, P(+c1|-x1,-x2) = 1
Auxiliary node Y = -X1 OR -X2: P(+y|+x1,+x2) = 0, P(+y|-x1,+x2) = 1, P(+y|+x1,-x2) = 1, P(+y|-x1,-x2) = 1
Clause node C2 represents (Y OR -X3) = (-X1 OR -X2 OR -X3): P(+c2|+y,+x3) = 1, P(+c2|-y,+x3) = 0, P(+c2|+y,-x3) = 1, P(+c2|-y,-x3) = 1
Formula node F = C1 AND C2: P(+f|+c1,+c2) = 1, P(+f|-c1,+c2) = 0, P(+f|+c1,-c2) = 0, P(+f|-c1,-c2) = 0
P(+f) > 0 iff the formula is satisfiable, so inference is NP-hard.
P(+f) = (#satisfying assignments)/2^n, so inference is #P-hard (because counting the number of satisfying assignments is).
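A brute-force check of the last claim for this particular formula (a sketch, not from the slides; `formula_value` is an illustrative name): enumerating the 2^3 equally likely assignments and evaluating the deterministic clause and formula nodes gives P(+f) = (#satisfying assignments)/2^3.

```python
from itertools import product

def formula_value(x1, x2, x3):
    """Deterministic nodes: clauses C1, C2 and the formula node F = C1 AND C2."""
    c1 = x1 or not x2              # (+X1 OR -X2)
    y = (not x1) or (not x2)       # Y = -X1 OR -X2
    c2 = y or not x3               # (-X1 OR -X2 OR -X3)
    return c1 and c2

assignments = list(product([False, True], repeat=3))
num_sat = sum(formula_value(*a) for a in assignments)

# Each assignment has probability (1/2)^3, so P(+f) = #sat / 8.
print("satisfying assignments:", num_sat)
print("P(+f) =", num_sat / len(assignments))
```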

More about conditional independence
A node is conditionally independent of its non-descendants, given its parents.
A node is conditionally independent of everything else in the graph, given its parents, children, and children's parents (its Markov blanket).
In the rain-and-sprinklers network: N is independent of G given R; N is not independent of G given R and D; N is independent of S given R, G, D.
Note: we can't know for sure that two nodes are not independent; edges may be dummy edges.

General criterion: d-separation
Sets of variables X and Y are conditionally independent given the variables in Z if all paths between X and Y are blocked. A path is blocked if one of the following holds:
it contains U -> V -> W or U <- V <- W or U <- V -> W, and V is in Z
it contains U -> V <- W, and neither V nor any of its descendants is in Z
In the rain-and-sprinklers network: N is independent of G given R; N is not independent of S given R and D.
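The d-separation claims above can be checked numerically against the CPTs of the example. The sketch below (not from the slides; `conditional` is just an illustrative helper) builds the full joint by enumeration, then compares P(+n | +r, +g) with P(+n | +r), and P(+n | +r, +d, +s) with P(+n | +r, +d, -s).

```python
from itertools import product

# CPT entries: probability of "+" given parents.
p_r, p_s = 0.2, 0.6
p_n = {'+': 0.3, '-': 0.4}                              # P(+n | r)
p_g = {('+', '+'): 0.9, ('+', '-'): 0.7,
       ('-', '+'): 0.8, ('-', '-'): 0.2}                # P(+g | r, s)
p_d = {('+', '+'): 0.9, ('+', '-'): 0.4,
       ('-', '+'): 0.5, ('-', '-'): 0.3}                # P(+d | n, g)

def val(p_plus, sign):
    return p_plus if sign == '+' else 1 - p_plus

# Full joint over (r, s, n, g, d) by multiplying the CPT entries.
joint = {}
for r, s, n, g, d in product('+-', repeat=5):
    joint[(r, s, n, g, d)] = (val(p_r, r) * val(p_s, s) * val(p_n[r], n)
                              * val(p_g[(r, s)], g) * val(p_d[(n, g)], d))

def conditional(query, evidence):
    """P(query | evidence); both are dicts from variable name to '+'/'-'."""
    order = ('r', 's', 'n', 'g', 'd')
    def matches(event, constraints):
        return all(event[order.index(v)] == s for v, s in constraints.items())
    num = sum(p for e, p in joint.items() if matches(e, {**evidence, **query}))
    den = sum(p for e, p in joint.items() if matches(e, evidence))
    return num / den

# N is independent of G given R: these two should match (up to rounding).
print(conditional({'n': '+'}, {'r': '+', 'g': '+'}),
      conditional({'n': '+'}, {'r': '+'}))

# N is NOT independent of S given R and D: these two should differ.
print(conditional({'n': '+'}, {'r': '+', 'd': '+', 's': '+'}),
      conditional({'n': '+'}, {'r': '+', 'd': '+', 's': '-'}))
```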