Probabilistic Reasoning (2)

Probabilistic Reasoning (2) Daehwan Kim, Ravshan Khamidov, Sehyong Kim

Contents
Basics of Bayesian Networks (BNs): construction, inference
Single and multi-connected BNs
Inference in multi-connected BNs: clustering algorithms, cutset conditioning
Approximate inference in Bayesian networks: direct sampling, Markov chain Monte Carlo
Example applications of BNs

Representing Knowledge under Uncertainty
The joint probability distribution can answer any question about the domain, but it becomes intractably large as the number of variables grows, and specifying probabilities for atomic events is unnatural and difficult.

Representing Knowledge under Uncertainty
A Bayesian network provides a concise way to represent the conditional independence relationships in the domain. It specifies the full joint distribution, yet its representation is often exponentially smaller than the explicit joint distribution.

Basics of Bayesian Network: Definition
A Bayesian network = topology of the network + CPTs.
A set of random variables makes up the nodes of the network.
A set of directed links or arrows connects pairs of nodes; for example, X -> Y means X has a direct influence on Y.
Each node has a conditional probability table (CPT) that quantifies the effects that the parents have on the node. The parents of a node are all those nodes that have arrows pointing to it.
The graph has no directed cycles (hence it is a directed acyclic graph, or DAG).

Basics of Bayesian Network: Construction
General procedure for incremental network construction:
1. Choose the set of relevant variables Xi that describe the domain.
2. Choose an ordering for the variables.
3. While there are variables left:
   a. Pick a variable Xi and add a node to the network for it.
   b. Set Parents(Xi) by testing its conditional independence in the net.
   c. Define the conditional probability table for Xi.
A small code sketch of the resulting data structure follows.
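As a rough sketch of the data structure this procedure builds (plain Python dictionaries, not any particular BN library; the CPT numbers for the burglary network are the usual textbook values, assumed here for illustration since they do not appear on these slides):

```python
# Sketch: a Bayesian network stored as {node: (parents, cpt)}, where cpt maps a
# tuple of parent values to P(node = True | parent values). Nodes are added in
# the chosen ordering; a node's parents must already be in the network.

def add_node(bn, name, parents, cpt):
    assert all(p in bn for p in parents), "add parents before their children"
    bn[name] = (parents, cpt)

bn = {}
# Ordering B, E, A, J, M, as on the next slide. The CPT numbers below are the
# usual textbook values, assumed here for illustration only.
add_node(bn, "Burglary",   [], {(): 0.001})
add_node(bn, "Earthquake", [], {(): 0.002})
add_node(bn, "Alarm", ["Burglary", "Earthquake"],
         {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001})
add_node(bn, "JohnCalls", ["Alarm"], {(True,): 0.90, (False,): 0.05})
add_node(bn, "MaryCalls", ["Alarm"], {(True,): 0.70, (False,): 0.01})

print(bn["Alarm"][0])   # -> ['Burglary', 'Earthquake']
```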

Basics of Bayesian Network: Topology of the network
Suppose we choose the ordering B, E, A, J, M (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls):
P(E|B) = P(E)? Yes
P(A|B) = P(A)? P(A|E) = P(A)? No
P(J|A,B,E) = P(J|A)? Yes; P(J|A) = P(J)? No
P(M|A,J) = P(M|A)? Yes; P(M|A) = P(M)? No
Resulting topology: Burglary -> Alarm <- Earthquake, Alarm -> JohnCalls, Alarm -> MaryCalls.

Basics of Bayesian Network: Conditional Probability Table (CPT)
Once we get the topology of the network, a conditional probability table (CPT) must be specified for each node. Example of the CPT for the variable WetGrass, whose parents are Sprinkler (S) and Rain (R):

S R | P(W=F)  P(W=T)
F F |  1.0     0.0
T F |  0.1     0.9
F T |  0.1     0.9
T T |  0.01    0.99

Basics of Bayesian Network Conditional Probability Table (CPT) Each row in the table contains the conditional probability of each node value for a conditioning case. Each row must sum to 1, because the entries represent an exhaustive set of cases for the variable. A conditioning case is a possible combination of values for the parent nodes.

Basics of Bayesian Network: Example
Representing knowledge with the sprinkler network: Cloudy is the parent of Sprinkler and Rain, and Sprinkler and Rain are the parents of WetGrass.

P(C=F)  P(C=T)
 0.5     0.5

C | P(S=F)  P(S=T)
F |  0.5     0.5
T |  0.9     0.1

C | P(R=F)  P(R=T)
F |  0.8     0.2
T |  0.2     0.8

S R | P(W=F)  P(W=T)
F F |  1.0     0.0
T F |  0.1     0.9
F T |  0.1     0.9
T T |  0.01    0.99
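A minimal sketch (plain Python, CPT values taken from this slide) of how the network encodes the full joint distribution as a product of local CPT entries:

```python
# Chain-rule factorization for the sprinkler network on this slide:
# P(C, S, R, W) = P(C) * P(S | C) * P(R | C) * P(W | S, R).

P_S_T = {True: 0.1, False: 0.5}      # P(Sprinkler=T | Cloudy)
P_R_T = {True: 0.8, False: 0.2}      # P(Rain=T      | Cloudy)
P_W_T = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.9, (False, False): 0.0}   # P(WetGrass=T | S, R)

def pr(p_true, value):
    """Probability of a boolean variable taking `value`, given P(True)."""
    return p_true if value else 1.0 - p_true

def joint(c, s, r, w):
    return pr(0.5, c) * pr(P_S_T[c], s) * pr(P_R_T[c], r) * pr(P_W_T[(s, r)], w)

# e.g. P(C=T, S=F, R=T, W=T) = 0.5 * 0.9 * 0.8 * 0.9 = 0.324
print(joint(True, False, True, True))
```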

Basics of Bayesian Network Example

Basics of Bayesian Network: Example
In the example, notice that the two causes "compete" to "explain" the observed data. Hence S and R become conditionally dependent given that their common child, W, is observed, even though they are independent given their common parent C. For example, suppose the grass is wet, but we also know that it is raining. Then the posterior probability that the sprinkler is on goes down, from P(S=1|W=1) ≈ 0.43 to P(S=1|W=1,R=1) = 0.1945 (the "explaining away" effect).
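A small inference-by-enumeration sketch (plain Python, CPT values from the earlier slides) that reproduces these explaining-away numbers; the helper names are ours, not from any library:

```python
from itertools import product

# CPT values from the slides: each entry is P(node = True | parent values).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                      # P(S=T | C)
P_R = {True: 0.8, False: 0.2}                      # P(R=T | C)
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}    # P(W=T | S, R)

def pr(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(c, s, r, w):
    return pr(P_C, c) * pr(P_S[c], s) * pr(P_R[c], r) * pr(P_W[(s, r)], w)

def p_sprinkler_given(**evidence):
    """P(S = True | evidence), by enumerating the full joint distribution."""
    num = den = 0.0
    for c, s, r, w in product([True, False], repeat=4):
        values = {"C": c, "S": s, "R": r, "W": w}
        if any(values[var] != val for var, val in evidence.items()):
            continue                               # inconsistent with the evidence
        p = joint(c, s, r, w)
        den += p
        if s:
            num += p
    return num / den

print(round(p_sprinkler_given(W=True, R=True), 4))  # -> 0.1945 (explaining away)
print(round(p_sprinkler_given(W=True), 4))          # -> about 0.43 without observing Rain
```

Observing Rain = true drops the sprinkler posterior from about 0.43 to 0.1945, which is the explaining-away effect described above.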

Basics of Bayesian Network: Inference
Inference in a Bayesian network means computing the probability distribution of a set of query variables, given a set of evidence variables.

Basics of Bayesian Network: Exact inference
Inference by enumeration: compute the query by summing the products of CPT entries over all values of the hidden variables (illustrated with the alarm example, nodes B, E, A, J, M).

Basics of Bayesian Network: Exact inference
Variable elimination uses the distributive law to push summations inwards so that repeated subexpressions are evaluated only once (illustrated on the alarm network B, E, A, J, M).
In general, for query variables Q, hidden variables H, and evidence E = e:
P(Q | E=e) ∝ Σ_H ∏_i P(Xi | Parents(Xi)), with the evidence variables fixed to e.

Basics of Bayesian Network: Exact inference
Another example of variable elimination, on the sprinkler network (nodes C, S, R, W). A small sketch of one elimination step follows.
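A rough sketch of a single elimination step on the sprinkler network (simplified plain-Python factors, not a general variable-elimination implementation): summing Cloudy out of P(C)·P(S|C)·P(R|C) leaves one factor over S and R, after which the remaining network is a chain.

```python
from itertools import product

# Factors represented as dicts from value tuples to numbers (CPT values from the slides).
P_C = {(True,): 0.5, (False,): 0.5}
P_S_given_C = {(c, s): ((0.1 if c else 0.5) if s else (0.9 if c else 0.5))
               for c, s in product([True, False], repeat=2)}
P_R_given_C = {(c, r): ((0.8 if c else 0.2) if r else (0.2 if c else 0.8))
               for c, r in product([True, False], repeat=2)}

# Eliminate C: f(S, R) = sum_c P(c) * P(S | c) * P(R | c)
f_SR = {}
for s, r in product([True, False], repeat=2):
    f_SR[(s, r)] = sum(P_C[(c,)] * P_S_given_C[(c, s)] * P_R_given_C[(c, r)]
                       for c in (True, False))

for (s, r), value in sorted(f_SR.items()):
    print(f"f(S={s}, R={r}) = {value:.3f}")
# e.g. f(S=True, R=True) = 0.5*0.1*0.8 + 0.5*0.5*0.2 = 0.090
```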

Basics of Bayesian Network: Complexity of exact inference
Polytree (singly connected network): O(n). There is at most one undirected path between any two nodes in the network (e.g., the alarm example).
Multiply connected network: exponential time in the worst case (e.g., the wet-grass example).

Inference in Multi-connected BNs: Clustering algorithm (aka join-tree algorithm)
Basic idea: transform the network into a probabilistically equivalent singly connected BN (a polytree) by merging (clustering) the offending nodes.
The most effective approach for exact evaluation of multiply connected BNs.
The "new" merged node has only one parent, and inference on the resulting polytree runs in O(n) time (although the merged CPTs themselves grow larger).

Inference in Multi-connected BNs: Clustering algorithm (aka join-tree algorithm)
Merging Sprinkler and Rain into a single compound node Spr+Rain turns the network into the polytree Cloudy -> Spr+Rain -> WetGrass:

P(C=T) = 0.5

C | P(S=T)        C | P(R=T)
T |  0.10         T |  0.80
F |  0.50         F |  0.20

C | P(S+R=TT)  P(S+R=TF)  P(S+R=FT)  P(S+R=FF)
T |   0.08       0.02       0.72       0.18
F |   0.10       0.40       0.10       0.40

S R | P(W=T)
T T |  0.99
T F |  0.90
F T |  0.90
F F |  0.00
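A small sketch of where the compound node's CPT comes from (plain Python, values from this slide): because Sprinkler and Rain are conditionally independent given Cloudy, the merged CPT is simply the product P(S|C)·P(R|C), which reproduces the table above.

```python
P_S_T = {True: 0.10, False: 0.50}   # P(Sprinkler=T | Cloudy)
P_R_T = {True: 0.80, False: 0.20}   # P(Rain=T      | Cloudy)

# CPT of the merged node: P(S=s, R=r | C=c) = P(S=s | C=c) * P(R=r | C=c)
for c in (True, False):
    row = []
    for s, r in [(True, True), (True, False), (False, True), (False, False)]:
        ps = P_S_T[c] if s else 1 - P_S_T[c]
        pr = P_R_T[c] if r else 1 - P_R_T[c]
        row.append(f"{ps * pr:.2f}")
    print(f"C={c}: P(S+R = TT, TF, FT, FF) = {', '.join(row)}")
# C=True:  0.08, 0.02, 0.72, 0.18   (matches the table above)
# C=False: 0.10, 0.40, 0.10, 0.40
```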

Inference in Multi-connected BNs: Clustering algorithm (aka join-tree algorithm)
The Jensen join-tree version (Jensen, 1996) is currently the most efficient algorithm in this class (it is used, e.g., in Hugin and Netica).
Network evaluation is done in two stages:
1. Compile the network into a join tree. This may be slow, and may require too much memory if the original network is highly connected.
2. Do belief updating in the join tree (usually fast).
Note: clustered nodes have increased complexity, so updates may be computationally expensive.

Inference in Multi-connected BNs: Cutset conditioning
Basic idea: find a minimal set of nodes whose instantiation makes the remainder of the network singly connected, and therefore safe for propagation.
Historical note: this technique for dealing with the propagation problem was suggested by Pearl.

Inference in Multi-connected BNs: Cutset conditioning methods
Once a variable is instantiated, it can be duplicated and thus "break" a cycle.
A cutset is a set of variables whose instantiation makes the graph a polytree.
Each polytree's likelihood is used as a weight when combining the results.
Evaluating the most likely polytrees first is called bounded cutset conditioning.

Inference in Multi-connected BNs: Cutset conditioning - example
Instantiate Cloudy and sum the two resulting cases, Cloudy = true and Cloudy = false.
With Cloudy = true the polytree Sprinkler -> WetGrass <- Rain uses P(S=T) = 0.10 and P(R=T) = 0.80; with Cloudy = false it uses P(S=T) = 0.50 and P(R=T) = 0.20 (the rows of the original CPTs P(S|C) and P(R|C)). Each polytree is evaluated separately and the results are combined, weighted by P(Cloudy). A small sketch follows.
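A minimal cutset-conditioning sketch (plain Python, CPT values from the slides): each instantiation of the cutset variable Cloudy is evaluated on the resulting polytree, and the answers are combined, weighted by P(Cloudy).

```python
from itertools import product

P_C_T = 0.5
P_S_T = {True: 0.1, False: 0.5}                    # P(S=T | C)
P_R_T = {True: 0.8, False: 0.2}                    # P(R=T | C)
P_W_T = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.9, (False, False): 0.0}  # P(W=T | S, R)

def polytree_p_wet(c):
    """P(WetGrass=True) in the polytree obtained by instantiating Cloudy = c."""
    total = 0.0
    for s, r in product([True, False], repeat=2):
        ps = P_S_T[c] if s else 1 - P_S_T[c]
        pr = P_R_T[c] if r else 1 - P_R_T[c]
        total += ps * pr * P_W_T[(s, r)]
    return total

# Combine the per-instantiation answers, weighted by P(Cloudy):
p_wet = P_C_T * polytree_p_wet(True) + (1 - P_C_T) * polytree_p_wet(False)
print(round(p_wet, 4))   # overall P(WetGrass=True) = 0.6471
```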

Approximate Inference in Bayesian Networks
A solution for intractably large, multiply connected networks.
Monte Carlo algorithms are widely used to estimate quantities that are difficult to calculate exactly: they are randomized sampling algorithms whose accuracy depends on the number of samples.
Two families: direct sampling and Markov chain sampling.

Direct Sampling Method
Procedure: sample from the known probability distributions of the network, and estimate a value as (# of matching samples) / (# of total samples).
Sampling order: sample each variable in turn, in topological order.

Direct Sampling: Example (sprinkler network)
Sampling: generate [Cloudy, Sprinkler, Rain, WetGrass] in topological order:
Sample Cloudy from P(C) = 0.5, say true: [true, _, _, _]
Sample Sprinkler from P(S | C=true) = 0.1, say false: [true, false, _, _]
Sample Rain from P(R | C=true) = 0.8, say true: [true, false, true, _]
Sample WetGrass from P(W | S=false, R=true) = 0.9, say true: [true, false, true, true]
Estimating: with N = 1000 samples and N(Rain=true) = N([_, _, true, _]) = 511, the estimate is P(Rain=true) ≈ 0.511.
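A minimal prior-sampling sketch in Python (CPT values from the slides; helper names are ours):

```python
import random

def bernoulli(p):
    return random.random() < p

def prior_sample():
    """Sample (Cloudy, Sprinkler, Rain, WetGrass) in topological order (CPTs from the slides)."""
    c = bernoulli(0.5)
    s = bernoulli(0.1 if c else 0.5)
    r = bernoulli(0.8 if c else 0.2)
    w = bernoulli({(True, True): 0.99, (True, False): 0.9,
                   (False, True): 0.9, (False, False): 0.0}[(s, r)])
    return c, s, r, w

random.seed(0)
N = 1000
n_rain = sum(1 for _ in range(N) if prior_sample()[2])
print(n_rain / N)   # estimate of P(Rain=True); the true value is 0.5, the slide observed 0.511
```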

Rejection Sampling
Used to compute conditional probabilities.
Procedure: generate samples from the prior distribution specified by the Bayesian network, reject all samples that do not match the evidence, and estimate the probability from the samples that remain.

Rejection Sampling: Example
Suppose we want to estimate P(Rain | Sprinkler = true) from 100 samples.
73 samples have Sprinkler = false and are rejected; 27 samples have Sprinkler = true, of which 8 have Rain = true and 19 have Rain = false.
P(Rain | Sprinkler = true) = NORMALIZE({8, 19}) = {0.296, 0.704}
Problem: it rejects too many samples.
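A rejection-sampling sketch for P(Rain | Sprinkler = true) (plain Python, CPT values from the slides; the prior sampler is repeated so the sketch stays self-contained):

```python
import random

def bernoulli(p):
    return random.random() < p

def prior_sample():
    c = bernoulli(0.5)
    s = bernoulli(0.1 if c else 0.5)
    r = bernoulli(0.8 if c else 0.2)
    w = bernoulli({(True, True): 0.99, (True, False): 0.9,
                   (False, True): 0.9, (False, False): 0.0}[(s, r)])
    return c, s, r, w

def rejection_sampling(n):
    """Estimate P(Rain=True | Sprinkler=True): keep only samples matching the evidence."""
    kept = [smp for smp in (prior_sample() for _ in range(n)) if smp[1]]  # Sprinkler = True
    return sum(1 for smp in kept if smp[2]) / len(kept), len(kept)

random.seed(0)
estimate, n_kept = rejection_sampling(10_000)
print(estimate, n_kept)   # estimate near 0.3 (cf. 0.296 above); only ~30% of samples survive
```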

Likelihood Weighting
Advantage: avoids the inefficiency of rejection sampling.
Idea: generate only events that are consistent with the evidence; each event is weighted by the likelihood that the event accords with the evidence.

Likelihood Weighting: Example
Query: P(Rain | Sprinkler=true, WetGrass=true)?
Sampling (the weight w starts at 1.0):
1. Sample from P(Cloudy) = {0.5, 0.5}, say true.
2. Sprinkler is an evidence variable with value true: w <- w * P(Sprinkler=true | Cloudy=true) = 0.1.
3. Sample from P(Rain | Cloudy=true) = {0.8, 0.2}, say true.
4. WetGrass is an evidence variable with value true: w <- w * P(WetGrass=true | Sprinkler=true, Rain=true) = 0.1 * 0.99 = 0.099.
Result: the sample [true, true, true, true] with weight 0.099.
Estimating: accumulate the weights separately for Rain=true and Rain=false, then normalize.
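A likelihood-weighting sketch for this query (plain Python, CPT values from the slides): evidence variables are never sampled; they only multiply the weight.

```python
import random

P_W_T = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.9, (False, False): 0.0}   # P(W=T | S, R)

def weighted_sample():
    """One sample with Sprinkler=True and WetGrass=True fixed as evidence."""
    w = 1.0
    c = random.random() < 0.5                       # sample Cloudy from its prior
    w *= 0.1 if c else 0.5                          # evidence Sprinkler=True: multiply by P(S=T | C)
    r = random.random() < (0.8 if c else 0.2)       # sample Rain given Cloudy
    w *= P_W_T[(True, r)]                           # evidence WetGrass=True: multiply by P(W=T | S=T, R)
    return r, w

def likelihood_weighting(n):
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        r, w = weighted_sample()
        totals[r] += w
    return totals[True] / (totals[True] + totals[False])

random.seed(0)
print(round(likelihood_weighting(100_000), 3))   # P(Rain=True | S=T, W=T), around 0.32
```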

Markov Chain Monte Carlo
Think of the network as being in a particular current state that specifies a value for every variable. MCMC generates each event by making a random change to the preceding event: the next state is generated by randomly sampling a value for one of the non-evidence variables Xi, conditioned on the current values of the variables in the Markov blanket of Xi.

Markov Blanket
Markov blanket: parents + children + children's parents.
A node is conditionally independent of all other nodes in the network, given its Markov blanket.

Markov Chain Monte Carlo: Example
Query: P(Rain | Sprinkler = true, WetGrass = true). The initial state is [true, true, false, true] (order: Cloudy, Sprinkler, Rain, WetGrass).
The following steps are executed repeatedly:
Cloudy is sampled, given the current values of its Markov blanket variables, i.e. from P(Cloudy | Sprinkler=true, Rain=false). Suppose the result is Cloudy = false; the current state becomes [false, true, false, true].
Rain is sampled, given the current values of its Markov blanket variables, i.e. from P(Rain | Cloudy=false, Sprinkler=true, WetGrass=true). Suppose the result is Rain = true; the current state becomes [false, true, true, true].
After all the iterations, suppose the process visited 20 states where Rain is true and 60 states where Rain is false; then the answer to the query is NORMALIZE({20, 60}) = {0.25, 0.75}.
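A Gibbs-sampling sketch of this process (plain Python, CPT values from the slides): each step resamples one non-evidence variable, Cloudy or Rain, from its distribution given its Markov blanket, and the visit counts for Rain are normalized at the end.

```python
import random

P_S_T = {True: 0.1, False: 0.5}                     # P(S=T | C)
P_R_T = {True: 0.8, False: 0.2}                     # P(R=T | C)
P_W_T = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.9, (False, False): 0.0}   # P(W=T | S, R)

def cond(p_true, value):
    return p_true if value else 1 - p_true

def sample_cloudy(s, r):
    # P(C | S, R) is proportional to P(C) * P(S | C) * P(R | C)  (Cloudy's Markov blanket)
    weight = {c: 0.5 * cond(P_S_T[c], s) * cond(P_R_T[c], r) for c in (True, False)}
    return random.random() < weight[True] / (weight[True] + weight[False])

def sample_rain(c, s, w):
    # P(R | C, S, W) is proportional to P(R | C) * P(W | S, R)  (Rain's Markov blanket)
    weight = {r: cond(P_R_T[c], r) * cond(P_W_T[(s, r)], w) for r in (True, False)}
    return random.random() < weight[True] / (weight[True] + weight[False])

def gibbs(n_steps):
    # Evidence: Sprinkler=True, WetGrass=True; initial state as on the slide.
    c, s, r, w = True, True, False, True
    rain_true = 0
    for _ in range(n_steps):
        c = sample_cloudy(s, r)
        r = sample_rain(c, s, w)
        rain_true += r
    return rain_true / n_steps

random.seed(0)
print(round(gibbs(100_000), 3))   # estimate of P(Rain=True | S=T, W=T), around 0.32
```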

Example applications of BNs
Microsoft Belief Networks (MSBNx)
Advantages: easy to learn how to use; we can specify full and causally independent probability distributions; and it is free. http://www.research.microsoft.com/adapt/MSBNx/
Netica (from Norsys Software Corp.)
Disadvantage: not free; it is a commercial product. http://www.norsys.com/

Example applications of BNs Microsoft Belief Networks

Example applications of BNs Netica

Thank you!