1
CPSC 7373: Artificial Intelligence
Lecture 5: Probabilistic Inference
Jiang Bian, Fall 2012
University of Arkansas at Little Rock
2
Overview and Example
(Network: B → A ← E; A → J, A → M)
The alarm (A) might go off because of a burglary (B) and/or an earthquake (E). When the alarm (A) goes off, John (J) and/or Mary (M) may call to report it.
Possible questions: given evidence about B or E, what is the probability that J or M will call?
Answer to this type of question: the posterior distribution P(Q1, Q2, … | E1 = e1, E2 = e2, …), i.e. the probability distribution of one or more query variables given the values of the evidence variables.
Node types: evidence, query, hidden.
3
Overview and Example
(Network: B → A ← E; A → J, A → M)
The alarm (A) might go off because of a burglary (B) and/or an earthquake (E). When the alarm (A) goes off, John (J) and/or Mary (M) may call to report it.
Possible questions: out of all the possible values for all the query variables, which combination of values has the highest probability?
Answer to these questions: argmax_q P(Q1 = q1, Q2 = q2, … | E1 = e1, …), i.e. which assignment to the query variables is most probable given the evidence values?
Node types: evidence, query, hidden.
4
Overview and Example
(Network: B → A ← E; A → J, A → M)
Imagine the situation where Mary has called to report that the alarm is going off, and we want to know whether or not there has been a burglary. For each node, is it an evidence node, a hidden node, or a query node?
5
Overview and Example
(Network: B → A ← E; A → J, A → M)
Imagine the situation where Mary has called to report that the alarm is going off, and we want to know whether or not there has been a burglary. For each node, is it an evidence node, a hidden node, or a query node?
Evidence: M
Query: B
Hidden: E, A, J
6
Inference through enumeration
(Network: B → A ← E; A → J, A → M)
P(+b | +j, +m) = ???
Imagine the situation where both John and Mary have called to report that the alarm is going off, and we want to know the probability of a burglary.
Definition of conditional probability: P(Q|E) = P(Q, E) / P(E)
7
Inference through enumeration
(Network: B → A ← E; A → J, A → M)
P(+b | +j, +m) = P(+b, +j, +m) / P(+j, +m)
Definition of conditional probability: P(Q|E) = P(Q, E) / P(E)
First, compute the numerator P(+b, +j, +m).
8
Inference through enumeration
(Network: B → A ← E; A → J, A → M)
P(B): +b 0.001, ¬b 0.999
P(E): +e 0.002, ¬e 0.998
P(A|B,E): +b,+e: +a 0.95 / ¬a 0.05; +b,¬e: +a 0.94 / ¬a 0.06; ¬b,+e: +a 0.29 / ¬a 0.71; ¬b,¬e: +a 0.001 / ¬a 0.999
P(J|A): +a: +j 0.9 / ¬j 0.1; ¬a: +j 0.05 / ¬j 0.95
P(M|A): +a: +m 0.7 / ¬m 0.3; ¬a: +m 0.01 / ¬m 0.99
Given +e and +a, which CPT entries do we multiply?
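For readers who want to experiment, here is a minimal sketch of one way these CPTs might be held in Python; the dictionary layout and names are illustrative, not from the lecture.

```python
# CPTs of the burglary network; True stands for + and False for the negated value.
P_B = {True: 0.001, False: 0.999}                   # P(B)
P_E = {True: 0.002, False: 0.998}                   # P(E)
P_A = {(True, True): 0.95, (True, False): 0.94,     # P(+a | B, E), keyed by (b, e)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                     # P(+j | A)
P_M = {True: 0.70, False: 0.01}                     # P(+m | A)

def p_a(a, b, e):
    """P(A=a | B=b, E=e); the ¬a entries are complements of the +a entries."""
    return P_A[(b, e)] if a else 1.0 - P_A[(b, e)]

# Sanity check: every conditional distribution over A sums to 1.
for b in (True, False):
    for e in (True, False):
        assert abs(p_a(True, b, e) + p_a(False, b, e) - 1.0) < 1e-12
```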
9
Inference through enumeration: computing P(+b, +j, +m)
e, a      P(+b)   P(e)    P(a|+b,e)  P(+j|a)  P(+m|a)  product
+e, +a    0.001   0.002   0.95       0.9      0.7      0.000001197
+e, ¬a    0.001   0.002   0.05       0.05     0.01     5e-11
¬e, +a    0.001   0.998   0.94       0.9      0.7      0.0005910156
¬e, ¬a    0.001   0.998   0.06       0.05     0.01     2.994e-8
P(+b, +j, +m) ≈ 0.0005922426
10
Inference through enumeration: computing P(+j, +m)
b, e, a       P(b)    P(e)    P(a|b,e)  P(+j|a)  P(+m|a)  product
+b, +e, +a    0.001   0.002   0.95      0.9      0.7      0.000001197
+b, +e, ¬a    0.001   0.002   0.05      0.05     0.01     5e-11
+b, ¬e, +a    0.001   0.998   0.94      0.9      0.7      0.0005910156
+b, ¬e, ¬a    0.001   0.998   0.06      0.05     0.01     2.994e-8
¬b, +e, +a    0.999   0.002   0.29      0.9      0.7      0.0003650346
¬b, +e, ¬a    0.999   0.002   0.71      0.05     0.01     7.0929e-7
¬b, ¬e, +a    0.999   0.998   0.001     0.9      0.7      0.0006281113
¬b, ¬e, ¬a    0.999   0.998   0.999     0.05     0.01     0.0004980025
P(+j, +m) ≈ 0.0020841
11
Inference through enumeration
(Network: B → A ← E; A → J, A → M)
P(+b | +j, +m) = P(+b, +j, +m) / P(+j, +m) ≈ 0.0005922426 / 0.0020841 ≈ 0.284
Definition of conditional probability: P(Q|E) = P(Q, E) / P(E)
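As a check on the arithmetic, a short, self-contained enumeration sketch (variable and function names are illustrative) that sums the joint over the hidden variables and reproduces the 0.284 figure:

```python
from itertools import product

# CPTs of the burglary network (True = +, False = ¬).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(+a | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                   # P(+j | A)
P_M = {True: 0.70, False: 0.01}                   # P(+m | A)

def joint(b, e, a, j, m):
    """Full joint P(b, e, a, j, m) as a product of the CPT entries."""
    pa = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
    pj = P_J[a] if j else 1.0 - P_J[a]
    pm = P_M[a] if m else 1.0 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Numerator: sum out the hidden variables E and A.
p_bjm = sum(joint(True, e, a, True, True)
            for e, a in product((True, False), repeat=2))
# Denominator: additionally sum out B.
p_jm = sum(joint(b, e, a, True, True)
           for b, e, a in product((True, False), repeat=3))

print(p_bjm)         # ~0.00059224
print(p_jm)          # ~0.0020841
print(p_bjm / p_jm)  # ~0.284
```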
12
Enumeration
(Network: B → A ← E; A → J, A → M)
We assumed binary events/Boolean variables. With only 5 variables there are 2^5 = 32 rows in the full joint distribution table. Practically, what if we have a large network?
13
Example: Car diagnosis
Initial evidence: the engine won't start.
Testable variables (thin ovals), diagnosis variables (thick ovals).
Hidden variables (shaded) ensure a sparse structure and reduce the number of parameters.
14
Example: Car insurance
Predict claim costs (medical, liability, property) given the data on the application form (the other unshaded nodes).
If all variables were Boolean: 2^27 rows in the joint table. In reality the variables are NOT Boolean.
15
Speed Up Enumeration
Pulling out terms that do not depend on the summation variables:
P(+b, +j, +m) = P(+b) Σ_e P(e) Σ_a P(a|+b,e) P(+j|a) P(+m|a)
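A small sketch of the factored sum in Python (illustrative names; only the +b case, matching the slide):

```python
# Factored enumeration of P(+b, +j, +m): P(+b) is pulled outside both sums,
# and P(e) is pulled outside the inner sum over a.
P_E = {True: 0.002, False: 0.998}
P_A_given_bE = {True: 0.95, False: 0.94}   # P(+a | +b, E=e)
P_J = {True: 0.90, False: 0.05}            # P(+j | A)
P_M = {True: 0.70, False: 0.01}            # P(+m | A)

p_bjm = 0.001 * sum(                       # P(+b) * sum over e of ...
    P_E[e] * sum(                          #   P(e) * sum over a of ...
        (P_A_given_bE[e] if a else 1 - P_A_given_bE[e]) * P_J[a] * P_M[a]
        for a in (True, False))
    for e in (True, False))

print(p_bjm)   # ~0.00059224, same value with fewer repeated multiplications
```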
16
Speed up enumeration
Maximize independence: the structure of the Bayes network determines how efficiently we can calculate probability values.
A chain X1 → X2 → … → Xn needs only O(n) numbers to specify; a network in which every variable depends on all the previous ones needs O(2^n).
17
Bayesian networks: definition
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.
Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link = "directly influences")
– a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
18
Constructing Bayesian Networks
The alarm (A) might go off because of a burglary (B) and/or an earthquake (E). When the alarm (A) goes off, John (J) and/or Mary (M) may call to report it.
Suppose we choose the ordering M, J, A, B, E.
Dependent or independent? P(J|M) = P(J)?
19
(Nodes so far: M, J, A)
P(A|J,M) = P(A|J)? P(A|J,M) = P(A)?
20
(Nodes so far: M, J, A, B)
P(B|A, J, M) = P(B|A)? P(B|A, J, M) = P(B)?
21
(Nodes so far: M, J, A, B, E)
P(E|B, A, J, M) = P(E|A)? P(E|B, A, J, M) = P(E|A, B)?
22
(Resulting network for the ordering M, J, A, B, E)
Deciding conditional independence is hard in non-causal directions (causal models and conditional independence seem hardwired for humans!).
Assessing conditional probabilities is hard in non-causal directions.
The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.
23
Variable Elimination
Variable elimination: carry out the summations right-to-left, storing intermediate results (factors) to avoid re-computation:
P(B|+j, +m) = α P(B) Σ_e P(e) Σ_a P(a|B,e) P(+j|a) P(+m|a)
First sum out A to get a factor over B and E, then sum out E to get a factor over B alone.
24
Variable Elimination
Variable elimination:
– Summing out a variable from a product of factors: move any constant factors outside the summation, then add up the submatrices in the pointwise product of the remaining factors.
– Still NP-hard in the worst case, but faster than enumeration.
(Figure: the pointwise product of factors f1 and f2.)
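A sketch of the two factor operations in Python, with a factor represented as a dictionary from value tuples to numbers plus a list of its variables (this representation is illustrative, not from the lecture):

```python
from itertools import product

def pointwise_product(f1, vars1, f2, vars2):
    """Pointwise product of two factors over Boolean variables.

    Each factor maps a tuple of values (one per variable in its variable list)
    to a number; the result ranges over the union of the two variable lists."""
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for values in product((True, False), repeat=len(out_vars)):
        env = dict(zip(out_vars, values))
        out[values] = (f1[tuple(env[v] for v in vars1)] *
                       f2[tuple(env[v] for v in vars2)])
    return out, out_vars

def sum_out(var, f, vars_):
    """Eliminate `var` from factor `f` by summing over its two values."""
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out = {}
    for values, p in f.items():
        key = values[:i] + values[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out, out_vars

# Example: join P(R) and P(T|R), then sum out R (numbers from the next slides).
f_R = {(True,): 0.1, (False,): 0.9}
f_T_given_R = {(True, True): 0.8, (True, False): 0.2,
               (False, True): 0.1, (False, False): 0.9}
f_RT, v = pointwise_product(f_R, ['R'], f_T_given_R, ['R', 'T'])
f_T, v = sum_out('R', f_RT, v)
print(f_T)   # ≈ {(True,): 0.17, (False,): 0.83}
```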
25
Variable Elimination
(Network: R → T → L)
P(R): +r 0.1, ¬r 0.9
P(T|R): +r: +t 0.8 / ¬t 0.2; ¬r: +t 0.1 / ¬t 0.9
P(L|T): +t: +l 0.3 / ¬l 0.7; ¬t: +l 0.1 / ¬l 0.9
1) Joining factors: P(R, T) = P(R) P(T|R)
+r,+t 0.08; +r,¬t 0.02; ¬r,+t 0.09; ¬r,¬t 0.81
26
Variable Elimination
(Network after joining: RT → L)
P(R, T): +r,+t 0.08; +r,¬t 0.02; ¬r,+t 0.09; ¬r,¬t 0.81
P(L|T): +t: +l 0.3 / ¬l 0.7; ¬t: +l 0.1 / ¬l 0.9
2) Marginalize out the variable R to get a table over just T: P(R, T) → P(T)
+t ??; ¬t ??
27
Variable Elimination
(Network after joining: RT → L)
P(R, T): +r,+t 0.08; +r,¬t 0.02; ¬r,+t 0.09; ¬r,¬t 0.81
P(L|T): +t: +l 0.3 / ¬l 0.7; ¬t: +l 0.1 / ¬l 0.9
2) Marginalize out the variable R to get a table over just T: P(R, T) → P(T)
+t 0.17; ¬t 0.83
28
Variable Elimination
(Network: T → L, after eliminating R)
P(T): +t 0.17; ¬t 0.83
P(L|T): +t: +l 0.3 / ¬l 0.7; ¬t: +l 0.1 / ¬l 0.9
3) Joint probability P(T, L) = P(T) P(L|T)
+t,+l ??; +t,¬l ??; ¬t,+l ??; ¬t,¬l ??
29
Variable Elimination
(Network: T → L, after eliminating R)
P(T): +t 0.17; ¬t 0.83
P(L|T): +t: +l 0.3 / ¬l 0.7; ¬t: +l 0.1 / ¬l 0.9
3) Joint probability P(T, L) = P(T) P(L|T)
+t,+l 0.051; +t,¬l 0.119; ¬t,+l 0.083; ¬t,¬l 0.747
30
Variable Elimination
P(T, L): +t,+l 0.051; +t,¬l 0.119; ¬t,+l 0.083; ¬t,¬l 0.747
4) Marginalize out T to get P(L)
+l ??; ¬l ??
31
Variable Elimination
P(T, L): +t,+l 0.051; +t,¬l 0.119; ¬t,+l 0.083; ¬t,¬l 0.747
4) Marginalize out T to get P(L)
+l 0.134; ¬l 0.866
The choice of elimination ordering is important!
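The four steps for the R → T → L chain, written out as a short Python sketch (the dictionary layout is illustrative):

```python
# R -> T -> L chain: join P(R)P(T|R), sum out R, join with P(L|T), sum out T.
P_R = {True: 0.1, False: 0.9}
P_T = {(True, True): 0.8, (True, False): 0.2,    # P(T=t | R=r), keyed by (r, t)
       (False, True): 0.1, (False, False): 0.9}
P_L = {(True, True): 0.3, (True, False): 0.7,    # P(L=l | T=t), keyed by (t, l)
       (False, True): 0.1, (False, False): 0.9}

# 1) Join: P(R, T) = P(R) * P(T | R)
P_RT = {(r, t): P_R[r] * P_T[(r, t)] for r in (True, False) for t in (True, False)}
# 2) Marginalize out R: P(T)
P_T_marg = {t: P_RT[(True, t)] + P_RT[(False, t)] for t in (True, False)}
print(P_T_marg)   # ≈ {True: 0.17, False: 0.83}
# 3) Join: P(T, L) = P(T) * P(L | T)
P_TL = {(t, l): P_T_marg[t] * P_L[(t, l)] for t in (True, False) for l in (True, False)}
# 4) Marginalize out T: P(L)
P_L_marg = {l: P_TL[(True, l)] + P_TL[(False, l)] for l in (True, False)}
print(P_L_marg)   # ≈ {True: 0.134, False: 0.866}
```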
32
Approximate Inference: Sampling
Example: estimate the joint probability of heads/tails for a 1-cent coin and a 5-cent coin by repeatedly flipping both and counting how often each of the four outcomes (HH, HT, TH, TT) occurs.
Advantages:
– Computationally easier.
– Works even without the CPTs.
33
Sampling Example
(Network: C → S, C → R; S, R → W)
Cloudy: P(C): +c 0.5, ¬c 0.5
Sprinkler: P(S|C): +c: +s 0.1 / ¬s 0.9; ¬c: +s 0.5 / ¬s 0.5
Rain: P(R|C): +c: +r 0.8 / ¬r 0.2; ¬c: +r 0.2 / ¬r 0.8
Wet grass: P(W|S,R): +s,+r: +w 0.99 / ¬w 0.01; +s,¬r: +w 0.90 / ¬w 0.10; ¬s,+r: +w 0.90 / ¬w 0.10; ¬s,¬r: +w 0.01 / ¬w 0.99
Sample: +c, ¬s, +r
Sampling is consistent if we want to compute the full joint probability of the network or the probabilities of individual variables.
What about conditional probabilities, e.g. P(w|¬c)? Rejection sampling: we need to reject the samples that do not match the evidence we are interested in.
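A minimal prior-sampling sketch for this network (variable names are illustrative); estimates converge to the exact values as the number of samples grows:

```python
import random

P_C = 0.5
P_S = {True: 0.1, False: 0.5}                    # P(+s | C)
P_R = {True: 0.8, False: 0.2}                    # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,  # P(+w | S, R)
       (False, True): 0.90, (False, False): 0.01}

def prior_sample():
    """Sample (c, s, r, w) in topological order from the CPTs."""
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, s, r, w

samples = [prior_sample() for _ in range(100_000)]
# Consistent estimate of a marginal, e.g. P(+r) = 0.8*0.5 + 0.2*0.5 = 0.5:
print(sum(1 for c, s, r, w in samples if r) / len(samples))   # ~0.5
```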
34
Rejection sampling
Too many rejected samples make it inefficient.
– Alternative: likelihood weighting (next slide), which fixes the evidence variables instead of rejecting samples; on its own it is inconsistent unless the samples are weighted.
(Example network: A → B)
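A rejection-sampling sketch for the previous slide's query P(+w | ¬c) (illustrative names; the CPT values are the ones given above). Note how every sample with C = +c is simply thrown away, which is where the inefficiency comes from:

```python
import random

def sample_network():
    """One prior sample (c, s, r, w) from the cloudy/sprinkler/rain/wet-grass CPTs."""
    c = random.random() < 0.5
    s = random.random() < (0.1 if c else 0.5)
    r = random.random() < (0.8 if c else 0.2)
    w = random.random() < {(True, True): 0.99, (True, False): 0.90,
                           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    return c, s, r, w

# Rejection sampling for P(+w | ¬c): discard every sample where C is true.
kept, hits = 0, 0
for _ in range(100_000):
    c, s, r, w = sample_network()
    if c:            # evidence ¬c not matched -> reject (about half the work is wasted)
        continue
    kept += 1
    hits += w
print(hits / kept)   # estimate of P(+w | ¬c)
```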
35
Likelihood weighting
(Network: C → S, C → R; S, R → W)
Cloudy: P(C): +c 0.5, ¬c 0.5
Sprinkler: P(S|C): +c: +s 0.1 / ¬s 0.9; ¬c: +s 0.5 / ¬s 0.5
Rain: P(R|C): +c: +r 0.8 / ¬r 0.2; ¬c: +r 0.2 / ¬r 0.8
Wet grass: P(W|S,R): +s,+r: +w 0.99 / ¬w 0.01; +s,¬r: +w 0.90 / ¬w 0.10; ¬s,+r: +w 0.90 / ¬w 0.10; ¬s,¬r: +w 0.01 / ¬w 0.99
Query: P(R|+s, +w). Fix the evidence variables (S = +s, W = +w), sample only the remaining variables, and weight each sample by the likelihood of the evidence.
Example weighted sample: +c, +s, +r, +w with weight P(+s|+c) × P(+w|+s,+r) = 0.1 × 0.99.
What about P(C|+s, +r)?
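A likelihood-weighting sketch for P(R | +s, +w): evidence variables are never sampled, and each sample carries the weight P(+s | c) · P(+w | +s, r) (names are illustrative):

```python
import random

P_S = {True: 0.1, False: 0.5}                    # P(+s | C)
P_R = {True: 0.8, False: 0.2}                    # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,  # P(+w | S, R)
       (False, True): 0.90, (False, False): 0.01}

def weighted_sample():
    """Sample the non-evidence variables; weight by the likelihood of the fixed evidence +s, +w."""
    c = random.random() < 0.5
    weight = P_S[c]              # evidence S = +s is fixed, contribute P(+s | c)
    r = random.random() < P_R[c]
    weight *= P_W[(True, r)]     # evidence W = +w is fixed, contribute P(+w | +s, r)
    return r, weight

num = den = 0.0
for _ in range(100_000):
    r, w = weighted_sample()
    den += w
    if r:
        num += w
print(num / den)   # weighted estimate of P(+r | +s, +w)
```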
36
Gibbs Sampling
Markov Chain Monte Carlo (MCMC): resample one variable at a time, conditioned on the current values of all the others.
Example chain of states: (+s, +c, ¬r, ¬w) → (¬s, +c, ¬r, ¬w) → (¬s, +c, +r, ¬w) → …
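A Gibbs-sampling sketch for the same network, estimating P(+r | +s, +w): start from any state consistent with the evidence and repeatedly resample each hidden variable from its conditional given everything else (the conditionals below are obtained by normalizing the joint; names are illustrative):

```python
import random

P_C = {True: 0.5, False: 0.5}
P_S = {True: 0.1, False: 0.5}                    # P(+s | C)
P_R = {True: 0.8, False: 0.2}                    # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,  # P(+w | S, R)
       (False, True): 0.90, (False, False): 0.01}

def joint(c, s, r, w):
    ps = P_S[c] if s else 1 - P_S[c]
    pr = P_R[c] if r else 1 - P_R[c]
    pw = P_W[(s, r)] if w else 1 - P_W[(s, r)]
    return P_C[c] * ps * pr * pw

# Evidence: S = +s, W = +w.  Hidden: C, R.  Query: P(+r | +s, +w).
c, r = True, True            # arbitrary initial state consistent with the evidence
count_r, N = 0, 100_000
for _ in range(N):
    # Resample C given all other variables: P(c | s, r, w) is proportional to the joint.
    p_true = joint(True, True, r, True)
    c = random.random() < p_true / (p_true + joint(False, True, r, True))
    # Resample R given all other variables.
    p_true = joint(c, True, True, True)
    r = random.random() < p_true / (p_true + joint(c, True, False, True))
    count_r += r
print(count_r / N)   # Gibbs estimate of P(+r | +s, +w)
```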
37
Monty Hall Problem Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 2 [but the door is not opened], and the host, who knows what's behind the doors, opens another door, say No. 1, which has a goat. He then says to you, "Do you want to pick door No. 3?" Is it to your advantage to switch your choice? P(C=3|S=2) = ?? P(C=3|H=1,S=2) = ??
38
Monty Hall Problem Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 2 [but the door is not opened], and the host, who knows what's behind the doors, opens another door, say No. 1, which has a goat. He then says to you, "Do you want to pick door No. 3?" Is it to your advantage to switch your choice? P(C=3|S=2) = 1/3 P(C=3|H=1,S=2) = 2/3 Why???
39
Monty Hall Problem
P(C=3|H=1,S=2) = P(H=1|C=3,S=2) P(C=3|S=2) / Σ_i P(H=1|C=i,S=2) P(C=i|S=2)
Priors: P(C=1|S=2) = P(C=2|S=2) = P(C=3|S=2) = 1/3.
Host behavior (the host never opens the selected door or the car door): P(H=1|C=1,S=2) = 0, P(H=1|C=2,S=2) = 1/2, P(H=1|C=3,S=2) = 1.
So P(C=3|H=1,S=2) = (1 × 1/3) / (0 × 1/3 + 1/2 × 1/3 + 1 × 1/3) = (1/3) / (1/2) = 2/3.
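The same Bayes-rule computation done numerically (door numbering as in the slide; the host-behavior model, i.e. never opening the selected door or the car door and choosing uniformly otherwise, is the standard assumption):

```python
# Doors 1, 2, 3. Contestant selects door 2 (S=2); host opens door 1 (H=1).
doors = (1, 2, 3)
p_car = {c: 1 / 3 for c in doors}            # prior P(C=c | S=2) = 1/3

def p_host_opens(h, c, s=2):
    """P(H=h | C=c, S=s): the host never opens the selected door or the car door."""
    if h == s or h == c:
        return 0.0
    allowed = [d for d in doors if d != s and d != c]
    return 1.0 / len(allowed)                # uniform over the remaining doors

# Bayes rule: P(C=c | H=1, S=2) is proportional to P(H=1 | C=c, S=2) * P(C=c | S=2)
unnorm = {c: p_host_opens(1, c) * p_car[c] for c in doors}
z = sum(unnorm.values())
posterior = {c: p / z for c, p in unnorm.items()}
print(posterior)   # {1: 0.0, 2: 1/3, 3: 2/3} -> switching to door 3 doubles the chance
```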