CSE (c) S. Tanimoto, 2008 Bayes Nets
Probabilistic Reasoning With Bayes’ Rule
Outline:
- Motivation
- Generalizing Modus Ponens
- Bayes’ Rule
- Applying Bayes’ Rule
- Odds
- Odds-Likelihood Formulation of Bayes’ Rule
- Combining independent items of evidence
- General combination of evidence
- Benefits of Bayes nets for expert systems
Motivation
Logical reasoning has limitations:
- It requires that assumptions be considered “certain”.
- It typically uses general rules. General rules that are reliable may be difficult to come by.
- Logical reasoning can be awkward for certain structured domains such as time and space.
Generalizing Modus Ponens
Modus Ponens:
  P -> Q
  P
  ------
  Q
Bayes’ Rule (general idea):
  If P then sometimes Q
  P
  ------
  Maybe Q
(Bayes’ rule lets us calculate the probability of Q, taking P into account.)
Bayes’ Rule
E: Some evidence exists, i.e., a particular condition is true.
H: Some hypothesis is true.
P(E|H) = probability of E given H.
P(E|~H) = probability of E given not H.
P(H) = probability of H, independent of E (the prior).
P(H|E) = P(E|H) P(H) / P(E)
where P(E) = P(E|H) P(H) + P(E|~H) (1 - P(H))
Applying Bayes’ Rule
E: The patient’s white blood cell count exceeds 110% of average.
H: The patient is infected with tetanus.
P(E|H) = 0.8 (class-conditional probability)
P(E|~H) = 0.3 (class-conditional probability)
P(H) = 0.01 (prior probability)
Posterior probability:
P(H|E) = P(E|H) P(H) / P(E) = (0.8)(0.01) / [(0.8)(0.01) + (0.3)(0.99)] = 0.008 / 0.305 ≈ 0.0262
where P(E) = P(E|H) P(H) + P(E|~H) (1 - P(H))
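The tetanus calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `posterior` and its parameter names are illustrative and do not come from the slides.

```python
def posterior(p_e_given_h: float, p_e_given_not_h: float, p_h: float) -> float:
    """Return P(H|E) by Bayes' rule, expanding P(E) by total probability."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)
    return p_e_given_h * p_h / p_e


# Numbers from the tetanus example: P(E|H) = 0.8, P(E|~H) = 0.3, P(H) = 0.01
p_h_given_e = posterior(0.8, 0.3, 0.01)
print(round(p_h_given_e, 4))  # 0.0262
```

Even with the evidence present, the posterior stays small because the prior P(H) = 0.01 is so low.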
Odds
“Odds are 10 to 1 it will rain tomorrow” means P(rain) = 10 / (10 + 1) = 10/11 ≈ 0.909.
In general: O(A) = P(A) / P(~A) = P(A) / (1 - P(A))
Suppose P(A) = 1/4. Then O(A) = (1/4) / (3/4) = 1/3.
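Converting between probabilities and odds is a one-line formula in each direction. The sketch below uses exact fractions so the results match the slide; the helper names `odds` and `prob_from_odds` are made up for illustration.

```python
from fractions import Fraction


def odds(p):
    """O(A) = P(A) / (1 - P(A))."""
    return p / (1 - p)


def prob_from_odds(o):
    """Inverse conversion: P(A) = O(A) / (1 + O(A))."""
    return o / (1 + o)


print(odds(Fraction(1, 4)))          # 1/3
print(prob_from_odds(Fraction(10)))  # 10/11  (odds of 10 to 1)
```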
Bayes’ Rule reformulated...
P(H|E) = P(E|H) P(H) / P(E)
P(~H|E) = P(E|~H) P(~H) / P(E)
Dividing the first equation by the second, P(E) cancels:
O(H|E) = [P(E|H) / P(E|~H)] O(H)
Odds-Likelihood Form of Bayes’ Rule
E: The patient’s white blood cell count exceeds 110% of average.
H: The patient is infected with tetanus.
O(H) = 0.01/0.99
O(H|E) = λ O(H), where λ = P(E|H) / P(E|~H) is called the sufficiency factor.
O(H|~E) = λ’ O(H), where λ’ = P(~E|H) / P(~E|~H) is called the necessity factor.
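A short sketch of the odds-likelihood update for the tetanus example, using the class-conditional probabilities P(E|H) = 0.8 and P(E|~H) = 0.3 from the earlier slide. The variable and function names are illustrative only.

```python
from fractions import Fraction


def odds(p):
    return p / (1 - p)


def prob(o):
    return o / (1 + o)


p_h = Fraction(1, 100)            # prior P(H)
p_e_given_h = Fraction(8, 10)     # P(E|H)
p_e_given_not_h = Fraction(3, 10) # P(E|~H)

lam = p_e_given_h / p_e_given_not_h                    # sufficiency factor
lam_prime = (1 - p_e_given_h) / (1 - p_e_given_not_h)  # necessity factor

odds_given_e = lam * odds(p_h)            # O(H|E)
odds_given_not_e = lam_prime * odds(p_h)  # O(H|~E)

print(lam, lam_prime)          # 8/3 2/7
print(prob(odds_given_e))      # 8/305
print(prob(odds_given_not_e))  # 2/695
```

The value 8/305 ≈ 0.0262 agrees with the posterior obtained from the probability form of Bayes’ rule.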
The Monty Hall Problem
(Figure from Wikipedia.)
The Monty Hall Problem
There are three doors: a red door, a green door, and a blue door.
Behind one is a car, and behind the other two are goats.
You get to keep whatever is behind the door you choose.
You choose a door (say, red).
The host opens one of the other doors (say, green), which reveals a goat.
The host says, “Would you like to select the OTHER door?”
Should you switch?
Discussion
A: car is behind red door
B: car is behind green door
C: car is behind blue door
P(A) = P(B) = P(C) = 1/3
Suppose D: you choose the red door, and the host opens the green door, revealing a goat.
Is P(C|D) = 1/2? (No.) Why not? What is P(C|D)?
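One way to work out P(C|D) is to apply Bayes’ rule over the three hypotheses. The sketch below rests on assumptions the slides do not state explicitly: the host knows where the car is, always opens a goat door other than the one you chose, and picks uniformly at random when both remaining doors hide goats. Under those assumptions P(D|A) = 1/2, P(D|B) = 0, and P(D|C) = 1.

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind each door.
prior = {
    "red": Fraction(1, 3),    # hypothesis A
    "green": Fraction(1, 3),  # hypothesis B
    "blue": Fraction(1, 3),   # hypothesis C
}

# ASSUMED host behavior: P(host opens green | car location, you chose red).
likelihood = {
    "red": Fraction(1, 2),  # host may open green or blue, chooses at random
    "green": Fraction(0),   # host never reveals the car
    "blue": Fraction(1),    # green is the only goat door the host can open
}


def posterior(prior, likelihood):
    """Bayes' rule over a finite set of mutually exclusive hypotheses."""
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    return {h: prior[h] * likelihood[h] / evidence for h in prior}


post = posterior(prior, likelihood)
print(post["red"], post["blue"])  # 1/3 2/3
```

Under these assumptions switching wins with probability 2/3, because the host’s choice of door carries information about where the car is.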
Bayes Nets
A practical way to manage probabilistic inference when multiple variables (perhaps many) are involved.
Why Bayes Networks?
- Reasoning about events involving many parts or contingencies generally requires that a joint probability distribution be known.
- Such a distribution might require thousands of parameters. Modeling at this level of detail is typically not practical.
- Bayes Nets require making assumptions about the relevance of some conditions to others.
- Once the assumptions are made, the joint distribution can be “factored” so that there are many fewer separate parameters that must be specified.
Review of Bayes’ Rule
E: Some evidence exists, i.e., a particular condition is true.
H: Some hypothesis is true.
P(E|H) = probability of E given H.
P(E|~H) = probability of E given not H.
P(H) = probability of H, independent of E (the prior).
P(H|E) = P(E|H) P(H) / P(E)
where P(E) = P(E|H) P(H) + P(E|~H) (1 - P(H))
Combining Independent Items of Evidence
E1: The patient’s white blood cell count exceeds 110% of average.
E2: The patient’s body temperature is above 101 °F.
H: The patient is infected with tetanus.
O(H) = 0.01/0.99
O(H|E1) = λ1 O(H) (λ1: sufficiency factor for high white cell count)
O(H|E2) = λ2 O(H) (λ2: sufficiency factor for high body temperature)
Assuming E1 and E2 are independent (more precisely, conditionally independent given H and given ~H):
O(H|E1 E2) = λ1 λ2 O(H)
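The combination rule amounts to multiplying the prior odds by one sufficiency factor per item of evidence. In the sketch below, λ1 uses P(E1|H) = 0.8 and P(E1|~H) = 0.3 from the earlier tetanus slide. The slides give no numbers for E2, so the values 0.7 and 0.1 are hypothetical, chosen only to make the example run.

```python
from fractions import Fraction


def sufficiency_factor(p_e_given_h, p_e_given_not_h):
    """lambda = P(E|H) / P(E|~H)."""
    return p_e_given_h / p_e_given_not_h


def combine(prior_odds, factors):
    """O(H|E1...En) = lambda_1 * ... * lambda_n * O(H), assuming the items
    of evidence are conditionally independent given H and given ~H."""
    result = prior_odds
    for lam in factors:
        result *= lam
    return result


prior_odds = Fraction(1, 100) / Fraction(99, 100)               # O(H) = 0.01/0.99
lam1 = sufficiency_factor(Fraction(8, 10), Fraction(3, 10))     # from the slides
lam2 = sufficiency_factor(Fraction(7, 10), Fraction(1, 10))     # HYPOTHETICAL values

posterior_odds = combine(prior_odds, [lam1, lam2])
posterior_prob = posterior_odds / (1 + posterior_odds)
print(posterior_odds, posterior_prob)  # 56/297 56/353
```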
Bayes Net Example
A: Accident (An accident blocked traffic on the highway.)
B: Barb Late (Barbara is late for work.)
C: Chris Late (Christopher is late for work.)
Network structure: A -> B, A -> C (A is the parent of both B and C).
P(A) = 0.2
P(B|A) = 0.5, P(B|~A) = 0.15
P(C|A) = 0.3, P(C|~A) = 0.1
Forward Propagation (from causes to effects)
Network: A -> B, A -> C
P(A) = 0.2, P(B|A) = 0.5, P(B|~A) = 0.15, P(C|A) = 0.3, P(C|~A) = 0.1
Suppose A (there is an accident): then P(B|A) = 0.5 and P(C|A) = 0.3.
Suppose ~A (no accident): then P(B|~A) = 0.15 and P(C|~A) = 0.1.
(These come directly from the given information.)
Marginal Probabilities (using forward propagation)
Network: A -> B, A -> C
P(A) = 0.2, P(B|A) = 0.5, P(B|~A) = 0.15, P(C|A) = 0.3, P(C|~A) = 0.1
Then P(B) = probability Barb is late in any situation
= P(B|A) P(A) + P(B|~A) P(~A) = (0.5)(0.2) + (0.15)(0.8) = 0.22
Similarly P(C) = probability Chris is late in any situation
= P(C|A) P(A) + P(C|~A) P(~A) = (0.3)(0.2) + (0.1)(0.8) = 0.14
Marginalizing means eliminating a contingency by summing the probabilities for its different cases (here A and ~A).
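Marginalizing over A can be written as a small helper. This is a sketch with an illustrative function name (`marginal`); the numbers are the ones given for the accident network.

```python
def marginal(p_x_given_a: float, p_x_given_not_a: float, p_a: float) -> float:
    """P(X) = P(X|A) P(A) + P(X|~A) P(~A): sum over both cases of A."""
    return p_x_given_a * p_a + p_x_given_not_a * (1.0 - p_a)


p_a = 0.2
p_b = marginal(0.5, 0.15, p_a)  # Barb late
p_c = marginal(0.3, 0.1, p_a)   # Chris late
print(round(p_b, 2), round(p_c, 2))  # 0.22 0.14
```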
Backward Propagation: “diagnosis” (from effects to causes)
Network: A -> B, A -> C
P(A) = 0.2, P(B|A) = 0.5, P(B|~A) = 0.15, P(C|A) = 0.3, P(C|~A) = 0.1
Suppose B (Barb is late). What’s the probability of an accident on the highway?
Use Bayes’ rule:
P(A|B) = P(B|A) P(A) / P(B) = (0.5)(0.2) / [(0.5)(0.2) + (0.15)(0.8)] = 0.1 / 0.22 ≈ 0.4545
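The same diagnostic probability can be obtained by brute-force enumeration of the joint distribution, which is a different route from the Bayes’ rule calculation on the slide. The sketch assumes the usual Bayes net factorization P(A, B, C) = P(A) P(B|A) P(C|A) for the network A -> B, A -> C; the helper names `bern`, `joint`, and `query` are illustrative.

```python
from itertools import product

P_A = 0.2
P_B_GIVEN_A = {True: 0.5, False: 0.15}  # P(B | A = key)
P_C_GIVEN_A = {True: 0.3, False: 0.1}   # P(C | A = key)


def bern(p_true, value):
    """Probability that a binary variable takes `value` when P(true) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c):
    """P(A=a, B=b, C=c) under the factorization P(A) P(B|A) P(C|A)."""
    return bern(P_A, a) * bern(P_B_GIVEN_A[a], b) * bern(P_C_GIVEN_A[a], c)


def query(target, evidence):
    """P(target | evidence); both arguments are predicates over (a, b, c)."""
    numerator = denominator = 0.0
    for a, b, c in product([True, False], repeat=3):
        if evidence(a, b, c):
            p = joint(a, b, c)
            denominator += p
            if target(a, b, c):
                numerator += p
    return numerator / denominator


p_a_given_b = query(lambda a, b, c: a, lambda a, b, c: b)
print(round(p_a_given_b, 4))  # 0.4545
```

Enumeration touches all 2^3 joint entries, which is fine here but grows exponentially with the number of variables.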
Revising Probabilities of Consequences
Network: A -> B, A -> C
P(A|B) = 0.4545, P(B|A) = 0.5, P(B|~A) = 0.15, P(C|A) = 0.3, P(C|~A) = 0.1
P(C|B) = ?
Suppose B (Barb is late). What’s the probability that Chris is also late, given this information?
We already figured that P(A|B) = 0.4545, so P(~A|B) = 0.5455.
P(C|B) = P(C|A) P(A|B) + P(C|~A) P(~A|B) = (0.3)(0.4545) + (0.1)(0.5455) ≈ 0.1909,
somewhat higher than P(C) = 0.14.
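The two-step reasoning (first diagnose A from B, then propagate forward to C) can be sketched as follows. The formula relies on C being conditionally independent of B given A, which holds in the network A -> B, A -> C. Variable names are illustrative.

```python
p_a = 0.2
p_b_given_a, p_b_given_not_a = 0.5, 0.15
p_c_given_a, p_c_given_not_a = 0.3, 0.1

# Step 1 (backward): P(A|B) by Bayes' rule.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b

# Step 2 (forward): P(C|B) = P(C|A) P(A|B) + P(C|~A) P(~A|B).
p_c_given_b = p_c_given_a * p_a_given_b + p_c_given_not_a * (1 - p_a_given_b)

# Unconditional P(C), for comparison.
p_c = p_c_given_a * p_a + p_c_given_not_a * (1 - p_a)

print(round(p_c_given_b, 4), round(p_c, 2))  # 0.1909 0.14
```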
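Handling Multiple Causes
D: Disease (Barb has the flu).
Network: A -> B, D -> B, A -> C (B now has two parents, A and D).
P(B|A^D) = 0.9
P(B|A^~D) = 0.45
P(B|~A^D) = 0.75
P(B|~A^~D) = 0.1
P(D) = 1/9 ≈ 0.111
(These values are consistent with P(B|A) = 0.5: with A and D independent, (0.9) P(D) + (0.45)(1 - P(D)) = 0.5 requires P(D) = 1/9.)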
Explaining Away
Network: A -> B, D -> B, A -> C
P(B|A^D) = 0.9, P(B|A^~D) = 0.45, P(B|~A^D) = 0.75, P(B|~A^~D) = 0.1
Suppose B (Barb is late). This raises the probability of each cause above its prior:
P(A|B) > P(A), and P(D|B) = P(B|D) P(D) / P(B) > P(D),
where P(B|D) = P(B|A^D) P(A) + P(B|~A^D) P(~A) = (0.9)(0.2) + (0.75)(0.8) = 0.78.
Now, in addition, suppose C (Chris is late).
C makes it more likely that A is true, “and this explains B.”
D is now less probable.
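Explaining away can be observed numerically by enumerating the four-variable joint distribution. The sketch below makes several assumptions: A and D are independent root nodes, P(A) = 0.2 and P(C|A), P(C|~A) carry over from the earlier slides, and P(D) = 1/9, which is the value that makes the table for B consistent with P(B|A) = 0.5. The helper names are illustrative.

```python
from itertools import product

P_A = 0.2
P_D = 1 / 9  # ASSUMPTION: value implied by P(B|A) = 0.5 with A, D independent
P_B_GIVEN_AD = {  # P(B | A, D), keyed by (a, d)
    (True, True): 0.9,
    (True, False): 0.45,
    (False, True): 0.75,
    (False, False): 0.1,
}
P_C_GIVEN_A = {True: 0.3, False: 0.1}

NAMES = ("A", "D", "B", "C")


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(a, d, b, c):
    """P(A, D, B, C) = P(A) P(D) P(B|A,D) P(C|A)."""
    return (bern(P_A, a) * bern(P_D, d)
            * bern(P_B_GIVEN_AD[(a, d)], b) * bern(P_C_GIVEN_A[a], c))


def query(target, evidence):
    """P(target = True | evidence), where evidence maps names to booleans."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if all(world[name] == val for name, val in evidence.items()):
            p = joint(*values)
            denominator += p
            if world[target]:
                numerator += p
    return numerator / denominator


p_d_b = query("D", {"B": True})
p_d_bc = query("D", {"B": True, "C": True})
p_a_b = query("A", {"B": True})
p_a_bc = query("A", {"B": True, "C": True})
print(round(p_d_b, 3), round(p_d_bc, 3))  # 0.364 0.289
print(round(p_a_b, 3), round(p_a_bc, 3))  # 0.421 0.685
```

Under these assumptions, learning C raises the probability of an accident and lowers the probability of the flu, even though C and D are not directly connected in the network.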
Benefits of Bayes Nets
The joint probability distribution over n binary variables normally requires 2^n - 1 independent parameters.
With Bayes Nets we only specify these parameters:
1. “Root” node probabilities, e.g., P(A=true) = 0.2; P(A=false) = 0.8.
2. For each non-root node, a table of 2^k values, where k is the number of parents of that node. Typically k is small (k < …).
Propagating probabilities happens along the paths in the net. With a full joint probability distribution, many more computations may be needed.
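The parameter savings can be counted directly. In this sketch a network is described by a dictionary from each node to its list of parents; a node with k parents contributes 2^k numbers, so a root node (k = 0) contributes one. The function names and the dictionary encoding are illustrative choices, and all variables are assumed binary.

```python
def full_joint_params(n: int) -> int:
    """Independent parameters in a full joint over n binary variables."""
    return 2 ** n - 1


def bayes_net_params(parents: dict) -> int:
    """Sum of 2**k over nodes, where k is the node's number of parents."""
    return sum(2 ** len(node_parents) for node_parents in parents.values())


# The accident network: A -> B, A -> C
net3 = {"A": [], "B": ["A"], "C": ["A"]}
# With the disease node added: A -> B, D -> B, A -> C
net4 = {"A": [], "D": [], "B": ["A", "D"], "C": ["A"]}

print(bayes_net_params(net3), full_joint_params(len(net3)))  # 5 7
print(bayes_net_params(net4), full_joint_params(len(net4)))  # 8 15
```

The gap is modest for these tiny networks but widens quickly: the full joint doubles with every added variable, while the network grows only by the size of the new node’s table.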