Uncertainty in Environments Comp3710 Artificial Intelligence Computing Science Thompson Rivers University
Course Outline Part I – Introduction to Artificial Intelligence Part II – Classical Artificial Intelligence Part III – Machine Learning Introduction to Machine Learning Neural Networks Probabilistic Reasoning and Bayesian Belief Networks Artificial Life: Learning through Emergent Behavior Part IV – Advanced Topics TRU-COMP3710 Uncertainty
Learning Outcomes
  Relate a proposition to probability, i.e., to random variables.
  Give the sample space for a set of random variables.
  Use one of Kolmogorov’s 3 axioms: P(A ∨ B) = P(A) + P(B) – P(A ∧ B).
  Give the joint probability distribution for a set of random variables.
  Compute probabilities given a joint probability distribution.
  Give the joint probability distribution for a set of random variables that have conditional independence relations.
  Compute the diagnostic probability from causal probabilities.
Outline
  How to deal with uncertainty in environments
  Probability
  Syntax and Semantics
  Inference
  Independence and Bayes' Rule
1. Uncertainty in environments
  An example of a decision-making problem:
    Let action At = leave for the airport t minutes before the flight.
    Will At get me there on time? [Q] How to decide?
  Problems: application environments are nondeterministic and unpredictable, with many hidden factors.
    partial observability (road state, other drivers' plans, etc.)
    noisy sensors (traffic reports)
    uncertainty in action outcomes (flat tire, etc.)
    immense complexity of modeling and predicting traffic
  Hence a purely logical approach either
    risks falsehood: “A25 will get me there on time”, or
    leads to conclusions that are too weak for decision making: “A25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc.”
  (A1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport …)
Methods for handling uncertainty
  [Q] How to handle uncertainty?
  [Q] Default logic?
    Default assumptions: My car does not have a flat tire. ...
    A25 works unless contradicted by evidence.
    But issues: What assumptions are reasonable?
  [Q] Can we use fuzzy logic?
  Probability – uncertainty/certainty
    Model of degree of belief
    Given the available evidence or knowledge of the environment, A25 will get me there on time with probability 0.04.
  Fuzzy logic – ambiguity/clearness
    Degree of truth – value (observed data/facts)
    E.g., my age
2. Probability
  Probability is a way of expressing knowledge or belief that an event will occur or has occurred. E.g., …
  Probabilistic assertions summarize the effects of hidden factors, i.e.,
    laziness: failure to enumerate exceptions, qualifications, etc.
    ignorance: lack of relevant facts, initial conditions, etc.
  Subjective or Bayesian probability: probabilities can relate propositions to the agent's own state of knowledge (or conditions).
    E.g., P(A25 | no reported accidents) = 0.06
  Probabilities of propositions change with new evidence:
    E.g., P(A25 | no reported accidents, 5 a.m.) = 0.15
  Hence probability theory can be a good tool.
Making decisions under uncertainty
  Suppose I believe the following:
    P(A25 gets me there on time | …) = 0.04
    P(A90 gets me there on time | …) = 0.70
    P(A120 gets me there on time | …) = 0.95
    P(A1440 gets me there on time | …) = 0.9999
  [Q] Which action to choose?
    It depends on my preferences for missing the flight vs. time spent waiting, etc.
  Utility theory (utility: the quality of being useful) is used to represent and infer preferences.
  Decision theory = probability theory + utility theory
3. Syntax and Semantics
  Sample space, event, probability, probability distribution
  Random variable and proposition
  Prior probability (aka unconditional probability)
  Posterior probability (aka conditional probability)
Sample Space and Probability
  Definition. Begin with a set Ω, the sample space, whose elements are atomic events of the same type.
    E.g., 6 possible rolls of a die -> Ω = {???}
    ω in Ω is a sample point / possible world / atomic event.
  Definition. A probability space (or probability distribution, or probability model) is a sample space with an assignment P(ω) for every ω in Ω s.t.
    0 ≤ P(ω) ≤ 1
    Σω P(ω) = 1
    E.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.
    [Q] Sample space? [Q] Probability distribution?
  Definition. An event A can be considered as a subset of Ω.
    P(A) = Σω in A P(ω)
    E.g., P(die roll < 4) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2
  Surface      1    2    3    4    5    6
  Probability  1/6  1/6  1/6  1/6  1/6  1/6
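The die example can be checked with a few lines of Python. This is only a minimal sketch: the names `omega`, `P`, `prob` and `low_roll` are made up for illustration, and exact fractions are used so that the sums come out exactly.

```python
from fractions import Fraction

# Sample space for one roll of a fair die, with P(w) = 1/6 for every sample point w.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """P(A) = sum of P(w) over the sample points w in the event A."""
    return sum((P[w] for w in event), Fraction(0))

# The two conditions every probability distribution must satisfy.
assert all(0 <= p <= 1 for p in P.values())
assert sum(P.values()) == 1

low_roll = {w for w in omega if w < 4}  # the event "die roll < 4"
print(prob(low_roll))                   # prints 1/2
```

An event is just a subset of the sample space here, so any other event (even rolls, rolls above 4, and so on) can be passed to `prob` in the same way.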
3. Axioms of probability
  Kolmogorov’s axioms: for any propositions (events) A, B
    0 ≤ P(A) ≤ 1
    P(true) = 1 and P(false) = 0
    P(A ∨ B) = P(A) + P(B) – P(A ∧ B)
Random Variable and Proposition
  Definition. Basic element: random variable, representing a sample space
    A function from sample points to some range, e.g., the reals or Booleans
    E.g., Odd: N -> {true, false}; Odd(1) = true
  Boolean random variables
    E.g., Cavity (do I have a cavity?); [Q] What is the sample space?
  Discrete random variables (finite or infinite)
    E.g., Weather is one of <sunny, rainy, cloudy, snow>; [Q] What is the sample space?
  Continuous random variables (bounded or unbounded)
    E.g., RoomTemperature < 0.22; [Q] What is the sample space?
3. Proposition (how to relate to probability?)
  Think of a proposition as the event where the proposition is true.
    E.g., I have a toothache; The weather is sunny and there is no cavity.
  [Q] How to relate to probability, i.e., random variables?
  Similar to propositional logic: possible worlds are defined by assignment of values to random variables.
    A sample point is an assignment of values to a set of random variables.
    A sample space is the Cartesian product of the ranges of the random variables.
    [Q] What is the sample space for Cavity and Weather?
  Elementary proposition: constructed by assignment of a value to a random variable
    E.g., Weather = sunny; Cavity = false (can be abbreviated as ¬cavity)
  Complex propositions: formed from elementary propositions and standard logical connectives
    E.g., Weather = sunny ∧ Cavity = false
  Proposition = conjunction/disjunction of atomic events
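The idea that a sample space is the Cartesian product of the variables' ranges, and that a proposition picks out the sample points where it is true, can be sketched in Python. The names `sample_space`, `holds` and `event` are illustrative only, and the proposition used is the slide's example "the weather is sunny and there is no cavity".

```python
from itertools import product

# Ranges of the two random variables from the slides.
cavity_values = [True, False]
weather_values = ["sunny", "rainy", "cloudy", "snow"]

# Sample space = Cartesian product of the ranges; each element is one possible world.
sample_space = list(product(cavity_values, weather_values))
print(len(sample_space))  # prints 8

def holds(world):
    """The complex proposition: Weather = sunny AND Cavity = false."""
    cavity, weather = world
    return weather == "sunny" and not cavity

# A proposition corresponds to the event (set of worlds) in which it is true.
event = [world for world in sample_space if holds(world)]
print(event)  # prints [(False, 'sunny')]
```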
Prior and Posterior Probability
  Prior (aka unconditional) probability
  Posterior (aka conditional) probability
Prior probability
  Prior or unconditional probabilities of propositions
    E.g., P(Cavity = true) = 0.2 and P(Weather = sunny) = 0.72 correspond to belief prior to the arrival of any (new) evidence.
  A probability distribution gives values for all possible assignments:
    P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1)
  The joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables.
    P(Weather, Cavity) = a 4 × 2 matrix of values:
      Weather =        sunny   rainy   cloudy  snow
      Cavity = true    0.144   0.02    0.016   0.02
      Cavity = false   0.576   0.08    0.064   0.08
    [Q] What is the sum of all the values above?
  Every question about a domain can be answered with the joint probability distribution.
  [Q] Any problem? Size of the table.
P(Weather, Cavity) = a 4 × 2 matrix of values:
      Weather =        sunny   rainy   cloudy  snow
      Cavity = true    0.144   0.02    0.016   0.02
      Cavity = false   0.576   0.08    0.064   0.08
  [Q] What is the probability of cloudy?
  [Q] What is the probability of snow and no-cavity?
  [Q] What is the probability of sunny or cavity?
    P(W=sunny ∨ C=true) = P(W=sunny) + P(C=true) ??? ???
                        = (.144 + .576) + (.144 + .02 + .016 + .02) – ???
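One way to work through questions like these is to store the joint table and sum the matching entries. The sketch below uses the eight values from the table; the names `joint` and `prob` are hypothetical, and the code is only an illustration of summing atomic events and of the third axiom, not a prescribed method.

```python
import math

# Joint distribution P(Cavity, Weather), keyed by (cavity, weather), from the table.
joint = {
    (True, "sunny"): 0.144, (True, "rainy"): 0.02,
    (True, "cloudy"): 0.016, (True, "snow"): 0.02,
    (False, "sunny"): 0.576, (False, "rainy"): 0.08,
    (False, "cloudy"): 0.064, (False, "snow"): 0.08,
}

def prob(holds):
    """Sum the probabilities of the atomic events in which the proposition holds."""
    return sum(p for (cavity, weather), p in joint.items() if holds(cavity, weather))

p_cloudy = prob(lambda c, w: w == "cloudy")
p_snow_no_cavity = prob(lambda c, w: w == "snow" and not c)

# P(sunny or cavity) by the third axiom: P(A) + P(B) - P(A and B) ...
p_sunny = prob(lambda c, w: w == "sunny")
p_cavity = prob(lambda c, w: c)
p_both = prob(lambda c, w: w == "sunny" and c)
p_sunny_or_cavity = p_sunny + p_cavity - p_both

# ... which agrees with summing the atomic events of the disjunction directly.
direct = prob(lambda c, w: w == "sunny" or c)
assert math.isclose(p_sunny_or_cavity, direct)

print(round(p_cloudy, 3), round(p_snow_no_cavity, 3), round(p_sunny_or_cavity, 3))
```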
Prior probability – continuous variables
  In general, express the probability distribution as a parameterized function of the value, called a probability density function (pdf).
    E.g., P(X=x) = U[18,26](x) = 0.125 for 18 ≤ x ≤ 26
  P is a density; it integrates to 1.
  [Figure: uniform density of height 0.125 between 18 and 26]
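The statement that a density integrates to 1 can be checked numerically for U[18,26]. This is a rough sketch: `uniform_pdf` and `integrate` are illustrative helper names, the midpoint rule is just one convenient approximation, and the query point 20.5 and the interval [20, 22] are arbitrary choices.

```python
def uniform_pdf(x, a=18.0, b=26.0):
    """Density of U[a, b]: constant 1 / (b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, steps=8000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

density_at_point = uniform_pdf(20.5)               # a density value, not a probability
total_mass = integrate(uniform_pdf, 18.0, 26.0)    # approximately 1
p_20_to_22 = integrate(uniform_pdf, 20.0, 22.0)    # approximately 2 * 0.125 = 0.25

print(density_at_point, round(total_mass, 6), round(p_20_to_22, 6))
```

Probabilities for a continuous variable come from integrating the density over an interval, which is why the value 0.125 at a single point is not itself a probability.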
Example of the normal distribution, also called the Gaussian distribution
  [Figure from Wikipedia.org: normal distribution curve; axes labeled Events and Probabilities]
3. Interesting facts:
    P(A) = 1 – P(¬A)
    P(A) = P(A ∧ B) + P(A ∧ ¬B) for any B
Conditional probability
  Posterior or conditional probabilities
    E.g., P(cavity | toothache) = 0.6, i.e., given that toothache is all I know, there is a 60% chance of cavity. (This is not the same as “if toothache, then 60% chance of cavity”.)
    Another example: P(toothache | cavity) = 0.8
  New evidence/observation/fact may be irrelevant, allowing simplification, e.g., when Weather is irrelevant to Cavity,
    P(cavity | toothache, sunny) = P(cavity | toothache) = 0.6
  This kind of inference, sanctioned by domain knowledge, is crucial.
  Very interesting examples:
    [Q] A stiff neck is one symptom of meningitis; what is the probability of having meningitis given a stiff neck?
    [Q] A car won’t start. What is the probability of having a bad starter? What is the probability of having a bad battery?
Definition of conditional probability:
    P(a | b) = P(a ∧ b) / P(b) if P(b) > 0
  Product rule gives an alternative formulation:
    P(a ∧ b) = P(a) P(b | a) = P(b) P(a | b)
  Chain rule is derived by successive application of the product rule:
    P(a ∧ b ∧ c) = ??? = P(a ∧ b) P(c | a ∧ b) = P(a) P(b | a) P(c | a ∧ b)
    P(X1, …, Xn) = P(X1, …, Xn-1) P(Xn | X1, …, Xn-1)
                 = P(X1, …, Xn-2) P(Xn-1 | X1, …, Xn-2) P(Xn | X1, …, Xn-1)
                 = …
                 = P(X1) P(X2 | X1) P(X3 | X1, X2) P(X4 | X1, X2, X3) …
                 = Πi=1..n P(Xi | X1, …, Xi-1)
  The chain rule will be used in Inference later!
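The chain rule can be verified numerically on the dental example used later in these slides. The sketch below is an illustration under stated assumptions: `VARS`, `JOINT` and `prob` are hypothetical names, and the two entries for ¬toothache ∧ ¬cavity (0.144 and 0.576) are not quoted directly in the text but follow from the conditional tables given in the Conditional Independence part.

```python
import math

VARS = ("toothache", "catch", "cavity")

# Joint distribution P(Toothache, Catch, Cavity), keyed in the order of VARS.
JOINT = {
    (True, True, True): 0.108, (True, False, True): 0.012,
    (True, True, False): 0.016, (True, False, False): 0.064,
    (False, True, True): 0.072, (False, False, True): 0.008,
    (False, True, False): 0.144, (False, False, False): 0.576,  # assumed, see lead-in
}

def prob(**fixed):
    """Sum the joint over all atomic events that agree with the fixed values."""
    return sum(p for world, p in JOINT.items()
               if all(world[VARS.index(name)] == value for name, value in fixed.items()))

# Chain rule: P(cavity, toothache, catch)
#   = P(cavity) * P(toothache | cavity) * P(catch | toothache, cavity)
p_cavity = prob(cavity=True)
p_t_given_cavity = prob(toothache=True, cavity=True) / p_cavity
p_c_given_t_cavity = (prob(toothache=True, catch=True, cavity=True)
                      / prob(toothache=True, cavity=True))

chain = p_cavity * p_t_given_cavity * p_c_given_t_cavity
assert math.isclose(chain, JOINT[(True, True, True)])
print(round(chain, 3))
```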
3. Interesting facts:
    P(A | B) = 1 – P(¬A | B)
4. Inference by Enumeration
  Start with the joint probability distribution:
    Toothache (I have a toothache), Cavity (meaning?), Catch (catch probe?).
                  toothache            ¬toothache
                catch   ¬catch       catch   ¬catch
    cavity      0.108   0.012        0.072   0.008
    ¬cavity     0.016   0.064        0.144   0.576
    [Q] What are the random variables here?
    [Q] What is the sum of the probabilities of all the atomic events?
    [Q] How can you obtain the above probability distribution?
  For any proposition φ, the probability is the sum of the atomic events where it is true:
    P(φ) = Σω:ω╞φ P(ω)
Start with the joint probability distribution: Toothache, Cavity, Catch
  For any proposition φ, sum the atomic events where it is true:
    P(φ) = Σω:ω╞φ P(ω)
    [Q] P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
    [Q] P(cavity ∨ toothache) = ???
    [Q] P(cavity ∧ toothache) = ???
Start with the joint probability distribution: Toothache, Cavity, Catch
  Can also compute conditional probabilities: Very important!
    P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                           = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                           = 0.4
    [Q] P(cavity | toothache) = ???
    P(cavity | toothache) = ???
    [Q] P(cavity | catch) = ???
    [Q] P(cavity | toothache) ???
Denominator can be viewed as a normalization constant α
    P(Cavity | toothache) = P(Cavity, toothache) / P(toothache)
                          = α P(Cavity, toothache)
                          = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
                          = α [<0.108, 0.016> + <0.012, 0.064>]
                          = α <0.12, 0.08>
                          = <0.6, 0.4>
    [Q] What is α?
  General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.
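The general idea (fix the evidence, sum out the hidden variable, then normalize) can be sketched for the query P(Cavity | toothache). Only the four toothache entries quoted above are needed; the names `TOOTHACHE_ROWS`, `unnormalized`, `alpha` and `posterior` are made up for this illustration.

```python
import math

# Entries of the joint with toothache = true, keyed by (catch, cavity).
TOOTHACHE_ROWS = {
    (True, True): 0.108, (False, True): 0.012,
    (True, False): 0.016, (False, False): 0.064,
}

# Sum out the hidden variable Catch for each value of the query variable Cavity.
unnormalized = {
    cavity: sum(TOOTHACHE_ROWS[(catch, cavity)] for catch in (True, False))
    for cavity in (True, False)
}

# alpha = 1 / P(toothache) rescales the entries so that they sum to 1.
alpha = 1.0 / sum(unnormalized.values())
posterior = {cavity: alpha * value for cavity, value in unnormalized.items()}

print({cavity: round(p, 3) for cavity, p in posterior.items()})
```

Note that P(toothache) never has to be computed separately: it falls out as the sum of the unnormalized entries.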
Obvious problems:
  For n random variables,
    Worst-case time complexity is O(d^n), where d is the largest arity.
    Space complexity is O(d^n) to store the joint distribution.
    How to find the numbers for O(d^n) entries?
  In other words, too much. [Q] How to reduce?
5. Independence & Bayes’ Rule
  Conditional independence
  Bayes’ rule
5. Independence
  A and B are independent iff
    P(A | B) = P(A) or P(B | A) = P(B) or P(A, B) = P(A) P(B)
  E.g., P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) × P(Weather)
    In the above example, 32 (2*2*2*4) entries are reduced to 12 (2*2*2 + 4).
  Absolute independence is powerful but rare.
    Dentistry is a large field with hundreds of variables, none of which are independent.
  [Q] What do we have to do?
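The entry counts in this example are simple products of the variables' arities, which the following small sketch reproduces. The dictionary `arity` and the other names are illustrative only.

```python
from math import prod

# Number of values each random variable can take.
arity = {"Toothache": 2, "Catch": 2, "Cavity": 2, "Weather": 4}
dental = ["Toothache", "Catch", "Cavity"]

# Full joint table over all four variables: one entry per atomic event.
full_size = prod(arity.values())

# With Weather independent of the dental variables, two smaller tables suffice.
factored_size = prod(arity[name] for name in dental) + arity["Weather"]

print(full_size, factored_size)  # prints 32 12
```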
Conditional Independence
  If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
    (1) P(catch | toothache ∧ cavity) = P(catch | cavity)
  The same independence holds if I haven't got a cavity:
    (2) P(catch | toothache ∧ ¬cavity) = P(catch | ¬cavity)
  Catch is conditionally independent of Toothache given Cavity:
    P(Catch | Toothache, Cavity) = P(Catch | Cavity)
  When Catch is conditionally independent of Toothache given Cavity, the following statements are equivalent:
    P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
    P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
    (similarly, P(A, B) = P(A) P(B) when A and B are independent)
Write out the full joint distribution using the chain rule:
    P(Toothache, Catch, Cavity)
      = P(Toothache | Catch, Cavity) P(Catch, Cavity)                by the product rule
      = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)     by ???
      = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)            by ???
    2 * 2 * 2 = 8 -> 2 + 2 + 1. How ???
  [Figure: Cavity, Toothache, Catch]
  [Q] P(toothache) from the tables below ???
    = P(t ∧ cavity) + P(t ∧ ¬cavity) = P(t|c) P(c) + P(t|¬c) P(¬c) = ???
  [Q] P(¬t | cavity) = ??? = 1 – P(t|c) = ???
  Tables:
    P(cavity) = 0.2
    P(toothache | cavity) = 0.6    P(toothache | ¬cavity) = 0.1
    P(catch | cavity) = 0.9        P(catch | ¬cavity) = 0.2
Write out the full joint distribution using the chain rule:
    P(Toothache, Catch, Cavity)
      = P(Toothache | Catch, Cavity) P(Catch, Cavity)
      = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
      = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
  In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.
  Conditional independence is our most basic and robust form of knowledge about uncertain environments.
  [Figure: Cavity, Toothache, Catch]
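The factorization P(Toothache | Cavity) P(Catch | Cavity) P(Cavity) means that five numbers are enough to rebuild all eight joint entries. The sketch below uses the conditional tables from the slides (P(cavity) = 0.2, P(toothache | cavity) = 0.6, P(toothache | ¬cavity) = 0.1, P(catch | cavity) = 0.9, P(catch | ¬cavity) = 0.2); the function and constant names are hypothetical.

```python
import math
from itertools import product

# The five stored numbers: 1 for Cavity, 2 for Toothache, 2 for Catch.
P_CAVITY = 0.2
P_TOOTHACHE_GIVEN = {True: 0.6, False: 0.1}  # keyed by the value of Cavity
P_CATCH_GIVEN = {True: 0.9, False: 0.2}      # keyed by the value of Cavity

def bernoulli(p_true, value):
    """Probability that a Boolean variable takes `value` when P(true) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(toothache, catch, cavity):
    """P(toothache, catch, cavity) = P(t | cavity) P(catch | cavity) P(cavity)."""
    return (bernoulli(P_TOOTHACHE_GIVEN[cavity], toothache)
            * bernoulli(P_CATCH_GIVEN[cavity], catch)
            * bernoulli(P_CAVITY, cavity))

# All eight atomic events are recovered from the five numbers and sum to 1.
total = sum(joint(t, c, cav) for t, c, cav in product((True, False), repeat=3))

# P(toothache) = P(t | cavity) P(cavity) + P(t | not cavity) P(not cavity)
p_toothache = sum(joint(True, c, cav) for c, cav in product((True, False), repeat=2))

print(round(joint(True, True, True), 3), round(p_toothache, 3), round(total, 3))
```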
5. Interesting fact:
  P(A, B | C) = P(A | C) P(B | C) when A and B are conditionally independent of each other given C. Actually, it is a definition.
    P(toothache ∧ catch | cavity) = .108 / (.108 + .012 + .072 + .008) = .108 / .2 = .54
    P(toothache | cavity) = (.108 + .012) / .2 = .12 / .2 = .6
    P(catch | cavity) = (.108 + .072) / .2 = .18 / .2 = .9
    P(toothache | cavity) P(catch | cavity) = .6 * .9 = .54
    P(toothache ∧ catch | cavity) = P(toothache | cavity) P(catch | cavity)
    …
  Therefore Toothache and Catch are conditionally independent given Cavity.
  This is how to show conditional independence.
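The same check can be repeated for every combination of values, which is what the "…" above stands for. The following sketch assumes the full dental joint table; the entries 0.144 and 0.576 for ¬toothache ∧ ¬cavity are not quoted on this slide and are taken as an assumption consistent with the conditional tables. All names are illustrative.

```python
import math
from itertools import product

# Joint distribution keyed by (toothache, catch, cavity).
JOINT = {
    (True, True, True): 0.108, (True, False, True): 0.012,
    (True, True, False): 0.016, (True, False, False): 0.064,
    (False, True, True): 0.072, (False, False, True): 0.008,
    (False, True, False): 0.144, (False, False, False): 0.576,  # assumed entries
}

def prob(holds):
    """Sum the joint over the atomic events in which the proposition holds."""
    return sum(p for world, p in JOINT.items() if holds(*world))

def conditionally_independent():
    """Check P(T, C | Cavity) = P(T | Cavity) P(C | Cavity) for all value combinations."""
    for t, c, cav in product((True, False), repeat=3):
        p_cav = prob(lambda T, C, V: V == cav)
        lhs = JOINT[(t, c, cav)] / p_cav
        p_t = prob(lambda T, C, V: T == t and V == cav) / p_cav
        p_c = prob(lambda T, C, V: C == c and V == cav) / p_cav
        if not math.isclose(lhs, p_t * p_c):
            return False
    return True

print(conditionally_independent())  # prints True
```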
Bayes' Rule
  Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
  Bayes' rule, also called Bayes’ Theorem:
    P(a | b) = P(b | a) P(a) / P(b)
  or in distribution form
    P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
  Useful for assessing diagnostic probability from causal probability:
    P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
    [Q] A stiff neck is one symptom of meningitis; what is the probability of having meningitis given a stiff neck?
    [Q] A car won’t start. What is the probability of having a bad starter? What is the probability of having a bad battery?
Useful for assessing diagnostic probability from causal probability:
    P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
  [Q] When cavity, usually toothache; P(cavity | toothache) ?
  [Q] E.g., let M be meningitis and S be stiff neck; a stiff neck is one symptom of meningitis; what is the probability of having meningitis given a stiff neck?
    P(s) = 0.1; P(m) = 0.0001; P(s|m) = 0.8
    P(m|s) = ???
  [Q] How to find P(s|m), P(m) and P(s) ???
  Note: the posterior probability of meningitis is still very small!
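Bayes' rule is a one-line computation once the three inputs are known. Here is a minimal sketch applied to the meningitis numbers above (P(s) = 0.1, P(m) = 0.0001, P(s|m) = 0.8); the function name `bayes` and its parameter names are made up for illustration.

```python
import math

def bayes(p_effect_given_cause, p_cause, p_effect):
    """P(cause | effect) = P(effect | cause) P(cause) / P(effect)."""
    if p_effect <= 0:
        raise ValueError("P(effect) must be positive")
    return p_effect_given_cause * p_cause / p_effect

# Diagnostic probability P(m | s) from the causal probability P(s | m).
p_m_given_s = bayes(p_effect_given_cause=0.8, p_cause=0.0001, p_effect=0.1)

assert math.isclose(p_m_given_s, 0.0008)
print(round(p_m_given_s, 6))
```

Even though a stiff neck is likely given meningitis, the tiny prior P(m) keeps the posterior very small, which is the point of the note on the slide.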
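Useful for assessing diagnostic probability from causal probability:
    P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
  [Q] E.g., let B be ‘bad battery’ and N be ‘no starting on a cold morning’; N is one symptom of B; what is the probability of a bad battery when a car won’t start on a cold morning?
    P(n) = 0.01; P(b) = 0.05; P(n|b) = 0.8
    P(b|n) = ???
    Check the numbers first: P(n) ≥ P(n ∧ b) = P(n|b) P(b) = 0.04 must hold, but P(n) = 0.01 is given, so these values are inconsistent and Bayes' rule would yield a value above 1.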
5. Bayes' rule and conditional independence
    P(Cavity | toothache ∧ catch)
      = α P(toothache ∧ catch | Cavity) P(Cavity)
      = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
  This is an example of a naïve Bayes model:
    P(Cause, Effect1, … , Effectn) = P(Cause) Πi P(Effecti | Cause),
    where the Effects are conditionally independent of each other, given Cause.
  We will use this later again!
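The naïve Bayes computation for P(Cavity | toothache ∧ catch) can be sketched as follows. It uses the conditional tables given earlier in the slides (P(cavity) = 0.2, P(toothache | cavity) = 0.6, P(toothache | ¬cavity) = 0.1, P(catch | cavity) = 0.9, P(catch | ¬cavity) = 0.2); the variable names are hypothetical and the code is an illustration rather than a general implementation.

```python
import math

# Prior over the cause and one conditional table per effect, keyed by Cavity.
P_CAVITY = {True: 0.2, False: 0.8}
P_TOOTHACHE_GIVEN = {True: 0.6, False: 0.1}
P_CATCH_GIVEN = {True: 0.9, False: 0.2}

# Unnormalized posterior: P(toothache | Cavity) P(catch | Cavity) P(Cavity).
unnormalized = {
    cavity: P_TOOTHACHE_GIVEN[cavity] * P_CATCH_GIVEN[cavity] * P_CAVITY[cavity]
    for cavity in (True, False)
}

# alpha normalizes the two entries so that they sum to 1.
alpha = 1.0 / sum(unnormalized.values())
posterior = {cavity: alpha * value for cavity, value in unnormalized.items()}

print({cavity: round(p, 3) for cavity, p in posterior.items()})
```

With both effects observed, the posterior for cavity rises well above the 0.6 obtained from toothache alone.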
6. Summary
  Probability is a rigorous formalism for uncertain knowledge.
  A joint probability distribution for discrete random variables specifies the probability of every atomic event.
  Queries can be answered by summing over atomic events.
  For nontrivial domains, we must find a way to reduce the size of the joint distribution.
  Independence and conditional independence provide the tools.