CS 4100 Artificial Intelligence Prof. C. Hafner Class Notes Feb 28 and March 13-15, 2012.

Slides:



Advertisements
Similar presentations
Probability: Review The state of the world is described using random variables Probabilities are defined over events –Sets of world states characterized.
Advertisements

PROBABILITY. Uncertainty  Let action A t = leave for airport t minutes before flight from Logan Airport  Will A t get me there on time ? Problems :
Artificial Intelligence Uncertainty
Reasoning under Uncertainty: Conditional Prob., Bayes and Independence Computer Science cpsc322, Lecture 25 (Textbook Chpt ) March, 17, 2010.
Marginal Independence and Conditional Independence Computer Science cpsc322, Lecture 26 (Textbook Chpt 6.1-2) March, 19, 2010.
CPSC 422 Review Of Probability Theory.
Probability.
Uncertainty Chapter 13. Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me there on time? Problems: 1.partial observability.
Bayesian networks Chapter 14 Section 1 – 2.
Bayesian Belief Networks
KI2 - 2 Kunstmatige Intelligentie / RuG Probabilities Revisited AIMA, Chapter 13.
University College Cork (Ireland) Department of Civil and Environmental Engineering Course: Engineering Artificial Intelligence Dr. Radu Marinescu Lecture.
Uncertainty Management for Intelligent Systems : for SEP502 June 2006 김 진형 KAIST
Ai in game programming it university of copenhagen Welcome to... the Crash Course Probability Theory Marco Loog.
Uncertainty Chapter 13.
Uncertainty Chapter 13.
Probabilistic Reasoning
Methods in Computational Linguistics II Queens College Lecture 2: Counting Things.
EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS
Handling Uncertainty. Uncertain knowledge Typical example: Diagnosis. Consider:  x Symptom(x, Toothache)  Disease(x, Cavity). The problem is that this.
Uncertainty Chapter 13. Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me there on time? Problems: 1.partial observability.
CHAPTER 13 Oliver Schulte Summer 2011 Uncertainty.
An Introduction to Artificial Intelligence Chapter 13 & : Uncertainty & Bayesian Networks Ramin Halavati
CS 4100 Artificial Intelligence Prof. C. Hafner Class Notes March 15and20, 2012.
CS 4100 Artificial Intelligence Prof. C. Hafner Class Notes March 13, 2012.
Uncertainty Chapter 13. Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule.
1 Chapter 13 Uncertainty. 2 Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule.
Uncertainty Chapter 13. Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule.
CS 4100 Artificial Intelligence Prof. C. Hafner Class Notes March 20, 2012.
Bayesian networks. Motivation We saw that the full joint probability can be used to answer any question about the domain, but can become intractable as.
Uncertainty Chapter 13. Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule.
Probability and naïve Bayes Classifier Louis Oliphant cs540 section 2 Fall 2005.
An Introduction to Artificial Intelligence Chapter 13 & : Uncertainty & Bayesian Networks Ramin Halavati
Elementary manipulations of probabilities Set probability of multi-valued r.v. P({x=Odd}) = P(1)+P(3)+P(5) = 1/6+1/6+1/6 = ½ Multi-variant distribution:
Uncertainty. Assumptions Inherent in Deductive Logic-based Systems All the assertions we wish to make and use are universally true. Observations of the.
Reasoning under Uncertainty: Conditional Prob., Bayes and Independence Computer Science cpsc322, Lecture 25 (Textbook Chpt ) Nov, 5, 2012.
Uncertainty Chapter 13. Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule.
Uncertainty Chapter 13. Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule.
2101INT – Principles of Intelligent Systems Lecture 9.
Introduction to Artificial Intelligence CS 438 Spring 2008 Today –AIMA, Ch. 13 –Reasoning with Uncertainty Tuesday –AIMA, Ch. 14.
Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me there on time? Problems: 1.partial observability (road state, other.
Uncertainty 1. 2 ♦ Uncertainty ♦ Probability ♦ Syntax and Semantics ♦ Inference ♦ Independence and Bayes’ Rule Outline.
Conditional Probability, Bayes’ Theorem, and Belief Networks CISC 2315 Discrete Structures Spring2010 Professor William G. Tanner, Jr.
CSE 473 Uncertainty. © UW CSE AI Faculty 2 Many Techniques Developed Fuzzy Logic Certainty Factors Non-monotonic logic Probability Only one has stood.
Uncertainty Fall 2013 Comp3710 Artificial Intelligence Computing Science Thompson Rivers University.
Outline [AIMA Ch 13] 1 Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule.
Uncertainty & Probability CIS 391 – Introduction to Artificial Intelligence AIMA, Chapter 13 Many slides adapted from CMSC 421 (U. Maryland) by Bonnie.
Anifuddin Azis UNCERTAINTY. 2 Introduction The world is not a well-defined place. There is uncertainty in the facts we know: What’s the temperature? Imprecise.
Chapter 12. Probability Reasoning Fall 2013 Comp3710 Artificial Intelligence Computing Science Thompson Rivers University.
Introduction to Artificial Intelligence – Unit 7 Probabilistic Reasoning Course The Hebrew University of Jerusalem School of Engineering and Computer.
Computer Science cpsc322, Lecture 25
Marginal Independence and Conditional Independence
Bayesian networks Chapter 14 Section 1 – 2.
Where are we in CS 440? Now leaving: sequential, deterministic reasoning Entering: probabilistic reasoning and machine learning.
Uncertainty Chapter 13.
Uncertainty Chapter 13.
Bayes Rule Which is shorthand for: 10 Mar 2004 CS Uncertainty.
Conditional Probability, Bayes’ Theorem, and Belief Networks
Uncertainty.
Uncertainty in Environments
CS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence Fall 2007
CS 188: Artificial Intelligence Fall 2007
Uncertainty Chapter 13.
Probabilistic Reasoning
Uncertainty Chapter 13.
Uncertainty Chapter 13.
Bayesian Networks CSE 573.
Uncertainty Chapter 13.
Presentation transcript:

CS 4100 Artificial Intelligence Prof. C. Hafner Class Notes Feb 28 and March 13-15, 2012

Uncertain knowledge/beliefs arise from Chance (probability) Incomplete knowledge – Defeasible (“default”) reasoning – Reasoning from ignorance Ambiguous or conflicting evidence – Evidential reasoning Vagueness – Fuzzy / open-textured concepts

Examples Chance – Will your first child be a boy or girl (before conceived) – Will the coin come up heads or tails Defeasible reasoning: – If I believe Tweety is a bird, then I think he can fly – If I learn Tweety is a penguin then I think he can’t fly Assuming the normal case unless/until you learn otherwise Reasoning from ignorance: – I believe the President of the US is alive

Examples (cont.) “Evidential reasoning” - how strongly do I believe P based on the evidence? (Confidence levels) – Quantitative [ 0, 1 ], [ -1, 1 ] – Qualitative {definite; very likely, likely, neutral, unlikely, very unlikely, definitely not} Fuzzy concepts: – John is tall – My friend promises to return a book “soon” Add “degree” to fuzzy assertions (between 0 and 1)

Some problems Using probability, if we know P(A) and P(B) we have methods for calculating P(~A), P(A and B), P(A or B), P(A | B) Using evidential or fuzzy models, these cause problems of consistency: If “John is tall”.8 and “John is smart”.6 what can be said about “John is both tall and smart” ? We will mainly focus on probabilistic reasoning models

Syntax for probabilistic reasoning Basic element: random variable Similar to propositional logic: possible worlds defined by assignment of values to random variables. Boolean random variables e.g., Cavity (do I have a cavity?) Discrete random variables e.g., Weather is one of Domain values must be exhaustive and mutually exclusive Elementary proposition constructed by assignment of a value to a single random variable: e.g., Weather = sunny, Cavity = false (sometime abbreviated as  cavity) Complex propositions formed from elementary propositions and standard logical connectives e.g., Weather = sunny  Cavity = false

Syntax Atomic event: A complete specification of the state of the world about which the agent is uncertain E.g., if the world consists of only two Boolean variables Cavity and Toothache, then there are 4 distinct atomic events: Cavity = false  Toothache = false Cavity = false  Toothache = true Cavity = true  Toothache = false Cavity = true  Toothache = true Atomic events are mutually exclusive and exhaustive (often called “Outcomes”) Events in general are sets of atomic events, such as “Cavity = true”

Axioms of probability For any propositions A, B – 0 ≤ P(A) ≤ 1 – P(true) = 1 and P(false) = 0 – P(A  B) = P(A) + P(B) - P(A  B)

Prior probability Prior or unconditional probabilities of propositions e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72 correspond to belief prior to arrival of any (new) evidence Probability distribution gives values for all possible assignments: P(Weather) = (sums to 1) Joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables: P(Weather,Cavity) = a 4 × 2 matrix of values: Weather =sunnyrainycloudysnow Cavity = true Cavity = false Every probability question about a domain can be answered by the joint distribution

Conditional probability Conditional or posterior probabilities e.g., P(cavity | toothache) = 0.8 i.e., given that toothache is all I know (Notation for conditional distributions (use boldface): P(Cavity | Toothache) = 2-element vector of 2-element vectors) If we know more, e.g., cavity is also given, then we have P(cavity | toothache,cavity) = 1 New evidence may be irrelevant, allowing simplification, e.g., cavity does not depend on weather: P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8 This kind of inference, sanctioned by domain knowledge, is crucial for probabilistic reasoning in AI. WHY?

Conditional probability Definition of conditional probability: P(a | b) = P(a  b) / P(b) if P(b) > 0 Product rule gives an alternative formulation: P(a  b) = P(a | b) P(b) = P(b | a) P(a) A general version holds for whole distributions, e.g., P(Weather,Cavity) = P(Weather | Cavity) P(Cavity) (View as a set of 4 × 2 equations, not matrix mult.) Chain rule is derived by successive application of product rule: P(X 1, …,X n ) = P(X 1,...,X n-1 ) P(X n | X 1,...,X n-1 ) = P(X 1,...,X n-2 ) P(X n-1 | X 1,...,X n-2 ) P(X n | X 1,...,X n-1 ) = … = P(X1) P(X2 | X1) P(X3 | X1, X2)... P(Xn | X1,..., Xn-1) OR: π i= 1 to n P(X i | X 1, …,X i-1 )

Inference by enumeration Start with the joint probability distribution: For any proposition φ, sum the atomic events where it is true: P(φ) = Σ ω:ω ╞ φ P(ω)

Inference by enumeration Start with the joint probability distribution: For any proposition φ, sum the atomic events where it is true: P(φ) = Σ ω:ω ╞ φ P(ω) P(toothache) = = 0.2

Inference by enumeration Start with the joint probability distribution: Can also compute conditional probabilities: P(  cavity | toothache) = P(  cavity  toothache) P(toothache) = = 0.4

In class exercise Given the joint distribution shown on slide and the definition P(a | b) = P(a  b) / P(b): – What is P(Cavity = True) ? – What is P(Weather = Sunny) ? – What is P(Cavity = True | Weather = Sunny) Given the meta-equation: – P(Weather,Cavity) = P(Weather | Cavity) P(Cavity) What are the 8 equations represented here? Weather = sunny rainycloudy snow Cavity = true Cavity = false

Independence A and B are independent iff P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A) P(B) P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather) 32 entries reduced to 12; for n independent biased coins, O(2 n ) →O(n) Absolute independence powerful but rare Dentistry is a large field with hundreds of variables, none of which are independent. What to do?

Bayes' Rule Product rule P(a  b) = P(a | b) P(b) = P(b | a) P(a)  Bayes' rule: P(a | b) = P(b | a) P(a) / P(b) or in distribution form P(Y|X) = P(X|Y) P(Y) / P(X) = αP(X|Y) P(Y) Useful for assessing diagnostic probability from causal probability: – P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect) – E.g., let M be meningitis, S be stiff neck: P(m|s) = P(s|m) P(m) / P(s) = 0.8 × / 0.1 = – Note: posterior probability of meningitis still very small!

Example: Expert Systems for Medical Diagnosis 100 diseases 20 symptoms # of parameters needed to calculate P(Di) when a patient provides his/her symptoms Strategy to reduce the size: assume independence of symptoms Recalculate number of parameters needed

Bayes' Rule and conditional independence P(Cavity | toothache  catch) = αP(toothache  catch | Cavity) P(Cavity) = αP(toothache | Cavity) P(catch | Cavity) P(Cavity) This is an example of a naïve Bayes model: P(Cause,Effect 1, …,Effect n ) = P(Cause) π i P(Effect i |Cause) Total number of parameters is linear in n

Conditional independence P(Toothache, Cavity, Catch) has 2 3 – 1 = 7 independent entries If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: (1) P(catch | toothache, cavity) = P(catch | cavity) The same independence holds if I haven't got a cavity: (2) P(catch | toothache,  cavity) = P(catch |  cavity) Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache,Cavity) = P(Catch | Cavity) Equivalent statements: P(Toothache | Catch, Cavity) = P(Toothache | Cavity) P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

Conditional independence contd. Write out full joint distribution using chain rule: P(Toothache, Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity) I.e., = 5 independent numbers In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n. Conditional independence is our most basic and robust form of knowledge about uncertain environments.

Bayesian networks A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions Syntax: – a set of nodes, one per variable – a directed, acyclic graph (link ≈ "directly influences") – a conditional distribution for each node given its parents: P (X i | Parents (X i )) In the simplest case, conditional distribution represented as a conditional probability table (CPT) giving the distribution over X i for each combination of parent values

Probabilities/Bayesian Inference Nets Thanks to Andrew Moore from CMU for some great slides. (used with permission)

Extend to P(A ^ B ^ C ^ …) = ? Review: Conditional probabilities and JPD (joint distribution)

Chain rule follows from this definition Product rule P(a  b) = P(a | b) P(b) = P(b | a) P(a) Chain rule is derived by successive application of product rule: P(X 1, …,X n ) can also be written P(X1 ^... ^ Xn) = P([Xn ^ [X1,... Xn-1]) = P(X 1,...X n-1 ) P(X n | X 1,...,X n-1 ) = P(X 1,...,X n-2 ) P(X n-1 | X 1,...,X n-2 ) P(X n | X 1,...,X n-1 ) = … = P(X1) P(X2 | X1) P(X3 | X1, X2)... P(Xn | X1,..., Xn-1)

Conditional Prob. example

Example Likes FootballDislikesNeutral Male Female In-class exercise: Calculate: P(Likes Football | Male ) P( ~ Likes Football | Female)

Review the Joint Distribution (JPD)

What assumption can we make ?

Test your understanding: Fill in the table

Structure for CP-based AI Models Given a set of RV’s X, typically, we are interested in the posterior joint distribution of the query variables Y given specific values e for the evidence variables E Let the hidden variables be H = X - Y – E Then the required calculation of P(Y | E) is done by summing out the hidden variables: P( Y | E = e) = αP(Y ^ E = e) or αΣ h P(Y ^ E= e ^ H = h) Note: what is α ? Given the definition: P(a | b) = P(a  b) / P(b) α is the denominator 1/P(E=e). P(E=e) can be calculated from the joint distribution as: Σ h P(E= e ^ H = h)

Example (medical diagnosis) Causal model: D  I  S (Y  H  E) Cancer  anemia  fatigue Kidney disease  anemia  fatigue P(Y=cancer | E=fatigue) = α [ P(Y=cancer ^ E=fatigue ^ anemia) + P(Y=cancer ^ E=fatigue ^ ~anemia) ] α = 1/P(E = fatigue) or 1/[P(E=fatigue ^ anemia) + P(E=fatigue ^ ~anemia) ]

Analysis P(Y | E = e) = αP(Y ^ E = e) = αΣ h P(Y ^ E= e ^ H = h) [repeated] The terms in the summation are joint entries because Y, E and H together exhaust the set of random variables Obvious problems: 1.Time and space complexity O(d n ) where d is the largest arity 2.How to find the numbers to solve real problems? (A solution to 1. : assume independence !!)

What is Independence ?? A and B are independent iff P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A) P(B) P(Toothache, Catch, Cavity, Weather) JD entries are 2x2x2x4 = P(Toothache, Catch, Cavity) P(Weather) entries are 2x2x entries reduced to 12 In general, total independence assumption reduces exponential to linear complexity

What is Independence ?? A and B are independent iff P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A) P(B) Toss 10 coins, different OUTCOMES are 2^10 = 2048 Biased coins whose behavior is independent of each other: O(2 n ) →O(n) = can compute P(all outcomes) with 10 values All coins have the same bias (includes the case of fair coins) ???? How many values are needed ? Test your understanding: Consider a “3 sided coin” (or die). How many entries needed to show the probabilities of all outcomes? If you toss 10 of those and: All have the same bias? Bias unknown, but independence is assumed? Bias unknown, no independence assumed?

Example: Expert Systems for Medical Diagnosis 10 diseases 20 symptoms # of parameters needed to calculate P(D | S) for all combinations using a JPD Strategy to reduce the size of the model: assume mutual independence of symptoms and diseases - Recalculate number of parameters needed Absolute independence powerful but rare Medicine is a large field with hundreds of variables, many of which are not independent. What to do?

Problem 2: We still need to find the numbers Assuming independence, doctors may be able to estimate: P(symptom | disease) for each S/D pair (causal reasoning) While what we need to know s/he may not be able to estimate as easily: P(disease | symptom) Thus, the importance of Bayes rule in probabilistic AI

Bayes' Rule Product rule P(a  b) = P(a | b) P(b) = P(b | a) P(a)  Bayes' rule: P(a | b) = P(b | a) P(a) / P(b) or in distribution form P(Y|X) = P(X|Y) P(Y) / P(X) = αP(X|Y) P(Y) Useful for assessing diagnostic probability from causal probability: P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect) P(Disease|Symptom) = P(Symptom|Diease) P(Symptom) / (Disease) – E.g., let M be meningitis, S be stiff neck: P(m|s) = P(s|m) P(m) / P(s) = 0.8 × / 0.1 = – Note: posterior probability of meningitis still very small!

Bayes' Rule and conditional independence P(Cavity | toothache  catch) = αP(toothache  catch | Cavity) P(Cavity) = αP(toothache | Cavity) P(catch | Cavity) P(Cavity) We say: “toothache and catch are independent, given cavity”. This is an example of a naïve Bayes model. We will study this later as our simplest machine learning application P(Cause,Effect 1, …,Effect n ) = P(Cause) π i P(Effect i |Cause) Total number of parameters is linear in n (number of symptoms). This is our first Bayesian inference net.

Conditional independence P(Toothache, Cavity, Catch) has 2 3 – 1 = 7 independent entries If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: (1) P(catch | toothache, cavity) = P(catch | cavity) The same independence holds if I haven't got a cavity: (2) P(catch | toothache,  cavity) = P(catch |  cavity) Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache,Cavity) = P(Catch | Cavity) Equivalent statements (from original definitions of independence): P(Toothache | Catch, Cavity) = P(Toothache | Cavity) P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

Conditional independence contd. Write out full joint distribution using chain rule: P(Toothache, Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity) I.e., = 5 independent numbers In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n. Conditional independence is our most basic and robust form of knowledge about uncertain environments.

Remember this examples

Example of conditional independence

Test your understanding of the Chain Rule

This is our second Bayesian inference net

How to construct a Bayes Net

Test your understanding: design a Bayes net with plausible numbers

Calculating using Bayes’ Nets