ECE457 Applied Artificial Intelligence Spring 2008 Lecture #8: Uncertainty

Outline
- Uncertainty
- Probability
- Bayes’ Theorem
Russell & Norvig, chapter 13

Limits of FOL
FOL works only when facts are known to be true or false.
- "Some purple mushrooms are poisonous": ∃x Purple(x) ∧ Mushroom(x) ∧ Poisonous(x)
In real life there is almost always uncertainty.
- "There’s a 70% chance that a purple mushroom is poisonous" can’t be represented as an FOL sentence.

Acting Under Uncertainty
So far, the rational decision has been to pick the action with the "best" outcome.
- Given two actions, where #1 leads to a great outcome and #2 leads to a good outcome, it is only rational to pick #1.
- This assumes the outcome is 100% certain.
What if the outcome is not certain?
- #1 has a 1% probability of leading to the great outcome.
- #2 has a 90% probability of leading to the good outcome.
What is the rational decision now?

Acting Under Uncertainty
Maximum Expected Utility (MEU): pick the action that leads to the best outcome averaged over all possible outcomes of the action.
- Same principle as expectiminimax, used to solve games of chance (see Game Playing, lecture #5).
How do we compute the MEU? First, we need to compute the probability of each event.
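A minimal sketch of the MEU comparison in Python. The utility values (100 for the great outcome, 10 for the good one) are illustrative assumptions; only the probabilities come from the previous slide.

def expected_utility(outcomes):
    # outcomes: list of (probability, utility) pairs for one action
    return sum(p * u for p, u in outcomes)

action1 = [(0.01, 100), (0.99, 0)]  # 1% chance of the great outcome (assumed utility 100)
action2 = [(0.90, 10), (0.10, 0)]   # 90% chance of the good outcome (assumed utility 10)

print(expected_utility(action1))  # 1.0
print(expected_utility(action2))  # 9.0, so action #2 is the rational choice here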

Types of Uncertain Variables
- Boolean: can be true or false. Warm ∈ {True, False}
- Discrete: can take a value from a limited, countable domain. Temperature ∈ {Hot, Warm, Cool, Cold}
- Continuous: can take a value from a set of real numbers. Temperature ∈ [-35, 35]
We’ll focus on discrete variables for now.

Probability
Each possible value in the domain of an uncertain variable is assigned a probability, representing how likely it is that the variable will have this value.
- P(Temperature=Warm): probability that the "Temperature" variable will have the value "Warm".
- We can simply write P(Warm).

Probability Axioms
- P(x) ∈ [0, 1]
- P(x) = 1: x is necessarily true, or certain to occur.
- P(x) = 0: x is necessarily false, or certain not to occur.
- P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
- If P(A ∧ B) = 0, A and B are said to be mutually exclusive.
- Σx P(x) = 1, if all values of x are mutually exclusive.
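A quick numeric check of the inclusion-exclusion axiom, with assumed illustrative values for P(A), P(B), and P(A ∧ B):

p_a, p_b, p_a_and_b = 0.5, 0.4, 0.2   # assumed values for illustration
p_a_or_b = p_a + p_b - p_a_and_b      # inclusion-exclusion
print(p_a_or_b)                       # 0.7 (up to floating-point rounding)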

Prior (Unconditional) Probability
Probability that A is true in the absence of any other information: P(A).
Example:
- P(Temperature=Hot) = 0.2
- P(Temperature=Warm) = 0.6
- P(Temperature=Cool) = 0.15
- P(Temperature=Cold) = 0.05
P(Temperature) = {0.2, 0.6, 0.15, 0.05} is a probability distribution.

Joint Probability Distribution
Let’s add another variable: Condition ∈ {Sunny, Cloudy, Raining}. We can compute P(Temperature,Condition):

        Sunny   Cloudy   Raining
Hot     0.12    0.05     0.03
Warm    0.23    0.20     0.17
Cool    0.02    0.05     0.08
Cold    0.01    0.02     0.02

Joint Probability Distribution
Given a joint probability distribution P(a,b), we can compute P(a=Ai):
P(Ai) = Σj P(Ai,Bj)
- Assumes all events (Ai,Bj) are mutually exclusive.
- This is called marginalization.
P(Warm) = P(Warm,Sunny) + P(Warm,Cloudy) + P(Warm,Raining) = 0.23 + 0.2 + 0.17 = 0.6
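A minimal sketch of marginalization in Python, using the joint distribution table from the previous slide:

joint = {
    ("Hot", "Sunny"): 0.12, ("Hot", "Cloudy"): 0.05, ("Hot", "Raining"): 0.03,
    ("Warm", "Sunny"): 0.23, ("Warm", "Cloudy"): 0.20, ("Warm", "Raining"): 0.17,
    ("Cool", "Sunny"): 0.02, ("Cool", "Cloudy"): 0.05, ("Cool", "Raining"): 0.08,
    ("Cold", "Sunny"): 0.01, ("Cold", "Cloudy"): 0.02, ("Cold", "Raining"): 0.02,
}

def p_temperature(t):
    # marginalize out Condition: P(t) = sum over c of P(t, c)
    return sum(p for (temp, cond), p in joint.items() if temp == t)

print(p_temperature("Warm"))  # 0.6 (up to floating-point rounding)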

Posterior (Conditional) Probability
Probability that A is true given that we know that B is true: P(A|B).
Can be computed using prior and joint probability:
P(A|B) = P(A,B) / P(B)
P(Warm|Cloudy) = P(Warm,Cloudy) / P(Cloudy) = 0.2 / 0.32 = 0.625
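The same table gives conditional probabilities directly; a self-contained sketch:

# P(Temperature, Condition) from the table above, as {(temp, cond): prob}
joint = {("Hot", "Sunny"): 0.12, ("Hot", "Cloudy"): 0.05, ("Hot", "Raining"): 0.03,
         ("Warm", "Sunny"): 0.23, ("Warm", "Cloudy"): 0.20, ("Warm", "Raining"): 0.17,
         ("Cool", "Sunny"): 0.02, ("Cool", "Cloudy"): 0.05, ("Cool", "Raining"): 0.08,
         ("Cold", "Sunny"): 0.01, ("Cold", "Cloudy"): 0.02, ("Cold", "Raining"): 0.02}

def p_condition(c):
    # marginal P(Condition=c)
    return sum(p for (temp, cond), p in joint.items() if cond == c)

def p_temp_given_cond(t, c):
    # P(t|c) = P(t, c) / P(c)
    return joint[(t, c)] / p_condition(c)

print(p_temp_given_cond("Warm", "Cloudy"))  # 0.625 (up to floating-point rounding)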

Bayes’ Theorem
Start from the previous conditional probability equation:
- P(A|B)P(B) = P(A,B)
- P(B|A)P(A) = P(B,A)
Since P(A,B) = P(B,A):
- P(A|B)P(B) = P(B|A)P(A)
- P(A|B) = P(B|A)P(A) / P(B) (important!)
Terminology:
- P(A|B): posterior probability
- P(A): prior probability
- P(B|A): likelihood
- P(B): normalizing constant
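A quick numeric check that Bayes’ theorem agrees with the direct definition of conditional probability, using the Temperature/Condition numbers from the earlier slides:

p_warm_and_cloudy = 0.20
p_warm = 0.60    # marginal, from the prior-probability slide
p_cloudy = 0.32  # marginal, computed from the joint table

direct = p_warm_and_cloudy / p_cloudy        # P(Warm|Cloudy) by definition
likelihood = p_warm_and_cloudy / p_warm      # P(Cloudy|Warm)
via_bayes = likelihood * p_warm / p_cloudy   # P(Warm|Cloudy) by Bayes' theorem
assert abs(direct - via_bayes) < 1e-12
print(direct)  # 0.625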

Bayes’ Theorem
Allows us to compute P(A|B) without knowing P(A,B).
- In many real-life situations, P(A|B) cannot be measured directly, but P(B|A) is available.
Bayes’ Theorem underlies all modern probabilistic AI systems.

Bayes’ Theorem Example #1
We want to design a classifier (for email spam): compute the probability that an item belongs to class C (spam) given that it exhibits feature F (the word "Viagra").
We know that:
- 20% of items in the world belong to class C
- 90% of items in class C exhibit feature F
- 40% of items in the world exhibit feature F
P(C|F) = P(F|C) * P(C) / P(F) = 0.9 * 0.2 / 0.4 = 0.45
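The same computation as a short arithmetic check:

p_c = 0.2           # P(C): fraction of items in class C (spam)
p_f_given_c = 0.9   # P(F|C): fraction of spam items containing "Viagra"
p_f = 0.4           # P(F): fraction of all items containing "Viagra"
print(p_f_given_c * p_c / p_f)  # 0.45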

Bayes’ Theorem Example #2
A drug test returns "positive" if drugs are detected in an athlete’s system, but it can make mistakes:
- If an athlete took drugs, 99% chance of a positive test
- If an athlete didn’t take drugs, 10% chance of a positive test
- 5% of athletes take drugs
What’s the probability that an athlete who tested positive really does take drugs?
P(drug|+) = P(+|drug) * P(drug) / P(+)
P(+) = P(+|drug)P(drug) + P(+|nodrug)P(nodrug) = 0.99 * 0.05 + 0.1 * 0.95 = 0.1445
P(drug|+) = 0.99 * 0.05 / 0.1445 ≈ 0.3426
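The drug-test posterior as code; note that the denominator P(+) is obtained by summing over the two cases (drug, no drug):

p_pos_given_drug = 0.99
p_pos_given_nodrug = 0.10
p_drug = 0.05

# total probability of a positive test, marginalizing over drug / no drug
p_pos = p_pos_given_drug * p_drug + p_pos_given_nodrug * (1 - p_drug)
print(p_pos)                              # 0.1445
print(p_pos_given_drug * p_drug / p_pos)  # ~0.3426, i.e. only about a 34% chance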

Bayes’ Theorem
We computed the normalizing constant using marginalization!
P(B) = Σi P(B|Ai)P(Ai)

Chain Rule
Recall that P(A,B) = P(A|B)P(B). This can be extended to multiple variables.
With three variables:
- P(A,B,C) = P(A|B,C)P(B,C)
- P(A,B,C) = P(A|B,C)P(B|C)P(C)
General form:
P(A1,A2,…,An) = P(A1|A2,…,An)P(A2|A3,…,An)…P(An-1|An)P(An)
This lets us compute the full joint probability distribution, which is simple if the variables are conditionally independent.
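A numeric verification of the three-variable chain rule on a small made-up joint distribution (the joint values are arbitrary assumptions, normalized to sum to 1):

import itertools

# arbitrary assumed joint over three binary variables A, B, C
raw = {(a, b, c): a + 2*b + 3*c + 1 for a, b, c in itertools.product([0, 1], repeat=3)}
total = sum(raw.values())
joint = {k: v / total for k, v in raw.items()}  # normalize to a valid joint

def p_c(c):
    # marginal P(C=c)
    return sum(p for (_, _, cc), p in joint.items() if cc == c)

def p_bc(b, c):
    # joint marginal P(B=b, C=c)
    return sum(p for (_, bb, cc), p in joint.items() if bb == b and cc == c)

for (a, b, c), p in joint.items():
    p_a_given_bc = joint[(a, b, c)] / p_bc(b, c)  # P(A|B,C)
    p_b_given_c = p_bc(b, c) / p_c(c)             # P(B|C)
    assert abs(p_a_given_bc * p_b_given_c * p_c(c) - p) < 1e-12

print("chain rule verified on all 8 outcomes")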

Independence
Two variables are independent if knowledge of one does not affect the probability of the other:
- P(A|B) = P(A)
- P(B|A) = P(B)
- P(A ∧ B) = P(A)P(B)
Impact on chain rule:
P(A1,A2,…,An) = P(A1)P(A2)…P(An) = ∏i=1..n P(Ai)
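A quick check that Temperature and Condition from the earlier table are not independent:

# values from the earlier Temperature/Condition slides
p_warm = 0.60
p_cloudy = 0.32
p_warm_and_cloudy = 0.20

print(p_warm * p_cloudy)   # 0.192
print(p_warm_and_cloudy)   # 0.20 != 0.192, so the variables are not independent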

Conditional Independence
Independence is hard to satisfy. Two variables are conditionally independent given a third if knowledge of one does not affect the probability of the other when the value of the third is known:
- P(A|B,C) = P(A|C)
- P(B|A,C) = P(B|C)
Impact on chain rule:
P(A1,A2,…,An) = P(A1|An)P(A2|An)…P(An-1|An)P(An) = P(An) ∏i=1..n-1 P(Ai|An)

Bayes’ Theorem Example #3
We want to design a classifier: compute the probability that an item belongs to class C given that it exhibits features F1 to Fn.
We know:
- the % of items in the world that belong to class C
- the % of items in class C that exhibit feature Fi
- the % of items in the world that exhibit features F1 to Fn
P(C|F1,…,Fn) = P(F1,…,Fn|C)P(C) / P(F1,…,Fn)
P(F1,…,Fn|C)P(C) = P(C,F1,…,Fn) by the chain rule
P(C,F1,…,Fn) = P(C) ∏i P(Fi|C), assuming the features are conditionally independent given the class
P(C|F1,…,Fn) = P(C) ∏i P(Fi|C) / P(F1,…,Fn)

Naïve Bayes Classifier
P(F1,…,Fn) is independent of the class C, so in a multi-class problem it does not change which class scores highest and can be dropped!
P(C|F1,…,Fn) ∝ P(C) ∏i P(Fi|C)
This is called the Naïve Bayes Classifier:
- "Naïve" because it assumes conditional independence of the Fi given C, whether it’s actually true or not.
- Often used in practice in cases where the Fi are not conditionally independent given C, with very good results.
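A minimal sketch of a naïve Bayes classifier over word-presence features, with add-one (Laplace) smoothing so unseen class/feature pairs get a small nonzero probability. The toy training set and the words used as features are invented for illustration:

from collections import Counter

def train(examples):
    # examples: list of (label, set of features present in the item)
    label_counts = Counter(label for label, _ in examples)
    vocab = set().union(*(feats for _, feats in examples))
    feature_counts = {c: Counter() for c in label_counts}
    for label, feats in examples:
        feature_counts[label].update(feats)
    n = len(examples)
    priors = {c: label_counts[c] / n for c in label_counts}          # P(C)
    likelihoods = {c: {f: (feature_counts[c][f] + 1) / (label_counts[c] + 2)
                       for f in vocab}                               # P(Fi|C), smoothed
                   for c in label_counts}
    return priors, likelihoods, vocab

def classify(features, priors, likelihoods, vocab):
    # score each class by P(C) * product over i of P(Fi|C);
    # the shared denominator P(F1,...,Fn) is dropped
    scores = {}
    for c in priors:
        score = priors[c]
        for f in vocab:
            p = likelihoods[c][f]
            score *= p if f in features else (1 - p)
        scores[c] = score
    return max(scores, key=scores.get)

# invented toy training set: spam/ham emails described by the words they contain
examples = [
    ("spam", {"viagra", "free"}),
    ("spam", {"viagra", "offer"}),
    ("spam", {"free", "offer"}),
    ("ham", {"meeting", "report"}),
    ("ham", {"report", "offer"}),
]
priors, likelihoods, vocab = train(examples)
print(classify({"viagra", "offer"}, priors, likelihoods, vocab))  # spam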