CS 188: Artificial Intelligence Spring 2007 Lecture 11: Probability 2/20/2007 Srini Narayanan – ICSI and UC Berkeley

Announcements  HW1 graded  Solutions to HW 2 posted Wednesday  HW 3 due Thursday 11:59 PM

Today  Probability  Random Variables  Joint and Conditional Distributions  Bayes Rule  Independence  You’ll need all this stuff for the next few weeks, so make sure you go over it!

What is this? Uncertainty

 Let action A_t = leave for airport t minutes before flight
 Will A_t get me there on time?
 Problems:
   partial observability (road state, other drivers' plans, etc.)
   noisy sensors (KCBS traffic reports)
   uncertainty in action outcomes (flat tire, etc.)
   immense complexity of modeling and predicting traffic
 A purely logical approach either
   risks falsehood: "A_25 will get me there on time", or
   leads to conclusions that are too weak for decision making: "A_25 will get me there on time if there's no accident on the bridge, and it doesn't rain, and my tires remain intact, etc., etc."
 A_1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport…

Probabilities
 Probabilistic approach
 Given the available evidence, A_25 will get me there on time with probability 0.04
   P(A_25 | no reported accidents) = 0.04
 Probabilities change with new evidence:
   P(A_25 | no reported accidents, 5 a.m.) = 0.15
   P(A_25 | no reported accidents, 5 a.m., raining) = 0.08
   i.e., observing evidence causes beliefs to be updated

Probabilities Everywhere?  Not just for games of chance!  I’m snuffling: am I sick?  contains “FREE!”: is it spam?  Tooth hurts: have cavity?  Safe to cross street?  60 min enough to get to the airport?  Robot rotated wheel three times, how far did it advance?  Why can a random variable have uncertainty?  Inherently random process (dice, etc)  Insufficient or weak evidence  Unmodeled variables  Ignorance of underlying processes  The world’s just noisy!

Probabilistic Models
 CSP/Prop Logic:
   Variables with domains
   Constraints: map from assignments to true/false
   Ideally: only certain variables directly interact
 Probabilistic models:
   (Random) variables with domains
   Joint distributions: map from assignments (or outcomes) to positive numbers
   Normalized: sum to 1.0
   Ideally: only certain variables are directly correlated

Constraint (CSP) view:
A     B     P
warm  sun   T
warm  rain  F
cold  sun   F
cold  rain  T

Probabilistic view (joint distribution):
A     B     P
warm  sun   0.4
warm  rain  0.1
cold  sun   0.2
cold  rain  0.3

Random Variables
 A random variable is some aspect of the world about which we have uncertainty
   R = Is it raining?
   D = How long will it take to drive to work?
   L = Where am I?
 We denote random variables with capital letters
 Like in a CSP, each random variable has a domain
   R in {true, false}
   D in [0, ∞)
   L in possible locations

Distributions on Random Vars
 A joint distribution over a set of random variables X_1, …, X_n is a map from assignments (or outcomes, or atomic events) to reals: P(X_1 = x_1, …, X_n = x_n)
 Size of distribution if n variables with domain sizes d?
 Must obey: every entry is nonnegative, and the entries sum to 1
 For all but the smallest distributions, impractical to write out

T     S     P
warm  sun   0.4
warm  rain  0.1
cold  sun   0.2
cold  rain  0.3
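A joint table this small fits in a Python dictionary keyed by assignment tuples. The sketch below (the representation and the helper name is_normalized are my own; the numbers come from the table above) stores the joint and checks the two constraints just listed.

```python
# Joint distribution P(T, S), keyed by (t, s) assignment tuples.
joint_TS = {
    ('warm', 'sun'): 0.4,
    ('warm', 'rain'): 0.1,
    ('cold', 'sun'): 0.2,
    ('cold', 'rain'): 0.3,
}

def is_normalized(dist, tol=1e-9):
    """Every entry nonnegative and the entries sum to 1."""
    return all(p >= 0 for p in dist.values()) and abs(sum(dist.values()) - 1.0) < tol

print(is_normalized(joint_TS))  # True
```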

Examples
 An event is a set E of assignments (or outcomes)
 From a joint distribution, we can calculate the probability of any event
   Probability that it's warm AND sunny?
   Probability that it's warm?
   Probability that it's warm OR sunny?

T     S     P
warm  sun   0.4
warm  rain  0.1
cold  sun   0.2
cold  rain  0.3
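Each of these questions is answered by summing the joint entries consistent with the event. A minimal sketch (dictionary layout and the helper prob_event are mine; the numbers are from the table above):

```python
joint_TS = {('warm', 'sun'): 0.4, ('warm', 'rain'): 0.1,
            ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

def prob_event(dist, event):
    """P(E) = sum of the joint entries whose outcome lies in the event set E."""
    return sum(p for outcome, p in dist.items() if outcome in event)

warm_and_sun = {('warm', 'sun')}
warm = {o for o in joint_TS if o[0] == 'warm'}
warm_or_sun = {o for o in joint_TS if o[0] == 'warm' or o[1] == 'sun'}

print(prob_event(joint_TS, warm_and_sun))  # 0.4
print(prob_event(joint_TS, warm))          # 0.5
print(prob_event(joint_TS, warm_or_sun))   # 0.7 (up to float rounding)
```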

Marginalization
 Marginalization (or summing out) is projecting a joint distribution down to a sub-distribution over a subset of the variables

Joint P(T, S):
T     S     P
warm  sun   0.4
warm  rain  0.1
cold  sun   0.2
cold  rain  0.3

P(T):
T     P
warm  0.5
cold  0.5

P(S):
S     P
sun   0.6
rain  0.4
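Summing out a variable is a short loop over the joint. A sketch (the helper name marginalize is mine) that reproduces the P(T) and P(S) tables above:

```python
from collections import defaultdict

joint_TS = {('warm', 'sun'): 0.4, ('warm', 'rain'): 0.1,
            ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

def marginalize(dist, keep_index):
    """Sum out every variable except the one at position keep_index."""
    marginal = defaultdict(float)
    for outcome, p in dist.items():
        marginal[outcome[keep_index]] += p
    return dict(marginal)

print(marginalize(joint_TS, 0))  # {'warm': 0.5, 'cold': 0.5}        = P(T)
print(marginalize(joint_TS, 1))  # {'sun': 0.6..., 'rain': 0.4}      = P(S)
```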

Conditional Probabilities
 A conditional probability is the probability of an event given another event (usually evidence)

T     S     P
warm  sun   0.4
warm  rain  0.1
cold  sun   0.2
cold  rain  0.3

Conditional Probabilities
 Conditional or posterior probabilities:
   E.g., P(cavity | toothache) = 0.8
   Given that toothache is all I know…
 Notation for conditional distributions:
   P(cavity | toothache) = a single number
   P(Cavity, Toothache) = 2x2 table summing to 1
   P(Cavity | Toothache) = two 2-element vectors, each summing to 1
 If we know more:
   P(cavity | toothache, catch) = 0.9
   P(cavity | toothache, cavity) = 1
 Note: the less specific belief remains valid after more evidence arrives, but is not always useful
 New evidence may be irrelevant, allowing simplification:
   P(cavity | toothache, traffic) = P(cavity | toothache) = 0.8
 This kind of inference, guided by domain knowledge, is crucial

Conditioning
 Conditional probabilities are the ratio of two probabilities: P(a | b) = P(a, b) / P(b)

T     S     P
warm  sun   0.4
warm  rain  0.1
cold  sun   0.2
cold  rain  0.3

Normalization Trick
 A trick to get the whole conditional distribution at once:
   Get the joint probabilities for each value of the query variable
   Renormalize the resulting vector

Joint P(T, S):
T     S     P
warm  sun   0.4
warm  rain  0.1
cold  sun   0.2
cold  rain  0.3

Select (entries consistent with the evidence, here S = rain):
T     P
warm  0.1
cold  0.3

Normalize:
T     P
warm  0.25
cold  0.75
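The select-then-renormalize recipe is mechanical. A sketch under the same assumed dictionary representation (the helpers select and normalize are mine), computing P(T | S = rain) from the joint above:

```python
joint_TS = {('warm', 'sun'): 0.4, ('warm', 'rain'): 0.1,
            ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

def select(dist, evidence_index, evidence_value):
    """Keep the entries consistent with the evidence (two-variable joint assumed),
    dropping the evidence variable from the key."""
    return {outcome[1 - evidence_index]: p
            for outcome, p in dist.items()
            if outcome[evidence_index] == evidence_value}

def normalize(vec):
    """Rescale a nonnegative vector so its entries sum to 1."""
    z = sum(vec.values())
    return {k: v / z for k, v in vec.items()}

selected = select(joint_TS, 1, 'rain')   # {'warm': 0.1, 'cold': 0.3}
print(normalize(selected))               # ≈ {'warm': 0.25, 'cold': 0.75} = P(T | S=rain)
```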

The Product Rule
 Sometimes the joint P(X, Y) is easy to get
 Sometimes it's easier to get the conditional P(X | Y)
 The product rule relates them: P(x, y) = P(x | y) P(y)
 Example: P(sun, dry)?

Marginal over the weather variable:
R     P
sun   0.8
rain  0.2

Conditional P(D | S):
D     S     P
wet   sun   0.1
dry   sun   0.9
wet   rain  0.7
dry   rain  0.3

Their product, the joint P(D, S):
D     S     P
wet   sun   0.08
dry   sun   0.72
wet   rain  0.14
dry   rain  0.06
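Multiplying the marginal by the conditional reproduces the joint table. A minimal sketch with the numbers above (I treat the weather marginal as a distribution over sun/rain; the dictionary layout is mine):

```python
p_weather = {'sun': 0.8, 'rain': 0.2}                             # marginal over the weather variable
p_D_given_weather = {('wet', 'sun'): 0.1, ('dry', 'sun'): 0.9,
                     ('wet', 'rain'): 0.7, ('dry', 'rain'): 0.3}  # P(D | weather)

# Product rule: P(d, s) = P(d | s) * P(s)
joint_DS = {(d, s): p_D_given_weather[(d, s)] * p_weather[s]
            for (d, s) in p_D_given_weather}

print(joint_DS[('dry', 'sun')])   # 0.72 (up to float rounding) -> P(sun, dry)
```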

Lewis Carroll's Sack Problem
 Sack contains a red or blue token, 50/50
 We add a red token
 If we draw a red token, what's the chance of drawing a second red token?
 Variables:
   F = {r, b} is the original token
   D = {r, b} is the first token we draw
 Query: P(F = r | D = r)

P(F):
F  P
r  0.5
b  0.5

P(D | F):
F  D  P
r  r  1.0
r  b  0.0
b  r  0.5
b  b  0.5

Lewis Carroll's Sack Problem
 Now we have P(F, D)
 Want P(F = r | D = r)

F  D  P
r  r  0.50
r  b  0.00
b  r  0.25
b  b  0.25
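Conditioning on the joint answers the query: take the D = r entries and divide by their total. A small sketch with the joint above (variable names are mine):

```python
joint_FD = {('r', 'r'): 0.50, ('r', 'b'): 0.00,
            ('b', 'r'): 0.25, ('b', 'b'): 0.25}

# P(F = r | D = r) = P(F = r, D = r) / P(D = r)
p_D_r = sum(p for (f, d), p in joint_FD.items() if d == 'r')
print(joint_FD[('r', 'r')] / p_D_r)   # 0.666...  i.e. 2/3
```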

Bayes’ Rule
 Two ways to factor a joint distribution over two variables: P(x, y) = P(x | y) P(y) = P(y | x) P(x)
 Dividing, we get Bayes’ rule: P(x | y) = P(y | x) P(x) / P(y)
 Why is this at all helpful?
   Lets us invert a conditional distribution
   Often one conditional is tricky but the other one is simple
   Foundation of many systems we’ll see later (e.g. ASR, MT)
 In the running for most important AI equation! "That’s my rule!"

More Bayes’ Rule
 Diagnostic probability from causal probability: P(cause | effect) = P(effect | cause) P(cause) / P(effect)
 Example:
   m is meningitis, s is stiff neck: P(m | s) = P(s | m) P(m) / P(s)
 Note: the posterior probability of meningitis is still very small
 Note: you should still get stiff necks checked out! Why?
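To make "still very small" concrete, here is a sketch with illustrative numbers of my own choosing (the slide's actual figures are not preserved in this transcript): a rare cause with a strong causal link to the symptom still gets a tiny posterior.

```python
# Assumed illustrative values, not the slide's figures:
p_s_given_m = 0.7      # P(stiff neck | meningitis): causal direction, easy to assess
p_m = 1 / 50000        # P(meningitis): the prior -- meningitis is rare
p_s = 0.01             # P(stiff neck): stiff necks are fairly common

# Bayes' rule: P(m | s) = P(s | m) P(m) / P(s)
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)     # 0.0014 -- still tiny, because the prior is tiny
```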

Inference by Enumeration
 P(sun)?
 P(sun | winter)?
 P(sun | winter, warm)?

S       T     R     P
summer  warm  sun   0.30
summer  warm  rain  0.05
summer  cold  sun   0.10
summer  cold  rain  0.05
winter  warm  sun   0.10
winter  warm  rain  0.05
winter  cold  sun   0.15
winter  cold  rain  0.20
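All three queries come from selecting, summing, and (for the conditionals) dividing over the table above. A minimal sketch (the representation and the helper prob are mine):

```python
# Joint P(S, T, R) from the table above, keyed by (season, temp, weather).
joint = {
    ('summer', 'warm', 'sun'): 0.30, ('summer', 'warm', 'rain'): 0.05,
    ('summer', 'cold', 'sun'): 0.10, ('summer', 'cold', 'rain'): 0.05,
    ('winter', 'warm', 'sun'): 0.10, ('winter', 'warm', 'rain'): 0.05,
    ('winter', 'cold', 'sun'): 0.15, ('winter', 'cold', 'rain'): 0.20,
}

def prob(pred):
    """Sum the joint entries for which pred(season, temp, weather) holds."""
    return sum(p for (s, t, r), p in joint.items() if pred(s, t, r))

p_sun = prob(lambda s, t, r: r == 'sun')                                     # 0.65
p_sun_winter = (prob(lambda s, t, r: r == 'sun' and s == 'winter')
                / prob(lambda s, t, r: s == 'winter'))                       # 0.25 / 0.50 = 0.5
p_sun_winter_warm = (prob(lambda s, t, r: r == 'sun' and s == 'winter' and t == 'warm')
                     / prob(lambda s, t, r: s == 'winter' and t == 'warm'))  # 0.10 / 0.15 ≈ 0.67
print(p_sun, p_sun_winter, p_sun_winter_warm)
```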

Inference by Enumeration
 General case:
   Evidence variables: E_1, …, E_k, with observed values e_1, …, e_k
   Query variables: Q
   Hidden variables: H_1, …, H_r
   (query, evidence, and hidden variables together make up all the variables)
 We want: P(Q | e_1, …, e_k)
 First, select the entries consistent with the evidence
 Second, sum out the hidden variables H to get P(Q, e_1, …, e_k)
 Finally, normalize the remaining entries to conditionalize
 Obvious problems:
   Worst-case time complexity O(d^n)
   Space complexity O(d^n) to store the joint distribution
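The select / sum-out / normalize loop generalizes to any joint stored as a table. A sketch (function and parameter names are mine) for a joint keyed by assignment tuples; both its time and space are exponential in the number of variables, matching the O(d^n) warning above.

```python
from collections import defaultdict

def enumerate_query(joint, query_index, evidence):
    """P(Q | evidence), where evidence maps variable positions to observed values.
    Select consistent rows, sum out the hidden variables, then normalize."""
    unnormalized = defaultdict(float)
    for outcome, p in joint.items():
        if all(outcome[i] == v for i, v in evidence.items()):    # select
            unnormalized[outcome[query_index]] += p              # sum out hidden vars
    z = sum(unnormalized.values())                               # normalize
    return {q: p / z for q, p in unnormalized.items()}

joint = {('summer', 'warm', 'sun'): 0.30, ('summer', 'warm', 'rain'): 0.05,
         ('summer', 'cold', 'sun'): 0.10, ('summer', 'cold', 'rain'): 0.05,
         ('winter', 'warm', 'sun'): 0.10, ('winter', 'warm', 'rain'): 0.05,
         ('winter', 'cold', 'sun'): 0.15, ('winter', 'cold', 'rain'): 0.20}

print(enumerate_query(joint, query_index=2, evidence={0: 'winter'}))
# {'sun': 0.5, 'rain': 0.5}  = P(R | S = winter)
```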

Independence
 Two variables X and Y are independent if P(x, y) = P(x) P(y) for all values x, y
 This says that their joint distribution factors into a product of two simpler distributions
 Independence is a modeling assumption
   Empirical joint distributions: at best “close” to independent
   What could we assume for {Weather, Traffic, Cavity}?
 How many parameters in the joint model?
 How many parameters in the independent model?
 Independence is like something from CSPs: what?

Example: Independence
 N fair, independent coin flips; each flip X_i has the same distribution:

X_i  P
H    0.5
T    0.5

Example: Independence?
 Arbitrary joint distributions can be poorly modeled by independent factors

Actual joint P(T, S):
T     S     P
warm  sun   0.4
warm  rain  0.1
cold  sun   0.2
cold  rain  0.3

Marginals:
T     P
warm  0.5
cold  0.5

S     P
sun   0.6
rain  0.4

Product of the marginals P(T) P(S):
T     S     P
warm  sun   0.3
warm  rain  0.2
cold  sun   0.3
cold  rain  0.2
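A quick numerical check of whether a two-variable joint factors into its marginals; a sketch using the tables above (helper names are mine). The mismatch (for instance 0.4 versus 0.3 on warm/sun) shows T and S are not independent here.

```python
from collections import defaultdict

joint_TS = {('warm', 'sun'): 0.4, ('warm', 'rain'): 0.1,
            ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

def marginal(dist, index):
    """Sum out everything except the variable at the given position."""
    m = defaultdict(float)
    for outcome, p in dist.items():
        m[outcome[index]] += p
    return m

p_T, p_S = marginal(joint_TS, 0), marginal(joint_TS, 1)

# Independent iff P(t, s) == P(t) * P(s) for every entry (up to float tolerance).
independent = all(abs(p - p_T[t] * p_S[s]) < 1e-9 for (t, s), p in joint_TS.items())
print(independent)   # False: e.g. P(warm, sun) = 0.4 but P(warm) * P(sun) = 0.3
```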

Conditional Independence
 P(Toothache, Cavity, Catch) has 2^3 = 8 entries (7 independent entries)
 If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
   P(catch | toothache, cavity) = P(catch | cavity)
 The same independence holds if I don’t have a cavity:
   P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
 Catch is conditionally independent of Toothache given Cavity:
   P(Catch | Toothache, Cavity) = P(Catch | Cavity)
 Equivalent statements:
   P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
   P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

Conditional Independence  Unconditional (absolute) independence is very rare (why?)  Conditional independence is our most basic and robust form of knowledge about uncertain environments:  What about this domain:  Traffic  Umbrella  Raining  What about fire, smoke, alarm?

The Chain Rule II  Can always factor any joint distribution as an incremental product of conditional distributions: P(x_1, x_2, …, x_n) = ∏_i P(x_i | x_1, …, x_{i-1})  Why?  This actually claims nothing…  What are the sizes of the tables we supply?

The Chain Rule III  Trivial decomposition:  With conditional independence:  Conditional independence is our most basic and robust form of knowledge about uncertain environments  Graphical models (next class) will help us work with independence

The Chain Rule IV
 Write out the full joint distribution using the chain rule:
   P(Toothache, Catch, Cavity)
   = P(Toothache | Catch, Cavity) P(Catch, Cavity)
   = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
   = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)   (last step uses conditional independence given Cavity)
 Graphical model notation:
   Each variable is a node
   The parents of a node are the other variables which the decomposed joint conditions on
   Here Cavity is the parent of both Toothache and Catch, and the factors are P(Cavity), P(Toothache | Cavity), P(Catch | Cavity)
 MUCH more on this to come!

Combining Evidence
 P(cavity | toothache, catch)
 = α P(toothache, catch | cavity) P(cavity)
 = α P(toothache | cavity) P(catch | cavity) P(cavity)
 This is an example of a naive Bayes model: a single cause C with conditionally independent effects E_1, E_2, …, E_n
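A naive Bayes posterior is the normalization trick applied to this factored joint: prior times the per-evidence conditionals, then renormalize. A small sketch with made-up illustrative parameters (none of these numbers come from the slides; the structure is the point):

```python
# Assumed illustrative parameters for a naive Bayes model with cause Cavity
# and effects toothache and catch (values invented for the example).
p_cavity = {True: 0.2, False: 0.8}               # P(Cavity)
p_toothache_given = {True: 0.6, False: 0.1}      # P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}          # P(catch | Cavity)

# Unnormalized posterior: P(cavity) * P(toothache | cavity) * P(catch | cavity)
unnormalized = {c: p_cavity[c] * p_toothache_given[c] * p_catch_given[c]
                for c in (True, False)}
z = sum(unnormalized.values())                   # alpha = 1 / z
posterior = {c: p / z for c, p in unnormalized.items()}
print(posterior)   # P(Cavity | toothache, catch) ≈ {True: 0.87, False: 0.13}
```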