CS 511a: Artificial Intelligence, Spring 2013
Lecture 13: Probability / State Space Uncertainty
03/18/2013
Robert Pless
Class directly adapted from Kilian Weinberger, with multiple slides adapted from Dan Klein and Deva Ramanan.

Announcements
- Upcoming
  - Project 3 out now(-ish)
  - Due Monday, April 1, 11:59pm + 1 minute
- Midterm graded, to be returned on Wednesday.

Course Status:
An MDP:
- set of states s ∈ S
- set of actions a ∈ A
- transition function T(s, a, s'): the probability that a from s leads to s'
- reward function R(s, a, s'); sometimes just R(s) or R(s')
- start state (or distribution)
- one or many terminal states
- Solution is a policy dictating the action to take in each state.
- Reinforcement learning: MDPs where we don't know the transition or reward functions.

A search problem:
- set of states s ∈ S
- set of actions a ∈ A
- start state
- goal test
- cost function
- Solution is a sequence of actions (a plan) which transforms the start state to a goal state.

Uncertainty
General situation:
- Evidence: the agent knows certain things about the state of the world (e.g., sensor readings or symptoms)
- Hidden variables: the agent needs to reason about other aspects (e.g., where an object is, what disease is present, or how the sensor is failing)
- Model: the agent knows something about how the known variables relate to the unknown variables
Probabilistic reasoning gives us a framework for managing our beliefs and knowledge.

Inference in Ghostbusters
- A ghost is in the grid somewhere
- Sensor readings tell how close a square is to the ghost
  - On the ghost: red
  - 1 or 2 away: orange
  - 3 or 4 away: yellow
  - 5+ away: green
- Sensors are noisy, but we know P(Color | Distance), e.g. P(red | 3), P(orange | 3), P(yellow | 3), P(green | 3)
[Demo]

Today
Goal: modeling and using distributions over LARGE numbers of random variables
- Probability
  - Random Variables
  - Joint and Marginal Distributions
  - Conditional Distribution
  - Product Rule, Chain Rule, Bayes' Rule
  - Inference
  - Independence
You'll need all this stuff A LOT for the next few weeks, so make sure you go over it now!

Random Variables
- A random variable is some aspect of the world about which we (may) have uncertainty
  - R = Is it raining?
  - D = How long will it take to drive to work?
  - L = Where am I?
- We denote random variables with capital letters
- Like variables in a CSP, random variables have domains
  - R in {true, false} (sometimes written as {+r, -r})
  - D in [0, ∞)
  - L in possible locations, maybe {(0,0), (0,1), ...}

Probabilistic Models

Probability Distributions
- Unobserved random variables have distributions
- A distribution is a TABLE of probabilities of values
- A probability (lower-case value) is a single number
- Must have: every entry P(x) ≥ 0, and the entries of each table sum to 1

  T     P
  warm  0.5
  cold  0.5

  W       P
  sun     0.6
  rain    0.1
  fog     0.3
  meteor  0.0
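For concreteness, a minimal Python sketch (mine, not from the slides) of a distribution stored literally as a table; it checks the two "must have" conditions. The fog and meteor entries assume the values that make the table sum to one.

import math

# P(W): a distribution is just a table mapping each value to a probability.
P_W = {"sun": 0.6, "rain": 0.1, "fog": 0.3, "meteor": 0.0}

def is_valid_distribution(table):
    # Non-negative entries that sum to one.
    nonnegative = all(p >= 0 for p in table.values())
    normalized = math.isclose(sum(table.values()), 1.0)
    return nonnegative and normalized

print(is_valid_distribution(P_W))   # True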

Joint Distributions
- A joint distribution over a set of random variables X_1, ..., X_n specifies a real number for each assignment (or outcome): P(X_1 = x_1, ..., X_n = x_n)
- Size of the distribution if there are n variables with domain sizes d? d^n entries
- Must obey: P(x_1, ..., x_n) ≥ 0, and the entries sum to 1
- A joint distribution table lists the probabilities of all combinations. For all but the smallest distributions, this is impractical to write out.

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3
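One direct (and, for large n, impractical) way to store a joint distribution is a dictionary keyed by assignment tuples. A short sketch, with names of my choosing, for the P(T, W) table above; it also makes the d^n size explicit.

from itertools import product

# Joint distribution P(T, W), keyed by (t, w) assignments.
P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

# With n variables of domain size d, a full joint table needs d**n entries.
domains = [["hot", "cold"], ["sun", "rain"]]
assert set(P_TW) == set(product(*domains))     # every assignment has an entry
print(len(P_TW))                               # 4 here; grows as d**n in general
print(sum(P_TW.values()))                      # entries sum to 1 (up to float rounding)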

Probabilistic Models
- A probabilistic model is a joint distribution over a set of random variables
- Probabilistic models:
  - (Random) variables with domains; assignments are called outcomes
  - Joint distributions: say whether assignments (outcomes) are likely
  - Normalized: sum to 1.0
  - Ideally: only certain variables directly interact
- Constraint satisfaction problems:
  - Variables with domains
  - Constraints: state whether assignments are possible
  - Ideally: only certain variables directly interact

  Distribution over T, W:
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

  Constraint over T, W:
  T     W     Allowed?
  hot   sun   T
  hot   rain  F
  cold  sun   F
  cold  rain  T

Events
- An event is a set E of outcomes: P(E) = ∑ over outcomes (x_1, ..., x_n) in E of P(x_1, ..., x_n)
- From a joint distribution, we can calculate the probability of any event
  - Probability that it's hot AND sunny?
  - Probability that it's hot?
  - Probability that it's hot OR sunny?
- Typically, the events we care about are partial assignments, like P(T = hot)

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

Marginal Distributions
- Marginal distributions are sub-tables which eliminate variables
- Marginalization (summing out): combine collapsed rows by adding, e.g. P(t) = ∑_w P(t, w) and P(w) = ∑_t P(t, w)

  Joint P(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

  Marginal P(T):
  T     P
  hot   0.5
  cold  0.5

  Marginal P(W):
  W     P
  sun   0.6
  rain  0.4
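Marginalization in code is just accumulating the rows of the joint that agree on the kept variable. A small sketch over the same P(T, W) table; the helper name is mine.

from collections import defaultdict

P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def marginalize(joint, keep_index):
    # Sum out every variable except the one at position keep_index.
    marginal = defaultdict(float)
    for assignment, p in joint.items():
        marginal[assignment[keep_index]] += p
    return dict(marginal)

print(marginalize(P_TW, 0))   # P(T): hot 0.5, cold 0.5
print(marginalize(P_TW, 1))   # P(W): sun 0.6, rain 0.4 (up to float rounding)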

Conditional Probabilities
- A simple relation between joint and conditional probabilities: P(a | b) = P(a, b) / P(b)
- In fact, this is taken as the definition of a conditional probability
- Example from the table below: P(W = sun | T = cold) = P(W = sun, T = cold) / P(T = cold) = 0.2 / 0.5 = 0.4

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

Conditional Distributions
- Conditional distributions are probability distributions over some variables, given fixed values of others

  Joint distribution P(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

  Conditional distribution P(W | T = hot):
  W     P
  sun   0.8
  rain  0.2

  Conditional distribution P(W | T = cold):
  W     P
  sun   0.4
  rain  0.6

Normalization Trick
- A trick to get a whole conditional distribution at once:
  - Select the joint probabilities matching the evidence
  - Normalize the selection (scale the values so the table sums to one)
- Compute P(T | W = rain)
- Why does this work? The sum of the selection is P(evidence)! (here, P(W = rain))

  Joint P(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

  Select W = rain:
  T     W     P
  hot   rain  0.1
  cold  rain  0.3

  Normalize to get P(T | W = rain):
  T     P
  hot   0.25
  cold  0.75
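The select-then-normalize trick as code: a minimal sketch over the same joint table, with a helper name of my choosing.

P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def condition_on_weather(joint, w_value):
    # Select: keep only the entries consistent with the evidence W = w_value.
    selected = {t: p for (t, w), p in joint.items() if w == w_value}
    # The selected mass is exactly P(evidence) = P(W = w_value).
    p_evidence = sum(selected.values())
    # Normalize: divide by P(evidence) so the result sums to one.
    return {t: p / p_evidence for t, p in selected.items()}

print(condition_on_weather(P_TW, "rain"))   # P(T | W = rain): hot 0.25, cold 0.75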

The Product Rule
- Sometimes we have conditional distributions but want the joint: P(x, y) = P(x | y) P(y)
- Example with a new weather condition, "dryness":

  P(W):
  W     P
  sun   0.8
  rain  0.2

  P(D | W):
  D    W     P
  wet  sun   0.1
  dry  sun   0.9
  wet  rain  0.7
  dry  rain  0.3

  P(D, W) = P(D | W) P(W):
  D    W     P
  wet  sun   0.08
  dry  sun   0.72
  wet  rain  0.14
  dry  rain  0.06
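Applying the product rule to build P(D, W) from P(W) and P(D | W); a short sketch using the numbers above (the dictionary layout is my choice).

P_W = {"sun": 0.8, "rain": 0.2}
P_D_given_W = {
    ("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
    ("wet", "rain"): 0.7, ("dry", "rain"): 0.3,
}

# Product rule: P(d, w) = P(d | w) * P(w)
P_DW = {(d, w): P_D_given_W[(d, w)] * P_W[w] for (d, w) in P_D_given_W}
for (d, w), p in sorted(P_DW.items()):
    print(d, w, round(p, 2))   # dry rain 0.06 ... wet sun 0.08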

Inference

Probabilistic Inference
- Probabilistic inference: compute a desired probability from other known probabilities (e.g., a conditional from the joint)
- We generally compute conditional probabilities
  - P(on time | no reported accidents) = 0.90
  - These represent the agent's beliefs given the evidence
- Probabilities change with new evidence:
  - P(on time | no accidents, 5 a.m.) = 0.95
  - P(on time | no accidents, 5 a.m., raining) = 0.80
  - Observing new evidence causes beliefs to be updated

Inference by Enumeration
- P(W = sun)?
- P(W = sun | S = winter)?
- P(W = sun | S = winter, T = hot)?

  S       T     W     P
  summer  hot   sun   0.30
  summer  hot   rain  0.05
  summer  cold  sun   0.10
  summer  cold  rain  0.05
  winter  hot   sun   0.10
  winter  hot   rain  0.05
  winter  cold  sun   0.15
  winter  cold  rain  0.20

Inference by Enumeration
- General case:
  - Evidence variables: E_1 = e_1, ..., E_k = e_k
  - Query* variable: Q
  - Hidden variables: H_1, ..., H_r
  - (together these are all the variables X_1, ..., X_n)
- We want: P(Q | e_1, ..., e_k)
- First, select the entries consistent with the evidence
- Second, sum out H to get the joint of the query and the evidence: P(Q, e_1, ..., e_k) = ∑ over h_1, ..., h_r of P(Q, h_1, ..., h_r, e_1, ..., e_k)
- Finally, normalize the remaining entries to conditionalize
- Obvious problems:
  - Worst-case time complexity O(d^n)
  - Space complexity O(d^n) to store the joint distribution
* Works fine with multiple query variables, too
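A deliberately brute-force sketch of the three-step recipe; the function and variable names are mine, not the course's. It walks the entire joint table, so it inherits the O(d^n) cost noted above. The (S, T, W) table from the earlier slide is reused as the test case.

def enumeration_inference(joint, var_names, query_var, evidence):
    # joint: dict mapping assignment tuples (one value per variable) to probabilities
    # var_names: variable order used in the tuples; evidence: {variable: observed value}
    q = var_names.index(query_var)
    answer = {}
    for assignment, p in joint.items():
        # Step 1: keep only entries consistent with the evidence.
        if any(assignment[var_names.index(v)] != val for v, val in evidence.items()):
            continue
        # Step 2: sum out the hidden variables by accumulating on the query value.
        answer[assignment[q]] = answer.get(assignment[q], 0.0) + p
    # Step 3: normalize so the result is a conditional distribution.
    total = sum(answer.values())
    return {value: p / total for value, p in answer.items()}

P_STW = {
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}
print(enumeration_inference(P_STW, ["S", "T", "W"], "W", {}))               # P(W)
print(enumeration_inference(P_STW, ["S", "T", "W"], "W", {"S": "winter"}))  # P(W | S = winter)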

Monty Hall Problem
- A car is behind one of three doors
- The contestant picks door #1
- The host (Monty) shows that behind door #2 there is no car
- Should the contestant switch to door #3 or stay with #1?
- Random variables: C (= car location), M (= Monty's pick)
- What is the query? What is the evidence?
- What is P(C)?
- What are the CPTs P(M | C = c)?
- How do you solve for P(C | evidence)?

Probability Tables
Assume (without loss of generality) the candidate picks door 1.
Set up (still empty) tables for P(C), P(M | C = 1), P(M | C = 2), and P(M | C = 3), each with rows door1, door2, door3; they are filled in on the next slides.

Probability Tables
Assume (without loss of generality) the candidate picks door 1.

  P(C):
  C      P
  door1  1/3
  door2  1/3
  door3  1/3

  P(M | C = 1):
  M      P
  door1  0
  door2  0.5
  door3  0.5

  P(M | C = 2):
  M      P
  door1  0
  door2  0
  door3  1

  P(M | C = 3):
  M      P
  door1  0
  door2  1
  door3  0

  Joint P(M, C): a table over all (M, C) pairs, still to be filled in.

Probability Tables
Assume (without loss of generality) the candidate picks door 1.
[P(C) and P(M | C) as on the previous slide.]

  Joint P(M, C) = P(M | C) P(C):
  M      C      P
  door1  door1  0
  door1  door2  0
  door1  door3  0
  door2  door1  1/6
  door2  door2  0
  door2  door3  1/3
  door3  door1  1/6
  door3  door2  1/3
  door3  door3  0

Probability Tables
Assume (without loss of generality) the candidate picks door 1.
[P(C), P(M | C), and the joint P(M, C) as on the previous slides.]
Query: the table P(C | M = door2), with rows door1, door2, door3, is computed next.

Probability Tables
Assume (without loss of generality) the candidate picks door 1.
[P(C), P(M | C), and the joint P(M, C) as above.]

  Select the joint entries with M = door2 (un-normalized!!):
  C      P(C, M = door2)
  door1  1/6
  door2  0
  door3  1/3

Probability Tables
Assume (without loss of generality) the candidate picks door 1.
[P(C), P(M | C), and the joint P(M, C) as above.]

  Normalize to get P(C | M = door2):
  C      P
  door1  1/3
  door2  0
  door3  2/3

So the contestant should switch: given that Monty opened door 2, the car is behind door 3 with probability 2/3.
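The same Monty Hall computation, end to end, written as inference by enumeration; a sketch that mirrors the tables above, using exact fractions to avoid rounding.

from fractions import Fraction

# Prior over the car location (the candidate has picked door 1).
P_C = {c: Fraction(1, 3) for c in ("door1", "door2", "door3")}

# Monty's behavior P(M | C): he never opens the picked door or the car door.
P_M_given_C = {
    "door1": {"door1": Fraction(0), "door2": Fraction(1, 2), "door3": Fraction(1, 2)},
    "door2": {"door1": Fraction(0), "door2": Fraction(0),    "door3": Fraction(1)},
    "door3": {"door1": Fraction(0), "door2": Fraction(1),    "door3": Fraction(0)},
}

# Joint P(M, C) via the product rule, then condition on the evidence M = door2.
doors = ("door1", "door2", "door3")
joint = {(m, c): P_M_given_C[c][m] * P_C[c] for c in doors for m in doors}
selected = {c: joint[("door2", c)] for c in doors}   # un-normalized selection
total = sum(selected.values())                       # P(M = door2) = 1/2
posterior = {c: p / total for c, p in selected.items()}
print(posterior)   # door1: 1/3, door2: 0, door3: 2/3, so switching wins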

Humans don't get it…

Humans don't get it…

Pigeons … (Source: Discover)

Pigeons do! (Source: Discover)

Inference by Enumeration
- General case:
  - Evidence variables: E_1 = e_1, ..., E_k = e_k
  - Query variable: Q
  - Hidden variables: H_1, ..., H_r
  - (together these are all the variables X_1, ..., X_n)
- We want: P(Q | e_1, ..., e_k)
- First, select the entries consistent with the evidence
- Second, sum out H to get the joint of the query and the evidence: P(Q, e_1, ..., e_k) = ∑ over h_1, ..., h_r of P(Q, h_1, ..., h_r, e_1, ..., e_k)
- Finally, normalize the remaining entries to conditionalize
- Obvious problems:
  - Worst-case time complexity O(d^n)
  - Space complexity O(d^n) to store the joint distribution

The Chain Rule
- More generally, we can always write any joint distribution as an incremental product of conditional distributions:
  P(x_1, x_2, x_3) = P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2)
  P(x_1, ..., x_n) = ∏_i P(x_i | x_1, ..., x_{i-1})
- Why is this always true? (Parse this carefully.)
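A quick numerical check, not from the slides, that the factorization P(s, t, w) = P(s) P(t | s) P(w | s, t) reproduces every entry of the (S, T, W) joint used earlier; the helper is mine.

from collections import defaultdict

P_STW = {
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}

def marginal(joint, indices):
    # Sum the joint down to the variables at the given tuple positions.
    out = defaultdict(float)
    for assignment, p in joint.items():
        out[tuple(assignment[i] for i in indices)] += p
    return out

P_S = marginal(P_STW, (0,))
P_ST = marginal(P_STW, (0, 1))

for (s, t, w), p in P_STW.items():
    # Chain rule: P(s, t, w) = P(s) * P(t | s) * P(w | s, t)
    reconstructed = P_S[(s,)] * (P_ST[(s, t)] / P_S[(s,)]) * (p / P_ST[(s, t)])
    assert abs(reconstructed - p) < 1e-12

print("chain-rule factorization reproduces every joint entry")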

Bayes' Rule
- Two ways to factor a joint distribution over two variables:
  P(x, y) = P(x | y) P(y) = P(y | x) P(x)
- Dividing, we get:
  P(x | y) = P(y | x) P(x) / P(y)
- Why is this at all helpful?
  - Lets us build one conditional from its reverse
  - Often one conditional is tricky but the other one is simple
  - Foundation of many systems we'll see later (e.g., automated speech recognition, machine translation)
- In the running for most important AI equation! ("That's my rule!")

Inference with Bayes' Rule
- Example: diagnostic probability from causal probability:
  P(cause | effect) = P(effect | cause) P(cause) / P(effect)
- Example: m is meningitis, s is stiff neck; the givens are P(s | m), P(m), and P(s), and Bayes' rule yields P(m | s) = P(s | m) P(m) / P(s)
- Note: the posterior probability of meningitis is still very small
- Note: you should still get stiff necks checked out! Why?
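A sketch of the diagnostic computation. The transcript does not preserve the slide's actual numbers, so the values below are placeholders chosen only to illustrate the formula.

# Placeholder numbers (NOT the slide's): prior, likelihood, and evidence probability.
P_m = 0.0001          # P(meningitis), assumed prior
P_s_given_m = 0.8     # P(stiff neck | meningitis), assumed causal direction
P_s = 0.1             # P(stiff neck), assumed marginal

# Bayes' rule: P(m | s) = P(s | m) P(m) / P(s)
P_m_given_s = P_s_given_m * P_m / P_s
print(P_m_given_s)    # about 0.0008 with these made-up numbers: still very small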

Probability Tables
Assume (without loss of generality) the candidate picks door 1.
[P(C) and P(M | C) as before.]

  P(C | M = door2):
  C      P
  door1  1/3
  door2  0
  door3  2/3

Ghostbusters, Revisited
- Let's say we have two distributions:
  - Prior distribution over ghost locations: P(G); let's say this is uniform
  - Sensor reading model: P(R | G)
    - Given: we know what our sensors do
    - R = reading color measured at (1,1)
    - E.g. P(R = yellow | G = (1,1)) = 0.1
- We can calculate the posterior distribution P(G | r) over ghost locations, given a reading, using Bayes' rule:
  P(g | r) = P(r | g) P(g) / P(r), where P(r) = ∑_g P(r | g) P(g)
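A sketch of the posterior update. The grid and most of the sensor-model numbers below are invented for illustration (the slide only states P(R = yellow | G = (1,1)) = 0.1); the Bayes-rule update itself is the point.

# Uniform prior over ghost locations on a small, hypothetical 2x2 grid.
locations = [(0, 0), (0, 1), (1, 0), (1, 1)]
P_G = {g: 1.0 / len(locations) for g in locations}

# Hypothetical sensor model P(R = yellow | G = g) for a reading taken at (1, 1).
P_yellow_given_G = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.3, (1, 1): 0.1}

# Bayes' rule: P(g | r) is proportional to P(r | g) P(g); normalize by P(r).
unnormalized = {g: P_yellow_given_G[g] * P_G[g] for g in locations}
P_r = sum(unnormalized.values())
posterior = {g: p / P_r for g, p in unnormalized.items()}
print(posterior)   # posterior concentrates on locations the sensor model deems consistent with "yellow"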

Summary
- Probabilistic model: all we need is the joint probability table (JPT)
  - Can perform inference by enumeration
  - From the JPT we can obtain CPTs and marginals
  - (Can obtain the JPT from CPTs and marginals)
  - Bayes' Rule can perform inference directly from CPTs and marginals
- Next several lectures: how to avoid storing the entire JPT

Why might these tables get so big?
- Markov property: p(pixel | rest of image) = p(pixel | neighborhood)?

Example application domain: image repair. Synthesizing a pixel.