Probabilistic Reasoning


Probabilistic Reasoning We've looked at reasoning with logic expressions, where the search space is exponential. Probabilistic reasoning uses other techniques that allow faster execution and estimate solutions using probability theory. We'll start with inference by enumeration, Markov models, and hidden Markov models.

Example: Inference in Ghostbusters
A ghost is in the grid somewhere. Sensor readings tell how close a square is to the ghost:
- On the ghost: red
- 1 or 2 away: orange
- 3 or 4 away: yellow
- 5+ away: green
Sensors are noisy, but we know P(Color | Distance). For example, at distance 3:
Color    P(Color | 3)
red      0.05
orange   0.15
yellow   0.50
green    0.30
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Uncertainty
General situation:
- Observed variables (evidence): the agent knows certain things about the state of the world (e.g., sensor readings or symptoms)
- Unobserved variables: the agent needs to reason about other aspects (e.g., where an object is or what disease is present)
- Model: the agent knows something about how the known variables relate to the unknown variables
Probabilistic reasoning gives us a framework for managing our beliefs and knowledge.

Where else could this kind of reasoning be used, beyond Ghostbusters? Computer vision, for example.

Probabilistic Models
A probabilistic model is a joint distribution over a set of random variables:
- (Random) variables with domains; assignments are called outcomes
- Joint distributions say whether assignments (outcomes) are likely
- Normalized: probabilities sum to 1.0
- Ideally: only certain variables directly interact
Compare with constraint satisfaction problems:
- Variables with domains
- Constraints state whether assignments are possible

Distribution over T, W:
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Constraint over T, W:
T     W     P
hot   sun   T
hot   rain  F
cold  sun   F
cold  rain  T

Events
An event is a set E of outcomes. From a joint distribution, we can calculate the probability of any event. Using the P(T, W) table above:
- Probability that it's hot AND sunny? P(hot, sun) = 0.4
- Probability that it's hot? P(hot) = 0.4 + 0.1 = 0.5
- Probability that it's hot OR sunny? P(hot) + P(sun) - P(hot, sun) = 0.5 + 0.6 - 0.4 = 0.7
Typically, the events we care about are partial assignments, like P(T = hot).
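As a quick illustration (a sketch added here, not part of the original slides), these event probabilities can be computed mechanically from the joint table in Python:

```python
# A minimal sketch: event probabilities from the joint P(T, W).
joint = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

def p(event):
    """Probability of an event, given as a predicate over outcomes."""
    return sum(pr for outcome, pr in joint.items() if event(outcome))

print(p(lambda o: o == ("hot", "sun")))             # hot AND sunny: 0.4
print(p(lambda o: o[0] == "hot"))                   # hot: 0.5
print(p(lambda o: o[0] == "hot" or o[1] == "sun"))  # hot OR sunny: 0.7
```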

Marginal Distributions
Marginal distributions are sub-tables which eliminate variables. Marginalization (summing out) combines collapsed rows by adding.

Joint P(T, W):
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Marginal P(T):
T     P
hot   0.5
cold  0.5

Marginal P(W):
W     P
sun   0.6
rain  0.4
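A minimal Python sketch of summing out, assuming the joint table above (illustrative, not from the slides):

```python
# Marginalization: sum the joint over every variable we want to eliminate.
from collections import defaultdict

joint = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

def marginal(joint, axis):
    """Keep only the variable at position `axis`, adding up collapsed rows."""
    dist = defaultdict(float)
    for outcome, pr in joint.items():
        dist[outcome[axis]] += pr
    return dict(dist)

print(marginal(joint, 0))  # P(T): {'hot': 0.5, 'cold': 0.5}
print(marginal(joint, 1))  # P(W): {'sun': 0.6, 'rain': 0.4}
```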

Conditional Distributions
Conditional distributions are probability distributions over some variables given fixed values of others. From the joint distribution P(T, W) above, we get:

P(W | T = hot):
W     P
sun   0.8
rain  0.2

P(W | T = cold):
W     P
sun   0.4
rain  0.6

Normalization Trick
To compute a conditional such as P(W | T = cold):
SELECT the joint probabilities matching the evidence:
T     W     P
cold  sun   0.2
cold  rain  0.3
NORMALIZE the selection (make it sum to one):
W     P
sun   0.4
rain  0.6
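The trick is only a couple of lines of code; here is a small Python sketch (illustrative only) using the same joint table:

```python
# SELECT then NORMALIZE to get P(W | T = cold) from the joint P(T, W).
joint = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

# SELECT: keep the entries consistent with the evidence T = cold.
selected = {w: pr for (t, w), pr in joint.items() if t == "cold"}

# NORMALIZE: divide by the total so the selection sums to one.
z = sum(selected.values())
conditional = {w: pr / z for w, pr in selected.items()}

print(conditional)  # {'sun': 0.4, 'rain': 0.6}
```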

Probability Recap
- Conditional probability: P(x | y) = P(x, y) / P(y)
- Product rule: P(x, y) = P(x | y) P(y)
- Chain rule: P(x1, x2, ..., xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) ... = ∏i P(xi | x1, ..., xi-1)
- X, Y independent if and only if: for all x, y: P(x, y) = P(x) P(y)
- X and Y are conditionally independent given Z if and only if: for all x, y, z: P(x, y | z) = P(x | z) P(y | z)

Inference by Enumeration
General case:
- Evidence variables: E1, ..., Ek = e1, ..., ek
- Query* variable: Q
- Hidden variables: H1, ..., Hr
(together these are all the variables X1, ..., Xn)
We want: P(Q | e1, ..., ek)
- Step 1: Select the entries consistent with the evidence
- Step 2: Sum out H to get the joint of the query and evidence: P(Q, e1, ..., ek) = Σ over h1, ..., hr of P(Q, h1, ..., hr, e1, ..., ek)
- Step 3: Normalize: P(Q | e1, ..., ek) = P(Q, e1, ..., ek) / P(e1, ..., ek)
* Works fine with multiple query variables, too

Inference by Enumeration
Example joint distribution P(S, T, W):
S       T     W     P
summer  hot   sun   0.30
summer  hot   rain  0.05
summer  cold  sun   0.10
summer  cold  rain  0.05
winter  hot   sun   0.10
winter  hot   rain  0.05
winter  cold  sun   0.15
winter  cold  rain  0.20
Queries: P(W)? P(W | winter)? P(W | winter, hot)?
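A minimal Python sketch of the three-step procedure applied to this table (an illustration added here, not course code):

```python
# Inference by enumeration: select, sum out hidden variables, normalize.
from collections import defaultdict

VARS = ("S", "T", "W")  # variable order matching the tuple keys below
joint = {
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}

def enumerate_query(query_var, evidence):
    """P(query_var | evidence) by enumerating the full joint."""
    qi = VARS.index(query_var)
    dist = defaultdict(float)
    for outcome, pr in joint.items():
        # Step 1: keep only entries consistent with the evidence.
        if all(outcome[VARS.index(v)] == val for v, val in evidence.items()):
            # Step 2: sum out the hidden variables.
            dist[outcome[qi]] += pr
    # Step 3: normalize.
    z = sum(dist.values())
    return {val: pr / z for val, pr in dist.items()}

print(enumerate_query("W", {}))                           # P(W)
print(enumerate_query("W", {"S": "winter"}))              # P(W | winter)
print(enumerate_query("W", {"S": "winter", "T": "hot"}))  # P(W | winter, hot)
```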

Inference by Enumeration
Obvious problems: if there are n variables and each one has d values, then:
- Worst-case time complexity: O(d^n)
- Space complexity: O(d^n) to store the joint distribution

Bayes' Rule
Two ways to factor a joint distribution over two variables:
P(x, y) = P(x | y) P(y) = P(y | x) P(x)
Dividing, we get:
P(x | y) = P(y | x) P(x) / P(y)
Why is this at all helpful?
- Lets us build one conditional from its reverse
- Often one conditional is tricky but the other one is simple
- Foundation of many systems
In the running for most important AI equation!

Inference with Bayes' Rule
Example: diagnostic probability from causal probability:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
Example: M = meningitis, S = stiff neck. Given P(+s | +m) = 0.8, P(+m) = 0.0001, and P(+s | -m) = 0.01, Bayes' rule gives:
P(+m | +s) = P(+s | +m) P(+m) / P(+s)
           = (0.8 × 0.0001) / (0.8 × 0.0001 + 0.01 × 0.9999)
           = 0.00008 / 0.01007... ≈ 0.0079
Note: even given a stiff neck, the posterior probability of meningitis is still very small.
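The arithmetic above is easy to check with a few lines of Python (a sketch, not from the slides):

```python
# Meningitis example: diagnostic probability from causal probability.
p_s_given_m = 0.8       # P(+s | +m)
p_m = 0.0001            # P(+m), the prior
p_s_given_not_m = 0.01  # P(+s | -m)

# Total probability of a stiff neck.
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)

# Bayes' rule: P(+m | +s) = P(+s | +m) P(+m) / P(+s).
p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 4))  # ~0.0079
```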

Ghostbusters, Revisited
We are given two distributions:
- Prior distribution over ghost locations: P(G); let's say this is uniform
- Sensor reading model: P(R | G); this is given, since we know what our sensors do
  - R = reading color measured at square (1,1)
  - e.g., we might know P(R = yellow | G = (1,1)) = 0.1
We can calculate the posterior distribution P(G | r) over ghost locations given a reading, using Bayes' rule:
P(g | r) = P(r | g) P(g) / P(r) ∝ P(r | g) P(g)
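Here is a small sketch of the posterior update; the grid size and all sensor values other than P(yellow | (1,1)) = 0.1 are made up for illustration:

```python
# Posterior over ghost locations: P(g | r) ∝ P(r | g) P(g), then normalize.
prior = {(1, 1): 0.25, (1, 2): 0.25, (2, 1): 0.25, (2, 2): 0.25}  # uniform P(G) on a toy 2x2 grid
p_yellow = {(1, 1): 0.1, (1, 2): 0.3, (2, 1): 0.3, (2, 2): 0.5}   # hypothetical P(R = yellow | G)

unnormalized = {g: p_yellow[g] * prior[g] for g in prior}
z = sum(unnormalized.values())
posterior = {g: pr / z for g, pr in unnormalized.items()}
print(posterior)  # given a yellow reading at (1,1), the ghost is probably not at (1,1)
```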

Now we have the machinery for: Markov Models
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Reasoning over Time or Space
Often, we want to reason about a sequence of observations:
- Speech recognition
- Robot localization
- User attention
- Medical monitoring
We need to introduce time (or space) into our models.

Markov Models
A Markov model is a chain of states: X1 → X2 → X3 → X4 → ...
- The value of X at a given time is called the state.
- The parameters, called transition probabilities or dynamics, specify how the state evolves over time: P(Xt | Xt-1) (plus the initial state probabilities P(X1)).
- Stationarity assumption: the transition probabilities are the same at all times.

Joint Distribution of a Markov Model
X1 → X2 → X3 → X4
Joint distribution:
P(X1, X2, X3, X4) = P(X1) P(X2 | X1) P(X3 | X2) P(X4 | X3)
More generally:
P(X1, X2, ..., XT) = P(X1) P(X2 | X1) P(X3 | X2) ... P(XT | XT-1)
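To make the factorization concrete, here's a small sketch (illustrative; the CPT values are taken from the weather example a few slides below):

```python
# Probability of a whole state sequence under the Markov factorization:
# P(x1, ..., xT) = P(x1) * product over t of P(xt | xt-1).
init = {"sun": 1.0, "rain": 0.0}             # P(X1)
trans = {"sun": {"sun": 0.9, "rain": 0.1},   # P(Xt | Xt-1 = sun)
         "rain": {"sun": 0.3, "rain": 0.7}}  # P(Xt | Xt-1 = rain)

def sequence_prob(seq):
    pr = init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        pr *= trans[prev][cur]
    return pr

print(sequence_prob(["sun", "sun", "rain", "rain"]))  # 1.0 * 0.9 * 0.1 * 0.7 = 0.063
```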

Chain Rule and Markov Models
X1 → X2 → X3 → X4 (chain of 4 states)
From the chain rule, every joint distribution over X1, X2, X3, X4 can be written as:
P(X1, X2, X3, X4) = P(X1) P(X2 | X1) P(X3 | X1, X2) P(X4 | X1, X2, X3)
Assuming that X3 ⟂ X1 | X2 and X4 ⟂ X1, X2 | X3 results in the expression posited on the previous slide:
P(X1, X2, X3, X4) = P(X1) P(X2 | X1) P(X3 | X2) P(X4 | X3)

Chain Rule and Markov Models
X1 → X2 → ... → XT (chain of T states)
From the chain rule, every joint distribution over X1, X2, ..., XT can be written as:
P(X1, X2, ..., XT) = P(X1) P(X2 | X1) P(X3 | X1, X2) ... P(XT | X1, ..., XT-1)
Assuming that for all t: Xt ⟂ X1, ..., Xt-2 | Xt-1 gives us the expression posited on the earlier slide:
P(X1, X2, ..., XT) = P(X1) P(X2 | X1) P(X3 | X2) ... P(XT | XT-1)

Implied Conditional Independencies
X1 → X2 → X3 → X4
We assumed: X3 ⟂ X1 | X2 and X4 ⟂ X1, X2 | X3
Do we also have X1 ⟂ X3, X4 | X2? Yes! Proof:
P(x1 | x2, x3, x4) = P(x1, x2, x3, x4) / P(x2, x3, x4)
                   = P(x1) P(x2 | x1) P(x3 | x2) P(x4 | x3) / (Σ over x1' of P(x1') P(x2 | x1') P(x3 | x2) P(x4 | x3))
                   = P(x1, x2) / P(x2)
                   = P(x1 | x2)

Markov Models Recap
Explicit assumption for all t: Xt ⟂ X1, ..., Xt-2 | Xt-1
Consequence: the joint distribution can be written as:
P(X1, X2, ..., XT) = P(X1) P(X2 | X1) P(X3 | X2) ... P(XT | XT-1)
Implied conditional independencies: past variables are independent of future variables given the present. That is, if t1 < t2 < t3 or t1 > t2 > t3, then: Xt1 ⟂ Xt3 | Xt2

Example Markov Chain: Weather
States: X ∈ {rain, sun}
Initial distribution: 1.0 sun
CPT P(Xt | Xt-1):
Xt-1  Xt    P(Xt | Xt-1)
sun   sun   0.9
sun   rain  0.1
rain  sun   0.3
rain  rain  0.7
The same CPT can also be drawn as a state-transition diagram: two nodes, sun and rain, with self-loops of weight 0.9 and 0.7 and cross-arrows of weight 0.1 and 0.3.

Example Markov Chain: Weather
Initial distribution: 1.0 sun (given)
What is the probability distribution after one step?
P(X2 = sun) = P(sun | sun) P(X1 = sun) + P(sun | rain) P(X1 = rain) = 0.9 × 1.0 + 0.3 × 0.0 = 0.9
What happens for the second state, and beyond?

Mini-Forward Algorithm
Question: What's P(X) on some day t?
X1 → X2 → X3 → X4 → ...
P(x1) is known, and
P(xt) = Σ over xt-1 of P(xt-1, xt) = Σ over xt-1 of P(xt | xt-1) P(xt-1)
This is forward simulation.
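A minimal Python sketch of forward simulation on the weather chain (illustrative):

```python
# Mini-forward algorithm: repeatedly push the state distribution through the CPT.
trans = {"sun": {"sun": 0.9, "rain": 0.1},
         "rain": {"sun": 0.3, "rain": 0.7}}

def forward(prior, steps):
    """P(X_{1+steps}) from P(X1), via P(xt) = sum over xt-1 of P(xt | xt-1) P(xt-1)."""
    dist = dict(prior)
    for _ in range(steps):
        dist = {x2: sum(trans[x1][x2] * dist[x1] for x1 in dist)
                for x2 in ("sun", "rain")}
    return dist

print(forward({"sun": 1.0, "rain": 0.0}, 1))    # {'sun': 0.9, 'rain': 0.1}
print(forward({"sun": 1.0, "rain": 0.0}, 100))  # approaches {'sun': 0.75, 'rain': 0.25}
```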

Example Run of Mini-Forward Algorithm
Whether we start from an initial observation of sun, from an initial observation of rain, or from yet another initial distribution P(X1), the sequence P(X1), P(X2), P(X3), P(X4), ... converges to the same limiting distribution P(X∞), here ⟨sun: 0.75, rain: 0.25⟩.

Stationary Distributions
For most chains:
- The influence of the initial distribution gets less and less over time.
- The distribution we end up in is independent of the initial distribution.
Stationary distribution:
- The distribution we end up with is called the stationary distribution P∞ of the chain.
- It satisfies P∞(X) = P∞+1(X) = Σ over x of P(X | x) P∞(x)

Example: Stationary Distributions
Question: What's P(X) at time t = infinity?
Using the weather CPT:
P∞(sun) = P(sun | sun) P∞(sun) + P(sun | rain) P∞(rain) = 0.9 P∞(sun) + 0.3 P∞(rain)
P∞(rain) = P(rain | sun) P∞(sun) + P(rain | rain) P∞(rain) = 0.1 P∞(sun) + 0.7 P∞(rain)
So P∞(sun) = 3 P∞(rain). Also: P∞(sun) + P∞(rain) = 1, which gives P∞(sun) = 3/4 and P∞(rain) = 1/4.
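Instead of iterating to convergence, the stationary distribution can also be solved for directly; here is a NumPy sketch (an addition for illustration):

```python
# Solve pi = pi P together with sum(pi) = 1 for the weather chain.
import numpy as np

# Transition matrix, rows = current state, columns = next state, order (sun, rain).
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# pi (P - I) = 0 gives one independent equation; replace the other with normalization.
A = np.vstack([(P.T - np.eye(2))[:-1], np.ones(2)])
b = np.array([0.0, 1.0])
pi = np.linalg.solve(A, b)
print(pi)  # [0.75 0.25]
```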

Application of Stationary Distributions: Web Link Analysis
PageRank over a web graph:
- Each web page is a state
- Initial distribution: uniform over pages
- Transitions: with probability c, jump uniformly to a random page; with probability 1 - c, follow a random outlink
Stationary distribution:
- Spends more time on highly reachable pages (e.g., there are many ways to get to the Acrobat Reader download page)
- Somewhat robust to link spam
Historical note: earlier search engines ranked pages merely by how well they matched your search words; Google 1.0 returned the set of pages containing all your keywords in decreasing PageRank order. Today all search engines use link analysis along with many other factors (link-based rank is getting less important over time, and clickstream signals currently dominate).
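To tie this back to the math: PageRank is exactly the stationary distribution of the jump-or-follow chain. A toy sketch on a made-up three-page graph (the graph and the value of c are illustrative assumptions):

```python
# PageRank as the stationary distribution of a random-surfer Markov chain.
import numpy as np

links = {0: [1, 2], 1: [2], 2: [0]}  # hypothetical page -> outlinks graph
n, c = 3, 0.15                       # c = probability of a uniform random jump

# With prob c jump to a uniformly random page, else follow a random outlink.
P = np.full((n, n), c / n)
for page, outs in links.items():
    for o in outs:
        P[page, o] += (1 - c) / len(outs)

# Power iteration: forward-simulate until the distribution settles.
pi = np.full(n, 1.0 / n)
for _ in range(100):
    pi = pi @ P
print(pi)  # stationary distribution = PageRank scores
```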

Coming Up
Next up: Hidden Markov Models, then Particle Filters, then Bayesian Networks.