1 Reasoning Under Uncertainty Over Time CS 486/686: Introduction to Artificial Intelligence Fall 2013.



2 Outline Reasoning under uncertainty over time  Hidden Markov Models  Dynamic Bayes Nets

3 Introduction So far we have assumed that the world does not change: a static probability distribution. But the world does evolve over time. How can we use probabilistic inference for weather prediction, stock-market prediction, patient monitoring, robot localization, ...?

4 World dynamics


6 Dynamic Inference

7 Stochastic Process Set of states: S. Stochastic dynamics: P(s_t | s_{t-1}, ..., s_0). Can be viewed as a Bayes net with one random variable per time slice: s_0 → s_1 → s_2 → s_3 → s_4, with CPTs P(s_0), P(s_1 | s_0), P(s_2 | s_1, s_0), P(s_3 | s_2, s_1, s_0), P(s_4 | s_3, ..., s_0).

8 Stochastic Process Problems:  Infinitely many variables  Infinitely large CPTs Solutions:  Stationary process: Dynamics do not change over time  Markov assumption: Current state depends only on a finite history of past states

9 k-Order Markov Process Assumption: the last k states are sufficient. First-order Markov process (most commonly used): P(s_t | s_{t-1}, ..., s_0) = P(s_t | s_{t-1}). Second-order Markov process: P(s_t | s_{t-1}, ..., s_0) = P(s_t | s_{t-1}, s_{t-2}). [chain diagrams: first-order s_0 → s_1 → ... → s_4; second-order with an additional arc from each state to the state two steps ahead]
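
The first-order assumption can be sketched in a few lines of code: each new state is sampled from a distribution that depends only on the previous state. The two-state weather chain and its transition probabilities below are hypothetical, for illustration only.

```python
import random

# Hypothetical two-state weather chain; the numbers are illustrative.
STATES = ["rain", "sun"]
TRANSITION = {                       # P(s_t | s_{t-1})
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun":  {"rain": 0.3, "sun": 0.7},
}

def sample_chain(s0, steps, seed=0):
    """Sample a trajectory s_0, ..., s_steps; each state depends only
    on its immediate predecessor (first-order Markov assumption)."""
    rng = random.Random(seed)
    trajectory = [s0]
    for _ in range(steps):
        row = TRANSITION[trajectory[-1]]
        trajectory.append(
            rng.choices(STATES, weights=[row[s] for s in STATES])[0])
    return trajectory

print(sample_chain("sun", 5))
```

Note that the entire process is specified by the transition table and the start state, exactly the "two slices suffice" point made on the next slide.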

10 k-Order Markov Process Advantages: the entire process can be specified using finitely many time slices. Example: two slices suffice for a first-order Markov process. Graph: s_{t-1} → s_t. Dynamics: P(s_t | s_{t-1}). Prior: P(s_0).

11 Example: Robot Localization An example of a first-order Markov process: the robot's belief about its location. Only the last location is sufficient (no need for the full history). Problem: uncertainty increases over time. (Figure: Thrun et al.)

12 Hidden Markov Models In the previous example, the robot could use sensors to reduce location uncertainty. In general: states are not directly observable (uncertainty is captured by a distribution); uncertain dynamics increase state uncertainty; observations, made via sensors, can reduce state uncertainty. Solution: the Hidden Markov Model.

13 First-Order Hidden Markov Model (HMM) Set of states: S. Set of observations: O. Two models of uncertainty: transition model P(s_t | s_{t-1}) and observation model P(o_t | s_t). Prior: P(s_0). [diagram: hidden chain s_0 → s_1 → s_2 → s_3 → s_4; each hidden state s_i emits an observed variable o_i]
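
These three tables fully determine the model: the joint probability of a state trajectory and its observations factors as P(s_0) · Π_i P(s_i | s_{i-1}) P(o_i | s_i). A minimal sketch, using a hypothetical umbrella-world HMM (the states, observations, and numbers are illustrative, not from the slides):

```python
# Hypothetical first-order HMM: hidden weather, observed umbrella.
PRIOR = {"rain": 0.5, "sun": 0.5}                 # P(s_0)
TRANS = {"rain": {"rain": 0.7, "sun": 0.3},       # P(s_t | s_{t-1})
         "sun":  {"rain": 0.3, "sun": 0.7}}
OBS = {"rain": {"umbrella": 0.9, "none": 0.1},    # P(o_t | s_t)
       "sun":  {"umbrella": 0.2, "none": 0.8}}

def joint(states, observations):
    """P(s_0,...,s_t, o_1,...,o_t) = P(s_0) * prod_i P(s_i|s_{i-1}) P(o_i|s_i).
    states has one more entry than observations (s_0 has no observation)."""
    p = PRIOR[states[0]]
    for i in range(1, len(states)):
        p *= TRANS[states[i - 1]][states[i]] * OBS[states[i]][observations[i - 1]]
    return p

print(joint(["rain", "rain"], ["umbrella"]))  # 0.5 * 0.7 * 0.9
```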

14 Markov chain vs. HMM Markov chain: the random variables s_0, ..., s_4 are all observable. Hidden Markov model: the state chain s_0, ..., s_4 is hidden; only the observations o_1, ..., o_4 are observed.

15 Example: Robot Localization Hidden Markov Model: S: the (x, y) coordinates of the robot on the map. O: distances to surrounding obstacles (measured by laser range finders or sonar). P(s_t | s_{t-1}): the movement of the robot, with uncertainty. P(o_t | s_t): uncertainty in the measurements provided by the sensors. Localization corresponds to the query P(s_t | o_t, ..., o_1): the state given all observations up to now.

16 Inference

17 Monitoring We are interested in the distribution over the current state given all observations: P(s_t | o_t, ..., o_1). Examples: patient monitoring, robot localization. Forward algorithm: corresponds to variable elimination. Factors: P(s_0), P(s_i | s_{i-1}), P(o_i | s_i) for 1 ≤ i ≤ t. Restrict o_1, ..., o_t to the observations made. Sum out s_0, ..., s_{t-1}: Σ_{s_0, ..., s_{t-1}} P(s_0) Π_{1 ≤ i ≤ t} P(s_i | s_{i-1}) P(o_i | s_i).
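
The elimination can be done incrementally, one time slice at a time, alternating a transition step (sum out the previous state) with an observation update. A sketch of this filtering view of the forward algorithm, on a hypothetical two-state umbrella HMM (all numbers illustrative):

```python
PRIOR = {"rain": 0.5, "sun": 0.5}                 # P(s_0)
TRANS = {"rain": {"rain": 0.7, "sun": 0.3},       # P(s_t | s_{t-1})
         "sun":  {"rain": 0.3, "sun": 0.7}}
OBS = {"rain": {"umbrella": 0.9, "none": 0.1},    # P(o_t | s_t)
       "sun":  {"umbrella": 0.2, "none": 0.8}}

def forward(observations):
    """P(s_t | o_1, ..., o_t): sum out the previous state, weight by
    the observation likelihood, and normalise, once per time step."""
    states = list(PRIOR)
    belief = dict(PRIOR)
    for o in observations:
        # transition step: sum out s_{t-1}
        predicted = {s: sum(TRANS[sp][s] * belief[sp] for sp in states)
                     for s in states}
        # observation step: weight by P(o_t | s_t), then normalise
        unnorm = {s: OBS[s][o] * predicted[s] for s in states}
        z = sum(unnorm.values())
        belief = {s: p / z for s, p in unnorm.items()}
    return belief

print(forward(["umbrella"]))  # rain: 0.45/0.55 ≈ 0.818
```

The running belief is the only state the algorithm keeps, which is why the cost is linear in the number of time slices.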

18 Prediction We are interested in distributions over future states given the observations: P(s_{t+k} | o_t, ..., o_1). Examples: weather prediction, stock-market prediction. Forward algorithm: corresponds to variable elimination. Factors: P(s_0), P(s_i | s_{i-1}), P(o_i | s_i) for 1 ≤ i ≤ t+k. Restrict o_1, ..., o_t to the observations made. Sum out s_0, ..., s_{t+k-1} and o_{t+1}, ..., o_{t+k}: Σ_{s_0, ..., s_{t+k-1}, o_{t+1}, ..., o_{t+k}} P(s_0) Π_{1 ≤ i ≤ t+k} P(s_i | s_{i-1}) P(o_i | s_i).
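
Summing out the future observations o_{t+1}, ..., o_{t+k} leaves only the transition model, so prediction amounts to filtering followed by k transition steps with no evidence. A sketch (same hypothetical two-state chain as in the other examples; numbers illustrative):

```python
TRANS = {"rain": {"rain": 0.7, "sun": 0.3},       # P(s_t | s_{t-1})
         "sun":  {"rain": 0.3, "sun": 0.7}}

def predict(belief, k):
    """P(s_{t+k} | o_1, ..., o_t), starting from the filtered belief at
    time t: apply the transition model k times, with no observation
    updates (future observations are summed out)."""
    states = list(belief)
    for _ in range(k):
        belief = {s: sum(TRANS[sp][s] * belief[sp] for sp in states)
                  for s in states}
    return belief

# As k grows, the prediction forgets the evidence and approaches the
# stationary distribution of the chain (0.5/0.5 for this symmetric one).
print(predict({"rain": 1.0, "sun": 0.0}, 1))
print(predict({"rain": 1.0, "sun": 0.0}, 50))
```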

19 Hindsight We are interested in the distribution over a past state given observations: P(s_k | o_t, ..., o_1) for k < t. Example: crime scene investigation. Forward-backward algorithm: corresponds to variable elimination. Factors: P(s_0), P(s_i | s_{i-1}), P(o_i | s_i) for 1 ≤ i ≤ t. Restrict o_1, ..., o_t to the observations made. Sum out s_0, ..., s_{k-1}, s_{k+1}, ..., s_t: Σ_{s_0, ..., s_{k-1}, s_{k+1}, ..., s_t} P(s_0) Π_{1 ≤ i ≤ t} P(s_i | s_{i-1}) P(o_i | s_i).
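
A sketch of the forward-backward computation: a forward pass accumulates P(s_i, o_1..o_i), a backward pass accumulates P(o_{k+1}..o_t | s_k), and their product (normalised) is the smoothed query. Same hypothetical umbrella HMM as in the other examples (numbers illustrative):

```python
PRIOR = {"rain": 0.5, "sun": 0.5}                 # P(s_0)
TRANS = {"rain": {"rain": 0.7, "sun": 0.3},       # P(s_t | s_{t-1})
         "sun":  {"rain": 0.3, "sun": 0.7}}
OBS = {"rain": {"umbrella": 0.9, "none": 0.1},    # P(o_t | s_t)
       "sun":  {"umbrella": 0.2, "none": 0.8}}

def smooth(observations, k):
    """P(s_k | o_1, ..., o_t) for 0 <= k <= t via forward-backward."""
    states = list(PRIOR)
    # forward messages: f[i][s] = P(s_i, o_1..o_i), with f[0] the prior
    f = [dict(PRIOR)]
    for o in observations:
        f.append({s: OBS[s][o] * sum(TRANS[sp][s] * f[-1][sp] for sp in states)
                  for s in states})
    # backward message: b[s] = P(o_{k+1}..o_t | s_k), built right to left
    b = {s: 1.0 for s in states}
    for o in reversed(observations[k:]):
        b = {s: sum(TRANS[s][sn] * OBS[sn][o] * b[sn] for sn in states)
             for s in states}
    unnorm = {s: f[k][s] * b[s] for s in states}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

print(smooth(["umbrella", "umbrella"], 1))  # rain ≈ 0.883
```

Seeing the second umbrella raises the probability of rain at step 1 above the filtered estimate (≈ 0.818), which is the whole point of smoothing.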

20 Example [diagram: HMM chain s_0 → s_1 → s_2 → s_3 → s_4 with observations o_1, ..., o_4, annotated with the forward and backward portions of a smoothing query] Exercise: prove the above. Hint: use Bayes rule.

21 Example [diagram: the same HMM chain, annotated with Bayes rule: P(A | B) P(B)] Exercise: prove the above. Hint: use Bayes rule.

22 Example [diagram: the same HMM chain] Forward algorithm: P(s_t, o_1, ..., o_t) = Σ_{s_{t-1}} P(s_t, s_{t-1}, o_1, ..., o_t) (marginalization) = P(o_t | s_t) Σ_{s_{t-1}} P(s_t | s_{t-1}) P(s_{t-1}, o_1, ..., o_{t-1}) (conditional independence). Backward algorithm: computes P(o_{k+1}, ..., o_t | s_k) by the analogous recursion, working backwards from time t.

23 Example [diagram: the same HMM chain] Backward algorithm: base case P(· | s_t) = 1; then P(o_{k+1}, ..., o_t | s_k) = Σ_{s_{k+1}} P(o_{k+1}, ..., o_t, s_{k+1} | s_k) (marginalization) = Σ_{s_{k+1}} P(o_{k+1} | s_{k+1}) P(o_{k+2}, ..., o_t | s_{k+1}) P(s_{k+1} | s_k) (conditional independence). Combining the two by Bayes rule: P(s_k | o_1, ..., o_t) ∝ P(s_k, o_1, ..., o_k) P(o_{k+1}, ..., o_t | s_k).

24 Most Likely Explanation We are interested in the most likely sequence of states given the observations: argmax_{s_0, ..., s_t} P(s_0, ..., s_t | o_t, ..., o_1). Example: speech recognition. Viterbi algorithm: corresponds to a variant of variable elimination, similar to the forward algorithm except that we select the best state instead of summing. Factors: P(s_0), P(s_i | s_{i-1}), P(o_i | s_i) for 1 ≤ i ≤ t. Max out s_0, ..., s_{t-1}: max_{s_0, ..., s_{t-1}} P(s_0) Π_{1 ≤ i ≤ t} P(s_i | s_{i-1}) P(o_i | s_i).
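
Replacing the sum of the forward algorithm with a max, and keeping backpointers so the argmax can be recovered, gives a sketch of Viterbi. Same hypothetical umbrella HMM as in the other examples (numbers illustrative):

```python
PRIOR = {"rain": 0.5, "sun": 0.5}                 # P(s_0)
TRANS = {"rain": {"rain": 0.7, "sun": 0.3},       # P(s_t | s_{t-1})
         "sun":  {"rain": 0.3, "sun": 0.7}}
OBS = {"rain": {"umbrella": 0.9, "none": 0.1},    # P(o_t | s_t)
       "sun":  {"umbrella": 0.2, "none": 0.8}}

def viterbi(observations):
    """argmax_{s_0..s_t} P(s_0) * prod_i P(s_i|s_{i-1}) P(o_i|s_i):
    the forward algorithm with max in place of sum, plus backpointers."""
    states = list(PRIOR)
    best = dict(PRIOR)       # best[s] = prob. of the best path ending in s
    backptrs = []
    for o in observations:
        new_best, ptr = {}, {}
        for s in states:
            prev, p = max(((sp, best[sp] * TRANS[sp][s]) for sp in states),
                          key=lambda pair: pair[1])
            new_best[s], ptr[s] = p * OBS[s][o], prev
        best = new_best
        backptrs.append(ptr)
    # backtrack from the best final state
    last = max(best, key=best.get)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]

print(viterbi(["umbrella", "umbrella"]))  # path ['rain', 'rain', 'rain']
```

Note that the most likely sequence is not, in general, the sequence of individually most likely states; that is why the backpointers are needed.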

25 Complexity of Temporal Inference Hidden Markov Models are Bayes nets with a polytree structure. Variable elimination is linear in the number of time slices and linear in the size of the largest CPT (P(s_t | s_{t-1}) or P(o_t | s_t)).

26 Dynamic Bayes Nets What if the number of states or observations is exponential? Dynamic Bayes nets. Idea: encode the state and observations with several random variables. Advantage: exploit conditional independence to save time and space. Note: HMMs are just DBNs with one state variable and one observation variable.

27 Example: Robot Localization States: (x,y) coordinates and heading θ Observations: laser and sonar readings, la and so

28 DBN with conditional independence

29 DBN Complexity Conditional independence allows us to represent the transition and observation models very compactly! For the time and space complexity of inference, however, conditional independence rarely helps: inference tends to be exponential in the number of state variables. Intuition: all state variables eventually become correlated. So inference is no better than with HMMs.

30 Non-Stationary Processes What if the process is not stationary? Solution: add new state components until the dynamics are stationary. Example: robot navigation based on (x, y, θ) is non-stationary when the velocity varies. Solution: add velocity to the state description: (x, y, v, θ). If the velocity itself varies, add acceleration, and so on.

31 Non-Markovian Processes What if the process is not Markovian?  Solution: Add new state components until the dynamics are Markovian  Example: Robot navigation based on (x,y,θ) is non-Markovian when influenced by battery level  Solution: Add battery level to state description (x,y,θ,b)

32 Markovian Stationary Processes Problem: Adding components to the state description to force a process to be Markovian and stationary may significantly increase computational complexity Solution: Try to find the smallest description that is self-sufficient (i.e. Markovian and stationary)

33 Summary Stochastic processes: stationary (dynamics do not change over time); Markov assumption (the current state depends only on a bounded part of the history, not all of it). Hidden Markov Models: prediction, monitoring, hindsight, most likely explanation. Dynamic Bayes nets. What to do if the stationary or Markov assumptions do not hold.