CS 541: Artificial Intelligence Lecture VIII: Temporal Probability Models.

Re-cap of Lecture VII  Exact inference by variable elimination:  polytime on polytrees, NP-hard on general graphs  space = time, very sensitive to topology  Approximate inference by LW, MCMC:  LW does poorly when there is lots of (downstream) evidence  LW, MCMC generally insensitive to topology  Convergence can be very slow with probabilities close to 1 or 0  Can handle arbitrary combinations of discrete and continuous variables

Re-cap: Machine Learning  Classification (Naïve Bayes)  Decision trees  Regression (Linear, Smoothing)  Linear Separation (Perceptron, SVMs)  Non-parametric classification (KNN)

Outline  Time and uncertainty  Inference: filtering, prediction, smoothing  Hidden Markov models  Kalman filters (a brief mention)  Dynamic Bayesian networks  Particle filtering

Time and uncertainty

 The world changes: we need to track and predict it  Diabetes management vs vehicle diagnosis  Basic idea: copy state and evidence variables for each time step  X_t = set of unobservable state variables at time t  E.g., BloodSugar_t, StomachContents_t, etc.  E_t = set of observable evidence variables at time t  E.g., MeasuredBloodSugar_t, PulseRate_t, FoodEaten_t  This assumes discrete time; step size depends on problem  Notation: X_{a:b} = X_a, X_{a+1}, …, X_b

Markov processes (Markov chains)  Construct a Bayesian net from these variables: what are the parents?  Markov assumption: X_t depends on a bounded subset of X_{0:t-1}  First-order Markov process: P(X_t | X_{0:t-1}) = P(X_t | X_{t-1})  Second-order Markov process: P(X_t | X_{0:t-1}) = P(X_t | X_{t-2}, X_{t-1})  Sensor Markov assumption: P(E_t | X_{0:t}, E_{0:t-1}) = P(E_t | X_t)  Stationary process: transition model P(X_t | X_{t-1}) and sensor model P(E_t | X_t) are fixed for all t
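A first-order, stationary Markov process can be sketched in a few lines. The rain/sun states and the 0.7/0.3 transition probabilities below are illustrative values, not taken from the lecture:

```python
import random

# Minimal first-order Markov chain: the next state depends only on the
# current state, via a fixed (stationary) transition model.
T = {"rain": {"rain": 0.7, "sun": 0.3},   # P(X_t | X_{t-1} = rain)
     "sun":  {"rain": 0.3, "sun": 0.7}}   # P(X_t | X_{t-1} = sun)

def sample_chain(start, steps, rng):
    """Sample a state trajectory x_0, x_1, ..., x_steps."""
    x, path = start, [start]
    for _ in range(steps):
        x = "rain" if rng.random() < T[x]["rain"] else "sun"
        path.append(x)
    return path

print(sample_chain("rain", 5, random.Random(1)))
```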

Example  First-order Markov assumption not exactly true in the real world!  Possible fixes:  1. Increase the order of the Markov process  2. Augment the state, e.g., add Temp_t, Pressure_t  Example: robot motion. Augment position and velocity with Battery_t

Inference

Inference tasks  Filtering: P(X_t | e_{1:t})  Belief state; input to the decision process of a rational agent  Prediction: P(X_{t+k} | e_{1:t}) for k > 0  Evaluation of possible action sequences; like filtering without the evidence  Smoothing: P(X_k | e_{1:t}) for 0 ≤ k < t  Better estimate of past states, essential for learning  Most likely explanation: arg max_{x_{1:t}} P(x_{1:t} | e_{1:t})  Speech recognition, decoding with a noisy channel

Filtering  Aim: devise a recursive state estimation algorithm, P(X_{t+1} | e_{1:t+1}) = f(e_{t+1}, P(X_t | e_{1:t}))  I.e., prediction + estimation. Prediction by summing out x_t:  P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})  where f_{1:t+1} = α FORWARD(f_{1:t}, e_{t+1}) and f_{1:t} = P(X_t | e_{1:t})  Time and space per update are constant (independent of t)
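The recursion above can be sketched for the two-state umbrella world. A sketch only, not the lecture's code; it assumes the standard umbrella-world parameters P(rain_t | rain_{t-1}) = 0.7 and P(umbrella | rain) = 0.9, P(umbrella | ¬rain) = 0.2:

```python
# Recursive filtering in the umbrella world: states 0 = rain, 1 = no rain.
T = [[0.7, 0.3],   # row i: P(X_t = j | X_{t-1} = i)
     [0.3, 0.7]]
O = [0.9, 0.2]     # P(umbrella | rain), P(umbrella | no rain)

def forward(f, umbrella):
    """One filtering step: prediction (sum out x_t), then evidence update."""
    pred = [sum(T[i][j] * f[i] for i in range(2)) for j in range(2)]
    like = [O[j] if umbrella else 1 - O[j] for j in range(2)]
    unnorm = [like[j] * pred[j] for j in range(2)]
    z = sum(unnorm)                     # the normalizing constant 1/alpha
    return [p / z for p in unnorm]

f = [0.5, 0.5]                          # prior P(X_0)
for e in [True, True]:                  # umbrella observed on days 1 and 2
    f = forward(f, e)
print(round(f[0], 3))                   # P(rain_2 | u_1, u_2) ≈ 0.883
```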

Filtering example

Smoothing  Divide evidence e_{1:t} into e_{1:k}, e_{k+1:t}:  P(X_k | e_{1:t}) = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k) = α f_{1:k} b_{k+1:t}  Backward message computed by a backward recursion:  b_{k+1:t} = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)

Smoothing example  Forward-backward algorithm: cache forward messages along the way  Time linear in t (polytree inference), space O(t|f|)
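The forward-backward algorithm can be sketched on the same umbrella world (assumed standard parameters; illustrative, not the lecture's code). It caches all forward messages, then sweeps backwards combining them with the backward message:

```python
# Forward-backward smoothing: compute P(X_k | e_{1:t}) for every k.
T = [[0.7, 0.3], [0.3, 0.7]]
O = [0.9, 0.2]

def normalize(v):
    z = sum(v)
    return [x / z for x in v]

def forward(f, u):
    pred = [sum(T[i][j] * f[i] for i in range(2)) for j in range(2)]
    return normalize([(O[j] if u else 1 - O[j]) * pred[j] for j in range(2)])

def backward(b, u):
    # b_{k:t}(i) = sum_j P(e_k | j) * b_{k+1:t}(j) * P(j | i)
    return [sum((O[j] if u else 1 - O[j]) * b[j] * T[i][j] for j in range(2))
            for i in range(2)]

def forward_backward(evidence, prior=(0.5, 0.5)):
    fs = [list(prior)]
    for u in evidence:                       # cache forward messages
        fs.append(forward(fs[-1], u))
    b, smoothed = [1.0, 1.0], []
    for k in range(len(evidence), 0, -1):    # backward pass
        smoothed.insert(0, normalize([fs[k][i] * b[i] for i in range(2)]))
        b = backward(b, evidence[k - 1])
    return smoothed

s = forward_backward([True, True])
print(round(s[0][0], 3))   # smoothed P(rain_1 | u_1, u_2) ≈ 0.883
```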

Most likely explanation  Most likely sequence ≠ sequence of most likely states!  Most likely path to each x_{t+1} = most likely path to some x_t plus one more step:  max_{x_1…x_t} P(x_1, …, x_t, X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) max_{x_t} [ P(X_{t+1} | x_t) max_{x_1…x_{t-1}} P(x_1, …, x_{t-1}, x_t | e_{1:t}) ]  Identical to filtering, except f_{1:t} is replaced by m_{1:t} = max_{x_1…x_{t-1}} P(x_1, …, x_{t-1}, X_t | e_{1:t})  I.e., m_{1:t}(i) gives the probability of the most likely path to state i  Update has the sum replaced by max, giving the Viterbi algorithm

Viterbi algorithm
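The Viterbi update is the filtering update with the sum replaced by a max, plus back-pointers. A sketch for the umbrella world (assumed standard parameters), run on the classic evidence sequence umbrella, umbrella, no umbrella, umbrella, umbrella:

```python
# Viterbi: most likely state sequence in the umbrella world.
T = [[0.7, 0.3], [0.3, 0.7]]   # T[i][j] = P(X_t = j | X_{t-1} = i)
O = [0.9, 0.2]                 # P(umbrella | state), 0 = rain, 1 = no rain

def viterbi(evidence, prior=(0.5, 0.5)):
    like = lambda u, j: O[j] if u else 1 - O[j]
    pred = [sum(T[i][j] * prior[i] for i in range(2)) for j in range(2)]
    m = [like(evidence[0], j) * pred[j] for j in range(2)]   # m_{1:1}
    back = []
    for u in evidence[1:]:
        # best predecessor for each state j, using max instead of sum
        best = [max(range(2), key=lambda i: m[i] * T[i][j]) for j in range(2)]
        m = [like(u, j) * m[best[j]] * T[best[j]][j] for j in range(2)]
        back.append(best)
    path = [max(range(2), key=lambda j: m[j])]   # most likely final state
    for best in reversed(back):                  # follow back-pointers
        path.insert(0, best[path[0]])
    return path

print(viterbi([True, True, False, True, True]))  # [0, 0, 1, 0, 0]
```

Note that the most likely sequence says "no rain" on day 3 even though the filtered estimate for that day, taken alone, would favor rain: the sequence is optimized jointly, not state by state.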

Hidden Markov models

 X_t is a single, discrete variable (usually E_t is too)  Domain of X_t is {1, …, S}  Transition matrix T_{ij} = P(X_t = j | X_{t-1} = i), e.g., for the umbrella world T = (0.7 0.3; 0.3 0.7)  Sensor matrix O_t for each time step, diagonal elements P(e_t | X_t = i)  e.g., with U_1 = true, O_1 = diag(0.9, 0.2)  Forward and backward messages as column vectors:  f_{1:t+1} = α O_{t+1} Tᵀ f_{1:t}, b_{k+1:t} = T O_{k+1} b_{k+2:t}  Forward-backward algorithm needs time O(S²t) and space O(St)

Country dance algorithm  Can avoid storing all forward messages in smoothing by running the forward algorithm backwards:  f_{1:t+1} = α O_{t+1} Tᵀ f_{1:t}  ⇒  f_{1:t} = α′ (Tᵀ)⁻¹ O_{t+1}⁻¹ f_{1:t+1}  Algorithm: forward pass computes f_{1:t}; backward pass computes f_{1:i} and b_{i+1:t} together
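The inverted update can be sketched in matrix form (a sketch under the assumed umbrella-world parameters, requiring T and O to be invertible):

```python
import numpy as np

# "Country dance" trick: recover f_{1:t} from f_{1:t+1} by inverting
# the forward update f_{1:t+1} = alpha * O_{t+1} T^T f_{1:t}.
T = np.array([[0.7, 0.3], [0.3, 0.7]])

def O(u):
    """Diagonal sensor matrix for umbrella evidence u."""
    return np.diag([0.9, 0.2]) if u else np.diag([0.1, 0.8])

def forward(f, u):
    v = O(u) @ T.T @ f
    return v / v.sum()

def forward_inverse(f_next, u):
    v = np.linalg.inv(T.T) @ np.linalg.inv(O(u)) @ f_next
    return v / v.sum()

f0 = np.array([0.5, 0.5])
f1 = forward(f0, True)
f0_again = forward_inverse(f1, True)   # recovers [0.5, 0.5]
print(np.round(f0_again, 3))
```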

Kalman filtering

 Modelling systems described by a set of continuous variables  E.g., tracking a bird flying: X_t = X, Y, Z, Ẋ, Ẏ, Ż  Airplanes, robots, ecosystems, economies, chemical plants, planets, …  Gaussian prior, linear Gaussian transition model and sensor model

Updating Gaussian distributions  Prediction step: if P(X_t | e_{1:t}) is Gaussian, then the prediction P(X_{t+1} | e_{1:t}) is Gaussian; if P(X_{t+1} | e_{1:t}) is Gaussian, then the updated distribution P(X_{t+1} | e_{1:t+1}) is Gaussian  Hence P(X_t | e_{1:t}) is multivariate Gaussian N(μ_t, Σ_t) for all t  General (nonlinear, non-Gaussian) process: description of the posterior grows unboundedly as t → ∞

Simple 1-D example  Gaussian random walk on the X-axis, transition s.d. σ_x, sensor s.d. σ_z:  μ_{t+1} = ((σ_t² + σ_x²) z_{t+1} + σ_z² μ_t) / (σ_t² + σ_x² + σ_z²)  σ_{t+1}² = (σ_t² + σ_x²) σ_z² / (σ_t² + σ_x² + σ_z²)
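The 1-D update above can be sketched directly; the particular numbers below (prior variance 1, transition variance 2, sensor variance 1, observation 2.5) are made-up illustrative values:

```python
# One step of the 1-D Kalman update for a Gaussian random walk.
# var_x is the transition noise variance, var_z the sensor noise variance.

def kalman_1d(mu, var, z, var_x, var_z):
    """Return updated (mean, variance) after one transition and observation z."""
    p = var + var_x                          # predicted variance sigma_t^2 + sigma_x^2
    mu_new = (p * z + var_z * mu) / (p + var_z)
    var_new = p * var_z / (p + var_z)
    return mu_new, var_new

mu, var = 0.0, 1.0                           # prior N(0, 1)
mu, var = kalman_1d(mu, var, z=2.5, var_x=2.0, var_z=1.0)
print(mu, var)                               # 1.875 0.75
```

Note that the new mean is a precision-weighted compromise between the prediction and the observation, and the new variance does not depend on the observation at all.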

General Kalman update  Transition and sensor models: P(x_{t+1} | x_t) = N(F x_t, Σ_x), P(z_t | x_t) = N(H x_t, Σ_z)  F is the transition matrix; Σ_x the transition noise covariance  H is the sensor matrix; Σ_z the sensor noise covariance  Filter computes the following update:  μ_{t+1} = F μ_t + K_{t+1}(z_{t+1} − H F μ_t)  Σ_{t+1} = (I − K_{t+1} H)(F Σ_t Fᵀ + Σ_x)  where K_{t+1} = (F Σ_t Fᵀ + Σ_x) Hᵀ (H (F Σ_t Fᵀ + Σ_x) Hᵀ + Σ_z)⁻¹ is the Kalman gain matrix  Σ_t and K_t are independent of the observation sequence, so they can be computed offline
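A sketch of the matrix update for a small constant-velocity tracker. The state (position, velocity), the matrices F and H, and the noise covariances are all made-up example values, not from the lecture:

```python
import numpy as np

def kalman_update(mu, Sigma, z, F, H, Q, R):
    """One predict-then-correct Kalman step; Q, R are the transition and
    sensor noise covariances (Sigma_x and Sigma_z in the slides)."""
    mu_pred = F @ mu                          # predicted mean F mu_t
    S_pred = F @ Sigma @ F.T + Q              # predicted covariance
    K = S_pred @ H.T @ np.linalg.inv(H @ S_pred @ H.T + R)   # Kalman gain
    mu_new = mu_pred + K @ (z - H @ mu_pred)  # correct with the innovation
    S_new = (np.eye(len(mu)) - K @ H) @ S_pred
    return mu_new, S_new

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity transition
H = np.array([[1.0, 0.0]])             # observe position only
Q = 0.1 * np.eye(2)                    # transition noise (assumed)
R = np.array([[1.0]])                  # sensor noise (assumed)

mu, Sigma = np.zeros(2), np.eye(2)
mu, Sigma = kalman_update(mu, Sigma, np.array([1.0]), F, H, Q, R)
print(np.round(mu, 3))                 # ≈ [0.677, 0.323]
```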

2-D tracking example: filtering

2-D tracking example: smoothing

Where it breaks  Cannot be applied if the transition model is nonlinear  Extended Kalman Filter models the transition as locally linear around x_t = μ_t  Fails if the system is locally unsmooth

Dynamic Bayesian networks

 X t, E t contain arbitrarily many variables in a replicated Bayes net

DBNs vs. HMMs  Every HMM is a single-variable DBN; every discrete DBN is an HMM  Sparse dependencies give exponentially fewer parameters;  e.g., with 20 Boolean state variables, three parents each, the DBN has 20 × 2³ = 160 parameters, while the equivalent HMM has 2²⁰ states and a transition matrix with ≈10¹² entries
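The parameter count is worth doing explicitly; a back-of-envelope sketch for the slide's 20-variable example:

```python
# DBN vs. HMM parameter count: 20 Boolean state variables, 3 parents each.
n, parents = 20, 3

dbn_params = n * 2 ** parents          # one CPT of 2^3 rows per variable
hmm_states = 2 ** n                    # flat joint state space
hmm_params = hmm_states * hmm_states   # full transition matrix entries

print(dbn_params, hmm_params)          # 160 vs ~10^12
```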

DBN vs. Kalman filters  Every Kalman filter model is a DBN, but few DBNs are KFs;  Real world requires non-Gaussian posteriors  E.g., where are bin Laden and my keys? What's the battery charge?

Exact inference in DBNs  Naive method: unroll the network and run any exact algorithm  Problem: inference cost for each update grows with t  Rollup filtering: add slice t+1, “sum out” slice t using variable elimination  Largest factor is O(d^{n+1}), update cost O(d^{n+2}) (cf. HMM update cost O(d^{2n}))

Likelihood weighting for DBNs  Set of weighted samples approximates the belief state  LW samples pay no attention to the evidence!  The fraction “agreeing” falls exponentially with t  The number of samples required grows exponentially with t

Particle filtering

 Basic idea: ensure that the population of samples (“particles”) tracks the high-likelihood regions of the state space  Replicate particles proportional to likelihood for e_t  Widely used for tracking nonlinear systems, especially in vision  Also used for simultaneous localization and mapping in mobile robots, which has a very high-dimensional state space

Particle filtering  Assume consistent at time t: N(x_t | e_{1:t}) / N = P(x_t | e_{1:t})  Propagate forward: populations of x_{t+1} are N(x_{t+1} | e_{1:t}) = Σ_{x_t} P(x_{t+1} | x_t) N(x_t | e_{1:t})  Weight samples by their likelihood for e_{t+1}: W(x_{t+1} | e_{1:t+1}) = P(e_{t+1} | x_{t+1}) N(x_{t+1} | e_{1:t})  Resample to obtain populations proportional to W
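The propagate / weight / resample cycle can be sketched on the umbrella world (assumed standard parameters; with enough particles the estimate approaches the exact filtering answer of about 0.883 after two umbrella observations):

```python
import random

# Particle filter for the umbrella world: 0 = rain, 1 = no rain.
T = {0: [0.7, 0.3], 1: [0.3, 0.7]}   # P(next state | current state)
O = [0.9, 0.2]                       # P(umbrella | state)

def particle_filter_step(particles, umbrella, rng):
    # 1. Propagate each particle through the transition model
    moved = [0 if rng.random() < T[x][0] else 1 for x in particles]
    # 2. Weight each particle by the likelihood of the evidence
    w = [O[x] if umbrella else 1 - O[x] for x in moved]
    # 3. Resample proportional to weight
    return rng.choices(moved, weights=w, k=len(particles))

rng = random.Random(0)
particles = [rng.choice([0, 1]) for _ in range(5000)]   # uniform prior
for e in [True, True]:
    particles = particle_filter_step(particles, e, rng)

est = particles.count(0) / len(particles)
print(round(est, 2))   # ≈ 0.88, close to the exact answer 0.883
```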

Particle filtering performance  Approximation error of particle filtering remains bounded over time, at least empirically; theoretical analysis is difficult

How particle filters work & applications

Summary  Temporal models use state and sensor variables replicated over time  Markov assumptions and stationarity assumption, so we need  Transition model P(X_t | X_{t-1})  Sensor model P(E_t | X_t)  Tasks are filtering, prediction, smoothing, most likely sequence;  all done recursively with constant cost per time step  Hidden Markov models have a single discrete state variable;  widely used for speech recognition  Kalman filters allow n state variables, linear Gaussian, O(n³) update  Dynamic Bayesian nets subsume HMMs, Kalman filters; exact update intractable  Particle filtering is a good approximate filtering algorithm for DBNs