Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS 541: Artificial Intelligence Lecture VIII: Temporal Probability Models.

Similar presentations


Presentation on theme: "CS 541: Artificial Intelligence Lecture VIII: Temporal Probability Models."— Presentation transcript:

1 CS 541: Artificial Intelligence Lecture VIII: Temporal Probability Models

2 Re-cap of Lecture VII  Exact inference by variable elimination:  polytime on polytrees, NP-hard on general graphs  space = time, very sensitive to topology  Approximate inference by LW, MCMC:  LW does poorly when there is lots of (downstream) evidence  LW, MCMC generally insensitive to topology  Convergence can be very slow with probabilities close to 1 or 0  Can handle arbitrary combinations of discrete and continuous variables

3 Re-cap of Lecture IX  Machine Learning  Classification (Naïve Bayes)  Decision trees  Regression (Linear, Smoothing)  Linear Separation (Perceptron, SVMs)  Non-parametric classification (KNN)

4 Outline  Time and uncertainty  Inference: filtering, prediction, smoothing  Hidden Markov models  Kalman filters (a brief mention)  Dynamic Bayesian networks  Particle filtering

5 Time and uncertainty

6  The world changes: we need to track and predict it  Diabetes management vs vehicle diagnosis  Basic idea: copy state and evidence variables for each time step  = set of unobservable state variables at time t  E.g., BloodSugar t, StomachContents t, etc.  = set of observable evidence variables at time t  E.g., MeasuredBloodSugar t, PulseRate t, FoodEaten t  This assumes discrete time; step size depends on problem  Notation:

7 Markov processes (Markov chain)  Construct a Bayesian net from these variables: parents?  Markov assumption: depends on bounded set of  First order Markov process:  Second order Markov process:  Sensor Markov assumption:  Stationary process: transition model and sensor model are fixed for all t

8 Example  First-order Markov assumption not exactly true in real world!  Possible fixes:  1. Increase order of Markov process  2. Augment state, e.g., add Temp t, Pressure t  Example: robot motion.  Augment position and velocity with Battery t

9 Inference

10 Inference tasks  Filtering: P(X t |e 1:t )  Belief state--input to the decision process of a rational agent  Prediction: P(X t+k |e 1:t ) for k > 0  Evaluation of possible action sequences;  Like filtering without the evidence  Smoothing: P(X k |e 1:t ) for 0 ≤ k < t  Better estimate of past states, essential for learning  Most likely explanation:  Speech recognition, decoding with a noisy channel

11 Filtering  Aim: devise a recursive state estimation algorithm  I.e., prediction + estimation. Prediction by summing out X t :  where  Time and space constant ( independent of t )

12 Filtering example

13 Smoothing  Divide evidence e 1:t into e 1:k, e k+1:t :  Backward message computed by a backwards recursion:

14 Smoothing example  Forward-backward algorithm: cache forward messages along the way  Time linear in t (polytree inference), space O(t|f|)

15 Most likely explanation  Most likely sequence ≠ sequence of most likely states!!!!  Most likely path to each x t+1  = most likely path to some x t plus one more step  Identical to filtering, except f 1:t replaced by  I.e., m 1:i (t) gives the probability of the most likely path to state i  Update has sum replaced by max, giving the Viterbi algorithm:

16 Viterbi algorithm

17 Hidden Markov models

18  X t is a single, discrete variable (usually E t is too)  Domain of X t is {1;...; S}  Transition matrix T ij = P(X t =j|X t-1 =i) e.g.,  Sensor matrix O t for each time step, diagonal elements P(e t |X t =i)  e.g., with U 1 =true,  Forward and backward messages as column vectors:  Forward-backward algorithm needs time O(S 2 t) and space O(St)

19 Country dance algorithm  Can avoid storing all forward messages in smoothing by running forward algorithm backwards:  Algorithm: forward pass computes f t, backward pass does f i, b i

20 Country dance algorithm  Can avoid storing all forward messages in smoothing by running forward algorithm backwards:  Algorithm: forward pass computes f t, backward pass does f i, b i

21 Country dance algorithm  Can avoid storing all forward messages in smoothing by running forward algorithm backwards:  Algorithm: forward pass computes f t, backward pass does f i, b i

22 Country dance algorithm  Can avoid storing all forward messages in smoothing by running forward algorithm backwards:  Algorithm: forward pass computes f t, backward pass does f i, b i

23 Country dance algorithm  Can avoid storing all forward messages in smoothing by running forward algorithm backwards:  Algorithm: forward pass computes f t, backward pass does f i, b i

24 Country dance algorithm  Can avoid storing all forward messages in smoothing by running forward algorithm backwards:  Algorithm: forward pass computes f t, backward pass does f i, b i

25 Country dance algorithm  Can avoid storing all forward messages in smoothing by running forward algorithm backwards:  Algorithm: forward pass computes f t, backward pass does f i, b i

26 Country dance algorithm  Can avoid storing all forward messages in smoothing by running forward algorithm backwards:  Algorithm: forward pass computes f t, backward pass does f i, b i

27 Country dance algorithm  Can avoid storing all forward messages in smoothing by running forward algorithm backwards:  Algorithm: forward pass computes f t, backward pass does f i, b i

28 Country dance algorithm  Can avoid storing all forward messages in smoothing by running forward algorithm backwards:  Algorithm: forward pass computes f t, backward pass does f i, b i

29 Kalman filtering

30 Inference by stochastic simulation  Modelling systems described by a set of continuous variables,  E.g., tracking a bird flying:  Airplanes, robots, ecosystems, economies, chemical plants, planets, …  Gaussian prior, linear Gaussian transition model and sensor model

31 Updating Gaussian distributions  Prediction step: if P(X t |e 1:t ) is Gaussian, then prediction is Gaussian. If P(X t+1 |e 1:t ) is Gaussian, then the updated distribution is Gaussian  Hence P(X t |e 1:t ) is multi-variate Gaussian for all t  General (nonlinear, non-Gaussian) process: description of posterior grows unboundedly as

32 Simple 1-D example  Gaussian random walk on X-axis, s.d., sensor s.d.

33 General Kalman update  Transition and sensor models  F is the matrix for the transition; the transition noise covariance  H is the matrix for the sensors; the sensor noise covariance  Filter computes the following update:  where is the Kalman gain matrix  and are independent of observation sequence, so compute offline

34 2-D tracking example: filtering

35 2-D tracking example: smoothing

36 Where is breaks  Cannot be applied if the transition model is nonlinear  Extended Kalman Filter models transition as locally linear around  Fails if systems is locally unsmooth

37 Dynamic Bayesian networks

38  X t, E t contain arbitrarily many variables in a replicated Bayes net

39 DBNs vs. HMMs  Every HMM is a single-variable DBN; every discrete DBN is an HMM  Sparse dependencies: exponentially fewer parameters;  e.g., 20 state variables, three parents each  DBN has parameters, HMM has

40 DBN vs. Kalman filters  Every Kalman filter model is a DBN, but few DBNs are KFs;  Real world requires non-Gaussian posteriors  E.g., where are bin Laden and my keys? What's the battery charge?

41 Exact inference in DBNs  Naive method: unroll the network and run any exact algorithm  Problem: inference cost for each update grows with t  Rollup filtering: add slice t+1, “sum out”slice t using variable elimination  Largest factor is O(d n+1 ), update cost O(d n+2 ) (cf. HMM update cost O(d 2n ))

42 Likelihood weighting for DBN  Set of weighted samples approximates the belief state  LW samples pay no attention to the evidence!  fraction “agreeing” falls exponentially with t  Number of samples required grows exponentially with t

43 Particle filtering

44  Basic idea: ensure that the population of samples ("particles") tracks the high-likelihood regions of the state-space  Replicate particles proportional to likelihood for e t  Widely used for tracking non-linear systems, esp, vision  Also used for simultaneous localization and mapping in mobile robots, 10 5 - dimensional state space

45 Particle filtering  Assume consistent at time t:  Propagate forward: populations of x t+1 are  Weight samples by their likelihood for e t+1 :  Resample to obtain populations proportional to W :

46 Particle filtering performance  Approximation error of particle filtering remains bounded over time, at least empirically theoretical analysis is difficult

47 How particle filter works & applications

48 Summary  Temporal models use state and sensor variables replicated over time  Markov assumptions and stationarity assumption, so we need  Transition model P(X t |X t-1 )  Sensor model P(E t |X t )  Tasks are filtering, prediction, smoothing, most likely sequence;  All done recursively with constant cost per time step  Hidden Markov models have a single discrete state variable;  Widely used for speech recognition  Kalman filters allow n state variables, linear Gaussian, O(n 3 ) update  Dynamic Bayesian nets subsume HMMs, Kalman filters; exact update intractable  Particle filtering is a good approximate filtering algorithm for DBNs


Download ppt "CS 541: Artificial Intelligence Lecture VIII: Temporal Probability Models."

Similar presentations


Ads by Google