1
CS B553: Algorithms for Optimization and Learning. Temporal Sequences: Hidden Markov Models and Dynamic Bayesian Networks
2
Motivation. Observing a stream of data: monitoring (of people, computer systems, etc.), surveillance and tracking, finance and economics, science. Questions: modeling and forecasting; handling unobserved variables.
3
Time Series Modeling. Time occurs in steps t = 0, 1, 2, …; a time step can be seconds, days, years, etc. State variable X_t, t = 0, 1, 2, …. For partially observed problems, we see observations O_t, t = 1, 2, … and do not see the X's; the X's are hidden variables (aka latent variables).
4
Modeling Time. The arrow of time and causality (causes precede effects) make Bayesian networks natural models of time series.
5
Probabilistic Modeling. For now, assume the fully observable case. What parents should each X_t have? [Figure: candidate Bayesian network structures over X_0, X_1, X_2, X_3]
6
Markov Assumption. Assume X_{t+k} is independent of all X_i for i < t: P(X_{t+k} | X_0, …, X_{t+k-1}) = P(X_{t+k} | X_t, …, X_{t+k-1}). This defines a k-th order Markov chain. [Figure: chains over X_0, …, X_3 of order 0, 1, 2, and 3]
7
1st-Order Markov Chain. MCs of order k > 1 can be converted into a 1st-order MC on the augmented variable Y_t = (X_t, …, X_{t+k-1}). So, without loss of generality, “MC” refers to a 1st-order MC; a small conversion example follows. [Figure: k-th order chain over X_0, …, X_4 and the equivalent 1st-order chain over Y_0, …, Y_3]
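As a concrete illustration, here is a minimal sketch of the conversion, assuming a binary state and a made-up 2nd-order transition table; the augmented variable Y_t = (X_{t-1}, X_t) turns the 2nd-order chain into a 1st-order one.

```python
import itertools
import numpy as np

# Hypothetical 2nd-order transition model for a binary state:
# p2[(x_prev2, x_prev1)] = [P(X_t=0 | ...), P(X_t=1 | ...)]
p2 = {
    (0, 0): [0.9, 0.1],
    (0, 1): [0.4, 0.6],
    (1, 0): [0.5, 0.5],
    (1, 1): [0.2, 0.8],
}

# Augmented state Y_t = (X_{t-1}, X_t); enumerate its 4 values.
states = list(itertools.product([0, 1], repeat=2))

# 1st-order transition matrix T[j, i] = P(Y_t = states[j] | Y_{t-1} = states[i]).
T = np.zeros((4, 4))
for i, (a, b) in enumerate(states):       # previous augmented state (X_{t-2}, X_{t-1})
    for j, (c, d) in enumerate(states):   # next augmented state (X_{t-1}, X_t)
        if c == b:                        # the shared X_{t-1} must agree
            T[j, i] = p2[(a, b)][d]

print(T)  # each column sums to 1, so this is a valid 1st-order chain on Y
```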
8
Inference in MC. What independence relationships can we read from the BN? [Figure: chain X_0 → X_1 → X_2 → X_3] Observing X_1 makes X_0 independent of X_2, X_3, …. P(X_t | X_{t-1}) is known as the transition model.
9
Inference in MC. Prediction: what is the probability of a future state?
P(X_t) = Σ_{x_0,…,x_{t-1}} P(X_0, …, X_t) = Σ_{x_0,…,x_{t-1}} P(x_0) Π_{i=1..t} P(x_i | x_{i-1}) = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1})
Approach: maintain a belief state b_t(X) = P(X_t) and use the above equation to advance to b_{t+1}(X) (a recursive approach). This is equivalent to the VE algorithm run in sequential order.
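A minimal sketch of this recursive belief-state update, assuming a small discrete state space and a made-up transition matrix:

```python
import numpy as np

# Hypothetical 3-state chain; T[j, i] = P(X_t = j | X_{t-1} = i), columns sum to 1.
T = np.array([[0.8, 0.3, 0.1],
              [0.1, 0.5, 0.3],
              [0.1, 0.2, 0.6]])

b = np.array([1.0, 0.0, 0.0])   # b_0(X) = P(X_0): start in state 0 with certainty

# Advance the belief state: b_{t+1}(x') = sum_x P(X_{t+1}=x' | X_t=x) b_t(x)
for t in range(5):
    b = T @ b
    print(f"P(X_{t+1}) =", np.round(b, 3))
```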
11
Belief State Evolution. P(X_t) = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1}). The belief “blurs” over time and (typically) approaches a stationary distribution as t grows, so prediction power is limited. The rate of blurring is known as the mixing time.
12
Stationary Distributions. For discrete variables with Val(X) = {1, …, n}: the transition matrix has entries T_ij = P(X_t = i | X_{t-1} = j), and the belief b_t(X) is just a vector with entries b_{t,i} = P(X_t = i). Belief update equation: b_{t+1} = T b_t. A stationary distribution b is one for which b = Tb, i.e., b is an eigenvector of T with eigenvalue 1, i.e., b lies in the null space of (T - I).
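A minimal sketch of finding a stationary distribution numerically, using the hypothetical 3-state transition matrix from the sketch above; taking the eigenvector for the eigenvalue closest to 1 is one way to obtain a vector in the null space of (T - I).

```python
import numpy as np

T = np.array([[0.8, 0.3, 0.1],
              [0.1, 0.5, 0.3],
              [0.1, 0.2, 0.6]])

# Eigenvector of T with eigenvalue 1 = vector in the null space of (T - I).
eigvals, eigvecs = np.linalg.eig(T)
k = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
b_stat = np.real(eigvecs[:, k])
b_stat = b_stat / b_stat.sum()         # normalize to a probability distribution

print("stationary b =", np.round(b_stat, 3))
print("check Tb = b:", np.allclose(T @ b_stat, b_stat))
```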
13
History Dependence. In Markov models, the state must be chosen so that the future is independent of the history given the current state. Often this requires adding variables that cannot be directly observed. Examples: Are these people walking toward you or away from you? What comes next after “the bare …”? (minimum, essentials, market, wipes himself with the rabbit)
14
Partial Observability. Hidden Markov Model (HMM). [Figure: hidden state chain X_0 → X_1 → X_2 → X_3, with observed variables O_1, O_2, O_3 attached to X_1, X_2, X_3] Hidden state variables: the X_t. Observed variables: the O_t. P(O_t | X_t) is called the observation model (or sensor model).
15
Inference in HMMs: filtering, prediction, smoothing (aka hindsight), and most likely explanation. [Figure: HMM over X_0, …, X_3 with observations O_1, …, O_3]
16
Inference in HMMs: filtering, prediction, smoothing (aka hindsight), and most likely explanation. [Figure: filtering, with the current state X_2 as the query variable given O_1, O_2]
17
Filtering. The name comes from signal processing.
P(X_t | o_{1:t}) = Σ_{x_{t-1}} P(x_{t-1} | o_{1:t-1}) P(X_t | x_{t-1}, o_t)
P(X_t | x_{t-1}, o_t) = P(o_t | x_{t-1}, X_t) P(X_t | x_{t-1}) / P(o_t | x_{t-1}) ∝ P(o_t | X_t) P(X_t | x_{t-1})
[Figure: HMM with the query variable X_2 given O_1, O_2]
18
Filtering. Combining the two gives the forward recursion:
P(X_t | o_{1:t}) ∝ Σ_{x_{t-1}} P(x_{t-1} | o_{1:t-1}) P(o_t | X_t) P(X_t | x_{t-1})
If we keep track of the belief state b_t(X) = P(X_t | o_{1:t}), each time step requires only O(|Val(X)|^2) update work. [Figure: HMM with the query variable X_2 given O_1, O_2]
19
Predict-Update Interpretation. Given the old belief state b_{t-1}(X):
Predict: first compute the MC update b'_t(X_t) = P(X_t | o_{1:t-1}) = Σ_x b_{t-1}(x) P(X_t | X_{t-1} = x).
Update: re-weight to account for the observation probabilities, b_t(x) ∝ b'_t(x) P(o_t | X_t = x), and normalize.
[Figure: HMM with the query variable X_2 given O_1, O_2]
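A minimal sketch of the predict-update filter, assuming the 3-state transition matrix from the earlier sketch, a made-up observation model, and a made-up observation sequence:

```python
import numpy as np

# Hypothetical HMM: T[j, i] = P(X_t=j | X_{t-1}=i); O[k, j] = P(O_t=k | X_t=j).
T = np.array([[0.8, 0.3, 0.1],
              [0.1, 0.5, 0.3],
              [0.1, 0.2, 0.6]])
O = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.8, 0.9]])   # two possible observation values

def filter_step(b_prev, obs):
    """One predict-update step: returns b_t(X) = P(X_t | o_{1:t})."""
    b_pred = T @ b_prev   # predict: P(X_t | o_{1:t-1})
    b = O[obs] * b_pred   # update: re-weight by P(o_t | X_t)
    return b / b.sum()    # normalize

b = np.array([1.0, 0.0, 0.0])                      # prior belief P(X_0)
for t, obs in enumerate([0, 0, 1, 1], start=1):    # made-up observation sequence
    b = filter_step(b, obs)
    print(f"P(X_{t} | o_1:{t}) =", np.round(b, 3))
```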
20
Inference in HMMs: filtering, prediction, smoothing (aka hindsight), and most likely explanation. [Figure: prediction, with a query variable beyond the last observation]
21
Prediction. P(X_{t+k} | o_{1:t}) in two steps: compute P(X_t | o_{1:t}), then P(X_{t+k} | X_t). That is, filter to time t, then predict as with a standard MC (see the sketch below). [Figure: HMM with a future state as the query variable]
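A minimal sketch of k-step prediction, assuming the same hypothetical 3-state chain and an assumed filtered belief vector:

```python
import numpy as np

def predict_k(b_filtered, T, k):
    """P(X_{t+k} | o_{1:t}): advance the filtered belief k steps with no new observations."""
    b = b_filtered
    for _ in range(k):
        b = T @ b
    return b

T = np.array([[0.8, 0.3, 0.1],
              [0.1, 0.5, 0.3],
              [0.1, 0.2, 0.6]])
b_t = np.array([0.7, 0.2, 0.1])   # assumed P(X_t | o_{1:t}) produced by the filter
print(np.round(predict_k(b_t, T, 3), 3))
```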
22
Inference in HMMs: filtering, prediction, smoothing (aka hindsight), and most likely explanation. [Figure: smoothing, with a past state as the query variable]
23
Smoothing. P(X_k | o_{1:t}) for k < t.
P(X_k | o_{1:k}, o_{k+1:t}) = P(o_{k+1:t} | X_k, o_{1:k}) P(X_k | o_{1:k}) / P(o_{k+1:t} | o_{1:k}) ∝ P(o_{k+1:t} | X_k) P(X_k | o_{1:k})
The second factor is obtained by standard filtering up to time k. [Figure: HMM with the past state X_1 as the query variable]
24
Smoothing. Computing P(o_{k+1:t} | X_k):
P(o_{k+1:t} | X_k) = Σ_{x_{k+1}} P(o_{k+1:t} | X_k, x_{k+1}) P(x_{k+1} | X_k)
= Σ_{x_{k+1}} P(o_{k+1:t} | x_{k+1}) P(x_{k+1} | X_k)
= Σ_{x_{k+1}} P(o_{k+2:t} | x_{k+1}) P(o_{k+1} | x_{k+1}) P(x_{k+1} | X_k)
This is the backward recursion: given the current state, what is the probability of the remaining observation sequence? [Figure: HMM with a past state as the query variable]
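A minimal sketch of the backward recursion and a smoothed estimate, continuing the filtering sketch above (it reuses the hypothetical T, O, and filter_step defined there):

```python
def backward_messages(observations):
    """m[k] = P(o_{k+1:t} | X_k) for k = 0, ..., t, computed right to left."""
    t = len(observations)
    m = [None] * (t + 1)
    m[t] = np.ones(3)                        # empty future: probability 1
    for k in range(t - 1, -1, -1):
        o_next = observations[k]             # o_{k+1} (the list is 0-indexed)
        m[k] = T.T @ (O[o_next] * m[k + 1])  # sum over x_{k+1}
    return m

obs_seq = [0, 0, 1, 1]
m = backward_messages(obs_seq)

# Smoothed estimate P(X_k | o_{1:t}) ∝ P(X_k | o_{1:k}) * P(o_{k+1:t} | X_k)
b = np.array([1.0, 0.0, 0.0])
for k in range(1, len(obs_seq) + 1):
    b = filter_step(b, obs_seq[k - 1])       # forward: P(X_k | o_{1:k})
    smoothed = b * m[k]
    print(f"P(X_{k} | o_1:4) =", np.round(smoothed / smoothed.sum(), 3))
```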
25
Interpretation. Filtering/prediction is equivalent to forward variable elimination / belief propagation. Smoothing is equivalent to forward VE/BP up to the query variable, followed by backward VE/BP from the last observation back to the query variable. Running BP to completion gives the smoothed estimates for all variables (the forward-backward algorithm).
26
Inference in HMMs: filtering, prediction, smoothing (aka hindsight), and most likely explanation (the subject of the next lecture). [Figure: HMM over X_0, …, X_3 with observations O_1, …, O_3] The most-likely-explanation query returns a path through state space x_0, …, x_3.
27
Applications of HMMs in NLP. Speech recognition: hidden phones (e.g., ah, eh, ee, th, r) and observed, noisy acoustic features (produced by signal processing).
28
Phone Observation Models. [Figure: Phone_t → Features_t; signal processing turns the raw audio into a feature vector such as (24, 13, 3, 59)] The observation model is defined to be robust to variations in accent, speed, pitch, and noise.
29
Phone Transition Models. [Figure: Phone_t → Phone_{t+1}, each emitting Features_t] Good models will capture (among other things): pronunciation of words, subphone structure, and coarticulation effects. Triphone models correspond to an order-3 Markov chain.
30
Word Segmentation. Words run together when pronounced. N-gram models: unigrams P(w_i), bigrams P(w_i | w_{i-1}), trigrams P(w_i | w_{i-1}, w_{i-2}). Random 20-word samples from R&N using n-gram models:
“logical are as confusion a may right tries agent goal the was diesel more object then information-gathering search is”
“planning purely diagnostic expert systems are very similar computational approach would be represented compactly using tic tac toe a predicate”
“planning and scheduling are integrated the success of naïve bayes model is just a possible prior source by that time”
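A minimal sketch of estimating and sampling a bigram model; the toy corpus is made up for illustration and stands in for real training text:

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for real training text.
corpus = "the agent tries the goal the agent tries a search the search tries the goal".split()

# Count bigrams and estimate P(w_i | w_{i-1}) by relative frequency.
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def sample_next(prev):
    counts = bigram_counts[prev]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short random sample, in the spirit of the R&N examples.
w = "the"
sample = [w]
for _ in range(10):
    w = sample_next(w)
    sample.append(w)
print(" ".join(sample))
```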
31
What about models with many variables? Say X has n binary variables and O has m binary variables. Naively, a distribution over X_t may be intractable to represent (2^n entries), transition models P(X_t | X_{t-1}) require 2^{2n} entries, and observation models P(O_t | X_t) require 2^{n+m} entries. Is there a better way?
32
Example: Failure Detection. Consider a battery meter sensor. Battery = the true level of the battery; BMeter = the sensor reading. Transient failures: the sensor sends garbage at time t. Persistent failures: the sensor sends garbage forever.
33
Example: Failure Detection. Consider a battery meter sensor. Battery = the true level of the battery; BMeter = the sensor reading. Transient failure: the sensor sends garbage at time t (e.g., readings 5 5 5 5 5 0 0 5 5 5 …). Persistent failure: the sensor is broken (e.g., readings 5 5 5 5 5 0 0 0 0 0 …).
34
Dynamic Bayesian Network. A template model relates variables on the prior time step to the next time step (a 2-TBN). “Unrolling” the template for all t gives the ground Bayesian network. [Figure: Battery_{t-1} → Battery_t → BMeter_t, with BMeter_t ~ N(Battery_t, σ²)]
35
Dynamic Bayesian Network: transient failure model. [Figure: the same 2-TBN] BMeter_t ~ N(Battery_t, σ²), with P(BMeter_t = 0 | Battery_t = 5) = 0.03.
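A minimal sketch of what this transient-failure observation model could look like as a likelihood function; the noise level σ and the mixture form are assumptions for illustration, not the slide's exact parameters:

```python
import math

SIGMA = 0.3          # assumed Gaussian sensor noise
P_TRANSIENT = 0.03   # P(BMeter_t = 0 | Battery_t = 5), the transient-failure probability

def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def meter_likelihood(reading, battery):
    """P(BMeter_t = reading | Battery_t = battery): mostly Gaussian around the true
    level, but with a small probability of a garbage reading of 0."""
    if reading == 0:
        return P_TRANSIENT + (1 - P_TRANSIENT) * gaussian_pdf(0.0, battery, SIGMA)
    return (1 - P_TRANSIENT) * gaussian_pdf(reading, battery, SIGMA)

print(meter_likelihood(5.0, 5.0))   # high likelihood: reading matches the true level
print(meter_likelihood(0.0, 5.0))   # small but non-negligible: transient failure
```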
36
Results on Transient Failure. [Plot: E(Battery_t) over time, with and without the transient-failure model, when the meter reads 5 5 5 5 5 0 0 5 5 5 5 …; the transient failure occurs where the readings drop to 0.]
37
Results on Persistent Failure. [Plot: E(Battery_t) over time under the transient-failure model only, when the meter reads 5 5 5 5 5 0 0 0 0 0 …; the persistent failure occurs where the readings stay at 0.]
38
Persistent Failure Model. [Figure: 2-TBN with Battery_{t-1} → Battery_t and Broken_{t-1} → Broken_t, where both Battery_t and Broken_t are parents of BMeter_t] As before, BMeter_t ~ N(Battery_t, σ²) with P(BMeter_t = 0 | Battery_t = 5) = 0.03, and additionally P(BMeter_t = 0 | Broken_t) = 1.
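A minimal sketch of extending the transient model above with a persistent Broken state; the break probability is an assumption for illustration, and meter_likelihood is the function from the previous sketch:

```python
P_BREAK = 0.001   # assumed P(Broken_t = true | Broken_{t-1} = false)

def meter_likelihood_persistent(reading, battery, broken):
    """P(BMeter_t = reading | Battery_t, Broken_t): a broken sensor always reads 0."""
    if broken:
        return 1.0 if reading == 0 else 0.0
    return meter_likelihood(reading, battery)   # fall back to the transient model

def broken_transition(broken_prev):
    """P(Broken_t = true | Broken_{t-1}): once broken, stays broken."""
    return 1.0 if broken_prev else P_BREAK
```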
39
Results on Persistent Failure. [Plot: E(Battery_t) over time, comparing the transient-failure model with the persistent-failure model, when the meter reads 5 5 5 5 5 0 0 0 0 0 …; the persistent failure occurs where the readings stay at 0.]
40
How to perform inference on a DBN? Exact inference on the “unrolled” BN, e.g., variable elimination. Typical elimination order: eliminate time steps sequentially so that the full network is never actually constructed; the unrolling is done only implicitly. [Figure: unrolled network over BM_t, Ba_t, Br_t for t = 0, …, 4]
41
Entanglement Problem. After n time steps, all n variables in the belief state become dependent, unless the 2-TBN can be partitioned into disjoint subsets (which is rare). The sparsity structure is lost.
42
Approximate Inference in DBNs: limited-history updates, assumed factorization of the belief state, particle filtering.
43
Independent Factorization. Idea: assume the belief state P(X_t) factors across individual attributes, P(X_t) = P(X_{1,t}) · … · P(X_{n,t}). Filtering: only maintain the factored distributions P(X_{1,t} | O_{1:t}), …, P(X_{n,t} | O_{1:t}). Filtering update: P(X_{k,t} | O_{1:t}) = Σ_{x_{t-1}} P(X_{k,t} | O_t, x_{t-1}) P(x_{t-1} | O_{1:t-1}), a marginal probability query over the 2-TBN (see the sketch below). [Figure: 2-TBN with X_{1,t-1}, …, X_{n,t-1} feeding X_{1,t}, …, X_{n,t}, which emit O_{1,t}, …, O_{m,t}]
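A minimal sketch of a factored filtering update, assuming a made-up 2-TBN with two binary state variables and one binary observation; after each step the joint is projected back onto independent marginals:

```python
import itertools
import numpy as np

# Hypothetical 2-TBN: P(X1_t | X1_{t-1}, X2_{t-1}), P(X2_t | X2_{t-1}), P(O_t | X1_t).
p_x1 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9}  # P(X1_t = 1 | x1, x2)
p_x2 = {0: 0.2, 1: 0.8}                                      # P(X2_t = 1 | x2)
p_obs = {0: 0.1, 1: 0.7}                                     # P(O_t = 1 | x1_t)

def factored_filter_step(b1, b2, obs):
    """One assumed-factorization update: keep only the marginals of X1_t and X2_t."""
    m1 = np.zeros(2)   # unnormalized marginal over X1_t
    m2 = np.zeros(2)   # unnormalized marginal over X2_t
    for x1p, x2p, x1, x2 in itertools.product([0, 1], repeat=4):
        # joint weight under the factored previous belief and the 2-TBN
        prior = b1[x1p] * b2[x2p]
        trans = ((p_x1[(x1p, x2p)] if x1 else 1 - p_x1[(x1p, x2p)])
                 * (p_x2[x2p] if x2 else 1 - p_x2[x2p]))
        like = p_obs[x1] if obs else 1 - p_obs[x1]
        w = prior * trans * like
        m1[x1] += w
        m2[x2] += w
    return m1 / m1.sum(), m2 / m2.sum()   # project back to independent marginals

b1, b2 = np.array([0.5, 0.5]), np.array([0.5, 0.5])
for t, o in enumerate([1, 1, 0], start=1):
    b1, b2 = factored_filter_step(b1, b2, o)
    print(f"t={t}: P(X1_t=1|o_1:t) ≈ {b1[1]:.3f}, P(X2_t=1|o_1:t) ≈ {b2[1]:.3f}")
```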
44
Next Time. Viterbi algorithm (read K&F 13.2 for some context); Kalman and particle filtering (read K&F 15.3-15.4).