1
CS B553: Algorithms for Optimization and Learning. Temporal Sequences: Hidden Markov Models and Dynamic Bayesian Networks
2
Motivation. Observing a stream of data: monitoring (of people, computer systems, etc.), surveillance and tracking, finance and economics, science. Questions: modeling and forecasting; handling unobserved variables.
3
Time Series Modeling. Time occurs in steps t = 0, 1, 2, …; a time step can be seconds, days, years, etc. State variable X_t, t = 0, 1, 2, …. For partially observed problems, we see observations O_t, t = 1, 2, … and do not see the X's; the X's are hidden variables (aka latent variables).
4
Modeling Time. The arrow of time and causality (causes precede effects) make Bayesian networks natural models of time series.
5
Probabilistic Modeling. For now, assume the fully observable case. What parents should each X_t have? [Figure: candidate Bayesian network structures over X_0, X_1, X_2, X_3]
6
Markov Assumption. Assume X_{t+k} is independent of all X_i for i < t: P(X_{t+k} | X_0, …, X_{t+k-1}) = P(X_{t+k} | X_t, …, X_{t+k-1}). This defines a k-th order Markov chain. [Figure: chains over X_0, …, X_3 of order 0, 1, 2, and 3]
7
1st-Order Markov Chain. MCs of order k > 1 can be converted into a 1st-order MC on the augmented variable Y_t = (X_t, …, X_{t+k-1}). So, without loss of generality, “MC” refers to a 1st-order MC; a small conversion example follows. [Figure: k-th order chain over X_0, …, X_4 and the equivalent 1st-order chain over Y_0, …, Y_3]
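As a concrete illustration, here is a minimal sketch of the conversion, assuming a binary state and a made-up 2nd-order transition table; the augmented variable Y_t = (X_{t-1}, X_t) turns the 2nd-order chain into a 1st-order one.

```python
import itertools
import numpy as np

# Hypothetical 2nd-order transition model for a binary state:
# p2[(x_prev2, x_prev1)] = [P(X_t=0 | ...), P(X_t=1 | ...)]
p2 = {
    (0, 0): [0.9, 0.1],
    (0, 1): [0.4, 0.6],
    (1, 0): [0.5, 0.5],
    (1, 1): [0.2, 0.8],
}

# Augmented state Y_t = (X_{t-1}, X_t); enumerate its 4 values.
states = list(itertools.product([0, 1], repeat=2))

# 1st-order transition matrix T[j, i] = P(Y_t = states[j] | Y_{t-1} = states[i]).
T = np.zeros((4, 4))
for i, (a, b) in enumerate(states):       # previous augmented state (X_{t-2}, X_{t-1})
    for j, (c, d) in enumerate(states):   # next augmented state (X_{t-1}, X_t)
        if c == b:                        # the shared X_{t-1} must agree
            T[j, i] = p2[(a, b)][d]

print(T)  # each column sums to 1, so this is a valid 1st-order chain on Y
```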
8
Inference in MC. What independence relationships can we read from the BN? [Figure: chain X_0 → X_1 → X_2 → X_3] Observing X_1 makes X_0 independent of X_2, X_3, …. P(X_t | X_{t-1}) is known as the transition model.
9
Inference in MC. Prediction: what is the probability of a future state?
P(X_t) = Σ_{x_0,…,x_{t-1}} P(X_0, …, X_t) = Σ_{x_0,…,x_{t-1}} P(x_0) Π_{i=1..t} P(x_i | x_{i-1}) = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1})
Approach: maintain a belief state b_t(X) = P(X_t) and use the above equation to advance to b_{t+1}(X) (a recursive approach). This is equivalent to the VE algorithm run in sequential order.
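A minimal sketch of this recursive belief-state update, assuming a small discrete state space and a made-up transition matrix:

```python
import numpy as np

# Hypothetical 3-state chain; T[j, i] = P(X_t = j | X_{t-1} = i), columns sum to 1.
T = np.array([[0.8, 0.3, 0.1],
              [0.1, 0.5, 0.3],
              [0.1, 0.2, 0.6]])

b = np.array([1.0, 0.0, 0.0])   # b_0(X) = P(X_0): start in state 0 with certainty

# Advance the belief state: b_{t+1}(x') = sum_x P(X_{t+1}=x' | X_t=x) b_t(x)
for t in range(5):
    b = T @ b
    print(f"P(X_{t+1}) =", np.round(b, 3))
```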
11
Belief State Evolution. P(X_t) = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1}). The belief “blurs” over time and (typically) approaches a stationary distribution as t grows, so prediction power is limited. The rate of blurring is known as the mixing time.
12
Stationary Distributions. For discrete variables with Val(X) = {1, …, n}: the transition matrix has entries T_ij = P(X_t = i | X_{t-1} = j), and the belief b_t(X) is just a vector with entries b_{t,i} = P(X_t = i). Belief update equation: b_{t+1} = T b_t. A stationary distribution b is one for which b = Tb, i.e., b is an eigenvector of T with eigenvalue 1, i.e., b lies in the null space of (T - I).
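A minimal sketch of finding a stationary distribution numerically, using the hypothetical 3-state transition matrix from the sketch above; taking the eigenvector for the eigenvalue closest to 1 is one way to obtain a vector in the null space of (T - I).

```python
import numpy as np

T = np.array([[0.8, 0.3, 0.1],
              [0.1, 0.5, 0.3],
              [0.1, 0.2, 0.6]])

# Eigenvector of T with eigenvalue 1 = vector in the null space of (T - I).
eigvals, eigvecs = np.linalg.eig(T)
k = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
b_stat = np.real(eigvecs[:, k])
b_stat = b_stat / b_stat.sum()         # normalize to a probability distribution

print("stationary b =", np.round(b_stat, 3))
print("check Tb = b:", np.allclose(T @ b_stat, b_stat))
```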
13
History Dependence. In Markov models, the state must be chosen so that the future is independent of the history given the current state. Often this requires adding variables that cannot be directly observed. Examples: Are these people walking toward you or away from you? What comes next after “the bare …”? (minimum, essentials, market, wipes himself with the rabbit)
14
Partial Observability. Hidden Markov Model (HMM). [Figure: hidden state chain X_0 → X_1 → X_2 → X_3, with observed variables O_1, O_2, O_3 attached to X_1, X_2, X_3] Hidden state variables: the X_t. Observed variables: the O_t. P(O_t | X_t) is called the observation model (or sensor model).
15
Inference in HMMs: filtering, prediction, smoothing (aka hindsight), and most likely explanation. [Figure: HMM over X_0, …, X_3 with observations O_1, …, O_3]
16
Inference in HMMs: filtering, prediction, smoothing (aka hindsight), and most likely explanation. [Figure: filtering, with the current state X_2 as the query variable given O_1, O_2]
17
Filtering. The name comes from signal processing.
P(X_t | o_{1:t}) = Σ_{x_{t-1}} P(x_{t-1} | o_{1:t-1}) P(X_t | x_{t-1}, o_t)
P(X_t | x_{t-1}, o_t) = P(o_t | x_{t-1}, X_t) P(X_t | x_{t-1}) / P(o_t | x_{t-1}) ∝ P(o_t | X_t) P(X_t | x_{t-1})
[Figure: HMM with the query variable X_2 given O_1, O_2]
18
Filtering. Combining the two gives the forward recursion:
P(X_t | o_{1:t}) ∝ Σ_{x_{t-1}} P(x_{t-1} | o_{1:t-1}) P(o_t | X_t) P(X_t | x_{t-1})
If we keep track of the belief state b_t(X) = P(X_t | o_{1:t}), each time step requires only O(|Val(X)|^2) update work. [Figure: HMM with the query variable X_2 given O_1, O_2]
19
Predict-Update Interpretation. Given the old belief state b_{t-1}(X):
Predict: first compute the MC update b'_t(X_t) = P(X_t | o_{1:t-1}) = Σ_x b_{t-1}(x) P(X_t | X_{t-1} = x).
Update: re-weight to account for the observation probabilities, b_t(x) ∝ b'_t(x) P(o_t | X_t = x), and normalize.
[Figure: HMM with the query variable X_2 given O_1, O_2]
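A minimal sketch of the predict-update filter, assuming the 3-state transition matrix from the earlier sketch, a made-up observation model, and a made-up observation sequence:

```python
import numpy as np

# Hypothetical HMM: T[j, i] = P(X_t=j | X_{t-1}=i); O[k, j] = P(O_t=k | X_t=j).
T = np.array([[0.8, 0.3, 0.1],
              [0.1, 0.5, 0.3],
              [0.1, 0.2, 0.6]])
O = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.8, 0.9]])   # two possible observation values

def filter_step(b_prev, obs):
    """One predict-update step: returns b_t(X) = P(X_t | o_{1:t})."""
    b_pred = T @ b_prev   # predict: P(X_t | o_{1:t-1})
    b = O[obs] * b_pred   # update: re-weight by P(o_t | X_t)
    return b / b.sum()    # normalize

b = np.array([1.0, 0.0, 0.0])                      # prior belief P(X_0)
for t, obs in enumerate([0, 0, 1, 1], start=1):    # made-up observation sequence
    b = filter_step(b, obs)
    print(f"P(X_{t} | o_1:{t}) =", np.round(b, 3))
```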
20
Inference in HMMs: filtering, prediction, smoothing (aka hindsight), and most likely explanation. [Figure: prediction, with a query variable beyond the last observation]
21
Prediction. P(X_{t+k} | o_{1:t}) in two steps: compute P(X_t | o_{1:t}), then P(X_{t+k} | X_t). That is, filter to time t, then predict as with a standard MC (see the sketch below). [Figure: HMM with a future state as the query variable]
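A minimal sketch of k-step prediction, assuming the same hypothetical 3-state chain and an assumed filtered belief vector:

```python
import numpy as np

def predict_k(b_filtered, T, k):
    """P(X_{t+k} | o_{1:t}): advance the filtered belief k steps with no new observations."""
    b = b_filtered
    for _ in range(k):
        b = T @ b
    return b

T = np.array([[0.8, 0.3, 0.1],
              [0.1, 0.5, 0.3],
              [0.1, 0.2, 0.6]])
b_t = np.array([0.7, 0.2, 0.1])   # assumed P(X_t | o_{1:t}) produced by the filter
print(np.round(predict_k(b_t, T, 3), 3))
```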
22
Inference in HMMs: filtering, prediction, smoothing (aka hindsight), and most likely explanation. [Figure: smoothing, with a past state as the query variable]
23
Smoothing. P(X_k | o_{1:t}) for k < t.
P(X_k | o_{1:k}, o_{k+1:t}) = P(o_{k+1:t} | X_k, o_{1:k}) P(X_k | o_{1:k}) / P(o_{k+1:t} | o_{1:k}) ∝ P(o_{k+1:t} | X_k) P(X_k | o_{1:k})
The second factor is obtained by standard filtering up to time k. [Figure: HMM with the past state X_1 as the query variable]
24
Smoothing. Computing P(o_{k+1:t} | X_k):
P(o_{k+1:t} | X_k) = Σ_{x_{k+1}} P(o_{k+1:t} | X_k, x_{k+1}) P(x_{k+1} | X_k)
= Σ_{x_{k+1}} P(o_{k+1:t} | x_{k+1}) P(x_{k+1} | X_k)
= Σ_{x_{k+1}} P(o_{k+2:t} | x_{k+1}) P(o_{k+1} | x_{k+1}) P(x_{k+1} | X_k)
This is the backward recursion: given the current state, what is the probability of the remaining observation sequence? [Figure: HMM with a past state as the query variable]
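A minimal sketch of the backward recursion and a smoothed estimate, continuing the filtering sketch above (it reuses the hypothetical T, O, and filter_step defined there):

```python
def backward_messages(observations):
    """m[k] = P(o_{k+1:t} | X_k) for k = 0, ..., t, computed right to left."""
    t = len(observations)
    m = [None] * (t + 1)
    m[t] = np.ones(3)                        # empty future: probability 1
    for k in range(t - 1, -1, -1):
        o_next = observations[k]             # o_{k+1} (the list is 0-indexed)
        m[k] = T.T @ (O[o_next] * m[k + 1])  # sum over x_{k+1}
    return m

obs_seq = [0, 0, 1, 1]
m = backward_messages(obs_seq)

# Smoothed estimate P(X_k | o_{1:t}) ∝ P(X_k | o_{1:k}) * P(o_{k+1:t} | X_k)
b = np.array([1.0, 0.0, 0.0])
for k in range(1, len(obs_seq) + 1):
    b = filter_step(b, obs_seq[k - 1])       # forward: P(X_k | o_{1:k})
    smoothed = b * m[k]
    print(f"P(X_{k} | o_1:4) =", np.round(smoothed / smoothed.sum(), 3))
```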
25
Interpretation. Filtering/prediction is equivalent to forward variable elimination / belief propagation. Smoothing is equivalent to forward VE/BP up to the query variable, followed by backward VE/BP from the last observation back to the query variable. Running BP to completion gives the smoothed estimates for all variables (the forward-backward algorithm).
26
Inference in HMMs: filtering, prediction, smoothing (aka hindsight), and most likely explanation (the subject of the next lecture). [Figure: HMM over X_0, …, X_3 with observations O_1, …, O_3] The most-likely-explanation query returns a path through state space x_0, …, x_3.
27
Applications of HMMs in NLP. Speech recognition: hidden phones (e.g., ah, eh, ee, th, r) and observed, noisy acoustic features (produced by signal processing).
28
Phone Observation Models. [Figure: Phone_t → Features_t; signal processing turns the raw audio into a feature vector such as (24, 13, 3, 59)] The observation model is defined to be robust to variations in accent, speed, pitch, and noise.
29
Phone Transition Models. [Figure: Phone_t → Phone_{t+1}, each emitting Features_t] Good models will capture (among other things): pronunciation of words, subphone structure, and coarticulation effects. Triphone models correspond to an order-3 Markov chain.
30
Word Segmentation. Words run together when pronounced. N-gram models: unigrams P(w_i), bigrams P(w_i | w_{i-1}), trigrams P(w_i | w_{i-1}, w_{i-2}). Random 20-word samples from R&N using n-gram models:
“logical are as confusion a may right tries agent goal the was diesel more object then information-gathering search is”
“planning purely diagnostic expert systems are very similar computational approach would be represented compactly using tic tac toe a predicate”
“planning and scheduling are integrated the success of naïve bayes model is just a possible prior source by that time”
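A minimal sketch of estimating and sampling a bigram model; the toy corpus is made up for illustration and stands in for real training text:

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for real training text.
corpus = "the agent tries the goal the agent tries a search the search tries the goal".split()

# Count bigrams and estimate P(w_i | w_{i-1}) by relative frequency.
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def sample_next(prev):
    counts = bigram_counts[prev]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short random sample, in the spirit of the R&N examples.
w = "the"
sample = [w]
for _ in range(10):
    w = sample_next(w)
    sample.append(w)
print(" ".join(sample))
```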
31
What about models with many variables? Say X has n binary variables and O has m binary variables. Naively, a distribution over X_t may be intractable to represent (2^n entries), transition models P(X_t | X_{t-1}) require 2^{2n} entries, and observation models P(O_t | X_t) require 2^{n+m} entries. Is there a better way?
32
Example: Failure Detection. Consider a battery meter sensor. Battery = the true level of the battery; BMeter = the sensor reading. Transient failures: the sensor sends garbage at time t. Persistent failures: the sensor sends garbage forever.
33
Example: Failure Detection. Consider a battery meter sensor. Battery = the true level of the battery; BMeter = the sensor reading. Transient failure: the sensor sends garbage at time t (e.g., readings 5 5 5 5 5 0 0 5 5 5 …). Persistent failure: the sensor is broken (e.g., readings 5 5 5 5 5 0 0 0 0 0 …).
34
Dynamic Bayesian Network. A template model relates variables on the prior time step to the next time step (a 2-TBN). “Unrolling” the template for all t gives the ground Bayesian network. [Figure: Battery_{t-1} → Battery_t → BMeter_t, with BMeter_t ~ N(Battery_t, σ²)]
35
Dynamic Bayesian Network: transient failure model. [Figure: the same 2-TBN] BMeter_t ~ N(Battery_t, σ²), with P(BMeter_t = 0 | Battery_t = 5) = 0.03.
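A minimal sketch of what this transient-failure observation model could look like as a likelihood function; the noise level σ and the mixture form are assumptions for illustration, not the slide's exact parameters:

```python
import math

SIGMA = 0.3          # assumed Gaussian sensor noise
P_TRANSIENT = 0.03   # P(BMeter_t = 0 | Battery_t = 5), the transient-failure probability

def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def meter_likelihood(reading, battery):
    """P(BMeter_t = reading | Battery_t = battery): mostly Gaussian around the true
    level, but with a small probability of a garbage reading of 0."""
    if reading == 0:
        return P_TRANSIENT + (1 - P_TRANSIENT) * gaussian_pdf(0.0, battery, SIGMA)
    return (1 - P_TRANSIENT) * gaussian_pdf(reading, battery, SIGMA)

print(meter_likelihood(5.0, 5.0))   # high likelihood: reading matches the true level
print(meter_likelihood(0.0, 5.0))   # small but non-negligible: transient failure
```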
36
Results on Transient Failure. [Plot: E(Battery_t) over time, with and without the transient-failure model, when the meter reads 5 5 5 5 5 0 0 5 5 5 5 …; the transient failure occurs where the readings drop to 0.]
37
Results on Persistent Failure. [Plot: E(Battery_t) over time under the transient-failure model only, when the meter reads 5 5 5 5 5 0 0 0 0 0 …; the persistent failure occurs where the readings stay at 0.]
38
Persistent Failure Model. [Figure: 2-TBN with Battery_{t-1} → Battery_t and Broken_{t-1} → Broken_t, where both Battery_t and Broken_t are parents of BMeter_t] As before, BMeter_t ~ N(Battery_t, σ²) with P(BMeter_t = 0 | Battery_t = 5) = 0.03, and additionally P(BMeter_t = 0 | Broken_t) = 1.
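A minimal sketch of extending the transient model above with a persistent Broken state; the break probability is an assumption for illustration, and meter_likelihood is the function from the previous sketch:

```python
P_BREAK = 0.001   # assumed P(Broken_t = true | Broken_{t-1} = false)

def meter_likelihood_persistent(reading, battery, broken):
    """P(BMeter_t = reading | Battery_t, Broken_t): a broken sensor always reads 0."""
    if broken:
        return 1.0 if reading == 0 else 0.0
    return meter_likelihood(reading, battery)   # fall back to the transient model

def broken_transition(broken_prev):
    """P(Broken_t = true | Broken_{t-1}): once broken, stays broken."""
    return 1.0 if broken_prev else P_BREAK
```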
39
Results on Persistent Failure. [Plot: E(Battery_t) over time, comparing the transient-failure model with the persistent-failure model, when the meter reads 5 5 5 5 5 0 0 0 0 0 …; the persistent failure occurs where the readings stay at 0.]
40
How to perform inference on a DBN? Exact inference on the “unrolled” BN, e.g., variable elimination. Typical elimination order: eliminate time steps sequentially so that the full network is never actually constructed; the unrolling is done only implicitly. [Figure: unrolled network over BM_t, Ba_t, Br_t for t = 0, …, 4]
41
Entanglement Problem. After n time steps, all n variables in the belief state become dependent, unless the 2-TBN can be partitioned into disjoint subsets (which is rare). The sparsity structure is lost.
42
Approximate Inference in DBNs: limited-history updates, assumed factorization of the belief state, particle filtering.
43
Independent Factorization. Idea: assume the belief state P(X_t) factors across individual attributes, P(X_t) = P(X_{1,t}) · … · P(X_{n,t}). Filtering: only maintain the factored distributions P(X_{1,t} | O_{1:t}), …, P(X_{n,t} | O_{1:t}). Filtering update: P(X_{k,t} | O_{1:t}) = Σ_{x_{t-1}} P(X_{k,t} | O_t, x_{t-1}) P(x_{t-1} | O_{1:t-1}), a marginal probability query over the 2-TBN (see the sketch below). [Figure: 2-TBN with X_{1,t-1}, …, X_{n,t-1} feeding X_{1,t}, …, X_{n,t}, which emit O_{1,t}, …, O_{m,t}]
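A minimal sketch of a factored filtering update, assuming a made-up 2-TBN with two binary state variables and one binary observation; after each step the joint is projected back onto independent marginals:

```python
import itertools
import numpy as np

# Hypothetical 2-TBN: P(X1_t | X1_{t-1}, X2_{t-1}), P(X2_t | X2_{t-1}), P(O_t | X1_t).
p_x1 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9}  # P(X1_t = 1 | x1, x2)
p_x2 = {0: 0.2, 1: 0.8}                                      # P(X2_t = 1 | x2)
p_obs = {0: 0.1, 1: 0.7}                                     # P(O_t = 1 | x1_t)

def factored_filter_step(b1, b2, obs):
    """One assumed-factorization update: keep only the marginals of X1_t and X2_t."""
    m1 = np.zeros(2)   # unnormalized marginal over X1_t
    m2 = np.zeros(2)   # unnormalized marginal over X2_t
    for x1p, x2p, x1, x2 in itertools.product([0, 1], repeat=4):
        # joint weight under the factored previous belief and the 2-TBN
        prior = b1[x1p] * b2[x2p]
        trans = ((p_x1[(x1p, x2p)] if x1 else 1 - p_x1[(x1p, x2p)])
                 * (p_x2[x2p] if x2 else 1 - p_x2[x2p]))
        like = p_obs[x1] if obs else 1 - p_obs[x1]
        w = prior * trans * like
        m1[x1] += w
        m2[x2] += w
    return m1 / m1.sum(), m2 / m2.sum()   # project back to independent marginals

b1, b2 = np.array([0.5, 0.5]), np.array([0.5, 0.5])
for t, o in enumerate([1, 1, 0], start=1):
    b1, b2 = factored_filter_step(b1, b2, o)
    print(f"t={t}: P(X1_t=1|o_1:t) ≈ {b1[1]:.3f}, P(X2_t=1|o_1:t) ≈ {b2[1]:.3f}")
```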
44
Next Time. Viterbi algorithm (read K&F 13.2 for some context); Kalman and particle filtering (read K&F 15.3-15.4).