Hidden Markov chain models (state space models)

General problem: we have a set of noisy measurements of a process, i.e. two components:
- A real-world process.
- Measurements that give some information about the state of the process at a given time.

Assumptions:
- The process is Markovian (the previous state summarizes all relevant history).
- The observational noise is independent.

[Diagram: hidden process X1 → X2 → … → Xk → Xk+1 → … → Xn (this is what we're interested in: the process), each state Xk emitting an observation Zk (this is the data we've got: Z1, Z2, …, Zn).]
Hidden Markov model specification

We have two components in a hidden Markov chain model:
- A hidden process which is Markovian:
  f(X_1, X_2, …, X_{k-1}, X_k, …, X_n) = f(X_1) ∏_{k=2}^{n} f(X_k | X_{k-1})
  For each time point, what is needed is the dependency of the process state on the previous state: f(X_k | X_{k-1}). This distribution contains all process-related parameters and is called the system equation. (Dependency on process parameters has been suppressed in the notation.)
- The dependency of each measurement on the process state at the given time: f(Z_k | X_k). This is called the observational equation and contains all the observation-related parameters.

PS: Both process and measurements can be vectors. The number of components in the measurements can vary and does not need to match the number of components in the process!

[Diagram: hidden process X1 → X2 → … → Xn (the process we're interested in), with observations Z1, Z2, …, Zn (the data we've got).]
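The two components can be made concrete by simulating from them. Below is a minimal sketch of a scalar linear-Gaussian example, where the system equation is a hypothetical AR(1) step and the observational equation adds independent noise; all numerical values (F, Q, H, R, the series length) are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100
F, Q = 0.9, 1.0   # system coefficient and process-noise variance (assumed values)
H, R = 1.0, 0.5   # observation coefficient and noise variance (assumed values)

x = np.zeros(n)   # the hidden process X_1..X_n
z = np.zeros(n)   # the observations Z_1..Z_n

x[0] = rng.normal(0.0, np.sqrt(Q))                       # initial state f(X_1)
z[0] = H * x[0] + rng.normal(0.0, np.sqrt(R))            # f(Z_1 | X_1)
for k in range(1, n):
    x[k] = F * x[k - 1] + rng.normal(0.0, np.sqrt(Q))    # system equation f(X_k | X_{k-1})
    z[k] = H * x[k] + rng.normal(0.0, np.sqrt(R))        # observational equation f(Z_k | X_k)
```

Note how each draw of x[k] uses only x[k-1] (the Markov assumption) and each z[k] uses only x[k] (independent observational noise).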
Hidden Markov model specification

What we want from the process:
- Inference about the state of the process at a given time.
- The likelihood! If we have the likelihood, we can do inference on the parameters of the process (and the observations), as well as model selection!

L(θ) ≡ f(Z_1, Z_2, …, Z_{k-1}, Z_k, …, Z_n) = f(Z_1) f(Z_2 | Z_1) f(Z_3 | Z_1, Z_2) ⋯ f(Z_k | Z_1, Z_2, …, Z_{k-1}) ⋯ f(Z_n | Z_1, Z_2, …, Z_{n-1})

PS: This decomposition is completely general!

[Diagram: hidden process X1 → X2 → … → Xn (the process we're interested in), with observations Z1, Z2, …, Zn (the data we've got).]
Doing inference on hidden time series models – filtering

[Diagram: hidden process X1 → X2 → … → Xn (process), with observations Z1, Z2, …, Zn (data).]

Procedure: go stepwise through the observations from 1 to n. For observation k-1, assume you have the state inference conditioned on all data up to that observation: f(X_{k-1} | Z_{k-1}, …, Z_1). Then:
i. Predict the next system state: use the system equation and the law of total probability to calculate f(X_k | Z_{k-1}, …, Z_1).
ii. Predict the next observation: use the observational equation and the law of total probability to get f(Z_k | Z_{k-1}, …, Z_1).
iii. Update the process state with the current observation: use Bayes' formula to calculate f(X_k | Z_k, Z_{k-1}, …, Z_1). (You need f(X_k | Z_{k-1}, …, Z_1), f(Z_k | Z_{k-1}, …, Z_1) and the observational equation f(Z_k | X_k) to do this.)
iv. Go back to step i, now for X_{k+1}.

Likelihood: L = f(Z_1, …, Z_n | θ) = f(Z_1) f(Z_2 | Z_1) ⋯ f(Z_k | Z_{k-1}, …, Z_1) ⋯ f(Z_n | Z_{n-1}, …, Z_1), where each factor is produced by step ii.
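For a discrete state space, the predict/update steps above can be carried out exactly with sums. The sketch below implements the procedure for a hypothetical two-state HMM; the transition matrix, emission probabilities, and observation sequence are illustrative assumptions.

```python
import numpy as np

def forward_filter(T, E, p0, obs):
    """Stepwise filtering for a discrete-state hidden Markov model.

    T[i, j] = P(X_k = j | X_{k-1} = i)   (system equation)
    E[i, o] = P(Z_k = o | X_k = i)       (observational equation)
    p0      = distribution of X_1
    Returns the filtered distributions f(X_k | Z_1..Z_k) and the log-likelihood.
    """
    filt = np.zeros((len(obs), len(p0)))
    loglik = 0.0
    pred = p0                       # f(X_1) plays the role of the first prediction
    for k, o in enumerate(obs):
        joint = pred * E[:, o]      # f(X_k, Z_k | Z_1..Z_{k-1})
        f_z = joint.sum()           # step ii: f(Z_k | Z_1..Z_{k-1})
        loglik += np.log(f_z)       # accumulate the likelihood factors
        filt[k] = joint / f_z       # step iii: Bayes' formula (update)
        pred = filt[k] @ T          # step i: law of total probability (predict)
    return filt, loglik

# Toy two-state example (all numbers are illustrative assumptions)
T = np.array([[0.95, 0.05], [0.10, 0.90]])
E = np.array([[0.8, 0.2], [0.3, 0.7]])
p0 = np.array([0.5, 0.5])
filt, loglik = forward_filter(T, E, p0, [0, 0, 1, 1, 0])
```

Each pass through the loop performs exactly one predict/update cycle, and the likelihood falls out as a by-product of the observation-prediction step.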
The Kalman filter

Steps i and ii can usually not be performed analytically for continuous states and measurements. However, if the system and observational equations are both linear and normal, they can!

System equation: X_k = F_k X_{k-1} + u_k + ε_k, where ε_k ~ N(0, Q_k)
Observational equation: Z_k = H_k X_k + δ_k, where δ_k ~ N(0, R_k)

Since everything is normal, we only need to keep track of the mean and the variance. Let x_{k|k-1} and x_{k|k} denote the mean of the distribution of X_k given all measurements up to k-1 and k, respectively. Similarly, P_{k|k-1} and P_{k|k} denote the variance of X_k given all measurements up to k-1 and k, respectively. Let z_{k|k-1} and S_{k|k-1} denote the mean and variance of Z_k given all measurements up to k-1.

i. Predict system: x_{k|k-1} = F_k x_{k-1|k-1} + u_k,  P_{k|k-1} = F_k P_{k-1|k-1} F'_k + Q_k
ii. Predict observation: z_{k|k-1} = H_k x_{k|k-1},  S_{k|k-1} = H_k P_{k|k-1} H'_k + R_k
iii. Update with current observation: x_{k|k} = x_{k|k-1} + P_{k|k-1} H'_k S_{k|k-1}^{-1} (Z_k − z_{k|k-1}),  P_{k|k} = (I − P_{k|k-1} H'_k S_{k|k-1}^{-1} H_k) P_{k|k-1}
iv. Go back to step i, now for X_{k+1}.

Likelihood: L = ∏_{k=1}^{n} f_N(Z_k | mean = z_{k|k-1}, var = S_{k|k-1})
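The recursions above translate almost line for line into code. Below is a minimal NumPy sketch assuming time-constant F, H, Q, R, u (the slides allow them to vary with k); the toy usage at the end, a scalar random walk with unit noise, is an illustrative assumption.

```python
import numpy as np

def kalman_filter(zs, F, H, Q, R, u, x0, P0):
    """One forward pass of the Kalman filter, mirroring steps i-iii.

    zs: observations Z_1..Z_n (n x dim_z); x0, P0: mean/variance of X_0.
    Returns filtered means x_{k|k}, variances P_{k|k}, and the log-likelihood.
    """
    n, dim_x = len(zs), x0.shape[0]
    xs = np.zeros((n, dim_x))
    Ps = np.zeros((n, dim_x, dim_x))
    x, P, loglik = x0, P0, 0.0
    for k in range(n):
        # i. Predict system: x_{k|k-1}, P_{k|k-1}
        x_pred = F @ x + u
        P_pred = F @ P @ F.T + Q
        # ii. Predict observation: z_{k|k-1}, S_{k|k-1}
        z_pred = H @ x_pred
        S = H @ P_pred @ H.T + R
        # iii. Update with the current observation (gain K = P_{k|k-1} H' S^{-1})
        K = P_pred @ H.T @ np.linalg.inv(S)
        x = x_pred + K @ (zs[k] - z_pred)
        P = (np.eye(dim_x) - K @ H) @ P_pred
        xs[k], Ps[k] = x, P
        # Accumulate log f_N(Z_k | mean = z_{k|k-1}, var = S_{k|k-1})
        resid = zs[k] - z_pred
        loglik += -0.5 * (resid @ np.linalg.solve(S, resid)
                          + np.log(np.linalg.det(2 * np.pi * S)))
    return xs, Ps, loglik

# Toy usage: scalar random walk observed with noise (illustrative numbers)
rng = np.random.default_rng(1)
zs = rng.normal(size=(50, 1))
I = np.eye(1)
xs, Ps, loglik = kalman_filter(zs, I, I, I, I, np.zeros(1), np.zeros(1), I)
```

The log of the likelihood product is accumulated inside the loop, so maximizing it over the parameters (e.g. the entries of Q and R) gives maximum-likelihood inference directly.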
Kalman smoother

We might also be interested in inferring the state given *all* the measurements. This can also be done analytically for a linear normal system, using the Kalman smoother.

Start with x_{n|n} and P_{n|n}, i.e. the inference for the last state given all measurements. Run backwards, using Bayes' formula to incorporate the extra measurements step by step. Let C_k ≡ P_{k|k} F'_{k+1} P_{k+1|k}^{-1}. Then

x_{k|n} = x_{k|k} + C_k (x_{k+1|n} − u_{k+1} − F_{k+1} x_{k|k})
P_{k|n} = P_{k|k} + C_k (P_{k+1|n} − P_{k+1|k}) C'_k

(Note that u_{k+1} + F_{k+1} x_{k|k} is just the one-step prediction x_{k+1|k}, so the first correction term is C_k (x_{k+1|n} − x_{k+1|k}).) Go from k to k-1. Typically the uncertainty of this state inference will "bubble up" where you have bad/no observations and narrow down where you have good/many observations.
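The backward recursion can be sketched as follows (this is the Rauch-Tung-Striebel form of the smoother). For brevity it assumes time-constant F, Q, u, and the toy input at the end uses illustrative placeholder values for the filtered means and variances rather than output from a real filter run.

```python
import numpy as np

def rts_smoother(xs_filt, Ps_filt, F, Q, u):
    """Backward smoothing pass, given filtered x_{k|k}, P_{k|k}.

    Starts from x_{n|n}, P_{n|n} and recurses from k = n-1 down to 1.
    """
    n = len(xs_filt)
    xs = xs_filt.copy()   # xs[n-1] = x_{n|n} starts the recursion
    Ps = Ps_filt.copy()   # Ps[n-1] = P_{n|n}
    for k in range(n - 2, -1, -1):
        x_pred = F @ xs_filt[k] + u                 # x_{k+1|k}
        P_pred = F @ Ps_filt[k] @ F.T + Q           # P_{k+1|k}
        C = Ps_filt[k] @ F.T @ np.linalg.inv(P_pred)
        xs[k] = xs_filt[k] + C @ (xs[k + 1] - x_pred)
        Ps[k] = Ps_filt[k] + C @ (Ps[k + 1] - P_pred) @ C.T
    return xs, Ps

# Toy usage with placeholder filtered output (illustrative values only)
xs_f = np.zeros((5, 1))
Ps_f = np.tile(np.eye(1), (5, 1, 1))
xs_s, Ps_s = rts_smoother(xs_f, Ps_f, np.eye(1), np.eye(1), np.zeros(1))
```

Because P_{k+1|n} ≤ P_{k+1|k}, the smoothed variances are never larger than the filtered ones, which is the "narrowing down" behaviour described above.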