15-381 / 681 Artificial Intelligence: Representation and Problem Solving
Probabilistic Reasoning (4): Temporal Models
Instructors: Fei Fang (This Lecture) and Dave Touretzky
feifang@cmu.edu, Wean Hall 4126
12/8/2018
Recap
Probability Models and Probabilistic Inference
Bayes' Net
Exact Inference
Approximate Inference: Sampling
Today: Probabilistic reasoning over time
Outline
Temporal Probability Model
Hidden Markov Model (HMM)
Kalman Filter
Dynamic Bayes' Net (DBN)
Particle Filtering
Applications of DBN
Special Classes of DBN
Temporal Probabilistic Model
Why do we need temporal probabilistic models?
The world changes over time, and what happens now impacts what will happen in the future
Examples: stock market, weather
Sometimes the world state becomes clearer as more evidence is collected over time
Diagnosis, e.g., cold vs. chronic pharyngitis given coughing
How to model time? View the world as time slices: discrete time steps
Temporal Probabilistic Model
State variables X_t (often hidden)
State of the environment; not directly observable, but defines the causal dynamics
Evidence variables E_t
Caused by the state of the environment
How to model the example problems?
Stock market; weather; diagnosis, e.g., cold vs. chronic pharyngitis given coughing
Temporal Probabilistic Model
Transition model: how the world (i.e., the state of the environment, X_t) evolves
Generally P(X_t | X_{0:t-1})
Markov assumption: the current state depends on only a finite, fixed number of previous states
P(X_t | X_{0:t-1}) = P(X_t | X_{t-k:t-1})
First-order Markov process: the current state depends on only the previous state, not on earlier states
P(X_t | X_{0:t-1}) = P(X_t | X_{t-1})
Stationary assumption: P(X_t | X_{t-k:t-1}) = P(X_{t-1} | X_{t-k-1:t-2}), i.e., the transition model is the same for all t
Such a process is called a Markov process or Markov chain
Andrei Andreyevich Markov (1856-1922)
Temporal Probabilistic Model
Sensor model / observation model: how the evidence variables E_t get their values (assume we get observations starting from t = 1)
Generally P(E_t | X_{0:t}, E_{1:t-1})
Sensor Markov assumption: the evidence depends only on the current state
P(E_t | X_{0:t}, E_{1:t-1}) = P(E_t | X_t)
Initial state model: prior probability distribution at time 0, i.e., P(X_0)
For a first-order Markov process with the sensor Markov assumption, the full joint probability distribution is
P(X_{0:t}, E_{1:t}) = P(X_0) ∏_{i=1}^{t} P(X_i | X_{i-1}) P(E_i | X_i)
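The factorization above can be evaluated directly. The following sketch (not part of the original slides) uses the umbrella model numbers that appear later in the lecture; the function name `joint_prob` is ours:

```python
import numpy as np

# Umbrella model numbers from the slides (state order: 0 = rain, 1 = not rain)
P0 = np.array([0.5, 0.5])                  # P(X_0)
T = np.array([[0.7, 0.3], [0.3, 0.7]])     # T[i, j] = P(X_t = j | X_{t-1} = i)
O = np.array([[0.9, 0.1], [0.2, 0.8]])     # O[i, e] = P(E_t = e | X_t = i); e: 0 = umbrella, 1 = none

def joint_prob(states, evidence):
    """P(x_0, ..., x_t, e_1, ..., e_t) = P(x_0) * prod_i P(x_i | x_{i-1}) P(e_i | x_i)."""
    p = P0[states[0]]
    for i in range(1, len(states)):
        p *= T[states[i - 1], states[i]] * O[states[i], evidence[i - 1]]
    return p

# Example: X_0 = rain, X_1 = rain, E_1 = umbrella
print(joint_prob([0, 0], [0]))  # 0.5 * 0.7 * 0.9 = 0.315
```

Summing this joint over all state and evidence sequences of a given length yields 1, which is a quick sanity check of the factorization.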
Inference in Temporal Probabilistic Models
Filtering / state estimation: posterior distribution over the current state given all evidence so far, P(X_t | e_{1:t})
Prediction: posterior distribution over a future state given all evidence to date, P(X_{t+1} | e_{1:t})
Smoothing: posterior distribution of a past state given all evidence up to the present, P(X_k | e_{1:t})
Most likely explanation: argmax_{x_{1:t}} P(x_{1:t} | e_{1:t})
Learning: learn the transition and sensor models from observations (not covered)
Outline
Temporal Probability Model
Hidden Markov Model (HMM)
Kalman Filter
Dynamic Bayes' Net (DBN)
Applications of DBN
Special Classes of DBN
Hidden Markov Model (HMM)
A first-order Markov process that is stationary
Satisfies the sensor Markov assumption
A single discrete random variable X_t represents the state (hidden)
A single evidence variable E_t
Specified by P(X_0), P(X_t | X_{t-1}), and P(E_t | X_t)
Example: Umbrella
You are a security guard in an underground installation
You want to infer whether it is raining based on whether your director brings an umbrella
Random variables:
Hidden variable: Rain_t, domain {true, false}
Evidence variable: Umbrella_t, domain {true, false}
Prior: P(R_0 = true) = 0.5
Hidden Markov Model
Matrix representation for P(X_t | X_{t-1}) (time invariant):
T_ij = P(X_t = j | X_{t-1} = i); the matrix T represents P(X_t | X_{t-1})
Matrix representation for P(e_t | X_t) (depends on the evidence): a diagonal matrix with
O_{t,ii} = P(E_t = e_t | X_t = i), O_{t,ij} = 0 for all i ≠ j; the matrix O_t represents P(e_t | X_t)
Umbrella example (P(R_0 = true) = 0.5): if U_1 = t and U_3 = f, then
T = [0.7 0.3; 0.3 0.7], O_1 = [0.9 0; 0 0.2], O_3 = [0.1 0; 0 0.8]
Inference in HMM: Filtering
Filtering / state estimation: posterior distribution over the current state given all evidence so far, P(X_t | e_{1:t})
Umbrella example: P(Rain_t | umbrella_1, ..., umbrella_t)
Inference in HMM: Filtering
Filtering / state estimation: posterior distribution over the current state given all evidence so far, P(X_t | e_{1:t})
P(X_0) is given
P(X_1 | e_1)? P(X_2 | e_{1:2})?
Bayes' rule: P(b | a) = P(a | b) P(b) / P(a)
Product rule: P(a ∧ b) = P(a | b) P(b)
Sum rule: P(a) = Σ_k P(a ∧ b_k)
Inference in HMM: Filtering
P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}, e_{1:t}) P(X_{t+1} | e_{1:t}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
Bayes' rule: P(b | a) = P(a | b) P(b) / P(a)
Product rule: P(a ∧ b) = P(a | b) P(b)
Sum rule: P(a) = Σ_k P(a ∧ b_k)
Inference in HMM: Filtering
So given P(X_t | e_{1:t}), we can compute P(X_{t+1} | e_{1:t+1}) according to
P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})   (this is not matrix multiplication)
Denote P(X_t | e_{1:t}) by f_{1:t} (the forward message). Since X_t is discrete valued, f_{1:t} can be viewed as a vector. The j-th element of f_{1:t+1} is
f_{1:t+1}(j) = α P(e_{t+1} | X_{t+1} = j) Σ_i T_ij f_{1:t}(i)
So using the matrix representation for HMMs, we have
f_{1:t+1} = α O_{t+1} T^T f_{1:t}   (this is matrix multiplication)
Inference in HMM: Filtering
Filtering / state estimation: posterior distribution over the current state given all evidence so far, P(X_t | e_{1:t})
Set f_{1:0} ← P(X_0)
Recursively compute f_{1:t+1} ← α O_{t+1} T^T f_{1:t}   (the forward operation; O_{t+1} is determined by e_{t+1})
Return f_{1:t}
Example: Umbrella
Evidence: U_1 = t, U_2 = t
P(R_0) = ⟨0.5, 0.5⟩, so f_{1:0} = ⟨0.5, 0.5⟩
T = [0.7 0.3; 0.3 0.7], O_1 = O_2 = [0.9 0; 0 0.2]
f_{1:t+1} ← α O_{t+1} T^T f_{1:t}
Given U_1 = true: f_{1:1} = α O_1 T^T f_{1:0} = α ⟨0.45, 0.1⟩ ≈ ⟨0.818, 0.182⟩
Given U_2 = true: f_{1:2} = α O_2 T^T f_{1:1} ≈ ⟨0.883, 0.117⟩
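The two filtering steps on this slide can be reproduced with a few lines of NumPy. This is a sketch (not from the original slides), assuming the state ordering (rain, not rain):

```python
import numpy as np

# Umbrella HMM from the slides (state order: 0 = rain, 1 = not rain)
prior = np.array([0.5, 0.5])
T = np.array([[0.7, 0.3], [0.3, 0.7]])   # T[i, j] = P(X_t = j | X_{t-1} = i)
O_umbrella = np.diag([0.9, 0.2])         # P(U_t = true | X_t) on the diagonal
O_no_umbrella = np.diag([0.1, 0.8])      # P(U_t = false | X_t) on the diagonal

def forward(f, O):
    """One filtering step: f_{1:t+1} = alpha * O_{t+1} T^T f_{1:t}."""
    f_new = O @ T.T @ f
    return f_new / f_new.sum()           # alpha normalizes the vector

f = prior
f = forward(f, O_umbrella)   # after U_1 = true: ~ [0.818, 0.182]
f = forward(f, O_umbrella)   # after U_2 = true: ~ [0.883, 0.117]
print(f)
```

Because α only normalizes, the unnormalized products can be carried through and normalized once at the end with the same result.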
Quiz 1
f_{1:3} = α O_3 T^T f_{1:2}
We have computed f_{1:2} ≈ ⟨0.883, 0.117⟩. If U_3 = false, what do we know about P(r_3 | e_{1:3})? (r_3 denotes R_3 = true)
A: P(r_3 | e_{1:3}) = 0.883
B: P(r_3 | e_{1:3}) < 0.883
C: P(r_3 | e_{1:3}) > 0.883
Evidence so far: U_1 = t, U_2 = t
T = [0.7 0.3; 0.3 0.7], O_1 = O_2 = [0.9 0; 0 0.2], O_3 = [0.1 0; 0 0.8]
Inference in HMM: Finding the Most Likely Explanation
Most likely explanation: argmax_{x_{1:t}} P(x_{1:t} | e_{1:t})
Viterbi algorithm (a dynamic-programming-based algorithm)
Applications: decoding in communications, speech recognition, bioinformatics, etc.
Andrew James Viterbi (1935-present)
Viterbi Algorithm
State-time graph: each node represents (x_t, t)
Task: given an evidence sequence, find the most likely path
Intuition: if the most likely paths from time 1 to t are known, then it is easy to find the most likely path from time 1 to t+1
Umbrella evidence sequence: Umbrella_t = true, true, false, true, true
Viterbi Algorithm
Most likely explanation: argmax_{x_{1:t}} P(x_{1:t} | e_{1:t})
Based on the intuition, is it possible to express max_{x_{1:t}} P(x_{1:t} | e_{1:t}) recursively (i.e., computed from max_{x_{1:t-1}} P(x_{1:t-1} | e_{1:t-1}))? Unfortunately, no.
Rewrite: max_{x_{1:t}} P(x_{1:t} | e_{1:t}) = max_{x_t} max_{x_1, ..., x_{t-1}} P(x_{1:t-1}, x_t | e_{1:t})
We notice that max_{x_1, ..., x_{t-1}} P(x_{1:t-1}, x_t | e_{1:t}) can be computed recursively
Viterbi Algorithm
max_{x_1, ..., x_t} P(x_1, ..., x_t, X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) max_{x_t} ( P(X_{t+1} | x_t) max_{x_1, ..., x_{t-1}} P(x_1, ..., x_{t-1}, x_t | e_{1:t}) )
The posterior probability of the most likely path ending at node (x_{t+1}, t+1) can be found by checking the most likely paths ending at node (x_t, t), for all x_t
Umbrella evidence sequence: Umbrella_t = true, true, false, true, true
Viterbi Algorithm
max_{x_1, ..., x_t} P(x_1, ..., x_t, X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) max_{x_t} ( P(X_{t+1} | x_t) max_{x_1, ..., x_{t-1}} P(x_1, ..., x_{t-1}, x_t | e_{1:t}) )
Denote max_{x_1, ..., x_t} P(x_1, ..., x_t, X_{t+1} | e_{1:t+1}) by m_{1:t+1}
Since X_t is discrete valued, m_{1:t} can be viewed as a vector. The j-th element of m_{1:t+1} is
m_{1:t+1}(j) = α O_{t+1,jj} max_i ( T_ij m_{1:t}(i) )
Viterbi Algorithm
So each node in the state-time graph is associated with a value m_{1:t}(i) = max_{x_1, ..., x_{t-1}} P(x_1, ..., x_{t-1}, X_t = i | e_{1:t}), which can be computed recursively:
m_{1:t+1}(j) = α O_{t+1,jj} max_i ( T_ij m_{1:t}(i) )
with m_{1:1} = P(X_1 | e_1) = f_{1:1} = α P(e_1 | X_1) P(X_1) = α P(e_1 | X_1) Σ_{x_0} P(X_1 | x_0) P(x_0)
Then max_{x_{1:t}} P(x_{1:t} | e_{1:t}) = max_i m_{1:t}(i)
How to get the most likely path argmax_{x_{1:t}} P(x_{1:t} | e_{1:t})? Highlight the edge that leads to the maximum at each node of the state-time graph, then trace back from the best final node
(Note: the normalization coefficient α can be ignored)
Recall f_{1:t} = P(X_t | e_{1:t})
Viterbi Algorithm (Umbrella Example)
Evidence sequence: Umbrella_t = true, true, false, true, true
T = [0.7 0.3; 0.3 0.7], O_1 = O_2 = [0.9 0; 0 0.2], O_3 = [0.1 0; 0 0.8]
m_{1:1} = f_{1:1} ≈ ⟨0.818, 0.182⟩
m_{1:t+1} = max_{x_1, ..., x_t} P(x_1, ..., x_t, X_{t+1} | e_{1:t+1})
m_{1:t+1}(j) = α O_{t+1,jj} max_i ( T_ij m_{1:t}(i) )
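The recursion on this slide, plus back-pointers for the trace-back, is all the Viterbi algorithm needs. A minimal sketch for the umbrella example (state ordering and function name are ours; α is dropped, as the slide notes it can be ignored):

```python
import numpy as np

# Umbrella HMM (state order: 0 = rain, 1 = not rain)
prior = np.array([0.5, 0.5])
T = np.array([[0.7, 0.3], [0.3, 0.7]])
# O[i, e] = P(E_t = e | X_t = i); e: 0 = umbrella seen, 1 = not seen
O = np.array([[0.9, 0.1], [0.2, 0.8]])

def viterbi(evidence):
    """Return the most likely state sequence for the given evidence (0/1 per step)."""
    m = O[:, evidence[0]] * (T.T @ prior)    # m_{1:1}, unnormalized
    back = []                                # back[t][j]: best predecessor of state j
    for e in evidence[1:]:
        scores = T * m[:, None]              # scores[i, j] = T_ij * m_{1:t}(i)
        back.append(scores.argmax(axis=0))
        m = O[:, e] * scores.max(axis=0)     # m_{1:t+1}(j) = O_{t+1,jj} max_i(T_ij m(i))
    path = [int(m.argmax())]                 # trace back from the best final state
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

# Evidence (true, true, false, true, true) -> rain, rain, not rain, rain, rain
print(viterbi([0, 0, 1, 0, 0]))  # [0, 0, 1, 0, 0]
```

Working with unnormalized messages is fine here because argmax is unaffected by the positive constant α; with long sequences one would use log probabilities instead to avoid underflow.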
Quiz 2
m_{1:1} = f_{1:1} = P(X_1 | e_1 = H)
m_{1:t+1}(j) = α O_{t+1,jj} max_i ( T_ij m_{1:t}(i) )
Assume an HMM with two hidden states, + and −, and two observations, L and H. What is the most probable state sequence for the observation sequence (L, H), given P(X_0 = +) = 1?
A: (+, +)  B: (+, −)  C: (−, +)  D: (−, −)
Transition model: P(X_t = + | X_{t-1} = +) = 0.7, P(X_t = + | X_{t-1} = −) = 0.6
Sensor model: P(E_t = L | X_t = +) = 0.1, P(E_t = L | X_t = −) = 0.8
Outline
Temporal Probability Model
Hidden Markov Model (HMM)
Kalman Filter
Dynamic Bayes' Net (DBN)
Applications of DBN
Special Classes of DBN
Kalman Filter
A glimpse of probabilistic modeling with continuous variables
Estimates the internal state of a linear dynamic system from a series of noisy measurements
We will only consider a simple case:
State variable X_t (hidden); evidence variable Z_t (observation)
First-order Markov process; stationary process; linear Gaussian distributions
Example: consumer confidence index, measured by consumer surveys
Rudolf E. Kálmán (1930-2016)
Kalman Filter
Recall the 1-D Gaussian distribution with mean μ and variance σ² (standard deviation σ); its pdf is
P(x) = (1 / (σ √(2π))) e^{−(x−μ)² / (2σ²)}
P(X_0), P(X_t | X_{t-1}), and P(Z_t | X_t) are all Gaussian:
P(x_0) = α e^{−(1/2) (x_0 − μ_0)² / σ_0²}
P(x_{t+1} | x_t) = α e^{−(1/2) (x_{t+1} − x_t)² / σ_x²}
P(z_t | x_t) = α e^{−(1/2) (z_t − x_t)² / σ_z²}
Kalman Filter
P(X_t | z_{1:t}) is also Gaussian. Let μ_t and σ_t² be the mean and variance of P(X_t | z_{1:t}); then
μ_{t+1} = ((σ_t² + σ_x²) z_{t+1} + σ_z² μ_t) / (σ_t² + σ_x² + σ_z²)
σ_{t+1}² = ((σ_t² + σ_x²) σ_z²) / (σ_t² + σ_x² + σ_z²)
Interpretation:
μ_{t+1} is a weighted mean of z_{t+1} and μ_t. If the observation is unreliable (σ_z is large), then μ_{t+1} is closer to μ_t; otherwise it is closer to z_{t+1}
σ_{t+1}² is independent of the observation z_{t+1}
(see the detailed derivation in the textbook)
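The two update equations above translate directly into code. This is a sketch with hypothetical numbers (the function name and the example values μ_0 = 0, σ_0² = 1, σ_x² = 2, σ_z² = 1 are ours, not from the slides):

```python
def kalman_update(mu, var, z, var_x, var_z):
    """One 1-D Kalman filter step: fold in transition noise var_x, then observation z with noise var_z."""
    pred_var = var + var_x                            # variance after the transition, sigma_t^2 + sigma_x^2
    mu_new = (pred_var * z + var_z * mu) / (pred_var + var_z)
    var_new = pred_var * var_z / (pred_var + var_z)   # independent of z, as the slide notes
    return mu_new, var_new

mu, var = kalman_update(0.0, 1.0, z=2.0, var_x=2.0, var_z=1.0)
print(mu, var)  # 1.5 0.75
```

Note how the weighted-mean interpretation shows up: with these numbers, pred_var = 3 outweighs var_z = 1, so the new mean (1.5) sits closer to the observation (2.0) than to the old mean (0.0).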
Outline
Temporal Probability Model
Hidden Markov Model (HMM)
Kalman Filter
Dynamic Bayes' Net (DBN)
Applications of DBN
Special Classes of DBN
Dynamic Bayesian Networks
A Bayes' net that represents a temporal probability model
Any temporal probability model can be represented as a DBN
A DBN represents knowledge of the domain and describes the structure of the problem
Examples: first-order Markov chain, second-order Markov chain
Dynamic Bayesian Networks
For simplicity, here we consider the case where variables and their links are replicated from slice to slice, and the DBN represents a first-order Markov process that is stationary
Such a DBN is specified by P(X_0), P(X_t | X_{t-1}), and P(E_t | X_t)
HMMs and Kalman filters are special cases of DBNs
Any discrete-variable DBN can be cast as an HMM by introducing metavariables
However, using a DBN preserves the sparsity of the model
Inference in DBNs
Exact inference: "unroll" the network and apply exact inference techniques directly
Approximate inference:
A variant of likelihood weighting (not very efficient)
Particle filtering (commonly used)
Particle Filtering (Not Required)
One step of particle filtering: given N samples of X_t, denoted S, and evidence e_{t+1}, get N samples of X_{t+1}
Get a set of N weighted samples, denoted S', for X_{t+1}: for each sample of X_t in S, sample the value of X_{t+1} from P(X_{t+1} | X_t) and set its weight to P(e_{t+1} | X_{t+1})
Resample based on the weights to get a new set of N samples for X_{t+1}, denoted S'': each new sample is selected from S', with the probability of sampling s ∈ S' proportional to its weight
Sampling is with replacement, i.e., one item can be sampled multiple times
Kalman filter: exact update of the belief state for linear dynamical systems
Particle filter: approximate update for general systems
Particle Filtering (Not Required)
Approximate inference using particle filtering over multiple time steps: apply one-step particle filtering in every time step, recursively updating the set of samples
Initialize S based on P(X_0)
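The propagate-weight-resample step can be sketched for the umbrella example as follows (not from the original slides; particle count, seed, and function name are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Umbrella HMM (state order: 0 = rain, 1 = not rain)
prior = np.array([0.5, 0.5])
T = np.array([[0.7, 0.3], [0.3, 0.7]])
P_umbrella = np.array([0.9, 0.2])            # P(U_t = true | X_t)

N = 10_000
samples = rng.choice(2, size=N, p=prior)     # initialize S from P(X_0)

def pf_step(samples, p_evidence):
    # Propagate: sample X_{t+1} from P(X_{t+1} | X_t) for each particle
    propagated = (rng.random(samples.size) >= T[samples, 0]).astype(int)
    # Weight by the sensor model P(e_{t+1} | X_{t+1})
    weights = p_evidence[propagated]
    weights = weights / weights.sum()
    # Resample N particles with replacement, proportional to weight
    return rng.choice(propagated, size=samples.size, p=weights)

samples = pf_step(samples, P_umbrella)       # observe U_1 = true
print(np.mean(samples == 0))                 # estimates P(rain_1 | u_1), ~0.818
```

With N = 10,000 particles the estimate should land within a few hundredths of the exact filtering answer 0.818, illustrating the consistency claim on the next slide.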
Example: Umbrella (Not Required)
(Figure: particles over R_0 are propagated to R_1, weighted by the evidence, and resampled: Propagate → Weight → Resample)
Particle Filtering (Not Required)
We can prove that if the N samples given initially approximate P(x_t | e_{1:t}), i.e., N(x_t | e_{1:t}) / N ≈ P(x_t | e_{1:t}), then the new samples approximate P(x_{t+1} | e_{1:t+1}), i.e., N(x_{t+1} | e_{1:t+1}) / N ≈ P(x_{t+1} | e_{1:t+1}) (see details in the textbook)
By induction, particle filtering is consistent: it provides the correct probabilities as N → ∞
In practice, particle filtering works very well
Quiz 3 (Not Required)
Using particle filtering for the umbrella example, suppose in one step we get 100 samples with + (total weight 1) and 400 samples with − (total weight 2) after propagating and weighting (before resampling). Which of the following best estimates the number of samples with + after resampling?
A: 100  B: 400  C: 167  D: 333
Applications of DBN: Place and Object Recognition
Which are the hidden variables?
Torralba et al., "Context-based vision system for place and object recognition," ICCV 2003
Applications of DBN: Place and Object Recognition
Use scene context (low-level features) to disambiguate object recognition
Infer object types based on the scene and object features
Context priming to decide which object detectors to run
Torralba et al., "Context-based vision system for place and object recognition," ICCV 2003
Applications of DBN: Infer and Predict Poaching Activity
Not surprisingly, naively applying existing ML algorithms does not work well, for two reasons. First, the dataset is quite sparse: unlike image classification or movie recommendation, the amount of data we have is quite limited. Second, for the data we do have, it is hard to determine the labels. If patrollers found poaching activity in an area, then we label the area as attacked without doubt. However, if the patrollers did not find any poaching activity, we are not sure whether we should label the area as not attacked, because the patrollers may have missed the signs of poaching, and the poached animals cannot report the incident themselves.
Labels: Attacked / Not Attacked
Applications of DBN: Infer and Predict Poaching Activity
Domain knowledge:
Poaching activity is impacted by ranger patrol effort, as well as by features such as animal density
Detection probability is also impacted by ranger patrol effort and a subset of these features
Variables: ranger patrol effort; probability of attack on target j; detection probability; features such as area habitat, animal density, area slope, distance to rivers/roads, ...
Nguyen et al., "CAPTURE: A new predictive anti-poaching tool for wildlife protection," AAMAS 2016
Applications of DBN: Infer and Predict Poaching Activity
a_{t,i}: whether there is poaching
c_{t,i}: ranger patrol effort
o_{t,i}: whether a poaching sign is found
x_{t,i}: features, e.g., distance from road, animal density, etc.
Nguyen et al., "CAPTURE: A new predictive anti-poaching tool for wildlife protection," AAMAS 2016
Applications of DBN: Predict Urban Crime
Opportunistic criminals: wander around and seek opportunities to commit crimes
D_i^t: number of defenders (known)
X_i^t: number of criminals (hidden)
Y_i^t: number of crimes (known)
(time slices t and t+1)
Summary
Temporal Models
Hidden Markov Models (HMM): Viterbi Algorithm
Kalman Filter
Dynamic Bayes' Net (DBN): Particle Filtering
Applications of DBN: Place and Object Recognition; Infer and Predict Poaching Activity; Predict Urban Crime
Acknowledgment
Some slides are borrowed from previous slides made by Tai Sing Lee
Material in the backup slides in this lecture is not required
Viterbi Algorithm
max_{x_1, ..., x_t} P(x_1, ..., x_t, X_{t+1} = x_{t+1} | e_{1:t+1}) = ?
Bayes' rule: P(b | a) = P(a | b) P(b) / P(a)
Product rule: P(a ∧ b) = P(a | b) P(b)
Sum rule: P(a) = Σ_k P(a ∧ b_k)
If f(a) ≥ 0 for all a, and g(a, b) ≥ 0 for all a, b, then max_{a,b} f(a) g(a, b) = max_a ( f(a) max_b g(a, b) )
Inference in HMM: Prediction
Prediction: posterior distribution over a future state given all evidence to date, P(X_{t+1} | e_{1:t})
Umbrella example: P(Rain_{t+1} | umbrella_1, ..., umbrella_t)
Inference in HMM: Prediction
We know f_{1:t} = P(X_t | e_{1:t}) and f_{1:t+1} = α O_{t+1} T^T f_{1:t}
So P(X_{t+1} | e_{1:t}) = Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t}) = T^T f_{1:t}
We can further compute P(X_{t+k+1} | e_{1:t}) through recursive computation
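A small sketch of the recursive prediction for the umbrella model (our numbers; not from the original slides): repeatedly applying T^T pushes the predicted distribution toward the chain's stationary distribution, since no new evidence arrives.

```python
import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])

# Start from a filtering result, e.g. f_{1:2} ~ [0.883, 0.117] from the umbrella example
f = np.array([0.883, 0.117])

# Predict k steps ahead: P(X_{t+k} | e_{1:t}) = (T^T)^k f_{1:t}
for k in range(50):
    f = T.T @ f
print(f)  # approaches the stationary distribution [0.5, 0.5]
```

Each application of T^T shrinks the deviation from ⟨0.5, 0.5⟩ by a factor of 0.4 for this transition matrix, so 50 steps is far more than enough to converge numerically.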
Inference in HMM: Smoothing
Smoothing: posterior distribution of a past state given all evidence up to the present, P(X_k | e_{1:t})
Umbrella example: P(Rain_k | umbrella_1, ..., umbrella_t)
Inference in HMM: Smoothing
P(X_k | e_{1:t}) = ?
P(X_k | e_{1:t}) = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k)
Denote P(e_{k+1:t} | X_k) by b_{k+1:t} (the backward message); then
P(X_k | e_{1:t}) = α f_{1:k} × b_{k+1:t}
This is not matrix multiplication: if we view f and b as vectors, the equation is valid if × represents pointwise multiplication
Bayes' rule: P(b | a) = P(a | b) P(b) / P(a); product rule: P(a ∧ b) = P(a | b) P(b); sum rule: P(a) = Σ_k P(a ∧ b_k)
Inference in HMM: Smoothing
b_{k+1:t} = P(e_{k+1:t} | X_k) = ?
P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)
Bayes' rule: P(b | a) = P(a | b) P(b) / P(a); product rule: P(a ∧ b) = P(a | b) P(b); sum rule: P(a) = Σ_k P(a ∧ b_k)
Inference in HMM: Smoothing
So given b_{k+2:t} = P(e_{k+2:t} | X_{k+1}), we can compute b_{k+1:t} = P(e_{k+1:t} | X_k) according to
P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)   (this is not matrix multiplication)
Since X_t is discrete valued, b_{k+1:t} can be viewed as a vector. The i-th element of b_{k+1:t} is
b_{k+1:t}(i) = Σ_j T_ij O_{k+1,jj} b_{k+2:t}(j)
So using the matrix representation for HMMs, we have
b_{k+1:t} = T O_{k+1} b_{k+2:t}   (this is matrix multiplication)
Inference in HMM: Smoothing
Smoothing: posterior distribution of a past state given all evidence up to the present, P(X_k | e_{1:t})
Set f_{1:0} ← P(X_0) and b_{t+1:t} ← 1
Recursively compute:
f_{1:t+1} ← α O_{t+1} T^T f_{1:t}   (forward operation; O_{t+1} is determined by e_{t+1})
b_{k+1:t} ← T O_{k+1} b_{k+2:t}   (backward operation)
Return P(X_k | e_{1:t}) = α b_{k+1:t} × f_{1:k}
Smoothing for all k ∈ {1..t}: posterior distribution of all past states given all evidence up to the present
Forward-Backward algorithm: store all the f's and b's, and return P(X_k | e_{1:t}) for all k
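The forward and backward passes above can be sketched together for the umbrella example with evidence U_1 = t, U_2 = t (function name and state ordering are ours; not part of the original slides):

```python
import numpy as np

# Umbrella HMM (state order: 0 = rain, 1 = not rain)
prior = np.array([0.5, 0.5])
T = np.array([[0.7, 0.3], [0.3, 0.7]])
O = [np.diag([0.9, 0.2]), np.diag([0.9, 0.2])]  # O_1, O_2 for evidence U_1 = t, U_2 = t

def forward_backward(prior, T, O):
    """Return P(X_k | e_{1:t}) for k = 1..t via the forward-backward algorithm."""
    t = len(O)
    f = [prior]                        # f[k] holds f_{1:k}
    for k in range(t):
        v = O[k] @ T.T @ f[-1]         # forward operation
        f.append(v / v.sum())
    b = np.ones(len(prior))            # b_{t+1:t} = 1
    smoothed = [None] * t
    for k in range(t - 1, -1, -1):
        s = f[k + 1] * b               # pointwise product f_{1:k+1} x b_{k+2:t}
        smoothed[k] = s / s.sum()
        b = T @ O[k] @ b               # backward operation: b = T O_{k+1} b
    return smoothed

print(forward_backward(prior, T, O)[0])  # P(R_1 | u_1, u_2) ~ [0.883, 0.117]
```

Note that smoothing at the final step coincides with filtering (b is all ones there), while earlier steps get pulled toward whatever the later evidence suggests.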
General Inference in Temporal Models
f_{1:t+1} = α FORWARD(f_{1:t}, e_{t+1})
b_{k+1:t} = BACKWARD(b_{k+2:t}, e_{k+1})
Filtering: P(X_t | e_{1:t}) = f_{1:t}
Prediction: P(X_{t+k+1} | e_{1:t}) = Σ_{x_{t+k}} P(X_{t+k+1} | x_{t+k}) P(x_{t+k} | e_{1:t})
Smoothing: P(X_k | e_{1:t}) = α b_{k+1:t} × f_{1:k}
Forward-backward algorithm for smoothing the whole sequence
Finding the most likely explanation: Viterbi algorithm