Instructors: Fei Fang (This Lecture) and Dave Touretzky


Artificial Intelligence: Representation and Problem Solving. Probabilistic Reasoning (4): Temporal Models. 15-381 / 681. Instructors: Fei Fang (This Lecture) and Dave Touretzky. feifang@cmu.edu, Wean Hall 4126. 12/8/2018

Recap: Probability Models and Probabilistic Inference; Bayes' Net; Exact Inference; Approximate Inference: Sampling. Today: Probabilistic Reasoning over Time.

Outline: Temporal Probability Model; Hidden Markov Model (HMM); Kalman Filter; Dynamic Bayes' Net (DBN); Particle Filtering; Applications of DBN; Special Classes of DBN.

Temporal Probabilistic Model. Why do we need a temporal probabilistic model? The world changes over time, and what happens now impacts what will happen in the future (stock market, weather). Sometimes the world state becomes clearer as more evidence is collected over time (diagnosis, e.g., cold vs chronic pharyngitis given coughing). How do we model time? View the world as time slices: discrete time steps.

Temporal Probabilistic Model. State variables $\mathbf{X}_t$ (often hidden): the state of the environment; not directly observable, but defines the causal dynamics. Evidence variables $\mathbf{E}_t$: caused by the state of the environment. How do we model the example problems: stock market; weather; diagnosis, e.g., cold vs chronic pharyngitis given coughing?

Temporal Probabilistic Model. Transition model: how the world (i.e., the state of the environment, $\mathbf{X}_t$) evolves. Generally $\mathbf{P}(\mathbf{X}_t \mid \mathbf{X}_{0:t-1})$. Markov assumption: the current state depends only on a finite, fixed number of previous states: $\mathbf{P}(\mathbf{X}_t \mid \mathbf{X}_{0:t-1}) = \mathbf{P}(\mathbf{X}_t \mid \mathbf{X}_{t-k:t-1})$. First-order Markov process: the current state depends only on the previous state and not on earlier states: $\mathbf{P}(\mathbf{X}_t \mid \mathbf{X}_{0:t-1}) = \mathbf{P}(\mathbf{X}_t \mid \mathbf{X}_{t-1})$. Stationary assumption: $\mathbf{P}(\mathbf{X}_t \mid \mathbf{X}_{t-k:t-1}) = \mathbf{P}(\mathbf{X}_{t-1} \mid \mathbf{X}_{t-k-1:t-2})$, i.e., the transition model is the same at every time step. A process satisfying these is a Markov process or Markov chain (Andrei Andreyevich Markov, 1856-1922).

Temporal Probabilistic Model. Sensor model / observation model: how the evidence variables ($\mathbf{E}_t$) get their values (assume we get observations starting from $t=1$). Generally $\mathbf{P}(\mathbf{E}_t \mid \mathbf{X}_{0:t}, \mathbf{E}_{1:t-1})$. Sensor Markov assumption: the evidence depends only on the current state: $\mathbf{P}(\mathbf{E}_t \mid \mathbf{X}_{0:t}, \mathbf{E}_{1:t-1}) = \mathbf{P}(\mathbf{E}_t \mid \mathbf{X}_t)$. Initial state model: the prior probability distribution at time 0, i.e., $\mathbf{P}(\mathbf{X}_0)$. For a first-order Markov process with the sensor Markov assumption, the full joint probability distribution is $\mathbf{P}(\mathbf{X}_{0:t}, \mathbf{E}_{1:t}) = \mathbf{P}(\mathbf{X}_0) \prod_{i=1}^{t} \mathbf{P}(\mathbf{X}_i \mid \mathbf{X}_{i-1})\, \mathbf{P}(\mathbf{E}_i \mid \mathbf{X}_i)$.
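As a concrete check of this factorization, here is a minimal sketch that multiplies out $P(\mathbf{x}_0) \prod_i P(\mathbf{x}_i \mid \mathbf{x}_{i-1}) P(\mathbf{e}_i \mid \mathbf{x}_i)$. It assumes the umbrella model used later in the lecture ($P(R_0{=}t)=0.5$; transition 0.7/0.3; sensor 0.9/0.2); the function name is mine.

```python
# Joint probability of a state/evidence trajectory under a first-order
# Markov process with the sensor Markov assumption:
# P(x_{0:t}, e_{1:t}) = P(x_0) * prod_i P(x_i | x_{i-1}) * P(e_i | x_i).
# Model numbers follow the umbrella example from later in the lecture.

P0 = {True: 0.5, False: 0.5}            # P(R_0)
T = {True: {True: 0.7, False: 0.3},     # T[prev][cur] = P(R_t=cur | R_{t-1}=prev)
     False: {True: 0.3, False: 0.7}}
O = {True: {True: 0.9, False: 0.1},     # O[state][obs] = P(U_t=obs | R_t=state)
     False: {True: 0.2, False: 0.8}}

def joint(states, evidence):
    """states = (x_0, ..., x_t), evidence = (e_1, ..., e_t)."""
    p = P0[states[0]]
    for i, e in enumerate(evidence, start=1):
        p *= T[states[i - 1]][states[i]] * O[states[i]][e]
    return p

# P(R_0=t, R_1=t, U_1=t) = 0.5 * 0.7 * 0.9 = 0.315
print(joint((True, True), (True,)))
```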

Inference in Temporal Probabilistic Model. Filtering / state estimation: posterior distribution over the current state given all evidence so far, $\mathbf{P}(\mathbf{X}_t \mid \mathbf{e}_{1:t})$. Prediction: posterior distribution over a future state given all evidence to date, $\mathbf{P}(\mathbf{X}_{t+k} \mid \mathbf{e}_{1:t})$ for $k \ge 1$. Smoothing: posterior distribution of a past state given all evidence up to the present, $\mathbf{P}(\mathbf{X}_k \mid \mathbf{e}_{1:t})$ for $k < t$. Most likely explanation: $\arg\max_{\mathbf{x}_{1:t}} P(\mathbf{x}_{1:t} \mid \mathbf{e}_{1:t})$. Learning: learn the transition and sensor models from observations (not covered).

Outline: Temporal Probability Model; Hidden Markov Model (HMM); Kalman Filter; Dynamic Bayes' Net (DBN); Applications of DBN; Special Classes of DBN.

Hidden Markov Model (HMM): a first-order, stationary Markov process that satisfies the sensor Markov assumption, with a single discrete random variable $X_t$ representing the (hidden) state and a single evidence variable $E_t$. Specified by $\mathbf{P}(X_0)$, $\mathbf{P}(X_t \mid X_{t-1})$, and $\mathbf{P}(E_t \mid X_t)$.

Example: Umbrella. You are a security guard in an underground installation and want to infer whether it is raining based on whether your director brings an umbrella. Random variables: hidden variable $Rain_t$, domain $\{true, false\}$; evidence variable $Umbrella_t$, domain $\{true, false\}$. Prior: $P(R_0 = true) = 0.5$.

Hidden Markov Model. Matrix representation for $\mathbf{P}(X_t \mid X_{t-1})$ (time invariant): $\mathbf{T}_{ij} = P(X_t = j \mid X_{t-1} = i)$; $\mathbf{T}$ represents $\mathbf{P}(X_t \mid X_{t-1})$. Matrix representation for $\mathbf{P}(e_t \mid X_t)$ (depends on the evidence): $\mathbf{O}_{t,ii} = P(E_t = e_t \mid X_t = i)$, $\mathbf{O}_{t,ij} = 0$ for all $i \ne j$ (a diagonal matrix); $\mathbf{O}_t$ represents $\mathbf{P}(e_t \mid X_t)$. For the umbrella example (with $P(R_0 = true) = 0.5$), if $U_1 = t$ and $U_3 = f$, then $\mathbf{O}_1 = \begin{pmatrix} 0.9 & 0 \\ 0 & 0.2 \end{pmatrix}$ and $\mathbf{O}_3 = \begin{pmatrix} 0.1 & 0 \\ 0 & 0.8 \end{pmatrix}$.

Inference in HMM: Filtering. Filtering / state estimation: posterior distribution over the current state given all evidence so far, $\mathbf{P}(X_t \mid e_{1:t})$, e.g., $\mathbf{P}(Rain_t \mid umbrella_1, \ldots, umbrella_t)$.

Inference in HMM: Filtering. $\mathbf{P}(X_0)$ is given. How do we compute $\mathbf{P}(X_1 \mid e_1)$? $\mathbf{P}(X_2 \mid e_{1:2})$? Recall Bayes' rule: $P(b \mid a) = P(a \mid b) P(b) / P(a)$; product rule: $P(a \wedge b) = P(a \mid b) P(b)$; sum rule: $P(a) = \sum_k P(a \wedge b_k)$.

Inference in HMM: Filtering. Goal: compute $\mathbf{P}(X_{t+1} \mid e_{1:t+1})$ recursively from $\mathbf{P}(X_t \mid e_{1:t})$, using Bayes' rule ($P(b \mid a) = P(a \mid b) P(b) / P(a)$), the product rule ($P(a \wedge b) = P(a \mid b) P(b)$), and the sum rule ($P(a) = \sum_k P(a \wedge b_k)$).

Inference in HMM: Filtering. So given $\mathbf{P}(X_t \mid e_{1:t})$, we can compute $\mathbf{P}(X_{t+1} \mid e_{1:t+1})$ according to $\mathbf{P}(X_{t+1} \mid e_{1:t+1}) = \alpha\, \mathbf{P}(e_{t+1} \mid X_{t+1}) \sum_{x_t} \mathbf{P}(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})$ (the forward message); note this sum is not matrix multiplication. Denote $\mathbf{P}(X_t \mid e_{1:t})$ by $\mathbf{f}_{1:t}$. Since $X_t$ is discrete valued, $\mathbf{f}_{1:t}$ can be viewed as a vector; the $j$th element of $\mathbf{f}_{1:t+1}$ is $\mathbf{f}_{1:t+1}(j) = \alpha\, \mathbf{O}_{t+1,jj} \sum_i \mathbf{T}_{ij}\, \mathbf{f}_{1:t}(i)$. So using the matrix representation for the HMM, we have $\mathbf{f}_{1:t+1} = \alpha\, \mathbf{O}_{t+1} \mathbf{T}^{\mathsf T} \mathbf{f}_{1:t}$, which is matrix multiplication.

Inference in HMM: Filtering. Filtering / state estimation: posterior distribution over the current state given all evidence so far, $\mathbf{P}(X_t \mid e_{1:t})$. Set $\mathbf{f}_{1:0} \leftarrow \mathbf{P}(X_0)$. Recursively compute $\mathbf{f}_{1:t+1} \leftarrow \alpha\, \mathbf{O}_{t+1} \mathbf{T}^{\mathsf T} \mathbf{f}_{1:t}$ (the forward operation; $\mathbf{O}_{t+1}$ is determined by $e_{t+1}$). Return $\mathbf{f}_{1:t}$.

Example: Umbrella. $\mathbf{P}(R_0) = \langle 0.5, 0.5 \rangle$, so $\mathbf{f}_{1:0} = \langle 0.5, 0.5 \rangle$. Evidence: $U_1 = t$, $U_2 = t$, so $\mathbf{O}_1 = \mathbf{O}_2 = \begin{pmatrix} 0.9 & 0 \\ 0 & 0.2 \end{pmatrix}$, with $\mathbf{T} = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}$. Applying $\mathbf{f}_{1:t+1} \leftarrow \alpha\, \mathbf{O}_{t+1} \mathbf{T}^{\mathsf T} \mathbf{f}_{1:t}$: given $U_1 = true$, $\mathbf{f}_{1:1} = \alpha \langle 0.45, 0.1 \rangle \approx \langle 0.818, 0.182 \rangle$; given $U_2 = true$, $\mathbf{f}_{1:2} \approx \langle 0.883, 0.117 \rangle$.
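The two updates above can be reproduced with a short forward-filtering sketch (plain Python; the 2-state umbrella model, with states ordered [rain, not rain]; variable names are mine):

```python
# Forward (filtering) update f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}
# for the 2-state umbrella HMM. States are ordered [rain, not rain].

T = [[0.7, 0.3],
     [0.3, 0.7]]                 # T[i][j] = P(X_t = j | X_{t-1} = i)
O = {True: [0.9, 0.2],           # diagonal of O_t when U_t = true
     False: [0.1, 0.8]}          # diagonal of O_t when U_t = false

def forward_step(f, u):
    # predict: (T^T f)_j = sum_i T[i][j] * f[i]; then weight by the evidence
    g = [O[u][j] * sum(T[i][j] * f[i] for i in range(2)) for j in range(2)]
    z = sum(g)                   # normalization constant 1/alpha
    return [x / z for x in g]

f = [0.5, 0.5]                   # f_{1:0} = P(R_0)
for u in [True, True]:           # evidence U_1 = t, U_2 = t
    f = forward_step(f, u)
    print([round(x, 3) for x in f])
# step 1: [0.818, 0.182]; step 2: [0.883, 0.117]
```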

Quiz 1. $\mathbf{f}_{1:3} = \alpha\, \mathbf{O}_3 \mathbf{T}^{\mathsf T} \mathbf{f}_{1:2}$. We have computed $\mathbf{f}_{1:2} \approx \langle 0.883, 0.117 \rangle$. If $U_3 = false$, so that $\mathbf{O}_3 = \begin{pmatrix} 0.1 & 0 \\ 0 & 0.8 \end{pmatrix}$, what do we know about $P(r_3 \mid e_{1:3})$, where $r_3$ denotes $R_3 = true$? A: $P(r_3 \mid e_{1:3}) = 0.883$; B: $P(r_3 \mid e_{1:3}) < 0.883$; C: $P(r_3 \mid e_{1:3}) > 0.883$.

Inference in HMM: Find Most Likely Explanation. Most likely explanation: $\arg\max_{x_{1:t}} P(x_{1:t} \mid e_{1:t})$. Solved by the Viterbi algorithm (a dynamic-programming-based algorithm). Applications: decoding in communications, speech recognition, bioinformatics, etc. (Andrew James Viterbi, 1935-present.)

Viterbi Algorithm. State-time graph: each node represents a pair $(x_t, t)$. Task: given the evidence sequence, find the most likely path. Intuition: if the most likely path from time 1 to $t$ is known, then it is easy to find the most likely path from time 1 to $t+1$. (Umbrella evidence sequence in the trellis figure: $true, true, false, true, true$.)

Viterbi Algorithm. Most likely explanation: $\arg\max_{x_{1:t}} P(x_{1:t} \mid e_{1:t})$. Based on the intuition, is it possible to compute $\max_{x_{1:t}} P(x_{1:t} \mid e_{1:t})$ recursively (i.e., from $\max_{x_{1:t-1}} P(x_{1:t-1} \mid e_{1:t-1})$)? Unfortunately, no. Rewrite: $\max_{x_{1:t}} P(x_{1:t} \mid e_{1:t}) = \max_{x_t} \max_{x_1, \ldots, x_{t-1}} P(x_{1:t-1}, x_t \mid e_{1:t})$. We notice that $\max_{x_1, \ldots, x_{t-1}} P(x_{1:t-1}, x_t \mid e_{1:t})$ can be computed recursively.

Viterbi Algorithm. $\max_{x_1, \ldots, x_t} \mathbf{P}(x_1, \ldots, x_t, X_{t+1} \mid e_{1:t+1}) = \alpha\, P(e_{t+1} \mid X_{t+1}) \max_{x_t} \left( \mathbf{P}(X_{t+1} \mid x_t) \max_{x_1, \ldots, x_{t-1}} P(x_1, \ldots, x_{t-1}, x_t \mid e_{1:t}) \right)$. So the posterior probability of the most likely path ending at node $(x_{t+1}, t+1)$ can be found by checking the most likely paths ending at each node $(x_t, t)$, for all $x_t$.

Viterbi Algorithm. Denote $\max_{x_1, \ldots, x_t} \mathbf{P}(x_1, \ldots, x_t, X_{t+1} \mid e_{1:t+1})$ as $\mathbf{m}_{1:t+1}$. Since $X_t$ is discrete valued, $\mathbf{m}_{1:t}$ can be viewed as a vector; the $j$th element of $\mathbf{m}_{1:t+1}$ is $\mathbf{m}_{1:t+1}(j) = \alpha\, \mathbf{O}_{t+1,jj} \max_i \left( \mathbf{T}_{ij}\, \mathbf{m}_{1:t}(i) \right)$.

Viterbi Algorithm. So each node in the state-time graph is associated with a value $\mathbf{m}_{1:t}(i)$, where $\mathbf{m}_{1:t} = \max_{x_1, \ldots, x_{t-1}} \mathbf{P}(x_1, \ldots, x_{t-1}, X_t \mid e_{1:t})$, which can be computed recursively: $\mathbf{m}_{1:t+1}(j) = \alpha\, \mathbf{O}_{t+1,jj} \max_i (\mathbf{T}_{ij}\, \mathbf{m}_{1:t}(i))$, with base case $\mathbf{m}_{1:1} = \mathbf{P}(X_1 \mid e_1) = \mathbf{f}_{1:1} = \alpha\, \mathbf{P}(e_1 \mid X_1) \sum_{x_0} \mathbf{P}(X_1 \mid x_0) P(x_0)$ (recall $\mathbf{f}_{1:t} = \mathbf{P}(X_t \mid e_{1:t})$). Then $\max_{x_{1:t}} P(x_{1:t} \mid e_{1:t}) = \max_i \mathbf{m}_{1:t}(i)$. How do we get the most likely path $\arg\max_{x_{1:t}} P(x_{1:t} \mid e_{1:t})$? Highlight the "edge" that leads to the maximum at each node in the state-time graph, then trace back from the best final node. (Note: the normalization coefficient $\alpha$ can be ignored.)

Viterbi Algorithm: Umbrella Example. Evidence sequence: $true, true, false, true, true$. $\mathbf{T} = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}$, $\mathbf{O}_1 = \mathbf{O}_2 = \begin{pmatrix} 0.9 & 0 \\ 0 & 0.2 \end{pmatrix}$, $\mathbf{O}_3 = \begin{pmatrix} 0.1 & 0 \\ 0 & 0.8 \end{pmatrix}$. Base case: $\mathbf{m}_{1:1} = \mathbf{f}_{1:1} \approx \langle 0.818, 0.182 \rangle$. Recursion: $\mathbf{m}_{1:t+1}(j) = \alpha\, \mathbf{O}_{t+1,jj} \max_i (\mathbf{T}_{ij}\, \mathbf{m}_{1:t}(i))$, where $\mathbf{m}_{1:t+1} = \max_{x_1, \ldots, x_t} \mathbf{P}(x_1, \ldots, x_t, X_{t+1} \mid e_{1:t+1})$.
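A compact Viterbi sketch for the umbrella trellis above (plain Python; the backpointer bookkeeping plays the role of the highlighted edges, and $\alpha$ is dropped since the slide notes it can be ignored):

```python
# Viterbi on the 2-state umbrella HMM: m_{1:t+1}(j) = O_{t+1,jj} *
# max_i T_{ij} m_{1:t}(i), with backpointers to recover the best path.

T = [[0.7, 0.3], [0.3, 0.7]]     # T[i][j] = P(X_t=j | X_{t-1}=i)
O = {True: [0.9, 0.2], False: [0.1, 0.8]}
prior = [0.5, 0.5]               # P(R_0)

def viterbi(evidence):
    # base case: m_{1:1}(j) = O_{1,jj} * sum_i T_{ij} * P(x_0 = i)
    m = [O[evidence[0]][j] * sum(T[i][j] * prior[i] for i in range(2))
         for j in range(2)]
    back = []
    for e in evidence[1:]:
        best_i = [max(range(2), key=lambda i: T[i][j] * m[i]) for j in range(2)]
        m = [O[e][j] * T[best_i[j]][j] * m[best_i[j]] for j in range(2)]
        back.append(best_i)
    # trace back from the best final state
    path = [max(range(2), key=lambda j: m[j])]
    for best_i in reversed(back):
        path.append(best_i[path[-1]])
    return list(reversed(path))  # 0 = rain, 1 = not rain

print(viterbi([True, True, False, True, True]))
# most likely explanation: [0, 0, 1, 0, 0], i.e. rain, rain, no rain, rain, rain
```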

Quiz 2. Recall $\mathbf{m}_{1:1} = \mathbf{f}_{1:1} = \mathbf{P}(X_1 \mid e_1)$ and $\mathbf{m}_{1:t+1}(j) = \alpha\, \mathbf{O}_{t+1,jj} \max_i (\mathbf{T}_{ij}\, \mathbf{m}_{1:t}(i))$. Assume an HMM with two hidden states, $+$ and $-$, and two observations, $L$ and $H$. Transition model: $P(X_t = + \mid X_{t-1} = +) = 0.7$, $P(X_t = + \mid X_{t-1} = -) = 0.6$. Sensor model: $P(E_t = L \mid X_t = +) = 0.1$, $P(E_t = L \mid X_t = -) = 0.8$. What is the most probable state sequence for the observation sequence $(L, H)$, given $P(X_0 = +) = 1$? A: $(+,+)$; B: $(+,-)$; C: $(-,+)$; D: $(-,-)$.

Outline: Temporal Probability Model; Hidden Markov Model (HMM); Kalman Filter; Dynamic Bayes' Net (DBN); Applications of DBN; Special Classes of DBN.

Kalman Filter. A glimpse of probabilistic modeling with continuous variables: the Kalman filter estimates the internal state of a linear dynamic system from a series of noisy measurements. We will only consider a simple case: state variable $X_t$ (hidden); evidence variable $Z_t$ (observation); first-order Markov process; stationary process; linear Gaussian distributions. Example: consumer confidence index, measured by consumer surveys. (Rudolf E. Kálmán, 1930-2016.)

Kalman Filter. Recall the 1-D Gaussian distribution with mean $\mu$ and variance $\sigma^2$ (standard deviation $\sigma$); its pdf is $P(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. Here $\mathbf{P}(X_0)$, $\mathbf{P}(X_t \mid X_{t-1})$, and $\mathbf{P}(Z_t \mid X_t)$ are all Gaussian: $P(x_0) = \alpha\, e^{-\frac{1}{2} \frac{(x_0 - \mu_0)^2}{\sigma_0^2}}$, $P(x_{t+1} \mid x_t) = \alpha\, e^{-\frac{1}{2} \frac{(x_{t+1} - x_t)^2}{\sigma_x^2}}$, $P(z_t \mid x_t) = \alpha\, e^{-\frac{1}{2} \frac{(z_t - x_t)^2}{\sigma_z^2}}$.

Kalman Filter. $\mathbf{P}(X_t \mid z_{1:t})$ is also Gaussian. Let $\mu_t$ and $\sigma_t^2$ be the mean and variance of $\mathbf{P}(X_t \mid z_{1:t})$; then $\mu_{t+1} = \frac{(\sigma_t^2 + \sigma_x^2)\, z_{t+1} + \sigma_z^2\, \mu_t}{\sigma_t^2 + \sigma_x^2 + \sigma_z^2}$ and $\sigma_{t+1}^2 = \frac{(\sigma_t^2 + \sigma_x^2)\, \sigma_z^2}{\sigma_t^2 + \sigma_x^2 + \sigma_z^2}$ (see the detailed derivation in the textbook). Interpretation: $\mu_{t+1}$ is a weighted mean of $z_{t+1}$ and $\mu_t$. If the observation is unreliable ($\sigma_z$ is large), then $\mu_{t+1}$ is closer to $\mu_t$; otherwise it is closer to $z_{t+1}$. $\sigma_{t+1}^2$ is independent of the observation $z_{t+1}$.
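The one-dimensional update above is easy to sketch directly (the numbers are hypothetical; the Kalman-gain form used in the code is an algebraically equivalent rewriting of the slide's weighted mean):

```python
# 1-D Kalman filter update from the slide:
# mu_{t+1}  = ((s_t^2 + s_x^2) z_{t+1} + s_z^2 mu_t) / (s_t^2 + s_x^2 + s_z^2)
# s_{t+1}^2 = (s_t^2 + s_x^2) s_z^2 / (s_t^2 + s_x^2 + s_z^2)

def kalman_update(mu, var, z, var_x, var_z):
    pred_var = var + var_x                 # variance after the transition step
    k = pred_var / (pred_var + var_z)      # Kalman gain: weight on the observation
    return k * z + (1 - k) * mu, pred_var * var_z / (pred_var + var_z)

# With all variances equal to 1, mu_t = 0 and observation z_{t+1} = 3:
mu, var = kalman_update(0.0, 1.0, 3.0, 1.0, 1.0)
print(mu, var)   # 2.0 and 2/3: the estimate moves 2/3 of the way toward z
```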

Outline: Temporal Probability Model; Hidden Markov Model (HMM); Kalman Filter; Dynamic Bayes' Net (DBN); Applications of DBN; Special Classes of DBN.

Dynamic Bayesian Networks. A DBN is a Bayes' net that represents a temporal probability model; any temporal probability model can be represented as a DBN. The DBN represents knowledge of the domain and describes the structure of the problem. (Examples: a first-order Markov chain; a second-order Markov chain.)

Dynamic Bayesian Networks. For simplicity, here we consider the case where variables and their links are replicated from slice to slice and the DBN represents a first-order Markov process that is stationary. Such a DBN is specified by $\mathbf{P}(\mathbf{X}_0)$, $\mathbf{P}(\mathbf{X}_t \mid \mathbf{X}_{t-1})$, and $\mathbf{P}(\mathbf{E}_t \mid \mathbf{X}_t)$. HMMs and Kalman filters are special cases of DBNs. Any discrete-variable DBN can be cast as an HMM by introducing metavariables; however, using a DBN preserves the sparsity of the model.

Inference in DBN. Exact inference: "unroll" the network and apply exact inference techniques directly. Approximate inference: a variant of likelihood weighting (not very efficient); particle filtering (commonly used).

Particle Filtering (Not Required). One step of particle filtering: given $N$ samples of $\mathbf{X}_t$, denoted $S$, and evidence $\mathbf{e}_{t+1}$, get $N$ samples of $\mathbf{X}_{t+1}$. First, get a set of $N$ weighted samples, denoted $S'$, for $\mathbf{X}_{t+1}$: for each sample of $\mathbf{X}_t$ in $S$, sample the value of $\mathbf{X}_{t+1}$ from $P(\mathbf{X}_{t+1} \mid \mathbf{X}_t)$ and compute the weight as $P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1})$. Then resample based on the weights to get a new set of $N$ samples for $\mathbf{X}_{t+1}$, denoted $S''$: each new sample is selected from $S'$, with the probability of sampling $s \in S'$ proportional to its weight; samples are drawn with replacement, i.e., one item can be sampled multiple times. (Kalman filter: exact update of the belief state for linear dynamical systems. Particle filter: approximate update for general systems.)

Particle Filtering (Not Required). Approximate inference using particle filtering over multiple time steps: initialize $S$ based on $\mathbf{P}(\mathbf{X}_0)$, then apply one-step particle filtering at every time step, recursively updating the set of samples.
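The two phases (propagate + weight, then resample) can be sketched for the umbrella model. This is a toy illustration with my own variable names; with a seeded RNG and 10,000 particles the filtered estimate lands close to the exact answer 0.883 computed earlier.

```python
import random

# Particle filter for the 2-state umbrella HMM. Each particle is a boolean
# rain state; propagate through the transition model, weight by the sensor
# model, then resample with replacement proportional to weight.

P_RAIN = {True: 0.7, False: 0.3}       # P(R_{t+1}=true | R_t)
P_UMB = {True: 0.9, False: 0.2}        # P(U_t=true | R_t)

def particle_filter_step(particles, u, rng):
    # propagate each particle and weight it by the evidence likelihood
    moved = [rng.random() < P_RAIN[p] for p in particles]
    weights = [P_UMB[x] if u else 1 - P_UMB[x] for x in moved]
    # resample N particles with replacement, proportional to weight
    return rng.choices(moved, weights=weights, k=len(particles))

rng = random.Random(0)
particles = [rng.random() < 0.5 for _ in range(10000)]   # from P(R_0)
for u in [True, True]:                                   # U_1 = t, U_2 = t
    particles = particle_filter_step(particles, u, rng)
print(sum(particles) / len(particles))   # close to the exact answer 0.883
```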

Example: Umbrella (Not Required). Figure: particles representing $R_0$ are propagated through the transition model to $R_1$, weighted by the evidence, and resampled.

Particle Filtering (Not Required). We can prove that if the $N$ initial samples approximate $P(\mathbf{x}_t \mid \mathbf{e}_{1:t})$, i.e., $N(\mathbf{x}_t \mid \mathbf{e}_{1:t}) / N \approx P(\mathbf{x}_t \mid \mathbf{e}_{1:t})$, then the new samples approximate $P(\mathbf{x}_{t+1} \mid \mathbf{e}_{1:t+1})$, i.e., $N(\mathbf{x}_{t+1} \mid \mathbf{e}_{1:t+1}) / N \approx P(\mathbf{x}_{t+1} \mid \mathbf{e}_{1:t+1})$ (see details in the textbook). By induction, particle filtering is consistent: it provides the correct probabilities as $N \to \infty$. In practice, particle filtering works very well.

Quiz 3 (Not Required). Using particle filtering for the umbrella example: if in one step we get 100 samples with $+$ and total weight 1, and 400 samples with $-$ and total weight 2, after propagating and weighting (before resampling), which of the following best estimates the number of samples with $+$ after resampling? A: 100; B: 400; C: 167; D: 333. (Recall: each new sample is selected from $S'$ with probability proportional to its weight, with replacement, so one item can be sampled multiple times.)

Applications of DBN: Place and Object Recognition. Which are the hidden variables? (Torralba et al., ICCV 2003. Context-based vision system for place and object recognition.)

Applications of DBN: Place and Object Recognition. Use scene context (low-level features) to disambiguate object recognition: infer object types based on scene and object features; use context priming to decide which object detectors to run. (Torralba et al., ICCV 2003.)

Applications of DBN: Infer and Predict Poaching Activity. Not surprisingly, naively applying existing ML algorithms does not work well, for two reasons. First, the dataset is quite sparse: unlike image classification or movie recommendation, the amount of data we have is quite limited. Second, for the data we have, it is hard to determine the labels. If patrollers found poaching activity in an area, then we label the area as attacked without doubt. However, if the patrollers did not find any poaching activity, we are not sure whether we should label it as not attacked, because the patrollers may have missed the signs of poaching, and the poached animals cannot report the incident themselves. (Figure: example areas labeled attacked / not attacked.)

Applications of DBN: Infer and Predict Poaching Activity. Domain knowledge: poaching activity is impacted by ranger patrol effort, as well as by features such as animal density; detection probability is also impacted by ranger patrol effort and a subset of these features. Variables: ranger patrol; probability of attack on target $j$; detection probability. Features: area habitat, animal density, area slope, distance to rivers / roads, ... (Nguyen et al. CAPTURE: A new predictive anti-poaching tool for wildlife protection. In AAMAS, 2016.)

Applications of DBN: Infer and Predict Poaching Activity. $a_{t,i}$: whether there is poaching; $c_{t,i}$: ranger patrol effort; $o_{t,i}$: whether a poaching sign is found; $x_{t,i}$: features, e.g., distance from road, animal density, etc. (Nguyen et al. CAPTURE: A new predictive anti-poaching tool for wildlife protection. In AAMAS, 2016.)

Applications of DBN: Predict Urban Crime. Opportunistic criminals wander around and seek opportunities to commit crimes. $D_i^t$: number of defenders (known); $X_i^t$: number of criminals (hidden); $Y_i^t$: number of crimes (known).

Summary. Temporal models: Hidden Markov Models (HMM), Viterbi algorithm, Kalman filter, Dynamic Bayes' Net (DBN), particle filtering. Applications of DBN: place and object recognition; inferring and predicting poaching activity; predicting urban crime.

Acknowledgment. Some slides are borrowed from previous slides made by Tai Sing Lee.

Material in the backup slides in this lecture is not required.

Viterbi Algorithm (backup). $\max_{x_1, \ldots, x_t} P(x_1, \ldots, x_t, X_{t+1} = x_{t+1} \mid e_{1:t+1}) = ?$ Recall Bayes' rule: $P(b \mid a) = P(a \mid b) P(b) / P(a)$; product rule: $P(a \wedge b) = P(a \mid b) P(b)$; sum rule: $P(a) = \sum_k P(a \wedge b_k)$. Also: if $f(a) \ge 0$ for all $a$ and $g(a,b) \ge 0$ for all $a, b$, then $\max_{a,b} f(a)\, g(a,b) = \max_a \left( f(a) \max_b g(a,b) \right)$.

Inference in HMM: Prediction. Prediction: posterior distribution over a future state given all evidence to date, $\mathbf{P}(X_{t+1} \mid e_{1:t})$, e.g., $\mathbf{P}(Rain_{t+1} \mid umbrella_1, \ldots, umbrella_t)$.

Inference in HMM: Prediction. We know $\mathbf{f}_{1:t} = \mathbf{P}(X_t \mid e_{1:t})$ and $\mathbf{f}_{1:t+1} = \alpha\, \mathbf{O}_{t+1} \mathbf{T}^{\mathsf T} \mathbf{f}_{1:t}$. So $\mathbf{P}(X_{t+1} \mid e_{1:t}) = \sum_{x_t} \mathbf{P}(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t}) = \mathbf{T}^{\mathsf T} \mathbf{f}_{1:t}$ (the prediction step is the forward update without the evidence factor $\mathbf{O}_{t+1}$). We can further compute $\mathbf{P}(X_{t+k+1} \mid e_{1:t})$ through recursive computation.
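Iterating this prediction step with no new evidence is a minimal sketch of $k$-step prediction; for the umbrella chain the predicted rain probability decays from the filtered value toward the chain's stationary distribution $\langle 0.5, 0.5 \rangle$ as $k$ grows (starting values are the filtering results computed earlier):

```python
# k-step prediction: repeatedly apply P(X_{t+k+1} | e_{1:t}) = T^T * current,
# with no evidence weighting. For the umbrella chain this converges to the
# stationary distribution <0.5, 0.5>.

T = [[0.7, 0.3], [0.3, 0.7]]     # T[i][j] = P(X_t = j | X_{t-1} = i)

def predict(dist, steps):
    for _ in range(steps):
        dist = [sum(T[i][j] * dist[i] for i in range(2)) for j in range(2)]
    return dist

f12 = [0.883, 0.117]             # filtering result after U_1 = t, U_2 = t
for k in [1, 5, 20]:
    print(k, [round(x, 4) for x in predict(f12, k)])
# the predicted rain probability decays from 0.883 toward 0.5 as k grows
```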

Inference in HMM: Smoothing. Smoothing: posterior distribution of a past state given all evidence up to the present, $\mathbf{P}(X_k \mid e_{1:t})$ for $k < t$, e.g., $\mathbf{P}(Rain_k \mid umbrella_1, \ldots, umbrella_t)$.

Inference in HMM: Smoothing. $\mathbf{P}(X_k \mid e_{1:t}) = ?$ Applying Bayes' rule, the product rule, and the sum rule, together with the conditional independences of the HMM, $\mathbf{P}(X_k \mid e_{1:t}) = \alpha\, \mathbf{P}(X_k \mid e_{1:k})\, \mathbf{P}(e_{k+1:t} \mid X_k) = \alpha\, \mathbf{f}_{1:k} \times \mathbf{b}_{k+1:t}$, where $\mathbf{P}(e_{k+1:t} \mid X_k)$ is denoted by $\mathbf{b}_{k+1:t}$ (the backward message). This is not matrix multiplication: viewing both factors as vectors, the equation is valid if $\times$ represents pointwise multiplication.

Inference in HMM: Smoothing. How can $\mathbf{b}_{k+1:t} = \mathbf{P}(e_{k+1:t} \mid X_k)$ be computed? Recall Bayes' rule: $P(b \mid a) = P(a \mid b) P(b) / P(a)$; product rule: $P(a \wedge b) = P(a \mid b) P(b)$; sum rule: $P(a) = \sum_k P(a \wedge b_k)$.

Inference in HMM: Smoothing. So given $\mathbf{b}_{k+2:t} = \mathbf{P}(e_{k+2:t} \mid X_{k+1})$, we can compute $\mathbf{b}_{k+1:t} = \mathbf{P}(e_{k+1:t} \mid X_k)$ according to $\mathbf{P}(e_{k+1:t} \mid X_k) = \sum_{x_{k+1}} P(e_{k+1} \mid x_{k+1})\, P(e_{k+2:t} \mid x_{k+1})\, \mathbf{P}(x_{k+1} \mid X_k)$; note this sum is not matrix multiplication. Since $X_t$ is discrete valued, $\mathbf{b}_{k+1:t}$ can be viewed as a vector; the $i$th element of $\mathbf{b}_{k+1:t}$ is $\mathbf{b}_{k+1:t}(i) = \sum_j \mathbf{O}_{k+1,jj}\, \mathbf{b}_{k+2:t}(j)\, \mathbf{T}_{ij}$. So using the matrix representation for the HMM, we have $\mathbf{b}_{k+1:t} = \mathbf{T}\, \mathbf{O}_{k+1}\, \mathbf{b}_{k+2:t}$, which is matrix multiplication.

Inference in HMM: Smoothing. Smoothing: posterior distribution of a past state given all evidence up to the present, $\mathbf{P}(X_k \mid e_{1:t})$. Set $\mathbf{f}_{1:0} \leftarrow \mathbf{P}(X_0)$ and $\mathbf{b}_{t+1:t} \leftarrow \mathbf{1}$. Recursively compute $\mathbf{f}_{1:t+1} \leftarrow \alpha\, \mathbf{O}_{t+1} \mathbf{T}^{\mathsf T} \mathbf{f}_{1:t}$ (forward operation; $\mathbf{O}_{t+1}$ is determined by $e_{t+1}$) and $\mathbf{b}_{k+1:t} \leftarrow \mathbf{T}\, \mathbf{O}_{k+1}\, \mathbf{b}_{k+2:t}$ (backward operation). Return $\mathbf{P}(X_k \mid e_{1:t}) = \alpha\, \mathbf{b}_{k+1:t} \times \mathbf{f}_{1:k}$. Smoothing for all $k \in \{1..t\}$ gives the posterior distribution of all past states given all evidence up to the present. Forward-Backward Algorithm: store all the $\mathbf{f}$ and $\mathbf{b}$ messages, and return $\mathbf{P}(X_k \mid e_{1:t})$ for all $k$.
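Putting the forward and backward passes together for the umbrella example gives the following sketch (plain Python, my own function names; for two umbrella observations the smoothed $\mathbf{P}(R_1 \mid u_1, u_2)$ comes out at roughly $\langle 0.883, 0.117 \rangle$):

```python
# Forward-backward smoothing for the 2-state umbrella HMM.
# f_{1:k+1} = alpha * O_{k+1} T^T f_{1:k};  b_{k+1:t} = T O_{k+1} b_{k+2:t};
# P(X_k | e_{1:t}) = alpha * f_{1:k} (pointwise *) b_{k+1:t}.

T = [[0.7, 0.3], [0.3, 0.7]]     # T[i][j] = P(X_t=j | X_{t-1}=i)
O = {True: [0.9, 0.2], False: [0.1, 0.8]}

def normalize(v):
    z = sum(v)
    return [x / z for x in v]

def forward_backward(evidence, prior):
    fs = [prior]                              # f_{1:0}, f_{1:1}, ...
    for e in evidence:                        # forward pass
        fs.append(normalize([O[e][j] * sum(T[i][j] * fs[-1][i] for i in range(2))
                             for j in range(2)]))
    b = [1.0, 1.0]                            # b_{t+1:t} = 1
    smoothed = [None] * len(evidence)
    for k in range(len(evidence), 0, -1):     # backward pass
        smoothed[k - 1] = normalize([fs[k][i] * b[i] for i in range(2)])
        e = evidence[k - 1]
        b = [sum(T[i][j] * O[e][j] * b[j] for j in range(2)) for i in range(2)]
    return smoothed

print(forward_backward([True, True], [0.5, 0.5]))
```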

General Inference in Temporal Models. $\mathbf{f}_{1:t+1} = \alpha\, \mathrm{FORWARD}(\mathbf{f}_{1:t}, \mathbf{e}_{t+1})$; $\mathbf{b}_{k+1:t} = \mathrm{BACKWARD}(\mathbf{b}_{k+2:t}, \mathbf{e}_{k+1})$. Filtering: $\mathbf{P}(\mathbf{X}_t \mid \mathbf{e}_{1:t}) = \mathbf{f}_{1:t}$. Prediction: $\mathbf{P}(\mathbf{X}_{t+k+1} \mid \mathbf{e}_{1:t}) = \sum_{\mathbf{x}_{t+k}} \mathbf{P}(\mathbf{X}_{t+k+1} \mid \mathbf{x}_{t+k})\, P(\mathbf{x}_{t+k} \mid \mathbf{e}_{1:t})$. Smoothing: $\mathbf{P}(\mathbf{X}_k \mid \mathbf{e}_{1:t}) = \alpha\, \mathbf{b}_{k+1:t} \times \mathbf{f}_{1:k}$; the forward-backward algorithm smooths the whole sequence. Finding the most likely explanation: the Viterbi algorithm.