1 Probabilistic Reasoning Over Time (Especially for HMMs and Kalman filters) December 1st, 2004 SeongHun Lee, InHo Park, Yang Ming
2 Contents Markov Models Hidden Markov Models HMMs as Generative Processes Markov Assumptions for HMMs The 3 Problems of HMMs HMMs for Speech Recognition Kalman filters
3 Markov Models
4 Markov Process A stochastic process over a temporal sequence: the probability distribution of the variable q at time t depends on the variable q at times t-1 down to 1. First-order Markov process: the state transition depends only on the previous state, P[q_t = j | q_(t-1) = i, q_(t-2) = k, ...] = P[q_t = j | q_(t-1) = i]. The state transition is also independent of time: a_ij = P[q_t = j | q_(t-1) = i].
5 Markov Models Markov Model: a model of a Markov process with discrete states. Given the observed sequence, the state sequence is uniquely defined: the probability of the state sequence 's1 s3 s1 s2 s2 s3' given the observation sequence 'A C A B B C' is 1.
6 Markov Models (Graphical View) A Markov model: A Markov model unfolded in time:
7 Example of Markov Model Markov chain with 3 states: sunny, cloudy, rain. [Figure: state-transition diagram and table of P(weather tomorrow | weather today); legible entries include P(sunny|sunny) = 0.8, P(cloudy|sunny) = 0.1, and a cloudy row of 0.2 / 0.6 / 0.2.]
8 Example of Markov Model (cont'd) Probability of a sequence S: compute the product of successive transition probabilities. Ex. What will the weather be for the next 2 days (today: sunny)? P(sunny, cloudy, rain) = P(sunny) P(cloudy|sunny) P(rain|cloudy) = 1.0 x 0.1 x 0.2 = 0.02. P(sunny, sunny, sunny) = P(sunny) P(sunny|sunny) P(sunny|sunny) = 1.0 x 0.8 x 0.8 = 0.64. Most likely answer: sunny-sunny, with probability 64%.
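The sequence-probability computation above can be sketched in a few lines of Python. The transition table follows the slide's numbers where they are legible; the rain row did not survive cleanly in the source, so 0.3/0.3/0.4 is an assumption.

```python
# Weather Markov chain from the slides.
# A[today][tomorrow] = P(weather tomorrow | weather today).
# The rain row (0.3/0.3/0.4) is assumed; the slide's figure is garbled there.
A = {
    "sunny":  {"sunny": 0.8, "cloudy": 0.1, "rain": 0.1},
    "cloudy": {"sunny": 0.2, "cloudy": 0.6, "rain": 0.2},
    "rain":   {"sunny": 0.3, "cloudy": 0.3, "rain": 0.4},
}

def sequence_probability(states, start_prob=1.0):
    """P(s1, ..., sT) = P(s1) * product over t of P(s_t | s_{t-1})."""
    p = start_prob
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

# Today is sunny with certainty, as in the slide example.
print(sequence_probability(["sunny", "sunny", "sunny"]))  # 0.8 * 0.8 = 0.64
print(sequence_probability(["sunny", "cloudy", "rain"]))  # 0.1 * 0.2 = 0.02
```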
9 Hidden Markov Models
10 Hidden Markov Model Hidden Markov Model: the state is not observed (hidden); only an observable symptom (output) is seen. Transition probabilities between states depend only on the previous state: P(q_t = i | q_(t-1) = j). Emission probabilities depend only on the current state: P(x_t | q_t = i), where x_t is observed.
11 Markov Assumptions Emissions: the probability to emit x_t at time t in state q_t = i does not depend on anything else: P(x_t | q_t = i, everything else) = P(x_t | q_t = i). Transitions: the probability to go from state j to state i at time t does not depend on anything else: P(q_t = i | q_(t-1) = j, everything else) = P(q_t = i | q_(t-1) = j). The probability does not depend on the time t: a_ij = P(q_t = i | q_(t-1) = j) for all t.
12 Hidden Markov Models (Graphical View) Hidden Markov Model Hidden Markov model unfolded in time
13 HMMs as Generative Processes An HMM can be used to generate sequences. Define a set of starting states with initial probabilities P(q_0 = i), and a set of final states. For each sequence to generate: 1. Select an initial state j according to P(q_0 = j). 2. Select the next state i according to P(q_t = i | q_(t-1) = j). 3. Emit an output according to the emission distribution P(x_t | q_t = i). 4. If i is a final state, stop; otherwise loop to step 2.
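The generation steps above can be sketched as follows. The two-state coin-style model and its probabilities are made up for illustration, and a fixed sequence length is used in place of the slides' final-state stopping rule.

```python
import random

# Illustrative two-state HMM (not from the slides): a "hot" and a "cold"
# state, each emitting heads (H) or tails (T) with its own bias.
start_p = {"hot": 0.6, "cold": 0.4}                # P(q_0 = i)
trans_p = {"hot":  {"hot": 0.7, "cold": 0.3},      # P(q_t = i | q_{t-1} = j)
           "cold": {"hot": 0.4, "cold": 0.6}}
emit_p  = {"hot":  {"H": 0.8, "T": 0.2},           # P(x_t | q_t = i)
           "cold": {"H": 0.3, "T": 0.7}}

def weighted_choice(dist):
    """Sample a key from a {item: probability} dict."""
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item  # guard against floating-point round-off

def generate(length):
    """Generate (hidden states, observed outputs) of the given length."""
    state = weighted_choice(start_p)               # step 1: initial state
    states, outputs = [], []
    for _ in range(length):
        states.append(state)
        outputs.append(weighted_choice(emit_p[state]))  # step 3: emit
        state = weighted_choice(trans_p[state])         # step 2: next state
    return states, outputs

states, outputs = generate(5)
print(states, outputs)
```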
14 Coin Toss Model 2-coin model description: states S = {S_1, S_2} are two different biased coins. Each state is characterized by its probability distribution over heads and tails; state transitions are characterized by the state transition matrix. The observation symbols V = {H, T} (H: head, T: tail) are given; which coin was tossed is hidden.
15 Urn and Ball Model Each urn contains colored balls (4 distinct colors). Basic step: choose an urn according to some probabilistic procedure, get a ball from the urn, record (observe) its color, replace the ball, and repeat. The colors of the selected balls are observed, but the sequence of chosen urns is hidden.
16 The 3 Problems of HMMs
17 The 3 Problems of HMMs The HMM model gives rise to 3 different problems: The Evaluation Problem: given an HMM parameterized by λ, compute the likelihood of a sequence, P(X | λ). The Decoding Problem: given an HMM parameterized by λ, compute the optimal path Q through the state space given a sequence X: Q* = argmax_Q P(Q | X, λ). The Learning Problem: given an HMM parameterized by λ and a set of sequences X_n, select the parameters λ* = argmax_λ Σ_n log P(X_n | λ).
18 The Evaluation Problem Finding the Probability of an Observation Sphinx quiz: a sphinx lives in a castle and proposes a quiz. Every day the sphinx, unseen to you, shows a card of one of 4 suits (spade, heart, diamond, clover). Which card is chosen depends on her feeling that day. The pattern of feeling changes and the card preference for each feeling are known. After 3 cards are shown, you must give the probability of the observation sequence.
19 The Evaluation Problem Straightforward Way Straightforward way: enumerate every possible state sequence of length T (the number of observations) and sum their probabilities, P(O) = Σ_Q P(O, Q). Time complexity: about 2T · N^T operations, which is far too high. Consider instead using probabilities of partial observations.
20 The Evaluation Problem Forward Variable Approach Forward variable: save the probability of the partial observation sequence for each state in a matrix. To compute the forward variable at state S_j, use the forward variables of the previous states: multiply each by its transition probability and the emission probability, then sum all the products.
21 The Evaluation Problem Forward Variable Approach Forward variable: α_t(i) = P(O_1 ... O_t, q_t = i | λ), the probability of having generated the partial sequence O_1 ... O_t and being in state i at time t.
22 The Evaluation Problem Forward Variable Approach Initialization: α_1(i) = π_i b_i(O_1), where π_i is the prior probability of state i. Induction: compute α_t(j) for each state j and each time t of the given sequence, α_t(j) = [Σ_i α_(t-1)(i) a_ij] b_j(O_t). Termination: sum the α_T(i) to get the likelihood, P(O | λ) = Σ_i α_T(i).
23 The Evaluation Problem Forward Variable Approach Let's do it. Assume a prior probability of .5 for each of the two feeling states. [Worked example; the card and feeling symbols are images in the source:] α_1(state 1) = P(state 1) · P(card 1 | state 1) = .5 × .2; α_1(state 2) = P(state 2) · P(card 1 | state 2) = .5 × .1; α_2(state j) = α_1(state 1) · a_1j · b_j(card 2) + α_1(state 2) · a_2j · b_j(card 2); ...
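The forward recursion can be sketched in Python as below. Since the slides' actual card and feeling probabilities appear only as images, the two-state model here uses made-up numbers; the initialization, induction, and termination steps mirror the slides.

```python
# Illustrative two-feeling, four-card HMM (probabilities are assumptions).
start_p = {"happy": 0.5, "grumpy": 0.5}
trans_p = {"happy":  {"happy": 0.8, "grumpy": 0.2},
           "grumpy": {"happy": 0.4, "grumpy": 0.6}}
emit_p  = {"happy":  {"spade": 0.2, "heart": 0.5, "diamond": 0.2, "clover": 0.1},
           "grumpy": {"spade": 0.4, "heart": 0.1, "diamond": 0.1, "clover": 0.4}}

def forward(observations):
    """Return P(O | model) by summing the final forward variables."""
    # Initialization: alpha_1(i) = pi_i * b_i(O_1)
    alpha = {i: start_p[i] * emit_p[i][observations[0]] for i in start_p}
    # Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(O_t)
    for obs in observations[1:]:
        alpha = {j: sum(alpha[i] * trans_p[i][j] for i in alpha) * emit_p[j][obs]
                 for j in trans_p}
    # Termination: P(O) = sum_i alpha_T(i)
    return sum(alpha.values())

print(forward(["spade", "heart", "clover"]))
```

Note the cost: N^2 multiplications per time step instead of the N^T paths of the brute-force enumeration.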
24 The Decoding Problem Finding the Best State Sequence Sphinx quiz: the sphinx changes the quiz. Same conditions as before, but after 3 cards are shown, you must find the sequence of her feelings (the most likely state sequence). Answer: ? ? ...
25 The Decoding Problem Choosing the Individually Most Likely States Find the individually most likely states: the most likely first state, then the most likely second state, and so on. Problem: there is no guarantee that the resulting path is a valid one when the HMM has a state transition with zero probability; two individually chosen states may be joined by a zero-probability transition.
26 The Decoding Problem Viterbi Algorithm Find the single best state sequence: maximize P(Q | X, λ), i.e. maximize P(Q, X | λ). Based on dynamic programming, similar to a shortest-path algorithm. Use the Viterbi variables of the previous states, which hold the maximum probability of the partial sequence and the sequence of states that achieves it. Multiply each previous Viterbi variable by the transition probability and the emission probability, and choose the previous state that gives the maximum result.
27 The Decoding Problem Viterbi Algorithm The Viterbi algorithm finds the best state sequence. Viterbi variable: δ_t(i) = max over q_1 ... q_(t-1) of P(q_1 ... q_(t-1), q_t = i, O_1 ... O_t | λ).
28 The Decoding Problem Viterbi Algorithm
Step 1, initialization: δ_1(i) = π_i b_i(O_1) for 1 ≤ i ≤ N (π is the initial probability, b the output probability); ψ_1(i) = 0 (sequence of the best path).
Step 2, induction: δ_t(j) = max_i[δ_(t-1)(i) a_ij] b_j(O_t), 1 ≤ j ≤ N; ψ_t(j) = argmax_i[δ_(t-1)(i) a_ij], 1 ≤ j ≤ N (store backtrace).
Step 3, termination: P* = max_s[δ_T(s)]; q_T* = argmax_s[δ_T(s)].
Step 4, path (state sequence) backtracking, for t = T-1 ... 1: q_t* = ψ_(t+1)(q_(t+1)*).
29 The Decoding Problem Viterbi Algorithm Let's do it. [Worked example; the card and feeling symbols are images in the source:] Step 1, initialization: δ_1(state 1) = .5 × .2 = .1; δ_1(state 2) = .5 × .1 = .05. Step 2, induction: δ_1(state 1) · a_11 · b_1(card 2) = .1 × .8 × .6 = .048; δ_1(state 2) · a_21 · b_1(card 2) = .05 × .6 × .6 = .018; so δ_2(state 1) = max(.048, .018) = .048. ...
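The four Viterbi steps can be sketched on the same kind of two-state model used for the forward algorithm; the probabilities are again made-up placeholders for the slides' image symbols.

```python
# Illustrative two-feeling, four-card HMM (probabilities are assumptions).
start_p = {"happy": 0.5, "grumpy": 0.5}
trans_p = {"happy":  {"happy": 0.8, "grumpy": 0.2},
           "grumpy": {"happy": 0.4, "grumpy": 0.6}}
emit_p  = {"happy":  {"spade": 0.2, "heart": 0.5, "diamond": 0.2, "clover": 0.1},
           "grumpy": {"spade": 0.4, "heart": 0.1, "diamond": 0.1, "clover": 0.4}}

def viterbi(observations):
    """Return (probability of the best path, best state sequence)."""
    # Step 1, initialization: delta_1(i) = pi_i * b_i(O_1)
    delta = {i: start_p[i] * emit_p[i][observations[0]] for i in start_p}
    backptrs = []
    # Step 2, induction: delta_t(j) = max_i[delta_{t-1}(i) * a_ij] * b_j(O_t)
    for obs in observations[1:]:
        new_delta, psi = {}, {}
        for j in trans_p:
            best_i = max(delta, key=lambda i: delta[i] * trans_p[i][j])
            psi[j] = best_i                      # store backtrace
            new_delta[j] = delta[best_i] * trans_p[best_i][j] * emit_p[j][obs]
        delta = new_delta
        backptrs.append(psi)
    # Step 3, termination: pick the best final state
    state = max(delta, key=delta.get)
    best_prob = delta[state]
    # Step 4, path backtracking
    path = [state]
    for psi in reversed(backptrs):
        state = psi[state]
        path.append(state)
    return best_prob, list(reversed(path))

prob, path = viterbi(["heart", "clover", "clover"])
print(prob, path)   # best single path here: happy -> grumpy -> grumpy
```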
30 The Learning Problem Parameter Estimation Sphinx quiz: the sphinx changes the quiz again! Now there is no information about the pattern of feeling changes or the card preferences. From many card sequences, you must find the model that best explains both the feeling changes and the card choices.
31 The Learning Problem Baum-Welch Method Find the model parameters λ that maximize P(O | λ); locally maximize them with an iterative hill-climbing algorithm. Work out the probability of the observations using some model; find which state transitions and symbol emissions were used most; by increasing their probability, choose a revised model that gives a higher probability to the observations. This is training!
32 The Learning Problem Baum-Welch Method Baum-Welch algorithm: Step 1: Begin with some model (perhaps pre-selected, or just chosen randomly). Step 2: Run O through the current model to estimate the expectations of each model parameter. Step 3: Change the model to maximize the values of the paths that are used a lot. Step 4: Repeat this process until it converges on optimal values for the model parameters.
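Steps 1 to 4 can be sketched for a discrete-output HMM as below. The two-state model and the observation sequence are made up for illustration; the E-step computes the forward-backward variables, and the M-step re-estimates each parameter from expected counts.

```python
# Illustrative discrete HMM (all numbers are assumptions, not from the slides).
N, M = 2, 2                         # number of states, number of output symbols
pi = [0.5, 0.5]                     # initial state probabilities
A  = [[0.7, 0.3], [0.4, 0.6]]       # A[i][j] = P(q_t = j | q_{t-1} = i)
B  = [[0.9, 0.1], [0.2, 0.8]]       # B[i][k] = P(x_t = k | q_t = i)
O  = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0] # one observed symbol sequence

def forward_backward(O, pi, A, B):
    """Return (alpha, beta, P(O | model))."""
    T = len(O)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                      for j in range(N)])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j]
                             for j in range(N))
    return alpha, beta, sum(alpha[-1])

def baum_welch_step(O, pi, A, B):
    """One EM step: E-step via forward-backward, M-step via expected counts."""
    T = len(O)
    alpha, beta, like = forward_backward(O, pi, A, B)
    # gamma[t][i] = P(q_t = i | O); xi[t][i][j] = P(q_t = i, q_{t+1} = j | O)
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / like
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if O[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B, like

# Hill climbing: iterate; EM never decreases the likelihood.
prev = 0.0
for _ in range(20):
    pi, A, B, like = baum_welch_step(O, pi, A, B)
    prev = like
print(prev)
```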
33 The Learning Problem Baum-Welch Method Let's do it: choose an initial model (step 1), then apply steps 2-4 above to the observation sequence O, repeating until convergence on optimal values for the model parameters.
34 HMMs for Applications
35 Sequential Data Often highly variable, but has embedded structure; the information is contained in the structure.
36 More examples Text, on-line handwriting, music notes, DNA sequences, program code.
37 HMMs for Speech Recognition Find a sequence of phonemes (or words) given an acoustic sequence. Ex. "How to wreck a nice beach." Ex. "How to recognize speech." Idea: use a phoneme model.
38 Phoneme Model Phoneme: the smallest unit of sound that carries distinct meaning (consonants and vowels). Phoneme model: from the observed speech signals, find the sequence of states that maximizes P(signals | states).
39 Embedded Training of HMMs For each acoustic sequence in the training set, create a new HMM as the concatenation of the HMMs representing the underlying sequence of phonemes. Maximize the likelihood of the training sentences.
40 HMMs: Decoding a Sentence Decide on the accepted vocabulary. Optionally add a language model: P(word sequence). Use an efficient algorithm to find the optimal path in the decoding HMM: the Viterbi algorithm.
41 A demo of HMM application http://www.mmk.e-technik.tu-muenchen.de/rotdemo.html This demo shows an image retrieval system which enables the user to search a grayscale image database intuitively by presenting simple sketches. A detailed description of this demo: http://www.mmk.e-technik.tu-muenchen.de/demo/imagedb/theory.html
42 Kalman Filter
43 Kalman Filter? What is the Kalman filter? A technique that can be used to recursively estimate unobservable quantities, called state variables {x_t}, from an observed time series {y_t}. What is it used for? Tracking missiles, extracting lip motion from video, many computer vision applications, economics, and navigation.
44 Problem? Estimating the location of a ship: "Suppose that you are lost at sea during the night and have no idea at all of your location." Problem: inherent measuring-device inaccuracies. Your measurement carries some uncertainty!
45 Uncertainty Conditional density of the position based on the measured value z_1; assume a Gaussian distribution. z_1: measured position; x: real position. Q: What can serve as a measure of the uncertainty?
46 Measurements You make a measurement; your friend also makes a measurement. Question 1: Which one is better? Question 2: What is the best way to combine these measurements?
47 Combine measurements Uncertainty is decreased by combining the two pieces of information!
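Under the Gaussian assumption, combining the two measurements amounts to inverse-variance weighting: the fused estimate leans toward the less noisy measurement, and the fused variance is smaller than either input. A small sketch (the numbers are made up):

```python
def combine(z1, var1, z2, var2):
    """Fuse two independent Gaussian measurements of the same quantity."""
    w = var2 / (var1 + var2)           # weight on z1: trust the less noisy one more
    x = w * z1 + (1.0 - w) * z2        # fused estimate (inverse-variance weighted mean)
    var = var1 * var2 / (var1 + var2)  # fused variance: 1/var = 1/var1 + 1/var2
    return x, var

# z2 is four times more certain than z1, so the result sits close to z2.
x, var = combine(z1=10.0, var1=4.0, z2=12.0, var2=1.0)
print(x, var)
```

Note that the same formula can be rearranged as x = z1 + K (z2 - z1) with K = var1 / (var1 + var2), which is exactly the "prediction plus gain times difference" form on the next slide.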
48 The optimal estimate at t_2, x̂(t_2), is equal to the best prediction of its value before z_2 is taken, x̂(t_2^-), plus a correction term: an optimal weighting value K(t_2) times the difference between z_2 and that best prediction: x̂(t_2) = x̂(t_2^-) + K(t_2)(z_2 - x̂(t_2^-)). What does it mean? Estimate = prediction + gain × innovation.
49 Moving? Suppose you're moving: u is a nominal velocity and w is a noise term. The "noise" w is modeled as white Gaussian noise with a mean of zero and variance σ_w². Best prediction before the next measurement: propagate the estimate by u·Δt, while the uncertainty grows by σ_w²·Δt. Best estimate: correct that prediction with the new measurement as on the previous slide.
50 Summary Process model: describes how the state changes over time. Measurement model: where you are, from what you see! Predictor-corrector: predict the new state and its uncertainty, then correct with the new measurement.
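The predictor-corrector cycle above can be sketched for a scalar state, a 1-D position advancing at a nominal velocity u per step as in the ship example; all numbers below are illustrative.

```python
def kalman_step(x, P, z, u, Q, R):
    """One predict-correct cycle for a scalar state.

    x, P : current estimate and its variance
    z    : new measurement (variance R)
    u    : nominal motion per step; Q = variance of the process noise w
    """
    # Predict: apply the process model; uncertainty grows by Q.
    x_pred = x + u
    P_pred = P + Q
    # Correct: blend prediction and measurement via the Kalman gain.
    K = P_pred / (P_pred + R)           # gain in [0, 1]
    x_new = x_pred + K * (z - x_pred)   # prediction + gain * innovation
    P_new = (1.0 - K) * P_pred          # uncertainty shrinks after correction
    return x_new, P_new

x, P = 0.0, 1.0                          # initial estimate and variance
for z in [1.1, 2.0, 2.9, 4.2]:           # noisy positions while moving at u = 1
    x, P = kalman_step(x, P, z, u=1.0, Q=0.01, R=0.25)
print(x, P)
```

After a few steps the posterior variance P settles well below the measurement variance R, which is the whole point: the filter ends up more certain than any single measurement.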
51 Appendix – derivation
52 References Useful materials about HMMs: CS570 AI Lecture Notes (2003); http://www.idiap.ch/~bengio/; http://speech.chungbuk.ac.kr/~owkwon/. Useful materials about the Kalman filter: http://www.cs.unc.edu/~welch/kalman; Maybeck, 1979, "Stochastic Models, Estimation, and Control"; Greg Welch and Gary Bishop, 2001, "An Introduction to the Kalman Filter".