
1 Probabilistic Reasoning Over Time (Especially for HMM and Kalman filter). December 1st, 2004. SeongHun Lee, InHo Park, Yang Ming.


2 Contents
- Markov Models
- Hidden Markov Models
- HMMs as Generative Processes
- Markov Assumptions for HMMs
- The 3 Problems of HMMs
- HMMs for Speech Recognition
- Kalman Filters

3 Markov Models

4 Markov Process
- A stochastic process over a temporal sequence: the probability distribution of the variable q at time t depends on the variables q at times t-1 down to 1.
- First-order Markov process:
  - The state transition depends only on the previous state: P[q_t = j | q_t-1 = i, q_t-2 = k, ...] = P[q_t = j | q_t-1 = i]
  - The state transition is independent of time: a_ij = P[q_t = j | q_t-1 = i]

5 Markov Models
- A Markov model is a model of a Markov process with discrete states.
- Because the states themselves are observed, the observation sequence uniquely determines the state sequence.
- Example: the probability of the state sequence 's1 s3 s1 s2 s2 s3' given the observation sequence 'A C A B B C' is 1.

6 Markov Models (Graphical View)
- A Markov model
- A Markov model unfolded in time

7 Example of Markov Model
- Markov chain with 3 states: sunny, cloudy, rain
- Transition probabilities (weather of today -> weather of tomorrow):

            sunny   cloudy   rain
  sunny      0.8     0.1     0.1
  cloudy     0.2     0.6     0.2
  rain       0.3     0.3     0.4

8 Example of Markov Model (cont'd)
- Probability of a sequence S: compute the product of successive transition probabilities.
- Example: what is the most likely weather for the next 2 days, given that today is sunny?
  P(sunny, cloudy, rain) = P(sunny) P(cloudy|sunny) P(rain|cloudy) = 1.0 x 0.1 x 0.2 = 0.02
  P(sunny, sunny, sunny) = P(sunny) P(sunny|sunny) P(sunny|sunny) = 1.0 x 0.8 x 0.8 = 0.64
- Possible answer: sunny-sunny, with probability 64%.
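The two computations above can be sketched in code. This is a minimal sketch; the transition probabilities are the ones from the weather example.

```python
# Transition table from the weather example (today -> tomorrow).
P = {
    "sunny":  {"sunny": 0.8, "cloudy": 0.1, "rain": 0.1},
    "cloudy": {"sunny": 0.2, "cloudy": 0.6, "rain": 0.2},
    "rain":   {"sunny": 0.3, "cloudy": 0.3, "rain": 0.4},
}

def sequence_probability(states, prior=1.0):
    """Product of successive transition probabilities, starting from
    the first state with the given prior probability."""
    p = prior
    for prev, cur in zip(states, states[1:]):
        p *= P[prev][cur]
    return p

print(sequence_probability(["sunny", "sunny", "sunny"]))  # 1.0 * 0.8 * 0.8 = 0.64
print(sequence_probability(["sunny", "cloudy", "rain"]))  # 1.0 * 0.1 * 0.2 = 0.02
```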

9 Hidden Markov Models

10 Hidden Markov Model
- The state is not observed (hidden); only a symptom (output) is observable.
- Transition probabilities between states depend only on the previous state: P(q_t = i | q_t-1 = j)
- Emission probabilities depend only on the current state: P(x_t | q_t = i), where x_t is the observed output.

11 Markov Assumptions
- Emissions: the probability of emitting x_t at time t in state q_t = i does not depend on anything else: P(x_t | q_t = i)
- Transitions: the probability of going from state j to state i at time t does not depend on anything else, and does not depend on the time t: a_ij = P(q_t = i | q_t-1 = j)

12 Hidden Markov Models (Graphical View)
- A hidden Markov model
- A hidden Markov model unfolded in time

13 HMMs as Generative Processes
- An HMM can be used to generate sequences.
- Define a set of starting states with initial probabilities P(q_0 = i), and a set of final states.
- For each sequence to generate:
  1. Select an initial state j according to P(q_0).
  2. Select the next state i according to P(q_t = i | q_t-1 = j).
  3. Emit an output according to the emission distribution P(x_t | q_t = i).
  4. If i is a final state, stop; otherwise loop to step 2.
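The generative loop above can be sketched for a toy 2-state, 2-symbol HMM. The parameters (pi, A, B) below are illustrative assumptions, not values from the slides, and every state is treated as final after T steps.

```python
import random

states = [0, 1]
pi = [0.6, 0.4]                    # P(q_0 = i), assumed
A = [[0.7, 0.3], [0.4, 0.6]]       # A[j][i] = P(q_t = i | q_t-1 = j), assumed
B = [[0.9, 0.1], [0.2, 0.8]]       # B[i][k] = P(x_t = k | q_t = i), assumed

def sample(T, rng=random):
    """Generate a sequence of T outputs from the HMM."""
    q = rng.choices(states, weights=pi)[0]               # initial state
    xs = []
    for _ in range(T):
        xs.append(rng.choices([0, 1], weights=B[q])[0])  # emit an output
        q = rng.choices(states, weights=A[q])[0]         # next state
    return xs

print(sample(5))  # a random 5-symbol observation sequence
```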

14 Coin Toss Model
- 2-coins model:
  - States S = {S_1, S_2}: two different biased coins.
  - Each state is characterized by its probability distribution over heads and tails.
  - State transitions are characterized by a state transition matrix.
  - The observation symbols V = {H, T} (H: head, T: tail) are visible; which coin produced them is hidden.

15 Urn and Ball Model
- Each urn contains colored balls (4 distinct colors).
- Basic step:
  1. Choose an urn according to some probabilistic procedure.
  2. Draw a ball from the urn and record (observe) its color.
  3. Replace the ball and repeat.
- The colors of the selected balls are observed, but the sequence of chosen urns is hidden.

16 The 3 Problems of HMMs

17 The 3 Problems of HMMs
- The HMM model gives rise to 3 different problems:
  - The Evaluation Problem: given an HMM parameterized by lambda, compute the likelihood of a sequence, P(X | lambda).
  - The Decoding Problem: given an HMM parameterized by lambda, compute the optimal path Q through the state space given a sequence X: argmax_Q P(Q, X | lambda).
  - The Learning Problem: given an HMM parameterized by lambda and a set of sequences X_n, select the parameters that maximize the likelihood of those sequences.

18 The Evaluation Problem: Finding the Probability of an Observation
- Sphinx quiz:
  - A sphinx lives in a castle and proposes a quiz.
  - Every day the sphinx, unseen to you, shows a card of one of 4 kinds (spade, heart, diamond, clover).
  - Which card is chosen depends on her feeling that day.
  - The pattern of feeling changes and the card preference for each feeling are known.
  - After 3 cards are shown, you must give the probability of the observation sequence.

19 The Evaluation Problem: Straightforward Way
- Enumerate every possible state sequence of length T (the number of observations) and sum their probabilities: P(O) = P(O, Q_1) + P(O, Q_2) + ... + P(O, Q_{N^T})
- Time complexity: about 2T * N^T operations, which is far too high.
- Better idea: reuse the probabilities of partial observations.
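The straightforward enumeration can be sketched as follows. The toy 2-state, 2-symbol parameters (pi, A, B) are assumptions for illustration, not the sphinx-quiz values; the loop visits all N^T paths, which is exactly why this approach does not scale.

```python
from itertools import product

pi = [0.5, 0.5]                  # assumed initial probabilities
A = [[0.8, 0.2], [0.6, 0.4]]     # assumed transition matrix
B = [[0.2, 0.8], [0.1, 0.9]]     # assumed emission matrix, B[i][o] = P(o | state i)

def likelihood_bruteforce(obs):
    """P(O) as a sum of P(O, Q) over every possible state path Q."""
    total = 0.0
    for path in product(range(2), repeat=len(obs)):   # all N^T paths
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total
```

As a sanity check, summing the likelihood over every possible observation sequence of a fixed length gives 1.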

20 The Evaluation Problem: Forward Variable Approach
- Forward variable: save the probability of the partial observation sequence in a state matrix.
- Forward variable at state S_j:
  - Use the forward variables of the previous states.
  - Multiply each by the corresponding transition probability and the emission probability.
  - Sum over all previous states.

21 The Evaluation Problem: Forward Variable Approach
- Forward variable alpha_t(i): the probability of having generated the sequence o_1 ... o_t and being in state i at time t.

22 The Evaluation Problem: Forward Variable Approach
- Initial condition: alpha_1(i) = pi_i b_i(o_1), where pi_i is the prior probability of state i.
- Compute alpha_t(i) for each state i and each time t of a given sequence.
- Compute the likelihood by summing the alpha_T(i)'s: P(O | lambda) = sum_i alpha_T(i)

23 The Evaluation Problem: Forward Variable Approach
- Let's do it. Assume prior probabilities P(s_1) = P(s_2) = .5 [the feeling-state and card symbols were images in the original slide; generic names are used here]:
  alpha_1(s_1) = P(s_1) * P(o_1 | s_1) = .5 * .2
  alpha_1(s_2) = P(s_2) * P(o_1 | s_2) = .5 * .1
  alpha_2(s_1) = alpha_1(s_1) * P(s_1 | s_1) * P(o_2 | s_1) + alpha_1(s_2) * P(s_1 | s_2) * P(o_2 | s_1) ...
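The forward recursion above can be sketched in a few lines. The toy parameters (pi, A, B) are assumptions, not the sphinx-quiz values; the three commented steps correspond to initialization, induction, and termination.

```python
pi = [0.5, 0.5]                  # assumed initial probabilities
A = [[0.8, 0.2], [0.6, 0.4]]     # assumed transition matrix
B = [[0.2, 0.8], [0.1, 0.9]]     # assumed emission matrix, B[i][o] = P(o | state i)

def forward(obs):
    """P(O | lambda) via the forward variables alpha_t(i)."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(2)]           # initialization
    for o in obs[1:]:                                          # induction
        alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[j][o]
                 for j in range(2)]
    return sum(alpha)                                          # termination

print(forward([0]))  # pi_0*B[0][0] + pi_1*B[1][0] = .5*.2 + .5*.1 = 0.15
```

Unlike brute-force enumeration, this costs only O(N^2 T) operations.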

24 The Decoding Problem: Finding the Best State Sequence
- Sphinx quiz (changed):
  - Same conditions as before.
  - After 3 cards are shown, you must find the sequence of her feelings (the most likely state sequence).

25 The Decoding Problem: Choosing the Individually Most Likely States
- Find the individually most likely state at each time: the most likely first state, the most likely second state, and so on.
- Problem: there is no guarantee that the resulting path is a valid one when the HMM has a state transition with zero probability.

26 The Decoding Problem: Viterbi Algorithm
- Find the single best state sequence: maximize P(Q | X, lambda), i.e. maximize P(Q, X | lambda).
- Based on dynamic programming, similar to a shortest-path algorithm:
  - Use the Viterbi variables of the previous states, which hold the maximum probability of the partial sequence and the sequence of states achieving it.
  - Multiply each by the corresponding transition probability and the emission probability.
  - Choose the previous state that gives the maximum result.

27 The Decoding Problem: Viterbi Algorithm
- The Viterbi algorithm finds the best state sequence.
- Viterbi variable delta_t(i): the highest probability of any single state path that ends in state i at time t.

28 The Decoding Problem: Viterbi Algorithm
- Step 1: Initialization
  delta_1(i) = pi_i b_i(O_1) for 1 <= i <= N  (pi is the initial probability, b the output probability)
  psi_1(i) = 0  (sequence of the best path)
- Step 2: Induction
  delta_t(j) = max_i [delta_t-1(i) a_ij] b_j(O_t), 1 <= j <= N
  psi_t(j) = argmax_i [delta_t-1(i) a_ij], 1 <= j <= N  (store the backtrace)
- Step 3: Termination
  P* = max_s [delta_T(s)]
  q_T* = argmax_s [delta_T(s)]
- Step 4: Path (state sequence) backtracking, for t = T-1 down to 1:
  q_t* = psi_t+1(q_t+1*)

29 The Decoding Problem: Viterbi Algorithm
- Let's do it [the state and card symbols were images in the original slide; generic names are used here]:
- Step 1: Initialization
  delta_1(s_1) = P(s_1) * P(o_1 | s_1) = .5 * .2 = .1
  delta_1(s_2) = P(s_2) * P(o_1 | s_2) = .5 * .1 = .05
- Step 2: Induction
  delta_1(s_1) * P(s_1 | s_1) * P(o_2 | s_1) = .1 * .8 * .6 = .048
  delta_1(s_2) * P(s_1 | s_2) * P(o_2 | s_1) = .05 * .6 * .6 = .018
  delta_2(s_1) = .048 ...
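The four Viterbi steps can be sketched as follows. The toy parameters (pi, A, B) are assumptions, not the sphinx-quiz values; the comments map each part back to steps 1 through 4 above.

```python
pi = [0.5, 0.5]                  # assumed initial probabilities
A = [[0.8, 0.2], [0.6, 0.4]]     # assumed transition matrix
B = [[0.2, 0.8], [0.1, 0.9]]     # assumed emission matrix, B[i][o] = P(o | state i)

def viterbi(obs):
    """Best state path and its probability for an observation sequence."""
    delta = [pi[i] * B[i][obs[0]] for i in range(2)]      # step 1: initialization
    back = []
    for o in obs[1:]:                                     # step 2: induction
        psi, new_delta = [], []
        for j in range(2):
            i_best = max(range(2), key=lambda i: delta[i] * A[i][j])
            psi.append(i_best)                            # store the backtrace
            new_delta.append(delta[i_best] * A[i_best][j] * B[j][o])
        back.append(psi)
        delta = new_delta
    q = max(range(2), key=lambda i: delta[i])             # step 3: termination
    path = [q]
    for psi in reversed(back):                            # step 4: backtracking
        q = psi[q]
        path.append(q)
    return path[::-1], max(delta)
```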

30 The Learning Problem: Parameter Estimation
- Sphinx quiz (changed again!):
  - No information is given about how her feelings change or how cards are chosen.
  - Given many card sequences, you must find the model that best explains both the feeling changes and the card choices.

31 The Learning Problem: Baum-Welch Method
- Find lambda, the model parameters, and locally maximize the likelihood with an iterative hill-climbing algorithm:
  - Work out the probability of the observations using some model.
  - Find which state transitions and symbol emissions were used most.
  - By increasing their probabilities, choose a revised model that gives a higher probability to the observations.
- This is training!

32 The Learning Problem: Baum-Welch Method
- Step 1: Begin with some model (perhaps pre-selected, or just chosen randomly).
- Step 2: Run O through the current model to estimate the expectations of each model parameter.
- Step 3: Change the model to maximize the values of the paths that are used a lot.
- Step 4: Repeat this process until the model parameters converge.

33 The Learning Problem: Baum-Welch Method
- Let's do it: choose an initial model, run O through it to estimate the expectations of each parameter, change the model to maximize the values of the paths used a lot, and repeat until convergence.
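The four steps can be sketched as a forward-backward E-step followed by a re-estimation M-step. This is a minimal single-sequence sketch; the model size and starting values in the test are assumptions, and real implementations work in log space or with scaling to avoid underflow.

```python
def baum_welch(obs, pi, A, B, iters=10):
    """Locally maximize P(O | lambda) by iterative re-estimation (EM)."""
    N, T = len(pi), len(obs)
    for _ in range(iters):
        # E-step: forward (alpha) and backward (beta) variables
        alpha = [[0.0] * N for _ in range(T)]
        beta = [[0.0] * N for _ in range(T)]
        for i in range(N):
            alpha[0][i] = pi[i] * B[i][obs[0]]
            beta[T - 1][i] = 1.0
        for t in range(1, T):
            for j in range(N):
                alpha[t][j] = sum(alpha[t - 1][i] * A[i][j]
                                  for i in range(N)) * B[j][obs[t]]
        for t in range(T - 2, -1, -1):
            for i in range(N):
                beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                                 for j in range(N))
        like = sum(alpha[T - 1])
        # Expected state occupancies (gamma) and transition counts (xi)
        gamma = [[alpha[t][i] * beta[t][i] / like for i in range(N)]
                 for t in range(T)]
        xi_sum = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                       for t in range(T - 1)) / like
                   for j in range(N)] for i in range(N)]
        # M-step: re-estimate pi, A, B from the expected counts
        pi = gamma[0][:]
        gsum = [sum(gamma[t][i] for t in range(T - 1)) for i in range(N)]
        A = [[xi_sum[i][j] / gsum[i] for j in range(N)] for i in range(N)]
        occ = [sum(gamma[t][i] for t in range(T)) for i in range(N)]
        B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) / occ[i]
              for k in range(len(B[0]))] for i in range(N)]
    return pi, A, B
```

Each iteration cannot decrease the likelihood of the training sequence, which is exactly the hill-climbing behavior described above.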

34 HMMs for Applications

35 Sequential Data
- Often highly variable, but with embedded structure.
- The information is contained in the structure.

36 More Examples
- Text, on-line handwriting, music notes, DNA sequences, program code

37 HMMs for Speech Recognition
- Find a sequence of phonemes (or words) given an acoustic sequence.
  - e.g. "How to wreck a nice beach."
  - e.g. "How to recognize speech."
- Idea: use a phoneme model.

38 Phoneme Model
- Phoneme: the smallest unit of sound that distinguishes meaning (consonants and vowels).
- Phoneme model: given the observed speech signals, find the sequence of states that maximizes P(signals | states).

39 Embedded Training of HMMs
- For each acoustic sequence in the training set, create a new HMM as the concatenation of the HMMs representing the underlying sequence of phonemes.
- Maximize the likelihood of the training sentences.

40 HMMs: Decoding a Sentence
- Decide on the accepted vocabulary.
- Optionally add a language model: P(word sequence).
- Use an efficient algorithm (e.g. Viterbi) to find the optimal path through the decoding HMM.

41 A Demo of an HMM Application
- http://www.mmk.e-technik.tu-muenchen.de/rotdemo.html
- This demo shows an image retrieval system that lets the user search a grayscale image database intuitively by presenting simple sketches.
- A detailed description of the demo: http://www.mmk.e-technik.tu-muenchen.de/demo/imagedb/theory.html

42 Kalman Filter

43 Kalman Filter?
- What is the Kalman filter? A technique that recursively estimates unobservable quantities called state variables, {x_t}, from an observed time series {y_t}.
- What is it used for?
  - Tracking missiles
  - Extracting lip motion from video
  - Lots of computer vision applications
  - Economics
  - Navigation

44 Problem?
- Estimating the location of a ship: "Suppose that you are lost at sea during the night and have no idea at all of your location."
- Problem: measuring devices are inherently inaccurate, so your measurement carries some uncertainty.

45 Uncertainty
- Conditional density of the position based on the measured value z_1; assume a Gaussian distribution.
  z_1: measured position, x: real position
- Q: What can serve as a measure of uncertainty?

46 Measurements
- You make a measurement; your friend also makes a measurement.
- Question 1: Which one is better?
- Question 2: What is the best way to combine these measurements?

47 Combining Measurements
- Uncertainty is decreased by combining the two pieces of information!
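The combination can be sketched as a precision-weighted average of two Gaussian measurements; the fused variance is always smaller than either individual variance. The numeric measurements below are made-up examples, not values from the slides.

```python
def fuse(z1, var1, z2, var2):
    """Combine two independent Gaussian measurements of the same quantity."""
    var = 1.0 / (1.0 / var1 + 1.0 / var2)   # combined uncertainty (precisions add)
    z = var * (z1 / var1 + z2 / var2)       # precision-weighted mean
    return z, var

# Your noisy fix vs. your friend's more certain one (assumed values):
z, var = fuse(10.0, 4.0, 12.0, 1.0)
# var = 1 / (1/4 + 1/1) = 0.8, smaller than both 4.0 and 1.0;
# z = 0.8 * (10/4 + 12/1) ≈ 11.6, pulled toward the more certain measurement.
```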

48 What Does It Mean?
- The optimal estimate at t_2 is equal to the best prediction of its value before z_2 is taken, plus a correction term: an optimal weighting value times the difference between z_2 and the best prediction of its value before it is actually taken.

49 Moving?
- Suppose you are moving: u is a nominal velocity and w is a noise term.
- The noise w is modeled as white Gaussian noise with a mean of zero and some variance.
- The best prediction propagates the previous best estimate forward through the motion model.

50 Summary
- Process model: describes how the state changes over time.
- Measurement model: relates what you see to where you are!
- Predictor-corrector: predict the new state and its uncertainty, then correct with the new measurement.
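The predictor-corrector cycle can be sketched for a 1-D version of the moving-ship model: x_t = x_{t-1} + u*dt + w (process noise w), observed as y_t = x_t + v (measurement noise v). All numeric values here (velocity, noise variances, measurements) are illustrative assumptions.

```python
def kalman_step(x, P, y, u=1.0, dt=1.0, Q=0.1, R=0.5):
    """One predict-correct cycle of a 1-D Kalman filter.
    x, P: previous state estimate and its variance; y: new measurement;
    Q, R: process and measurement noise variances (assumed values)."""
    # Predict: propagate the state and grow its uncertainty
    x_pred = x + u * dt
    P_pred = P + Q
    # Correct: blend the prediction with the new measurement
    K = P_pred / (P_pred + R)           # Kalman gain (optimal weighting)
    x_new = x_pred + K * (y - x_pred)   # prediction + correction term
    P_new = (1 - K) * P_pred
    return x_new, P_new

x, P = 0.0, 1.0                          # assumed initial estimate
for y in [1.2, 1.9, 3.1, 4.0]:           # assumed measurements
    x, P = kalman_step(x, P, y)
# P shrinks toward a steady state; x tracks the measurements
```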

51 Appendix: Derivation

52 References
- Useful materials about HMMs:
  - CS570 AI Lecture Notes (2003)
  - http://www.idiap.ch/~bengio/
  - http://speech.chungbuk.ac.kr/~owkwon/
- Useful materials about the Kalman filter:
  - http://www.cs.unc.edu/~welch/kalman
  - Maybeck, 1979, "Stochastic Models, Estimation, and Control"
  - Greg Welch and Gary Bishop, 2001, "An Introduction to the Kalman Filter"

