
1 Hidden Markov Models

2 A Hidden Markov Model consists of
1. A sequence of states {X_t | t ∈ T} = {X_1, X_2, ..., X_T}, and
2. A sequence of observations {Y_t | t ∈ T} = {Y_1, Y_2, ..., Y_T}.

3 Some basic problems: from the observations {Y_1, Y_2, ..., Y_T},
1. Determine the sequence of states {X_1, X_2, ..., X_T} (assuming the model):
   - the Viterbi path
   - the state probabilities given the observations {Y_1, Y_2, ..., Y_T}
2. Determine (or estimate) the parameters of the stochastic process that is generating the states and the observations.

4 Computing Likelihood. Let a_ij = P[X_{t+1} = j | X_t = i] and A = (a_ij) = the M × M transition matrix. Let π_i = P[X_1 = i] and π = (π_1, ..., π_M) = the initial distribution over the states.

5 Computing Likelihood. P[X_1 = i_1, X_2 = i_2, ..., X_T = i_T, Y_1 = y_1, Y_2 = y_2, ..., Y_T = y_T] = P[X = i, Y = y] = π_{i_1} P[Y_1 = y_1 | X_1 = i_1] ∏_{t=2}^{T} a_{i_{t-1} i_t} P[Y_t = y_t | X_t = i_t].

6 Therefore P[Y_1 = y_1, Y_2 = y_2, ..., Y_T = y_T] = P[Y = y] = Σ_{i_1, ..., i_T} π_{i_1} P[Y_1 = y_1 | X_1 = i_1] ∏_{t=2}^{T} a_{i_{t-1} i_t} P[Y_t = y_t | X_t = i_t], the sum running over all M^T possible state sequences.
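This sum over all state sequences can be written down directly for a small example. Below is a minimal Python sketch (not from the slides; the two-state parameters and the sequence y are hypothetical) that computes P[Y = y] by brute-force enumeration, which motivates the efficient methods on the following slides.

```python
# Minimal sketch: brute-force likelihood of a discrete-output HMM by summing
# P[X = i, Y = y] over all M^T state sequences, as on slide 6.
import itertools
import numpy as np

M, K = 2, 3                                  # hypothetical: 2 states, 3 output symbols
pi = np.array([0.6, 0.4])                    # initial distribution pi_i = P[X_1 = i]
A = np.array([[0.7, 0.3],                    # transition matrix a_ij = P[X_{t+1} = j | X_t = i]
              [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1],               # emission probabilities P[Y_t = k | X_t = i]
              [0.1, 0.3, 0.6]])
y = [0, 2, 1, 2]                             # an observed sequence of length T = 4

likelihood = 0.0
for path in itertools.product(range(M), repeat=len(y)):
    p = pi[path[0]] * B[path[0], y[0]]
    for t in range(1, len(y)):
        p *= A[path[t - 1], path[t]] * B[path[t], y[t]]
    likelihood += p                          # add P[X = path, Y = y]

print(likelihood)                            # P[Y = y]; cost O(M^T), hence the forward method
```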

7 In the case when Y_1, Y_2, ..., Y_T are continuous random variables or continuous random vectors, let f(y | θ_i) denote the conditional density of Y_t given X_t = i. Then the joint density of Y_1, Y_2, ..., Y_T is given by f(y_1, y_2, ..., y_T) = f(y) = Σ_{i_1, ..., i_T} π_{i_1} g_1(i_1) ∏_{t=2}^{T} a_{i_{t-1} i_t} g_t(i_t), where g_t(i) = f(y_t | θ_i).

8 Efficient Methods for Computing the Likelihood: The Forward Method. Define the forward probabilities α_t(i) = P[Y_1 = y_1, ..., Y_t = y_t, X_t = i]. They satisfy α_1(i) = π_i f(y_1 | θ_i) and α_{t+1}(j) = [Σ_i α_t(i) a_ij] f(y_{t+1} | θ_j), so that P[Y = y] = Σ_i α_T(i).
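A sketch of the forward recursion in Python, under the same assumed notation (pi and A for the initial distribution and transitions, B for discrete emission probabilities); the function name forward and the example parameters are illustrative only.

```python
# Minimal sketch: the forward recursion alpha_t(i) = P[Y_1..Y_t, X_t = i]
# for a discrete-output HMM.
import numpy as np

def forward(pi, A, B, y):
    """Return the T x M matrix of forward probabilities alpha."""
    T, M = len(y), len(pi)
    alpha = np.zeros((T, M))
    alpha[0] = pi * B[:, y[0]]                       # alpha_1(i) = pi_i * P[y_1 | i]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, y[t]]   # sum_i alpha_{t-1}(i) a_ij, times emission
    return alpha

# Example with the same hypothetical parameters as in the brute-force sketch:
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
y = [0, 2, 1, 2]
alpha = forward(pi, A, B, y)
print(alpha[-1].sum())                               # P[Y = y] in O(M^2 T) time
```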

9 The Backward Procedure. Define the backward probabilities β_t(i) = P[Y_{t+1} = y_{t+1}, ..., Y_T = y_T | X_t = i]. They satisfy β_T(i) = 1 and β_t(i) = Σ_j a_ij f(y_{t+1} | θ_j) β_{t+1}(j), and again P[Y = y] = Σ_i π_i f(y_1 | θ_i) β_1(i).
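A companion sketch of the backward recursion, with the same assumed parameterization; evaluating π_i f(y_1 | θ_i) β_1(i) and summing over i recovers the same likelihood as the forward method.

```python
# Minimal sketch: the backward recursion beta_t(i) = P[Y_{t+1}..Y_T | X_t = i],
# computed from t = T down to t = 1.
import numpy as np

def backward(pi, A, B, y):
    """Return the T x M matrix of backward probabilities beta."""
    T, M = len(y), len(pi)
    beta = np.ones((T, M))                            # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, y[t + 1]] * beta[t + 1])  # sum_j a_ij P[y_{t+1} | j] beta_{t+1}(j)
    return beta

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
y = [0, 2, 1, 2]
beta = backward(pi, A, B, y)
print((pi * B[:, y[0]] * beta[0]).sum())              # same P[Y = y] as the forward method
```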

10 Prediction of states from the observations and the model: the posterior state probabilities are γ_t(i) = P[X_t = i | Y = y] = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j).
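A sketch of these posterior state probabilities γ_t(i), reusing the forward() and backward() helpers and the hypothetical parameters from the sketches above.

```python
# Minimal sketch: gamma_t(i) = P[X_t = i | Y = y] = alpha_t(i) beta_t(i) / P[Y = y],
# using the forward() and backward() helpers defined in the earlier sketches.
import numpy as np

def state_posteriors(pi, A, B, y):
    alpha = forward(pi, A, B, y)
    beta = backward(pi, A, B, y)
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)   # normalize each time step

gamma = state_posteriors(pi, A, B, y)
print(gamma.argmax(axis=1))                           # most probable state at each t, taken individually
```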

11 The Viterbi Algorithm (Viterbi Paths). The Viterbi path is the sequence of states X_1 = i_1, X_2 = i_2, ..., X_T = i_T that maximizes P[X_1 = i_1, ..., X_T = i_T, Y_1 = y_1, ..., Y_T = y_T] for a given set of observations Y_1 = y_1, Y_2 = y_2, ..., Y_T = y_T.

12 Summary of calculations of the Viterbi Path
1. Compute δ_1(i_1) = π_{i_1} f(y_1 | θ_{i_1}) for i_1 = 1, 2, …, M.
2. Compute δ_{t+1}(i_{t+1}) = max over i_t of [δ_t(i_t) a_{i_t i_{t+1}}] f(y_{t+1} | θ_{i_{t+1}}) for i_{t+1} = 1, 2, …, M and t = 1, …, T-1, recording the maximizing i_t at each step; then backtrack from the i_T that maximizes δ_T(i_T) (a code sketch follows).
3. Example workbook: HMM generator (normal).xls
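A sketch of the Viterbi recursion with backtracking, again using the assumed discrete-emission parameterization (pi, A, B) from the earlier sketches.

```python
# Minimal sketch: the Viterbi algorithm, computing the state sequence maximizing
# P[X = i, Y = y] by dynamic programming with backtracking.
import numpy as np

def viterbi(pi, A, B, y):
    T, M = len(y), len(pi)
    delta = np.zeros((T, M))            # delta_t(i) = best joint probability of a path ending in state i
    psi = np.zeros((T, M), dtype=int)   # argmax bookkeeping for backtracking
    delta[0] = pi * B[:, y[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A            # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, y[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                     # backtrack through the recorded argmaxes
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

print(viterbi(np.array([0.6, 0.4]),
              np.array([[0.7, 0.3], [0.2, 0.8]]),
              np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]),
              [0, 2, 1, 2]))
```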

13 Estimation of Parameters of a Hidden Markov Model. If both the sequence of observations Y_1, Y_2, ..., Y_T and the sequence of states X_1, X_2, ..., X_T are observed, say Y_1 = y_1, Y_2 = y_2, ..., Y_T = y_T and X_1 = i_1, X_2 = i_2, ..., X_T = i_T, then the likelihood is given by: L = π_{i_1} f(y_1 | θ_{i_1}) ∏_{t=2}^{T} a_{i_{t-1} i_t} f(y_t | θ_{i_t}).

14 The log-likelihood is given by: log L = log π_{i_1} + Σ_{t=2}^{T} log a_{i_{t-1} i_t} + Σ_{t=1}^{T} log f(y_t | θ_{i_t}).

15 In this case the Maximum Likelihood estimates are: â_ij = (number of transitions from state i to state j) / (number of transitions out of state i), and θ̂_i = the MLE of θ_i computed from the observations y_t where X_t = i.
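When the states are observed, the estimates reduce to simple counts and per-state summaries. A small illustrative sketch (the state and observation sequences are made up) for scalar normal emissions:

```python
# Minimal sketch: maximum likelihood estimates when both the states and the
# observations are seen -- transition probabilities from counts, and the emission
# parameters (here normal means) from the observations assigned to each state.
import numpy as np

states = np.array([0, 0, 1, 1, 1, 0, 1, 1])                 # observed state sequence i_1..i_T
obs = np.array([0.1, -0.3, 2.1, 1.8, 2.4, 0.2, 1.9, 2.2])   # observed y_1..y_T
M = 2

counts = np.zeros((M, M))
for i, j in zip(states[:-1], states[1:]):
    counts[i, j] += 1
A_hat = counts / counts.sum(axis=1, keepdims=True)          # a_ij-hat = n_ij / n_i

mu_hat = np.array([obs[states == i].mean() for i in range(M)])  # MLE of theta_i per state
print(A_hat, mu_hat)
```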

16 MLE (states unknown). If only the sequence of observations Y_1 = y_1, Y_2 = y_2, ..., Y_T = y_T is observed, then the likelihood is given by: L = Σ_{i_1, ..., i_T} π_{i_1} f(y_1 | θ_{i_1}) ∏_{t=2}^{T} a_{i_{t-1} i_t} f(y_t | θ_{i_t}).

17 It is difficult to find the Maximum Likelihood Estimates directly from this likelihood function. The techniques that are used are:
1. The Segmental K-means Algorithm
2. The Baum-Welch (E-M) Algorithm

18 The Segmental K-means Algorithm. In this method the parameters are adjusted to maximize P[X_1 = î_1, ..., X_T = î_T, Y_1 = y_1, ..., Y_T = y_T], where (î_1, ..., î_T) is the Viterbi path.

19 Consider the special case: the observations {Y_1, Y_2, ..., Y_T} are continuous multivariate normal with mean vector μ_i and covariance matrix Σ_i when X_t = i, i.e. f(y | θ_i) = (2π)^{-p/2} |Σ_i|^{-1/2} exp{ -(1/2)(y − μ_i)′ Σ_i^{-1} (y − μ_i) }.

20
1. Pick arbitrarily M centroids a_1, a_2, …, a_M. Assign each of the T observations y_t (kT if k multiple realizations are observed) to a state i_t by determining the closest centroid: i_t = argmin_i ||y_t − a_i||.
2. Then estimate μ̂_i as the sample mean of the observations y_t assigned to state i.

21
3. And estimate Σ̂_i as the sample covariance matrix of the observations assigned to state i, with the transition probabilities â_ij estimated from the counts of consecutive state assignments.
4. Calculate the Viterbi path (i_1, i_2, …, i_T) based on the parameters of steps 2 and 3.
5. If there is a change in the sequence (i_1, i_2, …, i_T), repeat steps 2 to 4 (see the sketch below).
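A compact sketch of the loop just described, for one-dimensional normal emissions. The initialization, the smoothing constants, and the uniform initial distribution are assumptions of this sketch, not part of the slides; it reuses the viterbi() helper from above, passing time indices so that column t of the "emission matrix" holds the densities f(y_t | θ_i).

```python
# Minimal sketch: a segmental k-means style loop for a 1-D normal-emission HMM.
import numpy as np

def segmental_kmeans(y, M, n_iter=20):
    y = np.asarray(y, dtype=float)
    centroids = np.random.choice(y, M, replace=False)              # step 1: arbitrary centroids
    path = np.array([np.argmin(np.abs(v - centroids)) for v in y]) # step 1: nearest-centroid assignment
    for _ in range(n_iter):
        mu = np.array([y[path == i].mean() if (path == i).any() else centroids[i]
                       for i in range(M)])                          # step 2: state means
        sigma = np.array([y[path == i].std() + 1e-6 if (path == i).any() else 1.0
                          for i in range(M)])                       # step 3: state std devs
        counts = np.full((M, M), 1e-6)                              # small smoothing avoids empty rows
        for i, j in zip(path[:-1], path[1:]):
            counts[i, j] += 1
        A = counts / counts.sum(axis=1, keepdims=True)              # step 3: transitions from counts
        pi = np.full(M, 1.0 / M)                                    # assumed uniform initial distribution
        # Unnormalized normal densities; the 1/sqrt(2*pi) constant does not affect the argmax.
        dens = np.exp(-0.5 * ((y[None, :] - mu[:, None]) / sigma[:, None]) ** 2) / sigma[:, None]
        new_path = np.array(viterbi(pi, A, dens, list(range(len(y)))))  # step 4: Viterbi path
        if np.array_equal(new_path, path):                          # step 5: stop when path is unchanged
            break
        path = new_path
    return mu, sigma, A, path
```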

22 The Baum-Welch (E-M) Algorithm. The E-M algorithm was designed originally to handle "missing observations". In this case the missing observations are the states {X_1, X_2, ..., X_T}. Assuming a model, the states are estimated by finding their expected values under this model (the E part of the E-M algorithm).

23 With these values the model is estimated by Maximum Likelihood Estimation (the M part of the E-M algorithm). The process is repeated until the estimated model converges.

24 The E-M Algorithm. Let f(y, x | θ) denote the joint distribution of Y, X. Consider the function Q(θ, θ′) = E[log f(Y, X | θ) | Y = y, θ′]. Starting with an initial estimate θ^(0), a sequence of estimates θ^(m) is formed by finding θ^(m+1) to maximize Q(θ, θ^(m)) with respect to θ.

25 The sequence of estimates θ^(1), θ^(2), ... converges to a local maximum of the likelihood.

26 In the case of an HMM the complete-data log-likelihood is given by: log L = log π_{i_1} + Σ_{t=2}^{T} log a_{i_{t-1} i_t} + Σ_{t=1}^{T} log f(y_t | θ_{i_t}).

27 Recall γ_t(i) = P[X_t = i | Y = y] = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j), and Σ_{t=1}^{T-1} γ_t(i) = expected no. of transitions from state i.

28 Let ξ_t(i, j) = P[X_t = i, X_{t+1} = j | Y = y] = α_t(i) a_ij f(y_{t+1} | θ_j) β_{t+1}(j) / Σ_k Σ_l α_t(k) a_kl f(y_{t+1} | θ_l) β_{t+1}(l). Then Σ_{t=1}^{T-1} ξ_t(i, j) = expected no. of transitions from state i to state j.

29 The E-M Re-estimation Formulae. Case 1: the observations {Y_1, Y_2, ..., Y_T} are discrete with K possible values and emission probabilities b_i(k) = P[Y_t = k | X_t = i], k = 1, …, K.

30 The re-estimation formulae are: π̂_i = γ_1(i), â_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i), and b̂_i(k) = Σ_{t: y_t = k} γ_t(i) / Σ_{t=1}^{T} γ_t(i).
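A sketch of one Baum-Welch iteration for the discrete case, implementing the γ, ξ and re-estimation formulae above with the forward() and backward() helpers from the earlier sketches; the starting parameters and the sequence y are hypothetical.

```python
# Minimal sketch: one Baum-Welch (E-M) iteration for a discrete-output HMM.
import numpy as np

def baum_welch_step(pi, A, B, y):
    T, M = len(y), len(pi)
    alpha, beta = forward(pi, A, B, y), backward(pi, A, B, y)
    evidence = alpha[-1].sum()                                  # P[Y = y]
    gamma = alpha * beta / evidence                             # gamma_t(i) = P[X_t = i | y]
    xi = np.zeros((T - 1, M, M))
    for t in range(T - 1):                                      # xi_t(i, j) = P[X_t = i, X_{t+1} = j | y]
        xi[t] = alpha[t][:, None] * A * (B[:, y[t + 1]] * beta[t + 1])[None, :] / evidence
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(y) == k].sum(axis=0) / gamma.sum(axis=0)
    return pi_new, A_new, B_new

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
y = [0, 2, 1, 2, 0, 1]
for _ in range(10):                                             # a few iterations toward convergence
    pi, A, B = baum_welch_step(pi, A, B, y)
```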

31 Case 2: the observations {Y_1, Y_2, ..., Y_T} are continuous multivariate normal with mean vector μ_i and covariance matrix Σ_i when X_t = i, i.e. f(y | θ_i) = (2π)^{-p/2} |Σ_i|^{-1/2} exp{ -(1/2)(y − μ_i)′ Σ_i^{-1} (y − μ_i) }.

32 The re-estimation formulae for π_i and a_ij are as in Case 1, with the emission parameters re-estimated as μ̂_i = Σ_{t=1}^{T} γ_t(i) y_t / Σ_{t=1}^{T} γ_t(i) and Σ̂_i = Σ_{t=1}^{T} γ_t(i) (y_t − μ̂_i)(y_t − μ̂_i)′ / Σ_{t=1}^{T} γ_t(i).

33 Measuring distance between two HMMs. Let λ_1 and λ_2 denote the parameters of two different HMM models. We now consider defining a distance between these two models.

34 The Kullback-Leibler distance. Consider two discrete distributions p(x) and q(x) (densities f(x) and g(x) in the continuous case), then define I(p, q) = Σ_x p(x) log [p(x) / q(x)],

35 and in the continuous case: I(f, g) = ∫ f(x) log [f(x) / g(x)] dx.

36 These measures of distance between the two distributions are not symmetric but can be made symmetric by the following: J(p, q) = [I(p, q) + I(q, p)] / 2.
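A small sketch of the Kullback-Leibler divergence and its symmetrized version for discrete distributions (the example distributions are hypothetical):

```python
# Minimal sketch: KL divergence I(p, q) and the symmetrized version J(p, q).
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))      # I(p, q) = sum_x p(x) log[p(x)/q(x)]

def symmetric_kl(p, q):
    return 0.5 * (kl(p, q) + kl(q, p))

p, q = [0.5, 0.4, 0.1], [0.1, 0.3, 0.6]          # hypothetical distributions
print(kl(p, q), kl(q, p), symmetric_kl(p, q))    # note kl(p, q) != kl(q, p)
```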

37 In the case of a Hidden Markov model, I(λ_1, λ_2) = Σ_y P[Y = y | λ_1] log { P[Y = y | λ_1] / P[Y = y | λ_2] }, where P[Y = y | λ] is the likelihood of the observation sequence under model λ. The computation of I(λ_1, λ_2) in this case is formidable, since the sum runs over all possible observation sequences.

38 Juang and Rabiner distance. Let y^(2) = (y_1, y_2, ..., y_T) denote a sequence of observations generated from the HMM with parameters λ_2. Let î^(k) denote the optimal (Viterbi) sequence of states for y^(2) assuming HMM model λ_k.

39 Then define: D(λ_1, λ_2) = (1/T) [log P(y^(2), î^(2) | λ_2) − log P(y^(2), î^(1) | λ_1)], and the symmetrized version D_s(λ_1, λ_2) = [D(λ_1, λ_2) + D(λ_2, λ_1)] / 2.
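Since the slide's exact formulas are not reproduced here, the sketch below shows one common Monte Carlo version of a Juang-Rabiner style distance: generate a sequence from one model, average the per-observation log-likelihood difference under the two models (using a scaled forward recursion to avoid underflow), and symmetrize by swapping the roles of the models. The function names, the use of the full likelihood rather than the Viterbi-path likelihood, and the sequence length are assumptions of this sketch.

```python
# Minimal sketch: a Monte Carlo distance between two discrete-output HMMs.
import numpy as np

def sample_hmm(pi, A, B, T, rng):
    """Generate T observations from a discrete-output HMM (pi, A, B)."""
    x = rng.choice(len(pi), p=pi)
    y = []
    for _ in range(T):
        y.append(int(rng.choice(B.shape[1], p=B[x])))
        x = rng.choice(len(pi), p=A[x])
    return y

def loglik(pi, A, B, y):
    """log P[Y = y] via the scaled forward recursion (avoids underflow)."""
    alpha = pi * B[:, y[0]]
    ll = np.log(alpha.sum()); alpha = alpha / alpha.sum()
    for t in range(1, len(y)):
        alpha = (alpha @ A) * B[:, y[t]]
        ll += np.log(alpha.sum()); alpha = alpha / alpha.sum()
    return ll

def directed_distance(model_1, model_2, T=1000, seed=0):
    """(1/T)[log P(y | model_2) - log P(y | model_1)] with y generated by model_2."""
    rng = np.random.default_rng(seed)
    y = sample_hmm(*model_2, T, rng)
    return (loglik(*model_2, y) - loglik(*model_1, y)) / T

def symmetric_distance(model_1, model_2):
    return 0.5 * (directed_distance(model_1, model_2) + directed_distance(model_2, model_1))

m1 = (np.array([0.6, 0.4]), np.array([[0.7, 0.3], [0.2, 0.8]]),
      np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]))
m2 = (np.array([0.5, 0.5]), np.array([[0.9, 0.1], [0.1, 0.9]]),
      np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]))
print(symmetric_distance(m1, m2))
```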

