1
Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability
Chapter 3.3-3.7
2
Overview last lecture
Hidden Markov Models
Different algorithms:
– Viterbi
– Forward
– Backward
3
Overview today
Parameter estimation for HMMs
– Baum-Welch algorithm
HMM model structure
More complex Markov chains
Numerical stability of HMM algorithms
4
Specifying an HMM model
The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values
6
Parameter estimation for HMMs
Estimate the transition and emission probabilities a_kl and e_k(b)
Two ways of learning:
– Estimation when the state sequence is known
– Estimation when the paths are unknown
Assume that we have a set of example sequences (training sequences x^1, …, x^n)
7
Parameter estimation for HMMs
Assume that the training sequences x^1, …, x^n are independent, so
P(x^1, …, x^n | θ) = ∏_{j=1}^{n} P(x^j | θ)
Since log(ab) = log a + log b, the log of this likelihood is a sum over the training sequences.
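In the notation above, with θ standing for the full set of model parameters, the quantity that is maximised can be written as a log likelihood; a standard way to state it is:

\log P(x^1, \ldots, x^n \mid \theta) = \sum_{j=1}^{n} \log P(x^j \mid \theta)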
8
Estimation when the state sequence is known
Easier than estimation when the paths are unknown
A_kl = number of transitions from k to l in the training data + r_kl
E_k(b) = number of emissions of b from k in the training data + r_k(b)
(r_kl and r_k(b) are pseudocounts)
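A minimal sketch of this counting step, assuming each training example is given as a (symbols, path) pair and that all pseudocounts are set to the same value; the function name and data layout are illustrative choices, not taken from the slides.

def estimate_known_paths(training, states, alphabet, pseudocount=1.0):
    # A[k][l]: transition counts k -> l; E[k][b]: emission counts of b from k,
    # both initialised with pseudocounts so no probability ends up undefined.
    A = {k: {l: pseudocount for l in states} for k in states}
    E = {k: {b: pseudocount for b in alphabet} for k in states}
    for symbols, path in training:
        for i in range(len(path) - 1):
            A[path[i]][path[i + 1]] += 1
        for b, k in zip(symbols, path):
            E[k][b] += 1
    # Maximum likelihood estimators: normalise each row of counts
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e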
9
Estimation when the paths are unknown
More complex than when the paths are known
We cannot use the maximum likelihood estimators directly
Instead, an iterative algorithm is used:
– Baum-Welch
10
The Baum-Welch algorithm
We do not know the real values of A_kl and E_k(b), so:
1. Estimate A_kl and E_k(b) from the current model
2. Update a_kl and e_k(b)
3. Repeat with the new model parameters a_kl and e_k(b)
11
Baum-Welch algorithm
The expected counts are computed from the forward values f_k(i) and the backward values b_l(i+1):
A_kl = Σ_j (1 / P(x^j)) Σ_i f^j_k(i) a_kl e_l(x^j_{i+1}) b^j_l(i+1)
E_k(b) = Σ_j (1 / P(x^j)) Σ_{i : x^j_i = b} f^j_k(i) b^j_k(i)
12
Baum-Welch algorithm
Now that we have estimated A_kl and E_k(b), use the maximum likelihood estimators to compute a_kl and e_k(b)
We use these values to estimate A_kl and E_k(b) in the next iteration
Continue iterating until the change is very small or the maximum number of iterations is exceeded
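The maximum likelihood estimators referred to here are simply the normalised expected counts; in the notation of the previous slides:

a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}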
13
Baum-Welch algorithm
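Since the algorithm box from this slide did not survive, here is a minimal sketch of one Baum-Welch iteration for a model without silent states, using unscaled forward and backward values (so only suitable for short sequences); all names (forward, backward, baum_welch_step, pi) are illustrative assumptions, not notation from the slides.

def forward(x, states, a, e, pi):
    # f[i][k] = P(x[0..i], state at position i is k), starting from an
    # assumed initial distribution pi instead of an explicit begin state.
    f = [{k: pi[k] * e[k][x[0]] for k in states}]
    for i in range(1, len(x)):
        f.append({l: e[l][x[i]] * sum(f[i - 1][k] * a[k][l] for k in states)
                  for l in states})
    return f

def backward(x, states, a, e):
    # b[i][k] = P(x[i+1..] | state at position i is k)
    b = [None] * len(x)
    b[-1] = {k: 1.0 for k in states}
    for i in range(len(x) - 2, -1, -1):
        b[i] = {k: sum(a[k][l] * e[l][x[i + 1]] * b[i + 1][l] for l in states)
                for k in states}
    return b

def baum_welch_step(seqs, states, alphabet, a, e, pi, pseudocount=0.1):
    # One iteration: expected counts A and E, then re-estimate a and e.
    A = {k: {l: pseudocount for l in states} for k in states}
    E = {k: {c: pseudocount for c in alphabet} for k in states}
    for x in seqs:
        f, b = forward(x, states, a, e, pi), backward(x, states, a, e)
        px = sum(f[-1][k] for k in states)            # P(x | current model)
        for i in range(len(x)):
            for k in states:
                E[k][x[i]] += f[i][k] * b[i][k] / px  # expected emission counts
                if i + 1 < len(x):
                    for l in states:                  # expected transition counts
                        A[k][l] += f[i][k] * a[k][l] * e[l][x[i + 1]] * b[i + 1][l] / px
    new_a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    new_e = {k: {c: E[k][c] / sum(E[k].values()) for c in alphabet} for k in states}
    return new_a, new_e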
14
Example
16
Drawbacks
ML estimators
– Vulnerable to overfitting if there is not enough data
– Estimates can be undefined if a state or transition is never used in the training set (use pseudocounts)
Baum-Welch
– A local maximum instead of the global maximum can be found, depending on the starting values of the parameters
– This problem gets worse for large HMMs
17
Modelling of labelled sequences
Only the transitions and emissions consistent with the labels (the – – and + + combinations) are counted
Better than using plain ML estimators when many different classes are present
18
Specifying an HMM model
The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values
19
Design of the structure
Design: how to connect the states by transitions
A good HMM is based on knowledge about the problem under investigation
Local maxima are the biggest disadvantage of fully connected models
Baum-Welch still works after deleting a transition from the model: simply set its transition probability to zero
20
Example 1: Geometric distribution
A single state with self-transition probability p and exit probability 1-p
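Assuming the figure showed the usual single looping state with self-transition probability p, the resulting length distribution is geometric:

P(L = \ell) = (1 - p)\, p^{\ell - 1}, \qquad \ell = 1, 2, \ldots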
21
Example 2: Modelling a distribution of lengths between 2 and 10
22
Example 3
23
Silent states
States that do not emit symbols, for example the begin state B
Silent states can also occur in other places in an HMM
24
Example: silent states
25
Silent states
Advantage:
– Fewer transition probabilities need to be estimated
Drawback:
– Limits the possibilities for defining a model
26
More complex Markov chains
So far, we assumed that the probability of a symbol in a sequence depends only on the previous symbol
More complex:
– Higher-order Markov chains
– Inhomogeneous Markov chains
27
Higher-order Markov chains
In an nth order Markov process, the probability of a symbol in a sequence depends on the previous n symbols
An nth order Markov chain over some alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples, because consecutive n-tuples overlap in n-1 symbols, so the transition probability between tuples reduces to the probability of the new symbol given the previous n symbols (P(AB | B) = P(A | B))
28
Example
A second order Markov chain with two different symbols {A, B}
This can be translated into a first order Markov chain over the 2-tuples {AA, AB, BA, BB}
Sometimes the framework of the higher-order model is more convenient
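A small illustrative sketch of this translation, with made-up second-order probabilities; the dictionary layout is an assumption, not something prescribed by the slides.

second_order = {            # P(next symbol | two previous symbols)
    ('A', 'A'): {'A': 0.6, 'B': 0.4},
    ('A', 'B'): {'A': 0.5, 'B': 0.5},
    ('B', 'A'): {'A': 0.3, 'B': 0.7},
    ('B', 'B'): {'A': 0.2, 'B': 0.8},
}

first_order = {}            # transitions between 2-tuples
for (a, b), probs in second_order.items():
    for c, p in probs.items():
        # Tuple (a, b) can only move to a tuple that starts with b,
        # and that transition inherits the second-order probability.
        first_order[((a, b), (b, c))] = p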
29
Finding prokaryotic genes
Gene candidates in DNA:
– a sequence of nucleotide triplets (codons): a start codon, a number of non-stop codons, and a stop codon
– such a stretch is called an open reading frame (ORF)
An ORF can be either a gene or a non-coding ORF (NORF)
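A rough sketch of how such ORF candidates can be collected on one strand, assuming the standard start codon ATG and stop codons TAA/TAG/TGA; a real gene finder would also scan the reverse complement, and the function name and cut-off used here are illustrative.

START, STOPS = "ATG", {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    orfs = []
    for frame in range(3):                      # three reading frames
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == START:
                j = i + 3
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOPS:
                    j += 3                      # extend over non-stop codons
                if j + 3 <= len(dna) and (j - i) // 3 >= min_codons:
                    orfs.append(dna[i:j + 3])   # start codon .. stop codon
                i = j + 3
            else:
                i += 3
    return orfs

print(find_orfs("CCATGAAATTTTAGGG"))            # ['ATGAAATTTTAG']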
30
Finding prokaryotic genes
Experiment:
– DNA from the bacterium E. coli
– The dataset contains 1100 genes (900 used for training, 200 for testing)
Two models:
– Normal model with first order Markov chains over nucleotides
– Also first order Markov chains, but with codons instead of nucleotides as symbols
31
Finding prokaryotic genes
Outcomes:
32
Inhomogeneous Markov chains
Use the position information within the codon: three models (transition matrices) for codon positions 1, 2 and 3
For the sequence CATGCA:
– Homogeneous: P(C) a_CA a_AT a_TG a_GC a_CA
– Inhomogeneous: P(C) a^2_CA a^3_AT a^1_TG a^2_GC a^3_CA
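A small sketch of that inhomogeneous computation, assuming the sequence starts at codon position 1 and that the transition matrix is chosen by the codon position of the symbol being generated; the function name and the uniform example matrices are dummy assumptions.

def inhomogeneous_prob(seq, p_start, a_by_pos):
    # a_by_pos[pos][prev][cur]: transition probabilities of the matrix used
    # when the generated symbol sits at codon position pos (1, 2 or 3).
    prob = p_start[seq[0]]
    for i in range(1, len(seq)):
        pos = i % 3 + 1                 # codon position of seq[i], frame fixed
        prob *= a_by_pos[pos][seq[i - 1]][seq[i]]
    return prob

nts = "ACGT"
uniform = {x: {y: 0.25 for y in nts} for x in nts}
print(inhomogeneous_prob("CATGCA", {n: 0.25 for n in nts},
                         {1: uniform, 2: uniform, 3: uniform}))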
33
Numerical stability of HMM algorithms
Multiplying many probabilities can cause numerical problems:
– Underflow errors
– Wrong numbers are calculated
Solutions:
– Log transformation
– Scaling of probabilities
34
The log transformation
Compute log probabilities
– log10(10^-100000) = -100000
– The underflow problem is essentially solved
A sum operation is often faster than a product operation
In the Viterbi algorithm the recursion becomes:
V_l(i+1) = log e_l(x_{i+1}) + max_k [ V_k(i) + log a_kl ]
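A minimal log-space Viterbi sketch, under the assumption that zero probabilities have already been mapped to -inf when the log tables log_a, log_e and log_pi were built; the names are illustrative, not from the slides.

def viterbi_log(x, states, log_a, log_e, log_pi):
    # V[i][k]: log probability of the best path that ends in state k at
    # position i; back[i-1][k] remembers the best predecessor of k.
    V = [{k: log_pi[k] + log_e[k][x[0]] for k in states}]
    back = []
    for i in range(1, len(x)):
        V.append({})
        back.append({})
        for l in states:
            best_k = max(states, key=lambda k: V[i - 1][k] + log_a[k][l])
            back[-1][l] = best_k
            V[i][l] = log_e[l][x[i]] + V[i - 1][best_k] + log_a[best_k][l]
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for pointers in reversed(back):             # traceback
        path.append(pointers[path[-1]])
    return list(reversed(path)), V[-1][last]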
35
Scaling of probabilities
Scale the f and b variables
Forward variable:
– For each position i a scaling variable s_i is defined
– New f variables are defined: f̃_l(i) = f_l(i) / (s_1 s_2 … s_i)
– New forward recursion: f̃_l(i+1) = (1 / s_{i+1}) e_l(x_{i+1}) Σ_k f̃_k(i) a_kl
– A convenient choice is to pick s_{i+1} so that Σ_l f̃_l(i+1) = 1
36
Scaling of probabilities
Backward variable:
– The scaling has to use the same numbers s_i as the forward variable
– New backward recursion: b̃_k(i) = (1 / s_{i+1}) Σ_l a_kl e_l(x_{i+1}) b̃_l(i+1)
This normally works well; however, underflow errors can still occur in models with many silent states (chapter 5)
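A short sketch of the scaled forward pass, using the common choice of normalising the f̃ values to sum to one at every position; pi and the function name are illustrative assumptions, and log P(x) is recovered as the sum of the log scaling factors.

import math

def forward_scaled(x, states, a, e, pi):
    # Scaled forward pass: f holds the normalised f~ values for the current
    # position, s[i] is the scaling factor used at position i.
    f = {k: pi[k] * e[k][x[0]] for k in states}
    s = [sum(f.values())]
    f = {k: v / s[0] for k, v in f.items()}
    for i in range(1, len(x)):
        new_f = {l: e[l][x[i]] * sum(f[k] * a[k][l] for k in states)
                 for l in states}
        s.append(sum(new_f.values()))
        f = {k: v / s[-1] for k, v in new_f.items()}
    log_px = sum(math.log(si) for si in s)      # log P(x) = sum of log s_i
    return f, s, log_px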
37
Summary
Hidden Markov Models
Parameter estimation
– State sequence known
– State sequence unknown
Model structure
– Silent states
More complex Markov chains
Numerical stability