Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability. Chapter 3.3-3.7. Elze de Groot.


Slide 1: Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability (Chapter 3.3-3.7)

Slide 2: Overview
– Parameter estimation for HMMs
  – Baum-Welch algorithm
– HMM model structure
– More complex Markov chains
– Numerical stability of HMM algorithms

Slide 3: Specifying an HMM model
The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values

Slide 4: Specifying an HMM model
The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values

Slide 5: Parameter estimation for HMMs
– Estimate the transition and emission probabilities a_kl and e_k(b)
– Two ways of learning:
  – Estimation when the state sequence is known
  – Estimation when the paths are unknown
– Assume that we have a set of example sequences (training sequences x^1, ..., x^n)

Slide 6: Parameter estimation for HMMs
Assume that x^1, ..., x^n are independent, so the joint probability is the product of the individual sequence probabilities:
P(x^1, ..., x^n | θ) = Π_j P(x^j | θ)
In log space, since log(ab) = log a + log b:
log P(x^1, ..., x^n | θ) = Σ_j log P(x^j | θ)

Slide 7: Estimation when the state sequence is known
– Easier than estimation when the paths are unknown
– A_kl = number of transitions from k to l in the training data + r_kl
– E_k(b) = number of emissions of b from k in the training data + r_k(b)
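To make the counting concrete, here is a minimal Python sketch of this estimator. It assumes each training sequence comes with its known state path as an equal-length list; the function name and the uniform pseudocounts r_trans and r_emit are illustrative choices, not part of the slides.

```python
def estimate_known_paths(sequences, paths, states, alphabet, r_trans=1.0, r_emit=1.0):
    """ML estimation of a_kl and e_k(b) from sequences with known state paths.
    r_trans and r_emit are pseudocounts added to every count."""
    A = {k: {l: r_trans for l in states} for k in states}   # transition counts A_kl
    E = {k: {b: r_emit for b in alphabet} for k in states}  # emission counts E_k(b)

    for x, pi in zip(sequences, paths):
        for i, (k, b) in enumerate(zip(pi, x)):
            E[k][b] += 1                      # one more emission of b from state k
            if i + 1 < len(pi):
                A[k][pi[i + 1]] += 1          # one more transition k -> pi[i+1]

    # maximum likelihood estimates: a_kl = A_kl / sum_l' A_kl', e_k(b) = E_k(b) / sum_b' E_k(b')
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e
```

The pseudocounts keep every estimate defined even when a transition or emission never occurs in the training data, which is exactly the issue raised later on the drawbacks slide.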

Slide 8: Estimation when paths are unknown
– More complex than when the paths are known
– The counts cannot be read off directly, so the maximum likelihood estimators cannot be used as before
– Instead, an iterative algorithm is used: Baum-Welch

Slide 9: The Baum-Welch algorithm
We do not know the real values of A_kl and E_k(b), so:
1. Estimate A_kl and E_k(b)
2. Update a_kl and e_k(b)
3. Repeat with the new model parameters a_kl and e_k(b)

Slide 10: Baum-Welch algorithm
The expected counts are computed from the forward values f_k(i) and the backward values b_k(i):
A_kl = Σ_j 1/P(x^j) Σ_i f^j_k(i) a_kl e_l(x^j_{i+1}) b^j_l(i+1)
E_k(b) = Σ_j 1/P(x^j) Σ_{i : x^j_i = b} f^j_k(i) b^j_k(i)

Slide 11: Baum-Welch algorithm
– Now that we have estimated A_kl and E_k(b), use the maximum likelihood estimators to compute a_kl and e_k(b)
– We use these new values to estimate A_kl and E_k(b) in the next iteration
– Continue iterating until the change is very small or the maximum number of iterations is exceeded

Slide 12: Baum-Welch algorithm (the complete algorithm: initialisation, iterative re-estimation, termination; a sketch in code follows)
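The following is one possible sketch of the whole Baum-Welch loop in Python/NumPy, using plain (unscaled) forward and backward values for clarity; the scaling needed for long sequences is discussed on the numerical-stability slides. The array layout, the pseudocount, and the convergence settings are assumptions for illustration, not from the slides.

```python
import numpy as np

def baum_welch(seqs, a, e, pi, n_iter=100, tol=1e-6, pseudo=1e-3):
    """Baum-Welch sketch.  a: (K,K) transitions, e: (K,S) emissions,
    pi: (K,) initial probabilities; seqs: list of integer-encoded sequences."""
    K, S = e.shape
    prev_ll = -np.inf
    for _ in range(n_iter):
        A = np.full((K, K), pseudo)        # expected transition counts A_kl
        E = np.full((K, S), pseudo)        # expected emission counts E_k(b)
        PI = np.full(K, pseudo)
        ll = 0.0
        for x in seqs:
            T = len(x)
            # forward values: f[i, k] = P(x_1..x_i, state_i = k)
            f = np.zeros((T, K))
            f[0] = pi * e[:, x[0]]
            for i in range(1, T):
                f[i] = (f[i - 1] @ a) * e[:, x[i]]
            px = f[-1].sum()               # P(x) under the current model
            ll += np.log(px)
            # backward values: b[i, k] = P(x_{i+1}..x_T | state_i = k)
            b = np.zeros((T, K))
            b[-1] = 1.0
            for i in range(T - 2, -1, -1):
                b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
            # accumulate expected counts for this sequence
            PI += f[0] * b[0] / px
            for i in range(T - 1):
                A += a * np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) / px
            for i in range(T):
                E[:, x[i]] += f[i] * b[i] / px
        # re-estimate the parameters with the maximum likelihood estimators
        a = A / A.sum(axis=1, keepdims=True)
        e = E / E.sum(axis=1, keepdims=True)
        pi = PI / PI.sum()
        if ll - prev_ll < tol:             # stop when the likelihood stops improving
            break
        prev_ll = ll
    return a, e, pi
```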

Slide 13: Example
The model estimated with 300 rolls and with 30,000 rolls (the dice-roll example from Durbin et al.).

Slide 14: Drawbacks
ML estimators:
– Vulnerable to overfitting if there is not enough data
– Estimates can be undefined if a transition or emission is never used in the training set (hence the use of pseudocounts)
Baum-Welch:
– Many local maxima instead of the global maximum can be found, depending on the starting values of the parameters
– This problem gets worse for large HMMs

Slide 15: Viterbi training
– The most probable path is derived using the Viterbi algorithm, and the parameters are re-estimated from it
– Continue until none of the paths change
– Finds the value of θ that maximises the contribution of the most probable paths to the likelihood
– Generally performs less well than Baum-Welch

Slide 16: Modelling of labelled sequences
– Only the '--' and '++' transitions are calculated
– Better than using the ML estimators when many different classes are present

Slide 17: Specifying an HMM model
The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values

Slide 18: Design of the structure
– Design: how to connect the states by transitions
– A good HMM is based on knowledge about the problem under investigation
– Local maxima are the biggest disadvantage of fully connected models
– Deleting a transition from the model does not require changes to Baum-Welch: simply set its transition probability to zero (it then stays zero during re-estimation)

Slide 19: Example 1
Geometric length distribution: a single state with self-transition probability p and exit probability 1 - p.
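As a quick check of this example, the run length emitted by one self-looping state is geometric, P(L = l) = p^(l-1)(1 - p); a tiny sketch, assuming p is the self-transition probability as in the slide's figure (the function name is illustrative):

```python
def geometric_length_pmf(p, max_len=10):
    """Length distribution of the run emitted by one state that returns to
    itself with probability p and leaves with probability 1 - p:
    P(L = l) = p**(l - 1) * (1 - p) for l >= 1."""
    return {l: p ** (l - 1) * (1 - p) for l in range(1, max_len + 1)}
```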

Slide 20: Example 2
Model a length distribution between 2 and 10.

Slide 21: Example 3
Negative binomial distribution (p = 0.99, n ≤ 5).

Slide 22: Silent states
– States that do not emit symbols, for example the begin state B
– Silent states can also appear in other places in an HMM

Slide 23: Example: silent states (figure).

Slide 24: Silent states
– Advantage: fewer transition probabilities need to be estimated
– Drawback: limits the possibilities of defining a model

Slide 25: Silent states
Change in the forward algorithm:
– For 'real' (emitting) states the recursion stays the same
– For silent states l, set f_l(i) = Σ_k f_k(i) a_kl (no emission term)
– Then, starting from the lowest-numbered silent state l, add f_k(i) a_kl for all silent states k < l
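A sketch of one forward column with silent states, under the same assumption as the slide that silent states are numbered so that silent-to-silent transitions only go from lower to higher numbers; all names are illustrative.

```python
def forward_column_with_silent(f_prev, symbol, emit_states, silent_states, a, e):
    """One column of the forward algorithm for an HMM with silent states.
    silent_states must be ordered so that silent-to-silent transitions only
    go from a lower to a higher index (no loops of silent states)."""
    f = {}
    # 'real' (emitting) states: the usual recursion over the previous column
    for l in emit_states:
        f[l] = e[l][symbol] * sum(f_prev[k] * a[k][l] for k in f_prev)
    # silent states: start with the contributions from emitting states in this column
    for l in silent_states:
        f[l] = sum(f[k] * a[k][l] for k in emit_states)
    # then, from the lowest-numbered silent state upwards, add the
    # contributions from all lower-numbered silent states k < l
    for idx, l in enumerate(silent_states):
        f[l] += sum(f[k] * a[k][l] for k in silent_states[:idx])
    return f
```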

Slide 26: More complex Markov chains
– So far we assumed that the probability of a symbol in a sequence depends only on the previous symbol
– More complex variants:
  – High order Markov chains
  – Inhomogeneous Markov chains

Slide 27: High order Markov chains
– In an nth order Markov process, the probability of a symbol in a sequence depends on the previous n symbols
– An nth order Markov chain over some alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples, because P(AB|B) = P(A|B)

Slide 28: Example
– A second order Markov chain with two different symbols {A, B} can be translated into a first order Markov chain over the 2-tuples {AA, AB, BA, BB}
– Sometimes the framework of the high order model is more convenient
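A small sketch of this translation, assuming the second-order chain is given as a dictionary a2[(prev2, prev1)][next] over an alphabet of single-character symbols such as ['A', 'B']; the representation and names are assumptions for illustration.

```python
from itertools import product

def second_order_to_pairs(a2, alphabet):
    """Translate a 2nd-order chain a2[(prev2, prev1)][nxt] over `alphabet` into
    a 1st-order chain over 2-tuples: the tuple (prev2, prev1) can only move to
    tuples of the form (prev1, nxt), with probability a2[(prev2, prev1)][nxt]."""
    pairs = [p2 + p1 for p2, p1 in product(alphabet, repeat=2)]
    a1 = {s: {t: 0.0 for t in pairs} for s in pairs}   # all other transitions are impossible
    for s in pairs:
        for nxt in alphabet:
            a1[s][s[1] + nxt] = a2[(s[0], s[1])][nxt]
    return a1
```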

Slide 29: Finding prokaryotic genes
Gene candidates in DNA:
– a sequence of nucleotide triplets: a start codon, a number of non-stop codons, and a stop codon
– such a stretch is an open reading frame (ORF)
An ORF can be either a gene or a non-coding ORF (NORF).

Slide 30: Finding prokaryotic genes
Experiment:
– DNA from the bacterium E. coli
– The dataset contains 1100 genes (900 used for training, 200 for testing)
Two models:
– A normal model with first order Markov chains over nucleotides
– Also first order Markov chains, but with codons instead of nucleotides as symbols

Slide 31: Finding prokaryotic genes: outcomes (results figure).

Slide 32: Inhomogeneous Markov chains
Using the position information in the codon: three transition models, one for each codon position 1, 2 and 3.
Example for the sequence CATGCA:
– Homogeneous:   P(C) a_CA a_AT a_TG a_GC a_CA
– Inhomogeneous: P(C) a^2_CA a^3_AT a^1_TG a^2_GC a^3_CA
(the superscript is the codon position of the new symbol)
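A sketch of the inhomogeneous probability calculation, assuming three transition dictionaries indexed by the codon position of the new symbol, as in the CATGCA example above; the names and the dictionary representation are illustrative.

```python
import math

def inhomogeneous_log_prob(seq, a_pos, p0):
    """Log probability of `seq` under an inhomogeneous chain.  a_pos[1..3] are
    transition dictionaries, one per codon position; the codon position of the
    *new* symbol (1 for the first symbol, then 2, 3, 1, 2, 3, ...) selects the
    matrix.  p0[b] is the probability of starting with symbol b."""
    logp = math.log(p0[seq[0]])
    for i in range(1, len(seq)):
        pos = i % 3 + 1                    # codon position of seq[i] (1, 2 or 3)
        logp += math.log(a_pos[pos][seq[i - 1]][seq[i]])
    return logp
```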

Slide 33: Numerical stability of HMM algorithms
Multiplying many probabilities can cause numerical problems:
– Underflow errors
– Wrong numbers are calculated
Solutions:
– Log transformation
– Scaling of probabilities

Slide 34: The log transformation
– Compute with log probabilities
– The underflow problem is essentially solved
– A sum operation is often faster than a product operation
– In the Viterbi algorithm the recursion becomes V_l(i+1) = log e_l(x_{i+1}) + max_k ( V_k(i) + log a_kl )
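A log-space Viterbi sketch following that recursion; it assumes the log probabilities (with log 0 represented as -inf) are precomputed, and the function and variable names are illustrative.

```python
def viterbi_log(x, states, log_pi, log_a, log_e):
    """Viterbi in log space: V_l(i+1) = log e_l(x_{i+1}) + max_k (V_k(i) + log a_kl)."""
    V = [{l: log_pi[l] + log_e[l][x[0]] for l in states}]
    ptr = [{}]                                        # back pointers for the traceback
    for i in range(1, len(x)):
        col, back = {}, {}
        for l in states:
            k_best = max(states, key=lambda k: V[i - 1][k] + log_a[k][l])
            col[l] = log_e[l][x[i]] + V[i - 1][k_best] + log_a[k_best][l]
            back[l] = k_best
        V.append(col)
        ptr.append(back)
    last = max(states, key=lambda k: V[-1][k])        # best final state
    path = [last]
    for i in range(len(x) - 1, 0, -1):                # follow the pointers back
        path.append(ptr[i][path[-1]])
    return V[-1][last], path[::-1]
```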

Slide 35: Scaling of probabilities
Scale the f and b variables.
Forward variable:
– For each position i a scaling variable s_i is defined
– New f variables are defined: f~_l(i) = f_l(i) / (s_1 s_2 ... s_i)
– New forward recursion: f~_l(i+1) = (1/s_{i+1}) e_l(x_{i+1}) Σ_k f~_k(i) a_kl

Slide 36: Scaling of probabilities
Backward variable:
– The scaling has to use the same numbers s_i as for the forward variable
– New backward recursion: b~_k(i) = (1/s_{i+1}) Σ_l a_kl e_l(x_{i+1}) b~_l(i+1)
This normally works well; however, underflow errors can still occur in models with many silent states (Chapter 5).
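Putting the two scaling slides together, here is a NumPy sketch of the scaled forward and backward passes. Choosing s_i so that each scaled forward column sums to 1 is one common choice (an assumption here, not stated on the slides); with that choice log P(x) = Σ_i log s_i, and f~_k(i) b~_k(i) is directly the posterior probability of state k at position i.

```python
import numpy as np

def scaled_forward_backward(x, a, e, pi):
    """Scaled forward/backward pass.  a: (K,K) transitions, e: (K,S) emissions,
    pi: (K,) initial probabilities; x: integer-encoded sequence.  s[i] rescales
    column i of the forward variables; the backward pass divides by the same s."""
    T, K = len(x), len(pi)
    f = np.zeros((T, K))
    s = np.zeros(T)
    f[0] = pi * e[:, x[0]]
    s[0] = f[0].sum()
    f[0] /= s[0]
    for i in range(1, T):
        f[i] = (f[i - 1] @ a) * e[:, x[i]]   # unscaled recursion ...
        s[i] = f[i].sum()                    # ... then pick s_i so the column sums to 1
        f[i] /= s[i]
    b = np.zeros((T, K))
    b[-1] = 1.0
    for i in range(T - 2, -1, -1):
        b[i] = (a @ (e[:, x[i + 1]] * b[i + 1])) / s[i + 1]
    log_px = np.log(s).sum()                 # log P(x) = sum_i log s_i
    return f, b, s, log_px
```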

Slide 37: Summary
– Hidden Markov Models
– Parameter estimation
  – State sequence known
  – State sequence unknown
– Model structure
  – Silent states
– More complex Markov chains
– Numerical stability