Lecture 2 Hidden Markov Model

Hidden Markov Model
Motivation: We have a text partly written by Shakespeare and partly “written” by a monkey; we want to write a program that can tell which part was written by Shakespeare and which part by the monkey.

21st-century human-like monkey typing

Review of Probabilities
Probability of event X occurring: P(X)
Conditional probability: P(X|Y), the probability of X occurring given Y
Joint probability: P(X,Y) = P(X|Y)P(Y); P(X,Y|Z) = P(X|Y,Z)P(Y|Z)
Marginal probability: P(X) = Σ_Y P(X|Y)P(Y)

Posterior Probability
Usually we want to know the probability of an observation O given a supposition (model) M: P(O|M). The reverse problem: given O, we want to know the probability that M is correct, the posterior probability P(M|O).
Bayes’ theorem: for any two events X, Y,
P(X|Y) = P(Y|X)P(X)/P(Y)
P(M|O) = P(O|M)P(M)/P(O)

Definition of Hidden Markov Model
A Hidden Markov Model (HMM) is a finite set of states, each of which is associated with a probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are “hidden” from the observer, hence the name Hidden Markov Model.

Examples
Text written by Shakespeare and a monkey
Dice thrown by a dealer with two dice, one fair and one loaded
A DNA sequence with coding and non-coding segments

Examples (cont’d)
Case   Observed symbols          Hidden state
Text   alphabet                  Shakespeare / monkey
Dice   rolled numbers 1-6        fair die / loaded die
DNA    bases A, C, G, T          coding / non-coding

In order to define an HMM completely, the following elements are needed.
The number of states of the model, {q_i | i = 1, 2, …, N}.
The number of observation symbols in the alphabet, {o_k | k = 1, 2, …, M}.
A set A of state transition probabilities, a_ij = p(q_{t+1} = j | q_t = i), where q_t denotes the current state. Transition probabilities should satisfy the normal stochastic constraints a_ij ≥ 0 and Σ_j a_ij = 1.

An emission probability distribution B in each of the states, b_j(k) = p(o_t = σ_k | q_t = j), where σ_k denotes the k-th observation symbol in the alphabet and o_t the current observation; b_j(k) is the probability of state j emitting the symbol k. The stochastic constraints b_j(k) ≥ 0 and Σ_k b_j(k) = 1 must be satisfied.

The initial state distribution π = {π_i}, where π_i = p(q_1 = i).
Therefore we can use the compact notation λ = (A, B, π) to denote an HMM with discrete probability distributions.

Notation
Sequence of observations: O = o_1, o_2, …, o_T
Sequence of (hidden) states: Q = q_1, q_2, …, q_T
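As a concrete illustration of the compact notation λ = (A, B, π), the sketch below (not part of the original slides) stores the parameters as NumPy arrays and checks the stochastic constraints; the transition and emission numbers are those of the loaded-die example later in this lecture, and the uniform initial distribution is an assumption.

```python
# Minimal sketch: the HMM parameters lambda = (A, B, pi) as NumPy arrays,
# using the two-state (fair/loaded die) example from later in this lecture.
import numpy as np

# Transition probabilities a_ij = p(q_{t+1} = j | q_t = i); each row sums to 1.
# Row/column order: 0 = fair die (F), 1 = loaded die (L).
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])

# Emission probabilities b_j(k) = p(o_t = symbol k | q_t = j); each row sums to 1.
B = np.array([[1/6] * 6,                 # fair die: all faces equal
              [1/10] * 5 + [1/2]])       # loaded die: face 6 favored

# Initial state distribution pi_i = p(q_1 = i) (assumed uniform here).
pi = np.array([0.5, 0.5])

# The normal stochastic constraints from the slides.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```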

HMM scheme with K symbols (DNA 4 / protein 20)
[Mx] Match state x. Has K emission probabilities.
[Dx] Delete state x. Non-emitter.
[Ix] Insert state x. Has K emission probabilities.
[B] Begin state (for entering main model). Non-emitter.
[E] End state (for exiting main model). Non-emitter.
[S] Start state. Non-emitter.
[N] N-terminal unaligned sequence state. Emits on transition with K emission probabilities.
[C] C-terminal unaligned sequence state. Emits on transition with K emission probabilities.
[J] Joining segment unaligned sequence state. Emits on transition with K emission probabilities.
© 2001 Per Kraulis

(1) The Markov assumption. First-order transition probabilities are a_ij = p(q_{t+1} = j | q_t = i). A model with only 1st-order transition probabilities is called a 1st-order Markov model; a k-th order Markov model involves k-th order transition probabilities p(q_{t+1} | q_t, q_{t-1}, …, q_{t-k+1}).
(2) The stationarity assumption. State transition probabilities are independent of time: for any t_1 and t_2, p(q_{t_1+1} = j | q_{t_1} = i) = p(q_{t_2+1} = j | q_{t_2} = i).

(Cont’d) (3) The output independence assumption. The current observation is statistically independent of the previous observations. Given a sequence of observations O = o_1, o_2, …, o_T, then for an HMM λ = (A, B, π) the probability of O given the state sequence is p(O | q_1, …, q_T, λ) = Π_{t=1}^{T} p(o_t | q_t, λ). This assumption has limited validity and in some cases may become a severe weakness of HMM.
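To make the output independence assumption concrete, the following sketch (illustrative only; the helper name prob_obs_given_path is made up) evaluates p(O | Q, λ) = Π_t b_{q_t}(o_t) for the two-die emission matrix used in this lecture.

```python
# Sketch: p(O | Q, lambda) = prod_t b_{q_t}(o_t) under output independence.
import numpy as np

B = np.array([[1/6] * 6,               # fair die: all faces equal
              [1/10] * 5 + [1/2]])     # loaded die: face 6 favored

def prob_obs_given_path(obs, path, B):
    """obs: die faces 1..6; path: state indices (0 = fair, 1 = loaded)."""
    p = 1.0
    for o, q in zip(obs, path):
        p *= B[q, o - 1]               # multiply in b_{q_t}(o_t)
    return p

# Three sixes are far more likely to come from the loaded die:
print(prob_obs_given_path([6, 6, 6], [0, 0, 0], B))   # (1/6)^3 ~ 0.0046
print(prob_obs_given_path([6, 6, 6], [1, 1, 1], B))   # (1/2)^3 = 0.125
```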

Three basic problems of HMMs
Given the HMM λ = (A, B, π) and the observed sequence O = o_1, o_2, …, o_T, there are three problems of interest.
(1) The Evaluation Problem: what is the probability P(O|λ) that the observations are generated by the model?
(2) The Decoding Problem: given a model λ and a sequence of observations O, what is the most likely state sequence Q = q_1, q_2, …, q_T that produced the observations?
(3) The Learning Problem: given a model λ and a sequence of observations O, how should we adjust the model parameters in order to maximize the probability P(O|λ)?
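For problem (1), P(O|λ) is usually computed with the forward algorithm rather than by summing over all possible state paths; the slides do not spell it out, so the following is a hedged sketch using the standard recursion α_{t+1}(j) = b_j(o_{t+1}) Σ_k α_t(k) a_kj, with the function name forward purely illustrative.

```python
# Sketch of the forward algorithm: P(O | lambda) = sum_j alpha_T(j),
# where alpha_t(j) = p(o_1 ... o_t, q_t = j | lambda).
import numpy as np

def forward(A, B, pi, obs):
    """obs: sequence of 0-based symbol indices. Returns P(O | lambda)."""
    alpha = pi * B[:, obs[0]]                 # alpha_1(j) = pi_j b_j(o_1)
    for o in obs[1:]:
        # alpha_{t+1}(j) = b_j(o_{t+1}) * sum_k alpha_t(k) a_kj
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Tiny check with the two-state dice parameters (uniform pi assumed).
A = np.array([[0.95, 0.05], [0.10, 0.90]])
B = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])
pi = np.array([0.5, 0.5])
print(forward(A, B, pi, [5, 5, 5]))           # three sixes (face 6 has index 5)
```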

Example of Decoding Problem
Have an observation sequence O, find the state sequence Q.
(1) Text: Shakespeare (s) or monkey (m)
O = …aefjkuhrgnandshefoundhappinesssdmcamoe…
Q = …mmmmmmssssssssssssssssssssssssssssmmmmmm…
(2) Dice: fair (F) or loaded (L) die
O = … …
Q = …LLLLLLLLLLLLFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLLL…
(3) DNA: coding (C) or non-coding (N)
O = …AACCTTCCGCGCAATATAGGTAACCCCGG…
Q = …NNCCCCCCCCCCCCCCCCCNNNNNNNN…

The Viterbi Algorithm
Given a sequence O of observations and a model λ, we want to find the state sequence Q* with the maximum likelihood of observing O.
Let Q_t = q_1, q_2, …, q_t and O_t = o_1, o_2, …, o_t. Suppose Q_{t-1} is a partial state sequence that gives maximum likelihood for observing the partial sequence O_{t-1}. Define the quantity
δ_t(i) = max_{Q_{t-1}} P(Q_{t-1}, q_t = i, O_t | λ)
This can be computed recursively by starting with
δ_1(j) = π_j b_j(o_1), for every j
δ_{t+1}(j) = b_j(o_{t+1}) max_k (δ_t(k) a_kj), for every j

The Viterbi Algorithm (cont’d)
Keep trb_j(t+1) = argmax_k (δ_t(k) a_kj) for later traceback.
The last “best” state is given by q*_T = argmax_k (δ_T(k)).
Earlier states in the sequence are obtained by traceback: q*_{t-1} = trb_{q*_t}(t).
Then the sequence Q* giving the maximum likelihood of observing O is Q* = q*_1, q*_2, …, q*_T.
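A direct transcription of this recursion and traceback into code, worked in log space to avoid numerical underflow on long sequences; this is a sketch in the notation above, not a reference implementation, and the name viterbi is just illustrative.

```python
# Viterbi sketch in log space:
#   delta_1(j)     = pi_j b_j(o_1)
#   delta_{t+1}(j) = b_j(o_{t+1}) * max_k delta_t(k) a_kj
# with traceback pointers trb_j(t+1) = argmax_k delta_t(k) a_kj.
import numpy as np

def viterbi(A, B, pi, obs):
    """obs: sequence of 0-based symbol indices. Returns the most likely state path."""
    T, N = len(obs), len(pi)
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)   # zero probabilities become -inf
    delta = np.zeros((T, N))                 # delta[t, j] = log delta_{t+1}(j), t 0-based
    trb = np.zeros((T, N), dtype=int)        # traceback pointers
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA          # scores[k, j] = log(delta_t(k) a_kj)
        trb[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    # Traceback: q*_T = argmax_k delta_T(k), then q*_{t-1} = trb_{q*_t}(t).
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = trb[t, path[t]]
    return path
```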

Example: Loaded Die
Two states: j = fair (F) or loaded (L) die
Symbols: k = 1, 2, 3, 4, 5, 6
Transition probabilities (for example): a_FF = 0.95, a_FL = 0.05, a_LF = 0.10, a_LL = 0.90
Emission probabilities:
b_F(k) = 1/6 for k = 1, …, 6 (all faces equal)
b_L(6) = 1/2, b_L(k) = 1/10 for k = 1, …, 5 (face 6 favored)

Testing the Viterbi Algorithm
A sequence of 300 tosses of fair and loaded dice.
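A test along these lines can be reproduced by simulating tosses from the two-die model and comparing the Viterbi path with the true hidden states. The sketch below assumes the A, B, pi arrays and the viterbi function from the earlier sketches are in scope; the 300-toss length matches the slide, while the random seed and the accuracy measure are illustrative choices.

```python
# Simulate 300 tosses from the two-die model, decode with Viterbi, and score.
import numpy as np

rng = np.random.default_rng(0)

def simulate(A, B, pi, T):
    """Draw a state path and observation sequence from the model."""
    states, obs = [], []
    q = rng.choice(len(pi), p=pi)
    for _ in range(T):
        states.append(q)
        obs.append(rng.choice(B.shape[1], p=B[q]))   # emit a face (0-based index)
        q = rng.choice(len(pi), p=A[q])              # move to the next state
    return np.array(states), np.array(obs)

true_states, obs = simulate(A, B, pi, 300)
decoded = viterbi(A, B, pi, obs)
print("fraction of tosses labelled correctly:", (decoded == true_states).mean())
```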

Training
Normally the transition probabilities are not known, and not all the emission probabilities are known. If there are data for which even the hidden states are known, then those data can be used to train the parameters in the HMM λ = (A, B, π). In the case of gene recognition in DNA sequences, we use known genes for training.

(Oversimplified) example: genes in DNA
In prokaryotic DNA we have only two kinds of regions (ignoring regulatory sequences), coding (+) and non-coding (-), and four letters, A, C, G, T.
So we have 8 states, k = A+, C+, G+, T+, A-, C-, G-, T-, and 4 observable symbols, i = A, C, G, T.
Transition probability: a_kl = E_kl / (Σ_m E_km), where E_kl is the total number of k to l transitions in all the training sequences.
Emission probability: 0 or 1, e.g. b_{A+}(A) = 1, b_{A+}(C) = 0.
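Because the hidden labels are known in this training setting, the estimate a_kl = E_kl / Σ_m E_km reduces to counting. A minimal sketch (variable and function names are made up) over training sequences whose bases are already labelled coding (+) or non-coding (-):

```python
# Sketch: estimate a_kl = E_kl / sum_m E_km from labelled training sequences,
# where each position of a training sequence carries a state label such as "A+" or "G-".
from collections import defaultdict

STATES = [b + s for s in "+-" for b in "ACGT"]   # A+, C+, G+, T+, A-, C-, G-, T-

def estimate_transitions(labelled_seqs):
    counts = defaultdict(lambda: defaultdict(int))       # E_kl
    for seq in labelled_seqs:
        for k, l in zip(seq, seq[1:]):                   # consecutive state pairs
            counts[k][l] += 1
    a = {}
    for k in STATES:
        total = sum(counts[k].values())                  # sum_m E_km
        a[k] = {l: (counts[k][l] / total if total else 0.0) for l in STATES}
    return a

# Toy labelled sequence: a short non-coding run followed by a coding run.
training = [["A-", "C-", "G-", "A+", "T+", "G+", "C+", "T-"]]
a = estimate_transitions(training)
print(a["G-"]["A+"])   # fraction of G- positions followed by A+ in the training data
```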

(Oversimplified) example: genes in DNA (cont’d)
For better results, remember that protein genes are coded in (three-letter) codons, and letter usage in the 1st, 2nd and 3rd positions of a codon is different. Hence use 12 states:
k = A-, C-, G-, T-, A_f+, C_f+, G_f+, T_f+; f = 1, 2, 3
Transition probabilities are trained as before. This is the basis for gene-finding software such as GeneMark.

Maximum Likelihood
Assume we are always using an HMM, and let λ denote the parameters (transition and emission probabilities). For an observation O, determine λ using the maximum likelihood criterion
λ_ML = argmax_λ P(O|λ)
If λ° is used to generate a set of observables {O_i}, then the log-likelihood Σ_{O_i} P(O_i|λ°) log P(O_i|λ) is maximized by λ = λ°. This gives a way to find λ_ML by iteration (the Baum-Welch algorithm).
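For completeness, here is one unscaled Baum-Welch (EM) iteration for a discrete HMM, showing the forward-backward E-step and the re-estimation M-step; this is only a sketch (practical implementations rescale or work in log space to avoid underflow on long sequences), observations are 0-based symbol indices, and all names are illustrative.

```python
# Sketch of one Baum-Welch (EM) iteration for a discrete HMM (unscaled version,
# suitable only for short sequences; obs are 0-based symbol indices).
import numpy as np

def baum_welch_step(A, B, pi, obs):
    T, N = len(obs), len(pi)
    obs = np.asarray(obs)
    # E-step: forward (alpha) and backward (beta) variables.
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()                       # P(O | lambda)
    gamma = alpha * beta / likelihood                  # gamma[t, i] = p(q_t = i | O, lambda)
    # xi[t, i, j] = p(q_t = i, q_{t+1} = j | O, lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
    # M-step: re-estimate pi, A, B from the expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, likelihood
```

Iterating this step never decreases P(O|λ), which is the sense in which Baum-Welch climbs toward a (local) maximum of the likelihood.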

Maximum a posteriori probability
Suppose there is a probability distribution P(λ) over the parameters. Then from Bayes’ theorem, given the observation O, the posterior probability is
P(λ|O) = P(O|λ) P(λ)/P(O)
Since P(O) is independent of λ, the best λ is given by the maximum a posteriori probability estimate
λ_MAP = argmax_λ P(O|λ) P(λ)

References and books
Original papers by Krogh et al.:
Krogh, A., Brown, M., Mian, I. S., Sjölander, K., & Haussler, D. (1994a). Hidden Markov models in computational biology: Applications to protein modeling. Journal of Molecular Biology, 235.
Krogh, A., Mian, I. S., & Haussler, D. (1994b). A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Research, 22.
Book (that I find most readable):
R. Durbin, S. R. Eddy, A. Krogh and G. Mitchison, “Biological Sequence Analysis” (Cambridge University Press, 1998).

Good websites for HMM
This lecture is partly based on the article: Cold Spring Harbor Computational Genomics Course - Profile hidden Markov models, lecture.html
The Center for Computational Biology, Washington University in St. Louis School of Medicine

Where to find software
ech/Section6/Recognition/myers.hmm.html
Google: Hidden Markov Model Software
GeneMark: opal.biology.gatech.edu/GeneMark/