Hidden Markov Models – Concepts

Presentation transcript:

Hidden Markov Models – Concepts
[Figure: HMM trellis with states 1, 2, …, K at each position and observed symbols x1, x2, x3, …, xK]

Probability review

Markov chains
Markov property: \( P(X_0, X_1, \ldots, X_t) = P(X_0)\,P(X_1 \mid X_0) \cdots P(X_t \mid X_{t-1}) \)
Formally:
– State space = list of possible states
– Transition matrix = probability of moving from one state to another
– Initial distribution = probability of each possible value of \( X_0 \)
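The factorization above can be checked numerically. The following is a minimal sketch; the two-state chain, its numbers, and the name `chain_probability` are hypothetical and do not come from the slides.

```python
# Hypothetical two-state Markov chain; all numbers are made up for illustration.
initial = {"A": 0.6, "B": 0.4}          # initial distribution P(X_0)
transition = {                          # transition matrix P(X_t | X_{t-1})
    "A": {"A": 0.8, "B": 0.2},
    "B": {"A": 0.5, "B": 0.5},
}

def chain_probability(states, initial, transition):
    """P(X_0, ..., X_t) = P(X_0) * product of P(X_i | X_{i-1})."""
    prob = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= transition[prev][cur]
    return prob

# P(A, A, B) = 0.6 * 0.8 * 0.2
p = chain_probability(["A", "A", "B"], initial, transition)
print(p)
```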

Example: The dishonest casino
A casino has two dice:
Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2
The casino player switches back and forth between the fair and the loaded die, on average once every 20 turns.
Game:
1. You bet $1
2. You roll (always with a fair die)
3. The casino player rolls (maybe with the fair die, maybe with the loaded die)
4. Highest number wins $2

Question #1 – Evaluation
GIVEN: A sequence of rolls by the casino player
QUESTION: How likely is this sequence, given our model of how the casino works?
This is the EVALUATION problem in HMMs.

Question #2 – Decoding
GIVEN: A sequence of rolls by the casino player
QUESTION: What portion of the sequence was generated with the fair die, and what portion with the loaded die?
This is the DECODING question in HMMs.

Question #3 – Learning
GIVEN: A sequence of rolls by the casino player
QUESTION: How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded, and back?
This is the LEARNING question in HMMs.

Definition of a Hidden Markov Model
Definition: A hidden Markov model (HMM) consists of:
Alphabet \( \Sigma = \{ b_1, b_2, \ldots, b_M \} \)
Set of states \( Q = \{ 1, \ldots, K \} \)
Transition probabilities between any two states: \( a_{ij} \) = transition probability from state i to state j, with \( a_{i1} + \cdots + a_{iK} = 1 \) for all states i = 1…K
Start probabilities \( a_{0i} \), with \( a_{01} + \cdots + a_{0K} = 1 \)
Emission probabilities within each state: \( e_k(b) = P(x_i = b \mid \pi_i = k) \), with \( e_k(b_1) + \cdots + e_k(b_M) = 1 \) for all states k = 1…K

The dishonest casino model
Roll   P(roll | FAIR)   P(roll | LOADED)
1      1/6              1/10
2      1/6              1/10
3      1/6              1/10
4      1/6              1/10
5      1/6              1/10
6      1/6              1/2
The worked examples in these slides use P(Fair | Fair) = P(Loaded | Loaded) = 0.95, which leaves 0.05 for each switch between the two dice.
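One way to make the casino model concrete is to write it down as plain dictionaries and draw rolls from it. This is only a sketch: the start probabilities of ½ and the 0.95 stay probability are the values used in the worked examples of these slides, the 0.05 switch probability is inferred as the remainder, and the names `STATES`, `START`, `TRANS`, `EMIT`, and `sample` are hypothetical.

```python
import random

STATES = ("F", "L")                       # Fair, Loaded
START = {"F": 0.5, "L": 0.5}              # assumed start probabilities
TRANS = {                                 # 0.95 stay; 0.05 switch is inferred
    "F": {"F": 0.95, "L": 0.05},
    "L": {"F": 0.05, "L": 0.95},
}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2},
}

def sample(n, rng):
    """Draw n hidden states and the n rolls they emit."""
    states, rolls = [], []
    state = rng.choices(STATES, weights=[START[s] for s in STATES])[0]
    for _ in range(n):
        faces = list(EMIT[state])
        roll = rng.choices(faces, weights=[EMIT[state][f] for f in faces])[0]
        states.append(state)
        rolls.append(roll)
        # Move to the next hidden state according to the transition row.
        state = rng.choices(STATES, weights=[TRANS[state][s] for s in STATES])[0]
    return states, rolls

states, rolls = sample(20, random.Random(0))
print("".join(states))
print(rolls)
```

An observer of the casino sees only `rolls`; `states` is the hidden part that decoding tries to recover.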

HMM properties: Memory-less
At each time step t, the only thing that affects future states is the current state \( \pi_t \):
\[ P(\pi_{t+1} = k \mid \text{“whatever happened so far”}) = P(\pi_{t+1} = k \mid \pi_1, \pi_2, \ldots, \pi_t, x_1, x_2, \ldots, x_t) = P(\pi_{t+1} = k \mid \pi_t) \]

HMM properties: A parse of a sequence
Given a sequence \( x = x_1 \ldots x_N \), a parse of x is a sequence of states \( \pi = \pi_1, \ldots, \pi_N \).
[Figure: HMM trellis with states 1, 2, …, K at each position and observed symbols x1, x2, x3, …, xK]

The three main questions on HMMs
1. Evaluation
GIVEN an HMM M and a sequence x,
FIND Prob[ x | M ]
2. Decoding
GIVEN an HMM M and a sequence x,
FIND the sequence \( \pi \) of states that maximizes \( P[x, \pi \mid M] \)
3. Learning
GIVEN an HMM M with unspecified transition/emission probabilities, and a sequence x,
FIND parameters \( \theta = (e_i(\cdot), a_{ij}) \) that maximize \( P[x \mid \theta] \)
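For very short sequences, the first two questions can be answered directly from their definitions by enumerating every possible parse. The sketch below does exactly that for the casino model; it is exponential in the sequence length and is meant only to make the definitions concrete, not as a practical method. The parameter values are the ones used in the casino examples of these slides (start ½, stay 0.95, switch 0.05 inferred), and the function names are hypothetical.

```python
from itertools import product

STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2},
}

def joint_probability(rolls, parse):
    """P(x, pi): start, then alternate emission and transition terms."""
    prob = START[parse[0]] * EMIT[parse[0]][rolls[0]]
    for i in range(1, len(rolls)):
        prob *= TRANS[parse[i - 1]][parse[i]] * EMIT[parse[i]][rolls[i]]
    return prob

def brute_force(rolls):
    """Return (P(x), best parse, P(x, best parse)) by trying all K^N parses."""
    total, best_parse, best_prob = 0.0, None, -1.0
    for parse in product(STATES, repeat=len(rolls)):
        p = joint_probability(rolls, parse)
        total += p                      # evaluation: sum over all parses
        if p > best_prob:               # decoding: keep the most likely parse
            best_parse, best_prob = parse, p
    return total, best_parse, best_prob

total, best_parse, best_prob = brute_force([1, 6, 6, 5, 6, 2, 6, 6, 3, 6])
print(total, "".join(best_parse), best_prob)
```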

Let’s not be confused by notation
\( P[x \mid M] \): the probability that sequence x was generated by the model.
The model is: architecture (number of states, etc.) + parameters \( \theta = (a_{ij}, e_i(\cdot)) \).
So \( P[x \mid \theta] \) and \( P[x] \) are the same when the architecture, and the entire model, respectively, are implied.
Similarly, \( P[x, \pi \mid M] \) and \( P[x, \pi] \) are the same.
In the LEARNING problem we always write \( P[x \mid \theta] \) to emphasize that we are seeking the \( \theta \) that maximizes \( P[x \mid \theta] \).

Problem 1: Evaluation Find the likelihood of a parse

Likelihood of a parse
Given a sequence \( x = x_1 \ldots x_N \) and a parse \( \pi = \pi_1, \ldots, \pi_N \), how likely is the parse (given our HMM)?
\[
\begin{aligned}
P(x, \pi) &= P(x_1, \ldots, x_N, \pi_1, \ldots, \pi_N) \\
&= P(x_N, \pi_N \mid \pi_{N-1})\, P(x_{N-1}, \pi_{N-1} \mid \pi_{N-2}) \cdots P(x_2, \pi_2 \mid \pi_1)\, P(x_1, \pi_1) \\
&= P(x_N \mid \pi_N)\, P(\pi_N \mid \pi_{N-1}) \cdots P(x_2 \mid \pi_2)\, P(\pi_2 \mid \pi_1)\, P(x_1 \mid \pi_1)\, P(\pi_1) \\
&= a_{0\pi_1}\, a_{\pi_1\pi_2} \cdots a_{\pi_{N-1}\pi_N}\; e_{\pi_1}(x_1) \cdots e_{\pi_N}(x_N)
\end{aligned}
\]

Example: the dishonest casino
Let the sequence of rolls be: x = 1, 2, 1, 5, 6, 2, 1, 6, 2, 4
Then, what is the likelihood of \( \pi \) = Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair? (say initial probabilities \( a_{0,\text{Fair}} = \tfrac12 \), \( a_{0,\text{Loaded}} = \tfrac12 \))
\[ \tfrac12 \times P(1 \mid \text{Fair})\, P(\text{Fair} \mid \text{Fair})\, P(2 \mid \text{Fair})\, P(\text{Fair} \mid \text{Fair}) \cdots P(4 \mid \text{Fair}) = \tfrac12 \times (1/6)^{10} \times (0.95)^{9} \approx 5.2 \times 10^{-9} \]

Example: the dishonest casino
So, the likelihood that the die is fair throughout this run is just \( 5.2 \times 10^{-9} \).
OK, but what is the likelihood of \( \pi \) = Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded?
\[ \tfrac12 \times P(1 \mid \text{Loaded})\, P(\text{Loaded} \mid \text{Loaded}) \cdots P(4 \mid \text{Loaded}) = \tfrac12 \times (1/10)^{8} \times (1/2)^{2} \times (0.95)^{9} \approx 7.9 \times 10^{-10} \]
Therefore, it is after all about 6.6 times more likely that the die is fair all the way than that it is loaded all the way.

Example: the dishonest casino
Let the sequence of rolls be: x = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6
Now, what is the likelihood of \( \pi \) = F, F, …, F?
\[ \tfrac12 \times (1/6)^{10} \times (0.95)^{9} \approx 5.2 \times 10^{-9}, \text{ same as before} \]
What is the likelihood of \( \pi \) = L, L, …, L?
\[ \tfrac12 \times (1/10)^{4} \times (1/2)^{6} \times (0.95)^{9} \approx 4.9 \times 10^{-7} \]
So, it is about 94 times (roughly 100 times) more likely that the die is loaded.
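The arithmetic in the casino examples can be reproduced with a few lines of code. This is a sketch under the same assumptions as the slides (start probabilities ½, stay probability 0.95); the 0.05 switch probability is inferred and is never actually used by the all-Fair and all-Loaded parses. The function name `joint_probability` is hypothetical.

```python
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2},
}

def joint_probability(rolls, parse):
    """P(x, pi) = a_{0,pi_1} e_{pi_1}(x_1) * prod_i a_{pi_{i-1},pi_i} e_{pi_i}(x_i)."""
    prob = START[parse[0]] * EMIT[parse[0]][rolls[0]]
    for i in range(1, len(rolls)):
        prob *= TRANS[parse[i - 1]][parse[i]] * EMIT[parse[i]][rolls[i]]
    return prob

x1 = [1, 2, 1, 5, 6, 2, 1, 6, 2, 4]   # two sixes
x2 = [1, 6, 6, 5, 6, 2, 6, 6, 3, 6]   # six sixes

fair1, loaded1 = joint_probability(x1, "F" * 10), joint_probability(x1, "L" * 10)
fair2, loaded2 = joint_probability(x2, "F" * 10), joint_probability(x2, "L" * 10)

print(f"x1: fair={fair1:.3g} loaded={loaded1:.3g} fair/loaded={fair1 / loaded1:.3g}")
print(f"x2: fair={fair2:.3g} loaded={loaded2:.3g} loaded/fair={loaded2 / fair2:.3g}")
```

The all-Fair likelihood is identical for both sequences because a fair die assigns 1/6 to every face; only the all-Loaded likelihood depends on how many sixes appear.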

Problem 2: Decoding Find the best parse (states) of a sequence

Decoding
GIVEN \( x = x_1 x_2 \ldots x_N \)
We want to find \( \pi = \pi_1, \ldots, \pi_N \) such that \( P[x, \pi] \) is maximized:
\[ \pi^* = \arg\max_{\pi} P[x, \pi] \]
We can use dynamic programming!
Let
\[ V_k(i) = \max_{\pi_1, \ldots, \pi_{i-1}} P[x_1 \ldots x_{i-1}, \pi_1, \ldots, \pi_{i-1}, x_i, \pi_i = k] \]
= probability of the most likely sequence of states ending at state \( \pi_i = k \)
[Figure: HMM trellis with states 1, 2, …, K at each position and observed symbols x1, x2, x3, …, xK]

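Decoding – main idea
Given that for all states k, and for a fixed position i,
\[ V_k(i) = \max_{\pi_1, \ldots, \pi_{i-1}} P[x_1 \ldots x_{i-1}, \pi_1, \ldots, \pi_{i-1}, x_i, \pi_i = k], \]
what is \( V_l(i+1) \)?
From the definition,
\[
\begin{aligned}
V_l(i+1) &= \max_{\pi_1, \ldots, \pi_i} P[x_1 \ldots x_i, \pi_1, \ldots, \pi_i, x_{i+1}, \pi_{i+1} = l] \\
&= \max_{\pi_1, \ldots, \pi_i} P(x_{i+1}, \pi_{i+1} = l \mid x_1 \ldots x_i, \pi_1, \ldots, \pi_i)\; P[x_1 \ldots x_i, \pi_1, \ldots, \pi_i] \\
&= \max_{\pi_1, \ldots, \pi_i} P(x_{i+1}, \pi_{i+1} = l \mid \pi_i)\; P[x_1 \ldots x_{i-1}, \pi_1, \ldots, \pi_{i-1}, x_i, \pi_i] \\
&= \max_k \Big[ P(x_{i+1}, \pi_{i+1} = l \mid \pi_i = k) \max_{\pi_1, \ldots, \pi_{i-1}} P[x_1 \ldots x_{i-1}, \pi_1, \ldots, \pi_{i-1}, x_i, \pi_i = k] \Big] \\
&= e_l(x_{i+1}) \max_k a_{kl}\, V_k(i)
\end{aligned}
\]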

Dynamic programming approach
Similar to “aligning” a set of states to a sequence
Time: \( O(K^2 N) \)
Space: \( O(KN) \)
[Figure: dynamic-programming table with states 1, 2, …, K as rows, positions x_1 … x_N as columns, and entries V_j(i)]

The algorithm
Input: \( x = x_1 \ldots x_N \)
Initialization:
\( V_0(0) = 1 \) (0 is the imaginary first position)
\( V_k(0) = 0 \), for all k > 0
Iteration:
\( V_j(i) = e_j(x_i) \times \max_k a_{kj}\, V_k(i-1) \)
\( \mathrm{Ptr}_j(i) = \arg\max_k a_{kj}\, V_k(i-1) \)
Termination:
\( P(x, \pi^*) = \max_k V_k(N) \)
Traceback:
\( \pi_N^* = \arg\max_k V_k(N) \)
\( \pi_{i-1}^* = \mathrm{Ptr}_{\pi_i^*}(i) \)
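The recurrence above translates almost line for line into code. The following is a minimal sketch in probability space, applied to the casino model with the parameter values used in the examples (start ½, stay 0.95, switch 0.05 inferred). The function name `viterbi` and its argument layout are hypothetical; the imaginary position 0 is folded into the first column by multiplying the start probability with the first emission, which gives the same values as the initialization above.

```python
def viterbi(x, states, start, trans, emit):
    """Return the most likely parse of x and its joint probability P(x, pi*)."""
    # V[i][k]: probability of the best parse of x[0..i] that ends in state k.
    V = [{k: start[k] * emit[k][x[0]] for k in states}]
    ptr = [{}]
    for i in range(1, len(x)):
        V.append({})
        ptr.append({})
        for j in states:
            best_k = max(states, key=lambda k: V[i - 1][k] * trans[k][j])
            ptr[i][j] = best_k
            V[i][j] = emit[j][x[i]] * trans[best_k][j] * V[i - 1][best_k]
    # Termination and traceback.
    last = max(states, key=lambda k: V[-1][k])
    parse = [last]
    for i in range(len(x) - 1, 0, -1):
        parse.append(ptr[i][parse[-1]])
    parse.reverse()
    return parse, V[-1][last]

STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2},
}

parse_a, prob_a = viterbi([1, 2, 1, 5, 6, 2, 1, 6, 2, 4], STATES, START, TRANS, EMIT)
parse_b, prob_b = viterbi([1, 6, 6, 5, 6, 2, 6, 6, 3, 6], STATES, START, TRANS, EMIT)
print("".join(parse_a), prob_a)
print("".join(parse_b), prob_b)
```

The two nested loops over states inside the loop over positions give the O(K²N) running time, and the `V` and `ptr` tables give the O(KN) space.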

A practical detail
Underflows are a significant problem:
\[ P[x_1, \ldots, x_i, \pi_1, \ldots, \pi_i] = a_{0\pi_1}\, a_{\pi_1\pi_2} \cdots a_{\pi_{i-1}\pi_i}\; e_{\pi_1}(x_1) \cdots e_{\pi_i}(x_i) \]
These numbers become extremely small and underflow.
Solution: take the logs of all values:
\[ V_l(i) = \log e_l(x_i) + \max_k \big[ V_k(i-1) + \log a_{kl} \big] \]
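A short experiment shows how quickly the product underflows and why sums of logs do not. The sketch assumes a hypothetical run of 2000 rolls parsed entirely as Fair, so that every step contributes a factor of 0.95 × 1/6; the run length is arbitrary.

```python
import math

n = 2000                      # hypothetical number of rolls
step = 0.95 * (1 / 6)         # one transition times one fair emission

# Probability space: repeated multiplication underflows to exactly 0.0.
prob = 0.5
for _ in range(n):
    prob *= step

# Log space: the same quantity as a sum of logs stays finite.
log_prob = math.log(0.5) + n * math.log(step)

print(prob)       # 0.0
print(log_prob)   # a large negative but finite number
```

Once every parse has probability 0.0, the max in the Viterbi recurrence can no longer tell parses apart, whereas comparing log values still works.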

Example
Let x be a sequence with a portion of ~1/6 6’s, followed by a portion of ~1/2 6’s:
x = … …
Then it is not hard to show (exercise) that the optimal parse is:
FFF…F LLL…L
Six letters “123456”:
parsed as F, contribute \( 0.95^6 \times (1/6)^6 \approx 1.6 \times 10^{-5} \)
parsed as L, contribute \( 0.95^6 \times (1/2)^1 \times (1/10)^5 \approx 0.4 \times 10^{-5} \)
Six letters “162636”:
parsed as F, contribute \( 0.95^6 \times (1/6)^6 \approx 1.6 \times 10^{-5} \)
parsed as L, contribute \( 0.95^6 \times (1/2)^3 \times (1/10)^3 \approx 9.2 \times 10^{-5} \)

Problem 3: Learning Re-estimate the parameters of the model based on training data

Two learning scenarios
1. Estimation when the “right answer” is known
Examples:
GIVEN: a genomic region \( x = x_1 \ldots x_{1{,}000{,}000} \) where we have good (experimental) annotations of the CpG islands
GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls
2. Estimation when the “right answer” is unknown
Examples:
GIVEN: the porcupine genome; we don’t know how frequent the CpG islands are there, nor do we know their composition
GIVEN: 10,000 rolls of the casino player, but we don’t see when he changes dice
QUESTION: Update the parameters \( \theta \) of the model to maximize \( P(x \mid \theta) \)

1. When the right answer is known
Given \( x = x_1 \ldots x_N \) for which the true \( \pi = \pi_1 \ldots \pi_N \) is known, define:
\( A_{kl} \) = number of times the transition k → l occurs in \( \pi \)
\( E_k(b) \) = number of times state k in \( \pi \) emits b in x
We can show that the maximum likelihood parameters \( \theta \) are:
\[ a_{kl} = \frac{A_{kl}}{\sum_i A_{ki}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_c E_k(c)} \]
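The counting estimator can be sketched in a few lines. The labelled sequence below is a made-up toy example, and the name `estimate` is hypothetical; the sketch also assumes that every state occurs with at least one outgoing transition and one emission, since otherwise the ratios above are 0/0.

```python
from collections import Counter

def estimate(x, pi, states, alphabet):
    """Maximum-likelihood a_kl and e_k(b) from a sequence with known states."""
    A = Counter(zip(pi, pi[1:]))   # A[k, l] = number of k -> l transitions
    E = Counter(zip(pi, x))        # E[k, b] = number of times state k emits b
    a, e = {}, {}
    for k in states:
        out_total = sum(A[k, l] for l in states)
        emit_total = sum(E[k, b] for b in alphabet)
        # Assumes out_total and emit_total are nonzero (see lead-in).
        a[k] = {l: A[k, l] / out_total for l in states}
        e[k] = {b: E[k, b] / emit_total for b in alphabet}
    return a, e

# Toy labelled data (made up): rolls and the die that produced each one.
rolls = [1, 2, 3, 6, 6, 1, 4, 5]
labels = list("FFFLLLFF")

a, e = estimate(rolls, labels, states="FL", alphabet=range(1, 7))
print(a)
print(e)
```

With so little data some estimates come out as exactly zero (for example, the fair die never shows a 6 here), which illustrates why this estimator needs plenty of labelled observations to be reliable.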

2. When the right answer is unknown
We don’t know the true \( A_{kl} \), \( E_k(b) \)
Idea:
We estimate our “best guess” of what \( A_{kl} \), \( E_k(b) \) are
We update the parameters of the model, based on our guess
We repeat

2. When the right answer is unknown
Starting with our best guess of parameters \( \theta \):
Given \( x = x_1 \ldots x_N \) for which the true \( \pi = \pi_1 \ldots \pi_N \) is unknown, we can reach a more likely parameter set \( \theta \) if we repeat many times.
Principle: EXPECTATION MAXIMIZATION
1. Estimate \( A_{kl} \), \( E_k(b) \) in the training data
2. Update \( \theta \) according to \( A_{kl} \), \( E_k(b) \)
3. Repeat 1 & 2, until convergence
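These slides name only the principle. One standard instance of it for HMMs is the Baum-Welch procedure, which computes the expected counts with a scaled forward-backward pass; the sketch below assumes that choice. The roll sequence, the initial guess, and all function names are hypothetical, and the start probabilities are held fixed to keep the example short.

```python
import math

def forward_backward(x, states, start, trans, emit):
    """Scaled forward/backward tables plus log P(x | theta)."""
    n, f, scale = len(x), [], []
    for i in range(n):
        row = {}
        for k in states:
            prev = start[k] if i == 0 else sum(f[i - 1][j] * trans[j][k] for j in states)
            row[k] = prev * emit[k][x[i]]
        c = sum(row.values())                 # scaling factor for position i
        scale.append(c)
        f.append({k: row[k] / c for k in states})
    b = [None] * n
    b[n - 1] = {k: 1.0 for k in states}
    for i in range(n - 2, -1, -1):
        b[i] = {k: sum(trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l] for l in states)
                   / scale[i + 1] for k in states}
    return f, b, scale, sum(math.log(c) for c in scale)

def em_step(x, states, alphabet, start, trans, emit):
    """One iteration: expected counts A, E (step 1), then re-normalize (step 2)."""
    f, b, scale, loglik = forward_backward(x, states, start, trans, emit)
    A = {k: {l: 0.0 for l in states} for k in states}
    E = {k: {s: 0.0 for s in alphabet} for k in states}
    for i in range(len(x)):
        for k in states:
            E[k][x[i]] += f[i][k] * b[i][k]   # P(pi_i = k | x)
            if i + 1 < len(x):
                for l in states:              # P(pi_i = k, pi_{i+1} = l | x)
                    A[k][l] += (f[i][k] * trans[k][l] * emit[l][x[i + 1]]
                                * b[i + 1][l] / scale[i + 1])
    new_trans = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    new_emit = {k: {s: E[k][s] / sum(E[k].values()) for s in alphabet} for k in states}
    return new_trans, new_emit, loglik        # loglik is for the parameters passed in

states, alphabet = ("F", "L"), range(1, 7)
start = {"F": 0.5, "L": 0.5}                  # held fixed in this sketch
trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}   # initial guess
emit = {"F": {r: 1 / 6 for r in alphabet},
        "L": {**{r: 0.14 for r in range(1, 6)}, 6: 0.3}}          # initial guess
x = [1, 2, 3, 4, 5, 6, 2, 1, 6, 6, 6, 3, 6, 6, 5,
     6, 6, 1, 6, 6, 2, 4, 3, 5, 1, 2, 6, 4, 3, 5]                 # made-up rolls

logliks = []
for _ in range(10):
    trans, emit, ll = em_step(x, states, alphabet, start, trans, emit)
    logliks.append(ll)
print([round(v, 3) for v in logliks])
```

The printed log-likelihoods should never decrease from one iteration to the next, which is the sense in which repeating steps 1 and 2 reaches a more likely parameter set. The result is only a local optimum and depends on the initial guess.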