Hidden Markov Models (CS262 Lecture 8, Win07, Batzoglou)

Hidden Markov Models
[Figure: HMM trellis with K states at each position, emitting the observed symbols x1, x2, x3, ...]

Substitutions of Amino Acids
Mutation rates differ dramatically between pairs of amino acids!

Substitution Matrices
BLOSUM matrices:
1. Start from the BLOCKS database (curated, gap-free alignments)
2. Cluster sequences according to > X% identity
3. Calculate A_ab: the number of aligned a-b pairs in distinct clusters, correcting by 1/(mn), where m, n are the two cluster sizes
4. Estimate P(a) = (Σ_b A_ab) / (Σ_{c≤d} A_cd);   P(a, b) = A_ab / (Σ_{c≤d} A_cd)
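As a rough illustration of steps 3-4, here is a minimal Python sketch that turns cluster-corrected pair counts A_ab into integer log-odds scores. The toy counts and the half-bit-style scaling are made-up assumptions, and the marginal P(a) here follows the usual convention of splitting off-diagonal pairs evenly, a slight refinement of the formula on the slide.

```python
import math
from collections import defaultdict

def blosum_log_odds(pair_counts, scale=2.0):
    """Turn weighted pair counts A[a, b] (unordered pairs) into log-odds scores
    s(a, b) = scale * log2( P(a, b) / expected(a, b) ), rounded to integers."""
    total = sum(pair_counts.values())
    p_pair = {ab: c / total for ab, c in pair_counts.items()}   # P(a, b)
    p_res = defaultdict(float)                                  # P(a), marginal
    for (a, b), p in p_pair.items():
        if a == b:
            p_res[a] += p
        else:
            p_res[a] += p / 2
            p_res[b] += p / 2
    scores = {}
    for (a, b), p in p_pair.items():
        expected = p_res[a] * p_res[b] * (1 if a == b else 2)   # chance pairing
        scores[(a, b)] = round(scale * math.log2(p / expected))
    return scores

# Toy example with a 3-letter alphabet (counts are made up for illustration).
counts = {("A", "A"): 40, ("A", "B"): 8, ("A", "C"): 2,
          ("B", "B"): 30, ("B", "C"): 6, ("C", "C"): 14}
print(blosum_log_odds(counts))
```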

Probabilistic interpretation of an alignment
An alignment is a hypothesis that the two sequences are related by evolution.
Goal:
- Produce the most likely alignment
- Assess the likelihood that the sequences are indeed related

A Pair HMM for alignments (Model M)
[Figure: three-state pair HMM. State M emits an aligned pair with probability P(x_i, y_j); insertion states I and J emit single letters with P(x_i) and P(y_j). From M, the model stays in M with probability 1 − 2δ and moves to I or J with probability δ each; I and J extend a gap with probability ε and return to M with probability 1 − ε.]
This model generates two sequences simultaneously.
- Match/mismatch state M: P(x, y) reflects substitution frequencies between pairs of amino acids
- Insertion states I, J: P(x), P(y) reflect the frequency of each amino acid
- δ: set so that 1/(2δ) is the average length before the next gap
- ε: set so that 1/(1 − ε) is the average length of a gap
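To illustrate the claim that this model generates the two sequences simultaneously, here is a small Python sketch of the generative process. The state names M, I, J and the parameters δ and ε follow the slide; the alphabet, the emission tables, the parameter values, and the fixed-step termination are toy assumptions.

```python
import random

DELTA, EPSILON = 0.1, 0.3
ALPHA = "ACGT"
P_SINGLE = {a: 0.25 for a in ALPHA}                      # P(x) = P(y)
P_PAIR = {(a, b): 0.17 if a == b else 0.02666667         # P(x, y): 0.68 mass on matches
          for a in ALPHA for b in ALPHA}

def sample_pair_hmm(n_steps, seed=0):
    """Sample two sequences jointly from the three-state pair HMM."""
    rng = random.Random(seed)
    x, y, state = [], [], "M"
    for _ in range(n_steps):
        if state == "M":                                 # emit an aligned pair
            a, b = rng.choices(list(P_PAIR), weights=P_PAIR.values())[0]
            x.append(a); y.append(b)
            state = rng.choices(["M", "I", "J"],
                                weights=[1 - 2 * DELTA, DELTA, DELTA])[0]
        elif state == "I":                               # gap in y: emit only x
            x.append(rng.choices(ALPHA, weights=[P_SINGLE[a] for a in ALPHA])[0])
            state = rng.choices(["I", "M"], weights=[EPSILON, 1 - EPSILON])[0]
        else:                                            # state J, gap in x: emit only y
            y.append(rng.choices(ALPHA, weights=[P_SINGLE[a] for a in ALPHA])[0])
            state = rng.choices(["J", "M"], weights=[EPSILON, 1 - EPSILON])[0]
    return "".join(x), "".join(y)

print(sample_pair_hmm(30))
```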

A Pair HMM for unaligned sequences (Model R)
[Figure: two-state random model R. State I emits x with P(x_i), state J emits y with P(y_j); each state loops back to itself.]
The two sequences are generated independently of one another:
P(x, y | R) = P(x_1)…P(x_m) · P(y_1)…P(y_n) = Π_i P(x_i) · Π_j P(y_j)

To compare the ALIGNMENT vs. RANDOM hypothesis
Every pair of letters contributes:
- (M) (1 − 2δ) P(x_i, y_j) when matched; δ P(x_i) P(y_j) when gapped
- (R) P(x_i) P(y_j) in the random model
Focus on the comparison of P(x_i, y_j) vs. P(x_i) P(y_j).
[Figure, built up over three slides: model M (states M, I, J; transitions 1 − 2δ, δ, ε, 1 − ε) is shown next to the random model R, and is then transformed step by step into an equivalent model in which state M emits the odds ratio P(x_i, y_j) / (P(x_i) P(y_j)), states I and J emit 1, and the transition labels involve 1 − 2δ, δ(1 − ε), and ε. Each step is marked "Equivalent!"]

To compare the ALIGNMENT vs. RANDOM hypothesis
Idea: divide the alignment score by the random score, and take logarithms.
Let
  s(x_i, y_j) = log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log(1 − 2δ)        (definition of substitution score)
  d = − log [ δ(1 − ε) P(x_i) / ((1 − 2δ) P(x_i)) ]                         (definition of gap initiation penalty)
  e = − log [ ε P(x_i) / P(x_i) ]                                           (definition of gap extension penalty)
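These three definitions can be turned into numbers directly. The sketch below assumes base-2 logarithms, a uniform DNA background, an 80%-identity substitution table (as in the worked example later in the deck), and toy values for δ and ε; note that P(x_i) cancels in d and e.

```python
import math

# Illustrative parameter values only.
DELTA, EPSILON = 0.05, 0.5
ALPHA = "acgt"
P_BG = {a: 0.25 for a in ALPHA}                                    # background P(a)
P_SUB = {(a, b): (0.2 if a == b else 0.2 / 12) for a in ALPHA for b in ALPHA}

def s(a, b):
    """Substitution score: log-odds of the pair plus the match transition term."""
    return math.log2(P_SUB[a, b] / (P_BG[a] * P_BG[b])) + math.log2(1 - 2 * DELTA)

d = -math.log2(DELTA * (1 - EPSILON) / (1 - 2 * DELTA))            # gap initiation penalty
e = -math.log2(EPSILON)                                            # gap extension penalty

print(round(s("a", "a"), 3), round(s("a", "c"), 3), round(d, 3), round(e, 3))
```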

The meaning of alignment scores
The Viterbi algorithm for pair HMMs corresponds exactly to global alignment DP with affine gaps:
  V_M(i, j) = max { V_M(i − 1, j − 1), V_I(i − 1, j − 1), V_J(i − 1, j − 1) } + s(x_i, y_j)
  V_I(i, j) = max { V_M(i − 1, j) − d, V_I(i − 1, j) − e }
  V_J(i, j) = max { V_M(i, j − 1) − d, V_J(i, j − 1) − e }
How the scores relate to the pair HMM parameters:
- s(·, ·), (1 − 2δ): ~ how often a pair of letters substitute one another
- ε: governs gap length (1/(1 − ε) is the mean length of the next gap)
- δ(1 − ε) / (1 − 2δ): ~ 1/(mean arrival time of the next gap)
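For concreteness, here is a minimal Python sketch of this affine-gap global alignment DP (best score only, no traceback); the function name and the toy match/mismatch/gap values in the example call are illustrative assumptions, not values from the slides.

```python
NEG_INF = float("-inf")

def affine_gap_align_score(x, y, s, d, e):
    """Global alignment score with affine gaps (open d, extend e), following
    the three-matrix recurrence above; s(a, b) is the substitution score."""
    n, m = len(x), len(y)
    VM = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VI = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VJ = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = 0.0
    for i in range(1, n + 1):              # leading gap in y
        VI[i][0] = -d - (i - 1) * e
    for j in range(1, m + 1):              # leading gap in x
        VJ[0][j] = -d - (j - 1) * e
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            VM[i][j] = max(VM[i-1][j-1], VI[i-1][j-1], VJ[i-1][j-1]) + s(x[i-1], y[j-1])
            VI[i][j] = max(VM[i-1][j] - d, VI[i-1][j] - e)
            VJ[i][j] = max(VM[i][j-1] - d, VJ[i][j-1] - e)
    return max(VM[n][m], VI[n][m], VJ[n][m])

# Toy scoring: +1 match, -1 mismatch, gap open 2, gap extend 0.5 (made-up numbers).
score = affine_gap_align_score("ACGTTAG", "ACGTAG",
                               lambda a, b: 1.0 if a == b else -1.0, 2.0, 0.5)
print(score)
```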

The meaning of alignment scores
Match/mismatch scores:
  s(a, b) ≈ log [ P(a, b) / (P(a) P(b)) ]     (ignore log(1 − 2δ) for the moment)
Example: DNA regions between human and mouse genes have an average conservation of 80%.
1. What is the substitution score for a match?
   P(a, a) + P(c, c) + P(g, g) + P(t, t) = 0.8, so P(x, x) = 0.2 for each matched pair
   P(a) = P(c) = P(g) = P(t) = 0.25
   s(x, x) = log [ 0.2 / (0.25 × 0.25) ] = log 3.2 ≈ 1.163 (natural log)
2. What is the substitution score for a mismatch?
   P(a, c) + … + P(t, g) = 0.2, so P(x, y ≠ x) = 0.2 / 12 ≈ 0.0167
   s(x, y ≠ x) = log [ 0.0167 / (0.25 × 0.25) ] = log 0.267 ≈ −1.322
3. What ratio matches / (matches + mismatches) gives score 0?
   1.163 (#match) − 1.322 (#mism) = 0  ⇒  #match = 1.137 (#mism)  ⇒  matches ≈ 53.2%
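A quick numerical check of this worked example, as a small sketch (the variable names are mine; the log base does not affect the break-even ratio, natural log is used only for concreteness):

```python
import math

p_match_pair = 0.8 / 4          # P(x, x) for each of the 4 matched pairs
p_mism_pair = 0.2 / 12          # P(x, y != x) for each of the 12 mismatched pairs
q = 0.25 * 0.25                 # background probability of any specific pair

s_match = math.log(p_match_pair / q)    # log 3.2
s_mism = math.log(p_mism_pair / q)      # log 0.267 (negative)

ratio = -s_mism / s_match               # matches needed per mismatch for score 0
print(round(ratio, 3), round(100 * ratio / (1 + ratio), 1))
# prints ~1.136 and 53.2 (the slide's 1.137 comes from rounding 1.322/1.163)
```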

The meaning of alignment scores
The global alignment algorithm we learned corresponds to:
- finding the most likely alignment under the 3-state pair HMM.
The score of an alignment corresponds to:
- the log-likelihood ratio between P(best alignment | alignment model) and P(sequences were generated independently).

Substitution Matrices
BLOSUM matrices:
1. Start from the BLOCKS database (curated, gap-free alignments)
2. Cluster sequences according to > X% identity
3. Calculate A_ab: the number of aligned a-b pairs in distinct clusters, correcting by 1/(mn), where m, n are the two cluster sizes
4. Estimate P(a) = (Σ_b A_ab) / (Σ_{c≤d} A_cd);   P(a, b) = A_ab / (Σ_{c≤d} A_cd)

BLOSUM matrices
[Figure: the BLOSUM 50 and BLOSUM 62 scoring matrices. The two are scaled differently.]

Conditional Random Fields
A brief description of a relatively new kind of graphical model

Let's look at an HMM again
Why are HMMs convenient to use? Because we can do dynamic programming with them!
- The "best" state sequence for 1…i interacts with the "best" sequence for i+1…N through only K^2 arrows:
  V_l(i+1) = e_l(x_{i+1}) · max_k V_k(i) a_kl
           = max_k ( V_k(i) + [ e(l, i+1) + a(k, l) ] )    (where e(·,·) and a(·,·) are logs)
- The total likelihood of all state sequences for 1…i+1 can be calculated from the total likelihood for 1…i by summing over only K^2 arrows.
[Figure: HMM trellis with K states per position, emitting x1, x2, x3, …, xN]
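Since this DP is central to everything that follows, here is a minimal log-space Viterbi sketch in Python implementing the recurrence above; the dishonest-casino states, probabilities, and example roll sequence are made-up illustration values.

```python
import math

def viterbi(x, states, log_trans, log_emit, log_init):
    """Log-space Viterbi for a plain HMM:
    V_l(i+1) = max_k ( V_k(i) + a(k, l) ) + e(l, x_{i+1}).
    Returns the best log score and the best state path."""
    V = [{k: log_init[k] + log_emit[k][x[0]] for k in states}]
    back = []
    for i in range(1, len(x)):
        row, ptr = {}, {}
        for l in states:
            best_k = max(states, key=lambda k: V[-1][k] + log_trans[k][l])
            row[l] = V[-1][best_k] + log_trans[best_k][l] + log_emit[l][x[i]]
            ptr[l] = best_k
        V.append(row)
        back.append(ptr)
    last = max(states, key=lambda k: V[-1][k])   # trace back the best path
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return V[-1][last], list(reversed(path))

# Toy two-state example (dishonest-casino style; all numbers are made up).
lg = math.log
states = ["Fair", "Loaded"]
trans = {"Fair": {"Fair": lg(0.95), "Loaded": lg(0.05)},
         "Loaded": {"Fair": lg(0.1), "Loaded": lg(0.9)}}
emit = {"Fair": {s: lg(1 / 6) for s in "123456"},
        "Loaded": {**{s: lg(0.1) for s in "12345"}, "6": lg(0.5)}}
init = {"Fair": lg(0.5), "Loaded": lg(0.5)}
print(viterbi("1266666346", states, trans, emit, init))
```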

Let's look at an HMM again
Some shortcomings of HMMs:
- They can't model state duration. Solution: explicit duration models (semi-Markov HMMs).
- Unfortunately, state π_i cannot "look" at any letter other than x_i! Strong independence assumption: P(π_i | x_1…x_{i−1}, π_1…π_{i−1}) = P(π_i | π_{i−1})
[Figure: HMM trellis]

Let's look at an HMM again
Another way to put this: the features used in the objective function P(x, π) are
- a_kl and e_k(b), where b ∈ Σ
- At position i, all K^2 a_kl features and all K e_l(x_i) features play a role.
OK, forget the probabilistic interpretation for a moment:
- "Given that the previous state is k and the current state is l, how much is the current score?"
  V_l(i) = V_k(i − 1) + (a(k, l) + e(l, i)) = V_k(i − 1) + g(k, l, x_i)
Let's generalize g:  V_k(i − 1) + g(k, l, x, i)
[Figure: HMM trellis]

"Features" that depend on many positions in x
What do we put in g(k, l, x, i)?
- The higher g(k, l, x, i), the more we like going from k to l at position i.
Richer models using this additional power. Examples:
- The casino player looks at the previous 100 positions; if there are > 50 sixes, he likes to go to Fair:
  g(Loaded, Fair, x, i) += 1[x_{i−100}, …, x_{i−1} has > 50 sixes] × w_DON'T_GET_CAUGHT
- Genes are close to CpG islands; for any state k:
  g(k, exon, x, i) += 1[x_{i−1000}, …, x_{i+1000} has > 1/16 CpG] × w_CG_RICH_REGION
[Figure: sequence x1 … x10 with states π_{i−1}, π_i; the feature at position i may look at many positions of x]

"Features" that depend on many positions in x
Conditional Random Fields: Features
1. Define a set of features that you think are important.
   - All features should be functions of the current state, the previous state, x, and the position i.
   - Example: old features: transition k → l, emission of b from state k; plus new features: the previous 100 letters have > 50 sixes.
   - Number the features 1…n: f_1(k, l, x, i), …, f_n(k, l, x, i). Features are indicator (true/false) variables.
   - Find appropriate weights w_1, …, w_n for when each feature is true. The weights are the parameters of the model.
2. Assume for now that each feature f_j has a weight w_j. Then
   g(k, l, x, i) = Σ_{j=1…n} f_j(k, l, x, i) × w_j
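As a concrete illustration of this weighted feature sum, here is a small Python sketch; the two feature functions, their names, and the weight values are hypothetical examples in the spirit of the slide, not part of the original model.

```python
def f_transition_loaded_fair(k, l, x, i):
    """Old-style feature: the transition Loaded -> Fair was used."""
    return 1 if (k, l) == ("Loaded", "Fair") else 0

def f_many_sixes_before(k, l, x, i):
    """New-style feature: the previous 100 letters contain more than 50 sixes."""
    window = x[max(0, i - 100):i]
    return 1 if window.count("6") > 50 else 0

FEATURES = [f_transition_loaded_fair, f_many_sixes_before]
WEIGHTS = [0.3, 2.0]     # w_j, one weight per feature (toy values)

def g(k, l, x, i):
    """g(k, l, x, i) = sum_j f_j(k, l, x, i) * w_j"""
    return sum(w * f(k, l, x, i) for f, w in zip(FEATURES, WEIGHTS))

print(g("Loaded", "Fair", "6" * 120, 110))   # both features fire: 0.3 + 2.0
```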

"Features" that depend on many positions in x
Define V_k(i): the optimal score of "parsing" x_1…x_i and ending in state k.
Then, assuming V_k(i) is optimal for every k at position i, it follows that
  V_l(i+1) = max_k [ V_k(i) + g(k, l, x, i+1) ]
Why? Even though at position i+1 we "look" at arbitrary positions in x, we are only "affected" by the choice of the ending state k.
Therefore, the Viterbi algorithm again finds the optimal (highest-scoring) parse of x_1…x_N.
[Figure: sequence x1 … x10]
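The generalized recurrence can be implemented almost exactly like HMM Viterbi. Below is a minimal sketch; the helper names (crf_viterbi, toy_g) and the toy scoring function are assumptions for illustration only.

```python
def crf_viterbi(x, states, g, g_start):
    """Return (best score, best path) under
    V_l(i+1) = max_k [ V_k(i) + g(k, l, x, i+1) ]."""
    V = [{l: g_start(l, x, 0) for l in states}]
    back = []
    for i in range(1, len(x)):
        row, ptr = {}, {}
        for l in states:
            best_k = max(states, key=lambda k: V[-1][k] + g(k, l, x, i))
            row[l] = V[-1][best_k] + g(best_k, l, x, i)
            ptr[l] = best_k
        V.append(row)
        back.append(ptr)
    last = max(states, key=lambda l: V[-1][l])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return V[-1][last], list(reversed(path))

# Toy g: prefer staying in the same state, and prefer "Loaded" wherever the
# surrounding 5 letters contain at least three sixes (made-up scores).
def toy_g(k, l, x, i):
    score = 0.5 if k == l else 0.0
    if l == "Loaded" and x[max(0, i - 2):i + 3].count("6") >= 3:
        score += 1.0
    return score

print(crf_viterbi("123666665432", ["Fair", "Loaded"], toy_g,
                  lambda l, x, i: 0.0))
```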

"Features" that depend on many positions in x
- The score of a parse depends on all of x at each position.
- We can still do Viterbi, because state π_i only "looks" at the previous state π_{i−1} and the constant sequence x.
[Figure: the chain of states π_1 … π_6 over positions x_1 … x_6, drawn twice and labeled HMM and CRF]

How many parameters are there, in general?
Arbitrarily many parameters!
- For example, let f_j(k, l, x, i) depend on x_{i−5}, x_{i−4}, …, x_{i+5}. Then we would have up to K · |Σ|^11 parameters!
- Advantage: a powerful, expressive model.
  - Example: "if there are more than 50 sixes in the last 100 rolls, but in the surrounding 18 rolls there are at most 3 sixes, this is evidence we are in the Fair state." Interpretation: the casino player is afraid of being caught, so he switches to Fair when he sees too many sixes.
  - Example: "if there are any CG-rich regions in the vicinity (a window of 2000 positions), then favor predicting lots of genes in this region."
- Question: how do we train these parameters?

Conditional Training
Hidden Markov Model training:
- Given a training sequence x and the "true" parse π, maximize P(x, π).
Disadvantage:
- P(x, π) = P(π | x) · P(x)
  - P(π | x): the quantity we care about, so as to get a good parse
  - P(x): a quantity we don't care so much about, because x is always given

Conditional Training
P(x, π) = P(π | x) · P(x);   P(π | x) = P(x, π) / P(x)
Recall F(j, x, π) = number of times feature f_j occurs in (x, π)
                  = Σ_{i=1…N} f_j(π_{i−1}, π_i, x, i)    (count f_j in x, π)
In HMMs, denote by w_j the weight of the j-th feature: w_j = log(a_kl) or log(e_k(b)). Then
  HMM:  P(x, π) = exp [ Σ_{j=1…n} w_j × F(j, x, π) ]
  CRF:  Score(x, π) = exp [ Σ_{j=1…n} w_j × F(j, x, π) ]
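To make this log-linear rewriting concrete, the following toy Python check verifies numerically that an HMM's joint probability P(x, π) equals exp of the weighted feature counts, taking w_j = log of each parameter and F_j = how often that parameter is used (the initial-state probability is treated as one extra feature here); all numbers are made up.

```python
import math
from collections import Counter

trans = {("F", "F"): 0.9, ("F", "L"): 0.1, ("L", "L"): 0.8, ("L", "F"): 0.2}
emit = {("F", "H"): 0.5, ("F", "T"): 0.5, ("L", "H"): 0.9, ("L", "T"): 0.1}
init = {"F": 0.5, "L": 0.5}

x, pi = "HHTH", "FLLF"

# Direct joint probability P(x, pi).
p = init[pi[0]] * emit[pi[0], x[0]]
for i in range(1, len(x)):
    p *= trans[pi[i - 1], pi[i]] * emit[pi[i], x[i]]

# Feature-count form: count each parameter's uses, weight by its log.
F = Counter()
F["init", pi[0]] += 1
F["emit", pi[0], x[0]] += 1
for i in range(1, len(x)):
    F["trans", pi[i - 1], pi[i]] += 1
    F["emit", pi[i], x[i]] += 1
w = {("init", k): math.log(v) for k, v in init.items()}
w.update({("trans",) + k: math.log(v) for k, v in trans.items()})
w.update({("emit",) + k: math.log(v) for k, v in emit.items()})
log_p = sum(w[f] * c for f, c in F.items())

print(p, math.exp(log_p))   # the two numbers agree
```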

Conditional Training
In HMMs:
  P(π | x) = P(x, π) / P(x)
  P(x, π) = exp [ Σ_{j=1…n} w_j × F(j, x, π) ]
  P(x) = Σ_π exp [ Σ_{j=1…n} w_j × F(j, x, π) ] =: Z
Then, in a CRF we can do the same to normalize Score(x, π) into a probability:
  P_CRF(π | x) = exp [ Σ_{j=1…n} w_j × F(j, x, π) ] / Z
QUESTION: Why is this a probability?

Conditional Training
1. We need to be given a set of sequences x and "true" parses π.
2. Calculate Z by a sum-of-paths algorithm similar to the HMM forward algorithm. We can then easily calculate P(π | x).
3. Calculate the partial derivative of P(π | x) with respect to each parameter w_j (not covered; akin to forward/backward). Update each parameter with gradient descent (on the negative log-likelihood)!
4. Continue until convergence to the optimal set of weights.
log P(π | x) = Σ_{j=1…n} w_j × F(j, x, π) − log Z is concave in the weights, so training is a convex optimization problem!
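Here is a heavily simplified sketch of this training loop. Instead of the sum-of-paths (forward/backward-style) computation of Z and of the expected feature counts, it brute-forces all K^N parses, which only works for very short toy sequences; the features, states, learning rate, and training example are all illustrative assumptions.

```python
import math, itertools

STATES = ["F", "L"]

def features(x, pi):
    """F(j, x, pi): counts of each (prev, curr) transition and of 'state L on a 6'."""
    F = {("trans", a, b): 0 for a in STATES for b in STATES}
    F["L_emits_6"] = 0
    for i in range(1, len(x)):
        F["trans", pi[i - 1], pi[i]] += 1
    for i in range(len(x)):
        if pi[i] == "L" and x[i] == "6":
            F["L_emits_6"] += 1
    return F

def log_score(x, pi, w):
    return sum(w[j] * c for j, c in features(x, pi).items())

def prob(x, pi, w):
    """P_CRF(pi | x) by brute-force normalization over all parses."""
    all_parses = ["".join(p) for p in itertools.product(STATES, repeat=len(x))]
    Z = sum(math.exp(log_score(x, q, w)) for q in all_parses)
    return math.exp(log_score(x, pi, w)) / Z

def train(x, true_pi, steps=200, lr=0.1):
    w = {j: 0.0 for j in features(x, true_pi)}
    all_parses = ["".join(p) for p in itertools.product(STATES, repeat=len(x))]
    observed = features(x, true_pi)
    for _ in range(steps):
        # Z and expected feature counts under the current weights.
        weights = [math.exp(log_score(x, pi, w)) for pi in all_parses]
        Z = sum(weights)
        expected = {j: sum(wt * features(x, pi)[j]
                           for wt, pi in zip(weights, all_parses)) / Z for j in w}
        # Gradient of log P(true_pi | x) w.r.t. w_j is observed_j - expected_j.
        for j in w:
            w[j] += lr * (observed[j] - expected[j])
    return w

w0 = {j: 0.0 for j in features("162666", "FFLLLL")}
w = train("162666", "FFLLLL")
print(prob("162666", "FFLLLL", w0), prob("162666", "FFLLLL", w))
# P(true parse | x) increases substantially after training (from 1/64).
```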

Conditional Random Fields: Summary
1. Ability to incorporate complicated, non-local feature sets
   - Does away with some independence assumptions of HMMs
   - Parsing is still equally efficient
2. Conditional training
   - Trains the parameters that are best for parsing, not for modeling
   - Needs labeled examples: sequences x and "true" parses π
     (Can train on unlabeled sequences, but it is unreasonable to train too many parameters this way)
   - Training is significantly slower: many iterations of forward/backward