CSE182-L10 HMM applications.

Probability of being in specific states
What is the probability that we were in state k at step i, given that the model emitted x?
Pr[state k at step i | x] = Pr[all paths that pass through state k at step i and emit x] / Pr[all paths that emit x]

The Forward Algorithm
Recall v[i,j]: the probability of the most likely path by which the automaton emits x1…xi and ends up in state j. Define f[i,j]: the probability that the automaton, starting from state 1, emits x1…xi and ends up in state j. What is the difference?

Most Likely Path versus Probability of Arrival
There are multiple paths through states 1…j by which the automaton can output x1…xi. In computing the Viterbi path, we choose the most likely one: V[i,j] = max_π Pr[x1…xi | π]. The probability of emitting x1…xi and ending up in state j is the sum over all such paths: F[i,j] = Σ_π Pr[x1…xi | π]

The Forward Algorithm
Recall that
  v(i,j) = max_{l ∈ Q} { v(i-1,l) · A[l,j] } · e_j(x_i)
Instead,
  F(i,j) = Σ_{l ∈ Q} ( F(i-1,l) · A[l,j] ) · e_j(x_i)
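The recurrence above translates directly into a few lines of Python. This is a sketch, not lecture code; the 2-state model and all its numbers are made up for illustration:

```python
import numpy as np

def forward(A, E, x, start=0):
    """F[i, j] = total probability that the automaton, starting in
    state `start`, emits x[0..i] and ends in state j.
    Same recurrence as Viterbi, with the max replaced by a sum."""
    K, n = A.shape[0], len(x)
    F = np.zeros((n, K))
    F[0, start] = E[start, x[0]]
    for i in range(1, n):
        # F(i, j) = sum_l F(i-1, l) * A[l, j], then times e_j(x_i)
        F[i] = (F[i - 1] @ A) * E[:, x[i]]
    return F

# toy 2-state HMM (illustrative numbers, not from the lecture)
A = np.array([[0.9, 0.1],      # transition probabilities A[l, j]
              [0.2, 0.8]])
E = np.array([[0.5, 0.5],      # emission probabilities E[j, symbol]
              [0.1, 0.9]])
F = forward(A, E, [0, 1, 1])
print(F[-1].sum())             # Pr[x]: sum over all end states
```

Replacing the sum in the loop body with `np.max` (plus an argmax for traceback) recovers the Viterbi recurrence.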

The Backward Algorithm
Define b[i,j]: the probability that the automaton, in state j after emitting x1…xi, emits the suffix xi+1…xn and ends up in the final state.

Forward-Backward Scoring
  F(i,j) = Σ_{l ∈ Q} ( F(i-1,l) · A[l,j] ) · e_j(x_i)
  B(i,j) = Σ_{l ∈ Q} A[j,l] · e_l(x_{i+1}) · B(i+1,l)
  Pr[x, π_i = k] = F(i,k) · B(i,k)
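Both passes and the posterior score can be sketched together; again a toy model with made-up numbers, not the lecture's:

```python
import numpy as np

def forward(A, E, x, start=0):
    """F[i, j]: probability of emitting x[0..i] and being in state j."""
    n, K = len(x), A.shape[0]
    F = np.zeros((n, K))
    F[0, start] = E[start, x[0]]
    for i in range(1, n):
        F[i] = (F[i - 1] @ A) * E[:, x[i]]
    return F

def backward(A, E, x):
    """B[i, j]: probability of emitting x[i+1..n-1] given state j at step i."""
    n, K = len(x), A.shape[0]
    B = np.zeros((n, K))
    B[-1] = 1.0                          # nothing left to emit
    for i in range(n - 2, -1, -1):
        # B(i, j) = sum_l A[j, l] * e_l(x[i+1]) * B(i+1, l)
        B[i] = A @ (E[:, x[i + 1]] * B[i + 1])
    return B

A = np.array([[0.9, 0.1], [0.2, 0.8]])   # illustrative 2-state model
E = np.array([[0.5, 0.5], [0.1, 0.9]])
x = [0, 1, 1]
F, B = forward(A, E, x), backward(A, E, x)
px = F[-1].sum()                         # Pr[x]
posterior = F * B / px                   # Pr[pi_i = k | x]
```

Note that F(i,k)·B(i,k) summed over k gives Pr[x] at every position i, so each row of `posterior` sums to 1.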

Application of HMMs
How do we modify this to handle indels? The profile (emission probabilities per position):
      1    2    3    4    5    6    7    8
A   0.9  0.4  0.3  0.6  0.1  0.0  0.2  1.0
C   0.0  0.2  0.7  0.0  0.3  0.0  0.0  0.0
G   0.1  0.2  0.0  0.0  0.3  1.0  0.3  0.0
T   0.0  0.2  0.0  0.4  0.3  0.0  0.5  0.0

Applications of the HMM paradigm
Modifying profile HMMs to handle indels: alongside the match states, add insertion states Ii and deletion states Di. The emission profile is the same:
      1    2    3    4    5    6    7    8
A   0.9  0.4  0.3  0.6  0.1  0.0  0.2  1.0
C   0.0  0.2  0.7  0.0  0.3  0.0  0.0  0.0
G   0.1  0.2  0.0  0.0  0.3  1.0  0.3  0.0
T   0.0  0.2  0.0  0.4  0.3  0.0  0.5  0.0

Profile HMMs
An assignment of states implies a match, insertion, or deletion at each position. Example: aligning ACACTGTA against the profile above.

Viterbi Algorithm Revisited
Define vMj(i) as the log-likelihood score of the best path matching x1…xi to the profile HMM that ends with xi emitted by match state Mj. vIj(i) and vDj(i) are defined similarly for the insert and delete states.

Viterbi Equations for Profile HMMs
  vMj(i) = log(eMj(xi)) + max of:
    vMj-1(i-1) + log(A[Mj-1, Mj])
    vIj-1(i-1) + log(A[Ij-1, Mj])
    vDj-1(i-1) + log(A[Dj-1, Mj])
  vIj(i) = log(eIj(xi)) + max of:
    vMj(i-1) + log(A[Mj, Ij])
    vIj(i-1) + log(A[Ij, Ij])
    vDj(i-1) + log(A[Dj, Ij])
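These recurrences (plus the analogous, non-emitting one for vDj) can be sketched in log-space Python. Everything here is an illustrative simplification, not the lecture's code: one shared insert-emission table, position-independent transition probabilities, and the begin state treated as M0.

```python
import math

NEG = float("-inf")

def profile_viterbi(x, match_emit, trans, insert_emit):
    """Best log-probability of aligning all of x to a profile HMM with
    match (M), insert (I) and delete (D) states.
    match_emit[j-1][s] = Pr[M_j emits s]; insert_emit[s] = Pr[I emits s];
    trans['MD'] = Pr[M -> next D], etc. (assumed shared across columns)."""
    m, n = len(match_emit), len(x)
    def lg(p):
        return math.log(p) if p > 0 else NEG

    vM = [[NEG] * (m + 1) for _ in range(n + 1)]
    vI = [[NEG] * (m + 1) for _ in range(n + 1)]
    vD = [[NEG] * (m + 1) for _ in range(n + 1)]
    vM[0][0] = 0.0                        # begin state, nothing emitted
    for j in range(1, m + 1):             # delete columns before emitting
        vD[0][j] = max(vM[0][j - 1] + lg(trans["MD"]),
                       vD[0][j - 1] + lg(trans["DD"]))
    for i in range(1, n + 1):
        s = x[i - 1]
        # I_0: insertions before the first profile column
        vI[i][0] = lg(insert_emit.get(s, 0)) + max(
            vM[i - 1][0] + lg(trans["MI"]),
            vI[i - 1][0] + lg(trans["II"]))
        for j in range(1, m + 1):
            vM[i][j] = lg(match_emit[j - 1].get(s, 0)) + max(
                vM[i - 1][j - 1] + lg(trans["MM"]),
                vI[i - 1][j - 1] + lg(trans["IM"]),
                vD[i - 1][j - 1] + lg(trans["DM"]))
            vI[i][j] = lg(insert_emit.get(s, 0)) + max(
                vM[i - 1][j] + lg(trans["MI"]),
                vI[i - 1][j] + lg(trans["II"]),
                vD[i - 1][j] + lg(trans["DI"]))
            vD[i][j] = max(vM[i][j - 1] + lg(trans["MD"]),  # D emits nothing
                           vI[i][j - 1] + lg(trans["ID"]),
                           vD[i][j - 1] + lg(trans["DD"]))
    return max(vM[n][m], vI[n][m], vD[n][m])

# a 2-column toy profile with made-up emissions and transitions
match_emit = [{"A": 0.9, "C": 0.1}, {"C": 0.9, "A": 0.1}]
insert_emit = {"A": 0.5, "C": 0.5}
trans = {"MM": 0.8, "MI": 0.1, "MD": 0.1,
         "IM": 0.7, "II": 0.2, "ID": 0.1,
         "DM": 0.7, "DI": 0.1, "DD": 0.2}
score = profile_viterbi("AC", match_emit, trans, insert_emit)
```

For "AC" against this profile, the best path is simply M1, M2, so the score equals log(0.8 · 0.9 · 0.8 · 0.9).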

Compositional Signals: CpG Islands
In genomic sequence, the CG dinucleotide is rarely seen: the C in a CG is prone to methylation, and the methylated C subsequently mutates to T. In regions around a gene, methylation is suppressed, and therefore CG is more common. CpG islands are these islands of CG on the genome. How can you detect CpG islands?
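Before the HMM machinery, the question invites a naive first attempt: slide a window along the sequence and count CG dinucleotides. The window and step sizes here are arbitrary illustrative choices, not the lecture's:

```python
def cg_fraction(seq, window=100, step=50):
    """Fraction of dinucleotide positions in each window that are 'CG'."""
    out = []
    for start in range(0, max(1, len(seq) - window + 1), step):
        w = seq[start:start + window]
        cg = sum(1 for k in range(len(w) - 1) if w[k:k + 2] == "CG")
        out.append((start, cg / max(1, len(w) - 1)))
    return out

# a CG-rich stretch scores far above the AT background
print(cg_fraction("CGCGCGCGCG" + "ATATATATAT", window=10, step=10))
```

The HMMs on the following slides replace the arbitrary window with a probabilistic parse that decides the island boundaries for us.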

An HMM for Genomic Regions
Node A emits A with probability 1, and 0 for all other bases (likewise for C, G, and T). The start and end nodes do not emit any symbol. All outgoing edges from nodes are equiprobable (0.25), except for the ones coming out of C, which are skewed so that C→G is rare (the figure labels them 0.1 and 0.4).

An HMM for CpG Islands
Same structure: node A emits A with probability 1, and 0 for all other bases; the start and end nodes do not emit any symbol. Here every outgoing edge, including those coming out of C, carries probability 0.25 in the figure, since CG is not suppressed inside an island.

HMM for Detecting CpG Islands
Combine the two automata into a single HMM containing a copy of the genomic-region model (states A) and a copy of the CpG-island model (states B), with transitions between the two copies. In the best parse of a genomic sequence, each base is assigned a state from set A or set B. Any substring with multiple states coming from B can be described as a CpG island.
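Decoding this combined model is ordinary Viterbi over eight hidden states (copy, base). A sketch under stated assumptions: the 0.1/0.3 background values, the uniform 0.25 island values, and the switch probability are illustrative, not the lecture's exact numbers.

```python
import math

def cpg_viterbi(seq, p_switch=0.2):
    """Viterbi decoding on the two-copy HMM: copy 'B' uses the
    CpG-island transitions (uniform 0.25), copy 'A' the background
    ones, where transitions out of C suppress C -> G.
    State (copy, b) emits base b with probability 1."""
    bases = "ACGT"

    def base_trans(copy, b1, b2):
        # within-copy base transition probability (assumed values)
        if copy == "A" and b1 == "C":
            return 0.1 if b2 == "G" else 0.3
        return 0.25

    states = [(c, b) for c in "AB" for b in bases]
    # initialise: only the two states that emit seq[0] are reachable
    v = {s: (0.0 if s[1] == seq[0] else float("-inf")) for s in states}
    back = []
    for ch in seq[1:]:
        nv, ptr = {}, {}
        for c2, b2 in states:
            if b2 != ch:                  # state cannot emit this base
                nv[(c2, b2)], ptr[(c2, b2)] = float("-inf"), None
                continue
            best, arg = float("-inf"), None
            for c1, b1 in states:
                stay = 1 - p_switch if c1 == c2 else p_switch
                sc = v[(c1, b1)] + math.log(base_trans(c1, b1, b2) * stay)
                if sc > best:
                    best, arg = sc, (c1, b1)
            nv[(c2, b2)], ptr[(c2, b2)] = best, arg
        v = nv
        back.append(ptr)
    cur = max(v, key=v.get)               # traceback from the best end state
    path = [cur]
    for ptr in reversed(back):
        cur = ptr[cur]
        path.append(cur)
    path.reverse()
    return [c for c, b in path]           # 'B' marks island positions

labels = cpg_viterbi("CG" * 10 + "CT" * 10)
```

On this toy input the CG-rich prefix is parsed with B states and the CT tail with A states, exactly the "substring of B states" criterion from the slide.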

HMM: Summary
HMMs are a natural technique for modeling many biological domains. They can capture both position-dependent properties (profiles) and compositional properties (CpG islands). HMMs have been very useful in an important bioinformatics application: gene finding.