Master’s course Bioinformatics Data Analysis and Tools

Slides:

Advertisements

Similar presentations

Markov models and applications

Advertisements

. Lecture #8: - Parameter Estimation for HMM with Hidden States: the Baum Welch Training - Viterbi Training - Extensions of HMM Background Readings: Chapters.

Hidden Markov Model in Biological Sequence Analysis – Part 2

Marjolijn Elsinga & Elze de Groot1 Markov Chains and Hidden Markov Models Marjolijn Elsinga & Elze de Groot.

Ulf Schmitz, Statistical methods for aiding alignment1 Bioinformatics Statistical methods for pattern searching Ulf Schmitz

Lecture 2 Hidden Markov Model. Hidden Markov Model Motivation: We have a text partly written by Shakespeare and partly “written” by a monkey, we want.

HMM II: Parameter Estimation. Reminder: Hidden Markov Model Markov Chain transition probabilities: p(S i+1 = t|S i = s) = a st Emission probabilities:

Learning HMM parameters

Hidden Markov Model.

Introduction to Hidden Markov Models

Hidden Markov Models Eine Einführung.

Hidden Markov Models.

Profile Hidden Markov Models Bioinformatics Fall-2004 Dr Webb Miller and Dr Claude Depamphilis Dhiraj Joshi Department of Computer Science and Engineering.

MNW2 course Introduction to Bioinformatics

Hidden Markov Models Bonnie Dorr Christof Monz CMSC 723: Introduction to Computational Linguistics Lecture 5 October 6, 2004.

 CpG is a pair of nucleotides C and G, appearing successively, in this order, along one DNA strand.  CpG islands are particular short subsequences in.

Hidden Markov Models Modified from:

Statistical NLP: Lecture 11

Profiles for Sequences

Hidden Markov Models Theory By Johan Walters (SR 2003)

Statistical NLP: Hidden Markov Models Updated 8/12/2005.

JM - 1 Introduction to Bioinformatics: Lecture XIII Profile and Other Hidden Markov Models Jarek Meller Jarek Meller Division.

Hidden Markov Models 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2.

Hidden Markov Models (HMMs) Steven Salzberg CMSC 828H, Univ. of Maryland Fall 2010.

. Hidden Markov Model Lecture #6. 2 Reminder: Finite State Markov Chain An integer time stochastic process, consisting of a domain D of m states {1,…,m}

درس بیوانفورماتیک December 2013 مدل ‌ مخفی مارکوف و تعمیم ‌ های آن به نام خدا.

Master’s course Bioinformatics Data Analysis and Tools Lecture 12: (Hidden) Markov models Centre for Integrative Bioinformatics.

Hidden Markov Model 11/28/07. Bayes Rule The posterior distribution Select k with the largest posterior distribution. Minimizes the average misclassification.

Hidden Markov Models. Two learning scenarios 1.Estimation when the “right answer” is known Examples: GIVEN:a genomic region x = x 1 …x 1,000,000 where.

Hidden Markov Models. Decoding GIVEN x = x 1 x 2 ……x N We want to find  =  1, ……,  N, such that P[ x,  ] is maximized  * = argmax  P[ x,  ] We.

Hidden Markov Models I Biology 162 Computational Genetics Todd Vision 14 Sep 2004.

. Hidden Markov Model Lecture #6 Background Readings: Chapters 3.1, 3.2 in the text book, Biological Sequence Analysis, Durbin et al., 2001.

Hidden Markov Models Lecture 5, Tuesday April 15, 2003.

S. Maarschalkerweerd & A. Tjhang1 Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability Chapter

. Hidden Markov Model Lecture #6 Background Readings: Chapters 3.1, 3.2 in the text book, Biological Sequence Analysis, Durbin et al., 2001.

Hidden Markov Models Lecture 5, Tuesday April 15, 2003.

Hidden Markov Models K 1 … 2. Outline Hidden Markov Models – Formalism The Three Basic Problems of HMMs Solutions Applications of HMMs for Automatic Speech.

Elze de Groot1 Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability Chapter

Hidden Markov Models.

Hidden Markov models Sushmita Roy BMI/CS 576 Oct 16 th, 2014.

Markov models and applications Sushmita Roy BMI/CS 576 Oct 7 th, 2014.

Learning HMM parameters Sushmita Roy BMI/CS 576 Oct 21 st, 2014.

Hidden Markov Model Continues …. Finite State Markov Chain A discrete time stochastic process, consisting of a domain D of m states {1,…,m} and 1.An m.

Probabilistic Sequence Alignment BMI 877 Colin Dewey February 25, 2014.

Markov Chain Models BMI/CS 576 Fall 2010.

Gene finding with GeneMark.HMM (Lukashin & Borodovsky, 1997 ) CS 466 Saurabh Sinha.

MNW2 course Introduction to Bioinformatics Lecture 22: Markov models Centre for Integrative Bioinformatics FEW/FALW

Hidden Markov Models for Sequence Analysis 4

BINF6201/8201 Hidden Markov Models for Sequence Analysis

Fundamentals of Hidden Markov Model Mehmet Yunus Dönmez.

Hidden Markov Models Yves Moreau Katholieke Universiteit Leuven.

Hidden Markov Models Usman Roshan CS 675 Machine Learning.

Hidden Markov Models BMI/CS 776 Mark Craven March 2002.

10/29/20151 Gene Finding Project (Cont.) Charles Yan.

Comp. Genomics Recitation 9 11/3/06 Gene finding using HMMs & Conservation.

Interpolated Markov Models for Gene Finding BMI/CS 776 Mark Craven February 2002.

1 CONTEXT DEPENDENT CLASSIFICATION  Remember: Bayes rule  Here: The class to which a feature vector belongs depends on:  Its own value  The values.

Algorithms in Computational Biology11Department of Mathematics & Computer Science Algorithms in Computational Biology Markov Chains and Hidden Markov Model.

CZ5226: Advanced Bioinformatics Lecture 6: HHM Method for generating motifs Prof. Chen Yu Zong Tel:

From Genomics to Geology: Hidden Markov Models for Seismic Data Analysis Samuel Brown February 5, 2009.

Markov Chain Models BMI/CS 576 Colin Dewey Fall 2015.

(H)MMs in gene prediction and similarity searches.

1 Applications of Hidden Markov Models (Lecture for CS498-CXZ Algorithms in Bioinformatics) Nov. 12, 2005 ChengXiang Zhai Department of Computer Science.

Hidden Markov Model Parameter Estimation BMI/CS 576 Colin Dewey Fall 2015.

Definition of the Hidden Markov Model A Seminar Speech Recognition presentation A Seminar Speech Recognition presentation October 24 th 2002 Pieter Bas.

More on HMMs and Multiple Sequence Alignment BMI/CS 776 Mark Craven March 2002.

Hidden Markov Models BMI/CS 576

Markov Chain Models BMI/CS 776

Interpolated Markov Models for Gene Finding

Hidden Markov Models Part 2: Algorithms

Presentation transcript:

Master’s course Bioinformatics Data Analysis and Tools Lecture 5: Markov models Centre for Integrative Bioinformatics

Problem in biology Data and patterns are often not clear cut When we want to make a method to recognise a pattern (e.g. a sequence motif), we have to learn from the data (e.g. maybe there are other differences between sequences that have the pattern and those that do not) This leads to Data mining and Machine learning

A widely used machine learning approach: Markov models Contents: Markov chain models (1st order, higher order and inhomogeneous models; parameter estimation; classification) • Interpolated Markov models (and back-off models) • Hidden Markov models (forward, backward and Baum- Welch algorithms; model topologies; applications to gene finding and protein family modeling

Markov Chain Models a Markov chain model is defined by: a set of states some states emit symbols other states (e.g. the begin state) are silent a set of transitions with associated probabilities the transitions emanating from a given state define a distribution over the possible next states

Markov Chain Models given some sequence x of length L, we can ask how probable the sequence is given our model for any probabilistic model of sequences, we can write this probability as key property of a (1st order) Markov chain: the probability of each Xi depends only on Xi-1

Markov Chain Models Pr(cggt) = Pr(c)Pr(g|c)Pr(g|g)Pr(t|g)

Markov Chain Models Can also have an end state, allowing the model to represent: Sequences of different lengths Preferences for sequences ending with particular symbols

Markov Chain Models The transition parameters can be denoted by where Similarly we can denote the probability of a sequence x as Where aBxi represents the transition from the begin state

Example Application CpG islands CGdinucleotides are rarer in eukaryotic genomes than expected given the independent probabilities of C, G but the regions upstream of genes are richer in CG dinucleotides than elsewhere – CpG islands useful evidence for finding genes Could predict CpG islands with Markov chains one to represent CpG islands one to represent the rest of the genome Example includes using Maximum likelihood and Bayes’ statistical data and feeding it to a HM model

Estimating the Model Parameters Given some data (e.g. a set of sequences from CpG islands), how can we determine the probability parameters of our model? One approach: maximum likelihood estimation given a set of data D set the parameters  to maximize Pr(D | ) i.e. make the data D look likely under the model

Maximum Likelihood Estimation Suppose we want to estimate the parameters Pr(a), Pr(c), Pr(g), Pr(t) And we’re given the sequences: accgcgctta gcttagtgac tagccgttac Then the maximum likelihood estimates are: Pr(a) = 6/30 = 0.2 Pr(g) = 7/30 = 0.233 Pr(c) = 9/30 = 0.3 Pr(t) = 8/30 = 0.267

These data are derived from genome sequences

Higher Order Markov Chains An nth order Markov chain over some alphabet is equivalent to a first order Markov chain over the alphabet of n-tuples Example: a 2nd order Markov model for DNA can be treated as a 1st order Markov model over alphabet: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, and TT (i.e. all possible dipeptides)

A Fifth Order Markov Chain

Inhomogenous Markov Chains In the Markov chain models we have considered so far, the probabilities do not depend on where we are in a given sequence In an inhomogeneous Markov model, we can have different distributions at different positions in the sequence Consider modeling codons in protein coding regions

Inhomogenous Markov Chains

A Fifth Order Inhomogenous Markov Chain

Selecting the Order of a Markov Chain Model Higher order models remember more “history” Additional history can have predictive value Example: – predict the next word in this sentence fragment “…finish __” (up, it, first, last, …?) – now predict it given more history “Fast guys finish __”

Selecting the Order of a Markov Chain Model However, the number of parameters we need to estimate grows exponentially with the order – for modeling DNA we need parameters for an nth order model, with n  5 normally The higher the order, the less reliable we can expect our parameter estimates to be – estimating the parameters of a 2nd order homogenous Markov chain from the complete genome of E. Coli, we would see each word > 72,000 times on average – estimating the parameters of an 8th order chain, we would see each word ~ 5 times on average

Interpolated Markov Models The IMM idea: manage this trade-off by interpolating among models of various orders Simple linear interpolation:

Interpolated Markov Models We can make the weights depend on the history – for a given order, we may have significantly more data to estimate some words than others General linear interpolation

Gene Finding: Search by Content Encoding a protein affects the statistical properties of a DNA sequence – some amino acids are used more frequently than others (Leu more popular than Trp) – different numbers of codons for different amino acids (Leu has 6, Trp has 1) – for a given amino acid, usually one codon is used more frequently than others This is termed codon preference Codon preferences vary by species

Codon Preference in E. Coli AA codon /1000 ---------------------- Gly GGG 1.89 Gly GGA 0.44 Gly GGU 52.99 Gly GGC 34.55 Glu GAG 15.68 Glu GAA 57.20 Asp GAU 21.63 Asp GAC 43.26

Search by Content • Common way to search by content – build Markov models of coding & noncoding regions – apply models to ORFs (Open Reading Frames) or fixed- sized windows of sequence • GeneMark [Borodovsky et al.] – popular system for identifying genes in bacterial genomes – uses 5th order inhomogenous Markov chain models

The GLIMMER System Salzberg et al., 1998 System for identifying genes in bacterial genomes Uses 8th order, inhomogeneous, interpolated Markov chain models

IMMs in GLIMMER How does GLIMMER determine the values? First, let us express the IMM probability calculation recursively:

IMMs in GLIMMER If we haven’t seen xi-1… xi-n more than 400 times, then compare the counts for the following: • Use a statistical test ( 2) to get a value d indicating our confidence that the distributions represented by the two sets of counts are different

IMMs in GLIMMER 2 score when comparing nth-order with n-1th-order Markov model (preceding slide)

The GLIMMER method 8th order IMM vs. 5th order Markov model Trained on 1168 genes (ORFs really) Tested on 1717 annotated (more or less known) genes

Plot sensitivity over 1-specificity

Hidden Markov models (HMMs) Given say a T in our input sequence, which state emitted it?

Hidden Markov models (HMMs) Hidden State We will distinguish between the observed parts of a problem and the hidden parts • In the Markov models we have considered previously, it is clear which state accounts for each part of the observed sequence In the model above (preceding slide), there are multiple states that could account for each part of the observed sequence – this is the hidden part of the problem – states are decoupled from sequence symbols

HMM-based homology searching HMM for ungapped alignment… Transition probabilities and Emission probabilities Gapped HMMs also have insertion and deletion states (next slide)

Model for alignment with insertions and deletions Profile HMM: m=match state, I-insert state, d=delete state; go from left to right. I and m states output amino acids; d states are ‘silent”. d1 d2 d3 d4 I0 I2 I3 I4 I1 m0 m1 m2 m3 m4 m5 Start End Model for alignment with insertions and deletions

HMM-based homology searching Most widely used HMM-based profile searching tools currently are SAM-T99 (Karplus et al., 1998) and HMMER2 (Eddy, 1998) formal probabilistic basis and consistent theory behind gap and insertion scores HMMs good for profile searches, bad for alignment (due to parametrisation of the models) HMMs are slow

Homology-derived Secondary Structure of Proteins (HSSP) Sander & Schneider, 1991 It’s all about trying to push “don’t know region” down…

The Parameters of an HMM

HMM for Eukaryotic Gene Finding Figure from A. Krogh, An Introduction to Hidden Markov Models for Biological Sequences

A Simple HMM

Three Important Questions How likely is a given sequence? the Forward algorithm What is the most probable “path” for generating a given sequence? the Viterbi algorithm • How can we learn the HMM parameters given a set of sequences? the Forward-Backward (Baum-Welch) algorithm

Three basic problems of HMMs Once we have an HMM, there are three problems of interest. (1) The Evaluation Problem Given an HMM and a sequence of observations, what is the probability that the observations are generated by the model? (2) The Decoding Problem Given a model and a sequence of observations, what is the most likely state sequence in the model that produced the observations? (3) The Learning Problem Given a model and a sequence of observations, how should we adjust the model parameters in order to maximize Evaluation problem can be used for isolated (word) recognition. Decoding problem is related to the continuous recognition as well as to the segmentation. Learning problem must be solved, if we want to train an HMM for the subsequent use of recognition tasks.

Three Important Questions How likely is a given sequence? Forward algorithm What is the most probable “path” for generating a given sequence? How can we learn the HMM parameters given a set of sequences?

How Likely is a Given Sequence? The probability that the path is taken and the sequence is generated: (assuming begin/end are the only silent states on path)

How Likely is a Given Sequence?

How Likely is a Given Sequence? The probability over all paths is: but the number of paths can be exponential in the length of the sequence... • the Forward algorithm enables us to compute this efficiently

How Likely is a Given Sequence: The Forward Algorithm Define fk(i) to be the probability of being in state k Having observed the first i characters of x we want to compute fN(L), the probability of being in the end state having observed all of x We can define this recursively

How Likely is a Given Sequence:

The forward algorithm Initialisation: f0(0) = 1 (start), fk(0) = 0 (other silent states k) Recursion: fl(i) = el(i)k fk(i-1)akl (emitting states), fl(i) = k fk(i)akl (silent states) Termination: Pr(x) = Pr(x1…xL) = f N(L) = k fk(L)akN probability that we’re in start state and have observed 0 characters from the sequence probability that we are in the end state and have observed the entire sequence

Forward algorithm example …

Three Important Questions How likely is a given sequence? What is the most probable “path” for generating a given sequence? Viterbi algorithm How can we learn the HMM parameters given a set of sequences?

Finding the Most Probable Path: The Viterbi Algorithm Define vk(i) to be the probability of the most probable path accounting for the first i characters of x and ending in state k We want to compute vN(L), the probability of the most probable path accounting for all of the sequence and ending in the end state Can be defined recursively Can use DP to find vN(L) efficiently

Finding the Most Probable Path: The Viterbi Algorithm Initialisation: v0(0) = 1 (start), vk(0) = 0 (non-silent states) Recursion for emitting states (i =1…L): Recursion for silent states:

Finding the Most Probable Path: The Viterbi Algorithm

Three Important Questions How likely is a given sequence? (clustering) What is the most probable “path” for generating a given sequence? (alignment) How can we learn the HMM parameters given a set of sequences? The Baum-Welch Algorithm

The Learning Problem Generally, the learning problem is how to adjust the HMM parameters, so that the given set of observations (called the training set) is represented by the model in the best way for the intended application. Thus it would be clear that the ``quantity'' we wish to optimize during the learning process can be different from application to application. In other words there may be several optimization criteria for learning, out of which a suitable one is selected depending on the application. There are two main optimization criteria found in the literature; Maximum Likelihood (ML) and Maximum Mutual Information (MMI).

The Learning Task Given: – a model – a set of sequences (the training set) Do: – find the most likely parameters to explain the training sequences The goal is find a model that generalizes well to sequences we haven’t seen before

Learning Parameters If we know the state path for each training sequence, learning the model parameters is simple – no hidden state during training – count how often each parameter is used – normalize/smooth to get probabilities – process just like it was for Markov chain models If we don’t know the path for each training sequence, how can we determine the counts? – key insight: estimate the counts by considering every path weighted by its probability

Learning Parameters: The Baum-Welch Algorithm An EM (expectation maximization) approach, a forward-backward algorithm Algorithm sketch: – initialize parameters of model – iterate until convergence Calculate the expected number of times each transition or emission is used Adjust the parameters to maximize the likelihood of these expected values Baum-Welch has as important feature that it always converges

The Expectation step

The Expectation step

The Expectation step

The Expectation step

The Expectation step • First, we need to know the probability of the i th symbol being produced by state q, given sequence x: Pr( i = k | x) Given this we can compute our expected counts for state transitions, character emissions

The Expectation step

The Backward Algorithm

The Expectation step

The Expectation step

The Expectation step

The Maximization step

The Maximization step

The Baum-Welch Algorithm Initialize parameters of model Iterate until convergence – calculate the expected number of times each transition or emission is used – adjust the parameters to maximize the likelihood of these expected values This algorithm will converge to a local maximum (in the likelihood of the data given the model) Usually in a fairly small number of iterations