Markov Chain Models BMI/CS 776


Markov Chain Models
BMI/CS 776
www.biostat.wisc.edu/~craven/776.html
Mark Craven
craven@biostat.wisc.edu
February 2002
Notes: illustrate why affine case requires 3 matrices; BLAST & PSI-BLAST readings?; justify number of possible alignments

Announcements
- no office hours tomorrow
- interest in basic probability tutorial? (Wed, Thurs evening)
- HW #1 out; due March 11
- 3 free late days for semester
- homeworks docked 10 percentage points/day after late days used
- next reading: Salzberg et al., Microbial Gene Identification Using Interpolated Markov Models
- “Biomodule” class: Introduction to GCG Computing and Sequence Analysis in Unix and Xwindows Environments
  - taught by Ann Palmenberg and Jean-Yves Sgro
  - April 16 and 17
  - see http://www.virology.wisc.edu/acp/ for more details

Topics for the Next Few Weeks
- Markov chain models (1st order, higher order and inhomogeneous models; parameter estimation; classification)
- interpolated Markov models (and back-off models)
- Expectation Maximization (EM) methods (applications to motif finding)
- Gibbs sampling methods (applications to motif finding)
- hidden Markov models (forward, backward and Baum-Welch algorithms; model topologies; applications to gene finding and protein family modeling)

Markov Chain Models
[Figure: state diagram with a begin state and states A, C, G, T; edges are labeled with transition probabilities (e.g. .38, .16, .34, .12); callouts mark a state and a transition.]

Markov Chain Models
- a Markov chain model is defined by
  - a set of states
    - some states emit symbols
    - other states (e.g. the begin state) are silent
  - a set of transitions with associated probabilities
    - the transitions emanating from a given state define a distribution over the possible next states

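Markov Chain Models
- given some sequence x of length L, we can ask how probable the sequence is given our model
- for any probabilistic model of sequences, we can write this probability as $\Pr(x) = \Pr(x_L, x_{L-1}, \ldots, x_1) = \Pr(x_L \mid x_{L-1}, \ldots, x_1)\,\Pr(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots \Pr(x_1)$
- key property of a (1st order) Markov chain: the probability of each $x_i$ depends only on the value of $x_{i-1}$, so $\Pr(x) = \Pr(x_1) \prod_{i=2}^{L} \Pr(x_i \mid x_{i-1})$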

Markov Chain Models
[Figure: state diagram with a begin state and states A, C, G, T.]

Markov Chain Models
- can also have an end state; allows the model to represent
  - a distribution over sequences of different lengths
  - preferences for ending sequences with certain symbols
[Figure: state diagram with begin and end states and states A, C, G, T.]
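The effect of an end state can be illustrated by sampling from a small chain. The Python sketch below is only an illustration: the transition values and the names `TRANSITIONS` and `sample_sequence` are hypothetical and do not come from the slides.

```python
import random

# Hypothetical transition probabilities (illustrative values, not from the slides).
# Each row is a distribution over next states; "begin" and "end" are silent.
TRANSITIONS = {
    "begin": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "A": {"A": 0.2, "C": 0.2, "G": 0.3, "T": 0.2, "end": 0.1},
    "C": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.3, "end": 0.1},
    "G": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.2, "end": 0.1},
    "T": {"A": 0.2, "C": 0.3, "G": 0.2, "T": 0.2, "end": 0.1},
}

def sample_sequence(transitions, rng):
    """Walk from "begin" until "end" is reached, emitting each visited symbol."""
    state, symbols = "begin", []
    while True:
        row = transitions[state]
        state = rng.choices(list(row), weights=list(row.values()), k=1)[0]
        if state == "end":
            return "".join(symbols)
        symbols.append(state)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_sequence(TRANSITIONS, rng) for _ in range(5)]
print(samples)
```

With these made-up values every emitting state moves to the end state with probability 0.1, so sequence lengths follow a geometric distribution with mean 10. Giving the states different end probabilities would instead express a preference for ending on particular symbols.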

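Markov Chain Notation
- the transition parameters can be denoted by $a_{x_{i-1} x_i}$, where $a_{x_{i-1} x_i} = \Pr(x_i \mid x_{i-1})$
- similarly we can denote the probability of a sequence x as $a_{B x_1} \prod_{i=2}^{L} a_{x_{i-1} x_i} = \Pr(x_1) \prod_{i=2}^{L} \Pr(x_i \mid x_{i-1})$, where $a_{B x_1}$ represents the transition from the begin state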
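As a minimal sketch of this calculation, the Python below multiplies the transition out of the begin state by one transition per adjacent pair of symbols, working in log space to avoid underflow on long sequences. The parameter values and the names `BEGIN`, `TRANS` and `sequence_log_prob` are assumptions made for illustration.

```python
import math

# Hypothetical parameters (illustrative values, not from the slides).
BEGIN = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}  # transitions out of the begin state
TRANS = {                                              # TRANS[prev][cur] = Pr(cur | prev)
    "a": {"a": 0.3, "c": 0.2, "g": 0.3, "t": 0.2},
    "c": {"a": 0.2, "c": 0.3, "g": 0.1, "t": 0.4},
    "g": {"a": 0.2, "c": 0.3, "g": 0.3, "t": 0.2},
    "t": {"a": 0.1, "c": 0.3, "g": 0.3, "t": 0.3},
}

def sequence_log_prob(x, begin, trans):
    """log Pr(x) for a non-empty sequence under a first-order Markov chain."""
    logp = math.log(begin[x[0]])
    for prev, cur in zip(x, x[1:]):
        logp += math.log(trans[prev][cur])
    return logp

logp = sequence_log_prob("cggt", BEGIN, TRANS)
prob = math.exp(logp)  # Pr(c) Pr(g|c) Pr(g|g) Pr(t|g) = 0.25 * 0.1 * 0.3 * 0.2
print(prob)
```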

Example Application
- CpG islands
  - CG dinucleotides are rarer in eukaryotic genomes than expected given the independent probabilities of C, G
  - but the regions upstream of genes are richer in CG dinucleotides than elsewhere (CpG islands)
  - useful evidence for finding genes
- could predict CpG islands with Markov chains
  - one to represent CpG islands
  - one to represent the rest of the genome
- the p in CpG represents the phosphodiester bond between the C and G

Estimating the Model Parameters
- given some data (e.g. a set of sequences from CpG islands), how can we determine the probability parameters of our model?
- one approach: maximum likelihood estimation
  - given a set of data D
  - set the parameters $\theta$ to maximize $\Pr(D \mid \theta)$
  - i.e. make the data D look likely under the model

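Maximum Likelihood Estimation
- suppose we want to estimate the parameters Pr(a), Pr(c), Pr(g), Pr(t)
- and we’re given the sequences
    accgcgctta
    gcttagtgac
    tagccgttac
- then the maximum likelihood estimates are $\Pr(a) = \frac{n_a}{\sum_i n_i}$, where $n_a$ is the number of occurrences of a:
    Pr(a) = 6/30 = 0.2
    Pr(c) = 9/30 = 0.3
    Pr(g) = 7/30 ≈ 0.233
    Pr(t) = 8/30 ≈ 0.267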

Maximum Likelihood Estimation
- suppose instead we saw the following sequences
    gccgcgcttg
    gcttggtggc
    tggccgttgc
- then the maximum likelihood estimates are
    Pr(a) = 0/30 = 0
    Pr(c) = 9/30 = 0.3
    Pr(g) = 13/30 ≈ 0.433
    Pr(t) = 8/30 ≈ 0.267
- do we really want to set Pr(a) to 0?
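The two sets of estimates can be checked with a few lines of Python. This is a minimal sketch; the function name `ml_estimates` is introduced here for illustration.

```python
from collections import Counter

def ml_estimates(sequences, alphabet="acgt"):
    """Relative frequency of each symbol: n_a / sum_i n_i."""
    counts = Counter("".join(sequences))
    total = sum(counts[ch] for ch in alphabet)
    return {ch: counts[ch] / total for ch in alphabet}

first = ml_estimates(["accgcgctta", "gcttagtgac", "tagccgttac"])
second = ml_estimates(["gccgcgcttg", "gcttggtggc", "tggccgttgc"])
print(first)
print(second)  # second["a"] is exactly 0 because "a" never occurs
```

A zero estimate means that any test sequence containing an a would be assigned probability 0 by the model, however well the rest of it fits.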

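A Bayesian Approach
- instead of estimating parameters strictly from the data, we could start with some prior belief for each parameter
- for example, we could use Laplace estimates: $\Pr(a) = \frac{n_a + 1}{\sum_i (n_i + 1)}$, where the added 1 is a pseudocount
- using Laplace estimates with the sequences
    gccgcgcttg
    gcttggtggc
    tggccgttgc
  we get
    Pr(a) = (0 + 1)/34
    Pr(c) = (9 + 1)/34
    Pr(g) = (13 + 1)/34
    Pr(t) = (8 + 1)/34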

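A Bayesian Approach
- a more general form: m-estimates, $\Pr(a) = \frac{n_a + p_a m}{\left(\sum_i n_i\right) + m}$, where $p_a$ is the prior probability of a and m is the number of “virtual” instances
- with m=8 and uniform priors, the sequences
    gccgcgcttg
    gcttggtggc
    tggccgttgc
  give
    Pr(a) = (0 + 0.25 × 8)/(30 + 8) = 2/38
    Pr(c) = (9 + 0.25 × 8)/(30 + 8) = 11/38
    Pr(g) = (13 + 0.25 × 8)/(30 + 8) = 15/38
    Pr(t) = (8 + 0.25 × 8)/(30 + 8) = 10/38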
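A short Python sketch of the m-estimate, using the same three sequences. The function name `m_estimates` is hypothetical. The sketch also shows that the Laplace estimate is the special case of uniform priors with m equal to the alphabet size (here 4), since 0.25 × 4 adds exactly one pseudocount per symbol.

```python
from collections import Counter

def m_estimates(sequences, priors, m):
    """(n_a + p_a * m) / (sum_i n_i + m) for each symbol a in priors."""
    counts = Counter("".join(sequences))
    total = sum(counts[ch] for ch in priors)
    return {ch: (counts[ch] + p * m) / (total + m) for ch, p in priors.items()}

seqs = ["gccgcgcttg", "gcttggtggc", "tggccgttgc"]
uniform = {ch: 0.25 for ch in "acgt"}

m8 = m_estimates(seqs, uniform, m=8)       # c: (9 + 0.25 * 8) / (30 + 8) = 11/38
laplace = m_estimates(seqs, uniform, m=4)  # same as adding a pseudocount of 1 to each count
print(m8)
print(laplace)
```

Larger values of m pull the estimates harder toward the priors; with m = 0 the function reduces to the maximum likelihood estimate.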

Markov Chains for Discrimination
- suppose we want to distinguish CpG islands from other sequence regions
- given sequences from CpG islands, and sequences from other regions, we can construct
  - a model to represent CpG islands
  - a null model to represent the other regions
- can then score a test sequence by: $\text{score}(x) = \log \frac{\Pr(x \mid \text{CpG})}{\Pr(x \mid \text{null})}$
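The scoring scheme can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: the toy training sequences are invented, a pseudocount of 1 is added to every transition count, and the transition out of the begin state is ignored so that the score is a sum of per-transition log ratios.

```python
import math

ALPHABET = "acgt"

def estimate_transitions(sequences, pseudocount=1.0):
    """Estimate Pr(cur | prev) from adjacent pairs, with a pseudocount per cell."""
    counts = {p: {c: pseudocount for c in ALPHABET} for p in ALPHABET}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return {p: {c: n / sum(row.values()) for c, n in row.items()}
            for p, row in counts.items()}

def log_odds(x, pos_trans, neg_trans):
    """Sum over transitions of log(Pr(cur | prev, CpG) / Pr(cur | prev, null))."""
    return sum(math.log(pos_trans[p][c] / neg_trans[p][c])
               for p, c in zip(x, x[1:]))

# Hypothetical toy training data (not real CpG island sequences).
cpg_like = ["cgcgcgcgcg", "gcgcgccgcg", "ccgcgcgcgg"]
background = ["atatttaaat", "ttaacatgta", "aatgttacat"]

pos = estimate_transitions(cpg_like)
neg = estimate_transitions(background)

score_cg = log_odds("cgcgcg", pos, neg)  # positive: looks like the CpG training set
score_at = log_odds("atatta", pos, neg)  # negative: looks like the background set
print(score_cg, score_at)
```

A positive score favors the CpG model and a negative score favors the null model. Because the score is a sum over transitions, longer sequences tend to produce scores of larger magnitude, so scores are often normalized by length before comparison.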

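Markov Chains for Discrimination
- parameters estimated for CpG (+) and null (-) models
[Tables: 4x4 transition probabilities over A, C, G, T for each model; only some entries are recoverable. + model: .18 .27 .43 .12 .17 .37 .19 .16 .34 .38 .08 .36; - model: .30 .21 .28 .32 .08 .25 .24 .18 .29]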

Markov Chains for Discrimination
[Figure: bar chart of scores for positive and negative sequences]
- light bars represent negative sequences
- dark bars represent positive sequences
- the actual figure here is not from a CpG island discrimination task, however
Figure from A. Krogh, “An Introduction to Hidden Markov Models for Biological Sequences” in Computational Methods in Molecular Biology, Salzberg et al. editors, 1998.

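Markov Chains for Discrimination
- why use $\text{score}(x) = \log \frac{\Pr(x \mid \text{CpG})}{\Pr(x \mid \text{null})}$?
- Bayes’ rule tells us $\Pr(\text{CpG} \mid x) = \frac{\Pr(x \mid \text{CpG})\,\Pr(\text{CpG})}{\Pr(x)}$ and $\Pr(\text{null} \mid x) = \frac{\Pr(x \mid \text{null})\,\Pr(\text{null})}{\Pr(x)}$
- if we’re not taking into account priors, then just need to compare $\Pr(x \mid \text{CpG})$ and $\Pr(x \mid \text{null})$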
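The step from Bayes' rule to the log-odds score can be written out explicitly. Dividing the posterior for the CpG model by the posterior for the null model cancels the shared term Pr(x), and taking logs separates the data term from the prior term:

```latex
\log \frac{\Pr(\text{CpG} \mid x)}{\Pr(\text{null} \mid x)}
  = \log \frac{\Pr(x \mid \text{CpG})\,\Pr(\text{CpG})}{\Pr(x \mid \text{null})\,\Pr(\text{null})}
  = \underbrace{\log \frac{\Pr(x \mid \text{CpG})}{\Pr(x \mid \text{null})}}_{\text{score}(x)}
  + \log \frac{\Pr(\text{CpG})}{\Pr(\text{null})}
```

The second term does not depend on x, so when priors are ignored (or equal) the sign of the score alone decides which model is more probable.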

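Higher Order Markov Chains
- the Markov property specifies that the probability of a state depends only on the previous state
- but we can build more “memory” into our states by using a higher order Markov model
- in an nth order Markov model, the probability of a state depends on the previous n states: $\Pr(x_i \mid x_{i-1}, x_{i-2}, \ldots, x_1) = \Pr(x_i \mid x_{i-1}, \ldots, x_{i-n})$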

Higher Order Markov Chains
- an nth order Markov chain over some alphabet $\mathcal{A}$ is equivalent to a first order Markov chain over the alphabet $\mathcal{A}^n$ of n-tuples
- example: a 2nd order Markov model for DNA can be treated as a 1st order Markov model over the alphabet AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT
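The equivalence can be made concrete with a small Python sketch. The helper names (`tuple_alphabet`, `as_tuple_states`, `allowed_successors`) are hypothetical. The point it illustrates is that in the n-tuple chain a state can only move to a state that overlaps it in n-1 symbols, so each state still has just one outgoing transition per symbol of the original alphabet.

```python
from itertools import product

def tuple_alphabet(alphabet, n):
    """All n-tuples over the alphabet, i.e. the states of the equivalent 1st order chain."""
    return ["".join(t) for t in product(alphabet, repeat=n)]

def as_tuple_states(seq, n):
    """Rewrite a sequence as its path of overlapping n-tuple states."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def allowed_successors(state, alphabet):
    """States reachable in one step: drop the oldest symbol, append a new one."""
    return [state[1:] + ch for ch in alphabet]

states_2nd = tuple_alphabet("ACGT", 2)   # AA, AC, ..., TT
path = as_tuple_states("GCTACA", 2)      # GC -> CT -> TA -> AC -> CA
succ = allowed_successors("GC", "ACGT")  # CA, CC, CG, CT
print(len(states_2nd), path, succ)
```

The number of states grows as the alphabet size to the power n (4^5 = 1024 states for a fifth order DNA model), which is why higher order models need far more training data.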

A Fifth Order Markov Chain
[Figure: state diagram whose states are 5-mers (AAAAA, ..., CTACA, CTACC, CTACG, CTACT, ..., GCTAC, ..., TTTTT); the begin state connects to GCTAC with probability Pr(GCTAC); GCTAC connects to CTACA with probability Pr(A | GCTAC) and to CTACC with probability Pr(C | GCTAC).]

Inhomogeneous Markov Chains
- in the Markov chain models we have considered so far, the probabilities do not depend on where we are in a given sequence
- in an inhomogeneous Markov model, we can have different distributions at different positions in the sequence
- consider modeling codons in protein coding regions

Inhomogeneous Markov Chains
[Figure: state diagram with a begin state followed by three groups of states A, C, G, T, labeled pos 1, pos 2 and pos 3.]
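One way to realize the codon example is to keep a separate transition table for each of the three codon positions and pick the table by position modulo 3. The Python below is a sketch under stated assumptions: the training sequences are invented and assumed to start in frame, a pseudocount of 1 is added to every cell, the probability of the first symbol is ignored, and the function names are hypothetical.

```python
import math

ALPHABET = "acgt"

def estimate_codon_position_models(sequences, pseudocount=1.0):
    """models[k][prev][cur] estimates Pr(cur | prev) when cur sits at codon position k (0, 1, 2)."""
    counts = [{p: {c: pseudocount for c in ALPHABET} for p in ALPHABET}
              for _ in range(3)]
    for seq in sequences:
        for i in range(1, len(seq)):
            counts[i % 3][seq[i - 1]][seq[i]] += 1
    return [{p: {c: n / sum(row.values()) for c, n in row.items()}
             for p, row in table.items()}
            for table in counts]

def log_prob_in_frame(seq, models):
    """Log probability of the transitions in seq, choosing the table by codon position."""
    return sum(math.log(models[i % 3][seq[i - 1]][seq[i]])
               for i in range(1, len(seq)))

# Hypothetical in-frame coding sequences (illustrative only).
coding = ["atggctgcagaa", "atgaaagctgtt"]
models = estimate_codon_position_models(coding)

# The same transition (a -> t) gets a different probability at different codon positions.
print(models[1]["a"]["t"], models[2]["a"]["t"])
print(log_prob_in_frame("atggct", models))
```

A homogeneous chain would pool all of these counts into one table and lose the position-specific signal that makes coding regions distinctive.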