1 DNA Analysis Part II Amir Golnabi ENGS 112 Spring 2008.

Presentation transcript:


2 What we saw in Part I:
1. Markov Chain
2. DNA and Modeling
3. Markovian Models for DNA Sequences
4. HMM for DNA Sequences
Part II:
1. DNA Methylation and CpG Islands
2. Markov Chain Model
3. Hidden Markov Model
4. Finding the State Path
5. Parameter Estimation for HMMs
6. References

3 1. DNA Methylation and CpG islands
- In the human genome, the cytosine of a CG dinucleotide is typically modified by methylation.
- Methyl-C has a high chance of mutating into a T, so CG dinucleotides are rarer in the genome than the individual C and G frequencies would predict.
- Methylation is suppressed in short stretches of the genome, such as around the promoters or start regions of many genes. These stretches contain more CG dinucleotides and are called CpG islands.
- The "p" in CpG indicates that the C and G are connected by a phosphodiester bond along one strand.
Two questions:
- Given a short stretch of genomic sequence, how would we decide whether it comes from a CpG island?
- Given a long piece of sequence, how would we find the CpG islands in it?
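The depletion of CG dinucleotides described on this slide can be checked numerically. The sketch below is not from the slides: it computes the commonly used observed/expected CpG ratio, assuming the expected count is (number of C × number of G) / sequence length. The function name `cpg_obs_exp` and the sample strings are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected ratio of the CG dinucleotide in seq.

    Expected count assumes C and G occur independently:
    expected = count(C) * count(G) / len(seq).
    """
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # ratio is undefined; report 0 by convention here
    observed = seq.count("CG")  # "CG" cannot overlap itself
    return observed * len(seq) / (n_c * n_g)


# Hypothetical toy sequences, for illustration only.
island_like = "CGCGCGCG"
background_like = "CATGCATG"
print(cpg_obs_exp(island_like), cpg_obs_exp(background_like))
```

A ratio well below 1 suggests CG depletion, while island-like regions tend to score higher; any threshold would have to be chosen from real data.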

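4 2. Given a short stretch of genomic sequence, how would we decide whether it comes from a CpG island?
Markov chain:
- Transition probabilities: $a_{st} = P(x_i = t \mid x_{i-1} = s)$
- Probability of a sequence: $P(x) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}$
- Beginning and end of sequences: modeled by adding silent begin and end states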
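As a minimal sketch of how a first-order Markov chain assigns a probability to a sequence, the code below multiplies an initial probability by one transition probability per adjacent pair, working in log space to avoid underflow. The uniform tables and the name `markov_log_prob` are assumptions for illustration; the slide gives no parameter values.

```python
import math

NUCS = "ACGT"


def markov_log_prob(x, init, trans):
    """Log-probability of sequence x under a first-order Markov chain.

    init[s]     : probability that the sequence starts with s
    trans[s][t] : probability that t follows s
    """
    logp = math.log(init[x[0]])
    for prev, cur in zip(x, x[1:]):
        logp += math.log(trans[prev][cur])
    return logp


# Hypothetical parameters: every start and every transition equally likely.
uniform_init = {n: 0.25 for n in NUCS}
uniform_trans = {s: {t: 0.25 for t in NUCS} for s in NUCS}

print(markov_log_prob("ACGT", uniform_init, uniform_trans))
```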

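5 Two Markov chain models:
1. CpG islands (the '+' model)
2. Remainder of the sequence (the '-' model)
Transition probabilities are set with the maximum likelihood estimator. For CpG islands: $a^+_{st} = c^+_{st} / \sum_{t'} c^+_{st'}$, where $c^+_{st}$ is the number of times nucleotide t follows nucleotide s in the labeled CpG island regions. The '-' model is estimated the same way from the remaining sequence.
Tables of frequencies: each row sums to 1, and the tables are asymmetric.
[Tables: 4 x 4 transition probabilities for the '+' model and the '-' model, rows and columns A, C, G, T; values not preserved in the transcript]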
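The maximum likelihood estimate amounts to counting how often each nucleotide follows each other nucleotide and normalizing each row. The sketch below shows that counting step under stated assumptions: the training strings are made up, and a pseudocount (not mentioned on the slide) is added so that unseen transitions do not get probability zero.

```python
from collections import Counter

NUCS = "ACGT"


def estimate_transitions(sequences, pseudocount=1.0):
    """Estimate a row-normalized transition table from training sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))  # count adjacent pairs (s, t)
    table = {}
    for s in NUCS:
        row_total = sum(counts[(s, t)] + pseudocount for t in NUCS)
        table[s] = {t: (counts[(s, t)] + pseudocount) / row_total for t in NUCS}
    return table


# Hypothetical "island" training data; real tables need labeled genomic regions.
plus = estimate_transitions(["CGCGCG", "GCGCCG"])
for s in NUCS:
    print(s, plus[s])
```

Setting `pseudocount=0` gives the plain maximum likelihood estimate, but then every nucleotide must occur at least once as a predecessor in the training data.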

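6 To use this model for discrimination, compute the log-odds ratio:
$S(x) = \log \frac{P(x \mid \text{model } +)}{P(x \mid \text{model } -)} = \sum_{i=1}^{L} \log \frac{a^+_{x_{i-1} x_i}}{a^-_{x_{i-1} x_i}} = \sum_{i=1}^{L} \beta_{x_{i-1} x_i}$
- x is the sequence
- $a^+$ and $a^-$ are the transition probabilities of the '+' and '-' models
- $\beta$ is the log likelihood ratio of the corresponding transition probabilities
[Table: β values, rows and columns A, C, G, T; values not preserved in the transcript]
[Figure: histogram of the length-normalized scores S(x) for all the sequences (~60,000 nucleotides)]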
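A minimal sketch of the log-odds scoring follows. The two transition tables are hypothetical stand-ins (the slide's values were lost): the '+' table is uniform and the '-' table penalizes C followed by G. The score is in bits, is divided by sequence length, and ignores the begin-state term for simplicity.

```python
import math

NUCS = "ACGT"


def uniform_table():
    return {s: {t: 0.25 for t in NUCS} for s in NUCS}


# Hypothetical parameters, not the values from the slide.
plus = uniform_table()
minus = uniform_table()
minus["C"] = {"A": 0.31, "C": 0.31, "G": 0.07, "T": 0.31}  # CG is rare outside islands


def log_odds_score(x, plus, minus, normalize=True):
    """Sum of log2(a+ / a-) over adjacent pairs, optionally per nucleotide."""
    score = sum(math.log2(plus[s][t] / minus[s][t]) for s, t in zip(x, x[1:]))
    return score / len(x) if normalize else score


print(log_odds_score("CGCG", plus, minus))  # positive: looks island-like
print(log_odds_score("ATAT", plus, minus))  # zero: both models agree
```

A positive score favors the '+' model and a negative score favors the '-' model.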

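7 3. Given a long piece of sequence, how would we find the CpG islands in it?
- Use a single model for the entire sequence that incorporates both Markov chains: an HMM.
- Transition probabilities within each set of states (+ or -) are similar to those of the corresponding Markov chain.
- There is a small chance of switching between + and - regions.
- There is no one-to-one correspondence between states and symbols: each nucleotide can be emitted by either its + state or its - state.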

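8 Sequence of states (path $\pi$):
- Transition probabilities: $a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k)$
- The state sequence is hidden in an HMM.
Sequence of symbols:
- Emission probabilities: $e_k(b) = P(x_i = b \mid \pi_i = k)$, the probability that symbol b is seen in state k
- In the CpG island model the emission probabilities are 0 or 1, because each state emits exactly one nucleotide.
A sequence can be generated from an HMM as follows:
- A state $\pi_1$ is chosen according to $a_{0\pi_1}$, where 0 denotes the begin state.
- In $\pi_1$ an observation is emitted according to $e_{\pi_1}$.
- A new state $\pi_2$ is chosen according to $a_{\pi_1 \pi_2}$.
- And so forth, producing a sequence of random observations.
P(x) is the probability that x was generated by the model.
Joint probability of an observed sequence x and a state sequence $\pi$:
$P(x, \pi) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$, with $\pi_{L+1} = 0$ denoting the end state.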
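The generation procedure and the joint probability can be sketched for the eight-state CpG model (A+, C+, G+, T+, A-, C-, G-, T-). Everything numeric here is an assumption for illustration: a 0.9 probability of staying in the same region, uniform choice of the next nucleotide, a uniform start, and no explicit end state. Emissions are deterministic, so a state such as "C+" can only emit "C".

```python
import math
import random

NUCS = "ACGT"
STATES = [n + sign for sign in "+-" for n in NUCS]  # "A+", ..., "T-"
P_STAY = 0.9  # hypothetical probability of remaining in the same region


def trans_prob(k, l):
    """Transition probability a_kl under the toy assumptions above."""
    region = P_STAY if k[1] == l[1] else 1.0 - P_STAY
    return region * 0.25


def generate(length, rng):
    """Sample a hidden path and the symbols it emits."""
    path = [rng.choice(STATES)]
    while len(path) < length:
        weights = [trans_prob(path[-1], l) for l in STATES]
        path.append(rng.choices(STATES, weights=weights)[0])
    x = "".join(state[0] for state in path)  # each state emits its own nucleotide
    return x, path


def joint_log_prob(x, path):
    """log P(x, path); -inf if some state cannot emit its symbol."""
    if len(x) != len(path) or any(s[0] != b for s, b in zip(path, x)):
        return -math.inf
    logp = math.log(1.0 / len(STATES))  # uniform start
    for k, l in zip(path, path[1:]):
        logp += math.log(trans_prob(k, l))  # emission factors are all 1
    return logp


x, path = generate(20, random.Random(0))
print(x, joint_log_prob(x, path))
```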

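9 Example: the probability of the sequence 'CGCG' being emitted by the state sequence (C+, G-, C-, G+) is
$a_{0,C+} \times 1 \times a_{C+,G-} \times 1 \times a_{G-,C-} \times 1 \times a_{C-,G+} \times 1 \times a_{G+,0}$
where $a_{kl}$ is the transition probability from state k to state l, 0 denotes the begin and end state, and each factor 1 is an emission probability.
This is not very useful in practice because the path is not known.
→ Path estimation: find the most likely path.
- Viterbi algorithm: finds the single most probable state path
- Forward or backward algorithm: sums over all paths to give P(x) and the posterior probability of each state
Example: CpG model generating the symbol sequence CGCG
- Candidate state sequences: (C+,G+,C+,G+), (C-,G-,C-,G-), (C+,G-,C-,G+)
- (C+,G-,C-,G+): unlikely, because it switches back and forth between + and - regions
- (C-,G-,C-,G-): unlikely, because CG has a small probability in the '-' group
- (C+,G+,C+,G+): the best option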
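A compact log-space Viterbi decoder for the eight-state CpG model is sketched below. The parameters are hypothetical: a 0.9 probability of staying in the same region, uniform nucleotide transitions in the '+' set, and a reduced C-to-G probability (0.07) in the '-' set. The start distribution is uniform and the end state is omitted.

```python
import math

NUCS = "ACGT"
STATES = [n + sign for sign in "+-" for n in NUCS]  # "A+", ..., "T-"
P_STAY = 0.9  # hypothetical probability of remaining in the same region


def base_table(sign):
    table = {a: {b: 0.25 for b in NUCS} for a in NUCS}
    if sign == "-":
        table["C"] = {"A": 0.31, "C": 0.31, "G": 0.07, "T": 0.31}  # CG rare
    return table


BASE = {"+": base_table("+"), "-": base_table("-")}


def log_trans(k, l):
    region = P_STAY if k[1] == l[1] else 1.0 - P_STAY
    return math.log(region * BASE[l[1]][k[0]][l[0]])


def log_emit(state, symbol):
    return 0.0 if state[0] == symbol else -math.inf  # emissions are 0 or 1


def viterbi(x):
    """Most probable state path for a non-empty sequence x."""
    v = {k: math.log(1.0 / len(STATES)) + log_emit(k, x[0]) for k in STATES}
    pointers = []
    for symbol in x[1:]:
        new_v, ptr = {}, {}
        for l in STATES:
            best = max(STATES, key=lambda k: v[k] + log_trans(k, l))
            new_v[l] = v[best] + log_trans(best, l) + log_emit(l, symbol)
            ptr[l] = best
        v = new_v
        pointers.append(ptr)
    state = max(STATES, key=lambda k: v[k])
    path = [state]
    for ptr in reversed(pointers):  # trace back from the best final state
        state = ptr[state]
        path.append(state)
    return path[::-1]


print(viterbi("CGCG"))  # -> ['C+', 'G+', 'C+', 'G+']
```

With these toy parameters the decoder agrees with the slide's conclusion that (C+,G+,C+,G+) is the best path for CGCG; other parameter choices could give a different answer.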

10 5. Parameter Estimation for HMMs
Building an HMM involves two steps:
1. Design the structure: the states and their connections.
2. Assign the parameter values: the transition and emission probabilities, estimated with Baum-Welch or Viterbi training.
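When the state paths are known, or have just been decoded as in one round of Viterbi training, the parameters can be re-estimated by counting. The sketch below shows only that counting step, not the full iterative training and not Baum-Welch. The two-state labeling ("I" for island, "B" for background), the pseudocount, and the function name `reestimate` are assumptions for illustration.

```python
from collections import Counter


def reestimate(labelled, states, alphabet, pseudocount=1.0):
    """Estimate transition and emission tables from (sequence, path) pairs."""
    trans_counts, emit_counts = Counter(), Counter()
    for x, path in labelled:
        trans_counts.update(zip(path, path[1:]))  # state k followed by state l
        emit_counts.update(zip(path, x))          # state k emitting symbol b
    trans, emit = {}, {}
    for k in states:
        t_total = sum(trans_counts[(k, l)] + pseudocount for l in states)
        trans[k] = {l: (trans_counts[(k, l)] + pseudocount) / t_total for l in states}
        e_total = sum(emit_counts[(k, b)] + pseudocount for b in alphabet)
        emit[k] = {b: (emit_counts[(k, b)] + pseudocount) / e_total for b in alphabet}
    return trans, emit


# Hypothetical labeled example: four island positions, then two background ones.
trans, emit = reestimate([("CGCGAT", "IIIIBB")], states="IB", alphabet="ACGT")
print(trans)
print(emit)
```

In Viterbi training this step would alternate with decoding until the paths stop changing; Baum-Welch replaces the hard counts with expected counts.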

11 6. References
- Bandyopadhyay, Sanghamitra. Gene Identification: Classical and Computational Intelligence Approach. 38 vols. IEEE, Jan. 2008.
- Durbin, R., S. Eddy, and A. Krogh. Biological Sequence Analysis. Cambridge: Cambridge University.
- Koski, Timo. Hidden Markov Models for Bioinformatics. Sweden: Kluwer Academic.
- Birney, E. "Hidden Markov models in biological sequence analysis". July 2001.
- Haussler, David, David Kulp, Martin Reese, and Frank Eeckman. "A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA".
- Boufounos, Petros, Sameh El-Difrawy, and Dan Ehrlich. "Hidden Markov Models for DNA Sequencing".