From Genomics to Geology: Hidden Markov Models for Seismic Data Analysis
Samuel Brown
February 5, 2009

Objective
Create a pattern recognition tool that can locate, isolate, and characterize events in seismic data. Adapt hidden Markov model (HMM) search algorithms from genomics applications and extend them to work on multi-dimensional seismic data.

Outline
HMM Applications
HMM Theory: Generating Sequences, Simple Markov Models, Hidden Markov Models
From Sequence Generation to Pattern Recognition
Application to Seismic Data

Application Areas
HMMs can be used to build powerful, fast pattern recognition tools for noisy, incomplete data.
Medical Informatics/Genomics: hmmer (Sean Eddy, Howard Hughes Medical Institute)
Speech Recognition
Intelligence

Prerequisites
We will describe our data as a sequence of symbols from a predefined alphabet.
For DNA, the alphabet consists of the nucleotides {A, C, G, T}.
A sample DNA sequence: CGATATGCG
We will reference the symbols in a sequence by their position, starting with 1; i.e., symbol 1 in the previous sequence is 'C'.
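The 1-based indexing convention can be mirrored in code (a minimal sketch; Python strings are 0-indexed, so we shift by one, and `symbol_at` is a hypothetical helper, not something from the talk):

```python
# A sample DNA sequence over the alphabet {A, C, G, T}.
sequence = "CGATATGCG"

def symbol_at(seq, position):
    """Return the symbol at a 1-based position, per the slide convention."""
    return seq[position - 1]

print(symbol_at(sequence, 1))  # 'C', matching the slide's example
```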

Sequence Generation
Goal: build a probabilistic state machine (model) that can generate any DNA sequence and characterize quantitatively the probability that the model generates any given sequence.
Enter the Markov model (chain).

Markov Models
Primary characteristic: the probability of any symbol in a sequence depends solely on the previous symbol:
t_AC = P(x_i = A | x_{i-1} = C)

Markov Model State Machine
Each state with straight edges is an emitting state. B and E are special non-emitting states for beginning and ending sequences.

Markov Model State Machine
A transition probability is assigned to each arrow:
t_AC = P(x_i = A | x_{i-1} = C)
The probability of a sequence x of length L is:
P(x) = P(E | x_L) P(x_L | x_{L-1}) … P(x_1 | B)

Markov Model State Machine
Given the sequence CGAGTC and a table of transition probabilities, we can trace a path through the state machine to get the probability of the sequence.

Markov Model Example
Assume all transitions are equiprobable, i.e., 0.25 = t_AC = t_AG = t_AT = t_CA = …
P(CGAGTC) = (t_EC)(t_CG)(t_GA)(t_AG)(t_GT)(t_TC)(t_CB) = .00097
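The quoted value .00097 equals 0.25^5, which is what results if only the five symbol-to-symbol transitions are scored and the begin and end transitions are treated as probability 1; the sketch below makes that assumption explicit (the function name and dictionary layout are illustrative, not from the talk):

```python
def markov_chain_probability(seq, transition):
    """Probability of a sequence under a first-order Markov chain.

    transition[(s, t)] is P(next symbol = t | current symbol = s).
    Begin/end transitions are treated as probability 1 (an assumption).
    """
    p = 1.0
    for prev, cur in zip(seq, seq[1:]):
        p *= transition[(prev, cur)]
    return p

# All transitions equiprobable at 0.25, as on the slide.
uniform = {(s, t): 0.25 for s in "ACGT" for t in "ACGT"}
p = markov_chain_probability("CGAGTC", uniform)
print(round(p, 5))  # 0.25**5 = 0.00098 (the slide truncates to .00097)
```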

CpG Islands
The dinucleotide CG is relatively rare in the human genome and is usually associated with the beginning of coding DNA regions.
CpG islands are subsequences in which the CG pattern is common, and there are more C and G nucleotides in general.
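Whether a stretch of sequence looks island-like can be checked by simply counting CG dinucleotides; a minimal sketch (the helper name is an assumption for illustration):

```python
def cpg_fraction(seq):
    """Fraction of adjacent symbol pairs in seq that are the CG dinucleotide."""
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if (a, b) == ("C", "G")) / len(pairs)

print(cpg_fraction("ATCGCGTA"))  # 2 CG pairs out of 7 adjacent pairs
```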

CpG Island Model
We can define a new Markov model for CpG islands, in which the transition probabilities are adjusted to reflect the higher frequency of C and G nucleotides.

Combined Model
What we really want is a model that can emit normal DNA sequences and CpG islands, with low-probability transitions between the two regions.

Hidden Markov Model
We call this type of Markov model 'hidden' because one cannot immediately determine which state emitted a given symbol.
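For illustration, such a combined model can be written as a two-state HMM whose hidden state (island vs. background) drives generation but is never shown to an observer; all numbers below are assumed for the sketch, not taken from the talk:

```python
import random

# Illustrative (assumed) parameters: a background state and an island state,
# with rare transitions between the two regions.
trans = {"background": {"background": 0.95, "island": 0.05},
         "island":     {"island": 0.90, "background": 0.10}}
emit = {"background": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
        "island":     {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}}

def generate(length, seed=0):
    """Sample a (hidden state path, emitted symbol sequence) pair."""
    rng = random.Random(seed)
    state, path, seq = "background", [], []
    for _ in range(length):
        path.append(state)
        symbols, weights = zip(*emit[state].items())
        seq.append(rng.choices(symbols, weights=weights)[0])
        nexts, weights = zip(*trans[state].items())
        state = rng.choices(nexts, weights=weights)[0]
    return path, "".join(seq)

path, seq = generate(10)
# Only seq would be observed; path is the hidden part of the model.
```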

Outline
HMM Applications
HMM Theory: Generating Sequences, Simple Markov Models, Hidden Markov Models
From Sequence Generation to Pattern Recognition
Application to Seismic Data

HMM Search
Using the Viterbi algorithm, we can calculate the path through the model that generates a given sequence with the highest probability. This allows us to identify portions of our sequence that have a high probability of being CpG islands.
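A minimal log-space Viterbi sketch; the two-state CpG-style parameters below are illustrative assumptions, not numbers from the talk:

```python
import math

def viterbi(obs, states, start, trans, emit):
    """Most probable hidden state path for obs (log-space Viterbi)."""
    # Initialize with start and emission probabilities for the first symbol.
    V = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this step.
            best = max(states, key=lambda r: V[-1][r] + math.log(trans[r][s]))
            row[s] = V[-1][best] + math.log(trans[best][s]) + math.log(emit[s][o])
            ptr[s] = best
        V.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Illustrative two-state parameters (assumed).
states = ("bg", "island")
start = {"bg": 0.9, "island": 0.1}
trans = {"bg": {"bg": 0.95, "island": 0.05},
         "island": {"bg": 0.1, "island": 0.9}}
emit = {"bg": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
        "island": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}}
print(viterbi("ATCGCGCGAT", states, start, trans, emit))
```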

HMM Generalizations
Remove the direct link between states and symbols and allow any state to emit any symbol with a defined probability distribution:
e_k(b) = P(x_i = b | π_i = k)
The sequence probability is now a joint probability of transition and emission probabilities:
P(x, π) = ∏_i e_{π_i}(x_i) t_{π_i, π_{i+1}}
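Given a state path, this joint probability can be computed directly. A minimal sketch with made-up two-state parameters; the final transition into the end state is treated as probability 1 here, which is an assumption:

```python
def joint_probability(obs, path, trans, emit):
    """P(x, pi): product of e_{pi_i}(x_i) and t_{pi_i, pi_{i+1}}.

    The transition out of the last state (into an end state) is
    omitted, i.e., treated as probability 1 -- an assumption.
    """
    p = 1.0
    for i, (x, s) in enumerate(zip(obs, path)):
        p *= emit[s][x]                  # emission e_{pi_i}(x_i)
        if i + 1 < len(path):
            p *= trans[s][path[i + 1]]   # transition t_{pi_i, pi_{i+1}}
    return p

# Tiny two-state example with assumed numbers.
trans = {"a": {"a": 0.6, "b": 0.4}, "b": {"a": 0.5, "b": 0.5}}
emit = {"a": {"H": 0.9, "T": 0.1}, "b": {"H": 0.2, "T": 0.8}}
p = joint_probability("HT", ["a", "b"], trans, emit)
print(round(p, 3))  # 0.9 * 0.4 * 0.8 = 0.288
```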

HMM Generalizations
Create an HMM based on specific sequences, with (M)atch, (I)nsert, and (D)elete states. Add a B self-transition to be able to skip symbols. Allow feedback from E to B to link recognized sequence portions together.

Outline
HMM Applications
HMM Theory: Generating Sequences, Simple Markov Models, Hidden Markov Models
From Sequence Generation to Pattern Recognition
Application to Seismic Data

Searching a Trace
When searching a trace, what is our alphabet? Trace amplitudes plus noise, which is assumed to be normally distributed.
What are our models? A library of scaled wavelets: one set of (M)atch, (I)nsert, and (D)elete states for each sample.

Searching a Trace
Emitting states emit a trace sample with high probability when the trace sample amplitude is within one standard deviation of the scaled model sample amplitude. The HMM search program returns a list of wavelet types, central times, and amplitudes.
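A natural way to realize such an emission model is a Gaussian density centered on the scaled model (wavelet) sample amplitude; the talk does not spell out its exact form, so the sketch below is an assumption, and the function and parameter names are hypothetical:

```python
import math

def emission_density(trace_amp, model_amp, sigma):
    """Gaussian emission density for a trace sample, centered on the
    scaled model sample amplitude, with noise standard deviation sigma."""
    z = (trace_amp - model_amp) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# A sample within one standard deviation of the model scores much higher
# than one several standard deviations away.
near = emission_density(1.05, 1.0, 0.1)
far = emission_density(1.5, 1.0, 0.1)
print(near > far)  # True
```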

Searching a Trace
HMM search correctly identified all wavelet components in the left trace, allowing us to synthesize the spiked trace.

Central Time   Frequency   Coefficient
.23s           18hz
.25s           20hz        .5
.48s           20hz
s              18hz        1.5

Searching a Trace
What about noise?

Searching Across Traces
Taking the output wavelets from a 1D HMM, what is the alphabet when we search across traces? Moveouts.
What are our models? A library of target trajectories.

First Arrival 1D HMM

First Arrival 2D HMM

First Arrival 2D HMM

To Be Continued…