From Genomics to Geology: Hidden Markov Models for Seismic Data Analysis Samuel Brown February 5, 2009.


1 From Genomics to Geology: Hidden Markov Models for Seismic Data Analysis Samuel Brown February 5, 2009

2 Objective: Create a pattern recognition tool that can locate, isolate, and characterize events in seismic data. Adapt hidden Markov model (HMM) search algorithms from genomics applications and extend them to work on multi-dimensional seismic data.

3 Outline: HMM Applications; HMM Theory; Generating Sequences; Simple Markov Models; Hidden Markov Models; From Sequence Generation to Pattern Recognition; Application to Seismic Data

4 Application Areas: HMMs can be used to build powerful, fast pattern recognition tools for noisy, incomplete data. Application areas: medical informatics and genomics (hmmer, by Sean Eddy, Howard Hughes Medical Institute); speech recognition; intelligence.

5 Prerequisites: We will describe our data as a sequence of symbols from a predefined alphabet. For DNA, the alphabet consists of the four nucleotides: {A, C, G, T}. A sample DNA sequence: CGATATGCG. We will reference the symbols in a sequence by their position, starting with 1; i.e., symbol 1 in the previous sequence is ‘C’.

6 Sequence Generation: Goal: build a probabilistic state machine (model) that can generate any DNA sequence and quantify the probability that the model generates any given sequence. Enter the Markov model (chain).

7 Markov Models: Primary characteristic: the probability of any symbol in a sequence depends solely on the identity of the previous symbol. The transition probability is $t_{AC} = P(x_i = A \mid x_{i-1} = C)$.

8 Markov Model State Machine: Each state with straight edges is an emitting state. B and E are special non-emitting states for beginning and ending sequences.

9 Markov Model State Machine: A transition probability is assigned to each arrow: $t_{AC} = P(x_i = A \mid x_{i-1} = C)$. The probability of a sequence $x$ of length $L$ is $P(x) = P(E \mid x_L)\,P(x_L \mid x_{L-1}) \cdots P(x_1 \mid B)$.

10 Markov Model State Machine: Given the sequence CGAGTC and a table of transition probabilities, we can trace a path through the state machine to get the probability of the sequence.

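11 Markov Model Example: Assume all nucleotide-to-nucleotide transitions are equiprobable, i.e., $t_{AC} = t_{AG} = t_{AT} = t_{CA} = \dots = 0.25$. Then $P(\mathrm{CGAGTC}) = t_{EC}\,t_{CT}\,t_{TG}\,t_{GA}\,t_{AG}\,t_{GC}\,t_{CB}$, where the first subscript is the current symbol and the second the previous one, as in the definition of $t_{AC}$. The value given corresponds to the five nucleotide-to-nucleotide transitions alone, $0.25^5 \approx 9.8 \times 10^{-4}$, with the begin and end factors $t_{CB}$ and $t_{EC}$ taken as 1.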
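The calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the names `UNIFORM` and `chain_probability` are hypothetical, the table is keyed as (previous, current), and the begin and end factors are assumed to default to 1, which reproduces the roughly 0.00097 figure quoted on the slide.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Hypothetical transition table: every nucleotide-to-nucleotide move has
# probability 0.25. Keys are (previous symbol, current symbol).
UNIFORM = {(prev, cur): 0.25 for prev, cur in product(NUCLEOTIDES, repeat=2)}


def chain_probability(seq, transitions, p_begin=1.0, p_end=1.0):
    """Probability of seq under a first-order Markov chain.

    p_begin and p_end stand in for the B -> x_1 and x_L -> E factors.
    They default to 1 (an assumption), so only symbol-to-symbol
    transitions contribute unless the caller overrides them.
    """
    prob = p_begin
    for prev, cur in zip(seq, seq[1:]):
        prob *= transitions[(prev, cur)]
    return prob * p_end


print(chain_probability("CGAGTC", UNIFORM))  # five transitions: 0.25 ** 5
```

Passing `p_begin=0.25, p_end=0.25` would instead count all seven factors in the product, giving 0.25 ** 7.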

12 CpG Islands: The dinucleotide subsequence CG is relatively rare in the human genome and is usually associated with the beginning of coding DNA regions. CpG islands are subsequences in which the CG pattern is common and in which C and G nucleotides are more frequent in general.

13 CpG Island Model: We can define a new Markov model for CpG islands, in which the transition probabilities are adjusted to reflect the higher frequency of C and G nucleotides.

14 Combined Model: What we really want is a model that can emit normal DNA sequences and CpG islands, with low probability transitions between the two regions.

15 Hidden Markov Model: We call this type of Markov model ‘hidden’ because one cannot immediately determine which state emitted a given symbol.

16 Outline: HMM Applications; HMM Theory; Generating Sequences; Simple Markov Models; Hidden Markov Models; From Sequence Generation to Pattern Recognition; Application to Seismic Data

17 HMM Search: Using the Viterbi algorithm, we can calculate the path through the model that generates a given sequence with the highest probability. This allows us to identify portions of our sequence that have a high probability of being CpG islands.
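A minimal Viterbi sketch in Python illustrates the idea. It assumes a simplified two-state model (background versus island) in which each state emits nucleotides with its own distribution, rather than the combined model described in the talk. Every probability below is hypothetical and chosen only for illustration.

```python
import math

STATES = ("background", "island")

# All numbers below are hypothetical, not taken from the presentation.
START = {"background": 0.5, "island": 0.5}
TRANS = {  # TRANS[from_state][to_state]
    "background": {"background": 0.9, "island": 0.1},
    "island": {"background": 0.1, "island": 0.9},
}
EMIT = {  # EMIT[state][symbol]
    "background": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq):
    """Return (most probable state path, its log-probability) for seq."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for symbol in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor for state s at this position.
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = best_prev
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][symbol]))
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    last = max(STATES, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


path, log_p = viterbi("ATATATCGCGCGCGATATAT")
print(path)
```

Working in log space avoids numerical underflow on long sequences. With these illustrative numbers, a run of C and G must be long enough to pay for the two low-probability state switches before the path labels it as an island.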

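18 HMM Generalizations: Remove the direct link between states and symbols, and allow any state to emit any symbol according to a defined probability distribution: $e_k(b) = P(x_i = b \mid \pi_i = k)$. The probability of a sequence $x$ along a state path $\pi$ is now a joint probability built from emission and transition probabilities: $P(x, \pi) = t_{\pi_1 B} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, t_{\pi_{i+1}\pi_i}$, with $\pi_{L+1} = E$ and the first subscript of $t$ denoting the destination state, consistent with $t_{AC} = P(x_i = A \mid x_{i-1} = C)$.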
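As a worked instance of the joint probability, the sketch below multiplies the emission and transition probabilities along one given state path. The two-state model and all of its numbers are hypothetical, the transition table is keyed as (from, to), and the final transition into the end state is assumed to have probability 1.

```python
import math

# Hypothetical two-state model; every number is illustrative only.
START = {"bg": 0.5, "island": 0.5}  # begin state -> first state
TRANS = {  # keyed as (from_state, to_state)
    ("bg", "bg"): 0.9, ("bg", "island"): 0.1,
    ("island", "bg"): 0.1, ("island", "island"): 0.9,
}
EMIT = {
    "bg": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def joint_probability(symbols, path):
    """P(x, pi): product of emissions and transitions along one state path.

    The transition into the end state is taken as 1 (an assumption).
    """
    if len(symbols) != len(path):
        raise ValueError("need exactly one state per symbol")
    prob = START[path[0]]
    for i, (symbol, state) in enumerate(zip(symbols, path)):
        prob *= EMIT[state][symbol]
        if i + 1 < len(path):
            prob *= TRANS[(state, path[i + 1])]
    return prob


# 0.5 * e_bg(A) * t(bg->island) * e_island(C) * t(island->island) * e_island(G)
print(joint_probability("ACG", ["bg", "island", "island"]))
```

Summing this quantity over all paths would give the total probability of the sequence; maximizing it over paths is what the Viterbi algorithm does.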

19 HMM Generalizations: Create an HMM based on specific sequences, with (M)atch, (I)nsert, and (D)elete states. Add a B self-transition to be able to skip symbols. Allow for feedback from E to B to link recognized sequence portions together.

20 Outline: HMM Applications; HMM Theory; Generating Sequences; Simple Markov Models; Hidden Markov Models; From Sequence Generation to Pattern Recognition; Application to Seismic Data

21 Searching a Trace: When searching a trace, what is our alphabet? Trace amplitudes plus noise, which is assumed to be normally distributed. What are our models? A library of scaled wavelets, with one set of (M)atch, (I)nsert, and (D)elete states for each sample.

22 Searching a Trace: An emitting state emits a trace sample with high probability when the sample amplitude is within one standard deviation of the scaled model sample amplitude. The HMM search program returns a list of wavelet types, central times, and amplitudes.
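One way to picture these emission probabilities is with a normal density centered on the scaled model amplitude, which follows from the assumption of normally distributed noise. The sketch below is a simplification that scores match states only, with no insert or delete states. The function names and the toy wavelet are hypothetical and do not come from the presenter's program.

```python
import math


def emission_log_density(sample, model_amplitude, sigma):
    """Log of the normal density of a trace sample about the model amplitude."""
    z = (sample - model_amplitude) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def match_log_likelihood(trace, wavelet, start, scale, sigma):
    """Sum of match-state emission log-densities for a scaled wavelet at `start`."""
    return sum(
        emission_log_density(trace[start + k], scale * w, sigma)
        for k, w in enumerate(wavelet)
    )


def best_offset(trace, wavelet, scale, sigma):
    """Offset at which the scaled wavelet explains the trace samples best."""
    offsets = range(len(trace) - len(wavelet) + 1)
    return max(
        offsets,
        key=lambda s: match_log_likelihood(trace, wavelet, s, scale, sigma),
    )


# Toy data: a three-sample "wavelet" scaled by 0.5 and buried at offset 2.
trace = [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0]
wavelet = [1.0, 2.0, 1.0]
print(best_offset(trace, wavelet, scale=0.5, sigma=0.1))
```

A full search would also iterate over the wavelet library and candidate scales, and would let insert and delete states absorb stretched or missing samples.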

23 Searching a Trace: HMM search correctly identified all wavelet components in the left trace, allowing us to synthesize the spiked trace.

Central time   Frequency   Coefficient
0.23 s         18 Hz       (not given)
0.25 s         20 Hz       0.5
0.48 s         20 Hz       -0.4
0.498 s        18 Hz       1.5
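Assuming "spiked trace" means a series of spikes placed at the detected central times and weighted by the detected coefficients, its construction can be sketched as follows. The function name `spike_trace`, the 2 ms sample interval, and the trace length are hypothetical. The picks are the three entries on this slide for which time and coefficient can both be read.

```python
def spike_trace(picks, dt, n_samples):
    """Place each pick's coefficient at the sample nearest its central time."""
    trace = [0.0] * n_samples
    for central_time, coefficient in picks:
        index = round(central_time / dt)
        if 0 <= index < n_samples:
            trace[index] += coefficient
    return trace


# (central time in seconds, coefficient); dt and n_samples are assumptions.
picks = [(0.25, 0.5), (0.48, -0.4), (0.498, 1.5)]
spikes = spike_trace(picks, dt=0.002, n_samples=300)
print([(i, a) for i, a in enumerate(spikes) if a != 0.0])
```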

24 Searching a Trace: What about noise?

25 Searching Across Traces: Taking the output wavelets from a 1D HMM, what is the alphabet when we search across traces? Moveouts. What are our models? A library of target trajectories.

26 First Arrival 1D HMM

27 First Arrival 2D HMM

28 First Arrival 2D HMM

29 To Be Continued…

