Deepak Verghese, CS 6890: Gene Finding with a Hidden Markov Model of Genomic Structure and Evolution. Jakob Skou Pedersen and Jotun Hein.


1 Deepak Verghese, CS 6890: Gene Finding with a Hidden Markov Model of Genomic Structure and Evolution. Jakob Skou Pedersen and Jotun Hein

2 Existing methods: GPHMM; the conserved exon method; the two-step approach (GLASS and ROSETTA); TWINSCAN, which extends GENSCAN; etc.

3 Do not exploit all the information in the evolutionary pattern. Not easily extended to multiple genome sequences.

4 Evolutionary hidden Markov model (EHMM): a probabilistic model of both genome structure and evolution. Composed of: 1. a hidden Markov model (HMM); 2. a phylogenetic tree.

5 Can handle any number of sequences in an alignment. Can have the properties of higher-order HMMs. Can handle variability in the sequences along the alignment. State-of-the-art evolutionary models can be incorporated later. Evolutionary events between different genomes are not treated independently.

6 SCOPE Not to compete with the existing gene-finding methods on performance, but to illustrate the power of this approach. Relies on a pre-produced alignment.

7 MARKOV CHAINS A set of states. The transitions from one state to all other states, including itself, are governed by a probability distribution. First-order Markov chain: the probabilities depend solely on the current state. n-th order Markov chain: the probabilities depend on the n previous states.
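The definition above can be made concrete with a minimal sketch of a first-order Markov chain. The two state names and all transition probabilities below are illustrative assumptions, not values from the paper.

```python
import random

# Hypothetical two-state chain; the states and probabilities are
# made up for illustration and do not come from the paper.
TRANSITIONS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.2, "intron": 0.8},
}


def simulate_chain(start, steps, rng):
    """Sample a path of `steps` transitions from a first-order Markov chain."""
    path = [start]
    for _ in range(steps):
        current = path[-1]
        # The next state depends solely on the current state.
        next_states = list(TRANSITIONS[current])
        weights = [TRANSITIONS[current][s] for s in next_states]
        path.append(rng.choices(next_states, weights=weights, k=1)[0])
    return path


path = simulate_chain("intron", 10, random.Random(0))
print(path)
```

An n-th order chain could be expressed the same way by keying the transition table on a tuple of the n previous states.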

8 HIDDEN MARKOV MODEL Five components: a set of states; a matrix of transition probabilities (A); an alphabet of symbols (C); a set of emission distributions (e); an initial state distribution (B).

9 Example of a hidden Markov model
A C A - - - A T G
T C A A C T A T C
A C A C - - A G C
A G A - - - A T C
A C C G - - A T C
Why the name "hidden"? There is no 1:1 correspondence between states and symbols.

10 Components: a state k emits symbols (observables) from C. PROBABILISTIC MODEL: emission distribution e, initial state distribution B, transition probabilities A.

11 Path Π: different paths are possible for the same sequence.

12 In the EHMM, the emission distribution e is specified by an evolutionary model E_k and a phylogenetic tree T.

13 PHYLOGENETIC TREES

14 Motivation: the problem of explaining the evolutionary history of today's species. In phylogenetic trees: leaves represent present-day species; interior nodes represent hypothesized ancestors; the character states of interior nodes are missing data; the lengths of the branches represent the evolutionary difference.

15 Evolution is often modelled by continuous-time Markov chains. Here, evolution along the branches of the phylogenetic tree is modelled by E_k. The transition probability matrix for a branch of length t is P_k(t) = exp(t Q_k), where Q_k is the rate matrix of E_k. Increasing the number of sequences increases the amount of evolutionary information. THE ALIGNMENT COLUMN CORRESPONDS TO THE STATE OF EVOLUTION AT THE LEAVES OF THE PHYLOGENETIC TREE
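As a rough illustration of P(t) = exp(tQ), the sketch below exponentiates a rate matrix with a truncated Taylor series plus scaling and squaring, using only the standard library. The Jukes-Cantor rate matrix is an assumption chosen because it has a known closed form to check against; the slides do not say which substitution model the paper uses, and the helper names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=12, squarings=10):
    """Approximate exp(m): Taylor series on m / 2**squarings, then square back."""
    n = len(m)
    scale = 2 ** squarings
    scaled = [[v / scale for v in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled**k / k!
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_matrix(q, t):
    """Return P(t) = exp(t * Q) for rate matrix q and branch length t."""
    return mat_exp([[t * v for v in row] for row in q])


# Assumed Jukes-Cantor rate matrix over A, C, G, T (one substitution per unit time).
Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
P = transition_matrix(Q, 0.5)

# Closed form for Jukes-Cantor, used only as a sanity check.
expected_same = 0.25 + 0.75 * math.exp(-4 * 0.5 / 3)
print(P[0][0], expected_same)
```

In practice a library routine such as a dedicated matrix exponential would normally replace the hand-written series.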

16 Phylogenetic tree of the entries of the 3 alignment columns. THE PROBABILITY OF GENERATING AN ALIGNMENT COLUMN IN STATE k EQUALS THE PROBABILITY OF OBSERVING A GIVEN CHARACTER PATTERN ON THE LEAVES OF T, GIVEN E_k
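One standard way to compute the probability of a character pattern on the leaves of a tree is Felsenstein's pruning algorithm, sketched below. The slides do not name the algorithm the authors use, so this is a swapped-in illustration. The three-species tree, its branch lengths, and the Jukes-Cantor substitution model are all assumptions. Gaps are treated as missing data, as the slides describe.

```python
import math

BASES = "ACGT"

# Hypothetical tree: an internal node is a list of (child, branch_length)
# pairs and a leaf is a species name. Topology and lengths are illustrative.
TREE = [([("human", 0.1), ("mouse", 0.1)], 0.05), ("chicken", 0.3)]


def jc69(t):
    """Jukes-Cantor transition probabilities for branch length t (assumed model)."""
    same = 0.25 + 0.75 * math.exp(-4 * t / 3)
    diff = 0.25 - 0.25 * math.exp(-4 * t / 3)
    return {a: {b: (same if a == b else diff) for b in BASES} for a in BASES}


def partial_likelihoods(node, column):
    """Return {base: P(observed leaves below node | node carries base)}."""
    if isinstance(node, str):  # leaf
        observed = column[node]
        if observed == "-":  # gap: treated as missing data
            return {b: 1.0 for b in BASES}
        return {b: 1.0 if b == observed else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_like = partial_likelihoods(child, column)
        p = jc69(t)
        for b in BASES:
            result[b] *= sum(p[b][c] * child_like[c] for c in BASES)
    return result


def column_probability(tree, column):
    """Probability of one alignment column, summing over the root state."""
    root = partial_likelihoods(tree, column)
    # The Jukes-Cantor equilibrium distribution is uniform over the four bases.
    return sum(0.25 * root[b] for b in BASES)


conserved = column_probability(TREE, {"human": "A", "mouse": "A", "chicken": "A"})
divergent = column_probability(TREE, {"human": "A", "mouse": "C", "chicken": "G"})
print(conserved, divergent)
```

In an EHMM each state k would supply its own evolutionary model, so the same column gets a different emission probability in each state.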

17 A codon-based evolutionary model is used to calculate the emission probability of the columns of A. A nucleotide-based evolutionary model is used to calculate the emission probability of column B. The emission probability of C is obtained from the equilibrium distribution of the relevant evolutionary model.

18 Parameter Estimation: the parameters of the HMM are estimated by a combination of the Baum-Welch and Powell methods. The evolutionary model E is divided into E_equ and E_evo.

19 The initial state distribution B can be estimated by Baum-Welch, but it is generally set to 0.00001 for all states except the intergenic state. The expectation step of Baum-Welch estimates: the number of nucleotides emitted from each state; the expected number of state transitions; the expected number of times a state is used. Powell's method, another optimization method, estimates E_evo and the phylogenetic tree T. The Baum-Welch method is used to estimate E_equ and A.
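The expected counts of the Baum-Welch expectation step can be sketched with the forward-backward recursions on a toy HMM. The two states and every probability below are illustrative assumptions. The emissions here are single nucleotides, whereas the EHMM emits alignment columns, and no rescaling is applied, so this only suits short sequences.

```python
STATES = ("intergenic", "coding")

# Toy parameters, assumed for illustration only.
INITIAL = {"intergenic": 0.9, "coding": 0.1}
TRANSITION = {
    "intergenic": {"intergenic": 0.9, "coding": 0.1},
    "coding": {"intergenic": 0.2, "coding": 0.8},
}
EMISSION = {
    "intergenic": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def forward(seq):
    """f[i][k] = P(x_1..x_i, state at position i is k)."""
    rows = [{k: INITIAL[k] * EMISSION[k][seq[0]] for k in STATES}]
    for symbol in seq[1:]:
        prev = rows[-1]
        rows.append({k: EMISSION[k][symbol]
                     * sum(prev[j] * TRANSITION[j][k] for j in STATES)
                     for k in STATES})
    return rows


def backward(seq):
    """b[i][j] = P(x_{i+1}..x_L | state at position i is j)."""
    rows = [{k: 1.0 for k in STATES}]
    for symbol in reversed(seq[1:]):
        nxt = rows[0]
        rows.insert(0, {j: sum(TRANSITION[j][k] * EMISSION[k][symbol] * nxt[k]
                               for k in STATES)
                        for j in STATES})
    return rows


def expected_counts(seq):
    """Expected state usage and expected transition counts given seq."""
    f, b = forward(seq), backward(seq)
    likelihood = sum(f[-1].values())
    usage = {k: sum(f[i][k] * b[i][k] for i in range(len(seq))) / likelihood
             for k in STATES}
    transitions = {
        (j, k): sum(f[i][j] * TRANSITION[j][k] * EMISSION[k][seq[i + 1]] * b[i + 1][k]
                    for i in range(len(seq) - 1)) / likelihood
        for j in STATES for k in STATES
    }
    return usage, transitions


usage, transitions = expected_counts("ACGGCA")
print(usage)
print(transitions)
```

The maximization step would then renormalize these expected counts into new transition and emission estimates.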

20 The likelihood of an alignment x, given a parameterization of the EHMM, is found by summing over all possible paths. This can be done in time linear in the alignment length by dynamic programming.
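In symbols, with θ an assumed name for the parameterization, the sum over paths is P(x | θ) = Σ_π P(x, π | θ). A minimal sketch of the dynamic programming, usually called the forward algorithm, follows. The toy HMM and its probabilities are assumptions. In the EHMM, the emission term would be the probability of a whole alignment column rather than of a single nucleotide. A brute-force sum over every path is included only to check the result.

```python
from itertools import product

STATES = ("intergenic", "coding")

# Toy parameters, assumed for illustration only.
INITIAL = {"intergenic": 0.9, "coding": 0.1}
TRANSITION = {
    "intergenic": {"intergenic": 0.9, "coding": 0.1},
    "coding": {"intergenic": 0.2, "coding": 0.8},
}
EMISSION = {
    "intergenic": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def forward_likelihood(seq):
    """P(seq) by dynamic programming: one pass, linear in len(seq)."""
    f = {k: INITIAL[k] * EMISSION[k][seq[0]] for k in STATES}
    for symbol in seq[1:]:
        f = {k: EMISSION[k][symbol] * sum(f[j] * TRANSITION[j][k] for j in STATES)
             for k in STATES}
    return sum(f.values())


def brute_force_likelihood(seq):
    """P(seq) by explicitly summing the joint probability of every path."""
    total = 0.0
    for path in product(STATES, repeat=len(seq)):
        p = INITIAL[path[0]] * EMISSION[path[0]][seq[0]]
        for i in range(1, len(seq)):
            p *= TRANSITION[path[i - 1]][path[i]] * EMISSION[path[i]][seq[i]]
        total += p
    return total


seq = "ACGTA"
print(forward_likelihood(seq), brute_force_likelihood(seq))
```

The brute-force version visits a number of paths that grows exponentially with the sequence length, which is why the dynamic programming matters. For long alignments the forward values would normally be rescaled or kept in log space to avoid underflow.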

21 The EHMM is fully probabilistic and can be used both to simulate data and to find genes. The EUKARYOTIC GENOME MODEL can be used to generate alignments. The reduced model produces only inner exons.

22 Results: benefits of modelling evolution with an EHMM, using a data set of orthologous mouse/human gene pairs. The benefit will depend on the divergence between the sequences compared. The key parameter for modelling the difference between exons and introns is the dN/dS ratio.

23

24 Moreover, the evolutionary model shows a distinct difference between the intergenic/intron state and the codon state.

25 Evaluations were performed on both single and aligned sequences

26 Graphical Representation

27 The simple model used at present is not comparable to state-of-the-art methods. Any number of aligned sequences can be handled.

28 Extensions of the model: GENSCAN can be extended into an EHMM. Splice site finders. Models of ribosome binding sites and promoter regions. Non-geometric length distributions of exons. Pseudo higher-order EHMMs can be constructed. The idea of the pair HMM can be extended to multiple sequences.

29 Disadvantages of the present model: the existing framework does not model gaps but treats them as missing data. The optimal data for an EHMM is a multiple alignment of full-length genomes. The challenge in constructing the alignment is to reduce the noise-to-signal ratio. BUT ...

