1
Finding Genes Based on Comparative Genomics
Robin Raffard
November 30th, 2004
CS 374
2
References
Main references
Multiple-sequence functional annotation and the generalized hidden Markov phylogeny. McAuliffe J., Pachter L., Jordan M. 2004.
Computational identification of evolutionarily conserved exons. Siepel A., Haussler D. 2004.
Additional references
Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Boffelli D., McAuliffe J., Ovcharenko D., Lewis K., Ovcharenko I., Pachter L., Rubin E.
A hidden Markov model approach to variation among sites in rate of evolution. Felsenstein J., Churchill G.
Statistical Methods in Bioinformatics (Statistics for Biology and Health series). Ewens W., Grant G.
3
Problem formulation
DNA consists of genes (functional sequences) separated by intergenic regions (nonfunctional sequences).
[Figure: a DNA strand with Gene 1, Gene 2, and Gene 3 separated by intergenic regions; example sequence ATCATTACGCGGCTTAGCCCTTATAGCGATACGATGACAGATGACAA]
6
Problem formulation
DNA consists of genes (functional sequences) separated by intergenic regions (nonfunctional sequences).
Problem: find genes using comparative genomics.
Key: exons are conserved through evolution.
[Figure: a DNA strand with Gene 1, Gene 2, and Gene 3]
7
In Practice
>human
AGTGAGACACGACGAGCCTACTATCAGGACGAGAGCAGGAGAGTGAT
GATGAGTAGCGCACAGCGACGATCATCACGAGAGAGTAAGAAGCAGTG
ATGATGTAGAGCGACGAGAGCACAGCGGCGACTACTACTAGG
>mouse
AGTGTGTCTCGTCGTGCCTACTTTCAGGACGAGAGCAGGTGAGTGTTG
ATGAGTTGCGCTCTGCGACGTTCATCTCGAGTGAGTTAGAAAGTGAAG
GTATAACACAAGGTGTGAAGGCAGTGATGATGTAGAGCGACGAGAGCA
CAGCGGCGGGATGATATATCTAGGAGGATGCCCAATTTTTTTTT
>platypus
CTCTGCGGCGTTCGTCTCGGGTGGGTTGGGGGGTGGGGGTGTGGCG
CAAGGTGTGAAGCACGACGACGATCTACGACGAGCGAGTGATGAGAG
TGATGAGCGACGACGAGCACTAGAAGCGACGACTACTATCGACGAGCA
GCCGAGATGATGATGAAAGAGAGAGAA
8
2 Questions
1st question: which genomes to compare, human/mouse or human/primates?
2nd question: how to extract genes from this comparison?
9
Outline
Human/Mouse vs Human/Primate
–Advantages of Human/Mouse
–Advantages of Human/Primate
–Conclusion
Gene Finding
–Phylogenetic tree
–Hidden Markov Chain
–Hidden Markov Phylogeny
Contributions of the 2 papers
10
Functional sequences in Human/Mouse/Primates
[Figure: % similarity plotted along the DNA sequence]
11
Advantage of Human/Mouse
Functional sequences are easy to identify.
12
Disadvantage of Human/Mouse
Some human genes are not present in the mouse genome, so they cannot be extracted from a mouse/human comparison.
[Figure: Human vs. Mouse]
13
Human/Primates
14
Phylogenetic shadowing
15
Phylogenetic shadowing on real data
[Figure: likelihood of mutation (log scale) plotted along the DNA sequence]
16
Motivating Example: Gene apo(a)
Plasma protein
Important cardiovascular disease risk predictor
[Figure labels: Absent, Present]
17
Phylogenetic shadowing of apo(a)
[Figure: likelihood of mutation (log scale) plotted along the DNA sequence]
18
So Human/Mouse or Human/Primate?
Old genes: Human/Mouse (noncoding sequences differ strongly)
New genes: Human/Primate (straightforward alignment of coding sequences)
19
Outline
Human/Mouse vs Human/Primate
–Advantages of Human/Mouse
–Advantages of Human/Primate
–Conclusion
Gene Finding
–Phylogenetic tree
–Hidden Markov Chain
–Hidden Markov Phylogeny
Contributions of the 2 papers
20
Naive way of extracting genes
Drawbacks:
1. Is not flexible/probabilistic.
2. Does not respect gene structure.
21
1st step: Phylogenetic tree
Given a nucleotide, is it functional or not?
[Figure labels: Species, Nucleotide 1, Nucleotide 2]
22
Primate phylogeny
[Figure: primate phylogenetic tree labeled with nucleotides A, T, T, G, A, A]
23
[Figure: phylogenetic tree labeled with nucleotides A, A, T, A, G, A, A, A, C, A]
Observed nucleotides
Which nucleotide?
Which rate α?
24
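Algorithm
Given the observed nucleotides, find the most likely rate α.
Mathematically, $\hat{\alpha} = \arg\max_{\alpha} \mathrm{Prob}(\text{observed nucleotides} \mid \alpha)$.
[The remaining equations of this slide are not preserved in the transcript.]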
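The rate search on this slide can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the slides: the Jukes-Cantor (JC69) substitution model, the three-species toy tree `TREE` with its branch lengths, and the grid search over α. The likelihood of one alignment column is computed with Felsenstein's pruning recursion, and the rate that maximizes it is returned.

```python
import math

NUCS = "ACGT"

def jc69(t):
    """JC69 transition matrix for a branch of length t (substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partial(node, leaves, alpha):
    """Pruning step: likelihood of the data below `node` for each nucleotide at `node`."""
    if isinstance(node, str):  # leaf, identified by its species name
        return [1.0 if NUCS[i] == leaves[node] else 0.0 for i in range(4)]
    out = [1.0] * 4
    for child, length in node:  # internal node: list of (child, branch length)
        p = jc69(alpha * length)  # the rate alpha rescales every branch
        below = partial(child, leaves, alpha)
        for i in range(4):
            out[i] *= sum(p[i][j] * below[j] for j in range(4))
    return out

def column_likelihood(tree, leaves, alpha):
    """Likelihood of one alignment column, with a uniform prior at the root."""
    return sum(0.25 * x for x in partial(tree, leaves, alpha))

def ml_rate(tree, leaves, grid):
    """Grid-search maximum-likelihood estimate of the rate alpha."""
    return max(grid, key=lambda a: column_likelihood(tree, leaves, a))

# Hypothetical tree ((human, chimp), baboon) with made-up branch lengths.
TREE = [([("human", 0.1), ("chimp", 0.1)], 0.05), ("baboon", 0.2)]
GRID = [0.01 * k for k in range(1, 501)]

conserved = {"human": "A", "chimp": "A", "baboon": "A"}
variable = {"human": "A", "chimp": "G", "baboon": "T"}
print("conserved column:", ml_rate(TREE, conserved, GRID))
print("variable column: ", ml_rate(TREE, variable, GRID))
```

A fully conserved column is best explained by the smallest rate on the grid, while a column in which the species disagree favors a larger rate. That contrast is what lets a per-nucleotide rate estimate separate slowly evolving (likely functional) positions from fast-evolving ones.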
25
Phylogenetic tree: Results
Drawback: no biological model built in
26
Gene structure
A gene finder should account for:
Promoter region, about 50 bases upstream of the gene
TATA: start of transcription
5’ untranslated region
3’ untranslated region
27
Gene Model
[Figure: gene model state diagram with labels Exon, Intron, TATA and states S1 to S6]
28
Hidden Markov Chain Model
Composed of:
1. A sequence of unobservable states S1, S2, S3, …, Sn, where each Si is exon or intron. The jump from Si to Si+1 follows a Markov chain: P(Si+1 | Si).
2. A sequence of (sequences of) letters O1, O2, O3, …, On, which are emitted by the states (according to P(Oi | Si)) and which are observed.
[Figure: chain of states S1 to S7, each emitting a letter, O1 … O7 = ACGTACG…, with an example transition P(S5 | S4) and an example emission P(O1 | S1)]
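For reference, the two ingredients of this slide combine into a single joint probability. The factorization below is the standard one for a hidden Markov model; the initial distribution $P(S_1)$ is an assumption, since the slide does not mention it.

```latex
P(S_1,\dots,S_n,\,O_1,\dots,O_n)
  \;=\; P(S_1)\,\prod_{i=1}^{n-1} P(S_{i+1}\mid S_i)\,\prod_{i=1}^{n} P(O_i\mid S_i)
```

Each transition term depends only on the previous state and each emission term only on the current state, which is what makes dynamic programming over the states possible.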
29
Viterbi Algorithm
Given a sequence of observed letters O1, …, On, find the most likely sequence of unobservable states S1, …, Sn.
Mathematically, find $S^* = \arg\max_{S_1,\dots,S_n} \mathrm{Prob}(S_1,\dots,S_n, O_1,\dots,O_n)$.
2 steps:
1. Compute max Prob(S,O) via dynamic programming: max Prob(S1,…,Si+1,O) = f ( max Prob(S1,…,Si,O) )
2. Find a sequence of states which achieves the optimum: Si = argmax max Prob(S1,…,Si,O).
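The two steps can be sketched as follows. This is a minimal illustration, not the implementation used in either paper: the two-state exon/intron model and all probabilities in `START`, `TRANS`, and `EMIT` are made-up values (a GC-rich "exon" state and an AT-rich "intron" state), chosen only to make the example easy to follow.

```python
import math

def viterbi(obs, states, start, trans, emit):
    """Most probable state path for `obs`, computed in log space."""
    # Step 1: dynamic programming. best[i][s] is the log-probability of the
    # best path that ends in state s after emitting obs[0..i].
    best = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + math.log(trans[r][s]))
            pointers[s] = prev
            scores[s] = best[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][o])
        best.append(scores)
        back.append(pointers)
    # Step 2: trace back from the best final state to recover the path.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical parameters, for illustration only.
STATES = ["exon", "intron"]
START = {"exon": 0.5, "intron": 0.5}
TRANS = {"exon": {"exon": 0.9, "intron": 0.1},
         "intron": {"exon": 0.1, "intron": 0.9}}
EMIT = {"exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

print(viterbi("GCGCGCGCATATATAT", STATES, START, TRANS, EMIT))
```

Working in log space avoids numerical underflow on long sequences, where the product of many small probabilities would otherwise round to zero.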
30
Generalized hidden Markov phylogeny
Combines the 2 concepts:
Phylogenetic tree + Hidden Markov chain = Generalized hidden Markov phylogeny
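One common way to write this combination, given here as a hedged sketch rather than the exact formulation of either paper, is to let each hidden state emit a whole alignment column and to score that column with a phylogenetic likelihood. The symbols are assumptions introduced for illustration: $X_i$ is column $i$ of the alignment, $y$ ranges over nucleotide assignments to the tree nodes that agree with $X_i$ at the leaves, $\pi_s$ is the root distribution for state $s$, and $P_s(\cdot \mid \cdot\,; t_{uv})$ is the state-specific substitution probability along the branch from $u$ to $v$ of length $t_{uv}$.

```latex
P(O_i \mid S_i = s) \;=\; P(X_i \mid s)
  \;=\; \sum_{y} \pi_s\!\left(y_{\mathrm{root}}\right)
        \prod_{(u \to v)} P_s\!\left(y_v \mid y_u ;\, t_{uv}\right)
```

With emissions defined this way, the hidden Markov chain supplies the gene structure along the sequence, and the tree supplies the evolutionary evidence at each position.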
31
Global Method
Get a series of DNA sequences
Align them
Build the Generalized Hidden Markov Model
Train the parameters on sample genes
Find the hidden states Si
The coding sequences are the exons
32
Contributions of the 1st paper
1st to implement the Hidden Markov Phylogeny on the Primate/Human phylogeny.
Requires only 5 primate species.
Able to annotate the apo(a) gene.
[Figure: Gene Finders]
33
Contributions of the 2nd paper
Implements a sophisticated Hidden Markov Phylogeny on the Human/Mouse phylogeny:
1. Context-dependent phylogenetic models (high-order Markov chain: the emission of one state also depends on the neighboring states). Computationally more expensive, but more accurate.
2. Explicit modeling of conserved non-coding sequences.
3. Modeling of insertions and deletions.
34
Results of the 2nd paper
[Figure: Gene Finders]
35
Conclusion
Genes can be found by comparing genomes.
Mouse/Human for old genes
Primate/Human for recent genes
In either case, the same tool extracts the coding sequences: Hidden Markov Phylogeny
Future: improve the Markov model, sequence more genomes.
36
Thank you!
Questions?