David Haussler Howard Hughes Medical Institute University of California, Santa Cruz Assembly, Comparison, and Annotation of Mammalian Genomes.


1 David Haussler Howard Hughes Medical Institute University of California, Santa Cruz Assembly, Comparison, and Annotation of Mammalian Genomes

2 Bioinformatics of mammalian genomes: Sequence assembly. Genome browsers: new computational microscopes. Computing evolution's path: key to understanding function.

3 Assembling the human genome. GigAssembler (Kent) – Built the first draft of the human genome from lower-level contigs produced by Phrap (P. Green). Celera Assembler (Myers/Sutton) – First mammalian whole-genome shotgun assembler. Figure: outgoing UCSC internet traffic (green) for the year 2000. The main peak is activity on July 7, 2000, when the human sequence was first posted on the web.

4 Assembling other mammalian genomes. Arachne (Jaffe/Batzoglou, Lander group at MIT) – Built the first draft of the mouse genome, February 2002. – Mouse was also assembled by the Phusion assembler (Mullikin, Sanger Centre). Atlas (Havlak/Chen/Durbin, Gibbs group at Baylor) – Built the first draft of the rat genome, November 2002.

5 Browsers as web-based genome microscopes. Ensembl Browser (Birney et al.). MapViewer (NCBI MapViewer team). UCSC Genome Browser (Kent et al.), http://genome.ucsc.edu, currently getting more than 140,000 page requests per day.

6 Browsers take you from early maps of the genome...

7 ... to a multi-resolution view...

8 ... at the gene cluster level...

9 ... the single gene level...

10 ... the single exon level...

11 ... and at the single base level caggcggactcagtggatctggccagctgtgacttgacaag caggcggactcagtggatctagccagctgtgacttgacaag

12 ... linking to functional information. Figure: in situ image from I. Dragatsis et al., 1998.

13 Goal: the browser as a continuously tuned engine for discovery. Multiple streams of high-throughput genomics data are generated asynchronously. The data are fed into nightly updates of the browser database, analysis, and display. The browser becomes a new kind of microscope, scanning the genome at ever greater detail, dimension, and depth.

14 Using evolution to find genes and other functional elements. Figure: mouse conservation pattern in the IGFALS gene on human chr. 16 and a known transcription factor binding site. R. Weber, L. Elnitski et al.

15 At least half of the human genome consists of relics of retrotransposons

16 Ancestral retrotransposons. Retrotransposon relics from our common ancestor with mouse and other placental mammals cover 22% of the human genome. “AR” sites can be used to study neutral evolution: mutation without selection. “AR” sites are similar to “4D” sites in genes (four-fold degenerate sites in codons).
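Finding AR sites requires genome-wide repeat annotation (for example from RepeatMasker), which is beyond a short example, but 4D sites can be picked out from a coding sequence alone. The sketch below assumes the standard genetic code (NCBI translation table 1) and an in-frame coding sequence without introns; `fourfold_prefixes` and `fourfold_sites` are hypothetical helper names, not functions from any tool mentioned in the talk.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with bases in the order T, C, A, G; the first codon position varies slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def fourfold_prefixes():
    """First-two-base codon prefixes whose amino acid is the same for all
    four possible third bases."""
    return {
        b1 + b2
        for b1 in BASES
        for b2 in BASES
        if len({CODE[b1 + b2 + b3] for b3 in BASES}) == 1
    }


def fourfold_sites(cds):
    """0-based positions of four-fold degenerate (4D) sites in an in-frame
    coding sequence. A trailing partial codon is ignored."""
    cds = cds.upper()
    prefixes = fourfold_prefixes()
    sites = []
    for start in range(0, len(cds) - 2, 3):
        if cds[start:start + 2] in prefixes:
            sites.append(start + 2)  # the third codon position
    return sites
```

For example, `fourfold_sites("GCTAAAGGC")` returns `[2, 8]`: the third bases of the alanine (GCT) and glycine (GGC) codons, while the lysine codon AAA contributes no 4D site because AAR and AAY encode different amino acids.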

17 Estimated rate of neutral substitution from AR and 4D sites co-varies along the chromosomes. R. Hardison, K. Roskin, S. Yang, A. Smit, et al.
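A neutral substitution rate estimate starts from the fraction of aligned sites (AR or 4D) at which human and mouse differ. The raw mismatch fraction underestimates the number of substitutions because a site can change more than once. The sketch below uses the Jukes-Cantor correction, the simplest model in the family named on slide 26; it is an illustrative assumption, not the model the authors used, and the helper names are hypothetical.

```python
import math


def mismatch_fraction(seq_a, seq_b):
    """Fraction of aligned positions (gaps and ambiguous bases skipped)
    at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable aligned sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p):
    """Expected substitutions per site implied by mismatch fraction p
    under the Jukes-Cantor model: d = -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the JC distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

With a mismatch fraction of 0.3, the corrected distance is about 0.383 substitutions per site. Computing such a distance separately for the AR sites and the 4D sites in each window along a chromosome gives two local estimates that can be compared.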

18 By comparison to local neutral substitution rates, it appears that about 5% of the human genome may be under purifying selection. K. Roskin, R. Weber, F. Chiaromonte

19 More species increase the power to detect conserved elements. Browser snapshot: Human, Chimp, Baboon, Cat, Dog, Pig, Cow, Rat, Mouse, Chicken, Zebrafish, Fugu, Tetraodon. Data from Eric Green at NHGRI; alignments by Webb Miller. About 4% of the CFTR region is under purifying selection.

20–25 Models of molecular evolution. Branch length equals the average number of substitutions per site. Figure: a phylogenetic tree whose node bases are revealed over successive builds, ending with A A G A T G G T T.

26 Continuous-time Markov models of molecular evolution can be used to calculate p-values for conservation. The conditional probability distribution on each branch has the form $P = e^{Qt}$, where $t$ is the time and $Q$ is a $4 \times 4$ rate matrix. Parameterizations of $Q$: JC, …, HKY, REV, UNR.
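To make $P = e^{Qt}$ concrete, the sketch below builds the Jukes-Cantor (JC) rate matrix, scaled so that $t$ is in expected substitutions per site, and exponentiates it numerically. The pure-Python matrix exponential (scaling and squaring with a truncated Taylor series) is only there to keep the example dependency-free; real code would more likely call `scipy.linalg.expm`. Function names are hypothetical. For JC the result can be checked against the closed form $P_{aa}(t) = \tfrac14 + \tfrac34 e^{-4t/3}$.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=20, squarings=8):
    """Approximate e^M: scale M down by 2^squarings, sum a truncated Taylor
    series, then square the result back up."""
    n = len(m)
    scale = 2.0 ** squarings
    a = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def jc_rate_matrix():
    """Jukes-Cantor Q with total substitution rate 1 per site:
    off-diagonal 1/3, diagonal -1, so every row sums to zero."""
    return [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]


def transition_matrix(q, t):
    """P(t) = e^{Qt}; P[a][b] is the probability that base a is replaced
    by base b along a branch of length t."""
    return mat_exp([[x * t for x in row] for row in q])
```

Swapping in an HKY or REV rate matrix changes only `jc_rate_matrix`; the exponentiation step is the same for every parameterization of $Q$.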

27–29 Calculation of p-values. The p-value is the probability of getting a given parsimony score or better under a continuous-time Markov model of evolution. p-values are calculated recursively for the two subtrees, for all possible values of parsimony score and ancestral bases for each subtree; the data for the subtrees are combined to produce the p-value at the root. Method developed by Mathieu Blanchette and Martin Tompa.
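The recursion on these slides can be realized exactly in several ways. The sketch below is one such way, not necessarily Blanchette and Tompa's formulation: for each subtree and each true ancestral base it keeps a probability distribution over Sankoff cost vectors (minimum substitutions below a node for each label the node could have), combines child distributions at every internal node, and reads the parsimony score off at the root. Tracking whole cost vectors keeps it exact but grows with tree size. It assumes a Jukes-Cantor model, a uniform root distribution, unit-cost parsimony, and a hypothetical tree format in which a node is `(children, branch_length)` and a leaf has an empty children list.

```python
import math
from collections import defaultdict

BASES = range(4)  # 0..3 stand for A, C, G, T


def jc_transition(t):
    """Jukes-Cantor P(t), with t in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return [[0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e for b in BASES]
            for a in BASES]


def upward(node):
    """E[b]: distribution over edge cost vectors e(x) = min_y (s(y) + [x != y])
    sent to the parent, given that this node's true base is b."""
    children, _ = node
    if not children:  # leaf: the observed base is the true base
        return [{tuple(int(x != b) for x in BASES): 1.0} for b in BASES]
    out = []
    for s_dist in subtree_costs(node):
        d = defaultdict(float)
        for s, p in s_dist.items():
            d[tuple(min(s[y] + (x != y) for y in BASES) for x in BASES)] += p
        out.append(dict(d))
    return out


def subtree_costs(node):
    """S[a]: distribution over Sankoff cost vectors s(x) of an internal node,
    given that its true base is a."""
    children, _ = node
    dist = [{(0, 0, 0, 0): 1.0} for _ in BASES]
    for child in children:
        p = jc_transition(child[1])
        e_child = upward(child)
        for a in BASES:
            msg = defaultdict(float)  # mix over the child's true base b
            for b in BASES:
                for e, q in e_child[b].items():
                    msg[e] += p[a][b] * q
            new = defaultdict(float)  # independent children: cost vectors add
            for s, q1 in dist[a].items():
                for e, q2 in msg.items():
                    new[tuple(si + ei for si, ei in zip(s, e))] += q1 * q2
            dist[a] = dict(new)
    return dist


def parsimony_score_distribution(root):
    """P(parsimony score = k) for an internal root with a uniform base prior."""
    scores = defaultdict(float)
    for dist in subtree_costs(root):
        for s, p in dist.items():
            scores[min(s)] += 0.25 * p
    return dict(scores)


def conservation_p_value(root, observed_score):
    """Probability of a parsimony score <= observed_score (at least as conserved)."""
    dist = parsimony_score_distribution(root)
    return sum(p for k, p in dist.items() if k <= observed_score)
```

For a column in which every species carries the same base, the observed score is 0, and `conservation_p_value(tree, 0)` is the probability of that much agreement under the neutral model alone; a small value flags the column as a candidate conserved site.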

30 Examples of conserved regions Analysis of CFTR region by Mathieu Blanchette

31 Regulatory modules Mathieu Blanchette

32 Conserved RNA structure in a 3’ UTR Mathieu Blanchette

33 Intronic RNA structural element. Figure labels: 73 kb to ST7 1st exon; 73 kb to ST7 2nd exon; ~90 bp conserved stem. Mathieu Blanchette

34 Modeling different modes of substitution. We want to pay attention to how elements are conserved, not just that they are conserved.

35 Context matters: substitution rate matrix for non-coding dinucleotides. Adam Siepel

36 Dinucleotide and trinucleotide models fit substitution data from neutral regions much better. Figure: improvement in log likelihood on AR sites for higher-order models of base substitution. Adam Siepel
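The idea behind a log-likelihood comparison of this kind can be shown with a much simpler discrete model than the continuous-time context models on these slides. The sketch below assumes each neutral site is summarized as an (ancestral trinucleotide, descendant middle base) pair, fits the descendant-base probabilities by maximum likelihood (relative frequencies) either from the ancestral base alone or from the full trinucleotide context, and reports each model's log-likelihood. The event counts are made-up toy data with a CpG-like excess of C to T changes, and all names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(events, key):
    """Maximized log-likelihood of the descendant bases when their
    probabilities are conditioned on key(event) and fit by relative frequency."""
    joint = Counter((key(ev), ev[1]) for ev in events)
    marginal = Counter(key(ev) for ev in events)
    return sum(n * math.log(n / marginal[k]) for (k, _), n in joint.items())


def single_site(ev):
    return ev[0][1]  # condition only on the ancestral middle base


def trinucleotide(ev):
    return ev[0]  # condition on the middle base and both neighbours


# Hypothetical toy data: (ancestral trinucleotide, descendant middle base).
events = [("ACG", "T")] * 3 + [("ACG", "C")] + [("ACA", "C")] * 4

ll_single = log_likelihood(events, single_site)
ll_tri = log_likelihood(events, trinucleotide)
print(f"single-site: {ll_single:.3f}  trinucleotide: {ll_tri:.3f}")
```

Because the context model nests the single-site model, its maximized log-likelihood can never be lower, so in practice the gain would be weighed against the extra parameters (for example with a likelihood-ratio test) before concluding that context matters.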

37 Method also produces improved models of codon evolution Adam Siepel

38 Phylogenetic HMMs TAATGGTA…CCAGTTA…GCAGAGT… CCATGGTT…CCCGTAG…CCAGAGT… TAATGGTA…CCGGTTA…ACAGAGT… TTATGGTA…CCTGTTA…ACAGAGT… CGATGGTG…CCGGTCG…ACAGAGC… CTATGGTC…CCTGTTA…TCAGAGC… GTATGGTC…CCTGTCG…TCAGAGC… CCATGGTT…CCCGTAG…CCAGAGT… human baboon mouse dog cat cow pig chicken Adam Siepel
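A phylogenetic HMM treats each alignment column as emitted by a hidden state, with the emission probability computed on a phylogeny. The sketch below is a deliberately small two-state version under stated assumptions: a star tree (one ancestor, every species on a branch of the same length), a Jukes-Cantor model, a conserved state that uses a short branch and a neutral state that uses a longer one, and Viterbi decoding of the most probable state path. Branch lengths, the `stay` probability, and all function names are hypothetical, and columns must contain only A, C, G, T.

```python
import math


def jc_prob(a, b, t):
    """Jukes-Cantor probability that base a becomes base b over length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def column_likelihood(column, t):
    """P(column) on a star tree with branch length t and a uniform
    ancestral base: sum over the ancestor of the product over species."""
    return sum(0.25 * math.prod(jc_prob(a, x, t) for x in column)
               for a in "ACGT")


def viterbi(columns, branch_lengths, stay=0.95):
    """Most probable state path; state k emits columns using a star tree
    with branch length branch_lengths[k]."""
    n_states = len(branch_lengths)
    switch = (1.0 - stay) / (n_states - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]
    emit = [[math.log(column_likelihood(col, t)) for t in branch_lengths]
            for col in columns]
    score = [math.log(1.0 / n_states) + emit[0][k] for k in range(n_states)]
    back = []  # back[i-1][k]: best predecessor of state k at column i
    for i in range(1, len(columns)):
        new_score, pointers = [], []
        for k in range(n_states):
            best = max(range(n_states), key=lambda j: score[j] + log_trans[j][k])
            pointers.append(best)
            new_score.append(score[best] + log_trans[best][k] + emit[i][k])
        score = new_score
        back.append(pointers)
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Each column is one string with a base per species (for example, position i of every row in the alignment above, after removing gaps). With `branch_lengths=[0.05, 1.0]`, a run of identical columns followed by a run of highly variable columns decodes as state 0 (conserved) and then state 1 (neutral). A realistic phylo-HMM would use the actual species tree, richer substitution models, and more states (for example, codon positions).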

39 Human splice variants of ZNF278 conserved in mouse. Chuck Sugnet. Comparative cDNA analysis finds alternatively spliced genes.

40 Molecular evolution is more than base substitutions: insertions, deletions, duplications, inversions, rearrangements.

41 Genome-wide human-mouse alignments reveal a host of multibase evolutionary events. Figure: a 15,000-base inversion on human chromosome 7 containing two genes. J. Kent, W. Miller, R. Baertsch

42 Hot spots for rearrangements? At finer resolution, many thousands of syntenic blocks between human and mouse are found, and short blocks are clustered in clumps J. Kent, W. Miller, R. Baertsch

43 Grand challenge of human molecular evolution: reconstruct the evolutionary history of each base in the human genome.

44 Credits. Thanks to Jim Kent, Terry Furey, Mathieu Blanchette, Adam Siepel, Chuck Sugnet, Ryan Weber, Krishna Roskin, Mark Diekhans, Robert Baertsch, Matt Schwartz, Angie Hinrichs, Donna Karolchik, Heather Trumbower, Yontao Lu, Fan Hsu, Daryl Thomas, Jorge Garcia, Patrick Gavin and Paul Tatarsky at UCSC. Thanks also to Francis Collins, Bob Waterston, Eric Lander, Richard Gibbs, Eric Green, Elliot Margulies, David Kulp, Alan Williams, Ray Wheeler, Webb Miller, Ross Hardison, Scott Schwartz, Francesca Chiaromonte, Thomas Pringle, Greg Schuler, Deanna Church, Steve Sherry, Ewan Birney, Michelle Clamp, David Jaffe, Asif Chinwalla, Jim Mullikin, Tim Hubbard, Arian Smit, Nick Goldman, Barbara Trask, Ian Dunham, Sean Eddy, Evan Eichler, David Cox, Carol Bult, and many other outside collaborators.

