Published by Calvin Fleming. Modified over 9 years ago.
1
Introduction to Genomics and the Tree of Life
Friday, October 22, 2010 (part 1); Monday, October 25, 2010 (part 2)
Genomics 260.605.01
J. Pevsner, pevsner@kennedykrieger.org
2
Copyright notice
Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics, 2nd edition, by J. Pevsner (© 2009 by Wiley-Blackwell). These images and materials may not be used without permission from the publisher (instructors, email me at pevsner@kennedykrieger.org). Visit http://www.bioinfbook.org
3
Announcements: where/when we meet
We meet 3 times a week, from 10:30 to 11:50 am, in W4013 (lecture/discussion and occasional computer lab).
4
Announcements: book, website
Textbook: Bioinformatics and Functional Genomics (2nd edition, Wiley-Blackwell, 2009) by J. Pevsner, ISBN 978-0-470-08585-1. We’ll cover chapters 13-20 in this course.
For those who don’t want to buy a copy, I will share PDFs of all the chapters with the class.
You can buy a copy at the website www.bioinfbook.org and get a nice discount ($80). It’s $80 at Amazon.com. The JHU bookstore may have copies. Welch Library may have copies.
Book’s website: www.bioinfbook.org
Course website: http://www.bioinfbook.org/genomics.php or visit www.bioinfbook.org/chapter 13
5
Outline of this course
Introduction to genomics
Viruses
Bacteria and archaea (Egbert Hoiczyk)
Eukaryotes
The eukaryotic chromosome
Fungi; yeast functional genomics (Jef Boeke)
Protozoans (David Sullivan)
Nematodes (Al Scott)
Mosquitoes (George Dimopoulos)
Rodents: mouse and rat
Primates
The human genome (Dave Valle)
Human disease
6
Outline of today’s lecture
Introduction: 5 perspectives, history of life
Genome-sequencing projects: chronology
Genome analysis: criteria, resequencing, metagenomics
DNA sequencing technologies: Sanger, 454, Solexa
Process of genome sequencing: centers, repositories
Genome annotation: features, prokaryotes, eukaryotes
7
Five approaches to genomics
As we survey the tree of life, consider these perspectives:
Approach I: cataloguing genomic information. Genome size; number of chromosomes; GC content; isochores; number of genes; repetitive DNA; unique features of each genome
Approach II: cataloguing comparative genomic information. Orthologs and paralogs; COGs; lateral gene transfer
Approach III: function; biological principles; evolution. How genome size is regulated; polyploidization; birth and death of genes; neutral theory of evolution; positive and negative selection; speciation
Approach IV: human disease relevance
Approach V: bioinformatics aspects. Algorithms, databases, websites
Page 519
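Two of the Approach I quantities (genome size and GC content) can be computed directly from a FASTA file. The following is a minimal sketch, not part of the original slides: the function names `read_fasta` and `gc_content` and the toy sequences are hypothetical, and ambiguous bases such as N are simply ignored when computing GC content.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into {record_name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            # Use the first word of the header as the record name.
            name = header.split()[0] if header else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def gc_content(sequence):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / (gc + at)


# Toy input (made up for illustration, not a real genome).
TOY_FASTA = """>toy_chr1 hypothetical example
ATGCGC
GGAT
>toy_chr2
AATT
"""

if __name__ == "__main__":
    genome = read_fasta(TOY_FASTA)
    total_bp = sum(len(seq) for seq in genome.values())
    print("chromosomes:", len(genome))
    print("genome size (bp):", total_bp)
    for chrom, seq in genome.items():
        print(chrom, "GC fraction:", round(gc_content(seq), 3))
```

For a real genome, the same functions could be applied to a FASTA file downloaded from a sequence repository; windowed GC content (relevant to isochores) would need an additional sliding-window step not shown here.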
8
Two projects for this course
Option [1] Select a genome and describe it in detail.
Option [2] Select a gene and describe it in detail.
For each, follow the five approaches just outlined, and apply the principles that we learn in this course.
9
Reading: Webb Miller et al. (2004) Comparative genomics
Introduction
Lessons learned from comparative genomics
What have we learned about genes by comparing genomic sequences?
What have we learned about regulation?
About 5% of the human genome is under purifying selection
Positively regulated regions
Mechanisms and history of mammalian evolution
Nonuniformity of neutral evolutionary rates within species
Nonuniformity of evolution along the branches of phylogeny
Learning more from existing data
Choice of species
Choice of tools
Future of comparative genomics
10
Levels of analysis in genomics
level        topics                  databases
DNA          genes, chromosomes      GenBank
RNA          ESTs, ncRNA             UniGene, GEO
protein      ORFs, composition       UniProt
complexes    binary, multimeric      BIND
pathways     -                       COGs, KEGG
organelles   -                       -
organs       -                       -
individuals  variation and disease   HapMap
species      speciation              TaxBrowser; SGD
genus        -                       JAX mouse
phylum       -                       FishBase
kingdom      -                       TOL
11
Definitions of terms
Genomics is the study of genomes (the DNA comprising an organism) using the tools of bioinformatics.
Bioinformatics is the study of proteins, genes, and genomes using computer algorithms and databases.
Systematics is the scientific study of the kinds and diversity of organisms and of any and all relationships among them.
Classification is the ordering of organisms into groups on the basis of their relationships. The relationships may be evolutionary (phylogenetic) or may refer to similarities of phenotype (phenetic).
Taxonomy is the theory and practice of classifying organisms.
12
Outline of today’s lecture
Introduction: 5 perspectives, history of life: trees
Genome-sequencing projects: chronology
Genome analysis: criteria, resequencing, metagenomics
DNA sequencing technologies: Sanger, 454, Solexa
Process of genome sequencing: centers, repositories
Genome annotation: features, prokaryotes, eukaryotes
13
Fig. 13.1 Page 521 Pace (2001) described a tree of life based on small subunit rRNA sequences. This tree shows the main three branches described by Woese and colleagues.
14
Ernst Haeckel (1834-1919), a supporter of Darwin, published a tree of life (1879) including Monera (formless clumps, later named bacteria). Introduction: Systematics Page 520
15
Five kingdom system (Haeckel, 1879). Figure labels: plants, animals, monera, fungi, protists, protozoa, invertebrates, vertebrates, mammals. Page 516
16
Chatton (1937) distinguished prokaryotes (bacteria that lack nuclei) from eukaryotes (having nuclei). Whittaker (1969) and others described the five-kingdom system: animals, plants, protists, fungi, and monera. In the 1970s and 1980s, Carl Woese and colleagues described the archaea, thus forming a tree of life with three main branches: archaea, bacteria, eukaryotes. Introduction: Systematics Page 520
17
Whittaker (1969): The two-kingdom system as it might have appeared in the early 1900s. Kingdoms: Plantae, Animalia.
Whittaker RH (1969) New concepts of kingdoms of organisms. Evolutionary relations are better represented by new classifications than by the traditional two kingdoms. Science. 163(3863):150-60.
18
The Copeland four-kingdom system of the 1930s-1950s. Kingdoms: Monera, Metaphyta, Metazoa, Protoctista. Figure labels: prokaryotic, eukaryotic; unicellular, multicellular.
Whittaker RH (1969) New concepts of kingdoms of organisms. Evolutionary relations are better represented by new classifications than by the traditional two kingdoms. Science. 163(3863):150-60.
19
Whittaker (1969): The five-kingdom system. Kingdoms: Plantae, Fungi, Animalia, Monera, Protista. Levels: prokaryotic (Monera); eukaryotic unicellular; eukaryotic multicellular.
Whittaker RH (1969) New concepts of kingdoms of organisms. Evolutionary relations are better represented by new classifications than by the traditional two kingdoms. Science. 163(3863):150-60.
20
Molecular sequences as basis of trees
Historically, trees were generated primarily from morphological characters. Molecular sequence data are now commonly used, including highly conserved sequences such as small-subunit ribosomal RNAs (SSU rRNAs). Visit the European Small Subunit Ribosomal RNA database for 20,000 SSU rRNA sequences. Page 523
21
Pace (2001) described a tree of life based on small subunit rRNA sequences. This tree shows the main three branches described by Woese and colleagues. It is the best currently accepted model of the tree of life. Fig. 13.1 Page 521
22
Tree of life from David Hillis’ lab (based on ~3000 rRNAs): http://www.zo.utexas.edu/faculty/antisense/Download.html Figure labels: animals, plants, fungi, protists, bacteria, archaea; "you are here" marker. 10-10
23
Tree of life from David Hillis’ lab (based on ~3000 rRNAs): http://www.zo.utexas.edu/faculty/antisense/Download.html Figure label: "you are here" marker. 10-10
24
Ribosomal RNA Database: Ribosomal Database Project http://rdp.cme.msu.edu/index.jsp
Santos, S. R. and Ochman, H. Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environmental Microbiology. 2004 Jul;6(7):754-9.
► Download fusA (translation elongation factor 2 [EF-2])
► Obtain DNA in the FASTA format
► Align by ClustalW in MEGA
► Create a neighbor-joining tree
Page 524 10-10
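The last step of the workflow above (building a neighbor-joining tree from pairwise distances) can be illustrated with a minimal sketch of the Saitou and Nei neighbor-joining procedure. This is not how MEGA is implemented and it uses no fusA data: the function name `neighbor_joining`, the internal node labels (`node0`, `node1`, ...), and the five-taxon toy distance matrix are all assumptions made for illustration. In practice the distances would come from the ClustalW alignment.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining.

    names: list of leaf labels (must not clash with 'node0', 'node1', ...).
    dist:  dict mapping frozenset({a, b}) -> distance for every pair of leaves.
    Returns a list of (node, node, branch_length) edges.
    """
    if len(names) < 2:
        raise ValueError("need at least two taxa")
    d = dict(dist)              # working copy, extended with internal nodes
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: sum of distances from node i to every other active node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r[i] - r[j].
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{next_id}"
        next_id += 1
        edges.append((i, u, li))
        edges.append((j, u, lj))
        # Distances from u to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Toy distance matrix (hypothetical values, chosen to be tree-like).
TOY_NAMES = ["a", "b", "c", "d", "e"]
_PAIRS = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
TOY_DIST = {frozenset(pair): float(v) for pair, v in _PAIRS.items()}

if __name__ == "__main__":
    for left, right, length in neighbor_joining(TOY_NAMES, TOY_DIST):
        print(f"{left} -- {right}: {length:g}")
```

The sketch recomputes the row sums at every step, which is fine for a handful of taxa; for ~150 sequences or more, an established phylogenetics package is the more sensible choice.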
26
European Small Subunit Ribosomal RNA database (http://www.psb.ugent.be/rRNA/ssu/) 10-10
27
Neighbor-joining tree of ~150 fusA (GTPase) DNA sequences. Labeled taxa: Rickettsia, Treponema, Mycobacterium, Aquifex aeolicus, Yersinia pestis, Clostridium, Mycoplasma, Bacillus anthracis.
28
Fig. 15.1 Page 603
30
Eukaryotes (Baldauf et al. 2000) Fig. 18.1 Page 730
31
Outline of today’s lecture
Introduction: 5 perspectives, history of life: time lines
Genome-sequencing projects: chronology
Genome analysis: criteria, resequencing, metagenomics
DNA sequencing technologies: Sanger, 454, Solexa
Process of genome sequencing: centers, repositories
Genome annotation: features, prokaryotes, eukaryotes
32
History of life on earth
4.55 BYA: formation of earth (violent 100 MY period)
4.4-3.8 BYA: last ocean-evaporating impacts
3.9 BYA: oldest dated rocks
3.8 BYA: sun brightened to 70% of today’s luminosity
Ammonia, methane, or carbon dioxide atmosphere. Earliest life: RNA, protein
Source: Schopf J.W. (ed.), Life’s Origins (U. Calif. Press, 2002) Page 521
33
History of life on earth: two major eons
Precambrian eon: extends from the formation of the planet to the appearance of fossils of hard-shelled animals 550 MYA
Phanerozoic eon: from the Cambrian explosion to the present
[Time axis: 4, 3, 2, 1 BYA]
Source: Schopf J.W. (ed.), Life’s Origins (U. Calif. Press, 2002)
34
[Timeline figure. Time axis: 4 to 0 billions of years ago (BYA). Eons: Hadean, Archean, Proterozoic, Phanerozoic. Events marked: origin of life; earliest fossils; origin of eukaryotes; plant/animal and fungi/animal divergences; insects.] Page 522
35
[Timeline figure. Time axis: 1,000 to 0 millions of years ago (MYA). Eons: Proterozoic, Phanerozoic. Events marked: deuterostome/protostome and echinoderm/chordate divergences; Cambrian explosion; land plants; insects; Age of Reptiles ends.] Page 522
36
[Timeline figure. Time axis: 100 to 0 MYA. Events marked: mass extinction; dinosaurs extinct; mammalian radiation; human/chimp divergence.] Page 522
37
[Timeline figure. Time axis: 10 to 0 MYA. Events marked: Homo sapiens/chimp divergence; Australopithecus (Lucy); earliest stone tools; emergence of Homo erectus.] Page 522
38
[Timeline figure. Time axis: 1,000,000 to 0 years ago. Events marked: Homo erectus emerges in Africa; Mitochondrial Eve.] Page 523
39
[Timeline figure. Time axis: 100,000 to 0 years ago. Events marked: emergence of anatomically modern H. sapiens; Neanderthal and Homo erectus disappear.] Page 523
40
[Timeline figure. Time axis: 10,000 to 0 years ago. Events marked: “Ice Man” from Alps; earliest pyramids; Aristotle.] Page 523
41
[Timeline figure. Time axis: 1,000 to 0 years ago. Events marked: algebra; Gutenberg; calculus; Darwin, Mendel.] Page 523
42
Page 524 Today’s continents derive from earlier land masses (Laurasia, Gondwana), affecting evolution of species
43
Outline of today’s lecture
Introduction: 5 perspectives, history of life: time lines
Genome-sequencing projects: chronology
Genome analysis: criteria, resequencing, metagenomics
DNA sequencing technologies: Sanger, 454, Solexa
Process of genome sequencing: centers, repositories
Genome annotation: features, prokaryotes, eukaryotes
44
We will next summarize the major achievements in genome sequencing projects from a chronological perspective. Chronology of genome sequencing projects Page 525
45
Chronology of genome sequencing projects
1976: first viral genome. Fiers et al. sequence bacteriophage MS2 (3,569 base pairs, accession NC_001417).
1977: Sanger et al. sequence bacteriophage φX174. This virus is 5,386 base pairs (encoding 11 genes). See accessions J02482 and NC_001422.
Page 527
46
Fig. 13.5 Page 528 Entrez nucleotide record for bacteriophage φX174 (graphics display)
47
Chronology of genome sequencing projects
1981: Human mitochondrial genome. 16,500 base pairs (encodes 13 proteins, 2 rRNA, 22 tRNA). Today (10/10), over 2200 mitochondrial genomes sequenced.
1986: Chloroplast genome. 156,000 base pairs (most are 120 kb to 200 kb).
Page 527
48
mitochondrion chloroplast Lack mitochondria (?)
49
http://www.ncbi.nlm.nih.gov/genomes/ORGANELLES/organelles.html Entrez Genomes organelle resource at NCBI 10-10
50
There are ~2500 sequenced eukaryotic organelle genomes (10/10)
51
MitoDat: resource for organelle genomes. http://www-lecb.ncifcrf.gov/mitoDat/
“This database is dedicated to the nuclear genes specifying the enzymes, structural proteins, and other proteins, many still not identified, involved in mitochondrial biogenesis and function. MitoDat highlights predominantly human nuclear-encoded mitochondrial proteins.”
Not updated recently. 10-10
52
http://www.mitomap.org/ MitoMap: resource for organelle genomes 10-10
53
It is possible to map mutations in human mitochondrial DNA that are responsible for disease
54
1995: first genome of a free-living organism, the bacterium Haemophilus influenzae Chronology of genome sequencing projects Page 530
55
1995: genome of the bacterium Haemophilus influenzae is sequenced Fig. 13.7 Page 531
56
How to find information about a genome: at NCBI, choose All databases, then Genome, then follow the link to Bacteria
57
Overview of bacterial complete genomes (2000) n=30
58
Overview of bacterial complete genomes (2010) n=3,330
59
Fig. 12.9 Page 411 You can find functional annotation through the COGs database (Clusters of Orthologous Genes)
60
Click the circle to access the genome sequence
61
You can find functional annotation through the COGs database (Clusters of Orthologous Genes) Entrez Genome view of H. influenzae (October 2009)
62
Click the circle to access the genome sequence Genes are color-coded according to the COGs scheme
63
1996: first eukaryotic genome The complete genome sequence of the budding yeast Saccharomyces cerevisiae was reported. We will describe this genome soon. Also in 1996, TIGR reported the sequence of the first archaeal genome, Methanococcus jannaschii. Chronology of genome sequencing projects Page 532
64
1996: a yeast genome is sequenced
65
To learn about a genome of interest, visit NCBI, then TaxBrowser, then Genome Projects
66
To learn about a genome of interest, follow the TaxBrowser and Genome Projects links. Size (in megabases) and number of chromosomes are given here
68
To place the sequencing of the yeast genome in context, these are the eukaryotes…
69
Tree of eukaryotes (Baldauf et al. 2000) Fungi
70
Chronology of genome sequencing projects
1997: More bacteria and archaea. Escherichia coli: 4.6 megabases, 4200 proteins (38% of unknown function).
1998: first multicellular organism. Nematode Caenorhabditis elegans: 97 Mb; 19,000 genes.
1999: first human chromosome. Chromosome 22 (49 Mb, 673 genes).
Page 532
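The sizes and gene counts quoted above differ widely in gene density, which is easy to check with a few lines of arithmetic. This is a rough sketch using only the figures on this slide; the helper name `genes_per_mb` is hypothetical, and the E. coli entry uses the protein count as a stand-in for the gene count.

```python
# (label, genome size in Mb, gene count), as quoted on the slide.
# Assumption: for E. coli the protein count (4200) stands in for genes.
GENOMES = [
    ("Escherichia coli", 4.6, 4200),
    ("Caenorhabditis elegans", 97.0, 19000),
    ("Human chromosome 22", 49.0, 673),
]


def genes_per_mb(size_mb, n_genes):
    """Gene density: number of genes per megabase of sequence."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return n_genes / size_mb


if __name__ == "__main__":
    for label, size_mb, n_genes in GENOMES:
        density = genes_per_mb(size_mb, n_genes)
        print(f"{label}: about {density:.0f} genes per Mb")
```

With these numbers the bacterial genome comes out at roughly 900 genes per Mb, the nematode at roughly 200, and human chromosome 22 at roughly 14.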
71
See the article by Webb Miller et al. (2004), “Comparative genomics” for a discussion of annotation and analysis progress made since 1998
72
1999: Human chromosome 22 sequenced
73
49 Mb, 701 genes
74
Chronology of genome sequencing projects
2000: Fruit fly Drosophila melanogaster (13,000 genes); plant Arabidopsis thaliana; human chromosome 21.
2001: draft sequence of the human genome (public consortium and Celera Genomics).
Page 534
75
To explore human chromosome 21 at NCBI: find MapViewer, choose human, click chromosome 21
76
2000
77
2001: draft human genome sequence
2002: S. pombe (just 4,800 genes)
2004: “finished” human genome
2007: first individual human genome
2009: 1000 Genomes Project
78
Outline of Monday’s lecture (Chapter 13)
Introduction: 5 perspectives, history of life: time lines
Genome-sequencing projects: chronology
Genome analysis: criteria, resequencing, metagenomics
DNA sequencing technologies: Sanger, 454, Solexa
Process of genome sequencing: centers, repositories
Genome annotation: features, prokaryotes, eukaryotes