Completed Genomes: Viruses and Bacteria Monday, October 20, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner


Completed Genomes: Viruses and Bacteria Monday, October 20, 2003 Introduction to Bioinformatics ME: J. Pevsner

Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by J. Pevsner. Copyright © 2003 by Wiley. These images and materials may not be used without permission from the publisher. Copyright notice

We are now beginning the last third of the course:
-- Today: completed genomes (Chapters 12-14)
-- Wednesday: Fungi. Exam #2 is due at the start of class.
-- Next Monday: Functional genomics (Jef Boeke)
-- Next Wednesday: Pathways (Joel Bader)
-- Monday Nov. 3: Eukaryotic genomes
-- Wednesday Nov. 5: Human genome
-- Monday Nov. 10: Human disease
-- Wednesday Nov. 12: Final exam (in class)
Announcements

-- Genome projects (Chapter 12): chronological overview; major issues and themes
-- Introduction to viruses (Chapter 13): classification; bioinformatics challenges and resources
-- Introduction to bacteria and archaea (Chapter 14): classification; bioinformatics challenges and resources
Outline of today’s lecture

A genome is the collection of DNA that comprises an organism. Today we have assembled the sequence of hundreds of genomes. We will begin by introducing the “tree of life” in an effort to make a comprehensive survey of life forms. Introduction to genomes Page 397

Ernst Haeckel, a supporter of Darwin, published a tree of life (1879) including Moner (formless clumps, later named bacteria). Chatton (1937) distinguished prokaryotes (bacteria that lack nuclei) from eukaryotes (having nuclei). Whittaker and others described the five-kingdom system: animals, plants, protists, fungi, and monera. In the 1970s and 1980s, Carl Woese and colleagues described the archaea, thus forming a tree of life with three main branches. Introduction: Systematics Page 399

plants animals monera fungi protists protozoa invertebrates vertebrates mammals Five kingdom system (Haeckel, 1879) Page 396

Fig Page 400 Pace (2001) described a tree of life based on small subunit rRNA sequences. This tree shows the main three branches described by Woese and colleagues.

Historically, trees were generated primarily using characters provided by morphological data. Molecular sequence data are now commonly used, including sequences (such as small-subunit RNAs) that are highly conserved. Visit the European Small Subunit Ribosomal RNA database for 20,000 SSU rRNA sequences. Molecular sequences as basis of trees Page 401

Genomes that span the tree of life are being sequenced at a rapid rate. There are several web-based resources that document the progress, including:
-- GNN: Genome News Network
-- GOLD: Genomes Online Database
-- PEDANT: Protein Extraction, Description & Analysis Tool
Genome sequencing projects Page 405

There are three main resources for genomes:
-- EBI: European Bioinformatics Institute
-- NCBI: National Center for Biotechnology Information
-- TIGR: The Institute for Genomic Research
Genome sequencing projects Page 405

archaea bacteria eukaryota

Overview of viral complete genomes

Overview of archaea complete genomes

Overview of eukaryota genomes in NCBI’s Entrez division

Overview of eukaryota genomes in NCBI’s Entrez division

We will next summarize the major achievements in genome sequencing projects from a chronological perspective. Chronology of genome sequencing projects Page 404

1977: first viral genome. Sanger et al. sequence bacteriophage φX174. This virus is 5386 base pairs (encoding 11 genes). See accession J
Human mitochondrial genome: 16,500 base pairs (encodes 13 proteins, 2 rRNA, 22 tRNA). Today, over 400 mitochondrial genomes have been sequenced.
1986: chloroplast genome. 156,000 base pairs (most are 120 kb to 200 kb).
Chronology of genome sequencing projects Page 406

Fig Page 407 Entrez nucleotide record for bacteriophage φX174 (graphics display)

mitochondrion chloroplast Lack mitochondria (?)

1995: first genome of a free-living organism, the bacterium Haemophilus influenzae Chronology of genome sequencing projects Page 409

1995: genome of the bacterium Haemophilus influenzae is sequenced Fig Page 411

Overview of bacterial complete genomes

Fig Page 411 You can find functional annotation through the COGs database (Clusters of Orthologous Genes)

Fig Page 411 Click the circle to access the genome sequence

Fig Page 412 Click the circle to access the genome sequence Genes are color-coded according to the COGs scheme

1996: first eukaryotic genome The complete genome sequence of the budding yeast Saccharomyces cerevisiae was reported. We will describe this genome on Wednesday. Also in 1996, TIGR reported the sequence of the first archaeal genome, Methanococcus jannaschii. Chronology of genome sequencing projects Page 413

1996: a yeast genome is sequenced

To place the sequencing of the yeast genome in context, these are the eukaryotes…

Eukaryotes (Baldauf et al. 2000) Fungi

1997: more bacteria and archaea. Escherichia coli: 4.6 megabases, 4200 proteins (38% of unknown function).
1998: first multicellular organism. Nematode Caenorhabditis elegans: 97 Mb; 19,000 genes.
1999: first human chromosome. Chromosome 22 (49 Mb, 673 genes).
Chronology of genome sequencing projects Page 413

1999: Human chromosome 22 sequenced

49 Mb, 673 genes

2000: Fruitfly Drosophila melanogaster (13,000 genes) Plant Arabidopsis thaliana Human chromosome : draft sequence of the human genome (public consortium and Celera Genomics) Chronology of genome sequencing projects Page 415

2000

Completed genome projects (current)
Eukaryotes: 10
-- Anopheles gambiae
-- Arabidopsis thaliana
-- Caenorhabditis elegans
-- Drosophila melanogaster
-- Encephalitozoon cuniculi
-- Guillardia theta nucleomorph
-- Mus musculus
-- Plasmodium falciparum
-- Saccharomyces cerevisiae (yeast)
-- Schizosaccharomyces pombe
In progress (partial):
-- Danio rerio (zebrafish)
-- Glycine max (soybean)
-- Hordeum vulgare (barley)
-- Leishmania major
-- Rattus norvegicus
Viruses: 1419
Bacteria: 139
Archaea: 36
Page 417

eukaryotes

[1] Selection of genomes for sequencing [2] Sequence one individual genome, or several? [3] How big are genomes? [4] Genome sequencing centers [5] Sequencing genomes: strategies [6] When has a genome been fully sequenced? [7] Repository for genome sequence data [8] Genome annotation Overview of genome analysis Page 418

Fig Page 418

[1] Selection of genomes for sequencing is based on criteria such as:
-- genome size (some plant genomes are far larger than the human genome)
-- cost
-- relevance to human disease (or other disease)
-- relevance to basic biological questions
-- relevance to agriculture
Ongoing projects: chicken, chimpanzee, cow, dog (recent publication), fungi (many), honey bee, sea urchin, rhesus macaque
Overview of genome analysis Page 419

[2] Sequence one individual genome, or several? Try one… --Each genome center may study one chromosome from an organism --It is necessary to measure polymorphisms (e.g. SNPs) in large populations (November 5) For viruses, thousands of isolates may be sequenced. For the human genome, cost is the impediment. Overview of genome analysis Page 419

[3] How big are genomes? Viral genomes: 1 kb to 350 kb (Mimivirus: 800 kb) Bacterial genomes: 0.5 Mb to 13 Mb Eukaryotic genomes: 8 Mb to 686 Mb (discussed further on Monday, November 3) Overview of genome analysis Page 420

Figure: genome sizes in nucleotide base pairs for viruses, plasmids, bacteria, fungi, plants, algae, insects, mollusks, bony fish, amphibians, reptiles, birds, and mammals. The size of the human genome is ~3 × 10^9 bp; almost all of its complexity is in single-copy DNA. The human genome is thought to contain ~30,000-40,000 genes.

[4] 20 Genome sequencing centers contributed to the public sequencing of the human genome. Many of these are listed at the Entrez genomes site. (See Table 17.6, page 625.) Overview of genome analysis Page 421

[5] There are two main strategies for sequencing genomes. Whole Genome Shotgun (from the NCBI website): An approach used to decode an organism's genome by shredding it into smaller fragments of DNA which can be sequenced individually. The sequences of these fragments are then ordered, based on overlaps in the genetic code, and finally reassembled into the complete sequence. The 'whole genome shotgun' (WGS) method is applied to the entire genome all at once, while the 'hierarchical shotgun' method is applied to large, overlapping DNA fragments of known location in the genome. Overview of genome analysis Page 421
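The "order by overlaps, then reassemble" step can be illustrated with a toy greedy assembler. This is only a sketch under strong assumptions: reads are error-free, come from one strand, and contain no repeats. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, and real assemblers use far more elaborate methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the largest overlap.

    Toy model only: assumes error-free, same-strand reads without repeats.
    Returns the list of contigs left when no overlaps remain.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping fragments of a made-up 15-base sequence
reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))
```

If the fragments do not overlap sufficiently, the function returns several contigs rather than one sequence, which loosely mirrors the gaps found in a draft assembly.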

Hierarchical shotgun method Assemble contigs from various chromosomes, then sequence and assemble them. A contig is a set of overlapping clones or sequences from which a sequence can be obtained. The sequence may be draft or finished. A contig is thus a chromosome map showing the locations of those regions of a chromosome where contiguous DNA segments overlap. Contig maps are important because they provide the ability to study a complete, and often large segment of the genome by examining a series of overlapping clones which then provide an unbroken succession of information about that region. Overview of genome analysis Page 421

[6] When has a genome been fully sequenced? A typical goal is to obtain five to ten-fold coverage. Finished sequence: a clone insert is contiguously sequenced with high quality standard of error rate 0.01%. There are usually no gaps in the sequence. Draft sequence: clone sequences may contain several regions separated by gaps. The true order and orientation of the pieces may not be known. Overview of genome analysis Page 422
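Fold coverage can be estimated as (number of reads × read length) / genome size. The sketch below computes it for the 4.6 Mb E. coli genome mentioned earlier; the 500-base read length is an assumed, illustrative value, and the estimate of the unsequenced fraction (e^-c) comes from the standard Lander-Waterman model, which assumes reads land uniformly at random. Neither is taken from the lecture, and the function names are hypothetical.

```python
import math


def fold_coverage(n_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target fold coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def expected_fraction_unsequenced(coverage):
    """Lander-Waterman estimate, assuming uniformly random read placement."""
    return math.exp(-coverage)


genome_size = 4_600_000   # E. coli, 4.6 Mb
read_length = 500         # assumed read length, for illustration only

n = reads_needed(8, read_length, genome_size)
print(n)                                   # reads for eight-fold coverage
print(fold_coverage(n, read_length, genome_size))
print(expected_fraction_unsequenced(8))    # fraction of bases expected to be missed
```

Under this model, coverage in the five- to ten-fold range leaves well under 1% of bases unsequenced, which is one way to see why that range is a typical goal.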

[7] Repository for genome sequence data Raw data from many genome sequencing projects are stored at the trace archive at NCBI or EBI (main NCBI page, bottom right) Overview of genome analysis Page 425

Fig Page 426

Fig Page 426

[8] Genome annotation Information content in genomic DNA includes: -- repetitive DNA elements -- nucleotide composition (GC content) -- protein-coding genes, other genes These topics will be discussed in detail on November 3 (eukaryotic genomes) Overview of genome analysis Page 425
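Nucleotide composition is the simplest of these features to compute. As a minimal sketch, GC content is the percentage of G and C among the unambiguous bases; the function names, the window size, and the choice to ignore ambiguous bases such as N are assumptions made for illustration.

```python
def gc_content(seq):
    """Percentage of G+C among the A, C, G, T bases of seq (other symbols ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


def sliding_gc(seq, window, step):
    """GC content in consecutive windows, useful for spotting local variation."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


print(gc_content("ATGCGCGCTA"))
print(sliding_gc("AAAAGGGGATAT", window=4, step=4))
```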

Fig Page 428 GC content varies across genomes. (Figure: number of species in each GC class plotted against GC content (%), for vertebrates, invertebrates, plants, and bacteria.)

Viruses are small, infectious, obligate intracellular parasites. They depend on host cells to replicate. Because they lack the resources for independent existence, they exist on the borderline of the definition of life. The virion (virus particle) consists of a nucleic acid genome surrounded by coat proteins (capsid) that may be enveloped in a host-derived lipid bilayer. Viral genomes consist of either RNA or DNA. They may be single-stranded, double-stranded, or partially double-stranded. The genomes may be circular, linear, or segmented. Introduction to viruses Page 437

Viruses have been classified by several criteria: -- based on morphology (e.g. by electron microscopy) -- by type of nucleic acid in the genome -- by size (rubella is about 2 kb; HIV-1 about 9 kb; poxviruses are several hundred kb). Mimivirus (for Mimicking microbe) has a double-stranded circular genome of 800 kb. -- based on human disease Page 438 Introduction to viruses

Fig Page 439

Fig Page 440 The International Committee on Taxonomy of Viruses (ICTV) offers a website, accessible via NCBI’s Entrez site

Vaccine-preventable viral diseases include: Hepatitis A Hepatitis B Influenza Measles Mumps Poliomyelitis Rubella Smallpox Page 441 Introduction to viruses

Some of the outstanding problems in virology include: -- Why does a virus such as HIV-1 infect one species (human) selectively? -- Why do some viruses change their natural host? In 1997 a chicken influenza virus killed six people. -- Why are some viral strains particularly deadly? -- What are the mechanisms of viral evasion of the host immune system? -- Where did viruses originate? Bioinformatic approaches to viruses Page

The unique nature of viruses presents special challenges to studies of their evolution:
-- viruses tend not to survive in historical samples
-- viral polymerases of RNA genomes typically lack proofreading activity
-- viruses undergo an extremely high rate of replication
-- many viral genomes are segmented; shuffling may occur
-- viruses may be subjected to intense selective pressures (host immune responses, antiviral therapy)
-- viruses invade diverse species
-- the diversity of viral genomes precludes us from making comprehensive phylogenetic trees of viruses
Diversity and evolution of viruses Page 441

Herpesviruses are double-stranded DNA viruses that include herpes simplex, cytomegalovirus, and Epstein-Barr. Phylogenetic analysis suggests three major groups that originated about MYA. Bioinformatic approaches to herpesvirus Page 442

Fig Page 443

Consider human herpesvirus 8 (HHV-8). Its genome is about 140,000 base pairs and encodes about 80 proteins. We can explore this virus at the NCBI website. Try NCBI → Entrez → Genomes → viruses → dsDNA. Bioinformatic approaches to herpesvirus Page 442

Fig Page 444

Fig Page 445

Fig Page 449

Consider human herpesvirus 8 (HHV-8). Its genome is about 140,000 base pairs and encodes about 80 proteins. Microarrays have been used to define changes in viral gene expression at different stages of infection (Paulose-Murphy et al., 2001). Conversely, gene expression changes have been measured in human cells following viral infection. Bioinformatic approaches to herpesvirus Page 442

Fig Page 450 Paulose-Murphy et al. (2001) described HHV-8 viral genes that are expressed at different times post infection

Human Immunodeficiency Virus (HIV) is the cause of AIDS. At the end of the year 2002, 42 million people were infected. HIV-1 and HIV-2 are primate lentiviruses. The HIV-1 genome is 9181 bases in length. Note that there are almost 100,000 Entrez nucleotide records for this genome (but only one RefSeq entry). Phylogenetic analyses suggest that HIV-2 appeared as a cross-species contamination from a simian virus, SIVsm (sooty mangabey). Similarly, HIV-1 appeared from simian immunodeficiency virus of the chimpanzee (SIVcpz). Bioinformatic approaches to HIV Page 446

Fig Page 446

Two major resources are NCBI and the Los Alamos National Laboratory (LANL) databases. See LANL offers -- an HIV BLAST server -- Synonymous/non-synonymous analysis program -- a multiple alignment program -- a PCA-like tool -- a geography tool Bioinformatic approaches to HIV Page 453
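To make the idea of synonymous versus non-synonymous analysis concrete, the sketch below compares two codon-aligned coding sequences and counts codons whose change keeps the amino acid (synonymous) or alters it (non-synonymous). It is a simplified illustration using the standard genetic code, not the method of the LANL program; the function name `classify_codon_changes` is hypothetical, and a real analysis would also normalize by the number of synonymous and non-synonymous sites.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_codon_changes(ref, alt):
    """Count (synonymous, non-synonymous) codon differences.

    Assumes ref and alt are gap-free, in-frame DNA sequences of equal length.
    """
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(ref), 3):
        a, b = ref[i:i + 3], alt[i:i + 3]
        if a == b:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1      # same amino acid, e.g. GCT -> GCC (both Ala)
        else:
            nonsyn += 1   # different amino acid, e.g. AAA (Lys) -> AGA (Arg)
    return syn, nonsyn


print(classify_codon_changes("ATGGCTAAA", "ATGGCCAGA"))
```

A relative excess of non-synonymous changes is commonly read as a sign of selective pressure, which is why this kind of comparison is of interest for rapidly evolving viruses.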

Fig Page 452

Fig Page 446

Bacteria and archaea constitute two of the three main branches of life. Together they are the prokaryotes. We can classify prokaryotes based on six criteria: [1] morphology [2] genome size [3] lifestyle [4] relevance to human disease [5] molecular phylogeny (rRNA) [6] molecular phylogeny (other molecules) Bacteria and archaea: genome analysis Page 466

Fig Page 468

Fig Page 470 M. genitalium has one of the smallest bacterial genome sizes. View its genome at

We may distinguish six prokaryotic lifestyles: [1] Extracellular (e.g. E. coli) [2] Facultatively intracellular (e.g. Mycobacterium tuberculosis) [3] Extremophilic (e.g. M. jannaschii) [4] Epicellular (e.g. Mycoplasma pneumoniae) [5] Obligate intracellular and symbiotic (e.g. B. aphidicola) [6] Obligate intracellular and parasitic (e.g. Rickettsia) Bacteria and archaea: lifestyles Page 472

Fig Page 477

Fig Page 478 Revised figure

Fig Page 479

DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae Nature 406, (2000)

Four main features of genomic DNA are useful: [1] Open reading frame length [2] Consensus for ribosome binding (Shine-Dalgarno) [3] Pattern of codon usage [4] Homology of putative gene to other genes Bacteria and archaea: finding genes Page 480
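The first feature, open reading frame length, can be sketched in a few lines. The example scans the three forward reading frames for an ATG followed in-frame by a stop codon and keeps ORFs above a length threshold. It is a deliberately naive illustration: the function name `find_orfs` and the `min_codons` threshold are assumptions, the reverse strand is not scanned, and features [2] to [4] (ribosome binding site, codon usage, homology) are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find ORFs on the forward strand only.

    Returns a list of (start, end, frame) tuples, with end exclusive and the
    stop codon included. min_codons counts codons from ATG up to, but not
    including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs


dna = "CCATGAAACCCGGGTAACC"
for start, end, frame in find_orfs(dna):
    print(frame, start, end, dna[start:end])
```

In practice the length threshold is set much higher (hundreds of bases), since short ORFs occur frequently by chance.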

Fig Page 482 GLIMMER for gene-finding in bacteria (

Fig Page 484 Lateral gene transfer occurs in stages

COGs database: organisms and tools

COGs database: functional annotation

COGs database: distribution of COGs by number of species COGs database: distribution of COGs by number of clades...

How can whole genomes be compared? -- molecular phylogeny -- You can BLAST (or PSI-BLAST) all the DNA and/or protein in one genome against another -- TaxPlot and COG for bacterial (and for some eukaryotic) genomes -- PipMaker, MUMmer and other programs align large stretches of genomic DNA from multiple species

Fig Page 493

Fig Page 493

Fig Page 494

Fig Page 495