Exploiting Gene Clusters to Curate Annotations October, 2003 Ross Overbeek, Fellowship for Interpretation of Genomes (FIG)


Outline of the Talk
- The Emerging Opportunity
- The Use of Clusters to Find “Missing Genes”
- Experiences with a Single Pathway
- “The Project”
- Tools Needed to Support the Project

Three “Laws”
- The amount of available DNA sequence data will double every 18 months.
- The number of available genomes will double every 18 months.
- The cost of sequencing will drop by a factor of 2 every 18 months.
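As a rough illustration of what an 18-month doubling time implies, the sketch below computes the growth factor over a chosen horizon. The function name `growth_factor` and the 36-month horizon are illustrative assumptions, not part of the talk.

```python
def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Multiplicative growth after `months`, given a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


# Three years is two doubling periods, so the factor is 2 * 2.
print(growth_factor(36))  # 4.0
```

Under this assumption, any quantity that doubles every 18 months grows fourfold in three years; the same factor applies, inverted, to a cost that halves on that schedule.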

Basic Facts
- We have about [number missing] publicly available, more-or-less complete genomes.
- We will have about 1000 complete genomes within 3 years.
- This will lead to better annotations, not worse.
- The majority of annotations will need to be automated, and the process must accurately follow the steps that a human expert would take.

The Use of Clusters to Find Missing Genes

Central Machinery of Life: Horizons of gene discovery
- 3, ,000 functional roles (300 – 3,000 per organism)
- Largely conserved across the three kingdoms (sequences; functions; pathways)
- “Missing genes” are still there

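Missing genes in metabolic pathways: making a case
[Figure: a pathway over compounds A to F catalyzed by enzymes E1, E2 and E3, with gene labels “gene A”, “gene B”, “?” and “gene C”; the “?” marks the missing gene.]
Globally missing gene: never identified in any species.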

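Missing genes in metabolic pathways: making a case
[Figure: a pathway over compounds A to F catalyzed by enzymes E1, E2 and E3, with gene labels “gene A”, “gene B”, “?” and “gene C”; the “?” marks the missing gene.]
Locally missing gene: non-orthologous gene displacement.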

Techniques of genome context analysis (I): checking neighbors
Gene clustering on the chromosome (operons)
[Figure: genes shown per genome. Genome 1: A, C, R, T, G, X. Genome 2: A, M, X, N, C, Y. Genome 3: A, S, U, X, Y, Q.]
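The neighbor check can be sketched as counting, for an anchor gene, how many genomes place each other gene family close to it on the chromosome. This is a minimal sketch under stated assumptions: the function name `neighbor_counts`, the `window` parameter, and the toy gene orders are hypothetical and are not the data or the scoring used in the talk.

```python
from collections import Counter


def neighbor_counts(genomes, anchor, window=2):
    """Count, per gene family, the number of genomes in which it lies
    within `window` positions of `anchor` in the gene order."""
    counts = Counter()
    for order in genomes.values():
        if anchor not in order:
            continue
        i = order.index(anchor)
        nearby = set(order[max(0, i - window): i + window + 1]) - {anchor}
        counts.update(nearby)
    return counts


# Hypothetical gene orders, written as family labels only.
genomes = {
    "genome1": ["A", "C", "R", "T", "G", "X"],
    "genome2": ["N", "A", "M", "X", "C", "Y"],
    "genome3": ["Q", "A", "S", "U", "X", "Y"],
}
counts = neighbor_counts(genomes, anchor="A", window=3)
print(counts["C"], counts["X"])  # 2 2
```

Families that recur near the anchor in several genomes (here C and X) are the ones worth inspecting as functionally coupled candidates; a real analysis would also weigh how distantly related the genomes are.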

Techniques of genome context analysis (II): checking connections
Protein fusion events
[Figure: Genome 1 has separate genes A and C. Genome 3 has gene A and a C/Z fusion. Genome 4 has an A/X fusion and gene C. Genome 5 has a C/A fusion.]
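A fusion check can be sketched as finding pairs of families that are encoded as one protein in some genome and as separate proteins in some genome. This is an illustrative sketch: the function name `fusion_links` and the representation of each protein as a tuple of family labels are assumptions, and the toy genomes only loosely follow the labels in the figure.

```python
from itertools import combinations


def fusion_links(genomes):
    """Return family pairs that appear fused in one protein in at least one
    genome and as separate single-family proteins in at least one genome."""
    fused, separate = set(), set()
    for proteins in genomes.values():
        singles = {p[0] for p in proteins if len(p) == 1}
        for p in proteins:
            if len(p) > 1:
                fused.update(frozenset(pair) for pair in combinations(p, 2))
        separate.update(
            frozenset(pair) for pair in combinations(sorted(singles), 2)
        )
    return fused & separate


# Each protein is a tuple of family labels; more than one label means a fusion.
genomes = {
    "genome1": [("A",), ("C",)],
    "genome3": [("A",), ("C", "Z")],
    "genome4": [("A", "X"), ("C",)],
    "genome5": [("C", "A")],
}
print([sorted(pair) for pair in fusion_links(genomes)])  # [['A', 'C']]
```

Only the A/C pair is reported, because it is the only pair seen both fused (genome 5) and as two separate genes (genome 1).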

Techniques of genome context analysis (III): co-regulation
Shared regulatory sites (regulons)
[Figure: genes shown per genome. Genome 1: A, C, R, T, X. Genome 2: A, W, C. Genome 5: C/A fusion, R, X.]
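The co-regulation check can be sketched as asking which genes carry the same candidate regulatory site in their upstream region. In this sketch a regular expression stands in for a real site model (such as a position weight matrix); the function name `genes_with_site`, the motif, and the sequences are all hypothetical.

```python
import re


def genes_with_site(upstream, motif):
    """Return the genes whose upstream sequence contains a match to `motif`,
    a regular expression standing in for a regulatory-site model."""
    pattern = re.compile(motif)
    return sorted(g for g, seq in upstream.items() if pattern.search(seq))


# Hypothetical upstream regions and a hypothetical palindromic site.
upstream = {
    "geneA": "ACGTTGACAGCTTGTCAAT",
    "geneC": "GGTGACATTATGTCACC",
    "geneW": "ACGTACGTACGTACGTAC",
}
motif = "TGACA[ACGT]{3}TGTCA"
print(genes_with_site(upstream, motif))  # ['geneA', 'geneC']
```

Genes that share the site are candidates for membership in the same regulon, which is the evidence this technique contributes to a missing-gene case.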

Techniques of genome context analysis (IV): co-evolution
Occurrence profiles
[Figure: presence of genes per genome, with “-” marking an absent gene. The genomes are divided into an in-group and an out-group, and the figure includes a score.]
Genome 1: A, C, I, X, H, G, W, Y, Z
Genome 2: A, C, I, X, H, G, W, Y, -
Genome 3: A, C, I, X, H, G, W, Y, Z
Genome 4: A, C, I, X, H, -, W, -, -
Genome 5: A, C, I, X, H, G, W
Genome 6: I, H, -, W, Y, Z
Genome 7: I, H, G, W, -, Z
Genome 8: I, H, -, W, Y, -
Genome 9: I, H, -, W, Y, Z
Genome 10: I, W, Y, Z
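An occurrence-profile comparison can be sketched as scoring how often two genes are present or absent together across a set of genomes. The scoring rule used here (fraction of genomes in which the two profiles agree) is an assumption chosen for simplicity; the score in the figure did not survive, and the profiles below are hypothetical.

```python
def profile_agreement(profile_a, profile_b):
    """Fraction of genomes in which two presence/absence profiles agree."""
    if not profile_a or len(profile_a) != len(profile_b):
        raise ValueError("profiles must be non-empty and cover the same genomes")
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)


# Hypothetical presence (1) / absence (0) of four genes across ten genomes.
profiles = {
    "A": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "C": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "I": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Y": [1, 1, 1, 0, 0, 1, 0, 1, 1, 1],
}
scores = {
    gene: profile_agreement(profiles["A"], profile)
    for gene, profile in profiles.items()
    if gene != "A"
}
best = max(scores, key=scores.get)
print(best, scores[best])  # C 1.0
```

A gene present everywhere (I) agrees with the target only where the target is present, so it scores lower than a gene that is also absent in the same genomes (C). That contrast between in-group and out-group genomes is what makes a profile informative.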

Missing gene case: primary suspects

Example I: Chorismate Pathway
Missing gene in all archaea: Shikimate Kinase (EC )
[Figure: pathway from D-Erythrose 4-P + Phosphoenolpyruvate through 7P-2-Dehydro-3-deoxy-D-arabino-heptulosonate, 3-Dehydro-Quinate, 3-Dehydro-Shikimate, Shikimate, Shikimate P and O5-(1-Carboxyvinyl)-3-P-Shikimate to Chorismate, which feeds Trp, Phe and Tyr syntheses, isochorismate anabolism and chorismate catabolism. Gene labels by step: 1 aroH, aroF, aroG; 2 aroB; 3 aroD; 4 ydiB, aroD; 5 aroK, aroL; 6 aroA; 7 aroC.]

Chromosomal Clustering: Prediction
[Figure labels: “??”, “Fusion Protein”.]

Functional coupling in chorismate pathway: Clustering, Fusion, Occurrence

Example II: “Missing Drug Target” in S. pneumoniae
[Figure: genes acpP, fabD, accA, accD, accB, accC, fabH, fabF, fabG, fabZ, fabI.]
The gene fabI, encoding enoyl-ACP reductase (EC ), is missing in a number of Streptococci.

Clustering of FAB Genes: Prediction
[Figure: chromosomal neighborhoods of the fab genes (fabH, acpP, fabD, fabG, fabF, accB, fabZ, accA, accD, accC) in Escherichia coli, Streptococcus pyogenes, Clostridium acetobutylicum, Genome X and Genome Y. Other labels include fabI, TR?, hyp, FRNS, PLSX, L32, Pg30k, MAF, EC 4… and the candidate gene “?”.]
A conserved hypothetical FMN-binding protein “?” is the best candidate for the missing gene fabI in Gram-positive cocci.

Independent Experimental Verification
13 July 2000, Nature 406, (2000). © Macmillan Publishers Ltd.
Microbiology: A triclosan-resistant bacterial enzyme
Richard J. Heath and Charles O. Rock
Triclosan is an antimicrobial agent that is widely used in a variety of consumer products and acts by inhibiting one of the highly conserved enzymes (enoyl-ACP reductase, or FabI) of bacterial fatty-acid biosynthesis. But several key pathogenic bacteria do not possess FabI, and here we describe a unique triclosan-resistant flavoprotein, FabK, that can also catalyse this reaction in Streptococcus pneumoniae. Our finding has implications for the development of FabI-specific inhibitors as antibacterial agents.

Missing genes: examples in cofactor pathways
Prediction and experimental verification

The Leucine Degradation Cluster: Origin of a New Perspective on Uses of Clusters

Context-based enrichment of initial functional assignments: example from Brucella melitensis genome analysis
Pathway: Leu (deamination, oxidation) → Isovaleryl-CoA → Methylcrotonoyl-CoA → Methylglutaconyl-CoA → HMG-CoA → Acetyl-CoA + Acetoacetate
Enzymes on the pathway: Isovaleryl-CoA dehydrogenase (EC ); Methylcrotonoyl-CoA carboxylase (EC ), with a carboxylase subunit and a biotin-containing subunit; Methylglutaconyl-CoA hydratase (EC )
Table columns: E.C. No; Functional role; Gene ID; No. in cluster
Functional roles listed:
- ISOVALERYL-COA DEHYDROGENASE
- METHYLCROTONYL-COA CARBOXYLASE
  - Biotin-containing subunit
  - Carboxylase subunit
- METHYLGLUTACONYL-COA HYDRATASE
- HYDROXYMETHYLGLUTARYL-COA LYASE
- ACETOACETATE-COA LIGASE
Gene ID column: BR, BR, BR, BR, BR0017*, BR00215
Gene IDs in the cluster figure: BR0017*, BR0021, BR0016, BR0018, BR0019, BR0020
TIGR assignments shown: specific; non-specific *; specific; non-specific *; frameshift
* Biotin carboxylase; Carboxyl transferase family subunit; Enoyl-CoA hydratase/isomerase family

Leucine degradation in Bacilli
- No gene assigned in any organism in KEGG, NCBI, TIGR
- Gene assigned in B. melitensis 2003 (IG)
- Gene assignment propagated over 26 organisms using gene clustering
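Propagating an assignment by gene clustering can be sketched as follows: a candidate family keeps the assigned role in every genome where it sits next to enough other members of the reference cluster. This is a sketch under stated assumptions, not the procedure used for the 26 organisms; the function name `propagate_role`, the `min_support` and `window` parameters, and all gene labels are hypothetical.

```python
def propagate_role(reference_cluster, genomes, candidate, min_support=2, window=3):
    """Return the genomes in which `candidate` lies within `window` genes of
    at least `min_support` other families from `reference_cluster`."""
    others = set(reference_cluster) - {candidate}
    supported = []
    for name, order in genomes.items():
        for i, family in enumerate(order):
            if family != candidate:
                continue
            before = order[max(0, i - window): i]
            after = order[i + 1: i + window + 1]
            if len((set(before) | set(after)) & others) >= min_support:
                supported.append(name)
                break
    return supported


# Hypothetical family labels for a reference cluster and three gene orders.
reference_cluster = ["ivd", "mccA", "mccB", "cand"]
genomes = {
    "org1": ["x", "ivd", "mccB", "cand", "mccA", "y"],
    "org2": ["cand", "x", "y", "z", "ivd", "mccA"],
    "org3": ["mccA", "cand", "ivd"],
}
print(propagate_role(reference_cluster, genomes, "cand"))  # ['org1', 'org3']
```

In `org2` the candidate is present but too far from the rest of the cluster, so the role is not propagated there; requiring clustering support is what keeps the propagation from reducing to plain similarity transfer.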

158 New assignments
Table columns: Organism; Gene anchor; Clustered genes

Gene cluster in B. subtilis

Leucine degradation in Bacilli
Table columns: E.C. No; Functional role; No. in cluster
- ISOVALERYL-COA DEHYDROGENASE
- METHYLCROTONYL-COA CARBOXYLASE
  - BIOTIN CONTAINING SUBUNIT 3
  - CARBOXYLASE SUBUNIT 1
- BIOTIN CARBOXYL CARRIER
- METHYLGLUTACONYL-COA HYDRATASE
- HYDROXYMETHYLGLUTARYL-COA LYASE
- ACETOACETATE-COA LIGASE
- *ACETOACETATE-COA LIGASE* 14 ?

Listeria; Clostridia; Ralstonia; Shew.; Xylella
1 Cell division protein mraZ
3 S-adenosyl-methyltransferase mraW (EC )
4 Cell division protein ftsI
2 UDP-N-acetylmuramoylalanine--D-glutamate ligase (EC )
2 UDP-N-acetylmuramoylalanyl-D-glutamate--2,6-diaminopimelate ligase (EC )
5 Phospho-N-acetylmuramoyl-pentapeptide-transferase (EC )

Brevibacter; Enterococcus; Brucella; Geobacter
1 Phospho-N-acetylmuramoyl-pentapeptide-transferase (EC )
2 UDP-N-acetylmuramoylalanine--D-glutamate ligase (EC )
6 Cell division protein ftsW
5 UDP-N-acetylglucosamine--N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase (EC )
2 UDP-N-acetylmuramate--alanine ligase (EC )
9 Cell division protein ftsZ
11 UDP-N-acetylenolpyruvoylglucosamine reductase (EC )
2 D-alanine--D-alanine ligase (EC )

Bacteroides thetaiotaomicron; Bacillus cereus; Geobacter metallireducens; Buchnera
5 Cell division protein ftsW
1 UDP-N-acetylglucosamine--N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase (EC )
2 UDP-N-acetylmuramoylalanine--D-glutamate ligase (EC )
8 UDP-N-acetylenolpyruvoylglucosamine reductase (EC )
9 Cell division protein ftsQ
2 UDP-N-acetylmuramoylalanyl-D-glutamate--2,6-diaminopimelate ligase (EC )
3 Cell division protein ftsA
6 Cell division protein ftsZ

Oceanobacillus iheyensis; Enterococcus faecium DO; Escherichia coli K12; Wigglesworthia brevipalpis
2 Cell division protein ftsA
1 Cell division protein ftsZ
8 Hypothetical protein
10 Hypothetical protein
12 RNA binding protein
7 UDP-3-O-[3-hydroxymyristoyl] N-acetylglucosamine deacetylase (EC )
13 Protein translocase subunit secA

The Project: Annotate 1000 Genomes in Three Years
- By making the task concrete, we force engineering decisions.
- It will be easier to annotate 1000 genomes well than to annotate 50 well (comparative analysis is the key).
- Analysis by subsystem (rather than by genome) is clearly the key.
- The use of clusters is the key to precise annotation of subsystems.

Annotation by Subsystem
- Requires knowledge of known variants
- Evolution of clusters plays a major role
- There are three components of the task:
  - Building tools to support analysis
  - Actually doing the analysis on subsystems
  - Coordinating with groups doing a limited set of wet lab confirmations

FIG: Building the Initial Annotation Tools
- Releasing the browser/curation tool with approximately [number missing] genomes within a few months
- Peer-to-peer updates/synchronization
- Open source and free (initially for Macs and Linux systems)