Subsystem Approach to Genome Annotation National Microbial Pathogen Data Resource www.nmpdr.org Claudia Reich NCSA, University of Illinois, Urbana.


Complete Microbial Genomes
- 464 complete microbial genomes in NCBI as of
- microbial genomes in progress as of

Making Sense of Genome Data
Locate genes: identify ORFs automatically
- GeneMark
- NCBI's ORF Finder
- Glimmer
- Critica
Assign function: by sequence similarity to experimentally characterized proteins
- BLAST family of sequence comparison tools
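
To illustrate what "identify ORFs automatically" means at its simplest, here is a minimal sketch in Python. It is not how GeneMark, Glimmer, Critica, or ORF Finder work; those tools use statistical models. The function name `find_orfs` and the `min_codons` parameter are assumptions made for this example, and only the forward strand with ATG starts is scanned.

```python
# Naive ORF scan: forward strand only, ATG start, standard stop codons.
# Illustrative sketch; real gene finders are far more sophisticated.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of ORFs, with end exclusive.

    min_codons counts every codon in the ORF, including the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

Calling `find_orfs("CCATGAAATTTTAGGG")` finds the single ORF that begins at the ATG and ends after the TAG stop codon.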

Problems with Assignments by Similarity
- When the ORF is a member of a protein family
- Paralogous genes
- ORFs encoding similar proteins acting on different substrates
- Assignments can be transitive, and many times removed from experimental data

Other Factors Can Aid in Function Assignments
- Molecular phylogeny
- Paralogous and orthologous families
- Conserved gene neighborhood
- Metabolic context
- Bidirectional best hit matches across multiple genomes
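
The last item, bidirectional best hits, can be sketched as follows. This is an illustration under stated assumptions, not the SEED implementation: the similarity scores are taken as already computed (for example, from BLAST bit scores), and the function names and the gene identifiers in the usage note are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring subject gene.

    scores: dict of {(query_gene, subject_gene): similarity_score}
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)   # genome A genes searched against genome B
    reverse = best_hits(b_vs_a)   # genome B genes searched against genome A
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

For example, if gene a1 hits b1 best and b1 hits a1 best, the pair (a1, b1) is reported; a gene a2 whose best hit b2 prefers a1 is left out.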

Incorporating Information Other Than Similarity
- KEGG: manually curated pathway and metabolic maps
- GO: vocabularies that describe ORFs as associated with
  - biological processes
  - cellular components
  - molecular function
- MetaCyc: experimentally elucidated metabolic pathways

What is Needed
A system that:
- integrates all the above concepts
- organizes genomic data in structured idioms
- allows high-throughput annotation of newly sequenced genomes
- resolves discrepancies in different annotation tools
- informs experimental research

Enter the SEED*
- Database and annotation environment
- Underlies, and is accessible through, NMPDR
- Expert annotation via subsystems building
- Provides the most accurate genome annotations available
*Argonne National Lab, University of Chicago, UIUC, FIG

What is a Subsystem?
Any organizing biological principle:
- metabolic pathway: amino acid biosynthesis, nitrogen fixation, glycolysis
- complex structure: ribosome, flagellum
- set of defining features: virulome, pathogenicity islands
- functional concept: bacterial sigma factors, DNA binding proteins

Subsystems are:
- Sets of functional roles, which are functions, or abstractions of functions (such as an EC number), that together implement a specific biological process or concept
- Created manually by expert curators
- Experts annotate single subsystems over the complete collection of genomes, thus contributing and sharing their expertise with the scientific community

How Subsystems are Built
1. Create a subsystem for the biological concept, and define the functional roles
2. In one (or a few) key organisms that include the subsystem, find the genes and assign meaningful functional names
3. Project the annotations to orthologous genes
4. Expand to more genomes, creating a Populated Subsystem
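
The projection step can be pictured with a short Python sketch. It assumes ortholog pairs have already been identified (for instance, by bidirectional best hits); the function name `project_roles` and all gene and role names in the usage note are hypothetical, and a real pipeline would also check conserved gene neighborhood and other evidence before accepting a projection.

```python
def project_roles(reference_roles, ortholog_pairs):
    """Copy functional roles from curated reference genes to their orthologs.

    reference_roles: dict of {reference_gene: functional_role}
    ortholog_pairs:  iterable of (reference_gene, target_gene) tuples
    Returns a dict of {target_gene: functional_role}.
    """
    projected = {}
    for reference_gene, target_gene in ortholog_pairs:
        role = reference_roles.get(reference_gene)
        if role is not None:  # only annotated reference genes project a role
            projected[target_gene] = role
    return projected
```

Given `{"ref_1": "role A"}` and the pairs `[("ref_1", "new_7"), ("ref_2", "new_9")]`, only `new_7` receives a role, because `ref_2` has no curated annotation to project.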

Populated Subsystems
Are spreadsheets where:
- Columns: functional roles
- Rows: specific genomes
- Cells: genes in the organism that implement the functional role
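
As a rough picture of this layout, the sketch below builds such a spreadsheet as a nested dictionary and renders it as tab-separated text. The data structure and the names `build_spreadsheet` and `render` are assumptions for illustration only; they do not reflect how the SEED stores subsystems.

```python
def build_spreadsheet(roles, assignments):
    """Build {genome: {role: [gene, ...]}} from (genome, role, gene) tuples.

    Every row gets a cell for every role, so empty cells are explicit.
    """
    sheet = {}
    for genome, role, gene in assignments:
        if role not in roles:
            raise ValueError(f"unknown functional role: {role}")
        row = sheet.setdefault(genome, {r: [] for r in roles})
        row[role].append(gene)
    return sheet


def render(roles, sheet):
    """Render the spreadsheet as text: one row per genome, one column per role."""
    lines = ["\t".join(["Genome"] + list(roles))]
    for genome in sorted(sheet):
        # An empty cell (no gene found for the role) is shown as "-".
        cells = [", ".join(sheet[genome][r]) or "-" for r in roles]
        lines.append("\t".join([genome] + cells))
    return "\n".join(lines)
```

A cell may hold several genes or none, which is what makes duplicate assignments and missing genes visible at a glance.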

How to Access Subsystems
- From Search menu
- From Organism pages
- From search results when found protein is included in a subsystem
- From Annotation Overview pages

Subsystem Pages in NMPDR
- Table of Functional Roles
- Subsystem diagram (if appropriate)
- Populated subsystem spreadsheet
- Customizable spreadsheet viewing options
- Functional variants and subsets of roles
- Curator's notes

Benefits of Subsystems
- More accurate annotations
- Annotation of protein families
- Analysis of sets of functionally related proteins
- Less error-prone automatic projections to novel genomes

Subsystems Reveal Interesting
Pathway variants:
- Are they clustered by phylogeny? Delta subunit of RNA polymerase only in Bacillales
- Are they clustered by functional niche?
- Horizontal gene transfer?
Fused genes:
- β and β′ subunits of RNA polymerase fused in Helicobacter
Fissioned genes:
- β′ subunit of RNA polymerase is fissioned in Cyanobacteria

Subsystems Reveal Interesting
Duplicate assignments:
- More than one gene for one functional role? Alpha subunit of RNA polymerase in Magnetococcus and Francisella
- Same sequenced region in more than one contig in partially assembled genomes?
- Frameshifts or other sequencing errors?
- Annotation errors?

Subsystems Reveal Interesting
Missing genes:
- Is the function essential?
- Is the function conserved?
- Does the missing gene cluster with homologs in other organisms?
- Is the function performed by a newly recruited gene?
- Has a gene been acquired by horizontal gene transfer and now performs that function?
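
Duplicate assignments and missing genes are the cells of a populated subsystem that hold more than one gene or none at all. A minimal sketch of flagging them, assuming the spreadsheet is held as a nested dictionary `{genome: {role: [genes]}}` (an assumption of this example, as are the function name and the sample data):

```python
def audit_spreadsheet(sheet):
    """Return (missing, duplicated) lists of (genome, role) cells.

    missing:    cells with no gene assigned to the functional role
    duplicated: cells with more than one gene assigned to the role
    """
    missing, duplicated = [], []
    for genome in sorted(sheet):
        for role, genes in sheet[genome].items():
            if not genes:
                missing.append((genome, role))
            elif len(genes) > 1:
                duplicated.append((genome, role))
    return missing, duplicated


# Hypothetical example data: genome and gene names are placeholders.
example_sheet = {
    "Genome 1": {"role A": ["gene_1", "gene_2"], "role B": ["gene_3"]},
    "Genome 2": {"role A": ["gene_4"], "role B": []},
}
missing_cells, duplicated_cells = audit_spreadsheet(example_sheet)
```

The flagged cells are only starting points: each one still needs a curator to work through the questions listed on the slides.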

Synthesis of Selenocysteinyl-tRNA
Two known pathway variants:
- One step in Bacteria: SelA is annotated
- Two steps in Archaea and Eucarya: PSTK was missing until very recently

Explore Selenocysteine Usage
- Start by searching for gene name, selA, in an organism known to use Sec, E. coli K12
- Start from subsystem tree; expand category of "Protein metabolism," expand subcategory of "Selenoproteins"
- Open "Selenocysteine metabolism" subsystem from protein page or SS tree
  - Genomes arranged phylogenetically
  - Roles defined on mouse-over
  - What genes are missing in which organisms?
  - Are there Sec metabolism genes present in any organisms that do not have proteins that need Sec?
  - Are there organisms known to need Sec for certain proteins, but that do not have a complete Sec biosynthesis pathway?
  - Why is there a hypothetical protein included in this subsystem?