Annotating Metagenomes Using the SEED Rob Edwards Department of Computer Sciences, San Diego State University Mathematics and Computer Sciences Division,

Presentation transcript:

Annotating Metagenomes Using the SEED Rob Edwards Department of Computer Sciences, San Diego State University Mathematics and Computer Sciences Division, Argonne National Laboratory NSF/EU Cyberinfrastructure Meeting, Washington, DC.

How much has been sequenced? [Figure: number of known sequences by year, marking the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing.]

How much will be sequenced? [Figure: projected sequencing scale, with labels: 100 people; everybody in San Diego; everybody in USA; all cultured Bacteria; one genome from every species; most major microbial environments.]

What do we want from annotations? Consistent, accurate, available, reliable.

Consistent

The Importance of Consistency
Consistency: same genes connected to same functional role
Enables communication
Required for most comparative genomics assays

hisA
FIG function: Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC )
Other functions in RefSeq:
phosphoribosylformimino-5-aminoimidazole carboxamide
phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase
phosphoribosylformimino-5-aminoimidazole carboxamide ribotide...
1-(5-phosphoribosyl)-5-[(5-phosphoribosylamino)methylideneamino]imidazole-4-carboxamide isomerase
N-(5-phospho-L-ribosyl-formimino)-5-amino-1-(5-phosphoribosyl)-4-imidazolecarboxamide isomerase
N-(5'-phospho-L-ribosyl-formimino)-5-amino-1-(5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
Phosphoribosyl isomerase A [1-[5-phosphoribosyl]-5-[[5-phosphoribosylamino]methylideneamino]imidazole-4-carboxamide isomerase]

Measuring Consistency
Define a set of protein families such that each family contains genes performing the same function
Attach functional roles to protein families
Measure the consistency of the annotations made to genes within each family
1. "Consistency" is the odds that two proteins from the same family have the same function
2. Evaluate both families and functions
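The consistency measure on this slide can be sketched in a few lines of Python. This is only an illustration, not the method used in the talk: it assumes "odds" means the probability that two distinct proteins drawn from the same family carry the same function string, and the function names `family_consistency` and `database_consistency` are hypothetical.

```python
from collections import Counter
from math import comb


def family_consistency(functions):
    """Fraction of protein pairs in one family that share an annotation.

    `functions` is a list of function strings, one per protein in the family.
    Returns None for families with fewer than two members (no pairs to compare).
    """
    n = len(functions)
    if n < 2:
        return None
    # Pairs with identical annotations, counted per distinct function string.
    same_pairs = sum(comb(count, 2) for count in Counter(functions).values())
    return same_pairs / comb(n, 2)


def database_consistency(families):
    """Pool pairs over all families (assumption: every pair is weighted equally).

    `families` maps a family id to the list of function strings of its members.
    """
    same_pairs = 0
    all_pairs = 0
    for functions in families.values():
        same_pairs += sum(comb(c, 2) for c in Counter(functions).values())
        all_pairs += comb(len(functions), 2)
    return same_pairs / all_pairs if all_pairs else None
```

A family of four proteins in which three share one name and the fourth carries a variant spelling scores 0.5 under this sketch, which shows how quickly naming variants like those listed for hisA erode consistency.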

Consistency among databases

Accurate

How to measure accuracy
If everything were called "hypothetical protein", the database would be 100% consistent
Need to measure accuracy (specificity) as well as consistency
Sample 100 proteins at random from the "curated" set (i.e. proteins whose annotations are believed to be correct)
Manually inspect annotations to score correctness
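The sampling step described on this slide might look like the sketch below. It is an assumption-laden illustration rather than the procedure used in the talk: the helper names `sample_for_review` and `accuracy` are hypothetical, and the manual inspection itself is represented only by a dictionary of True/False scores supplied by a human reviewer.

```python
import random


def sample_for_review(curated_ids, k=100, seed=0):
    """Draw up to k protein ids at random, without replacement, from the curated set.

    A fixed seed makes the sample reproducible so reviewers see the same proteins.
    """
    rng = random.Random(seed)
    ids = sorted(curated_ids)  # stable order, so the seed fully determines the sample
    return rng.sample(ids, min(k, len(ids)))


def accuracy(scores):
    """Fraction of reviewed proteins whose annotation was judged correct.

    `scores` maps a protein id to True (correct) or False (incorrect).
    """
    if not scores:
        raise ValueError("no reviewed proteins to score")
    return sum(scores.values()) / len(scores)
```

With 100 reviewed proteins the estimate has a resolution of one percentage point, so small differences between databases should be read with caution.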

Available

Free service
User registration/log in
Free to upload sequences in several formats
Automatically annotates sequences
Download in several formats
Complete genomes too:
Soon to come: plasmids, phages, other short genomes

Metagenome Metabolic Reconstruction

Metabolic potential in environments

Phylogenomics

Comparing Metagenomes to Genomes (or other metagenomes!)

Reliable (Believable)

Metabolic potential in environments

From sequences to environments [Figure: plot with axes labeled CDA 60.2% and CDA 21.7%. Functional categories: sulfur, respiration, capsule, motility, membrane transport, stress, signaling, phosphorus, RNA. Environments: mine, saltern, marine, microbialites, coral, fish, animals, freshwater.]

What do we want from annotations? Consistent, accurate, available, reliable. When do we want it? NOW

Acknowledgements
Environmental Genomics: Forest Rohwer, Rohwer lab members, all the labs that provided sequence
Metagenomics Annotation Server: Rick Stevens, Daniel Paarman, Folker Meyer, Bob Olsen
Statistics: Liz Dinsdale, Dana Hall, Beltran Rodriguez-Brito
FIG: Ross Overbeek, Veronika Vonstein, annotators

Subsystems make up metabolism [Image: Wikipedia, "Metabolism"]