The SEED Family

How much has been sequenced?
[Figure: number of known sequences by year, with markers for the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing]

Annotations vs. sequences

Subsystems Make Up Metabolism
[Image: "Metabolism", Wikipedia]

Subsystem spreadsheet (conceptually)
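Conceptually, a subsystem spreadsheet is a table whose columns are the functional roles of one subsystem and whose rows are genomes, with each cell listing the genes that carry out that role in that genome. The sketch below illustrates that layout only; the role names, genome names, and gene IDs are hypothetical placeholders, not SEED data, and `render` is an illustrative helper rather than a SEED function.

```python
from typing import Dict, List

# Hypothetical functional roles of a toy subsystem (the column headers).
ROLES = ["Role A", "Role B", "Role C"]

# Rows are genomes; each cell maps a role to the (hypothetical) gene IDs
# implementing that role in that genome. An empty list means no gene was found.
Spreadsheet = Dict[str, Dict[str, List[str]]]

spreadsheet: Spreadsheet = {
    "genome_1": {"Role A": ["peg.10"], "Role B": ["peg.11"], "Role C": ["peg.12"]},
    "genome_2": {"Role A": ["peg.40"], "Role B": [], "Role C": ["peg.41", "peg.42"]},
}


def render(sheet: Spreadsheet, roles: List[str]) -> str:
    """Render the spreadsheet as tab-separated text, one genome per row."""
    lines = ["genome\t" + "\t".join(roles)]
    for genome, cells in sheet.items():
        # Join multiple genes with commas; show "-" for an empty cell.
        row = [",".join(cells.get(role, [])) or "-" for role in roles]
        lines.append(genome + "\t" + "\t".join(row))
    return "\n".join(lines)


print(render(spreadsheet, ROLES))
```

The empty cell for `genome_2` is exactly the kind of gap a curator would look at when populating a subsystem.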

Three-level “hierarchy”
● Amino Acids and Derivatives
  –Alanine, serine, and glycine
    –Serine Biosynthesis
● Amino Acids and Derivatives
  –Lysine, threonine, methionine, and cysteine
    –Methionine Biosynthesis
Make your own subsystems! Over 1,000 subsystems
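The three-level “hierarchy” (category, subcategory, subsystem) maps naturally onto a nested mapping. This sketch holds only the two examples from the slide; the helper names `paths` and `add_subsystem` are hypothetical and not part of any SEED API.

```python
from typing import Dict, List, Tuple

Hierarchy = Dict[str, Dict[str, List[str]]]

# category -> subcategory -> subsystems (only the two examples from the slide).
HIERARCHY: Hierarchy = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def paths(tree: Hierarchy) -> List[Tuple[str, str, str]]:
    """Flatten the hierarchy into (category, subcategory, subsystem) triples."""
    return [
        (category, subcategory, subsystem)
        for category, subcategories in tree.items()
        for subcategory, subsystems in subcategories.items()
        for subsystem in subsystems
    ]


def add_subsystem(tree: Hierarchy, category: str, subcategory: str, subsystem: str) -> None:
    """Register a user-defined subsystem, creating missing levels as needed."""
    tree.setdefault(category, {}).setdefault(subcategory, []).append(subsystem)


for triple in paths(HIERARCHY):
    print(" > ".join(triple))
```

Using `setdefault` means a new subsystem can be dropped into an existing or brand-new branch without special cases.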

Annotation of Complete Genomes
● Automated, user-originated processing
● Takes 1-7 hours depending on the size and complexity of the genome
● ~2,000 external submissions, including hundreds of genomes not yet publicly released
● Reannotation of >500 genomes complete
● 1,000 users, 200 organizations, 25 countries

The annotation process (complete genomes)
● Find the phylogenetic neighborhood of your genome
● Look for proteins that related organisms have
  –Core proteins
  –Subset of all subsystems
● Use those calls as a training set for CRITICA/Glimmer
  –Intrinsic training set!
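The idea of an intrinsic training set can be sketched as a filter: keep only the gene calls in the new genome whose best hits are proteins its phylogenetic neighbors also carry, and hand those calls to the gene caller for training. This is a minimal sketch under our own assumptions: "core" is approximated as proteins present in every neighbor, the hit table and protein sets are hypothetical, and the actual CRITICA/Glimmer training step is not shown.

```python
from typing import Dict, List, Set


def core_proteins(neighbor_proteins: Dict[str, Set[str]]) -> Set[str]:
    """Proteins found in every neighboring genome (assumed definition of 'core')."""
    sets = list(neighbor_proteins.values())
    return set.intersection(*sets) if sets else set()


def training_set(best_hits: Dict[str, str], core: Set[str]) -> List[str]:
    """ORFs in the new genome whose best hit is a core protein."""
    return sorted(orf for orf, protein in best_hits.items() if protein in core)


# Hypothetical proteins observed in three close relatives ...
neighbors = {
    "neighbor_1": {"RecA", "GyrB", "RpoB", "LysA"},
    "neighbor_2": {"RecA", "GyrB", "RpoB"},
    "neighbor_3": {"RecA", "GyrB", "RpoB", "MetE"},
}
# ... and the best hit for each candidate ORF in the new genome.
hits = {"orf_001": "RecA", "orf_002": "LysA", "orf_003": "RpoB", "orf_004": "GyrB"}

core = core_proteins(neighbors)
print(training_set(hits, core))  # ['orf_001', 'orf_003', 'orf_004']
```

Because the training calls come from the genome itself (matched against its relatives), the gene model is tuned to that genome's own composition, which is the point of an intrinsic training set.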

This one’s for Gary

Automatic metabolic reconstruction
● Subsystem, GO, and KEGG connections
  –KEGG EC numbers
  –KEGG reaction numbers
  –SEED reaction numbers (Chris Henry)
● Metabolic flux models
  –Automatically generate FBA matrices (Aaron Best/Matt DeJongh; Hope College)
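An FBA (flux balance analysis) matrix is the stoichiometric matrix S, with one row per metabolite and one column per reaction; each entry is the metabolite's coefficient in that reaction (negative if consumed, positive if produced). At steady state FBA requires S·v = 0 for the flux vector v. The sketch below builds S from a toy reaction list; the reactions and metabolites are hypothetical, and solving the linear program that FBA optimizes is out of scope.

```python
from typing import Dict, List, Tuple

# Hypothetical toy network: reaction ID -> {metabolite: stoichiometric coefficient}.
REACTIONS: Dict[str, Dict[str, float]] = {
    "uptake_A": {"A": 1.0},          # exchange: -> A
    "r1": {"A": -1.0, "B": 1.0},     # A -> B
    "r2": {"B": -2.0, "C": 1.0},     # 2 B -> C
    "secrete_C": {"C": -1.0},        # exchange: C ->
}


def stoichiometric_matrix(
    reactions: Dict[str, Dict[str, float]]
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (metabolites, reaction IDs, S) with rows = metabolites, columns = reactions."""
    rxn_ids = list(reactions)
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    s = [[reactions[r].get(m, 0.0) for r in rxn_ids] for m in metabolites]
    return metabolites, rxn_ids, s


def is_steady_state(s: List[List[float]], v: List[float], tol: float = 1e-9) -> bool:
    """Check the FBA mass-balance constraint S . v = 0 row by row."""
    return all(abs(sum(a * b for a, b in zip(row, v))) <= tol for row in s)


mets, rxns, S = stoichiometric_matrix(REACTIONS)
print(mets)  # ['A', 'B', 'C']
print(is_steady_state(S, [2.0, 2.0, 1.0, 1.0]))  # True
```

With fluxes (2, 2, 1, 1), every metabolite is produced as fast as it is consumed; equal fluxes of 1 would leave B unbalanced because r2 consumes two B per turnover.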

The Populated Subsystem

Automatically compare metabolic reconstructions
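Once each genome has a reconstruction, a first-pass comparison can be plain set arithmetic over the subsystems (or roles, or reactions) each one contains. A minimal sketch, assuming each reconstruction is reduced to a set of names; the genomes and their contents are hypothetical.

```python
from typing import Dict, Set


def compare(a: Set[str], b: Set[str]) -> Dict[str, Set[str]]:
    """Split two reconstructions into shared and genome-specific subsystems."""
    return {"shared": a & b, "only_first": a - b, "only_second": b - a}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of all subsystems (in either genome) that both genomes share."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical reconstructions, each reduced to the set of subsystems it contains.
genome_x = {"Serine Biosynthesis", "Methionine Biosynthesis", "Lysine Biosynthesis"}
genome_y = {"Serine Biosynthesis", "Methionine Biosynthesis", "Sulfate Reduction"}

result = compare(genome_x, genome_y)
print(sorted(result["only_first"]), sorted(result["only_second"]))
print(jaccard(genome_x, genome_y))  # 0.5
```

The genome-specific sets are where interesting biology (or missing annotations) tends to show up.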

Find and suggest candidate functions
● Rapidly correct missing annotations
● Add more members to subsystems
Improves future genome annotations! (especially with new subsystems)
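One way to read "find and suggest candidate functions" is as gap detection in a populated subsystem: if a genome has genes for most roles of a subsystem but one cell is empty, that role is a candidate missing annotation. The sketch below applies that rule with a hypothetical threshold (at least 60% of roles filled); it is an illustration under that assumption, not the SEED's actual procedure, and all names are placeholders.

```python
from typing import Dict, List, Tuple


def candidate_gaps(
    sheet: Dict[str, Dict[str, List[str]]],
    roles: List[str],
    min_fraction: float = 0.6,  # hypothetical cutoff for "mostly populated"
) -> List[Tuple[str, str]]:
    """Return (genome, role) pairs where a mostly populated row has an empty cell."""
    gaps = []
    for genome, cells in sheet.items():
        filled = [role for role in roles if cells.get(role)]
        if len(filled) / len(roles) >= min_fraction:
            gaps.extend((genome, role) for role in roles if not cells.get(role))
    return gaps


roles = ["Role A", "Role B", "Role C"]
sheet = {
    "genome_1": {"Role A": ["peg.10"], "Role B": ["peg.11"], "Role C": ["peg.12"]},
    "genome_2": {"Role A": ["peg.40"], "Role B": [], "Role C": ["peg.41"]},
    "genome_3": {"Role A": ["peg.7"]},
}
print(candidate_gaps(sheet, roles))  # [('genome_2', 'Role B')]
```

`genome_3` is skipped because a single filled role is weak evidence that the subsystem is operational there at all.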

The Live ASM Test, Philadelphia, 2009
● 10 genomes submitted on Thursday at 6 pm
● First annotation complete before 8 am Friday
● Remaining annotations completed Friday before noon
● (there were others in the pipeline too!)
● Presentation: ASM 2009, Tuesday, 8 pm

Subsystems coverage of sequenced Archaea

Prophages
PHANTOME: Mya Breitbart, Matt Sullivan, Jeff Elhai, Rob Edwards; NSF
[Figure: Haloferax sulfurifontis prophage]

Comparing complete genomes to metagenomes
● Metagenomics RAST has 300 public metagenomes
● Compared using tblastx
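A comparison like this ends with a table of hits to summarize. A minimal sketch, assuming the tblastx results are in BLAST's 12-column tabular format (`-outfmt 6`) with metagenome reads as queries and the complete genome as the subject; the E-value cutoff, read IDs, and read total are hypothetical.

```python
from typing import Iterable, Set


def reads_with_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Collect read IDs with at least one hit at or below the E-value cutoff.

    Assumes BLAST -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:  # column 11 is the E-value
            hits.add(fields[0])
    return hits


blast_output = [
    "read_1\tgenome_X\t92.5\t80\t6\t0\t1\t240\t1001\t1240\t1e-30\t150.0",
    "read_1\tgenome_X\t70.0\t40\t12\t0\t1\t120\t5001\t5120\t1e-3\t40.0",
    "read_2\tgenome_X\t65.0\t30\t10\t0\t1\t90\t8001\t8090\t0.01\t30.0",
]
total_reads = 4  # hypothetical number of reads in the metagenome
hit = reads_with_hits(blast_output)
print(sorted(hit), len(hit) / total_reads)  # ['read_1'] 0.25
```

Counting distinct reads rather than alignment rows keeps a read with several HSPs from being counted twice.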

Human Poop

High Salinity Salterns, San Diego, July 2004
Thanks: Nick Celms, Beltran Rodriguez-Mueller, Mya Breitbart, & Forest Rohwer

[Figure: low salinity salterns vs. high salinity salterns; panels dated July 2004 and Nov 2005]

The metagenomics RAST server

Automated Processing

Summary View

Metagenomics Tools Annotation & Subsystems

Metagenomics Tools Annotation & KEGG maps

Metagenomics Tools Recruitment Plots
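A fragment recruitment plot places each read at the position where it maps on a reference genome (x-axis) against its percent identity (y-axis). The sketch below computes those points from alignment rows, keeping only the best-scoring hit per read, and counts reads per fixed-size window. The row layout, window size, and data are hypothetical; drawing the plot is left to any plotting library.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Each row: (read_id, percent_identity, subject_start, subject_end, bitscore).
Row = Tuple[str, float, int, int, float]


def recruitment_points(rows: Iterable[Row]) -> List[Tuple[int, float]]:
    """Best hit per read as (reference position, percent identity), sorted by position."""
    best: Dict[str, Tuple[float, int, float]] = {}
    for read, pident, sstart, send, bits in rows:
        if read not in best or bits > best[read][2]:
            # Use the leftmost coordinate so reverse-strand hits plot correctly.
            best[read] = (pident, min(sstart, send), bits)
    return sorted((pos, pident) for pident, pos, _ in best.values())


def coverage_by_window(points: List[Tuple[int, float]], window: int = 1000) -> Counter:
    """Count recruited reads per reference window (0-999, 1000-1999, ...)."""
    return Counter(pos // window for pos, _ in points)


rows = [
    ("read_1", 92.5, 1001, 1240, 150.0),
    ("read_1", 70.0, 5001, 5120, 40.0),
    ("read_2", 88.0, 1500, 1300, 90.0),
    ("read_3", 75.0, 4200, 4400, 60.0),
]
points = recruitment_points(rows)
print(points)  # [(1001, 92.5), (1300, 88.0), (4200, 75.0)]
print(coverage_by_window(points))
```

Keeping one hit per read avoids double-plotting reads that align to repeated regions.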

Metagenomics Tools Phylogenetic Reconstruction

Metagenomics Tools Comparative Tools

Computational Requirements
[Figure: hours of compute time vs. input size (MB)]
~19 hours of compute per input megabyte

How much so far?
● Total: 3,348 metagenomes; 318,630,847 sequences; 82,945,869,083 bp (83 Gbp)
● Largest metagenome: 729 Mbp, 11,719,618 reads
● Public: 393 metagenomes; 54,306,078 sequences; 22,160,008,455 bp (22 Gbp)
● Compute time (on a single CPU): 1,575,971 hours = 65,665 days = 179 years
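The totals above are consistent with the ~19 hours per input megabyte on the "Computational Requirements" slide: counting one byte per base, 82,945,869,083 bp is about 82,946 MB, and 82,946 × 19 ≈ 1,575,971 CPU hours. The sketch below reproduces that estimate; the one-byte-per-base conversion and the function name `estimate_cpu_time` are our assumptions.

```python
HOURS_PER_MB = 19  # rate quoted on the "Computational Requirements" slide


def estimate_cpu_time(total_bp: int, hours_per_mb: int = HOURS_PER_MB):
    """Estimate single-CPU compute time, assuming 1 byte of input per base.

    Returns (hours, days, years), each truncated to a whole number.
    """
    hours = total_bp * hours_per_mb // 1_000_000  # bp -> MB, then MB -> hours
    days = hours // 24
    years = days // 365
    return hours, days, years


print(estimate_cpu_time(82_945_869_083))  # (1575971, 65665, 179)
```

Integer arithmetic keeps the result exact; it matches the slide's 1,575,971 hours, 65,665 days, and 179 years.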

Lots of computers, no pattern

Does it work?

Lots of sequences, all pyrosequencing

Metagenomics Tools Functional Heat Maps
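A functional heat map compares metagenomes of very different sizes, so raw counts are usually normalized first. A minimal sketch, assuming per-metagenome read counts for each functional category are already available: each row is converted to proportions so rows are comparable. The sample names and counts are hypothetical; the category names are examples used elsewhere in this deck.

```python
from typing import Dict, List

Counts = Dict[str, Dict[str, int]]


def to_proportions(counts: Counts) -> Dict[str, Dict[str, float]]:
    """Normalize each metagenome's category counts so each row sums to 1."""
    result = {}
    for sample, by_category in counts.items():
        total = sum(by_category.values())
        result[sample] = {
            category: (n / total if total else 0.0)  # avoid dividing by zero
            for category, n in by_category.items()
        }
    return result


def heat_map_rows(props: Dict[str, Dict[str, float]], categories: List[str]) -> List[str]:
    """Format proportions as percentage rows, one line per metagenome."""
    header = "sample\t" + "\t".join(categories)
    rows = [
        sample + "\t" + "\t".join(f"{100 * shares.get(c, 0.0):.0f}%" for c in categories)
        for sample, shares in props.items()
    ]
    return [header] + rows


# Hypothetical read counts per functional category.
counts = {
    "saltern": {"Sulfur": 30, "Respiration": 50, "Motility": 20},
    "coral": {"Sulfur": 10, "Respiration": 70, "Motility": 20},
}
props = to_proportions(counts)
print("\n".join(heat_map_rows(props, ["Sulfur", "Respiration", "Motility"])))
```

The resulting proportion matrix is what a plotting library would color; the text rows just make the normalization visible.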

From Sequences To Environments (Dinsdale et al., Nature 2008)
[Figure: CDA plot (axes labeled CDA 60.2% and CDA 21.7%); environments: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater; functions shown: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA]

BACK!