
1 Annotating Metagenomes Using the NMPDR. Rob Edwards, Department of Computer Sciences, San Diego State University; Mathematics and Computer Sciences Division, Argonne National Laboratory. ASM General Meeting, Boston. www.nmpdr.org www.theseed.org See also poster: B-179 (126B) Aziz et al.

2 How much has been sequenced? [Figure: number of known sequences by year, marking the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing.]

3 How much will be sequenced? [Figure labels: 100 people; Everybody in Boston; Everybody in USA; All cultured Bacteria; One genome from every species; Most major microbial environments.]

4 The Problem: How do you generate consistent and accurate annotations for metagenomes?

5 The SEED Family

6 Annotations using subsystems. FIG has developed the notion of a subsystem: a generalization of “pathway” as a collection of functional roles jointly involved in a biological process or complex. Subsystems have been extended into FIGfams: protein families whose members perform the same function.
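One way to picture a subsystem as described on this slide is as a small table: a fixed list of functional roles, with each genome filling in the genes that play those roles. The Python sketch below is only an illustration of that idea. The class, its methods, and every role, genome, and gene name are hypothetical and are not taken from the SEED's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subsystem:
    """A named collection of functional roles (hypothetical model, not the SEED schema)."""
    name: str
    roles: List[str]
    # genome -> role -> gene identifiers assigned to that role
    cells: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)

    def assign(self, genome: str, role: str, gene_id: str) -> None:
        """Record that gene_id performs role in genome."""
        if role not in self.roles:
            raise ValueError(f"{role!r} is not a role of subsystem {self.name!r}")
        self.cells.setdefault(genome, {}).setdefault(role, []).append(gene_id)

    def missing_roles(self, genome: str) -> List[str]:
        """Roles of the subsystem that have no gene assigned in genome."""
        present = self.cells.get(genome, {})
        return [role for role in self.roles if role not in present]


# Placeholder names, for illustration only.
example = Subsystem("Example subsystem", ["role A", "role B", "role C"])
example.assign("genome 1", "role A", "gene_0001")
example.assign("genome 1", "role C", "gene_0042")
print(example.missing_roles("genome 1"))
```

Listing the roles a genome lacks is the kind of question that a populated subsystem makes easy to ask across many genomes at once.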

7 Subsystems make up metabolism. [Figure: metabolism overview from Wikipedia, http://en.wikipedia.org/wiki/Portal:Metabolism]

8 SEED Viewer

9 Populated Subsystem

10 Subsystems Are Not Just Pathways. Subsystems can be built around: predicted or measured co-regulation; genome context (virulence islands, prophages, conserved gene clusters); virulence mechanism; cellular localization; enzymatic activity; common phenotype; combinations of criteria.

11 Automated Annotations of Complete Genomes. Automated, user-originated processing. Takes 1-7 hours depending on the size and complexity of the genome. ~1,500 external submissions, including 150 genomes not yet publicly released. Reannotation of >500 genomes complete. 789 users, 160 organizations, 25 countries. http://rast.nmpdr.org/

12 Automated Annotations of Complete Metagenomes: MG-RAST Server. Accurate and consistent annotations in a few days. Automatic metabolic reconstruction. Freely available after registration. http://metagenomics.theseed.org/

13 Metagenome Annotation. Automated pipeline:
–upload sequences in FASTA, with or without Q-scores
–removes exact duplicates (454 artefact)
–renumbers sequences (mapping provided)
–BLAST against SEED nr, 16S rDNA
–Annotations and metabolic reenactment
–Taxonomic summary
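The duplicate-removal and renumbering steps of the pipeline can be sketched in a few lines of Python. This is a minimal illustration, not the MG-RAST implementation: the function names, the `seq` prefix for new identifiers, and the choice to compare sequences case-insensitively are all assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_and_renumber(
    records: Iterable[Tuple[str, str]], prefix: str = "seq"
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Drop exact duplicate sequences and give the survivors new sequential IDs.

    Returns the kept (new_id, sequence) records and a mapping from each new ID
    back to the original header. Case-insensitive comparison is an assumption.
    """
    seen = set()
    kept: List[Tuple[str, str]] = []
    mapping: Dict[str, str] = {}
    for header, seq in records:
        key = seq.upper()
        if key in seen:
            continue  # exact duplicate of an earlier read
        seen.add(key)
        new_id = f"{prefix}{len(kept) + 1}"
        kept.append((new_id, seq))
        mapping[new_id] = header
    return kept, mapping


fasta_lines = [">readA", "ACGT", ">readB", "ACGT", ">readC", "GGCC"]
kept, mapping = dedupe_and_renumber(read_fasta(fasta_lines))
print(kept)
print(mapping)
```

Keeping the mapping alongside the renumbered records lets results be traced back to the identifiers in the uploaded file.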

14 Metagenome Metabolic Reenactment

15 Phylogenomics

16 Comparing Metagenomes to Genomes (or other metagenomes!)

17 Metabolic potential in environments

18 MG-RAST computation. [Figure: hours of compute time versus input size (MB).] ~19 hours of compute per input megabyte

19 How much so far: 676 metagenomes; 10,012,793,995 bp (10 Gbp); average ~15 Mbp per metagenome. Compute time (on a single CPU): 190,243 hours = 7,926 days = more than 21 years. ~200 GS20, ~200 FLX, ~200 Sanger.
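The totals on this slide can be checked against the rate of roughly 19 compute hours per input megabyte. The sketch below assumes one byte per base pair and 1 MB = 10^6 bytes. Those conversions, and all variable and function names, are assumptions made for the check and are not stated in the talk.

```python
TOTAL_BP = 10_012_793_995   # total base pairs across all metagenomes (from the slide)
N_METAGENOMES = 676         # number of metagenomes (from the slide)
HOURS_PER_MB = 19           # approximate compute hours per input megabyte


def compute_estimate(total_bp: int, hours_per_mb: float = HOURS_PER_MB):
    """Return (hours, days, years) of single-CPU time for total_bp of input.

    Assumes 1 byte per base and 1 MB = 1,000,000 bytes.
    """
    megabytes = total_bp / 1_000_000
    hours = megabytes * hours_per_mb
    days = hours / 24
    years = days / 365
    return hours, days, years


average_bp = TOTAL_BP / N_METAGENOMES
hours, days, years = compute_estimate(TOTAL_BP)

print(f"average size: {average_bp / 1e6:.1f} Mbp per metagenome")
print(f"compute time: {hours:,.0f} hours, {days:,.0f} days, {years:.1f} years")
```

Under these assumptions the product reproduces the 190,243 hours quoted on the slide, which is a little under 22 years of single-CPU time.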

20 Lots of sequences, all pyrosequencing

21 From Sequences To Environments. [Figure: plot labelled CDA 60.2% and CDA 21.7%; metabolic categories: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA; environments: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater.] Dinsdale et al, Nature 2008

22 Upcoming Features. More user options (removing sequences, E-values, percent identities, etc). More databases (ACLAME, human, etc). More user-generated content (mash-ups) via web services and a published API.

23 Accessing Data via Web Services. Thanks: Bahador Nosrat, SDSU

24 Workshops. Free workshops on NMPDR, RAST, MG-RAST, SEED. Upcoming workshops: Greece, Argonne, Urbana-Champaign, San Diego. Contact Leslie McNeil (lkmcneil@ncsa.uiuc.edu) or visit http://www.nmpdr.org/

25 Acknowledgements. Environmental Genomics: Forest Rohwer and the labs that provided sequence. Metagenomics Annotation Server: Rick Stevens, Daniel Paarman, Folker Meyer, Bob Olsen, Mark D'Souza. Statistics & Web services: Liz Dinsdale, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat. FIG: Ross Overbeek, Veronika Vonstein, Annotators.
