
1 The Metagenomics RAST server: Annotation, Analysis, and Comparisons. Perfect for Pyrosequencing. Rob Edwards; Department of Computer Science, San Diego State University; Mathematics and Computer Sciences Division, Argonne National Laboratory. Roche Life Sciences Workshop, Sept 2008. www.nmpdr.org www.theseed.org

2 Outline: Metagenomics; Tools for analyzing sequences; Computational challenges; Does it work?

3 How much has been sequenced? [Chart: number of known sequences by year; labels: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]

4 How much will be sequenced? [Chart labels: 100 people; everybody in San Diego; everybody in USA; all cultured Bacteria; one genome from every species; most major microbial environments]

5 Metagenomics (Just sequence it). Samples: 200 liters water; 5-500 g fresh fecal matter; 50 g soil. Diagram steps: concentrate and purify bacteria, viruses, etc.; epifluorescent microscopy; extract nucleic acids; sequence; publish papers.

6 Modern Metagenomics. Marine: near-shore water (~100 samples); off-shore water (~50 samples); near- and off-shore sediments. Metazoan associated: corals; fish; human blood; human stool. Terrestrial/Soil: Terragenomics; Amazon rainforest; Konza prairie; Joshua Tree desert. Air. Freshwater: aquifer; glacial lake. Extreme: hot springs (84 °C; 78 °C); soda lake (pH 13); solar saltern (>35% salt).

7 The Problem: How do you generate consistent and accurate annotations for metagenomes?

8 The SEED Family

9 Annotations using subsystems. FIG developed the notion of a subsystem: a generalization of “pathway” as a collection of functional roles jointly involved in a biological process or complex. Subsystems were extended into FIGfams: protein families that perform the same function.

10 Annotation of Complete Genomes. Automated, user-originated processing. Takes 1-7 hours depending on the size and complexity of the genome. ~2,000 external submissions, including hundreds of genomes not yet publicly released. Reannotation of >500 genomes complete. 1,000 users, 200 organizations, 25 countries. http://rast.nmpdr.org/

11 The metagenomics RAST server

12 Automated Processing

13 Summary View

14 Metagenomics Tools: Annotation & Subsystems

15 Metagenomics Tools: Annotation & KEGG maps

16 Metagenomics Tools: Recruitment Plots

17 Metagenomics Tools: Phylogenetic Reconstruction

18 Metagenomics Tools: Comparative Tools

19 Computational Requirements. [Plot: hours of compute time vs. input size (MB)] ~19 hours of compute per input megabyte.
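The rate quoted on this slide can be turned into a rough planning estimate. The sketch below assumes the relationship is linear through the origin (the slide gives only the slope), and the function name `estimate_compute_hours` is hypothetical; it is not part of any RAST tooling.

```python
# Rate taken from the slide: ~19 hours of single-CPU compute per input megabyte.
HOURS_PER_MB = 19.0


def estimate_compute_hours(input_mb, hours_per_mb=HOURS_PER_MB):
    """Rough single-CPU estimate; assumes cost scales linearly with input size."""
    if input_mb < 0:
        raise ValueError("input size must be non-negative")
    return input_mb * hours_per_mb


# Illustrative input sizes only; these are not measurements from the talk.
for size_mb in (1, 50, 500):
    hours = estimate_compute_hours(size_mb)
    print(f"{size_mb:>4} MB -> {hours:,.0f} CPU hours ({hours / 24:,.1f} days)")
```

Because the estimate is for a single CPU, dividing by the number of available cores gives an optimistic lower bound on wall-clock time, assuming the work parallelizes cleanly.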

20 How much so far: 986 metagenomes; 79,417,238 sequences; 17,306,834,870 bp (17 Gbp). Average: ~15-20 Mbp per metagenome. Compute time (on a single CPU): 328,814 hours = 13,700 days = 38 years. ~300 GS20; ~300 FLX; ~300 Sanger.
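The totals on this slide can be cross-checked with a few lines of arithmetic. This is a sketch only: the variable names are illustrative, and the last step assumes one input megabyte corresponds to roughly one million bases (one byte per base, ignoring FASTA headers), which the slides do not state.

```python
# Figures copied from the slide.
TOTAL_BP = 17_306_834_870
METAGENOMES = 986
SEQUENCES = 79_417_238
CPU_HOURS = 328_814

# Average data volume per metagenome and average sequence length.
avg_bp_per_metagenome = TOTAL_BP / METAGENOMES
avg_read_length = TOTAL_BP / SEQUENCES

# Convert single-CPU hours to days and (365-day) years.
cpu_days = CPU_HOURS / 24
cpu_years = cpu_days / 365

# Assumption: 1 MB of input ~ 1 Mbp of sequence.
hours_per_mbp = CPU_HOURS / (TOTAL_BP / 1e6)

print(f"Average per metagenome: {avg_bp_per_metagenome / 1e6:.1f} Mbp")
print(f"Average sequence length: {avg_read_length:.0f} bp")
print(f"Compute: {cpu_days:,.0f} days, {cpu_years:.1f} years")
print(f"Implied rate: {hours_per_mbp:.1f} CPU hours per Mbp")
```

Under these assumptions the numbers agree with the slide: about 17.6 Mbp per metagenome, about 13,700 days or roughly 38 years of single-CPU time, and an implied rate of about 19 hours per megabase, matching the ~19 hours per input megabyte quoted earlier.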

21 Lots of sequences all pyrosequencing

22 Metagenomics Tools: Functional Heat Maps

23 From Sequences To Environments (Dinsdale et al., Nature 2008). [Plot: axes CDA 60.2% and CDA 21.7%; functional labels: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA; environment labels: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater]

24 Workshops: Free workshops on NMPDR, RAST, mg-RAST, SEED. Contact Leslie McNeil (lkmcneil@ncsa.uiuc.edu) or visit http://www.nmpdr.org/

25 Acknowledgements. Environmental Genomics: Forest Rohwer; all the labs that provided sequence. Metagenomics Annotation Server: Rick Stevens, Folker Meyer, Bob Olson, Daniel Paarman, Mark D'Souza, Jared Wilkening, Andreas Wilke. Statistics & Web services: Liz Dinsdale, Robert Schmieder, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat. FIG: Ross Overbeek, Veronika Vonstein, annotators. Artist: Paula Morris. Argonne Sequencing: Marc Domanus, Areej Ammar.

26 Artist's impression: not all machines are known to explode

27 Terragenomics

28 Differences between soil samples
