Genomics, Metagenomics, And Google Rob Edwards San Diego State University, San Diego, CA Argonne National Laboratory, Argonne, IL


1 Genomics, Metagenomics, And Google. Rob Edwards. San Diego State University, San Diego, CA; Argonne National Laboratory, Argonne, IL. www.theseed.org | edwards.sdsu.edu

2 Outline ● Biology | Metagenomics | Yikes! ● (More biology?) ● Bioinformatics ● Things Google could do ● Things we do with Google

3 How much has been sequenced? [Figure: number of known sequences by year; labels: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]

4 How much will be sequenced? [Figure: sequencing by year; labels: 100 people, all cultured Bacteria, one genome from every species, most major microbial environments, everybody in Google, everybody in USA]

5 Why Metagenomics? What is there? How many are there? What are they doing? Experimental manipulations?

6

7 Human-associated viruses. More bacteria than somatic (human) cells by at least an order of magnitude. More viruses than bacteria by an order of magnitude. Sample the things in the intestine by sampling the viruses.

8 Most Viral DNA Sequences in Adult Human Feces are Unknown Phages. Known 40%, Unknown 60%. Phages 94%, Eukaryotic Viruses 6%. Breitbart (2003) J. Bacteriol.

9 Most Human RNA Viruses are Known. Known 92%, Unknown 8%. Pepper Mild Mottle Virus 65%, Other Plant Viruses 9%, Other 26%. Zhang (2006) PLoS Biology

10 Pepper Mild Mottle Virus (PMMV). ssRNA virus; ≈6 kb genome. Related to Tobacco Mosaic Virus. Infects members of Capsicum family. Widely distributed – spread through seeds. Fruits are small, malformed, mottled. Rod-shaped virions. [Images: Tobacco Mosaic Virus, http://www.rothamsted.bbsrc.ac.uk/ppi/links/pplinks/virusems/; viral particles in fecal sample]

11 PMMV is common in Human Feces. Fecal samples → Extract total RNA → RT-PCR for PMMV. [Figure labels: S1 to S9, PMMV] San Diego: 78% of people are positive. Singapore: 67% of people are positive. 10-50 fold increase in feces compared to food. 10^6-10^9 PMMV copies per gram dry weight of feces.

12 Which Foods Contain PMMV? Chili powder; chili sauces. NOT FOUND IN FRESH PEPPERS. [Foods shown: Indian curry; Pork noodle red chili; Chicken rice; Chinese food; Hong Kong chili sauce; Hong Kong green chili; Vegetarian chili]

13

14 Where Next? ● More (but not much more) biology? ● Less biology ● No biology

15 Phages, Reefs, Human Disturbance

16 Phages, Reefs, Human Disturbance

17 Different Bacteria At Each Island

18 More People == More Pathogens. Negative numbers mean relatively more phage hosts at Kingman.

19 Bioinformatics Tools

20 The SEED Family

21 The metagenomics RAST server

22 Automated Processing

23 Computational Requirements: ~19 hours of compute per input megabyte. [Figure: hours of compute time vs. input size (MB)]

24

25 Computational Time

26 How much so far. Total: 2,740 metagenomes; 255,178,533 sequences; 65,595,200,612 bp (≈66 Gbp). Public: 299 metagenomes; 45,445,163 sequences; 19,341,509,132 bp (19 Gbp). Compute time (on a single CPU): 1,246,308 hours = 51,929 days = 142 years
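The compute-time total on slide 26 appears to follow from the rate on slide 23. A minimal sketch of that arithmetic, assuming one byte per base pair, 1 MB = 10^6 bytes, and truncation to whole units (these assumptions are inferred, not stated on the slides):

```python
# Figures quoted on the slides.
HOURS_PER_MB = 19            # slide 23: ~19 compute hours per input megabyte
TOTAL_BP = 65_595_200_612    # slide 26: total base pairs processed


def compute_hours(bp, hours_per_mb=HOURS_PER_MB):
    """Estimate single-CPU hours, assuming 1 byte per bp and 1 MB = 10**6 bytes."""
    return bp / 1_000_000 * hours_per_mb


hours = compute_hours(TOTAL_BP)
days = hours / 24
years = days / 365

# Truncate to whole units, which matches how the slide reports the totals.
print(f"{int(hours):,} hours = {int(days):,} days = {int(years)} years")
# prints: 1,246,308 hours = 51,929 days = 142 years
```

Because the workload is per-sequence, the same total divided across N CPUs gives the wall-clock estimate, ignoring scheduling overhead.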

27 Metagenomics Tools: Annotation & Subsystems

28 Lots of sequences (all pyrosequencing)

29 From Sequences To Environments. Dinsdale et al, Nature 2008. [Figure: axis labels CDA 60.2% and CDA 21.7%; subsystems: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA; environments: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater]

30 Chickens, Cows, Mice, and People; Oh my!

31 Virulence Subsystems In The Intestines. Qu et al, PNAS, 2009

32 Microbial Virulence Genes Discriminate Hosts. Qu et al, PNAS, 2009

33 Sampling Sites. Marine (near-shore water; off-shore water; near- and off-shore sediments). Metazoan associated (corals; fish; human). Terrestrial/Soil (NEON sites; urban). Airborne. Freshwater (aquifer; glacial lake). Extreme (hot springs, 84°C and 78°C; soda lake, pH 13; solar saltern, >35% salt).

34

35 Searching (Text) [Desire] ● Searching for genes (names, functions, text strings) ● Searching for controlled vocabulary terms (Subsystems, GO terms) ● Federating disparate data ● NCBI, SEED, JGI, EBI, DDBJ ● Annotation clearinghouse

36 Web services

37 Searching (Sequence) [Desire] ● Searching for [DNA, protein] ● A better BLAST search ● Separate word matching from extension/scoring ● Perfectly (embarrassingly) parallel

38 How BLAST Works [Desire]. Protein sequence → Find all words in the protein sequence (>3 letters by default) → Filter for words above a threshold → Extend while score is above another threshold → Calculate & report final score for alignment (high scoring pairs). [Stage labels: Map, Reduce]
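The seed-and-extend idea on this slide, with word matching kept separate from extension and scoring, can be sketched roughly as below. This is a simplified illustration and not NCBI BLAST: it seeds on exact word matches rather than scored neighborhood words, uses toy match/mismatch scores instead of a substitution matrix, and extends without gaps. All names and thresholds (`WORD_SIZE`, `DROP_OFF`, `MIN_SCORE`, `search`) are hypothetical choices for the example.

```python
from collections import defaultdict

WORD_SIZE = 3             # word length used for seeding (assumed for this sketch)
MATCH, MISMATCH = 2, -1   # toy scores standing in for a substitution matrix
DROP_OFF = 3              # stop extending once the score falls this far below the best
MIN_SCORE = 8             # hypothetical threshold for reporting a pair


def words(seq, k=WORD_SIZE):
    """Yield (word, offset) for every k-letter word in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], i


def build_index(subjects, k=WORD_SIZE):
    """Index every word of every subject sequence by (subject name, offset)."""
    index = defaultdict(list)
    for name, seq in subjects.items():
        for word, offset in words(seq, k):
            index[word].append((name, offset))
    return index


def map_seeds(query, index, k=WORD_SIZE):
    """Map step: emit one (subject, query offset, subject offset) per word hit."""
    for word, q in words(query, k):
        for name, s in index.get(word, ()):
            yield name, q, s


def extend(query, subject, q, s, k=WORD_SIZE):
    """Ungapped extension of one seed, right then left, with a drop-off cutoff."""
    score = best = k * MATCH
    start, end = q, q + k
    i, j = q + k, s + k
    while i < len(query) and j < len(subject):      # extend to the right
        score += MATCH if query[i] == subject[j] else MISMATCH
        if score > best:
            best, end = score, i + 1
        elif best - score >= DROP_OFF:
            break
        i, j = i + 1, j + 1
    score = best
    i, j = q - 1, s - 1
    while i >= 0 and j >= 0:                        # extend to the left
        score += MATCH if query[i] == subject[j] else MISMATCH
        if score > best:
            best, start = score, i
        elif best - score >= DROP_OFF:
            break
        i, j = i - 1, j - 1
    return best, start, end, s - (q - start)


def search(query, subjects, k=WORD_SIZE, min_score=MIN_SCORE):
    """Reduce step: keep the best-scoring pair per (subject, diagonal)."""
    index = build_index(subjects, k)
    best = {}
    for name, q, s in map_seeds(query, index, k):
        hsp = extend(query, subjects[name], q, s, k)
        key = (name, s - q)
        if hsp[0] >= min_score and hsp[0] > best.get(key, (0,))[0]:
            best[key] = hsp
    # Each result: (subject, score, query start, query end, subject start)
    return sorted(((n, *h) for (n, _), h in best.items()), key=lambda r: -r[1])


print(search("MKVLAAGIV", {"a": "XXMKVLAAGIVXX", "b": "QQQQQQQ"}))
```

Each seed is extended independently of every other seed, which is what makes the map step easy to distribute; the sketch runs it serially for clarity.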

39 Data Visualization [Desire] ● Google App Engine // GWT to extract information ● Searching | Browsing | Annotation ● 1 MB limit too small

40 Data Visualization [Desire]

41 Data Mapping [Doing]. SEED/KML/PostGIS. Liz Dinsdale (Biology); Bahador Nosrat (MSc student). Satellite photosynthesis vs. photosynthesis genes. Pathogens around Kiritimati island.
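As a rough illustration of the KML side of this mapping work, the sketch below writes sampling sites as KML placemarks that a map viewer could load. It is not the project's code: the function name `sites_to_kml`, the site records, the approximate coordinates, and the placeholder notes are all assumptions made for the example. KML lists coordinates as longitude,latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def sites_to_kml(sites):
    """Return a KML document (as a string) with one Placemark per site."""
    ET.register_namespace("", KML_NS)   # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for site in sites:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = site["name"]
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = site["note"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinate order is longitude,latitude
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{site['lon']},{site['lat']}"
        )
    return ET.tostring(kml, encoding="unicode")


# Hypothetical records: approximate coordinates, placeholder notes (not real data).
example_sites = [
    {"name": "Kiritimati", "lat": 1.87, "lon": -157.4, "note": "placeholder annotation"},
    {"name": "Kingman", "lat": 6.38, "lon": -162.42, "note": "placeholder annotation"},
]

kml_text = sites_to_kml(example_sites)
print(kml_text)
```

In a real pipeline the site records would presumably come from a spatial query (for example against PostGIS) rather than a hard-coded list.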

42 Open Social [Doing]

43 Open Social [Doing]. Vasken Kamikisissian; Matt Seitz (Undergraduates)

44 Open Social [Doing]

45 Acknowledgements. Environmental Genomics: Forest Rohwer, Brian White, Mya Breitbart, all the labs that provided sequence. Metagenomics Annotation Server: Rick Stevens, Folker Meyer, Bob Olson, Daniel Paarman, Mark D'Souza, Jared Wilkening, Andreas Wilke. Statistics & Web services: Liz Dinsdale, Robert Schmieder, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat. FIG: Ross Overbeek, Veronika Vonstein, Annotators. Artist: Paula Morris. Argonne Sequencing: Marc Domanus, Areej Ammar.

