Published by Lucas Chase. Modified over 8 years ago.
1
Genomics, Metagenomics, and Google
Rob Edwards
San Diego State University, San Diego, CA
Argonne National Laboratory, Argonne, IL
www.theseed.org | edwards.sdsu.edu
2
Outline
● Biology | Metagenomics | Yikes!
● (More biology?)
● Bioinformatics
● Things Google could do
● Things we do with Google
3
How much has been sequenced?
[Figure: number of known sequences by year; marked points: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]
4
How much will be sequenced?
[Figure: projection by year; marked points: 100 people, everybody in Google, everybody in USA, all cultured Bacteria, one genome from every species, most major microbial environments]
5
Why Metagenomics?
● What is there?
● How many are there?
● What are they doing?
● Experimental manipulations?
7
Human-associated viruses
● More bacteria than somatic (human) cells by at least an order of magnitude
● More viruses than bacteria by an order of magnitude
● Sample the things in the intestine by sampling the viruses
8
Most Viral DNA Sequences in Adult Human Feces are Unknown
[Pie charts: Known 40% | Unknown 60%; Phages 94% | Eukaryotic Viruses 6%]
Breitbart (2003) J. Bacteriol.
9
Most Human RNA Viruses are Known
[Pie charts: Known 92% | Unknown 8%; Pepper Mild Mottle Virus 65% | Other Plant Viruses 9% | Other 26%]
Zhang (2006) PLoS Biology
10
Pepper Mild Mottle Virus (PMMV)
● ssRNA virus; ≈6 kb genome
● Related to Tobacco Mosaic Virus
● Infects members of Capsicum family
● Widely distributed; spread through seeds
● Fruits are small, malformed, mottled
● Rod-shaped virions
[Images: Tobacco Mosaic Virus (http://www.rothamsted.bbsrc.ac.uk/ppi/links/pplinks/virusems/); viral particles in fecal sample]
11
PMMV is common in Human Feces
[Gel image: lanes S1 to S9 and PMMV]
● Fecal samples
● Extract total RNA
● RT-PCR for PMMV
● San Diego: 78% of people are positive
● Singapore: 67% of people are positive
● 10-50 fold increase in feces compared to food
● 10^6-10^9 PMMV copies per gram dry weight of feces
12
Which Foods Contain PMMV?
[Figure labels: Indian curry; Pork noodle red chili; Chicken rice; Chinese food; Hong Kong chili sauce; Hong Kong green chili; Vegetarian chili]
● Chili powder
● Chili sauces
● NOT FOUND IN FRESH PEPPERS
14
Where Next?
● More (but not much more) biology?
● Less biology
● No biology
15
Phages, Reefs, Human Disturbance
16
Phages, Reefs, Human Disturbance
17
Different Bacteria At Each Island
18
More People == More Pathogens
Negative numbers mean relatively more phage hosts at Kingman
19
Bioinformatics Tools
20
The SEED Family
21
The metagenomics RAST server
22
Automated Processing
23
Computational Requirements
[Plot: hours of compute time vs. input size (MB)]
~19 hours of compute per input megabyte
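The rate on this slide lends itself to a quick back-of-the-envelope estimator. The sketch below assumes the relationship is simply linear at about 19 CPU hours per input megabyte, and that work can be split evenly across workers; both are simplifications, and the function names are hypothetical rather than part of any server API.

```python
HOURS_PER_MB = 19  # approximate rate quoted on the slide


def estimate_cpu_hours(input_mb, hours_per_mb=HOURS_PER_MB):
    """Linear estimate of single-CPU hours for an input of input_mb megabytes."""
    if input_mb < 0:
        raise ValueError("input size must be non-negative")
    return input_mb * hours_per_mb


def estimate_wall_hours(input_mb, workers):
    """Wall-clock hours assuming the work divides evenly across workers (idealised)."""
    if workers < 1:
        raise ValueError("need at least one worker")
    return estimate_cpu_hours(input_mb) / workers


if __name__ == "__main__":
    # A hypothetical 100 MB metagenome on 1 CPU and on 50 CPUs.
    print(estimate_cpu_hours(100))
    print(estimate_wall_hours(100, 50))
```

Real runs will deviate from a straight line, so this is only useful for rough capacity planning.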
24
25
Computational Time
26
How much so far
Total:
● 2,740 metagenomes
● 255,178,533 sequences
● 65,595,200,612 bp (≈66 Gbp)
Public:
● 299 metagenomes
● 45,445,163 sequences
● 19,341,509,132 bp (19 Gbp)
Compute time (on a single CPU): 1,246,308 hours = 51,929 days = 142 years
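The totals on this slide can be checked with a few lines of arithmetic. The sketch below converts the quoted CPU hours into days and years, and derives the mean read length from the sequence and base-pair counts. It also shows that the hour total matches the ~19 hours per megabyte rate if one assumes one base occupies roughly one byte of input; that assumption is mine, not stated on the slides.

```python
TOTAL = {"sequences": 255_178_533, "bases": 65_595_200_612}
PUBLIC = {"sequences": 45_445_163, "bases": 19_341_509_132}
CPU_HOURS = 1_246_308


def hours_to_days_years(hours):
    """Whole days and whole (365-day) years represented by a number of hours."""
    days = hours // 24
    return days, days // 365


def mean_read_length(dataset):
    """Average number of bases per sequence."""
    return dataset["bases"] / dataset["sequences"]


def hours_from_bases(bases, hours_per_mb=19):
    """CPU hours implied by the per-megabyte rate, assuming 1 base ~ 1 byte."""
    return bases * hours_per_mb // 1_000_000


if __name__ == "__main__":
    print(hours_to_days_years(CPU_HOURS))
    print(round(mean_read_length(TOTAL)), round(mean_read_length(PUBLIC)))
    print(hours_from_bases(TOTAL["bases"]))
```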
27
Metagenomics Tools
Annotation & Subsystems
28
Lots of sequences, all pyrosequencing
29
From Sequences To Environments
[Figure: CDA plot; axes: CDA 60.2%, CDA 21.7%; subsystem labels: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA; environment labels: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater]
Dinsdale et al., Nature 2008
30
Chickens, Cows, Mice, and People; Oh my!
31
Virulence Subsystems In The Intestines
Qu et al., PNAS, 2009
32
Microbial Virulence Genes Discriminate Hosts
Qu et al., PNAS, 2009
33
Sampling Sites
● Marine: near-shore water; off-shore water; near- and off-shore sediments
● Metazoan associated: corals; fish; human
● Terrestrial/Soil: NEON sites; urban
● Airborne
● Freshwater: aquifer; glacial lake
● Extreme: hot springs (84 °C; 78 °C); soda lake (pH 13); solar saltern (>35% salt)
34
35
Searching (Text) [Desire]
● Searching for genes (names, functions, text strings)
● Searching for controlled vocabulary terms (Subsystems, GO terms)
● Federating disparate data
● NCBI, SEED, JGI, EBI, DDBJ
● Annotation clearinghouse
36
Web services
37
Searching (Sequence) [Desire]
● Searching for [DNA, protein]
● A better BLAST search
● Separate word matching from extension/scoring
● Perfectly (embarrassingly) parallel
38
How BLAST Works [Desire]
● Protein sequence
● Find all words in the protein sequence (>3 letters by default)
● Filter for words above a threshold
● Extend while score is above another threshold
● Calculate & report final score for alignment (high-scoring pairs)
Map | Reduce
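The seed-and-extend steps listed on this slide can be sketched in a few dozen lines. This is a toy illustration, not BLAST itself: it matches exact words instead of scoring neighbourhood words against a threshold, uses made-up match and mismatch scores instead of a substitution matrix, and only does ungapped extension. Treating word matching as the "map" step and extension/scoring as the "reduce" step is my reading of the slide; all names and constants are hypothetical.

```python
from collections import defaultdict

WORD_SIZE = 3             # word length (assumed)
MATCH, MISMATCH = 2, -1   # toy scores; real BLAST uses a substitution matrix
X_DROP = 3                # stop once the score falls this far below the best seen


def words(seq, k=WORD_SIZE):
    """Yield (offset, word) for every k-letter word in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def map_seeds(query, subject):
    """Map step: emit (query_pos, subject_pos) for every exact shared word."""
    index = defaultdict(list)
    for q_pos, word in words(query):
        index[word].append(q_pos)
    for s_pos, word in words(subject):
        for q_pos in index.get(word, ()):
            yield q_pos, s_pos


def extend(query, subject, q_pos, s_pos):
    """Ungapped extension of one seed; returns (score, query_start, query_end)."""
    score = best = MATCH * WORD_SIZE
    q_end, s_end = q_pos + WORD_SIZE, s_pos + WORD_SIZE
    best_end = q_end
    while q_end < len(query) and s_end < len(subject) and best - score < X_DROP:
        score += MATCH if query[q_end] == subject[s_end] else MISMATCH
        q_end += 1
        s_end += 1
        if score > best:
            best, best_end = score, q_end
    score = best                       # restart from the best right-hand extension
    q_start, s_start = q_pos, s_pos
    best_start = q_start
    while q_start > 0 and s_start > 0 and best - score < X_DROP:
        q_start -= 1
        s_start -= 1
        score += MATCH if query[q_start] == subject[s_start] else MISMATCH
        if score > best:
            best, best_start = score, q_start
    return best, best_start, best_end


def search(query, database, min_score=8):
    """Reduce step: keep the best-scoring extension per subject above min_score."""
    hits = {}
    for subject_id, subject in database.items():
        seeds = list(map_seeds(query, subject))
        if seeds:
            best = max(extend(query, subject, q, s) for q, s in seeds)
            if best[0] >= min_score:
                hits[subject_id] = best
    return hits


if __name__ == "__main__":
    db = {"a": "XXMKVLAAGIVGXX", "b": "PPPPPPPP"}
    print(search("MKVLAAGIVG", db))
```

Because each subject sequence is handled independently, the loop in `search` is the part that could be distributed across many machines.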
39
Data Visualization [Desire]
● Google App Engine // GWT to extract information
● Searching | Browsing | Annotation
● 1 MB limit too small
40
Data Visualization [Desire]
41
Data Mapping [Doing]
SEED/KML/PostGIS
● Satellite photosynthesis vs. photosynthesis genes
● Pathogens around Kiritimati island
Liz Dinsdale (Biology)
Bahador Nosrat (MSc student)
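As a rough idea of what the KML side of such a mapping pipeline might look like, the sketch below writes one KML placemark per sampling site using only the Python standard library. The slides do not show the actual code, so the function name, the site tuple layout, and the example coordinates (approximate, for illustration only) are all assumptions.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def sites_to_kml(sites):
    """Return a KML string with one Placemark per (name, lat, lon, note) tuple."""
    ET.register_namespace("", KML_NS)  # serialise with KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon, note in sites:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = note
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical records; coordinates are approximate and illustrative.
    example = [
        ("Kiritimati", 1.87, -157.4, "example metagenome site"),
        ("Kingman", 6.4, -162.4, "example metagenome site"),
    ]
    print(sites_to_kml(example))
```

The resulting string can be saved with a `.kml` extension and opened in a KML viewer; per-site annotation values would go in the description or in styled placemarks.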
42
Open Social [Doing]
43
Open Social [Doing]
Vasken Kamikisissian; Matt Seitz (Undergraduates)
44
Open Social [Doing]
45
Acknowledgements
● Environmental Genomics: Forest Rohwer, Brian White, Mya Breitbart, all the labs that provided sequence
● Metagenomics Annotation Server: Rick Stevens, Folker Meyer, Bob Olson, Daniel Paarman, Mark D'Souza, Jared Wilkening, Andreas Wilke
● Statistics & Web services: Liz Dinsdale, Robert Schmieder, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat
● FIG: Ross Overbeek, Veronika Vonstein, Annotators
● Artist: Paula Morris
● Argonne Sequencing: Marc Domanus, Areej Ammar