Metagenomics.


1 Metagenomics

2 What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

3 Screening from the environment
Random fragments of DNA Clone into a vector Low copy vectors BACs YACs

4 BACs Science Creative Quarterly

5 Screening from the environment
Random fragments of DNA Clone into a vector Low copy vectors BACs YACs Screen for a phenotype e.g. Diversa patents > 1,000 amylase genes Why did Diversa sequence whale-falls?

6 Screening from the environment
Expression host? Pathway or single gene? Get what you select But remember … A selection is worth a thousand screens

7 16S sequencing Catalogs the bacteria that are present
PCR amplify the 16S gene with standard primers Sequence the PCR products Compare to known databases
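The "compare to known databases" step can be illustrated with a toy sketch. It scores a read against reference sequences by shared k-mers (Jaccard similarity) and reports the best match. This is only an illustration of the idea, not the method used by any particular 16S database or classifier; the function names and the sequences are made up for the example.

```python
def kmers(seq: str, k: int = 4) -> set:
    """Return the set of overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared k-mers divided by all k-mers seen in either sequence."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(read: str, references: dict, k: int = 4) -> tuple:
    """Return (reference name, score) for the most similar reference."""
    read_kmers = kmers(read, k)
    scores = {name: jaccard(read_kmers, kmers(ref, k))
              for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical sequences, purely illustrative; not real 16S data.
REFERENCES = {
    "taxon_A": "ACGTACGTTAGCCGATAGGCTTAACG",
    "taxon_B": "TTGACCGGTATACCGGAATTCGCGAT",
}
read = "ACGTACGTTAGCCGATAGGCTTAACC"  # taxon_A with one changed base

name, score = best_match(read, REFERENCES)
print(f"best match: {name} (Jaccard similarity {score:.2f})")
```

Real pipelines use alignment or trained classifiers against curated references; the sketch only shows why a read from a known organism lands on its database entry.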

8 Ribosomes Ribosomes are made of proteins and RNA Prokaryotic ribosome:
Large subunit: 50S 5S and 23S rRNA Small subunit: 30S 16S rRNA

9 30S Thermus aquaticus subunit
Blue: protein Orange: rRNA

10 16S rRNA secondary structure
E. coli 16S rRNA secondary structure Highly conserved Base pairs = stems No pairing = loops

11 16S rRNA secondary structure
E. coli 16S rRNA secondary structure Variable regions in the 16S rRNA. Figure labels: V1 to V9 (Vn – 9 regions); forward/reverse primers

12 16S Primers
27F – 1492R: full length, 1,465 base pairs
967F – 1046R: V6 region, 79 base pairs
1380F – 1510R: V9 region, 130 base pairs
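The base-pair figures on this slide follow directly from the primer names, which encode approximate positions on the E. coli 16S gene. A minimal sketch of that arithmetic, assuming the quoted length is simply the reverse position minus the forward position (the function name is illustrative):

```python
# Primer pairs from the slide: (forward position, reverse position).
PRIMER_PAIRS = {
    "full length": (27, 1492),
    "V6 region": (967, 1046),
    "V9 region": (1380, 1510),
}


def span_between_primers(forward_pos: int, reverse_pos: int) -> int:
    """Difference between the two primer coordinates, as quoted on the slide."""
    if reverse_pos <= forward_pos:
        raise ValueError("reverse primer must lie downstream of the forward primer")
    return reverse_pos - forward_pos


for name, (fwd, rev) in PRIMER_PAIRS.items():
    span = span_between_primers(fwd, rev)
    print(f"{name}: {fwd}F - {rev}R spans {span:,} base pairs")
```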

13 Variable regions = Variable results!
V1-V3 V3-V5 V6-V9

14 16S databases
Greengenes: Gary Andersen, Lawrence Berkeley National Laboratory
SILVA – ARB: Frank Oliver Glöckner, MPI, Bremen, Germany
VAMPS: Mitch Sogin, Woods Hole, USA
Ribosomal Database Project (RDP): James Cole, Michigan State University, USA

15 16S sequencing Cheap Easy Portable PCR bias
Variable regions give variable answers Only tells you which organisms are present & abundance Does not explain much of the variance of the data What does 16S sequencing actually tell you?

16 What does 16S sequencing tell you?

17 What does 16S sequencing tell you?

18 What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

19 16S sequencing is not good for functions

20 How much of the data? Topography of [fungi and] bacteria on the skin
Study = 5,000 taxa 14 skin sites 10 people 3 skin types 5,000 variables The remainder of the variance (85.1%) is explained by a few taxa each Each dimension only adds marginal information Findley et al, Nature 2013 doi: /nature12171 They don't explain the meaning of j-q

21 How much of the data? Nine biomes paper Variance:
1,040,665 reads total (from 45 samples) 30 subsystems 9 biomes 30 variables Fewer of the variables explain more of the data The variables are distinctive for each environment Dinsdale et al., Nature 2008 doi: /nature06810

22 Shotgun sequencing (HiSeq)
Movies courtesy Will Trimble, Argonne National Labs

23 16S sequencing (MiSeq) Movies courtesy Will Trimble, Argonne National Labs

24 Shotgun + 16S (HiSeq) Movies courtesy Will Trimble, Argonne National Labs

25 There is no 16S for viruses
Rohwer and Edwards, 2002. The phage proteomic tree. doi: /JB

26 Random community genomics
200 liters water 5-500 g fresh fecal matter Concentrate and purify viruses or bacteria Extract nucleic acids DNA/RNA LASL Epifluorescent Microscopy Sequence

27 How do you sequence the environment?
Extract DNA Soil extraction kit Water extraction kit Create library LASLs fosmids Sequence fragments

28 Linker-Amplified Shotgun Libraries (LASLs)
Soil Extraction Kit This method produces high coverage libraries of over 1 million clones from as little as 1 ng DNA David Mead - Breitbart (2002) PNAS

29 Early Attempts at a Metagenomics Platform
Submit BLAST to local and remote databases Local (as fast as possible) NCBI (one search every 3 seconds) Many concurrent searches One search versus 1,000 searches Parse data into tables for Excel Access to taxonomy etc
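The "one search every 3 seconds" limit for remote searches can be respected with a small throttle around whatever function submits a query. The sketch below is an assumption-laden illustration: `Throttle`, `run_searches` and the `submit` callable are hypothetical names, and no real BLAST client is called.

```python
import time
from typing import Callable, Iterable, List, Optional


class Throttle:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # type: Optional[float]

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def run_searches(queries: Iterable[str],
                 submit: Callable[[str], str],
                 throttle: Throttle) -> List[str]:
    """Submit each query in turn, never faster than the throttle allows."""
    results = []
    for query in queries:
        throttle.wait()
        results.append(submit(query))
    return results


if __name__ == "__main__":
    # Placeholder submit function; a real one would send the query to a server.
    def fake_submit(query: str) -> str:
        return f"submitted {query}"

    remote = Throttle(min_interval=3.0)  # one search every 3 seconds
    print(run_searches(["seq1", "seq2"], fake_submit, remote))
```

Local searches need no throttle, which is why the slide separates "as fast as possible" from the remote case.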

30 Human-associated viruses
More bacteria than somatic cells by at least an order of magnitude More phages than bacteria by an order of magnitude Sample the bacteria in the intestine by sampling their phage

31 Most Viral DNA Sequences in Adult Human Feces are Unknown Phages
Eukaryotic Viruses 6% Known 40% Unknown 60% Phages 94% Breitbart (2003) J. Bacteriol.

32 Abundance of viruses in twins
Reyes et al, Nature 2010

33 Abundance of viruses in twins
Microbial samples in guts don't change very much Reyes et al, Nature 2010

34 Abundance of viruses in twins
Phage samples in guts change a lot Reyes et al, Nature 2010

35 Abundance of viruses in twins
Microbial Phage Reyes et al, Nature 2010

36 Most Human RNA Viruses are Known
Other Plant Viruses 9% Unknown 8% Other 26% Pepper Mild Mottle Virus 65% Known 92% Zhang (2006) PLoS Biology

37 Pepper Mild Mottle Virus (PMMV)
ssRNA virus; ≈6 kb genome Related to Tobacco Mosaic Virus Infects members of Capsicum family Widely distributed – spread through seeds Fruits are small, malformed, mottled Rod-shaped virions Viral particles in fecal sample TOBACCO MOSAIC VIRUS

38 PMMV is common in Human Feces
Fecal samples Extract total RNA RT-PCR for PMMV (samples S1 to S9) San Diego: 78% of people are positive Singapore: 67% of people are positive 10-50 fold increase in feces compared to food PMMV copies per gram dry weight of feces

39 Which Foods Contain PMMV?
Chili powder Chili sauces NOT FOUND IN FRESH PEPPERS Hong Kong chili sauce Hong Kong green chili Pork noodle red chili Vegetarian chili Indian curry Chicken rice Chinese food

40 PMMV is Present at High Concentrations in Raw Sewage and Treated Wastewater
PMMoV was detected in 100% of raw sewage samples collected among the different states and in most of the treated effluent except for FLK. Note the different efficiency in removing PMMoV below the detection level of the assay (100). Except for FLK, all effluent values were > 10^4. According to these results PMMoV cannot be used as an indicator of untreated wastewater or raw sewage… Concentrations of PMMoV are >1 million copies per milliliter of raw sewage. Rosario et al. AEM (2009)

41 Different PMMV families
Figure: tree of PMMV sequences (label: CoatProtein; clusters I to V; scale bar 0.1). Sequences come from Library 1, Library 2 and Library 3; no colors: GenBank. Diverse populations. Differences between individuals and over time. Same person 6 months apart. It's a diverse group of PMMoV strains within individuals and between individuals.

42 Human-fecal borne PMMV can infect plants
Spread of infection to Hungarian wax pepper evident within 1 week Infected leaf was positive by RT-PCR for PMMV Animals may serve as vectors for plant viruses Fecal sample Viral concentrate Plant leaf inoculation Total RNA PMMV RT-PCR Still infectious Infected leaf Control

43 Koch’s Postulates Thesunmachine.net

44 Random community genomics

45 Eukaryotic metagenomics
ITS sequences Internal transcribed spacer regions Individual genes Cox1 Exome sequencing Pull out ESTs and sequence

46 Why Metagenomics? What is there? How many are there?
What are they doing? Experimental manipulations Diagnostics

47 Sequencing costs decreasing

48 How much has been sequenced?
100 bacterial genomes Environmental sequencing Number of known sequences First bacterial genome 1,000 bacterial genomes Year

49 How much will be sequenced?
Everybody in USA Everybody in San Diego One genome from every species 100 people All cultured Bacteria Most major microbial environments Training people for the future X-Prize competition, sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 per genome Year
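The X-Prize thresholds are easier to judge as absolute numbers. The sketch below works through the arithmetic. The genome size is an assumption that is not on the slide (roughly 3.1 billion bases for a haploid human genome); everything else is taken from the slide text.

```python
# Figures quoted on the slide.
GENOMES = 100
MAX_COST_PER_GENOME_USD = 10_000
BASES_PER_ALLOWED_ERROR = 100_000
MIN_COVERAGE_PERCENT = 98

# ASSUMPTION (not from the slide): a haploid human genome of about 3.1 billion bases.
ASSUMED_GENOME_BASES = 3_100_000_000


def covered_bases(genome_bases: int, coverage_percent: int) -> int:
    """Bases that must be accurately covered at the given percentage."""
    return genome_bases * coverage_percent // 100


def max_errors(bases: int, bases_per_error: int) -> int:
    """Errors permitted at one error per `bases_per_error` bases."""
    return bases // bases_per_error


total_cost = GENOMES * MAX_COST_PER_GENOME_USD
covered = covered_bases(ASSUMED_GENOME_BASES, MIN_COVERAGE_PERCENT)
errors = max_errors(covered, BASES_PER_ALLOWED_ERROR)

print(f"recurring cost ceiling for all genomes: ${total_cost:,}")
print(f"bases to cover per genome (assumed size): {covered:,}")
print(f"errors allowed per genome at that coverage: {errors:,}")
```

Under that assumed genome size, even a run that meets the accuracy target may still contain tens of thousands of errors per genome.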

50 Most pipelines work the same way!

51 Metagenomics Processing
Merge paired-end reads Preprocessing Functional Assignments Taxonomic assignments Contamination removal Gene Prediction Contig Clustering Binning reads

52 Metagenomics
Quality control: Prinseq, Deconseq
Annotation: FOCUS, Real time metagenomics, mg-rast, Super FOCUS
Statistics: STAMP
Population genomes: crAss, metabat, ContigClustering

53 Metagenomics Processing
Contig clustering: AbundanceBin, CompostBin, concoct, crAss, tetra
Preprocessing: FASTQC, FastX Toolkit, fitGCP, NGS QC Toolkit, Non-pareil, Prinseq, QC-Chain, Streaming Trim
Gene Prediction: FragGeneScan, GlimmerMG, MetaGeneAnnotator, MetaGeneMark, MetaGun, Orphelia, Prodigal
Taxonomic assignment: CARMA, myTaxa, FOCUS, PhylopythiaS, KRAKEN, phymmbl, LMAT, RAIphy, MEGAN, TACOA, Metaplan, Taxy
Functional assignment: CLAMS, Sequedex, DiScRIBinATE, SORT-ITEMS, genometa, SPANNER, GSMer, SPHINX, PPLACER, TaxSOM, RTMg, Treephyler

