Published by Randall Harvey. Modified over 9 years ago.
1
METAGENOMIC WORKSHOP, 18-21 August 2009
James R. Cole, Ph.D.
Ribosomal Database Project, Center for Microbial Ecology, Michigan State University
http://rdp.cme.msu.edu
2
RDP Pyrosequencing Pipeline: Tools for high-throughput analysis
3
Additional Functions
- Shannon Index
- Rarefaction
- Alignment Merger
- Chao 1 Estimate
- Library Compare
- Dereplicate
Export Formats for Common Tools
- EstimateS
- SPADE
- Phylip
- PAST
- R
- Mothur
- Many others compatible!
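As an illustration of two of the functions named on this slide, the Shannon index and the Chao 1 estimate can be computed from a list of per-OTU read counts. This is a minimal sketch, not the RDP pipeline's code: the function names and the toy counts are made up for the example, and the Chao 1 form used is the common bias-corrected one, which may differ from the variant the pipeline reports.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao 1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical OTU table for one sample: reads per OTU.
otu_counts = [10, 5, 2, 2, 1, 1, 1]
print("Shannon index:", shannon_index(otu_counts))
print("Chao 1 estimate:", chao1(otu_counts))
```

Both functions take only the count vector, so the same input can also be written out for external tools such as EstimateS or R.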
7
SPADE
8
PAST (PAlaeontological STatistics)
9
R
11
Cluster Based Method
12
[Figure: total abundance versus position by cluster order (thousands, 0 to 20) for the Pigeon Pea and Bare Fallow samples. Similarity categories: Species, Genus, Family, Novel. Taxa labeled: TM7, Clostridia, Unclassified Bacteria, Actinobacteria, Bacteroidetes, Acidobacteria, Unclassified Proteobacteria, Deltaproteobacteria, Gammaproteobacteria, Verrucomicrobia, Bacilli, Planctomycetes, Gemmatimonadetes, Unclassified Firmicutes, Betaproteobacteria, Alphaproteobacteria]
13
Pipeline Performance
Processing time (52 samples, 350,000 454 FLX reads):
- Classifier: ~2 CPU hrs.
- Aligner: ~12 CPU hrs.
- Clustering: ~2 CPU hrs. (depends on sample sizes)
- SeqMatch: ~23 CPU hrs.
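The figures on this slide can be turned into a rough total and a per-step throughput. This is back-of-the-envelope arithmetic only: the CPU hours are the approximate values quoted above, and the calculation assumes that every step processed all 350,000 reads, which the slide does not state.

```python
# Approximate CPU hours per step, as quoted on the slide.
cpu_hours = {"Classifier": 2, "Aligner": 12, "Clustering": 2, "SeqMatch": 23}
total_reads = 350_000  # 454 FLX reads across 52 samples

total_cpu_hours = sum(cpu_hours.values())

# Assumption: each step saw every read.
reads_per_cpu_second = {
    step: total_reads / (hours * 3600) for step, hours in cpu_hours.items()
}

print(f"Total: ~{total_cpu_hours} CPU hrs")
for step, rate in reads_per_cpu_second.items():
    share = cpu_hours[step] / total_cpu_hours
    print(f"{step}: ~{rate:.1f} reads per CPU second, {share:.0%} of total time")
```

On these numbers SeqMatch and the Aligner account for most of the CPU time, so they are the steps worth parallelizing first.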
14
Usage Stats
380 users since June 2008
April 2009 stats:
- 182 initial process jobs
- 1243 cluster jobs
- 832 alignment jobs
- >11 million sequences aligned
RDP Pyro tools distributed to several major institutions
15
Analysis of 16S Variable Regions: Important features
16
rRNA Gene Regions Processed by the RDP Pyrosequencing Pipeline
[Figure: 16S rRNA gene drawn 5' to 3' with regions V1-V2, V3, V4, and V6 marked; y-axis: % of sequences covering position]
17
Region                  V6          V3          V4          V1-V2
Canonical Positions     47          129         205         261
Conserved Base Pairs    15          26          64          71
90% Size Range          56-65       131-159     206-208     277-328
% Missing Pairs         2x10^-4     1x10^-4     0.3x10^-4   0.9x10^-4
% Half Pairs            4x10^-4     1x10^-4     8x10^-4     1x10^-4
% Paired                50          36          62          46
% Aligned               78%         89%         99%         84%
Identical Species       49%         36%         37%         6%
Different Operons       10%         12%         7%          32%
Statistics from 300,000 Sanger Sequences (RDP release 10.11)
Secondary-structure figures from http://www.rna.icmb.utexas.edu
18
V3
19
V4
20
V1-V2
21
Chance a Species' Sequence is Identical to at Least One Other Species
Identical Species: V6 49%, V3 36%, V4 37%, V1-V2 6%
Based on 6,841 bacterial species type strain sequences
Strain information from "The Living Tree Project", http://www.arb-silva.de/projects/living-tree/
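A statistic of this kind can be computed by extracting the same variable region from each type strain and counting how many species share an identical region sequence with at least one other species. The sketch below shows that counting step only. It assumes the region has already been excised and that each species contributes one sequence. The function name and the toy sequences are invented for the example, and this is not necessarily how the slide's figures were produced.

```python
from collections import Counter


def fraction_identical_to_another(region_by_species):
    """Fraction of species whose region sequence is shared with another species."""
    occurrences = Counter(region_by_species.values())
    shared = sum(1 for seq in region_by_species.values() if occurrences[seq] > 1)
    return shared / len(region_by_species)


# Toy input: species name -> excised variable-region sequence (made up).
toy_regions = {
    "species_A": "ACGTAC",
    "species_B": "ACGTAC",  # identical to species_A in this region
    "species_C": "ACGTTC",
    "species_D": "GGGTAC",
}
print(fraction_identical_to_another(toy_regions))
```

A shorter region carries fewer informative positions, so more species collide, which is consistent with V6 having the highest value on the slide and V1-V2 the lowest.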
22
Chance Two Operons Differ in One Organism
Different Operons: V6 10%, V3 12%, V4 7%, V1-V2 32%
Based on 561 completed genome sequences with two or more rRNA operons
23
V4: Quality of Recovered Structure (Sanger vs. FLX)
Avg. Size: 207
% Missing Pairs: 0.3x10^-4 (Sanger), 2.1x10^-4 (FLX)
% Half Pairs: 8x10^-4 (Sanger), 1.4x10^-2 (FLX)
% Paired: 62
% Aligned: 99%
25
Introduction to the Short Read Archive (SRA): myRDP SRA Prepkit
26
SRA Submission Format
27
Six Different SRA Document Types: Submission, Study, Experiment, Sample, Run, Analysis
[Diagram: the document types linked by 1-to-N relationships]
28
myRDP SRA Prepkit
[Diagram labels: SEQUENCING PROJECT, SEQUENCE READS, METADATA, myRDP SRA PREPKIT, XML DOCUMENTS, myRDP SWS, SUBMIT, NCBI-SRA, EMBL-ERA]
30
Sample Attributes Prefilled
Genomic Standards Consortium MIMS (Minimum Information about a Metagenome Sequence)*
*Nature Biotechnology 26, 541-547 (2008)
31
Functional Genes
32
FGPR Home Page Screenshot
33
FGPR Screenshots
- seed sequences
- active links to GenBank records
- organism name
- display/filter options
- custom analysis
34
Functional Gene Pipeline/Repository: Sequence Analysis
- interactive commands
- sub-selection for further analysis
- dynamic tree applet
http://flyingcloud.cme.msu.edu/fungene/
35
Functional Gene Processing
1) Remove Frameshifts
   1) tBLASTX
   2) GeneWise
2) Translate and align sequences
   1) HMMER
   2) MUSCLE
3) Determine conserved residues
   1) Entropy plot
4) Compare to reference sequences
   1) Determine functional subclass
36
Entropy (Dioxygenase Genes)
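An entropy plot of this kind is usually built from the per-column Shannon entropy of the protein alignment: a column with a single residue has entropy 0 and is a candidate conserved position. The following is a minimal sketch under that assumption. The function names and the toy alignment are hypothetical, and the slides do not say exactly how the pipeline computes or smooths its plot.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-."):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r not in gap_chars]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_profile(aligned_seqs):
    """Per-column entropy for a list of equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*aligned_seqs)]


# Toy protein alignment (made up); '-' marks a gap.
alignment = ["MKHAL", "MKHGL", "MRHAL", "MKH-L"]
profile = entropy_profile(alignment)
conserved = [i for i, h in enumerate(profile) if h == 0.0]
print(profile)
print("fully conserved columns:", conserved)
```

Plotting `profile` against column index gives the entropy plot; low values mark the conserved residues used in step 3.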
37
Taxomatic: Interactive Taxonomy Explorer
- Interactive distance matrix display
- Couples matrix with taxonomy information
- Allows rapid detection of taxonomic inconsistencies
38
Taxomatic: Interactive Taxonomy Explorer
Integrated overlays
42
zoom and pan
43
Can zoom down to individual sequences
44
MEGAN: Taxonomic analysis through metagenomic data
46
MEGAN
- Modified k-nn LCA taxonomic classifier
- Requires BLAST result file
- Extracts taxonomy, COGs from matches
- Features from NCBI Prokaryotic Attributes Table
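The lowest-common-ancestor (LCA) idea behind this classifier can be sketched as follows: keep the BLAST hits that score close to the best hit, then assign the read to the deepest taxon shared by all of them. This is a simplified illustration, not MEGAN's implementation. The function names, the threshold values, and the hand-written lineages are assumptions made for the example.

```python
def lowest_common_ancestor(lineages):
    """Longest shared prefix of root-to-leaf lineages (lists of taxon names)."""
    common = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        common.append(ranks[0])
    return common


def assign_read(hits, min_score=35.0, top_percent=10.0):
    """Assign one read from its hits, given as (bit_score, lineage) tuples.

    min_score and top_percent are illustrative thresholds only.
    Returns a taxon name, "root" if the hits share nothing, or None if no
    hit passes min_score.
    """
    kept = [(score, lineage) for score, lineage in hits if score >= min_score]
    if not kept:
        return None
    best = max(score for score, _ in kept)
    cutoff = best * (1 - top_percent / 100)
    lineages = [lineage for score, lineage in kept if score >= cutoff]
    lca = lowest_common_ancestor(lineages)
    return lca[-1] if lca else "root"


# Hypothetical hits for a single read.
hits = [
    (200.0, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"]),
    (195.0, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella"]),
    (120.0, ["Bacteria", "Firmicutes", "Bacilli", "Bacillus"]),
]
print(assign_read(hits))
```

Because the two best hits disagree at genus level, the read is placed at their shared class rather than forced onto either genus; the weaker Firmicutes hit falls below the cutoff and is ignored.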
47
MEGAN Screenshot 1
48
MEGAN Screenshot 2
49
MEGAN Screenshot 3
50
Metagenomics Analysis Pipelines: Sequence Comparison
51
General Considerations
What databases are used?
- GenBank nr (not good)
- Pfam, TIGRfam, FIGfam?
What search strategy is used?
- BLAST, HMMER, additional tools?
Will they process my data?
- Will my data become public?
52
HMMER vs BLAST
53
BMC Genomics Aziz
54
The SEED & RAST
Subsystems: Pathway database
- Expert annotation
- Curated simultaneously across many genomes
FIGfams: Database of protein families
- Derived from Subsystems database
- Controlled addition of new family members
RAST: Genome annotation system
- Uses FIGfams for gene annotation
- Uses Subsystems for pathway annotation
55
The SEED & RAST
56
fromPDF
57
BMC RAST Fig. 2
58
BMC RAST Fig. 4
62
JGI’S IMG/M HOME
64
CAMERA HOME
65
CAMERA DASHBOARD
66
CAMERA PROJECT SAMPLES
67
Metadata: Data about data
68
Metadata Standards
- Minimum Information about a Microarray Experiment (MIAME)
- Minimum Information about a Genome Sequence (MIGS)
- Minimum Information about a Metagenome Sequence (MIMS)
69
Nature Biotechnology 26, 541-547 (2008)
70
Box 1 Minimum Information about a Genome Sequence (MIGS): Habitat Specific Attributes
MIMS extension: select to report a set of uniform measurements for a given habitat.
Water body: temperature, pH, salinity, pressure, chlorophyll, conductivity, light intensity, dissolved organic carbon (DOC), current, atmospheric data, density, alkalinity, dissolved oxygen, particulate organic carbon (POC), phosphate, nitrate, sulfates, sulfides, primary production (integer, unit)
71
Soil Metadata Survey
To help establish a set of suggested attributes for soil sequence data
In cooperation with:
- The Genomic Standards Consortium
- The International Soils Metagenome Sequencing Consortium (Terragenome)
72
Soil Metadata Survey Summary
[Figure: survey attributes plotted by importance (Not to Very) and difficulty to obtain (Easy to Hard)]
73
Soil Metadata Survey Summary
[Figure: survey attributes plotted by importance (Not to Very) and difficulty to obtain (Easy to Hard)]
VERY IMPORTANT / EASY TO OBTAIN
- Chemical: pH (in water or calcium chloride)
- Biological: plant cover (native)
- Soil/Geological: horizon
- Geographical: latitude and longitude, elevation
- Management: land use (e.g., urban, agriculture, forestry), tillage (type), crops (current, rotation), fertilizers (type and annual amount)
- Climate: mean and seasonal rainfall, mean and seasonal temperatures
- Sampling: depth, composite design, moisture content at sampling, area represented by composite sample, weight of sample used for DNA extraction
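One way to put such a list to work is a completeness check run on each sample record before submission. The sketch below is an assumption-laden illustration: the field names are hypothetical keys chosen for this example (they are not MIMS or Genomic Standards Consortium identifiers), only some of the survey's attributes are included, and the sample values are invented.

```python
# Hypothetical field names, grouped by the survey's categories.
SUGGESTED_SOIL_ATTRIBUTES = {
    "chemical": ["ph"],
    "biological": ["plant_cover"],
    "soil_geological": ["horizon"],
    "geographical": ["latitude", "longitude", "elevation"],
    "management": ["land_use", "tillage", "crops", "fertilizers"],
    "climate": ["rainfall", "temperature"],
    "sampling": ["depth", "composite_design", "dna_sample_weight"],
}


def missing_attributes(record):
    """Return {category: [missing field names]} for one sample record."""
    missing = {}
    for category, keys in SUGGESTED_SOIL_ATTRIBUTES.items():
        absent = [k for k in keys if record.get(k) is None or record.get(k) == ""]
        if absent:
            missing[category] = absent
    return missing


# Invented sample record, for illustration only.
sample = {
    "ph": 6.4,
    "plant_cover": "native grassland",
    "horizon": "A",
    "latitude": 42.72,
    "longitude": -84.48,
    "elevation": 260,
    "land_use": "agriculture",
    "depth": "0-10 cm",
}
print(missing_attributes(sample))
```

Reporting the gaps by category makes it easy to see which kind of information (management history, climate, sampling design) still has to be collected.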
74
Technology Issues: Limitations of Pyrosequencing
75
Gomez-Alvarez ISME Article
76
Gomez-Alvarez Fig. 1
Figure 1 (a) Alignment of five sequences in a cluster demonstrates the types of sequencing errors and length variation (highlighted in gray) included in a cluster. (b) Number of reads in a cluster versus the cluster number, ordered from the largest to smallest sized cluster; both axes are plotted on a log10 scale. (c) The best BLAST match and COG affiliation for four of the most abundant clusters in replicate soil metagenomes. (d) Distribution of exact duplicate and all replicate reads in a metagenomic dataset from soil (this study) and seawater metagenomes (Frias-Lopez et al., 2008; Mou et al., 2008). *Rep, technical replicates; +Sp, biological replicates. The number of reads in each category is presented in Table 1.
77
Gomez-Alvarez Table 1 (left)
Gomez-Alvarez, V., Teal, T.K., Schmidt, T.M. (July 2009) Systematic artifacts in metagenomes from complex microbial communities. ISME Journal advance online publication. doi:10.1038/ismej.2009.72
Table 1 Total numbers of reads, exact duplicates and all replicate sequences, including duplicates, from representative metagenomic data sets
[Table columns: Habitat (metagenome), Number of reads]
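The "exact duplicates" column of a table like this can be reproduced by counting reads whose sequence appears more than once. The sketch below covers only that exact-match case. The broader "all replicate" category in the paper also includes near-identical reads, which are not modeled here. The function name and the toy reads are made up for the example.

```python
from collections import Counter


def duplicate_summary(reads):
    """Count exact-duplicate reads in a list of sequence strings."""
    counts = Counter(reads)
    unique = len(counts)
    extra_copies = len(reads) - unique  # copies beyond the first of each sequence
    return {
        "total_reads": len(reads),
        "unique_sequences": unique,
        "extra_copies": extra_copies,
        "fraction_extra": extra_copies / len(reads),
    }


# Toy reads (invented); the first sequence occurs three times.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "TTGACCGA", "GGCATTAC", "TTGACCGA"]
print(duplicate_summary(reads))
```

Keeping one copy of each sequence (the keys of the counter) is the dereplication step applied before clustering or diversity estimates.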
78
PyroNoise Article
79
Pyro Fig. 1
Figure 1 | OTU number as a function of percentage sequence difference for 90 pyrosequenced 16S rRNA gene clones of known sequence. (a,b) Results are repeated for complete linkage (a) and average linkage algorithms (b).
80
Pyro Fig. 2
Figure 2 | Proportion of sequences assigned to the correct OTU as a function of percentage sequence difference for pyrosequenced 16S rRNA gene clones of known sequence. (a,b) Results are repeated for complete linkage (a) and average linkage algorithms (b).
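The two linkage rules compared in these figures differ only in how the distance between two clusters is defined: complete linkage uses the largest pairwise distance, average linkage the mean. The following is a minimal agglomerative sketch for small inputs, not the code used in the paper. The function name, the cutoff, and the toy distance matrix are assumptions for illustration.

```python
from itertools import combinations


def cluster_otus(dist, cutoff, linkage="complete"):
    """Greedy agglomerative clustering of sequences into OTUs.

    dist is a symmetric matrix (list of lists) of pairwise percent differences.
    Clusters are merged while the closest pair is within cutoff.
    """
    if linkage not in ("complete", "average"):
        raise ValueError("linkage must be 'complete' or 'average'")
    clusters = [[i] for i in range(len(dist))]

    def between(a, b):
        pairwise = [dist[i][j] for i in a for j in b]
        if linkage == "complete":
            return max(pairwise)
        return sum(pairwise) / len(pairwise)

    while len(clusters) > 1:
        (x, y), d = min(
            (((a, b), between(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return clusters


# Toy percent-difference matrix for four sequences (made up).
toy_dist = [
    [0, 1, 2, 10],
    [1, 0, 4, 10],
    [2, 4, 0, 10],
    [10, 10, 10, 0],
]
complete = cluster_otus(toy_dist, cutoff=3, linkage="complete")
average = cluster_otus(toy_dist, cutoff=3, linkage="average")
print(len(complete), complete)
print(len(average), average)
```

On this toy matrix the same 3% cutoff yields more OTUs under complete linkage than under average linkage, because a single distant pair is enough to block a merge under the complete-linkage rule.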
81
Pyro Table 1
Quince, C., Lanzén, A., Curtis, T.P., Davenport, R.J., Hall, N., Head, I.M., Read, L.F., and Sloan, W.T. (2009) Accurate determination of microbial diversity from 454 pyrosequencing data. Nature Methods Advance Online Publication, Aug 2009. doi:10.1038/NMETH.1361