18-21 August 2009 METAGENOMIC WORKSHOP James R. Cole, Ph.D. Ribosomal Database Project Center for Microbial Ecology Michigan State University

RDP Pyrosequencing Pipeline: Tools for High-Throughput Analysis

Additional Functions: Shannon Index, Rarefaction, Chao1 Estimate, Alignment Merger, Library Compare, Dereplicate. Export Formats for Common Tools: EstimateS, SPADE, Phylip, PAST, R, Mothur; many others compatible!
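
The three diversity estimators named above can be sketched in a few lines. This is an illustration, not the RDP pipeline's code: it assumes per-OTU read counts as input, uses the natural log for the Shannon index, the bias-corrected form of Chao1, and the standard expected-richness (Hurlbert) formula for rarefaction. The function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def rarefy(counts, n):
    """Expected number of OTUs in a random subsample of n reads (no replacement)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total read count")
    denom = math.comb(total, n)
    # An OTU is missed only if all n reads come from the other total - c reads.
    return sum(1 - math.comb(total - c, n) / denom for c in counts if c > 0)
```

For example, `chao1([10, 5, 2, 1, 1])` gives 5.5 (five observed OTUs, two singletons, one doubleton). Evaluating `rarefy` at increasing `n` traces a rarefaction curve.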

SPADE

PAST: PAlaeontological STatistics

R

Cluster-Based Method

[Figure: Cluster-based comparison of the Pigeon Pea and Bare Fallow samples. Axes: position by cluster order (thousands), similarity, total abundance. Legend levels: species, genus, family, novel. Taxa shown: TM7, Clostridia, Bacilli, unclassified Firmicutes, Actinobacteria, Bacteroidetes, Acidobacteria, Alpha-, Beta-, Gamma- and Deltaproteobacteria, unclassified Proteobacteria, Verrucomicrobia, Planctomycetes, Gemmatimonadetes, unclassified Bacteria.]
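
One way to read the cluster-based comparison is that reads from both samples (here Pigeon Pea and Bare Fallow) are clustered together, and each cluster's reads are then counted per sample. The sketch below assumes the pooled clustering has already been done and only tabulates the result. The input format and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_abundance(assignments):
    """Build a cluster x sample abundance table from pooled clustering.

    assignments: iterable of (sample_name, cluster_id) pairs, one per read.
    Returns (cluster_id, Counter of reads per sample) pairs, ordered by
    total abundance, largest cluster first.
    """
    table = defaultdict(Counter)
    for sample, cluster in assignments:
        table[cluster][sample] += 1
    return sorted(table.items(), key=lambda item: -sum(item[1].values()))


def shared_fraction(table, a, b):
    """Fraction of clusters present in sample a or b that contain reads from both."""
    present = [counts for _, counts in table if counts[a] or counts[b]]
    shared = [counts for counts in present if counts[a] and counts[b]]
    return len(shared) / len(present) if present else 0.0
```

Ordering clusters by total abundance mirrors the "position by cluster order" axis; the per-sample counts show which clusters are unique to one treatment.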

Pipeline Performance. Processing time for 52 samples, 350, FLX reads: Classifier ~2 CPU hrs; Aligner ~12 CPU hrs; Clustering ~2 CPU hrs (depends on sample sizes); SeqMatch ~23 CPU hrs.

Usage Stats: 380 users since June 2008. April 2009 stats: 182 initial process jobs; 1243 cluster jobs; 832 alignment jobs; >11 million sequences aligned. RDP Pyro tools distributed to several major institutions.

Analysis of 16S Variable Regions: Important Features

rRNA Gene Regions Processed by the RDP Pyrosequencing Pipeline [Figure: % of sequences covering each position along the 16S rRNA gene, 5' to 3'; regions V1-V2, V3, V4 and V6 marked.]

V6 region

                     V6     V3     V4     V1-V2
% Aligned            78%    89%    99%    84%
Identical Species    49%    36%    37%     6%
Different Operons    10%    12%     7%    32%

The slide also tabulates canonical positions, conserved base pairs, % size range, % missing pairs, % half pairs and % paired for each region. Statistics from 300,000 Sanger sequences (RDP Release 10.11). Secondary-structure figures from

V3 region

V4 region

V1-V2 region

Chance a Species' Sequence Is Identical to at Least One Other Species ("Identical Species" row of the region table). Based on 6,841 bacterial species type strain sequences; strain information from "The Living Tree Project" projects/living-tree/
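
The "Identical Species" statistic can be approximated as follows, assuming one extracted region sequence per type-strain species. This is an illustrative sketch, not the procedure behind the slide's numbers; the function name and the gap/case cleanup are assumptions.

```python
from collections import Counter


def fraction_identical(region_seqs):
    """Fraction of species whose region sequence matches at least one other species.

    region_seqs: dict mapping species name -> sequence of one variable region.
    Alignment gap characters and letter case are ignored before comparison.
    """
    cleaned = {sp: seq.replace("-", "").replace(".", "").upper()
               for sp, seq in region_seqs.items()}
    counts = Counter(cleaned.values())
    shared = sum(1 for seq in cleaned.values() if counts[seq] > 1)
    return shared / len(cleaned)
```

A high value for a region (as reported for V6) means that region alone cannot separate many species.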

Chance Two Operons Differ in One Organism ("Different Operons" row of the region table). Based on 561 completed genome sequences with two or more rRNA operons.
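
Similarly, the "Different Operons" statistic can be sketched by comparing the region sequences of every rRNA operon within each genome. This assumes the region has already been extracted from each operon; the function name is hypothetical.

```python
def fraction_operons_differ(operon_regions):
    """Fraction of multi-operon genomes whose operons differ within a region.

    operon_regions: dict mapping genome name -> list of region sequences,
    one per rRNA operon. Genomes with fewer than two operons are skipped.
    """
    multi = [set(seqs) for seqs in operon_regions.values() if len(seqs) >= 2]
    if not multi:
        return 0.0
    # A genome counts if its operons yield more than one distinct sequence.
    return sum(1 for distinct in multi if len(distinct) > 1) / len(multi)
```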

V4: Quality of Recovered Structure, Sanger vs. FLX. Measures compared: average size (207), % missing pairs, % half pairs, % paired (62), and % aligned (99%).

Introduction to the Short Read Archive (SRA): myRDP SRA Prepkit

SRA Submission Format

Six Different SRA Document Types: Submission, Study, Experiment, Sample, Run and Analysis. [Diagram: one-to-one (1) and one-to-many (N) relationships linking the document types.]

myRDP SRA Prepkit [Diagram: sequence reads and metadata from a sequencing project are converted by the myRDP SRA Prepkit into XML documents, which are submitted (myRDP SWS) to NCBI-SRA or EMBL-ERA.]

Sample Attributes Prefilled: Genomic Standards Consortium MIMS (Minimum Information about a Metagenome Sequence), Nature Biotechnology 26 (2008).
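
A prepkit of this kind ultimately writes sample metadata into XML. The minimal sketch below uses Python's `xml.etree.ElementTree`. Element names are modeled on the SRA sample XML (SAMPLE_SET, SAMPLE, SAMPLE_ATTRIBUTES with TAG/VALUE pairs), but other elements required by the schema are omitted and the attribute names are hypothetical. This is not the myRDP Prepkit's code; check output against the current SRA schema.

```python
import xml.etree.ElementTree as ET


def sample_xml(alias, title, attributes):
    """Serialize one sample and its MIMS-style attributes as an XML string."""
    sample_set = ET.Element("SAMPLE_SET")
    sample = ET.SubElement(sample_set, "SAMPLE", alias=alias)
    ET.SubElement(sample, "TITLE").text = title
    attrs = ET.SubElement(sample, "SAMPLE_ATTRIBUTES")
    for tag, value in attributes.items():
        # One TAG/VALUE pair per prefilled attribute.
        attr = ET.SubElement(attrs, "SAMPLE_ATTRIBUTE")
        ET.SubElement(attr, "TAG").text = tag
        ET.SubElement(attr, "VALUE").text = str(value)
    return ET.tostring(sample_set, encoding="unicode")
```

For example, `sample_xml("soil1", "Bare fallow soil", {"pH": 6.5, "depth": "0-10 cm"})` produces one SAMPLE with two attribute pairs.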

Functional Genes

FGPR (Functional Gene Pipeline/Repository) Home Page Screenshot

FGPR Screenshots: seed sequences; active links to GenBank records; organism name; display/filter options; custom analysis

Functional Gene Pipeline/Repository Sequence Analysis: interactive commands; sub-selection for further analysis; dynamic tree applet

Functional Gene Processing
1) Remove frameshifts: tBLASTX, GeneWise
2) Translate and align sequences: HMMER, MUSCLE
3) Determine conserved residues: entropy plot
4) Compare to reference sequences: determine functional subclass
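
Step 2 depends on translating each nucleotide read into protein. The sketch below translates with the standard genetic code (NCBI table 1). As a crude stand-in for the frameshift-aware tBLASTX/GeneWise step, it picks the forward frame with the fewest stop codons; that heuristic is an assumption for illustration only and does not correct frameshifts.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate a DNA/RNA string from the given forward frame (0, 1 or 2)."""
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons containing ambiguous bases (e.g. N) become 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def best_forward_frame(dna):
    """Forward frame whose translation has the fewest stop codons ('*')."""
    return min(range(3), key=lambda frame: translate(dna, frame).count("*"))
```

For example, `translate("ATGTTTTAA")` returns `"MF*"`.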

Entropy (Dioxygenase Genes)
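
An entropy plot of this kind is typically built from the Shannon entropy of each alignment column: low entropy marks conserved residues. A minimal sketch, assuming equal-length aligned protein sequences with '-' for gaps (ignored). The 0.2-bit threshold and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(alignment, ignore_gaps=True):
    """Shannon entropy (bits) of each column of an alignment."""
    entropies = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment
                    if not (ignore_gaps and seq[col] == "-")]
        n = len(residues)
        counts = Counter(residues)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
        entropies.append(h)
    return entropies


def conserved_positions(alignment, max_entropy=0.2):
    """1-based alignment positions whose entropy is at or below max_entropy."""
    return [i + 1 for i, h in enumerate(column_entropy(alignment)) if h <= max_entropy]
```

A fully conserved column scores 0 bits; a column with 20 equally frequent amino acids would score log2(20), about 4.32 bits.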

Taxomatic: Interactive Taxonomy Explorer. Interactive distance matrix display; couples the matrix with taxonomy information; allows rapid detection of taxonomic inconsistencies.
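
A simple, non-interactive version of the inconsistency check is to flag every sequence whose nearest neighbour in the distance matrix carries a different taxon label. This nearest-neighbour rule is an assumption for illustration, not Taxomatic's actual method, and the function name is hypothetical.

```python
def flag_inconsistencies(names, dist, taxon):
    """Flag sequences whose nearest neighbour carries a different taxon label.

    names: sequence IDs; dist: square distance matrix (list of lists) in the
    same order; taxon: dict mapping ID -> taxon name (e.g. genus).
    Returns (id, nearest_id, distance) tuples for the flagged sequences.
    """
    flagged = []
    for i, name in enumerate(names):
        # Nearest other sequence by pairwise distance.
        j = min((k for k in range(len(names)) if k != i), key=lambda k: dist[i][k])
        if taxon[names[j]] != taxon[name]:
            flagged.append((name, names[j], dist[i][j]))
    return flagged
```

A flagged pair points either to a mislabeled sequence or to a taxon boundary worth inspecting in the display.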

Taxomatic: Integrated Overlays

Zoom and pan

Can zoom down to individual sequences

MEGAN: Taxonomic Analysis of Metagenomic Data

MEGAN: modified k-NN LCA (lowest common ancestor) taxonomic classifier. Requires a BLAST result file; extracts taxonomy and COGs from matches; features from the NCBI Prokaryotic Attributes Table.
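
The core LCA idea assigns each read to the deepest taxon shared by all of its good BLAST hits. The sketch below shows only that basic step, with hypothetical thresholds: hits must reach `min_score` and fall within `top_percent` of the best bit score. MEGAN's own parameters, defaults and k-NN modifications are not reproduced.

```python
def lca(lineages):
    """Longest common prefix of root-to-leaf taxonomy paths."""
    common = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            common.append(ranks[0])
        else:
            break
    return common


def assign_read(hits, min_score=35.0, top_percent=10.0):
    """Assign one read to the LCA of its best BLAST hits.

    hits: list of (bit_score, lineage) pairs, lineage as a root-first list.
    Returns the LCA lineage, or None if no hit passes min_score.
    """
    kept = [(score, lin) for score, lin in hits if score >= min_score]
    if not kept:
        return None
    best = max(score for score, _ in kept)
    cutoff = best * (1 - top_percent / 100)  # keep hits close to the best
    return lca([lin for score, lin in kept if score >= cutoff])
```

A read whose top hits are an *Escherichia* and a *Salmonella* sequence is placed at Gammaproteobacteria rather than at either genus.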

MEGAN Screenshot 1

MEGAN Screenshot 2

MEGAN Screenshot 3

Metagenomics Analysis Pipelines: Sequence Comparison

General Considerations: What databases are used? GenBank nr (not good); Pfam, TIGRfam, FIGfam? What search strategy is used? BLAST, HMMER, additional tools? Will they process my data? Will my data become public?

HMMER vs. BLAST

BMC Genomics: Aziz

The SEED & RAST
Subsystems: pathway database; expert annotation, curated simultaneously across many genomes.
FIGfams: database of protein families; derived from the Subsystems database; controlled addition of new family members.
RAST: genome annotation system; uses FIGfams for gene annotation; uses Subsystems for pathway annotation.

The SEED & RAST

BMC RAST Fig. 2

BMC RAST Fig. 4

JGI's IMG/M Home

CAMERA Home

CAMERA Dashboard

CAMERA Project Samples

Metadata: Data about Data

Metadata Standards: Minimum Information about a Microarray Experiment (MIAME); Minimum Information about a Genome Sequence (MIGS); Minimum Information about a Metagenome Sequence (MIMS)

Nature Biotechnology 26 (2008)

Box 1. Minimum Information about a Genome Sequence (MIGS): Habitat-Specific Attributes. MIMS extension: select to report a set of uniform measurements for a given habitat. Water body: temperature, pH, salinity, pressure, chlorophyll, conductivity, light intensity, dissolved organic carbon (DOC), current, atmospheric data, density, alkalinity, dissolved oxygen, particulate organic carbon (POC), phosphate, nitrate, sulfates, sulfides, primary production (integer, unit).

Soil Metadata Survey: to help establish a set of suggested attributes for soil sequence data, in cooperation with the Genomic Standards Consortium and the International Soils Metagenome Sequencing Consortium (Terragenome).

Soil Metadata Survey Summary [Chart: importance (not important to very important) plotted against difficulty to obtain (easy to hard).]

Soil Metadata Survey Summary: Very Important / Easy to Obtain
Chemical: pH (in water or calcium chloride)
Biological: plant cover (native)
Soil/Geological: horizon
Geographical: latitude and longitude, elevation
Management: land use (e.g., urban, agriculture, forestry), tillage (type), crops (current, rotation), fertilizers (type and annual amount)
Climate: mean and seasonal rainfall, mean and seasonal temperatures
Sampling: depth, composite design, moisture content at sampling, area represented by composite sample, weight of sample used for DNA extraction
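
A checklist like this can be enforced at submission time. The sketch below checks one sample's metadata for the "very important / easy to obtain" attributes and applies a few physical range checks. The field names are hypothetical, not an official GSC or Terragenome vocabulary.

```python
# Hypothetical field names for the attributes rated "very important / easy
# to obtain" in the survey; not an official checklist.
REQUIRED_SOIL_FIELDS = (
    "ph", "plant_cover", "horizon",
    "latitude", "longitude", "elevation",
    "land_use", "tillage", "crops", "fertilizers",
    "rainfall", "temperature",
    "depth", "composite_design", "moisture_content",
    "composite_area", "dna_extraction_sample_weight",
)


def check_soil_metadata(record):
    """Return a list of problems found in one sample's metadata dict."""
    problems = [f"missing: {name}" for name in REQUIRED_SOIL_FIELDS
                if record.get(name) in (None, "")]
    # Range checks for numeric fields that have hard physical limits.
    limits = {"latitude": (-90, 90), "longitude": (-180, 180), "ph": (0, 14)}
    for name, (low, high) in limits.items():
        value = record.get(name)
        if isinstance(value, (int, float)) and not low <= value <= high:
            problems.append(f"out of range: {name}")
    return problems
```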

Technology Issues: Limitations of Pyrosequencing

Gomez-Alvarez ISME Article

Gomez-Alvarez Fig. 1. Figure 1 (a) Alignment of five sequences in a cluster demonstrates the types of sequencing errors and length variation (highlighted in gray) included in a cluster. (b) Number of reads in a cluster versus the cluster number, ordered from the largest to smallest sized cluster; both axes are plotted on a log10 scale. (c) The best BLAST match and COG affiliation for four of the most abundant clusters in replicate soil metagenomes. (d) Distribution of exact duplicate and all replicate reads in a metagenomic dataset from soil (this study) and seawater metagenomes (Frias-Lopez et al., 2008; Mou et al., 2008). *Rep, technical replicates; +Sp, biological replicates. The number of reads in each category is presented in Table 1.

Gomez-Alvarez Table 1 (left). Table 1: Total numbers of reads, exact duplicates and all replicate sequences (including duplicates) from representative metagenomic data sets. Columns include Habitat (metagenome) and Number of reads. Source: Gomez-Alvarez, V., Teal, T.K., Schmidt, T.M. (July 2009) Systematic artifacts in metagenomes from complex microbial communities. ISME Journal advance online publication. doi: /ismej
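
The figure and table distinguish exact duplicates from the broader set of replicates, whose members differ by sequencing errors and length variation. The crude sketch below counts both. It treats reads sharing their first `prefix_len` bases as replicates; that prefix rule is a hypothetical simplification, not the clustering criterion used in the paper. Counts are of extra copies beyond the first read of each group.

```python
from collections import Counter


def duplicate_stats(reads, prefix_len=20):
    """Count exact duplicates and prefix-based replicates in a list of reads.

    A read counts as an extra copy if an earlier read is identical (exact
    duplicate) or shares its first prefix_len bases (replicate; this count
    includes the exact duplicates).
    """
    exact = Counter(reads)
    prefixes = Counter(read[:prefix_len] for read in reads)
    return {
        "reads": len(reads),
        "exact_duplicates": sum(c - 1 for c in exact.values()),
        "replicates": sum(c - 1 for c in prefixes.values()),
    }
```

Comparing the two counts shows how many artifacts an exact-duplicate filter alone would miss.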

PyroNoise Article

Pyro Fig. 1. Figure 1 | OTU number as a function of percentage sequence difference for 90 pyrosequenced 16S rRNA gene clones of known sequence. (a,b) Results are repeated for complete linkage (a) and average linkage algorithms (b).

Pyro Fig. 2. Figure 2 | Proportion of sequences assigned to the correct OTU as a function of percentage sequence difference for pyrosequenced 16S rRNA gene clones of known sequence. (a,b) Results are repeated for complete linkage (a) and average linkage algorithms (b).

Pyro Table 1. Quince, C., Lanzén, A., Curtis, T.P., Davenport, R.J., Hall, N., Head, I.M., Read, L.F., and Sloan, W.T. (2009) Accurate determination of microbial diversity from 454 pyrosequencing data. Nature Methods, Advance Online Publication, Aug. doi: /NMETH.1361
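
Figures 1 and 2 compare complete and average linkage when reads are clustered into OTUs at a given percentage difference. The naive agglomerative sketch below counts OTUs under either rule, given a precomputed pairwise distance matrix. It is quadratic-or-worse and meant for toy inputs only; it is not the PyroNoise algorithm, and the function name is hypothetical.

```python
from itertools import combinations


def cluster_count(dist, cutoff, linkage="complete"):
    """Count OTUs from naive agglomerative clustering of a square distance matrix.

    Clusters are merged, closest pair first, until no pair of clusters is
    within cutoff under the chosen linkage rule.
    """
    if linkage not in ("complete", "average"):
        raise ValueError("linkage must be 'complete' or 'average'")

    def link(a, b):
        pair_dists = [dist[i][j] for i in a for j in b]
        if linkage == "complete":
            return max(pair_dists)                 # farthest members
        return sum(pair_dists) / len(pair_dists)   # mean over all pairs

    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        (x, y), best = min(
            (((x, y), link(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > cutoff:
            break
        merged = clusters[x] + clusters[y]
        del clusters[y]          # y > x, so index x is still valid
        clusters[x] = merged
    return len(clusters)
```

Because an average-linkage distance never exceeds the complete-linkage distance for the same pair of clusters, average linkage merges more readily, and at the same cutoff it tends to produce fewer OTUs.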