Comparative metagenomics quantifying similarities between environments

Presentation transcript:

Comparative metagenomics quantifying similarities between environments Bas E. Dutilh

Taxonomic or functional profiles
Kip et al. Env. Microbiol. Rep. 2011; Boleij et al. Mol. Cell. Proteomics 2012; Trindade-Silva et al. PLoS ONE 2012

Clustering profiles
Calculate pairwise distances
Create cladogram (BioNJ)
Gascuel Mol. Biol. Evol. 1997
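
BioNJ refines the neighbor-joining algorithm. As a rough illustration of how a cladogram can be built from a pairwise distance matrix, the sketch below implements plain neighbor joining, not BioNJ itself. The function name and the dictionary-based matrix layout are assumptions made for this example, and branch lengths are left out because only the topology is of interest here.

```python
def neighbor_joining(names, dist):
    """Return a nested-tuple tree topology (plain neighbor joining, not BioNJ).

    names: list of sample labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: summed distance from each node to all others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises the neighbor-joining criterion Q.
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[frozenset((a, b))] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = (a, b)
        # Distance from the new internal node to every remaining node.
        # (BioNJ replaces the fixed 0.5 weights with variance-based weights.)
        for c in nodes:
            if c != a and c != b:
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d[frozenset((a, b))]
                )
        nodes = [c for c in nodes if c != a and c != b] + [new]
    return tuple(nodes)


# Hypothetical example: four samples, A/B close together and C/D close together.
example_names = ["A", "B", "C", "D"]
example_dist = {
    frozenset(("A", "B")): 2.0, frozenset(("C", "D")): 2.0,
    frozenset(("A", "C")): 3.0, frozenset(("A", "D")): 3.0,
    frozenset(("B", "C")): 3.0, frozenset(("B", "D")): 3.0,
}
example_tree = neighbor_joining(example_names, example_dist)
```

The nested tuples group samples that were joined together, so samples that share an inner tuple sit on the same branch of the cladogram.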

Clustering profiles
Calculate pairwise distances:
Manhattan distance
Correlation between profiles: high correlation ↔ similar environment, low correlation ↔ dissimilar environment
Angle between vectors in n-dimensional space: small angle ↔ similar environment, large angle ↔ dissimilar environment
Wootters distance between profiles (Wootters Phys. Rev. D 1981)
Create cladogram (BioNJ)
[Figure: profiles shown as frequency per taxon / function, and as vectors with axes freq taxon 1, freq taxon 2, freq taxon 3, ...]
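
The four measures listed on this slide can be sketched in a few lines of Python. The function names are illustrative assumptions. The Wootters distance is written here as the arccosine of the summed square roots of the products of the frequencies, which is the usual statement of Wootters' statistical distance; whether the cited studies used exactly this form is an assumption.

```python
import math


def normalize(counts):
    """Turn raw counts into frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def manhattan_distance(p, q):
    """Sum of absolute frequency differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def correlation_distance(p, q):
    """1 minus the Pearson correlation (undefined for constant profiles)."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    sd_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    sd_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    return 1.0 - cov / (sd_p * sd_q)


def angle_distance(p, q):
    """Angle (radians) between the two profile vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def wootters_distance(p, q):
    """Statistical distance: arccos of the sum of sqrt(p_i * q_i)."""
    overlap = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, overlap)))


# Hypothetical taxon counts for two environments.
profile_a = normalize([60, 30, 10])
profile_b = normalize([10, 30, 60])
```

All four functions return 0 (up to rounding) for identical profiles and grow as the profiles diverge, so any of them can feed the pairwise distance matrix from which the cladogram is built.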

Metagenomes of water and water animals
BlastN reads against Genbank, E-value ≤ 10⁻⁵
Taxonomic profiles including parent clades
Wootters distance formula
BioNJ cladogram
Trindade-Silva et al. PLoS ONE 2012
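
A taxonomic profile "including parent clades" can be read as counting each read once for the taxon it hits and once for every ancestor of that taxon. The sketch below follows that reading. The input layout (tuples of read ID, root-to-taxon lineage, and E-value), the choice of one best hit per read, and the normalization by the number of assigned reads are all assumptions made for illustration, not details taken from the cited study.

```python
from collections import Counter


def taxonomic_profile(hits, max_evalue=1e-5):
    """Build a clade-frequency profile from filtered BLAST-like hits.

    hits: iterable of (read_id, lineage, evalue), where lineage is a
          root-to-taxon sequence such as ("Bacteria", "Firmicutes").
    """
    # Keep the lowest-E-value hit per read, discarding hits above the cutoff.
    best = {}
    for read_id, lineage, evalue in hits:
        if evalue <= max_evalue and (read_id not in best or evalue < best[read_id][1]):
            best[read_id] = (tuple(lineage), evalue)

    # Count the hit taxon and each of its parent clades.
    counts = Counter()
    for lineage, _ in best.values():
        for depth in range(1, len(lineage) + 1):
            counts[lineage[:depth]] += 1

    n_reads = len(best)
    return {clade: c / n_reads for clade, c in counts.items()} if n_reads else {}


# Hypothetical hits; the third one fails the E-value cutoff.
example_hits = [
    ("read1", ("Bacteria", "Firmicutes"), 1e-20),
    ("read2", ("Bacteria", "Proteobacteria"), 1e-8),
    ("read3", ("Viruses",), 0.5),
]
example_profile = taxonomic_profile(example_hits)
```

Clades are keyed by their full lineage prefix so that identically named taxa in different branches are not merged.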

Viral metagenomic samples
BlastN reads against Genbank, E-value ≤ 10⁻³
Taxonomic profiles including parent clades
Distance = 1 minus correlation
BioNJ cladogram
[Figure: % reads used (BlastN mapping to Genbank), human and water samples]
Dutilh et al. Bioinformatics 2012

Many unknowns in viral metagenomes
Mokili et al. Curr. Opin. Virology 2012

K-mer (k=2) clustering
Willner et al. Env. Microbiol. 2009

A sequence of 3 million nucleotides
2-mer profiles: 4 * 4 = 16 dimensions (2-mers), ~3 million steps in each dimension, 16 * 3,000,000 = 48,000,000 possibilities
4-mer profiles: 4 * 4 * 4 * 4 = 256 dimensions (4-mers), 256 * 3,000,000 = 768,000,000 possibilities
Sequencing reads or contigs (~200 nt): 4²⁰⁰ ≈ 2.6 * 10¹²⁰ possibilities
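
The dimension counts above follow from 4^k possible k-mers over the alphabet A, C, G, T. A minimal sketch of a k-mer frequency profile is given below; the function name is an assumption, and k-mers containing other characters (such as N) are simply skipped.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k):
    """Return the frequency of every possible k-mer over ACGT in a sequence."""
    sequence = sequence.upper()
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Only k-mers made entirely of A, C, G, T contribute to the total.
    total = sum(counts[kmer] for kmer in all_kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Hypothetical toy sequence.
dinucleotide_profile = kmer_profile("ACGTACGT", 2)

# Number of dimensions for k = 2 and k = 4, and the number of distinct
# 200-nt sequences, computed exactly with Python integers.
dimensions_k2 = 4 ** 2
dimensions_k4 = 4 ** 4
distinct_200mers = 4 ** 200
```

The profile always has 4^k entries regardless of sequence length, which is what makes profiles of different samples directly comparable.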

Cross-assembly
Interpret viromes in terms of one another
Combine sequencing reads from different metagenomes in a single assembly
Use your favorite assembly tool
Cross-contigs contain reads from more than 1 sample
Cross-contigs directly represent the overlap between samples
We interpret contigs as “metagenomic entities”
Ready for e.g. BLAST searches
Dutilh et al. Bioinformatics 2012
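
Once the pooled reads have been assembled, identifying cross-contigs only requires knowing which reads ended up in which contig and which sample each read came from. The sketch below assumes that information is already available as two plain dictionaries; this layout and the function names are hypothetical and do not reflect the file formats of any particular assembler or of crAss.

```python
def samples_per_contig(contig_reads, read_sample):
    """Map each contig to the set of samples that contributed reads to it."""
    return {
        contig: {read_sample[read] for read in reads}
        for contig, reads in contig_reads.items()
    }


def find_cross_contigs(contig_reads, read_sample):
    """Return the contigs that contain reads from more than one sample."""
    return {
        contig
        for contig, samples in samples_per_contig(contig_reads, read_sample).items()
        if len(samples) > 1
    }


def count_shared_cross_contigs(contig_reads, read_sample, sample_a, sample_b):
    """Count the contigs that contain reads from both given samples."""
    return sum(
        1
        for samples in samples_per_contig(contig_reads, read_sample).values()
        if sample_a in samples and sample_b in samples
    )


# Hypothetical assembly result for reads pooled from two samples.
example_contig_reads = {
    "ctg1": ["h1", "h2", "w1"],
    "ctg2": ["h3"],
    "ctg3": ["w2", "w3"],
}
example_read_sample = {
    "h1": "human", "h2": "human", "h3": "human",
    "w1": "water", "w2": "water", "w3": "water",
}
example_cross = find_cross_contigs(example_contig_reads, example_read_sample)
```

The number of shared cross-contigs per pair of samples is the raw similarity that the distance formulas then correct for sample size.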

Distance formulas
Similarity based on number of cross-contigs
[Figure: axes are sample size (# contigs), sample size (# contigs), and similarity (# cross-contigs)]

Distance formulas
Large metagenomes may share more cross-contigs with unrelated large samples than with closely related small samples
Size correction necessary
[Figure: axes are minimum of the two sample sizes (# contigs), sample size (# contigs), and similarity (# cross-contigs)]

Distance formulas
Size correction options:
Minimum metagenome size
Weighted average metagenome size (SHOT; Korbel et al. Trends Genet. 2002)
[Figure: axes are sample size (# contigs), sample size (# contigs), and similarity (# cross-contigs)]
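
The two size corrections can be sketched as follows. The weighted average is written as sqrt(2)·a·b / sqrt(a² + b²), which is the form assumed here for the SHOT normalization; both that form and the conversion of similarity to distance as 1 minus similarity are assumptions for illustration, and the published formulas should be consulted for the exact definitions.

```python
import math


def minimum_size(size_a, size_b):
    """Size of the smaller metagenome (# contigs)."""
    return min(size_a, size_b)


def weighted_average_size(size_a, size_b):
    """Weighted average size, assumed form: sqrt(2)*a*b / sqrt(a^2 + b^2)."""
    return math.sqrt(2) * size_a * size_b / math.sqrt(size_a ** 2 + size_b ** 2)


def corrected_distance(shared, size_a, size_b, size_function=minimum_size):
    """Distance from the number of shared cross-contigs, corrected for size.

    shared: number of cross-contigs containing reads from both samples.
    size_a, size_b: number of contigs containing reads from each sample.
    """
    similarity = shared / size_function(size_a, size_b)
    return 1.0 - similarity


# Hypothetical pair: a small sample fully contained in a large one.
distance_min = corrected_distance(100, 100, 10000, minimum_size)
distance_shot = corrected_distance(100, 100, 10000, weighted_average_size)
```

With the minimum-size correction, a small sample fully contained in a large one gets distance 0, whereas the weighted average still penalizes the difference in size.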

Distance formulas
Contig content gives qualitative distances: what metagenomic entities are there?
Quantitative distances can be calculated by taking the number of incorporated reads into account
[Figure: vectors with axes reads in ctg1, reads in ctg2, reads in ctg3]
Dutilh et al. Bioinformatics 2012
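
One way to make the distance quantitative is to treat each contig as a dimension and the number of reads a sample contributes to it as the coordinate, then compare the resulting vectors. The sketch below does this with the Wootters distance in its arccosine form; the dictionary-based input, the function names, and the choice of normalizing by each sample's assembled reads are assumptions for illustration.

```python
import math


def read_count_profiles(contig_reads, read_sample):
    """Return, per sample, the fraction of its assembled reads in each contig."""
    contigs = sorted(contig_reads)
    counts = {}
    for index, contig in enumerate(contigs):
        for read in contig_reads[contig]:
            sample = read_sample[read]
            counts.setdefault(sample, [0] * len(contigs))[index] += 1
    return {
        sample: [c / sum(vector) for c in vector]
        for sample, vector in counts.items()
    }


def wootters_distance(p, q):
    """Statistical distance: arccos of the sum of sqrt(p_i * q_i)."""
    overlap = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, overlap)))


# Hypothetical cross-assembly of three samples A, B and C.
example_contig_reads = {
    "ctg1": ["a1", "a2", "b1", "b2"],
    "ctg2": ["a3", "b3"],
    "ctg3": ["c1"],
}
example_read_sample = {
    "a1": "A", "a2": "A", "a3": "A",
    "b1": "B", "b2": "B", "b3": "B",
    "c1": "C",
}
example_profiles = read_count_profiles(example_contig_reads, example_read_sample)
```

Samples that distribute their reads over the same contigs in the same proportions end up at distance 0, while samples that share no contigs end up at the maximum distance of π/2.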

Simulated metagenomes
30 species each, with decreasing overlap
[Figure labels: Firmicutes, Proteobacteria]
Dutilh et al. Bioinformatics 2012

Six simulated metagenomes of ~25 Firmicutes each
Three simulated metagenomes of ~25 Actinobacteria each
Increasing noise (0-100% Proteobacteria)
[Figure: panels for 0% to 100% noise in steps of 10%; axis: distance]
Dutilh et al. Bioinformatics 2012

Cladogram
[Figure labels: water, human; BlastN, crAss]
Dutilh et al. Bioinformatics 2012

Similar numbers of utilized reads
[Figure: percentage of reads used per metagenome, human and water samples]
Dutilh et al. Bioinformatics 2012

http://edwards.sdsu.edu/crass/ Stand-alone on SourceForge Dutilh et al. Bioinformatics 2012

2 or 3 samples: no cladogram Dutilh et al. Bioinformatics 2012

Experiment

Cross-assembly
Advantages:
Fast programs available (Newbler)
No full-length homology necessary
Independent of reference database
Sequence of shared entities for further analysis

Cross-assembly Dutilh et al. In preparation

Acknowledgements
Robert Schmieder, Jim Nulton, Ben Felts, Peter Salamon, Robert A. Edwards, John L. Mokili
http://edwards.sdsu.edu/crass/