Elucidating factors behind pairwise distance discrepancies between short and near full-length sequences

Presentation transcript:

Elucidating factors behind pairwise distance discrepancies between short and near full-length sequences

We hypothesized that, because the 16S rRNA molecule is made of sites with varying levels of evolutionary conservation, the proportion of these sites in a specific amplicon would affect the pairwise distance values obtained from the dataset. To test this, we used the classification of all base pairs in the 16S rRNA gene of E. coli into conserved (C), variable (V), and highly variable (HV), put forward in the reviews of Baker et al. (2) and Van de Peer et al. (10), to determine the percentage of C, V, and HV base pairs in each of the pyrosequencing fragments, and compared it to that of the near full-length fragment. We used multiple regression and tested all possible combinations of percentages and ratios of C, V, and HV bases. The best model equation obtained was:

y (slope) = (30.5 × C/total) + (11.5 × HV/V) - (27.9 × HV/total) - (8.5 × C/V) + (5.25 × HV/C) - (0.001 × length)

A Comparative Study of Species Richness Estimates Obtained Using Near Complete Fragments and Simulated Pyrosequencing-Generated Fragments in 16S rRNA Gene-Based Environmental Surveys

N. H. Youssef 1, C. S. Sheik 2, L. R. Krumholz 2, F. Z. Najar 2, B. A. Roe 2, M. S. Elshahed 1; 1 Oklahoma State Univ., Stillwater, OK, 2 Univ. of Oklahoma, Norman, OK. N-088

Abstract

It is not yet clear how the number of operational taxonomic units (OTUs), and hence species richness estimates, determined using pyrosequencing-generated fragments correlate with those assigned using near full-length 16S rRNA gene fragments. We constructed a 16S rRNA clone library from an undisturbed tall grass prairie soil (1132 clones), and used it to compare species richness estimates obtained using 8 pyrosequencing-candidate fragments ( bp in length) to those obtained using the near full-length fragment.
While fragments encompassing the V1+V2 and V6 regions overestimated species richness, those encompassing the V3, V7, and V7+V8 regions underestimated it, and those encompassing the V4, V5+V6, and V6+V7 regions provided estimates comparable to the near full-length fragment. Similar results were obtained when analyzing three other datasets. Regression analysis indicated that base variability within an examined fragment could potentially explain those differences.

Introduction

Typical culture-independent 16S rRNA gene surveys of highly diverse ecosystems allow for the identification of only the abundant members of the communities (1). The estimates obtained are highly dependent on sample size. The large number of 16S rRNA gene sequences produced with pyrosequencing (7) allows access to rare members of the community (4), as well as a relatively more accurate estimation of species richness. However, it is unclear how pairwise distances, and hence operational taxonomic unit (OTU) assignments and species richness estimates, computed using various shorter fragments will correlate with those computed using the near complete 16S rRNA gene. Here, we constructed, sequenced, and analyzed a 16S rRNA library of 1132 clones, and compared OTU numbers and species richness values obtained using the full-length dataset with those obtained using fragments simulating pyrosequencing output. We show that the choice of the pyrosequenced fragment could affect the number of OTUs and the species richness estimates, with some fragments underestimating and others overestimating species richness when compared to longer, near complete 16S rRNA gene fragments. Further, we established a regression analysis that explains the nature of the observed discrepancy using the proportion of the highly variable, variable, and conserved bases within a fragment.

Comparing species richness estimates in short and long fragments at various taxonomic cutoffs in Soil Okla-A clone library
All three species richness estimation methods, as well as the slopes of scatter plots, were in general agreement with each other and with the results obtained from OTU assignments in describing the relationship between long and short fragments.

Table 3. Parametric species richness estimates obtained using the near full-length sequences and each of the 8 short simulated regions studied at 5 different taxonomic cutoffs for Soil-Okla-A clone library.

Table 1. Variable sites encompassed, and base composition, for the short simulated regions studied and the near full-length fragment. Percentage of V (variable), HV (highly variable), and C (conserved) bases. NFL: near full-length sequences. Columns: Regions; Variable regions; Percentage of bases (V, HV, C). Rows: V1+V2, V3, V4, V5+V6, V6, V6+V7, V7, V7+V8, NFL.

Comparison of number of OTUs obtained using near complete and shorter fragments

The number of OTUs obtained using short simulated fragments ranged from 0.44 to 2.10 times the values obtained using the near full-length 16S rRNA gene. Fragments encompassing regions V1+V2 and V6 overestimated the number of OTUs at all taxonomic cutoffs. Fragments encompassing the V3, V7, and V7+V8 regions underestimated OTU numbers. Fragments encompassing V4, V5+V6, and V6+V7 gave, in general, OTU numbers comparable to the full sequence, as further evidenced by slope values of 0.97, 1, and 0.98, respectively (Table 2).

Materials and methods

Site. Undisturbed tall grass prairie soil in central Oklahoma.
DNA extraction. FastDNA spin kit for soil.
PCR and cloning. Primers 8f-1492r. TOPO-TA cloning kit.
Chimera detection. Bellerophon (version 3) function on Greengenes.
Alignments. ClustalX program, Greengenes NAST aligner.
Clipping of shorter fragments. Jalview (3).
Distance matrix, OTU assignments. PAUP. DOTUR. Scatter plots slopes.
Species richness estimates. Chao, and ACE estimators. Six parametric distributions (
Other environments. Another soil ecosystem (5), digestive tract of Zebrafish (8), and ocean floor microbial community (9).
Regression analysis. Multiple regression using MS Excel.

Comparing OTUs, species richness estimates, and slopes of scatter plots in short and long fragments in libraries derived from other ecosystems

Trends obtained from OTU determinations and scatter plot slopes for a Trembling Aspen soil (1152 clones), the digestive tract of Zebrafish (612 clones), and microbial communities inhabiting the ocean crust in the East Pacific ridge (902 clones) were strikingly similar to those observed with the Soil Okla-A clone library (Table 4). Species richness estimates for these three environments mirrored the same trends (data not shown).

Conclusions

Regions V1+V2 and V6 overestimate diversity, regions V3, V7, and V7+V8 underestimate diversity, and regions V4, V5+V6, and V6+V7 give estimates comparable to near full-length fragments. This pattern held true for the various environments tested. The bias in species richness estimates could readily be explained by base variability. While previous studies suggested using region V4 for phylogenetic studies (6, 11), our evaluation of species richness suggests that the V4, V5+V6, and V6+V7 regions provide estimates closest to those from longer fragments. Collectively, the V4-encompassing region appears to be the best choice for both phylogenetic assignment and species richness estimation. Based on this study, we recommend the use of the V4, V5+V6, and V6+V7 fragments for pyrosequencing studies concerned with species richness determination in microbial communities.

References

1. Axelrood, P. E., et al. Can. J. Microbiol. 48:
2. Baker, G. C., et al. J. Microbiol. Methods 55:
3. Clamp, M., et al. Bioinformatics 20:
4. Huber, J. A., et al. Science 318:
5. Lesaulnier, C., et al. Environ. Microbiol. 10:
6. Liu, Z., et al. Nucleic Acids Res. 35:
7. Margulies, M., et al. Nature 437:
8. Rawls, J. F., et al. Cell 127:
9. Santelli, C. M., et al. Nature 453:
10. Van de Peer, Y., et al. Nucleic Acids Res. 24:
11. Wang, Q., et al. Appl. Environ. Microbiol. 73:

Table 4. Slopes obtained for 3 different clone libraries derived from soil, zebrafish gut, and ocean floor as compared to KFS. Columns: Environment, V1+V2, V3, V4, V5+V6, V6, V6+V7, V7, V7+V8. Rows: Trembling Aspen soil, Zebrafish gut, Basalt Oceanic Floor, KFS.

Table 2. Number of OTUs and ratios of species richness estimates obtained using the near full-length sequences and each of the 8 short simulated regions studied at 5 different taxonomic cutoffs for Soil-Okla-A clone library.
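The best-fit regression model reported in the section on pairwise distance discrepancies can be evaluated for any fragment once its base composition is known. The following is a minimal sketch, not the authors' implementation (the poster states only that multiple regression was run in MS Excel). The function name `predicted_slope`, the use of raw base counts for C, V, and HV, the definition of "total" as C + V + HV, and the default of "length" to that total are all assumptions made for illustration. If the original model was fitted on percentages rather than count ratios, the C/total and HV/total terms would need rescaling.

```python
from typing import Optional


def predicted_slope(c: float, v: float, hv: float,
                    length: Optional[float] = None) -> float:
    """Evaluate the poster's best-fit model for the scatter plot slope.

    c, v, hv -- numbers of conserved, variable, and highly variable bases
                in the fragment (assumed to be raw counts).
    length   -- fragment length in bp; assumed equal to c + v + hv when
                not given.
    """
    if c <= 0 or v <= 0 or hv < 0:
        # C and V appear as denominators, so they must be positive.
        raise ValueError("c and v must be positive and hv non-negative")
    total = c + v + hv  # assumption: every base falls in one of the 3 classes
    if length is None:
        length = total
    return (30.5 * c / total
            + 11.5 * hv / v
            - 27.9 * hv / total
            - 8.5 * c / v
            + 5.25 * hv / c
            - 0.001 * length)


if __name__ == "__main__":
    # Hypothetical base counts, NOT taken from Table 1 of the poster.
    print(predicted_slope(c=100, v=100, hv=50))
```

Under this reading, a predicted slope above 1 would correspond to a fragment that overestimates distances relative to the near full-length sequence, and a slope below 1 to one that underestimates them.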