Download presentation
Presentation is loading. Please wait.
1
Species Richness in Soil Bacterial Communities: A Proposed Approach to Overcome Sample-Size Bias
Noha Youssef, Mostafa Elshahed Oklahoma State University, Stillwater, OK Funding Provided By NSF Microbial Observatories Program JGI-Laboratory sequencing program OSU Start-up funds to Mostafa Elshahed N-229 Abstract Estimates of species richness based on 16S rRNA gene clone libraries are increasingly utilized to gauge the level of bacterial diversity within various ecosystems. However, previous studies have indicated that regardless of the method utilized, species richness estimates obtained is dependent on the size of clone libraries analyzed. We here propose an approach to overcome sample size bias in species richness estimates in complex microbial communities. Parametric (Maximum likelihood-based and rarefaction curve-based) and nonparametric approaches were used to estimate species richness in a library of 13,001 near full-length 16S rRNA clones derived from soil, as well as in multiple subsets of the original library. Species richness estimates obtained increased with the increase in library size. To obtain a sample size-unbiased estimate of species richness, we calculated the theoretical clone library sizes required to encounter the estimated species richness at various clone library sizes, used curve fitting to determine the theoretical clone library size required to encounter the “true” species richness, and subsequently determined the corresponding sample size-unbiased species richness value. Using this approach, sample size unbiased estimates of 17,230, 15,571, and 33,912 were obtained for the ML-based, rarefaction curve-based, and ACE-1 estimators, compared to bias uncorrected values of 15,009, 11,913, and 20,909, respectively. Methods: DNA extraction from 0.5 g of soil using a modified lysis and bead-beating protocol. PCR: A near complete 16S rRNA gene fragment was amplified using primer pair 27F and 1391R. Cloning: Invitrogen TOPO-TA cloning kit. Sequencing: 13,486 near complete 16S rRNA gene clones were generated, and sequenced at the Department of Energy Joint Genome Institute Alignment/ Distance matrix: Greengenes OTU assignment: DOTUR Chimera check: using Mallard, 485 sequences were identified as potential chimeras and removed from the dataset. The remaining 13,001 sequences were designated Kessler farm soil clones (KFS clones). Subsets of 100, 500, 1000, and 3289 clones were randomly drawn from the 13,001 KFS dataset and treated as smaller clone libraries. Species richness estimates for all subset clone libraries were estimated using all previously described parametric and nonparametric methods. Regardless of the model used, the estimate of species richness increased with sample size. The estimate versus sample size plot was generally linear until sample size 3289-clone after which it started to approach an asymptote. Effect of sample size on various species richness estimates. Towards a sample-size unbiased estimate of species richness. Since species richness estimate increases with clone library size, the estimates species richness is only a fraction of the true richness. To obtain a sample-size unbiased estimate of species richness we proposed the following approach. Using parametric ML-based species richness estimates (SRest) obtained at different clone library sizes, we calculated the theoretical clone library sizes required to observe the absolute majority (99, 99.9, or 99.99%) (CLth-99, CLth-99.9, or CLth-99.99) of the SRest at different actual clone library sizes (CLact). Repeating the procedure described above with rarefaction curve-based, and ACE-1 estimators, values of 15,571, and 33,912 were obtained as a sample size-unbiased SR estimate, respectively. The library size at which CLact = CLth i.e. the point at which theoretical clone library effort is met was 6.3 X 106. ML-based SREst Vs CLth-99 suggested 17, 230 as a sample size-unbiased SR estimate Species richness estimates used Parametric Non-Parametric Rarefaction curve-based Fitting models: Michaelis Menten (MM), MM with intercept, double MM, negative exponential, and negative exponential with intercept ML-based ACE, ACE-1 Chao, Chao_bc Poisson Negative Binomial Pareto-mixed Poisson Lognormal-mixed Poisson Inverse Gaussian-mixed Poisson Mixed exponential-mixed Poisson EstimateS Introduction Culture-independent 16S rRNA gene-based analysis has been utilized to study bacterial, archaeal, and eukaryotic diversity from various habitats. Collectively, these studies demonstrated that the scope of microbial diversity is far broader than previously implied based on cultivation analysis. The sizes of analyzed clone libraries have steadily been increasing which allowed for the utilization of various statistical approaches in evaluating microbial diversity (Dunbar, 2002; Schloss, 2006; Joen, 2006; Hong, 2006; Hughes, 2001). Species richness estimation methods are either parametric or nonparametric (O' Hara, 2005). Nonparametric estimators have been more commonly used in microbial ecology (Stach, 2004; Schloss, 2005; Hill, 2003). Parametric methods (maximum likelihood-,ML, and rarefaction curve-based) have recently been used to estimate bacterial and microeukaryotic diversities in complex microbial communities (Joen, 2006; Hong, 2006; Behnke, 2006; Stoeck, 2007; Zuendorf, 2006; Roesch, 2007). Regardless of the approach utilized for species richness determination, it has been observed that the estimated species richness value (SRest) is dependent on the size of the clone library used in the calculation (Dunbar, 2002; Roesch, 2007; Schloss, 2006). The problem could theoretically be rectified by adequate sampling effort to encounter all bacterial species in the sample. However, in complex microbial communities, a large fraction of the bacterial species is present in low abundance and even with recent advances in sequencing technologies (Margulies, 2005), a complete census of all bacterial species remains a daunting task. Comparison of mixed-Poisson parametric models used to fit the Kessler Farm Soil dataset frequency dataa References Behnke A, et al Appl. Environ. Microbiol. 72, Dunbar J. et al Appl. Environ. Microbiol. 6, Joen S-O, et al Appl. Environ. Microbiol. 72, Hill TCJ, et al FEMS Microbiol. Ecol. 43, 1-11. Hong S-H, et al Proc. Natl. Acad. Sci. USA 103, Hughes JB, et al Appl. Environ. Microbiol. 67, Marguiles M, et al Nature 437, O’Hara RB J. Animal Ecol. 74, Roesch LFW, et al ISME J. 1, Schloss PD & Handleman J PLoS Comput. Biol. 2, Schloss PD & Handleman J Appl. Environ. Microbiol. 71, Stach EM & Bull AT Antonie Van Leeuwenhoek 87, 3-9. Stoeck T, et al PLos One 8, e278. Zuendorf A, et al FEMS Microbiol. Ecol. 58, We utilized a large 16S rRNA gene clone library (13,001) to demonstrate the problems associated with SR determination. In spite of the large size of the library (13,001 clones), all SRest values continued to increase regardless of the estimation method used. We used a ML-based, rarefaction curve-based, and nonparametric-based estimate for our sample size-unbiased approach. A sample size-unbiased estimate (17,230) is still 15% higher than the SRest value obtained using the entire 13,001 clone library (15,009). The approach highlights the value of utilizing large clone library datasets in SR estimates. The larger the clone library, the closer the rarefaction curve, CLth Vs CLact, and CLth Vs SRest plots will be to reaching an asymptote. Pyrosequencing-based approaches could be useful for obtaining large 16S rRNA gene datasets from various habitats. Conclusions and Discussions Comparison of different models used to fit the rarefaction curve in Kessler Farm Soil dataseta Nonparametric estimators of species richness for Kessler Farm Soil dataset. Study Site An undisturbed tall grass prairie preserve in Kessler Farm Field Laboratory biological research station in central Oklahoma
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.