Computational Analysis of the Taxonomical Classification of Short 16S rRNA Sequences. Christel Chehoud. Mentor: Brian Haas.


1 Computational Analysis of the Taxonomical Classification of Short 16S rRNA Sequences. Christel Chehoud. Mentor: Brian Haas

2 Overview: Human Microbiome Project; 16S rRNA; Reference and Test Sets; Classifiers; Accuracy of Classifications; Results

3 Human Microbiome Project (HMP): Microorganism communities. Human development, physiology, immunity, disease, nutrition. Core microbiome. http://nihroadmap.nih.gov/hmp/

4 16S rRNA: 16S ribosomal RNA. Large RNA component of the small subunit of the ribosome. Phylogenetic marker. Species identification. 1,542 bp

5 Using 16S for Species Identification: Sequence → Classifier → Predicted Classification

6 Project Goal: New sequencing technology. Evaluate the accuracy of 16S rRNA classification across different classifiers, regions of the sequence, and phylogeny

7 Reference Dataset: RDP Core Set. Trusted taxonomies. 6,621 sequences. Phylum: 27; Class: 43; Order: 97; Family: 258; Genus: 1,352

8 GreenGenes's Full Collection of Sequences: Full collection used by GreenGenes. High phylogenetic diversity. 188,073 sequences

9 Comparison of Taxonomy Predictions by Method: Classified GreenGenes Core Set using RDP (Naïve Bayesian), kmerRank, and BLAST. All match: 135,269 of 188,073 sequences. Phylum: 27; Class: 43; Order: 96; Family: 257; Genus: 1,335
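
The "All Match" filter on slide 9 could be sketched roughly as follows. This is a minimal illustration, assuming each classifier's output has already been parsed into a dictionary mapping sequence ID to a predicted label; the function name `all_match` and the toy predictions are hypothetical and do not come from the presentation.

```python
def all_match(rdp, kmer_rank, blast):
    """Return the IDs of sequences on which all three classifiers agree."""
    # Only consider sequences that every classifier produced a call for.
    shared_ids = rdp.keys() & kmer_rank.keys() & blast.keys()
    return sorted(
        seq_id for seq_id in shared_ids
        if rdp[seq_id] == kmer_rank[seq_id] == blast[seq_id]
    )


# Toy predictions (hypothetical, for illustration only).
rdp = {"seq1": "Bacillus", "seq2": "Escherichia", "seq3": "Vibrio"}
kmer_rank = {"seq1": "Bacillus", "seq2": "Shigella", "seq3": "Vibrio"}
blast = {"seq1": "Bacillus", "seq2": "Escherichia", "seq3": "Photobacterium"}

consensus = all_match(rdp, kmer_rank, blast)
print(consensus)
```

Sequences on which at least one method disagrees are dropped, so the retained set carries labels that all three methods support.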

10 [Venn diagram: agreement among BLAST, kmerRank, and RDP predictions. All match: 135,269. None match: 19,588. Other region counts shown: 32,334; 4,934; 15,949]

11 CD-hit: Normalizing Genus Representation (Li, 2006). 3% difference between genera. 21,179 sequences. Phylum: 27; Class: 43; Order: 96; Family: 235; Genus: 1,241. Sequence counts by step: 188,073 → 135,269 → 21,179
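
The idea behind reducing redundancy at a 3% difference can be illustrated with a toy greedy clustering. This is only a sketch of the general approach, not CD-hit itself: it assumes equal-length, pre-aligned sequences and a plain position-by-position identity, whereas CD-hit uses word filtering and alignment. The 0.97 threshold is an assumed reading of "3% difference", and all names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity expects equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_representatives(seqs, threshold=0.97):
    """Keep a sequence only if it is not within `threshold` of a kept one."""
    representatives = []
    for seq in seqs:
        if not any(identity(seq, rep) >= threshold for rep in representatives):
            representatives.append(seq)
    return representatives


# Toy data (hypothetical): 100-base sequences.
base = "ACGT" * 25
near = "TT" + base[2:]        # 2 mismatches  -> identity 0.98, absorbed by base
far = "T" * 10 + base[10:]    # 8 mismatches  -> identity 0.92, new cluster

reps = greedy_representatives([base, near, far])
print(len(reps))
```

Keeping one representative per cluster prevents heavily sampled genera from dominating the test set.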

12 Sliding Window: Producing our Localized Regions (Van de Peer, 1996). Sliding window approach: 300 bp window, 25 bp overlap. Sanger vs. 454-XLR = full-length vs. localized region
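
A sliding-window split with these parameters might look like the sketch below. It assumes "25 bp overlap" means that consecutive 300 bp windows share 25 bp (a step of 275 bp); if the slide instead intends a 25 bp step, pass `overlap=275`. The function name and the placeholder sequence are hypothetical.

```python
def sliding_windows(seq, window=300, overlap=25):
    """Return (start, subsequence) pairs for full-length windows over seq."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the window size")
    # Only emit windows that fit entirely inside the sequence.
    return [
        (start, seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


# Placeholder sequence with the 1,542 bp length quoted for 16S rRNA.
full_length = "A" * 1542
windows = sliding_windows(full_length)
starts = [start for start, _ in windows]
print(starts)
```

Each window then stands in for a short read from one localized region of the gene, while the unsplit sequence stands in for a full-length read.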

13 Overall Accuracy of the Three Different Classifiers

14 Average: BLASTN: 0.843; kmerRank: 0.830; RDP: 0.831

15 Overall Accuracy of the Three Different Classifiers. Average: BLASTN: 0.843; kmerRank: 0.830; RDP: 0.831. Standard deviation: BLASTN: 0.031; kmerRank: 0.030; RDP: 0.017
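
Summary statistics of this kind can be computed from per-window accuracies as sketched here. The accuracy values are made-up placeholders, not the presentation's data, and the use of the sample standard deviation is an assumption, since the slides do not say which estimator was used.

```python
from statistics import mean, stdev


def summarize(accuracies):
    """Return (mean, sample standard deviation) of a list of accuracies."""
    return mean(accuracies), stdev(accuracies)


# Hypothetical per-window accuracies for one classifier.
per_window_accuracy = [0.80, 0.85, 0.90]
avg, sd = summarize(per_window_accuracy)
print(round(avg, 3), round(sd, 3))
```

A lower standard deviation across windows indicates that a classifier's accuracy depends less on which region of the gene was sequenced.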

16 Genus Prediction Accuracy (per Phylum)

17 Genus Prediction Accuracy (per Phylum). Average: BLASTN: 0.843; kmerRank: 0.830; RDP: 0.831. Standard deviation: BLASTN: 0.107; kmerRank: 0.153; RDP: 0.142
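
Genus-level accuracy broken down by phylum could be tallied as in this sketch. It assumes each test record carries the true phylum, the true genus, and the predicted genus; the record layout, function name, and toy records are hypothetical.

```python
from collections import defaultdict


def genus_accuracy_by_phylum(records):
    """Map each phylum to the fraction of correct genus predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for phylum, true_genus, predicted_genus in records:
        total[phylum] += 1
        if true_genus == predicted_genus:
            correct[phylum] += 1
    return {phylum: correct[phylum] / total[phylum] for phylum in total}


# Toy records (hypothetical): (phylum, true genus, predicted genus).
records = [
    ("Firmicutes", "Bacillus", "Bacillus"),
    ("Firmicutes", "Clostridium", "Clostridium"),
    ("Proteobacteria", "Escherichia", "Escherichia"),
    ("Proteobacteria", "Escherichia", "Shigella"),
]

accuracy = genus_accuracy_by_phylum(records)
print(accuracy)
```

Running the same tally once per classifier gives the per-phylum values whose spread the standard deviations on this slide describe.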

18 Finding the 16S Region Providing the Most Reliable Prediction Accuracy

19 Clustering Phyla and Methods by Prediction Accuracy

20 Best method is phylum-dependent. Variation in accuracy is impacted by depth of species coverage

21 Summary: The central region of 16S is the most accurate, on average. Of the methods examined, BLAST is the most accurate across all 16S regions and all phyla, on average. RDP-bayes is the least variable across short sequence regions. The best short sequence classification method is phylum-dependent

22 Acknowledgements. Genome Sequencing and Analysis Program: Brian Haas, Dirk Gevers, Michael Feldgarden, Doyle Ward, Chad Nusbaum, Bruce Birren. Administration: Shawna Young, Lucia Vielma, Maura Silverstein

