A Bayesian method for DNA barcoding Kasper Munch, Wouter Boomsma, Eske Willerslev, Rasmus Nielsen, University of Copenhagen.

1 A Bayesian method for DNA barcoding Kasper Munch, Wouter Boomsma, Eske Willerslev, Rasmus Nielsen, University of Copenhagen

2 Varieties of barcoding Assignment to existing species. Identification of new species. Assignment to taxonomic levels in general

3 Motivation 1. Environmental aDNA samples. 2. Putative Neandertal DNA. Often short query sequences. –Little information. Permissive PCR conditions. –Not always from the intended locus.

4 Given a set of database reference sequences from different species – according to which criteria should we assign new query sequences to taxonomic levels?

5 True species assignment Requires proper population genetic analyses quantifying variability within species. Often not possible... –small database sample size for each species. –short query PCR products.

6 Phylogenetic alternative -Purely phylogenetic criteria which ignore population genetic problems. -Taxonomic annotation of database sequences is used to map phylogenetic groups to taxonomic levels. -The simpler approach has its own advantages: Less data required / Fewer assumptions

7 Monophyletic taxonomic group Ingroup or outgroup? Query

8 Estimating trees Estimation of a single tree is not sufficient because of the uncertainty regarding the phylogeny. We suggest instead to use a Bayesian approach which quantifies this uncertainty

9 Bayesian approach Let Q be the query sequence, X the database data, G a gene tree, and F a desired taxonomic group. [Equation not preserved in the transcript.] Here G_i is the ith gene tree sampled from p(G | X).
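
One plausible form of the estimate this slide defines, assuming the posterior probability of assignment is approximated by averaging over N trees sampled from p(G | X). The sample size N and the indicator function I (equal to 1 when the query falls inside group F in the given tree, 0 otherwise) are notation introduced here for illustration and are not taken from the slide:

```latex
p(Q \in F \mid X)
  = \int p(Q \in F \mid G)\, p(G \mid X)\, dG
  \approx \frac{1}{N} \sum_{i=1}^{N} I(Q \in F \mid G_i)
```

Under this reading, the posterior probability for a group is simply the fraction of sampled trees in which the query is placed within that group.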

10 Assignment pipeline Query sequence → NCBI blast against database (GenBank) → Homology set → Retrieval of sequences and taxonomy annotation → ClustalW → Alignment → MrBayes → Sampled trees → Summary statistics → Taxonomy summary

11 Summary statistics For each tree: –Find the sister clades to the query. –Find the consensus taxonomy for each clade. –Pick sister clade with most specific consensus taxonomy. For each taxonomic rank: –Find the fraction of consensus taxonomies that include taxonomic names of that rank.
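
The per-tree steps above can be sketched as follows. This is a minimal illustration, assuming each database sequence carries a lineage listed from the most general to the most specific rank, and that the consensus taxonomy of a clade is the leading run of names shared by all of its sequences. The function names and the example lineages are hypothetical and are not taken from the presentation.

```python
from typing import Sequence

Lineage = Sequence[str]  # names ordered from most general to most specific rank


def consensus_taxonomy(lineages: Sequence[Lineage]) -> list[str]:
    """Return the leading run of names shared by every lineage in a clade."""
    if not lineages:
        return []
    shared = []
    for names in zip(*lineages):
        if all(name == names[0] for name in names):
            shared.append(names[0])
        else:
            break  # lineages diverge at this rank
    return shared


def most_specific_sister(sister_clades: Sequence[Sequence[Lineage]]) -> list[str]:
    """Among several sister clades, keep the deepest consensus taxonomy."""
    return max((consensus_taxonomy(c) for c in sister_clades), key=len, default=[])


# Hypothetical lineages, for illustration only.
clade_a = [
    ["Coniferales", "Pinaceae", "Pinus"],
    ["Coniferales", "Pinaceae", "Picea"],
]
clade_b = [
    ["Coniferales", "Pinaceae", "Pinus"],
    ["Poales", "Poaceae", "Poa"],
]
best = most_specific_sister([clade_a, clade_b])
print(best)
```

Here the first clade agrees down to family level while the second shares no names at all, so the first clade's consensus is the one kept.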

12 Summary statistics For each tree: –Find the sister group to the query. –Find the list of taxonomic levels shared by the sequences in the sister group (consensus taxonomy). Figure labels: Sister group, Query

13 Summary statistics For each tree: –Find the sister group to the query. –Find the list of taxonomic levels shared by the sequences in the sister group (consensus taxonomy) For each name of each taxonomic level: –Find the fraction of sampled trees where the consensus taxonomy includes that name.
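
A minimal sketch of this tally across sampled trees, assuming rooted trees written as nested tuples with string leaf labels and a dictionary mapping each database sequence to its taxonomic names. The representation, the function names, and the toy data are assumptions made for illustration. In the pipeline described here the trees would come from MrBayes output, which this sketch does not parse.

```python
from collections import Counter

QUERY = "query"  # hypothetical leaf label for the query sequence


def leaves(node):
    """Collect leaf labels under a node (a leaf is a string, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def sister_group(tree, query=QUERY):
    """Return the leaf labels of the query's siblings, or None if the query is absent."""
    if isinstance(tree, str):
        return None
    if query in tree:  # the query is a direct child of this node
        return [leaf for child in tree if child != query for leaf in leaves(child)]
    for child in tree:
        found = sister_group(child, query)
        if found is not None:
            return found
    return None


def consensus(names_per_leaf):
    """Taxonomic names shared by every sequence in the group."""
    sets = [set(names) for names in names_per_leaf]
    return set.intersection(*sets) if sets else set()


def assignment_fractions(trees, taxonomy, query=QUERY):
    """Fraction of trees whose sister-group consensus taxonomy includes each name."""
    counts = Counter()
    for tree in trees:
        group = sister_group(tree, query) or []
        counts.update(consensus([taxonomy[leaf] for leaf in group]))
    return {name: n / len(trees) for name, n in counts.items()}


# Toy data, for illustration only.
taxonomy = {
    "a": ["Coniferales", "Pinaceae", "Pinus"],
    "b": ["Coniferales", "Pinaceae", "Picea"],
    "c": ["Poales", "Poaceae", "Poa"],
}
trees = [
    ((("query", "a"), "b"), "c"),
    ((("query", "a"), "b"), "c"),
    (("query", ("a", "b")), "c"),
    (("a", "b"), ("query", "c")),
]
fractions = assignment_fractions(trees, taxonomy)
print(fractions)
```

With these four toy trees the query sits next to sequence "a" twice, next to the clade of "a" and "b" once, and next to "c" once, so the higher-level names collect more support than the genus names.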

14 Example taxonomy summary

15 Environmental Samples 379 environmental samples (aDNA). rbcL and trnL markers. Aim is the identification of environmental flora.

16 Orders >90%: Asterales, Brassicales, Caryophyllales, Coniferales, Dipsacales, Ericales, Fabales, Fagales, Lamiales, Lepidoptera, Malpighiales, Poales, Pottiales, Ranunculales, Rosales, Sapindales, Saxifragales, Solanales, Zingiberales

17 Families >90%: Amaranthaceae, Asteraceae, Betulaceae, Brassicaceae, Caprifoliaceae, Caryophyllaceae, Ericaceae, Fabaceae, Fagaceae, Juncaceae, Musaceae, Papaveraceae, Pinaceae, Plantaginaceae, Poaceae, Rosaceae, Rutaceae, Salicaceae, Saxifragaceae, Solanaceae, Taxaceae, Theaceae

18 Genera >90%: Achillea, Alnus, Aruncus, Cerastium, Fagus, Musa, Picea, Pinus, Plantago, Poa, Saxifraga, Symphoricarpos, Taxus

19 Botanical evaluation Temperate climate similar to central Sweden.

20 Testing putative Neandertal DNA Needless to say we have had several negative examples... One positive example: –Posterior probability of 91%.

21 Testing putative Neandertal DNA Needless to say we have had several negative examples... One positive example: –Posterior probability of 91%. Croatian sequence with characteristic Neandertal point mutations. –sapiens sapiens with posterior probability of 67%.

22 Problems No population genetic modelling: –Outgroup problem. –Species issues are not addressed. –Lineage sorting - not reciprocal monophyly. Incomplete database

23 Advantages Phylogenetic uncertainty and statistical uncertainty of assignment are addressed. Posterior probability of assignment. Alternative to single tree assignment. Can be used on any database.

24 Conclusions Phylogenetic barcoding does not model the coalescence process. It is the appropriate method for assignment with little data, or when assigning to higher taxonomic levels. The Bayesian approach offers a measure of confidence in assignment.

