1 DNA Barcode sequence identification incorporating taxonomic hierarchy and within taxon variability
  Damon P. Little
  Cullman Program for Molecular Systematics Studies
  The New York Botanical Garden, Bronx, New York

3 test data sets (Little and Stevenson 2007)
  gymnosperm nuclear ribosomal internal transcribed spacer 2 (nrITS 2)
    1,037 sequences
    413 species
    71 genera
  gymnosperm plastid encoded maturase K (matK)
    522 sequences
    334 species
    75 genera

4 …alignment
  locus     sequences         median unaligned length (IQR)   aligned length
  nrITS 2   all               137 (108–250) bp                8,733 bp
  nrITS 2   one per species   196 (115–260) bp                6,778 bp
  matK      all               1,561 (1,412–1,661) bp          3,975 bp
  matK      one per species   1,601 (1,530–1,661) bp          3,906 bp

5 pairwise divergence
  locus     sequences         median    interquartile range   zero comparisons
  nrITS 2   all               30.99%    26.53–34.48%          0.09%
  nrITS 2   one per species   29.39%    25.75–33.30%          0.21%
  matK      all               20.39%    5.95–23.30%           0.54%
  matK      one per species   21.38%    8.13–23.89%           0.42%
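The summary columns above (median, interquartile range, share of zero comparisons) can be illustrated with a small sketch. The slides do not say how divergence was measured, so the uncorrected p-distance used here is an assumption, as are the function names and the toy alignment.

```python
from itertools import combinations
from statistics import median, quantiles


def p_distance(a, b):
    """Uncorrected p-distance (an assumed measure): share of differing
    sites among aligned positions where neither sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def divergence_summary(aligned):
    """Median, quartiles, and share of zero-distance pairs over all pairs."""
    d = [p_distance(a, b) for a, b in combinations(aligned, 2)]
    q1, _, q3 = quantiles(d, n=4)
    return {
        "median": median(d),
        "iqr": (q1, q3),
        "zero_fraction": sum(x == 0 for x in d) / len(d),
    }


# Toy alignment, invented for illustration only.
toy = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "AC-TACGA"]
summary = divergence_summary(toy)
```

In this toy set one of the six pairs is identical, so `zero_fraction` is 1/6; a non-zero share of this kind means some sequences cannot be told apart by the locus alone.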

6 measuring precision and accuracy

10 precision
  method                    nrITS 2       matK
  parsimony ratchet         58% (13%)     71% (41%)
  SPR search                60% (11%)     70% (41%)
  neighbor joining          65% (8%)      44% (23%)
  BLAST                     94% (81%)     99% (67%)
  BLAT                      94% (82%)     99% (69%)
  megaBLAST                 94% (80%)     99% (61%)
  BLAST/parsimony ratchet   86% (74%)     77% (55%)
  BLAST/SPR                 87% (73%)     76% (53%)
  BLAST/neighbor joining    93% (71%)     95% (56%)
  DNA-BAR                   98% (89%)     100% (79%)
  DOME ID                   80% (80%)     60% (60%)
  ATIM                      100% (83%)    100% (67%)

15 accuracy to species
  method                    nrITS 2       matK
  parsimony ratchet         67% (46%)     77% (60%)
  SPR search                69% (47%)     78% (58%)
  neighbor joining          68% (42%)     75% (52%)
  BLAST                     67% (63%)     84% (68%)
  BLAT                      66% (62%)     82% (67%)
  megaBLAST                 72% (68%)     84% (64%)
  BLAST/parsimony ratchet   78% (67%)     80% (60%)
  BLAST/SPR                 79% (67%)     78% (61%)
  BLAST/neighbor joining    80% (64%)     86% (56%)
  DNA-BAR                   65% (62%)     73% (62%)
  DOME ID                   67% (66%)     50% (50%)
  ATIM                      83% (71%)     87% (53%)

16 lessons learned

17 “global” alignments do not work

20 “fuzzy” matches are not precise

23 autoapomorphies (unique characters) work... but are not always present

24 precision
  method                    nrITS 2       matK
  parsimony ratchet         58% (13%)     71% (41%)
  SPR search                60% (11%)     70% (41%)
  neighbor joining          65% (8%)      44% (23%)
  BLAST                     94% (81%)     99% (67%)
  BLAT                      94% (82%)     99% (69%)
  megaBLAST                 94% (80%)     99% (61%)
  BLAST/parsimony ratchet   86% (74%)     77% (55%)
  BLAST/SPR                 87% (73%)     76% (53%)
  BLAST/neighbor joining    93% (71%)     95% (56%)
  DNA-BAR                   98% (89%)     100% (79%)
  DOME ID                   80% (80%)     60% (60%)
  DOME ID*                  100% (100%) (single value on slide)
  ATIM                      100% (83%)    100% (67%)

25 accuracy to species
  method                    nrITS 2       matK
  parsimony ratchet         67% (46%)     77% (60%)
  SPR search                69% (47%)     78% (58%)
  neighbor joining          68% (42%)     75% (52%)
  BLAST                     67% (63%)     84% (68%)
  BLAT                      66% (62%)     82% (67%)
  megaBLAST                 72% (68%)     84% (64%)
  BLAST/parsimony ratchet   78% (67%)     80% (60%)
  BLAST/SPR                 79% (67%)     78% (61%)
  BLAST/neighbor joining    80% (64%)     86% (56%)
  DNA-BAR                   65% (62%)     73% (62%)
  DOME ID                   67% (66%)     50% (50%)
  DOME ID*                  76% (75%)     90% (90%)
  ATIM                      83% (71%)     87% (53%)

26 some sequences are simply unidentifiable

27 ...remaining (insoluble) problems
  identical sequences for multiple terminals
  shared alleles between terminals
    use allele frequency as a predictor?

28 desirable methodologies and properties of Sequence IDentification Engines (SIDEs)

29 Sequence IDentification Engines (SIDEs)
  avoid global alignment by comparing short segments: pseudo-alignment
  use exact matches
  use autoapomorphies where possible...
  ...but allow the use of other characters too

30 context/text DNA recoding
  characters are defined by flanking context => pretext and postext
  permit “alignment-free” comparisons
  size and separation between pretext and postext must be arbitrarily delimited
  states (text) are limited by the proximity of the context
  terminals can be individual sequences or composites representing taxa

31 context/text DNA recoding

32
  characters are defined by flanking context => pretext and postext
  permit “alignment-free” comparisons
  size and separation between pretext and postext are arbitrary
  the number of possible states (text) is limited by the length of the text
  terminals can be individual sequences or composites representing taxa
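The recoding described above can be sketched as follows. This is a minimal illustration, not the BRONX implementation: the function name `recode`, the fixed window sizes, and the toy sequences are all assumptions made for the example.

```python
def recode(seq, context_len=4, text_len=3):
    """Recode one sequence into context/text characters.

    Each character is keyed by its flanking context (pretext, postext);
    its state is the text found between them. Window sizes are fixed
    here for simplicity (an assumption; the slides call them arbitrary).
    """
    characters = {}
    span = 2 * context_len + text_len
    for i in range(len(seq) - span + 1):
        pretext = seq[i:i + context_len]
        text = seq[i + context_len:i + context_len + text_len]
        postext = seq[i + context_len + text_len:i + span]
        characters.setdefault((pretext, postext), set()).add(text)
    return characters


# Two toy sequences that share a context but differ in the text between it.
a = recode("ACGTTG", context_len=2, text_len=2)
b = recode("ACAATG", context_len=2, text_len=2)
shared_contexts = a.keys() & b.keys()
```

Here both sequences yield the character `("AC", "TG")`, with state `GT` in one and `AA` in the other, so they can be compared at that character without aligning the full sequences.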

33 querying text/context database
  find pretext/text/postext in the query sequence and match to references

34 querying text/context database

35
  find pretext/text/postext in the query sequence and match to references
  score terminals based on the number of matches
  final score can be raw or based on a weighting function
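A raw (unweighted) version of the query step above might look like the sketch below. It assumes fixed-size contexts and exact text matches; `recode`, `score_terminals`, and the reference sequences are hypothetical names and data, not taken from BRONX.

```python
from collections import Counter


def recode(seq, context_len=2, text_len=2):
    """Map each (pretext, postext) context to the set of texts between them."""
    characters = {}
    span = 2 * context_len + text_len
    for i in range(len(seq) - span + 1):
        pretext = seq[i:i + context_len]
        text = seq[i + context_len:i + context_len + text_len]
        postext = seq[i + context_len + text_len:i + span]
        characters.setdefault((pretext, postext), set()).add(text)
    return characters


def score_terminals(query, references):
    """Raw score: number of query characters whose text exactly matches
    a text stored for the same context in each reference terminal."""
    query_chars = recode(query)
    scores = Counter()
    for name, ref_seq in references.items():
        ref_chars = recode(ref_seq)
        for context, texts in query_chars.items():
            if texts & ref_chars.get(context, set()):
                scores[name] += 1
    return scores


# Toy reference terminals, invented for illustration.
references = {
    "taxon_A": "ACGTTGCA",
    "taxon_B": "TTACGTTG",
    "taxon_C": "ACAATGCA",
}
scores = score_terminals("ACGTTGCA", references)
best = scores.most_common(1)[0][0]
```

With these toy data the query matches `taxon_A` at all three of its characters, `taxon_B` at one, and `taxon_C` at none, even though `taxon_C` shares a context with the query.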

36 possible weighting functions
  equal weights (raw score)
  number of distinct texts => upweights more variable characters
  1/(number of distinct texts) => downweights more variable characters
  (number of texts)/(number of scores)
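The first three weighting functions listed above can be sketched as below. The names `weight` and `weighted_score` and the toy counts are assumptions for illustration; the fourth function is left out because the slides do not define "number of texts" or "number of scores" precisely enough to implement.

```python
def weight(n_distinct_texts, scheme="equal"):
    """Weight of one character, given how many distinct texts the
    reference database holds for its context."""
    if scheme == "equal":
        return 1.0                       # raw score
    if scheme == "up":
        return float(n_distinct_texts)   # favors more variable characters
    if scheme == "down":
        return 1.0 / n_distinct_texts    # penalizes more variable characters
    raise ValueError(f"unknown scheme: {scheme}")


def weighted_score(matched_characters, distinct_texts, scheme="equal"):
    """Sum the weights of the characters at which a query matched a terminal."""
    return sum(weight(distinct_texts[c], scheme) for c in matched_characters)


# Toy data: character c1 is invariant, c2 has four distinct texts.
distinct_texts = {"c1": 1, "c2": 4}
matched = ["c1", "c2"]
raw = weighted_score(matched, distinct_texts, "equal")
up = weighted_score(matched, distinct_texts, "up")
down = weighted_score(matched, distinct_texts, "down")
```

For this toy terminal the raw score is 2.0, the upweighted score is 5.0, and the downweighted score is 1.25, which shows how the choice of function changes the influence of variable characters on the final ranking.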

37 precision
  method                    nrITS 2       matK
  parsimony ratchet         58% (13%)     71% (41%)
  SPR search                60% (11%)     70% (41%)
  neighbor joining          65% (8%)      44% (23%)
  BLAST                     94% (81%)     99% (67%)
  BLAT                      94% (82%)     99% (69%)
  megaBLAST                 94% (80%)     99% (61%)
  BLAST/parsimony ratchet   86% (74%)     77% (55%)
  BLAST/SPR                 87% (73%)     76% (53%)
  BLAST/neighbor joining    93% (71%)     95% (56%)
  DNA-BAR                   98% (89%)     100% (79%)
  DOME ID                   80% (80%)     60% (60%)
  ATIM                      100% (83%)    100% (67%)
  BRONX 0                   91% (90%)     88% (84%)
  BRONX 1                   96% (86%)     98% (79%)

38 accuracy to species
  method                    nrITS 2       matK
  parsimony ratchet         67% (46%)     77% (60%)
  SPR search                69% (47%)     78% (58%)
  neighbor joining          68% (42%)     75% (52%)
  BLAST                     67% (63%)     84% (68%)
  BLAT                      66% (62%)     82% (67%)
  megaBLAST                 72% (68%)     84% (64%)
  BLAST/parsimony ratchet   78% (67%)     80% (60%)
  BLAST/SPR                 79% (67%)     78% (61%)
  BLAST/neighbor joining    80% (64%)     86% (56%)
  DNA-BAR                   65% (62%)     73% (62%)
  DOME ID                   67% (66%)     50% (50%)
  ATIM                      83% (71%)     87% (53%)
  BRONX 0                   59% (58%)     76% (71%)
  BRONX 1                   72% (67%)     92% (75%)

39 BRONX conclusions
  BRONX is more precise than existing algorithms
  BRONX is sometimes more accurate than existing algorithms
  BRONX is an incremental improvement

40 future directions
  improve the scoring function in BRONX
  dynamically size context/text
  benchmark additional datasets for all methods
  incorporate context/text recoding into a scalable version of the ATIM algorithm

41 acknowledgments
  Kenneth Cameron
  Santiago Madriñán
  Christian Schulz
  Dennis Stevenson

