Leveraging MinHash for rapid identification of nanopore data on mobile hardware Brian Ondov MinION Community Meeting New York, NY December 4th, 2015.

1 Leveraging MinHash for rapid identification of nanopore data on mobile hardware
Brian Ondov MinION Community Meeting New York, NY December 4th, 2015

2 Acknowledgement This work was funded under Contract No. HSHQDC-07-C awarded to Battelle National Biodefense Institute by the Department of Homeland Security (DHS) Science and Technology Directorate for the management and operation of the National Biodefense Analysis and Countermeasures Center, a Federally Funded Research and Development Center. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the DHS or the U.S. Government. DHS does not endorse any products or commercial services mentioned in this presentation.

3 Real time, portable, high error

4 ?

5 Constraints: real time, portable, high error
Requirements: streaming, significance, fast, robust, low memory

6 K-mer based distance estimate
[Figure: 3-mers ACC ATG AGT CAG ATC CAT CCA CCG CGA CGT GAC GAT GGA GCA TGG GTA TAC TCG; shared fraction 50% = 0.50] Fan et al. 2015
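The estimate on this slide can be sketched in a few lines. This minimal Python sketch computes the Jaccard index of two k-mer sets and converts it to a distance with the formula associated with Fan et al. 2015, D = -(1/k) ln(2j/(1+j)). It assumes forward-strand k-mers only (no reverse complements) and returns infinity when no k-mers are shared; both are simplifications, and the function names are hypothetical.

```python
import math

def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of the union that is shared: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def kmer_distance(j: float, k: int) -> float:
    """Distance from a Jaccard index j; infinite when nothing is shared."""
    if j == 0:
        return math.inf
    return -math.log(2 * j / (1 + j)) / k

a, b = kmers("ACGT", 3), kmers("ACGA", 3)
j = jaccard(a, b)  # ACG is shared out of {ACG, CGT, CGA}
print(j, kmer_distance(j, 3))
```

Here j = 1/3, so the distance is ln(2)/3. Identical sets give j = 1 and a distance of zero.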

7 K-mers
Constraints: real time, portable, high error
Requirements: streaming, significance, fast, robust, low memory

8 Reducing the problem Andrei Broder, 1997: “On the resemblance and containment of documents”
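Broder's two measures can be written out for k-mer sets A and B. The first two definitions follow the terms in the paper's title; the last line is the standard min-wise hashing property that lets a few minimum hash values stand in for whole sets, stated for an idealized random hash function h with no collisions.

```latex
r(A,B) = \frac{|A \cap B|}{|A \cup B|}, \qquad
c(A,B) = \frac{|A \cap B|}{|A|}, \qquad
\Pr\!\left[\min_{x \in A} h(x) = \min_{x \in B} h(x)\right] = r(A,B)
```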

9 MinHash
[Figure: k-mers ACC ATG AGT ATC CAG CAT CCA CCG CGA GAC CGT GGA GAT TGG GCA TAC TCG; sketch S]
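A minimal bottom-s MinHash sketch in the spirit of this slide: hash every k-mer, keep the s smallest hash values as the sketch S, and estimate the Jaccard index from the s smallest values of the merged sketches. The 64-bit BLAKE2b hash is a stand-in chosen for this example, not something the slide specifies; k-mers are forward-strand only, and the function names are hypothetical.

```python
import hashlib
import heapq

def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer (BLAKE2b used here only for illustration)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def sketch(seq: str, k: int, s: int) -> list[int]:
    """Bottom-s sketch: the s smallest distinct k-mer hashes, ascending."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return heapq.nsmallest(s, hashes)

def estimate_jaccard(sa: list[int], sb: list[int], s: int) -> float:
    """Estimate J(A, B) from two bottom-s sketches."""
    # The s smallest hashes of A | B are the s smallest of the merged sketches.
    union_bottom = heapq.nsmallest(s, set(sa) | set(sb))
    if not union_bottom:
        return 1.0
    shared = set(sa) & set(sb)
    # Fraction of the union's bottom-s hashes present in both sketches.
    return sum(h in shared for h in union_bottom) / len(union_bottom)

genome = "ACGGTCATTGACCTAGGCTA" * 3
sa = sketch(genome, k=5, s=10)
print(estimate_jaccard(sa, sa, s=10))  # identical sketches -> 1.0
```

The sketch size s, not the genome size, bounds both memory and comparison time, which is the point of reducing each genome to S.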

10 MinHash Distance vs. ANI (500 E. coli)

11 RefSeq: 55,000 genomes, 600Gb → 93Mb (6000x), k = 16, s = 400
Sketching: 26 CPU hours; distances: 20 CPU hours
[Figure labels: Acinetobacter baumannii, B. cereus group, Klebsiella pneumoniae, Escherichia coli & Shigella, Mycobacterium tuberculosis, Streptococcus agalactiae]
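The reduction factor on this slide can be checked with quick arithmetic. Reading both sizes as bytes and assuming 4-byte (32-bit) hash values, which is an assumption rather than something the slide states, 55,000 sketches of s = 400 hashes need about 88 MB of hash payload, comparable to the reported 93Mb; the slide does not break down the remainder.

```python
genomes = 55_000      # RefSeq genomes on the slide
s = 400               # sketch size on the slide
bytes_per_hash = 4    # assumption: 32-bit hashes
raw_bytes = 600e9     # 600Gb of input, read as bytes
sketch_bytes = 93e6   # reported sketch total, read as bytes

print(round(raw_bytes / sketch_bytes))   # 6452, i.e. roughly 6000x
print(genomes * s * bytes_per_hash)      # 88000000 bytes of hashes
```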

12 Querying RefSeq
[Figure: reads, sketch, Bloom filter, repeated k-mers; three timing labels of 1s]
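One common use of a Bloom filter when sketching reads is to pass a k-mer on only after it has already been seen, so that k-mers occurring once (often sequencing errors) never reach the sketch. The sketch below assumes that is the filter's role on this slide; the filter size, the number of hash functions, and all names are illustrative. Because Bloom filters admit false positives, an occasional single-occurrence k-mer can still slip through.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array probed by `num_hashes` seeded BLAKE2b hashes."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive each probe position by prefixing the item with a seed index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add_if_absent(self, item: str) -> bool:
        """Insert item; return True if it was (probably) already present."""
        present = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                present = False
                self.bits[byte] |= 1 << bit
        return present

def repeated_kmers(reads, k: int, bloom: BloomFilter):
    """Yield every occurrence of a k-mer after its first (probable) sighting."""
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if bloom.add_if_absent(kmer):
                yield kmer

print(list(repeated_kmers(["ACGTAC", "GTACGG"], 4, BloomFilter())))
```

Only GTAC occurs in both reads, so only it passes the filter. The filter's memory is fixed by `num_bits` regardless of how many reads stream through.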

13 Streaming E. coli reads
Reads  Coverage  LCA (lowest distance ties)    P-value (best)
100    15%       Microbes (bacteria/archaea)   3.3e-1
200    27%       Enterobacteriaceae (family)   6.3e-5
300    34%       E. coli K12                   2.6e-6
400    44%                                     2.8e-14
500    51%                                     1.9e-21
600    57%                                     3.8e-35
700    62%                                     6.5e-50
800    67%                                     5.8e-61
900    71%                                     2.3e-72
1000   75%                                     1.8e-86
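The slide reports a p-value for the best match but does not say how it is computed. One plausible null model, assumed here and not taken from the slide: treat each of the s sketch positions as matching by chance with probability j_r, the Jaccard index expected between two unrelated uniform random sequences of lengths l_x and l_y, and take the binomial upper tail at the observed number of shared hashes. The function names and the uniform-random-sequence null are assumptions.

```python
import math

def random_kmer_prob(length: int, k: int) -> float:
    """P(a given k-mer occurs among `length` uniform random k-mers): 1 - (1 - 4^-k)^length."""
    return -math.expm1(length * math.log1p(-4.0 ** -k))

def random_jaccard(lx: int, ly: int, k: int) -> float:
    """Expected Jaccard index of two unrelated random sequences (assumed null)."""
    qx, qy = random_kmer_prob(lx, k), random_kmer_prob(ly, k)
    return qx * qy / (qx + qy - qx * qy)

def shared_hash_pvalue(shared: int, s: int, lx: int, ly: int, k: int) -> float:
    """Upper tail P(X >= shared) for X ~ Binomial(s, j_r)."""
    jr = random_jaccard(lx, ly, k)
    return sum(math.comb(s, i) * jr ** i * (1 - jr) ** (s - i)
               for i in range(shared, s + 1))

print(shared_hash_pvalue(10, 400, 5_000_000, 5_000_000, 16))
```

Under this model, each additional shared hash makes a chance match far less likely, which is consistent in direction with the steadily shrinking p-values as reads accumulate.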

14 Covering B. anthracis x 30

15 MinHash
Constraints: real time, portable, high error
Requirements: streaming, significance, fast, robust, low memory

16 ? 87mm 54mm

17 Future applications: metagenomics, pre-alignment
Andrei Broder, 1997: “On the resemblance and containment of documents”

18 Todd Treangen, Nicholas Bergman, Adam Mallonee, Adam Phillippy, Sergey Koren
MarBL, NHGRI; mash.readthedocs.org; github.com/

19

