Leveraging MinHash for rapid identification of nanopore data on mobile hardware Brian Ondov MinION Community Meeting New York, NY December 4th, 2015.

Presentation transcript:

Acknowledgement: This work was funded under Contract No. HSHQDC-07-C-00020 awarded to Battelle National Biodefense Institute by the Department of Homeland Security (DHS) Science and Technology Directorate for the management and operation of the National Biodefense Analysis and Countermeasures Center, a Federally Funded Research and Development Center. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the DHS or the U.S. Government. DHS does not endorse any products or commercial services mentioned in this presentation.

Real time, Portable, High error

?

Real time, Portable, High error, Streaming, Significance, Fast, Robust, Low memory, High error

K-mer based distance estimate (Fan et al. 2015)
K-mers shown: ACC ATG AGT CAG ATC CAT CCA CCG CGA CGT GAC GAT GGA GCA TGG GTA TAC TCG
50% = 0.50
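The k-mer comparison on this slide can be illustrated with a minimal sketch. It assumes the similarity is the Jaccard index of the two k-mer sets (shared k-mers divided by all distinct k-mers); the slide does not spell out the formula, and the toy sequences and function names below are hypothetical, not taken from the talk.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy sequences chosen for illustration only.
seq_a = "ACGTAC"
seq_b = "ACGTTT"

similarity = jaccard(kmers(seq_a, 3), kmers(seq_b, 3))
print(f"shared fraction of 3-mers: {similarity:.2f}")
```

Exact set comparison like this needs every k-mer of both genomes in memory, which is the cost the MinHash reduction is meant to avoid.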

K-mers: Real time, Portable, High error, Streaming, Significance, Fast, Robust, Low memory, High error

Reducing the problem
Andrei Broder, 1997: “On the resemblance and containment of documents”

MinHash
K-mers shown: ACC ATG AGT ATC CAG CAT CCA CCG CGA GAC CGT GGA GAT TGG GCA TAC TCG
Figure labels: S, S
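A rough sketch of the MinHash idea follows, assuming the "bottom-s" variant: hash every k-mer, keep only the s smallest hash values as the sketch, and estimate the Jaccard index from the s smallest values of the merged sketches. The hash function here (BLAKE2b from the Python standard library) is a stand-in chosen for reproducibility, not the one used in the talk, and all function names are hypothetical.

```python
import hashlib


def kmer_hash(kmer):
    """Map a k-mer to a 64-bit integer (stand-in hash, an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k, s):
    """Bottom-s sketch: the s smallest distinct k-mer hashes, sorted."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate the Jaccard index from two bottom-s sketches."""
    shared = set(sketch_a) & set(sketch_b)
    # The s smallest hashes of the union form a sketch of the combined set.
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0
    return sum(1 for h in merged if h in shared) / len(merged)


a = sketch("ACGTAC", k=3, s=10)
b = sketch("ACGTTT", k=3, s=10)
print(jaccard_estimate(a, b, s=10))
```

When s is at least the number of distinct k-mers, as in this toy case, the estimate equals the exact Jaccard index; with genome-scale inputs and a small s it becomes an approximation whose error shrinks as s grows.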

MinHash Distance vs. ANI (500 E. coli)
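The slide does not say how a distance is derived from the Jaccard estimate. One option, assumed here, is the Mash distance as published by Ondov et al. in 2016, D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard estimate and k the k-mer length; 1 - D then serves as a rough proxy for average nucleotide identity (ANI). The function name is hypothetical.

```python
import math


def mash_distance(j, k):
    """Mash distance from Jaccard estimate j and k-mer length k.

    Assumes the published Mash formula; returns 1.0 when no k-mers are shared.
    """
    if j <= 0:
        return 1.0
    d = -math.log(2 * j / (1 + j)) / k
    return min(1.0, max(0.0, d))


# Example with k = 16, the k-mer length quoted for the RefSeq sketches.
for j in (1.0, 0.5, 0.1):
    print(j, mash_distance(j, k=16))
```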

RefSeq: 55,000 genomes
600Gb to 93Mb (6000x)
Sketch: 26 cpuh; Distance: 20 cpuh; k = 16, s = 400
Labeled groups: Acinetobacter baumannii, B. cereus group, Klebsiella pneumoniae, Escherichia coli & Shigella, Mycobacterium tuberculosis, Streptococcus agalactiae
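A back-of-the-envelope check of these figures is sketched below. It rests on two assumptions not stated on the slide: that "Gb" and "Mb" mean gigabytes and megabytes, and that each of the s = 400 hashes per genome is stored in 4 bytes.

```python
GENOMES = 55_000      # from the slide
SKETCH_SIZE = 400     # s, from the slide
BYTES_PER_HASH = 4    # assumption: 32-bit hash values

# Size of a database holding one sketch per genome, ignoring metadata.
sketch_db_bytes = GENOMES * SKETCH_SIZE * BYTES_PER_HASH

# Reduction factor implied by the slide's own numbers (units assumed to be bytes).
raw_bytes = 600e9
slide_db_bytes = 93e6
compression = raw_bytes / slide_db_bytes

print(f"estimated sketch database: {sketch_db_bytes / 1e6:.0f} MB")
print(f"reduction factor: {compression:.0f}x")
```

Under those assumptions the raw hashes come to 88 MB, in the same range as the 93Mb quoted, and the ratio of the two quoted sizes is a little over 6000.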

Querying RefSeq
[Diagram: Reads, Sketch, Bloom filter, Repeated K-mers; three steps labeled 1s]
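One plausible reading of this diagram, offered as an assumption rather than the author's stated method, is that a Bloom filter remembers k-mers seen once so that only k-mers seen again (and thus less likely to be sequencing errors) are passed on for sketching. The class, its parameters, and the helper below are hypothetical, and the hash is a standard-library stand-in.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting the same hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def repeated_kmers(reads, k, bloom):
    """Return k-mers that the filter reports as already seen."""
    repeated = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in bloom:
                repeated.add(kmer)   # seen before (or a rare false positive)
            else:
                bloom.add(kmer)      # first sighting: remember, do not report
    return repeated


reads = ["ACGTAC", "CGTACG"]  # toy reads for illustration
print(sorted(repeated_kmers(reads, k=4, bloom=BloomFilter())))
```

The filter uses a fixed amount of memory regardless of how many reads stream through, at the cost of occasionally letting a unique k-mer through as a false positive.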

Streaming E. coli reads

Reads  Coverage  LCA (lowest distance ties)    P-value (best)
100    15%       Microbes (bacteria/archaea)   3.3e-1
200    27%       Enterobacteriaceae (family)   6.3e-5
300    34%       E. coli K12                   2.6e-6
400    44%                                     2.8e-14
500    51%                                     1.9e-21
600    57%                                     3.8e-35
700    62%                                     6.5e-50
800    67%                                     5.8e-61
900    71%                                     2.3e-72
1000   75%                                     1.8e-86
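The "LCA (lowest distance ties)" column suggests that when several reference genomes tie for the lowest distance, the reported label is their lowest common ancestor in the taxonomy. A minimal sketch of that idea follows; the lineage lists, distances, and function names are hypothetical, and the talk does not describe its actual implementation.

```python
def lowest_common_ancestor(lineages):
    """Deepest taxon shared by all lineages (each listed from root to leaf)."""
    common = None
    if not lineages:
        return common
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            common = ranks[0]
        else:
            break
    return common


def lca_of_best_hits(hits):
    """hits: list of (distance, lineage). LCA over all lowest-distance ties."""
    best = min(distance for distance, _ in hits)
    tied = [lineage for distance, lineage in hits if distance == best]
    return lowest_common_ancestor(tied)


# Hypothetical hits: two references tie at the lowest distance.
hits = [
    (0.02, ["Bacteria", "Enterobacteriaceae", "Escherichia", "E. coli K12"]),
    (0.02, ["Bacteria", "Enterobacteriaceae", "Klebsiella", "K. pneumoniae"]),
    (0.10, ["Bacteria", "Bacillaceae", "Bacillus", "B. anthracis"]),
]
print(lca_of_best_hits(hits))
```

With few reads many references tie, so the label stays high in the taxonomy; as more reads arrive the ties break and the label can move toward a single strain, which matches the progression in the table.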

Covering B. anthracis x 30

MinHash: Real time, Portable, High error, Streaming, Significance, Fast, Robust, Low memory, High error

? 87mm 54mm

Future applications
Andrei Broder, 1997: “On the resemblance and containment of documents”
Metagenomics, Pre-alignment

MarBL, NHGRI
mash.readthedocs.org
Todd Treangen, Nicholas Bergman, Adam Mallonee, Adam Phillippy, Sergey Koren
github.com/ MarBL