SneakySnake: A New Fast and Highly Accurate Pre-Alignment Filter

Slides:



Advertisements
Similar presentations
Navigation Set Hierarchy Tom Gianos Chapter 2.2. Mike Dickheiser Works (worked?) for Red Storm Entertainment Works (worked?) for Red Storm Entertainment.
Advertisements

Yasuhiro Fujiwara (NTT Cyber Space Labs)
Communication Pattern Based Node Selection for Shared Networks
Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date:
Seven Minute Madness: Reconfigurable Computing Dr. Jason D. Bakos.
1 Available Bandwidth Measurement and the Obstacles to Overcome Accurate Measurement.
A Fault-tolerant Architecture for Quantum Hamiltonian Simulation Guoming Wang Oleg Khainovski.
A Study of GeneWise with the Drosophila Adh Region Asta Gindulyte CMSC 838 Presentation Authors: Yi Mo, Moira Regelson, and Mike Sievers Paracel Inc.,
Hash, Don’t Cache: Fast Packet Forwarding for Enterprise Edge Routers Minlan Yu Princeton University Joint work with Jennifer.
Accelerating Read Mapping with FastHASH †† ‡ †† Hongyi Xin † Donghyuk Lee † Farhad Hormozdiari ‡ Samihan Yedkar † Can Alkan § Onur Mutlu † † † Carnegie.
ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement.
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 5: Global Routing © KLMH Lienig 1 FLUTE: Fast Lookup Table Based RSMT Algorithm.
Gene Matching Using JBits Steven A. Guccione Eric Keller.
Massively Parallel Mapping of Next Generation Sequence Reads Using GPUs Azita Nouri, Reha Oğuz Selvitopi, Özcan Öztürk, Onur Mutlu, Can Alkan Bilkent University,
Scheduling Many-Body Short Range MD Simulations on a Cluster of Workstations and Custom VLSI Hardware Sumanth J.V, David R. Swanson and Hong Jiang University.
New Modeling Techniques for the Global Routing Problem Anthony Vannelli Department of Electrical and Computer Engineering University of Waterloo Waterloo,
Maze Routing مرتضي صاحب الزماني.
Greedy Algorithms CS 498 SS Saurabh Sinha. A greedy approach to the motif finding problem Given t sequences of length n each, to find a profile matrix.
GLOBAL ROUTING Anita Antony PR11EC1011. Approaches for Global Routing Sequential Approach: – Route the nets one at a time. Concurrent Approach: – Consider.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Big data Usman Roshan CS 675. Big data Typically refers to datasets with very large number of instances (rows) as opposed to attributes (columns). Data.
Computer Architecture Lecture 26 Past and Future Ralph Grishman November 2015 NYU.
Biosequence Similarity Search on the Mercury System Praveen Krishnamurthy, Jeremy Buhler, Roger Chamberlain, Mark Franklin, Kwame Gyang, and Joseph Lancaster.
“Joint Optimization of Cascaded Classifiers for Computer Aided Detection” by M.Dundar and J.Bi Andrey Kolobov Brandon Lucia.
Qq q q q q q q q q q q q q q q q q q q Background: DNA Sequencing Goal: Acquire individual’s entire DNA sequence Mechanism: Read DNA fragments and reconstruct.
EE4271 VLSI Design VLSI Channel Routing.
Canadian Bioinformatics Workshops
Parallelization of mrFAST on GPGPU Hongyi Xin, Donghyuk Lee Milestone II.
Canadian Bioinformatics Workshops
Genome Read In-Memory (GRIM) Filter: Fast Location Filtering in DNA Read Mapping using Emerging Memory Technologies Jeremie Kim 1, Damla Senol 1, Hongyi.
SketchVisor: Robust Network Measurement for Software Packet Processing
FastHASH: A New Algorithm for Fast and Comprehensive Next-generation Sequence Mapping Hongyi Xin1, Donghyuk Lee1, Farhad Hormozdiari2, Can Alkan3, Onur.
Two-Dimensional Phase Unwrapping On FPGAs And GPUs
SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data - Aditi Thuse.
Web: Parallel Computing Rabie A. Ramadan , PhD Web:
Data Center Network Architectures
Genomic Data Clustering on FPGAs for Compression
Real-Time Ray Tracing Stefan Popov.
ISP and Egress Path Selection for Multihomed Networks
Department of Computer Science
NETWORK-ON-CHIP HARDWARE ACCELERATORS FOR BIOLOGICAL SEQUENCE ALIGNMENT Author: Souradip Sarkar; Gaurav Ramesh Kulkarni; Partha Pratim Pande; and Ananth.
Programmable Logic Controllers (PLCs) An Overview.
Buffer Insertion with Adaptive Blockage Avoidance
Multi-Core Parallel Routing
Genome Read In-Memory (GRIM) Filter Fast Location Filtering in DNA Read Mapping with Emerging Memory Technologies Jeremie Kim, Damla Senol, Hongyi Xin,
Yu Su, Yi Wang, Gagan Agrawal The Ohio State University
Nanopore Sequencing Technology and Tools for Genome Assembly:
Finding Heuristics Using Abstraction
Dissertation for the degree of Philosophiae Doctor (PhD)
GateKeeper: A New Hardware Architecture
CS 598AGB Genome Assembly Tandy Warnow.
Genome Read In-Memory (GRIM) Filter:
GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping
MapView: visualization of short reads alignment on a desktop computer
Accelerator Architecture in Computational Biology and Bioinformatics, February 24th, 2018, Vienna, Austria Exploring Speed/Accuracy Trade-offs in Hardware.
Accelerating Approximate Pattern Matching with Processing-In-Memory (PIM) and Single-Instruction Multiple-Data (SIMD) Programming Damla Senol Cali1 Zülal.
Accelerating Approximate Pattern Matching with Processing-In-Memory (PIM) and Single-Instruction Multiple-Data (SIMD) Programming Damla Senol Cali1, Zülal.
The performance requirements for DSP applications continue to grow and the traditional solutions do not adequately address this new challenge Paradigm.
Resource Allocation in a Middleware for Streaming Data
PERFORMANCE MEASURES. COMPUTATIONAL MODELS Equal Duration Model:  It is assumed that a given task can be divided into n equal subtasks, each of which.
Department of Electrical Engineering Technion
Alan Kuhnle*, Victoria G. Crawford, and My T. Thai
Implementation of a De-blocking Filter and Optimization in PLX
Fig. 1. Neighborhood map (N) and the Shouji bit-vector, for text T=GGTGCAGAGCTC and pattern P=GGTGAGAGTTGT for E =  Fig. 1. Neighborhood map (N)
Computer Architecture Lecture 10a: Simulation
Why systolic architectures?
By Nuno Dantas GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping Mohammed Alser §, Hasan Hassan †, Hongyi.
Apollo: A Sequencing-Technology-Independent, Scalable,
BitMAC: An In-Memory Accelerator for Bitvector-Based
FLUTE: Fast Lookup Table Based RSMT Algorithm for VLSI Design
Presentation transcript:

SneakySnake: A New Fast and Highly Accurate Pre-Alignment Filter on CPU and FPGA for Accelerating Sequence Alignment Mohammed Alser1,2 Can Alkan2 Onur Mutlu1,2,3 1 2 3 1: Read Mapping 2: Problem 3: Our Goal Fact: it remains challenging to sequence the entire DNA molecule as a whole. As a workaround: high throughput DNA sequencing (HTS) technologies can sequence only segments of the original molecule. This is relatively quick and cost-effective but it results in an excessive number of genomic reads. Hence we need read mapping to link the reads together and construct back the donor’s complete genome by 1) determining the location of each read within reference genome and 2) calculating its optimal sequence alignment. Calculating sequence alignment is a major performance bottleneck: Uses computationally expensive dynamic programming algorithms. Bottlenecked by memory bandwidth, e.g., Illumina NovaSeq 6000 generates 6 Terabases in < 24 hours They are unavoidable as they provide accurate information about the quality of the alignment. Majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Significantly reduce the time spent on calculating the sequence alignment of short sequences using pre-alignment filtering. To this end: We introduce new, fast, and very accurate pre-alignment filters, SneakySnake (for CPU) and Snake-on-Chip (for FPGA). 4: Key Ideas Quickly and accurately filters out highly dissimilar sequence pairs before applying sequence alignment algorithms. Provides fast and highly accurate filtering by reducing the sequence alignment problem to single net routing (SNR) problem [Lee+, IEEE-TCAS 1976] in VLSI chip layout. Judiciously leverages the parallelism-friendly architecture of modern FPGAs to greatly speed up the SneakySnake algorithm. 5: Single Net Routing (SNR) Problem 6: SneakySnake Walkthrough SNR Problem: finding the optimal routing path that: Includes the least number of horizontal escape segments, Passes through the minimum number of obstacles, Connects two IO terminals on a special grid layout. The number of obstacles in the solution to the SNR problem is a lower bound on the actual number of edits between two genomic sequences. Solving the SNR problem is much faster than solving the sequence alignment problem, as calculating the routing path after facing an obstacle is independent of the calculated path before this obstacle. 7: Evaluation & Key Takeaways Dataset Description: Set_1 & Set_2: each has 30 million pairs from mapping ERR240727_1 to the human genome using mrFAST’s e= 2, 40, respectively. Set_3 & Set_4: each has 30 million pairs from mapping SRR826471_1 using mrFAST’s e= 8, 100, respectively. Speedup vs. Existing Filters Key Results < 31412×, 20603×, and 64.1× fewer falsely- accepted sequences compared to GateKeeper / SHD (using Set_4, E= 10%), Shouji (using Set_4, E= 10%), and MAGNET (using Set_1, E= 1%), respectively. < 37.6× and 43.9× speedup with the addition of SneakySnake to Edlib [Šošic+, Bioinformatics 2017] (using Set_4, E= 0%) and Parasail [Daily+, Bioinformatics BMC 2016] (using Set_4, E =2%), respectively. < 154.7× and 150.2× speedup with the addition of Snake-on-Chip to Edlib [Šošic+, Bioinformatics 2017] (E= 0%) and Parasail [Daily+, Bioinformatics BMC 2016] (E =0%), respectively. < 1.4×, 3.4×, and 1.8× more speedup compared to that provided by adding Shouji, MAGNET, and GateKeeper as a pre-alignment filter, respectively. SneakySnake & Snake-on-Chip Open-source: https://github.com/CMU-SAFARI do not replace sequence alignment step. do not sacrifice any of the sequence aligner capabilities (scoring and backtracking), as they do not modify the aligner. Filtering Accuracy vs. Existing Filters Contact Us: https://people.inf.ethz.ch/alserm/