GateKeeper: A New Hardware Architecture

Presentation transcript:

GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping

Mohammed Alser (1), Hasan Hassan (2), Hongyi Xin (3), Oğuz Ergin (2), Onur Mutlu (4), Can Alkan (1)
(1) Bilkent University, (2) TOBB University of Economics & Technology, (3) Carnegie Mellon University, (4) ETH Zürich

Bioinformatics, Volume 33, Issue 21, 1 November 2017, Pages 3355–3363.

1: Read Mapping

- Fact: to this day, it remains challenging to sequence an entire DNA molecule as a whole.
- Workaround: high-throughput DNA sequencing (HTS) technologies sequence short reads from copies of the original molecule. These technologies are relatively quick and cost-effective, but they produce an excessive number of short reads.
- Reads carry no information about which part of the genome they come from; hence read mapping is needed. It determines the optimal alignment and the potential location of each read within a reference genome in order to construct the donor's complete genome.

2: Problem

- Optimal alignment is computationally expensive. It is bottlenecked by memory bandwidth (e.g., the Illumina NovaSeq 6000 generates 6 terabases per 36 hours for each genomic sample).
- Optimal alignment algorithms are unavoidable, as they provide accurate information about the quality of the alignment.
- The majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Examining them wastes execution time and incurs a significant computational burden.

3: Pre-Alignment Filtering

Our Goal: provide the first hardware accelerator architecture (as a pre-alignment filter) for quickly rejecting incorrect mappings (highly dissimilar read-reference pairs) that waste execution time.

- Obtain low runtime.
- Highly parallel hardware accelerator design.
- Reject most incorrect mappings (low false positives).
- Accept all correct mappings (zero false negatives).
4: GateKeeper

Key Ideas:

- Build a processing core based only on parallel bitwise operations to examine a single mapping.
- Introduce parallelism to the pre-alignment step by integrating many hardware processing cores that examine many mappings in parallel.
- Exploit the large amount of parallelism offered by FPGA* architectures to accelerate our processing cores.

* FPGAs (field-programmable gate arrays) are the most commonly used reconfigurable hardware engines today, and their computational capabilities increase greatly with every generation as the number of transistors on the FPGA chip grows. An FPGA can be configured to include a large number of hardware execution units that are custom-tailored to the problem at hand (Aluru and Jammula, 2014; Herbordt et al., 2007; Trimberger, 2015).

5: GateKeeper Walkthrough

Mechanism:

- Fast detection of base substitutions: if the Hamming distance is less than or equal to E, the user-defined edit distance threshold, the mapping is accepted.
- Fast detection of insertions and deletions: generate 2E deletion and insertion masks by incrementally shifting the query to the right or left, respectively, and comparing each shifted query against the reference segment.
- Apply bit-vector optimization using a fast architecture design: pre-process all (2E+1) bit-vectors by encoding them into a shorter binary format. Amend each run of two or fewer consecutive 0s into 1s, as such short runs are likely to be random matches.
- Calculate the shifted Hamming distance (Xin et al., 2015): AND all bit-vector masks, then conservatively count the 1s in the resulting AND mask. If the count is less than or equal to E, the mapping is accepted and passed to the alignment step.

6: Results & Conclusion

Key Results of GateKeeper:

- 90x-130x faster than SHD (Xin et al., 2015) and the Adjacency Filter (Xin et al., 2013).
- 4x fewer false positives (falsely accepted incorrect mappings) than the Adjacency Filter (Xin et al., 2013).
- 10x speedup with the addition of GateKeeper to the mrFAST mapper (Alkan et al., 2009).
- The first open-source FPGA-based filter for genome analysis: Github.com/BilkentCompGen/GateKeeper. As such, we hope that it catalyzes the development and adoption of such hardware accelerators in genome sequence analysis, which are becoming increasingly necessary to cope with the processing requirements of rapidly growing amounts of genomic data.

Other Results:

- The number of processing cores is determined by the maximum data throughput (~13.3 billion bases per second, provided by RIFFA (Jacobsen et al., 2015)) and the available FPGA resources.
- GateKeeper can examine up to 8 or 16 mappings concurrently (at 250 MHz) for an input read length of 300 and 100 bp, respectively.
- GateKeeper occupies 50% of the available FPGA slice LUTs and 91% of the available registers for an input read length of 100 and 300 bp, respectively.
- A pre-alignment filter does not replace alignment verification.
- Integrating the FPGA accelerators with the sequencer can help to hide the complexity and details of the underlying hardware.

Chip Layout

[Figure: Xilinx VC709 FPGA chip layout and breakdown of the chip area for GateKeeper (for a read length of 300 bp and E=15).]

Speedup vs. Existing Filters

[Figure: Number of examined mappings by GateKeeper, SHD, and the Adjacency Filter across different read lengths and E thresholds. Axis: billion mappings per 40 mins, log scale. Panels: read length = 300 bp, read length = 100 bp. SHD does not support 300 bp long reads.]

Filtering Accuracy vs. Existing Filters

[Figure: False positive rate (rate of incorrect mappings that are falsely accepted) of GateKeeper, SHD, and the Adjacency Filter across different edit distance thresholds (E) and read lengths.]

Pre-alignment + Alignment Steps

Overall mrFAST mapping time (in hours) with and without a pre-alignment step, with an edit distance threshold of 5%.

Read length / E | mrFAST version / pre-alignment type | Filtering & verification time (speed-up) | Overall mapping time (speed-up)
100 bp / 5 edits | 2.1 / No pre-alignment | 22.60 h (1x) | 24.27 h (1x)
100 bp / 5 edits | 2.6 / Adjacency Filter | 5.65 h (4x) | 7.31 h (3.3x)
100 bp / 5 edits | 2.1 / GateKeeper | 0.55 h (41x) | 2.50 h (9.7x)
300 bp / 15 edits | 2.1 / No pre-alignment | 0.94 h (1x) | 1.02 h (1x)
300 bp / 15 edits | 2.6 / Adjacency Filter | 0.04 h (24x) | 0.12 h (8x)
300 bp / 15 edits | 2.1 / GateKeeper | 0.003 h (279x) | 0.09 h (11x)
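
The Mechanism steps (Hamming check, shifted masks, amending, AND, count) can be sketched in software. The Python below is a minimal illustration under stated assumptions, not the authors' FPGA design. It keeps one mask entry per base (0 = match, 1 = mismatch) instead of encoded bit-vectors. Three choices are assumptions made only for this sketch, since the poster does not specify them: positions vacated by a shift are treated as matches, only runs of 0s flanked by 1s on both sides are amended, and each run of L consecutive 1s in the final mask is counted as ceil(L/3) edits as a stand-in for the conservative count. All function names are hypothetical.

```python
def hamming_mask(read, ref):
    """Per-base mask: 1 where the bases differ, 0 where they match."""
    return [int(a != b) for a, b in zip(read, ref)]


def shifted_mask(read, ref, shift):
    """Mask for the read shifted by `shift` bases (positive = right, negative = left).

    Assumption: positions with no overlapping read base are set to 0 (match).
    """
    mask = [0] * len(ref)
    for i in range(len(ref)):
        j = i - shift
        if 0 <= j < len(read):
            mask[i] = int(read[j] != ref[i])
    return mask


def amend(mask):
    """Turn each run of one or two 0s that has a 1 on both sides into 1s."""
    out = list(mask)
    n = len(mask)
    i = 0
    while i < n:
        if mask[i] == 0:
            j = i
            while j < n and mask[j] == 0:
                j += 1
            # mask[i:j] is a maximal run of 0s; runs touching an edge are kept.
            if j - i <= 2 and i > 0 and j < n:
                for k in range(i, j):
                    out[k] = 1
            i = j
        else:
            i += 1
    return out


def estimate_edits(mask):
    """Assumed counting rule: a run of L consecutive 1s counts as ceil(L / 3)."""
    total, run = 0, 0
    for bit in mask + [0]:  # trailing 0 flushes the last run
        if bit:
            run += 1
        else:
            total += (run + 2) // 3
            run = 0
    return total


def passes_filter(read, ref, e):
    """Return True if the read-reference pair should go on to alignment."""
    base = hamming_mask(read, ref)
    if sum(base) <= e:  # substitutions only: plain Hamming distance
        return True
    masks = [amend(base)]
    for k in range(1, e + 1):  # 2E shifted masks for deletions and insertions
        masks.append(amend(shifted_mask(read, ref, k)))
        masks.append(amend(shifted_mask(read, ref, -k)))
    final = [int(all(bits)) for bits in zip(*masks)]  # AND of all 2E+1 masks
    return estimate_edits(final) <= e


if __name__ == "__main__":
    ref = "AAAACCCCGGGG"
    print(passes_filter("AAAACCCGGGGT", ref, 1))  # one deleted base: True
    print(passes_filter("TTTTTTTTTTTT", ref, 1))  # highly dissimilar: False
```

As with any pre-alignment filter of this kind, a pair that passes still needs alignment verification; the sketch can accept some pairs whose true edit distance exceeds E.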