Genome Read In-Memory (GRIM) Filter: Fast Location Filtering in DNA Read Mapping using Emerging Memory Technologies Jeremie Kim 1, Damla Senol 1, Hongyi.

Slides:

Advertisements

Similar presentations

Key idea: SHM identifies matching by incrementally shifting the read against the reference Mechanism: Use bit-wise XOR to find all matching bps. Then use.

Advertisements

Computer Vision Lecture 18: Object Recognition II

Segmented Hash: An Efficient Hash Table Implementation for High Performance Networking Subsystems Sailesh Kumar Patrick Crowley.

Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture

CMPT 300: Final Review Chapters 8 – Memory Management: Ch. 8, 9 Address spaces Logical (virtual): generated by the CPU Physical: seen by the memory.

1 Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture Donghyuk Lee, Yoongu Kim, Vivek Seshadri, Jamie Liu, Lavanya Subramanian, Onur Mutlu.

Accelerating Read Mapping with FastHASH †† ‡ †† Hongyi Xin † Donghyuk Lee † Farhad Hormozdiari ‡ Samihan Yedkar † Can Alkan § Onur Mutlu † † † Carnegie.

1 The Google File System Reporter: You-Wei Zhang.

M i SMob i S Mob i Store - Mobile i nternet File Storage Platform Chetna Kaur.

Massively Parallel Mapping of Next Generation Sequence Reads Using GPUs Azita Nouri, Reha Oğuz Selvitopi, Özcan Öztürk, Onur Mutlu, Can Alkan Bilkent University,

Embedded System Lab. 김해천 Linearly Compressed Pages: A Low- Complexity, Low-Latency Main Memory Compression Framework Gennady Pekhimenko†

SHRiMP: Accurate Mapping of Short Reads in Letter- and Colour-spaces Stephen Rumble, Phil Lacroute, …, Arend Sidow, Michael Brudno.

Log-structured Memory for DRAM-based Storage Stephen Rumble, John Ousterhout Center for Future Architectures Research Storage3.2: Architectures.

Accelerating Error Correction in High-Throughput Short-Read DNA Sequencing Data with CUDA Haixiang Shi Bertil Schmidt Weiguo Liu Wolfgang Müller-Wittig.

CS 147 Virtual Memory Prof. Sin Min Lee Anthony Palladino.

Managing Web Server Performance with AutoTune Agents by Y. Diao, J. L. Hellerstein, S. Parekh, J. P. Bigus Presented by Changha Lee.

Biosequence Similarity Search on the Mercury System Praveen Krishnamurthy, Jeremy Buhler, Roger Chamberlain, Mark Franklin, Kwame Gyang, and Joseph Lancaster.

Qq q q q q q q q q q q q q q q q q q q Background: DNA Sequencing Goal: Acquire individual’s entire DNA sequence Mechanism: Read DNA fragments and reconstruct.

Simultaneous Multi-Layer Access Improving 3D-Stacked Memory Bandwidth at Low Cost Donghyuk Lee, Saugata Ghose, Gennady Pekhimenko, Samira Khan, Onur Mutlu.

DRAM Tutorial Lecture Vivek Seshadri. Vivek Seshadri – Thesis Proposal DRAM Module and Chip 2.

Parallelization of mrFAST on GPGPU Hongyi Xin, Donghyuk Lee Milestone II.

RFVP: Rollback-Free Value Prediction with Safe to Approximate Loads Amir Yazdanbakhsh, Bradley Thwaites, Hadi Esmaeilzadeh Gennady Pekhimenko, Onur Mutlu,

April 21, 2016Introduction to Artificial Intelligence Lecture 22: Computer Vision II 1 Canny Edge Detector The Canny edge detector is a good approximation.

These slides are based on the book:

FastHASH: A New Algorithm for Fast and Comprehensive Next-generation Sequence Mapping Hongyi Xin1, Donghyuk Lee1, Farhad Hormozdiari2, Can Alkan3, Onur.

Zorua: A Holistic Approach to Resource Virtualization in GPUs

Computer Architecture: Parallel Task Assignment

Memory Management.

Seth Pugsley, Jeffrey Jestes,

Nanopore Sequencing Technology and Tools:

1Carnegie Mellon University

Understanding Latency Variation in Modern DRAM Chips Experimental Characterization, Analysis, and Optimization Kevin Chang Abhijith Kashyap, Hasan Hassan,

Genomic Data Clustering on FPGAs for Compression

Ambit In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology Vivek Seshadri Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali.

BitWarp Energy Efficient Analytic Data Processing on Next Generation General Purpose GPUs Jason Power || Yinan Li || Mark D. Hill || Jignesh M. Patel.

RFVP: Rollback-Free Value Prediction with Safe to Approximate Loads

Department of Computer Science

Gwangsun Kim Niladrish Chatterjee Arm, Inc. NVIDIA Mike O’Connor

Anne Pratoomtong ECE734, Spring2002

Hasan Hassan, Nandita Vijaykumar, Samira Khan,

Genome Read In-Memory (GRIM) Filter Fast Location Filtering in DNA Read Mapping with Emerging Memory Technologies Jeremie Kim, Damla Senol, Hongyi Xin,

Nanopore Sequencing Technology and Tools:

Nanopore Sequencing Technology and Tools for Genome Assembly:

GateKeeper: A New Hardware Architecture

Jeremie S. Kim Minesh Patel Hasan Hassan Onur Mutlu

Chapter 9: Virtual-Memory Management

Today, DRAM is just a storage device

Ambit In-memory Accelerator for Bulk Bitwise Operations

Parallel and Multiprocessor Architectures – Shared Memory

Genome Read In-Memory (GRIM) Filter:

GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping

Accelerator Architecture in Computational Biology and Bioinformatics, February 24th, 2018, Vienna, Austria Exploring Speed/Accuracy Trade-offs in Hardware.

Accelerating Approximate Pattern Matching with Processing-In-Memory (PIM) and Single-Instruction Multiple-Data (SIMD) Programming Damla Senol Cali1 Zülal.

Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights Feng Zhang †⋄, Jidong Zhai ⋄, Xipeng Shen #, Onur Mutlu ⋆, Wenguang.

Online Graph-Based Tracking

Accelerating Approximate Pattern Matching with Processing-In-Memory (PIM) and Single-Instruction Multiple-Data (SIMD) Programming Damla Senol Cali1, Zülal.

Yaohua Wang, Arash Tavakkol, Lois Orosa, Saugata Ghose,

Fourier Transform of Boundaries

Hash Functions for Network Applications (II)

A microprocessor into a memory chip Dave Patterson, Berkeley, 1997

Computer Architecture Lecture 10a: Simulation

RAIDR: Retention-Aware Intelligent DRAM Refresh

CSE 542: Operating Systems

Tesseract A Scalable Processing-in-Memory Accelerator

By Nuno Dantas GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping Mohammed Alser §, Hasan Hassan †, Hongyi.

Presented by Florian Ettinger

Computer Architecture Lecture 30: In-memory Processing

Apollo: A Sequencing-Technology-Independent, Scalable,

BitMAC: An In-Memory Accelerator for Bitvector-Based

SneakySnake: A New Fast and Highly Accurate Pre-Alignment Filter

Presentation transcript:

Genome Read In-Memory (GRIM) Filter: Fast Location Filtering in DNA Read Mapping using Emerging Memory Technologies Jeremie Kim 1, Damla Senol 1, Hongyi Xin 1, Donghyuk Lee 1, Mohammed Alser 2, Hasan Hassan 3, Oguz Ergin 3, Can Alkan 2 and Onur Mutlu 1 1 Carnegie Mellon University, 2 Bilkent University, 3 TOBB University of Economics and Technology Part 2: GRIM-Filter Walkthrough Results & Conclusion o Baseline: mrFAST with FastHASH mapper code [Xin+, BMC Genomics 2013]. However, GRIM-Filter is fully complementary to other mappers, too. o Key Results of GRIM-Filter: 5.59x-6.41x less false negative locations, and 1.81x-3.65x end-to-end speedup over the state-of-the-art read mapper mrFAST with FastHASH. o We show the inherent parallelism of our filter and ease of implementation for 3D-stacked memory. There is great promise in adapting DNA read mapping algorithms to state-of-the-art and emerging memory and processing technologies. o Other Results: Examined sensitivity to number of bins: 450x65536 Examined sensitivity to q-gram size: 5 Found the best tradeoff between memory consumption, filtering efficiency, and runtime. 3D-Stacked Logic-in-Memory DRAM Design and implement a new filter that rejects incorrect mappings before the alignment step o Minimize the occurrences of unnecessary alignment o Maintain high sensitivity and comprehensiveness o Obtain low runtime and low false positive rate Accelerate read mapping by overcoming the memory bottleneck by utilizing 3D-stacked memory and its PIM capability to handle data- intensive computation o very fast and massively parallel operations on very large amounts of data nearby memory Read Mapping Recent technology that tightly couples memory and logic vertically with very high bandwidth connectors. Small and numerous Through Silicon Vias (TSVs) connecting each layer, enable higher bandwidth, lower latency and lower energy consumption. Logic layer enables fast, massively parallel operations on large sets of data, and provides the ability to run these operations near memory to alleviate the memory bottleneck. Logic layer can be customized for application-specific accelerators. Problem Read Mapping: Mapping billions of DNA fragments (reads) against a reference genome to identify genomic variants o Approximate string matching o Computationally expensive alignment using quadratic-time dynamic programming algorithm o Bottlenecked by memory bandwidth Three types of read mappers: o Suffix-array based mappers o Hash table based mappers o Hybrid Hash Table Based Mappers Seed-and-extend procedure to map reads against a reference genome allowing e indels. o High sensitivity o High comprehensiveness BUT o High runtime The most recent fastest hash table based read mapper, mrFAST with FastHASH [Xin+, BMC Genomics 2013] Our Goal For lower runtimes, location filters can efficiently determine whether a candidate mapping location will result in an incorrect mapping before performing the computationally expensive incorrect verification by alignment. They should be fast. GRIM-Filter Mechanism GRIM-Filter is based on two key ideas: o Modify q-gram string matching to enable parallel checking for multiple locations, and o Utilize a 3D-stacked DRAM architecture that both alleviates the memory bandwidth issue of our algorithm and parallelizes most of the filter. GRIM-Filter has two main parts: 1) Precomputation: Divide the reference genome into consecutive bins and generate existence bitvectors for each bin. 2) Filtering Algorithm: Filter locations by quickly determining whether it is possible for a read to map to a specific segment of the genome. Runtimes for GRIM-Filter across the benchmarks as error threshold varies. Part 1: Bins & Bitvectors False Negative Rates for GRIM-Filter as error threshold varies. Existence bitvectors represent the existence of all possible permutations of q-length nucleotide sequences in each bin. Bitvectors are distributed throughout the memory layers of 3D-stacked DRAM on top of the customized logic layer, which enables processing-in-memory and high parallelism. Memory ArrayCustomized Logic