Genomic Data Clustering on FPGAs for Compression

Presentation transcript:

Genomic Data Clustering on FPGAs for Compression Andreas Zingg 24.10.2017

Background - Bioinformatics: An important tool to guide therapeutic intervention. Improves the knowledge available to researchers interested in evolutionary biology. -> May lay the foundation for predicting disease susceptibility and drug response.

Background - Bioinformatics: Genome: the entirety of an organism's hereditary information, encoded in DNA. DNA: consists of nitrogenous bases; the bases appear in pairs.

Background - Bioinformatics: Base Pairs

Genomic Data: DNA is cut into small sequences. The sequences are read by machine, e.g. ACTGATTG, GCCTATCGATGAC, TGAT, TATCGACG.

The Problem: The generated data is really big. One human genome generates data on the order of 300 GB. This might take a while.

The Solution: Compress the data! But how? Exploit data redundancy: map the data to the human reference genome. About 90% of genomic sequences share similarities with the human reference genome.

Mapping to the reference genome (figure: reads aligned to the human reference genome): About 90% of sequences can be mapped to the reference genome. Mapped sequences are compressed using their relative location to the reference. What about the remaining 10%?
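
The idea of compressing a mapped sequence by its location relative to the reference can be sketched as follows. This is a minimal illustration, not the encoding used in the presented work: the function names `encode_read` and `decode_read`, the toy reference string, and the choice to store mismatches as (offset, base) pairs are all assumptions made for the example.

```python
def encode_read(read, reference, pos):
    """Encode a read as (position, length, mismatches) relative to a reference.

    Hypothetical encoding for illustration: only the bases that differ
    from the reference are stored explicitly.
    """
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read does not fit at this position")
    mismatches = [(i, b) for i, (a, b) in enumerate(zip(window, read)) if a != b]
    return pos, len(read), mismatches


def decode_read(encoded, reference):
    """Rebuild the original read from the reference and the stored differences."""
    pos, length, mismatches = encoded
    bases = list(reference[pos:pos + length])
    for i, base in mismatches:
        bases[i] = base
    return "".join(bases)


reference = "GCCTATCGATGACTGATTG"  # toy reference, not real genome data
read = "TATCGACG"
encoded = encode_read(read, reference, 3)
restored = decode_read(encoded, reference)
```

A read that maps well needs only a position, a length and a few differences, which is why sequences that do not map to the reference need a different treatment.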

Clustering: For the remaining 10%, find clusters and map the sequences to these clusters. Using what algorithm? K-Means? What should our K be?

No Useful Clustering Algorithm: There is no useful existing clustering algorithm for the compression of genomic data. The exact value of K does not matter: as long as there are highly correlated clusters, compression is possible. Instead of searching for exactly K clusters, find clusters using a small threshold neighbourhood function. A clustering algorithm of this kind is presented.

Matching function: For two sequences s1 and s2, a matching function is defined in terms of: le, the sequence size; d, the distance between the sequences; N, the distance threshold.

Matching function: Example with N = 1 and le = 8, including a reverse complement. Three of the compared pairs match; one does not.
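
A threshold-based matching function of this kind might look like the sketch below. The transcript does not preserve the exact formula, so two things here are assumptions: that the distance d is the Hamming distance between equal-length sequences, and that a sequence also matches when its reverse complement is within the threshold N. The example sequences are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s1, s2))


def matches(s1, s2, n):
    """Assumed rule: match if s2 or its reverse complement is within distance n of s1."""
    return hamming(s1, s2) <= n or hamming(s1, reverse_complement(s2)) <= n


s1 = "ACTGATTG"  # le = 8
results = [
    matches(s1, "ACTGATTG", 1),  # identical
    matches(s1, "ACTGATTC", 1),  # one differing base
    matches(s1, "CAATCAGT", 1),  # reverse complement of s1
    matches(s1, "ACTGAAAG", 1),  # two differing bases
]
```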

Basic Clustering Idea: Complexity: O(n^2). More than 2 years on an Intel Core i7-4790. Not practical.
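
The slide's figure is not part of the transcript, so the following is only a plausible reading of the basic idea: each sequence is compared against the reference of every existing cluster and either joins the first cluster it matches or starts a new one. The function name `greedy_cluster`, the use of the first member as cluster reference, and plain Hamming distance as the match test are assumptions.

```python
def hamming(s1, s2):
    """Number of differing positions (equal-length sequences assumed)."""
    return sum(a != b for a, b in zip(s1, s2))


def greedy_cluster(sequences, n):
    """Assign each sequence to the first cluster whose reference is within distance n.

    Hypothetical sketch: the first sequence placed in a cluster acts as its reference.
    """
    clusters = []  # list of (reference, members) pairs
    for seq in sequences:
        for ref, members in clusters:
            if hamming(seq, ref) <= n:
                members.append(seq)
                break
        else:
            # No existing reference matched: open a new cluster.
            clusters.append((seq, [seq]))
    return clusters


clusters = greedy_cluster(["AAAA", "AAAT", "CCCC", "CCCG", "GGGG"], n=1)
```

If no sequence matches any earlier one, every sequence is compared with all previous references, which gives on the order of n^2 comparisons and explains the quadratic complexity.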

Parallel Clustering: Compare sequences with multiple cluster references at the same time. Use an FPGA board to implement the parallel clustering algorithm. To compare sequences, the FPGA can use 6-bit lookup tables.
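
As a rough software analogue of comparing one sequence with several cluster references, the sketch below packs each base into 2 bits and counts mismatching bases with bitwise operations. The 2-bit encoding and all names are assumptions for illustration; the slides mention only lookup tables. Unlike the FPGA, which evaluates the references simultaneously, this Python loop visits them one after another and only shows the data flow.

```python
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # assumed 2-bit encoding


def pack(seq):
    """Pack a non-empty sequence into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def mismatches(p1, p2, length):
    """Count bases that differ between two packed sequences of the given length."""
    diff = p1 ^ p2
    low_bits = int("01" * length, 2)  # selects the low bit of every 2-bit pair
    per_base = (diff | (diff >> 1)) & low_bits  # 1 where either bit of a pair differs
    return bin(per_base).count("1")


def match_against_references(seq, references, n):
    """One match flag per reference; hardware would produce these flags in parallel."""
    packed = pack(seq)
    return [mismatches(packed, pack(ref), len(seq)) <= n for ref in references]


flags = match_against_references("ACTGATTG", ["ACTGATTG", "ACTGATTC", "CAATCAGT"], 1)
```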

Setup: Modular interface to cluster sequences. CPU and FPGA are interchangeable. Allows for comparison of performance and results.

FPGA top hierarchy

Matching Unit

FPGA initialization phase

FPGA main phase (multiple possible)

Shortcomings: Limited number of parallel clustering units. This requires phase repetitions: the number of executions and the memory latency increase, and the clustering process is slowed down. Worst case: none of the sequences match the references of the current clusters, so the cache must be able to store all sequences.

Proposed Workarounds: Cache not big enough: increase the memory capacity, or cut the input into smaller pieces that fit in the cache and handle those (parallelizable, but the solution might be sub-optimal). Phase repetitions slow down the clustering process: use HMC modules and use the maximum number of parallel clustering units. The maximum number of parallel units is limited by the FPGA size and by the latency of sequence distribution over the FPGA surface.

Test Setup: Unmapped paired sequences of 126 bases from a real human sample. FPGA-based version at 125 MHz. Software version on an Intel Core i7-4790 (Haswell, 4 cores) at 4 GHz.

Runtime dependent on input size

Times needed to cluster a real-case file: software configuration and FPGA hardware configuration (marked values are extrapolated).

Results: The software solution takes 2.6 years. The FPGAs take ~12 hours, which makes the task practical. Speed gain: ~1000x. Energy saved: ~700x.

Conclusion: Goal achieved. Opens a path for new clustering-based compression algorithms. Proved that, even on large datasets, high-complexity algorithms (O(n^2)) can run in a reasonable amount of time when provided with specialized hardware.

My Take: Well structured. Easy to read and understand. Interesting insight into a new field. The speedup is not explained well.

Questions