Challenging Cloning Related Problems with GPU-Based Algorithms

Challenging Cloning Related Problems with GPU-Based Algorithms Authors: Thierry Lavoie, Michael Eilers-Smith, Ettore Merlo Publisher: ACM IWSC’10 Presenter: Ye-Zhi Chen Date: 2011/12/21

Introduction This paper describes an implementation of the Smith-Waterman algorithm for proper clone filtering.

Algorithm To address the false-positive problem in clone detection with an appropriate filtering technique, DP-matching seemed to be an interesting choice. [Figure: DP matrix in which each cell depends on its upper-left (↖), left (←), and upper (↑) neighbors]
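The DP-matching recurrence behind the arrows can be sketched sequentially as follows. This is a minimal illustration of Smith-Waterman scoring, not the authors' code: the function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions, since the slides do not state them.

```python
def smith_waterman_score(s1, s2, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between s1 and s2.

    The scoring parameters are illustrative assumptions.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    # Row 0 and column 0 stand for the leading gap and stay at zero.
    d = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            d[i][j] = max(
                0,                      # local alignment never goes negative
                d[i - 1][j - 1] + sub,  # upper-left neighbor
                d[i - 1][j] + gap,      # upper neighbor
                d[i][j - 1] + gap,      # left neighbor
            )
            best = max(best, d[i][j])
    return best
```

Every cell reads only its upper-left, upper, and left neighbors, which is the dependency pattern the figure shows.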

Algorithm GPU DP-matching: Find which cells of the matrix are free of computational dependencies, so that their values can be computed on separate cores simultaneously. It is simple to check that all cells on an anti-diagonal become free of computational dependencies at the same moment, because their values depend solely on cells of the previous anti-diagonals.
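The anti-diagonal ordering can be sketched on the CPU as follows. This is a hedged illustration rather than the GPU kernel from the paper: the function names and scoring values are assumptions, and the inner loop stands in for what would be one thread per cell on a GPU.

```python
def antidiagonal_cells(rows, cols, k):
    """Return the cells (i, j) with i + j == k inside a rows x cols matrix."""
    lo = max(0, k - (cols - 1))
    hi = min(rows - 1, k)
    return [(i, k - i) for i in range(lo, hi + 1)]


def wavefront_score(s1, s2, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score, filling the matrix one anti-diagonal at a time."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    best = 0
    # k = i + j; interior cells start at k = 2 and end at k = rows + cols - 2.
    for k in range(2, rows + cols - 1):
        # Cells on the same anti-diagonal read only diagonals k-1 and k-2,
        # so this inner loop has no ordering constraint and could run in parallel.
        for i, j in antidiagonal_cells(rows, cols, k):
            if i == 0 or j == 0:
                continue  # border cells represent the leading gap and stay 0
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            d[i][j] = max(0, d[i - 1][j - 1] + sub,
                          d[i - 1][j] + gap, d[i][j - 1] + gap)
            best = max(best, d[i][j])
    return best
```

The outer loop over `k` is the only sequential part; its length grows linearly with the string lengths.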

Algorithm Let Vk represent the linear buffer computed at step k. Let fk be the following map between the indexes of Vk and those of the matrix D: [formula shown on the original slide; not preserved in this transcript]. Here u can be seen as the thread index, and the first character of both s1 and s2 is a gap.

Algorithm [Figure: DP matrix with labels for the characters that are compared and for the top, left, and upper-left cells]

Algorithm Worst-case problem: The classical DP-matching algorithm has a quadratic worst-case running time. In the general worst case, the GPU-based implementation also has a quadratic running time. However, since a large number of cores perform the computation at the same time, the hidden constant of the quadratic term can be divided by a large factor.
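The effect of the core count on the quadratic constant can be illustrated with an idealized step count. This is a sketch under simplifying assumptions: the function name is hypothetical, each core is assumed to compute one cell per step, and memory bandwidth and kernel-launch overhead are ignored.

```python
def wavefront_steps(n, m, cores):
    """Idealized number of time steps to fill an n x m matrix when up to
    `cores` cells of the same anti-diagonal are computed per step."""
    steps = 0
    for k in range(n + m - 1):
        # Number of cells on anti-diagonal k of an n x m matrix.
        length = min(k, n - 1, m - 1, n + m - 2 - k) + 1
        steps += -(-length // cores)  # ceiling division
    return steps
```

With a single core the count equals n * m. With at least min(n, m) cores it drops to n + m - 1, one step per anti-diagonal. For a fixed number of cores P it stays on the order of n * m / P, which is still quadratic but with a smaller constant.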

Algorithm On very small instances of DP-matching problems, the CPU might outrun the GPU, mostly because of memory bandwidth limitations. If computation on such very small instances is performed by matching one string against a set of strings, there is a way of packing the data on the GPU to make the total computation more efficient.

Algorithm Let C be a set of strings and let c0 be an element of C. Let us define C’ as: C’ = C − {c0}. The problem is then defined as matching c0 against all ci in C’. Practical implementations need to pad the strings to be matched. This forces the number of computational steps k to be the same in each sub-matrix. The length of the padding p of a ci is defined as follows: p = max(len(cj) | cj ∈ C) − len(ci). The padded ci of C’ are then concatenated, separated by a special blank character.
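The padding and concatenation step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the choice of pad and blank characters, and padding on the right are all hypothetical, since the slides only say that the strings are padded to a common length and joined with a special blank character.

```python
def pack_strings(c0, others, pad="\x00", blank="\x01"):
    """Pad every string of C' to the longest length in C and join them.

    `c0` is the string to match, `others` is C' = C - {c0}.
    `pad` and `blank` are assumed not to occur in any input string.
    """
    # The maximum is taken over all of C, so c0 counts as well.
    width = max(len(c) for c in [c0, *others])
    # Right-padding is an assumption; each string gets width - len(c) pad characters.
    padded = [c + pad * (width - len(c)) for c in others]
    return blank.join(padded)
```

For example, `pack_strings("abcd", ["ab", "xyz"], pad=".", blank="|")` returns `"ab..|xyz."`: every block has the same width, so each sub-matrix needs the same number of steps.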

The initial value of k is not 0; it is |C’-1| * (max(len(ci) | ci ∈ C) + 1). The number of computational steps k is reduced to 2 * (max(len(ci) | ci ∈ C)) − 1.

The indexes γ corresponding to these cells can be evaluated with this equation: γ = x * (max(len(ci) | ci ∈ C) + 1), ∀ x ∈ {0..|C| − 1}
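The γ formula yields evenly spaced offsets with a stride of one padded string plus one separator. A minimal sketch, with a hypothetical function name and the set C passed as a plain list:

```python
def gamma_indexes(strings):
    """Evaluate gamma = x * (max_len + 1) for x in 0 .. |C| - 1."""
    max_len = max(len(c) for c in strings)
    return [x * (max_len + 1) for x in range(len(strings))]
```

For three strings whose longest member has four characters, this gives the offsets 0, 5, and 10.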

Experimental equipment: a 3.00 GHz Intel Core 2 Duo computer with 6 MB of cache, 3 GB of RAM, and a GeForce 8800 GT