Accelerated Spam Filtering with Enhanced KMP Algorithm on GPU

Slides:



Advertisements
Similar presentations
CS179: GPU Programming Lecture 5: Memory. Today GPU Memory Overview CUDA Memory Syntax Tips and tricks for memory handling.
Advertisements

Process management Information maintained by OS for process management  process context  process control block OS virtualization of CPU for each process.
03/20/2003Parallel IR1 Papers on Parallel IR Agenda Introduction Paper 1:Inverted file partitioning schemes in multiple disk systems Paper 2: Parallel.
Scalable Multi-Cache Simulation Using GPUs Michael Moeng Sangyeun Cho Rami Melhem University of Pittsburgh.
CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.
GPGPU Introduction Alan Gray EPCC The University of Edinburgh.
CPU & GPU Parallelization of Scrabble Word Searching Jonathan Wheeler Yifan Zhou.
I/O-Algorithms Lars Arge Spring 2009 February 2, 2009.
Weekly Report Start learning GPU Ph.D. Student: Leo Lee date: Sep. 18, 2009.
Cliff Rhyne and Jerry Fu June 5, 2007 Parallel Image Segmenter CSE 262 Spring 2007 Project Final Presentation.
Map-Reduce and Parallel Computing for Large-Scale Media Processing Youjie Zhou.
Gnort: High Performance Intrusion Detection Using Graphics Processors Giorgos Vasiliadis, Spiros Antonatos, Michalis Polychronakis, Evangelos Markatos,
1 Summary of lectures 1.Introduction to Algorithm Analysis and Design (Chapter 1-3). Lecture SlidesLecture Slides 2.Recurrence and Master Theorem (Chapter.
Tree-Based Density Clustering using Graphics Processors
CPSC 171 Introduction to Computer Science 3 Levels of Understanding Algorithms More Algorithm Discovery and Design.
STRATEGIC NAMING: MULTI-THREADED ALGORITHM (Ch 27, Cormen et al.) Parallelization Four types of computing: –Instruction (single, multiple) per clock cycle.
WEKA - Explorer (sumber: WEKA Explorer user Guide for Version 3-5-5)
Artificial Neural Network Theory and Application Ashish Venugopal Sriram Gollapalli Ulas Bardak.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
Heterogeneity-Aware Peak Power Management for Accelerator-based Systems Heterogeneity-Aware Peak Power Management for Accelerator-Based Systems Gui-Bin.
Boyer Moore Algorithm Idan Szpektor. Boyer and Moore.
GPU Architecture and Programming
GPEP : Graphics Processing Enhanced Pattern- Matching for High-Performance Deep Packet Inspection Author: Lucas John Vespa, Ning Weng Publisher: 2011 IEEE.
Using Prediction to Accelerate Coherence Protocols Authors : Shubendu S. Mukherjee and Mark D. Hill Proceedings. The 25th Annual International Symposium.
Parallelization and Characterization of Pattern Matching using GPUs Author: Giorgos Vasiliadis 、 Michalis Polychronakis 、 Sotiris Ioannidis Publisher:
Accelerating Error Correction in High-Throughput Short-Read DNA Sequencing Data with CUDA Haixiang Shi Bertil Schmidt Weiguo Liu Wolfgang Müller-Wittig.
Vector/Array ProcessorsCSCI 4717 – Computer Architecture CSCI 4717/5717 Computer Architecture Topic: Vector/Array Processors Reading: Stallings, Section.
IP Routing Processing with Graphic Processors Author: Shuai Mu, Xinya Zhang, Nairen Zhang, Jiaxin Lu, Yangdong Steve Deng, Shu Zhang Publisher: IEEE Conference.
Parallel & Distributed Systems and Algorithms for Inference of Large Phylogenetic Trees with Maximum Likelihood Alexandros Stamatakis LRR TU München Contact:
 Genetic Algorithms  A class of evolutionary algorithms  Efficiently solves optimization tasks  Potential Applications in many fields  Challenges.
Department of Computer Science MapReduce for the Cell B. E. Architecture Marc de Kruijf University of Wisconsin−Madison Advised by Professor Sankaralingam.
CS4432: Database Systems II Query Processing- Part 2.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
CSCI1600: Embedded and Real Time Software Lecture 33: Worst Case Execution Time Steven Reiss, Fall 2015.
Vector and symbolic processors
ACCELERATING QUERY-BY-HUMMING ON GPU Pascal Ferraro, Pierre Hanna, Laurent Imbert, Thomas Izard ISMIR 2009 Presenter: Chung-Che Wang (Focus on the performance.
COT6930 Course Project. Outline Gene Selection Sequence Alignment.
Gnort: High Performance Network Intrusion Detection Using Graphics Processors Date:101/2/15 Publisher:ICS Author:Giorgos Vasiliadis, Spiros Antonatos,
Processes Chapter 3. Processes in Distributed Systems Processes and threads –Introduction to threads –Distinction between threads and processes Threads.
1 Munther Abualkibash University of Bridgeport, CT.
LLNL-PRES This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Rabin & Karp Algorithm. Rabin-Karp – the idea Compare a string's hash values, rather than the strings themselves. For efficiency, the hash value of the.
University of Maryland Baltimore County
A Simple Approach for Author Profiling in MapReduce
These slides are based on the book:
TensorFlow– A system for large-scale machine learning
CE 454 Computer Architecture
Computer Architecture Chapter (14): Processor Structure and Function
COMPUTATIONAL MODELS.
Lecture – 2 on Data structures
Summary of lectures Introduction to Algorithm Analysis and Design (Chapter 1-3). Lecture Slides Recurrence and Master Theorem (Chapter 4). Lecture Slides.
Spatial Analysis With Big Data
Genomic Data Clustering on FPGAs for Compression
Introduction to Parallelism.
Parallel Sorting Algorithms
Rabin & Karp Algorithm.
A Cloud System for Machine Learning Exploiting a Parallel Array DBMS
CSCI1600: Embedded and Real Time Software
JULIE McLAIN-HARPER LINKEDIN: JM HARPER
Knuth-Morris-Pratt KMP algorithm. [over binary alphabet]
Parallel Analytic Systems
Amir Kamil and Katherine Yelick
Speech recognition, machine learning
Parallel k-means++ for Multiple Shared-Memory Architectures
EE 193: Parallel Computing
CSCI1600: Embedded and Real Time Software
Data Independence Applications insulated from how data is structured and stored. Logical data independence: Protection from changes in logical structure.
Algorithm Course Algorithms Lecture 3 Sorting Algorithm-1
Speech recognition, machine learning
ESE532: System-on-a-Chip Architecture
Presentation transcript:

Accelerated Spam Filtering with Enhanced KMP Algorithm on GPU Krishna Kalubandi Varalakshmi M Vellore Institute of Technology, VIT University, Vellore.

Spam Filtering Keyword Based Statistics Based Pattern Matching Algorithms and Counters

KMP Algorithm The KMP matching algorithm uses degenerating property (pattern having same sub-patterns appearing more than once in the pattern) of the pattern and improves the worst case complexity to O(n).  Can we do better? Call KMP Algorithm in parallel, distribute the text. Handle boundary conditions.

Enhanced Algorithm to suite GPUs ParallelKMP(start, end, text, pattern, table, result) Table – Stores boundary matches Result – An array to store the positions of the matches   Begin j = start, k = end-1; m = pattern.size(), n = 0; left = 0, right = 0, count = 0; for( i = start; i < end; i++) { //do regular kmp search //if match count++; //boundary case x = m – (n + 1); if pattern[n] == text[j] { left++, j++ ;} else if(left) { left = 0; j = start; } if pattern[x] == text[k] {right++, k--;} else if(right) { right = 0; k = end-1;} n++; } Next table[threadIndx* 2] = left table[threadIndx* 2 + 1] = right result[threadIndx] = count End

Additional Enhancements and Results Overlapping Data Transfer Maintain ideal thread count to work per thread ratio

Spam Filter Naïve Bayesian Bag of Words for every word approach Clean the dataset using external libraries Get a list of most common spam triggering keywords For each email in dataset, get the count of matching keywords using parallel KMP. Train and test the model

Spam Filter Results Compared. Type Processor Word Type Run Time (s) Accuracy Serial CPU Every Word 27.72 92% Parallel CPU + GPU Keywords 13.56 85%

Future Work Prediction of the correct multiplier for a particular pattern size based on GPU architecture. For text sizes that are very large and cannot be fit into the GPU memory at once, overlapping data transfer can be further optimized. Parallel pattern matching on GPU clusters can also be explored. Spam filtering can also be further extended to match sequence of patterns instead of only words to reduce Bayesian noise. The accuracy of the existing algorithm can also be improved by striking a balance between the execution time and the required accuracy.

Questions