FPGA Based String Matching for Network Processing Applications Janardhan Singaraju, John A. Chandy Presented by: Justin Riseborough Albert Tirtariyadi.

Presentation transcript:

FPGA Based String Matching for Network Processing Applications Janardhan Singaraju, John A. Chandy Presented by: Justin Riseborough Albert Tirtariyadi ENGG*3050 RCS Winter 2014 March 24, 2014

Content
Introduction
String Lookup Cache
◦ Architectures
◦ System Interaction
◦ Systems comparison
Network Intrusion Detection
◦ Architectures
◦ System Interaction
◦ Implementations
Critique

Keywords
Network processing
String matching
Content Addressable Memory (CAM) & Cache
Bottlenecks
Fixed-Size/Non-Fixed-Size keys
Cascading, propagating
Parallelism

Introduction
String matching is used in search engines and in network intrusion detection
Network processing applications require frequent string matching for specific keywords
As networks get faster, it becomes more difficult for general-purpose processors (GPPs) to keep up
Bottlenecks are found in memory and in slow algorithms and implementation methods

Current Implementations
Software Algorithms
Rabin-Karp
◦ Compares hashes of the inputs instead of matching characters directly
Knuth-Morris-Pratt
◦ Character-by-character matching; on a mismatch, skips positions that cannot match
Boyer-Moore
◦ Uses pre-computed functions to determine the shift distance
Hardware Implementation
Finite automata methods
◦ Translate finite automata graphs to FPGA circuitry
CAMs
◦ Caches and lookup tables
◦ Cellular automata
◦ Finite state machines
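
To illustrate the hash-comparison idea listed under Rabin-Karp, here is a minimal Python sketch. The function name `rabin_karp` and the `base` and `mod` values are illustrative assumptions, not taken from the article.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_003) -> list[int]:
    """Return every index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in the window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Compare hashes first; confirm with a direct comparison to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```

The direct comparison after a hash hit is what keeps the result exact; the hash only filters out most non-matching windows cheaply.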

STRING LOOKUP CACHE Section I

String Lookup Cache
Hardware implementation based on CAMs, cellular automata and caching
Caches retain frequently used values, reducing the need to constantly look up address values
Compatible with parallel processing, prefix sharing and pattern partitioning
Very high throughput with low area overhead
A drawback of CAMs and hardware caches is their reliance on fixed-size keys
◦ Implementations for non-fixed-size keys require additional overhead

System Architecture

Content Addressable Memory
Hardware implementation of a 2D associative array (abstract data type)
In VLSI, the cells are built from transistors
In an FPGA, the storage cells are registers and the comparators are XOR gates

CAM as Character Match Array (CMA)
Takes characters from the network processor on successive clock cycles
Each column corresponds to one character of a keyword
The input character is applied simultaneously to all n columns
A column's match signal goes high if all input bits match the stored character
An additional storage cell marks the end of a keyword
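
The column behaviour described on this slide can be sketched in Python as a single clock cycle of the CMA. This is a software model under stated assumptions: the function name `cma_cycle` is hypothetical, and the XOR test stands in for the per-bit comparators mentioned for the FPGA version.

```python
def cma_cycle(keyword: str, ch: str) -> list[int]:
    """One clock cycle: broadcast ch to every column and return the column match bits."""
    signals = []
    for stored in keyword:
        # XOR of the two character codes is zero only when every bit agrees.
        signals.append(1 if (ord(stored) ^ ord(ch)) == 0 else 0)
    return signals


# Feeding the stream one character per cycle yields one row of match bits per cycle.
rows = [cma_cycle("FOO", ch) for ch in "AFOO"]
```

Each row only says which columns match the current character; deciding whether a whole keyword has been seen is left to the logic that consumes these signals.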

Processor Element (PE) Array
An array of finite state machines that carries out the approximate match algorithm
May contain multiple keywords from the CAM
Each PE takes a match signal from the CAM and sets a PE flag, which is forwarded to the subsequent PE
Evaluates entire input strings in time linear in the size of the input stream
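
A rough Python model of the flag forwarding between PEs is given below. It is a simplification: it handles exact matching of a single keyword only, not the approximate match algorithm named on the slide, and the function name `pe_array_search` is an assumption made for illustration.

```python
def pe_array_search(keyword: str, stream: str) -> list[int]:
    """Return the stream positions at which an occurrence of keyword ends."""
    if not keyword:
        return []
    flags = [0] * len(keyword)  # one flag register per PE
    ends = []
    for pos, ch in enumerate(stream):
        match = [int(c == ch) for c in keyword]  # column match signals from the CAM
        # All PEs update in the same cycle, so the old flags are read while
        # the new list is being built. PE 0 has no predecessor and always arms.
        flags = [match[i] & (1 if i == 0 else flags[i - 1])
                 for i in range(len(keyword))]
        if flags[-1]:
            ends.append(pos)
    return ends
```

The loop touches each input character once, which is the linear-time behaviour the slide refers to; in hardware the inner list update happens in parallel within one clock cycle.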

CMA and PE Interaction

Map Table and Outputs
The map table takes the PE# and outputs the address of the value, or an indirect pointer to the value object
The map table has as many slots as there are PEs
Long words can leave holes (unused slots) in the map table
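
The holes mentioned above can be made concrete with a small Python sketch. It assumes that each keyword occupies consecutive PEs and that only the PE holding a keyword's last character reports a match; that layout, the function name `build_map_table` and the addresses are illustrative assumptions rather than details confirmed by the article.

```python
def build_map_table(keywords, addresses):
    """Build one slot per PE; only the last PE of each keyword maps to an address."""
    table = []
    for word, addr in zip(keywords, addresses):
        table.extend([None] * (len(word) - 1))  # holes: PEs that never signal a full match
        table.append(addr)
    return table


table = build_map_table(["FOO", "BAR"], [0x10, 0x20])
holes = table.count(None)
```

Under this assumption a keyword of length k uses k slots but fills only one, so longer keywords waste more of the table.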

System Interaction

Implementations Comparison
FPGA Implementation (Xilinx Virtex-II Pro FPGA, XC2VP230-7)
◦ Rows: number of characters, slices, frequency (MHz), time per search (ns), throughput (Gb/s); values not preserved in the transcript
◦ Searches per second: 254 M, 318 M, 307 M
Software Implementation (1 GHz PowerPC computer)
◦ Rows: throughput (Gb/s); values not preserved in the transcript
◦ Searches per second: 887 K, 766 K, 632 K

NETWORK INTRUSION DETECTION Section II

Network Intrusion Detection
The process of identifying and analyzing packets that may contain threats to the organization's network
A time-consuming process whose cost grows quickly as the defined rule set (the signatures) grows
String matching is the most computationally intensive part of intrusion detection
◦ Every incoming packet is compared against several pre-defined signatures

Problems in the CAM Architecture
CAM-based designs cannot easily handle regular expressions
NIDS signatures are not of a fixed size
◦ (e.g. the CAM contains FOO and BAR, and the input stream is AFOOBARCD. In a 3-character setup, the comparisons are made against AFO, OBA and RCD; none of these match, so both keywords slip through the detection system)
CAM arrays are very large in area
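
The AFOOBARCD example can be reproduced with a few lines of Python. The helper names `aligned_lookup` and `sliding_lookup` are hypothetical; the second function is included only to show what an unaligned comparison would find, not as the article's method.

```python
def aligned_lookup(stream: str, keywords: set[str], width: int):
    """Split the stream on fixed boundaries and look each chunk up, as a fixed-size CAM would."""
    chunks = [stream[i:i + width] for i in range(0, len(stream), width)]
    return chunks, [c for c in chunks if c in keywords]


def sliding_lookup(stream: str, keywords: set[str], width: int):
    """Check every offset instead of only the aligned ones."""
    return [(i, stream[i:i + width])
            for i in range(len(stream) - width + 1)
            if stream[i:i + width] in keywords]


chunks, aligned_hits = aligned_lookup("AFOOBARCD", {"FOO", "BAR"}, 3)
unaligned_hits = sliding_lookup("AFOOBARCD", {"FOO", "BAR"}, 3)
```

Here `chunks` is AFO, OBA, RCD and `aligned_hits` is empty, while the sliding version finds FOO at offset 1 and BAR at offset 4.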

Proposed Solution
Use discrete comparators instead of CAMs
◦ Sacrifices the ability to update signatures dynamically; a fair tradeoff, as signatures change relatively infrequently
Use p rows of comparators for parallelism, matching several characters in one clock cycle
Remove the aligned-keyword approach, as incoming streams may not be aligned to a particular size boundary
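
A behavioural Python sketch of the p-row idea is shown below, assuming that the match signal ripples from one comparator row to the next within a cycle. The function name `parallel_signature_search` and the ripple model are illustrative assumptions; the actual hardware organisation is shown in the article's figures.

```python
def parallel_signature_search(signature: str, stream: str, p: int):
    """Return (end_positions, cycles) when p characters are consumed per clock cycle."""
    if not signature or p < 1:
        return [], 0
    flags = [0] * len(signature)
    ends, cycles = [], 0
    for start in range(0, len(stream), p):
        cycles += 1
        # The p comparator rows see p consecutive characters in the same cycle;
        # the match state is carried from row to row, and across cycles in flags.
        for offset, ch in enumerate(stream[start:start + p]):
            flags = [int(signature[i] == ch) & (1 if i == 0 else flags[i - 1])
                     for i in range(len(signature))]
            if flags[-1]:
                ends.append(start + offset)
    return ends, cycles
```

Because the state is carried across cycle boundaries, a signature is found wherever it starts in the stream, and the cycle count drops to roughly the stream length divided by p.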

System Architecture

Processor Architecture

Processor Architecture

Processor Element Flow
Start at the beginning of the signature
Each decision is based on the previous PE and the current PE
If both the previous PE's signal and the current PE's signal indicate a match, propagate the match signal until the end of the signature
At the end of the signature, if the entire signature matched, assert the sig_match output

Signature Match Processor Example
Input string '144' is processed over 2 clock cycles
'1' is checked in the first cycle and sets off a match signal into the SMA
'4' is checked in the second cycle and sets off a match signal into the SMA
The match signal for '1' is still present from the previous clock cycle

Signature Match Processor Example
The '4' is duplicated, so the first match signal is simply propagated to the second as a carry
Since this is the end of the signature, the output is a match: the propagated match signals are ANDed with sig_end

Address Output Logic
For the SMP to be useful, we also need to know which signatures caused the match
This is handled by the word match buffer, which maintains the position of each signature match
When the last character being processed has been reached, the match address output logic begins working on the buffer entries

Address Output Logic
A binary tree is used to locate the matching signatures
Decoding starts, and a signal is sent to the control circuitry stating that there are matches
A pointer then propagates up the tree, generating one bit of the final address at each level based on the matches
Binary trees are fast and efficient; the processing time is about M cycles, where M is the number of matches
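
The way a binary tree turns match bits into an address can be sketched in Python as follows. This is a software stand-in under assumptions: it walks the tree from the root down rather than propagating a pointer upward, it requires a power-of-two number of buffer entries, and the names `first_match_address` and `drain_matches` are hypothetical.

```python
def first_match_address(bits: list[int]):
    """Return the address of the first set bit via a binary-tree search, or None."""
    n = len(bits)
    assert n > 0 and (n & (n - 1)) == 0, "sketch assumes a power-of-two buffer size"
    if not any(bits):
        return None
    lo, hi, address = 0, n, 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if any(bits[lo:mid]):  # left subtree holds a match: emit address bit 0
            address, hi = address << 1, mid
        else:                  # otherwise the match is on the right: emit address bit 1
            address, lo = (address << 1) | 1, mid
    return address


def drain_matches(bits: list[int]) -> list[int]:
    """Emit one match address per iteration, clearing each buffer entry once reported."""
    bits = list(bits)
    out = []
    while (addr := first_match_address(bits)) is not None:
        out.append(addr)
        bits[addr] = 0
    return out
```

Each pass of `drain_matches` reports one address, so M matches take M passes, which mirrors the roughly M cycles quoted on the slide.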

FPGA Implementation
As parallelism increases, throughput increases, but frequency decreases due to the added complexity
As the number of characters increases, area increases while frequency and throughput decrease
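
The tradeoff between parallelism and clock frequency can be made concrete with a small Python calculation. It assumes 8-bit characters and p characters consumed on every clock cycle; the function name `throughput_gbps` and the sample frequencies are illustrative and are not figures from the article.

```python
def throughput_gbps(p: int, freq_mhz: float, bits_per_char: int = 8) -> float:
    """Peak throughput in Gb/s when p characters are consumed per clock cycle."""
    return p * bits_per_char * freq_mhz / 1000.0


# Doubling p pays off only if the clock does not slow down by half or more.
serial = throughput_gbps(1, 300.0)    # 2.4 Gb/s
parallel = throughput_gbps(4, 250.0)  # 8.0 Gb/s
```

In this made-up case the clock drops from 300 MHz to 250 MHz, yet throughput still rises because four characters are handled per cycle.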

Implementation Comparison

Critique
New terms and unfamiliar works are referred to
Difficult to follow in some areas due to inconsistencies and the way the topic is presented
Much of the paper is procedure and methodology for the implementation
Very detailed work
Good examples strengthen the theoretical explanations
Implementation data are given for comparison purposes

QUESTIONS?

References
All figures and information used in this presentation were taken from the article:
Janardhan Singaraju, John A. Chandy, FPGA Based String Matching For Network Processing, Microprocessors and Microsystems (ScienceDirect), December 14,