An Improved Algorithm to Accelerate Regular Expression Evaluation Author: Michela Becchi, Patrick Crowley Publisher: 3rd ACM/IEEE Symposium on Architecture.

Slides:



Advertisements
Similar presentations
Deep Packet Inspection: Where are We? CCW08 Michela Becchi.
Advertisements

Deep packet inspection – an algorithmic view Cristian Estan (U of Wisconsin-Madison) at IEEE CCW 2008.
Authors: Wei Lin, Bin Liu Publisher: ICPADS, 2008 (IEEE International Conference on Parallel and Distributed Systems) Presenter: Chia-Yi, Chu Date: 2014/03/05.
Introduction to Computer Science 2 Lecture 7: Extended binary trees
An Efficient Regular Expressions Compression Algorithm From A New Perspective Authors : Tingwen Liu,Yifu Yang,Yanbing Liu,Yong Sun,Li Guo Tingwen LiuYifu.
Fast Firewall Implementation for Software and Hardware-based Routers Lili Qiu, Microsoft Research George Varghese, UCSD Subhash Suri, UCSB 9 th International.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
A Ternary Unification Framework for Optimizing TCAM-Based Packet Classification Systems Author: Eric Norige, Alex X. Liu, and Eric Torng Publisher: ANCS.
Introduction to Computability Theory
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
Introduction to Computability Theory
March 2006Vineet Bafna Designing Spaced Seeds March 2006Vineet Bafna Project/Exam deadlines May 2 – Send to me with a title of your project May.
1 Huffman Codes. 2 Introduction Huffman codes are a very effective technique for compressing data; savings of 20% to 90% are typical, depending on the.
11 An Improved Algorithm to Accelerate Regular Expression Evaluation Authors: Michela Becchi and Patrick Crowley Publisher: ANCS’07 Present: Kia-Tso Chang.
A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,
Validating Streaming XML Documents Luc Segoufin & Victor Vianu Presented by Harel Paz.
1 Regular expression matching with input compression : a hardware design for use within network intrusion detection systems Department of Computer Science.
Aho-Corasick String Matching An Efficient String Matching.
1 Performing packet content inspection by longest prefix matching technology Authors: Nen-Fu Huang, Yen-Ming Chu, Yen-Min Wu and Chia- Wen Ho Publisher:
1 HEXA : Compact Data Structures for Faster Packet Processing Department of Computer Science and Information Engineering National Cheng Kung University,
1 Efficient Discovery of Conserved Patterns Using a Pattern Graph Inge Jonassen Pattern Discovery Arwa Zabian 13/07/2015.
Memory-Efficient Regular Expression Search Using State Merging Department of Computer Science and Information Engineering National Cheng Kung University,
 Author: Tsern-Huei Lee  Publisher: 2009 IEEE Transation on Computers  Presenter: Yuen-Shuo Li  Date: 2013/09/18 1.
Sampling Techniques to Accelerate Pattern Matching in Network Intrusion Detection Systems Author: Domenico Ficara, Gianni Antichi, Andrea Di Pietro, Stefano.
An Improved Algorithm to Accelerate Regular Expression Evaluation Author : Michela Becchi 、 Patrick Crowley Publisher : ANCS’07 Presenter : Wen-Tse Liang.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
REGULAR LANGUAGES.
Accelerating Multipattern Matching on Compressed HTTP Traffic Published in : IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 3, JUNE 2012 Authors : Bremler-Barr,
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
A Regular Expression Matching Algorithm Using Transition Merging Department of Computer Science and Information Engineering National Cheng Kung University,
Overview of Previous Lesson(s) Over View  An NFA accepts a string if the symbols of the string specify a path from the start to an accepting state.
TFA : A Tunable Finite Automaton for Regular Expression Matching Author: Yang Xu, Junchen Jiang, Rihua Wei, Tang Song and H. Jonathan Chao Publisher: Technical.
An Efficient Regular Expressions Compression Algorithm From A New Perspective  Author: Tingwen Liu, Yifu Yang, Yanbing Liu, Yong Sun, Li Guo  Publisher:
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
StriD 2 FA: Scalable Regular Expression Matching for Deep Packet Inspection Author: Xiaofei Wang, Junchen Jiang, Yi Tang, Bin Liu, and Xiaojun Wang Publisher:
Sampling Techniques to Accelerate Pattern Matching in Network Intrusion Detection Systems Author : Domenico Ficara, Gianni Antichi, Andrea Di Pietro, Stefano.
Deterministic Finite Automaton for Scalable Traffic Identification: the Power of Compressing by Range Authors: Rafael Antonello, Stenio Fernandes, Djamel.
Memory Compression Algorithms for Networking Features Sailesh Kumar.
INFAnt: NFA Pattern Matching on GPGPU Devices Author: Niccolo’ Cascarano, Pierluigi Rolando, Fulvio Risso, Riccardo Sisto Publisher: ACM SIGCOMM 2010 Presenter:
Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection Sailesh Kumar Sarang Dharmapurikar Fang Yu Patrick Crowley Jonathan.
Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions Publisher : Conference on emerging Networking EXperiments and Technologies.
TCAM –BASED REGULAR EXPRESSION MATCHING SOLUTION IN NETWORK Phase-I Review Supervised By, Presented By, MRS. SHARMILA,M.E., M.ARULMOZHI, AP/CSE.
Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29.
Memory-Efficient Regular Expression Search Using State Merging Author: Michela Becchi, Srihari Cadambi Publisher: INFOCOM th IEEE International.
Bit Weaving: A Non-Prefix Approach to Compressing Packet Classifiers in TCAMs Author: Chad R. Meiners, Alex X. Liu, and Eric Torng Publisher: 2012 IEEE/ACM.
Advanced Regular Expression Matching for Line-Rate Deep Packet Inspection Sailesh Kumar, Jon Turner Michela Becchi, Patrick Crowley, George Varghese.
A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching Yao Song 11/05/2015.
Author : Yang Xu, Lei Ma, Zhaobo Liu, H. Jonathan Chao Publisher : ANCS 2011 Presenter : Jo-Ning Yu Date : 2011/12/28.
Author : Randy Smith & Cristian Estan & Somesh Jha Publisher : IEEE Symposium on Security & privacy,2008 Presenter : Wen-Tse Liang Date : 2010/10/27.
Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University 1 Chapter 1 Regular Languages Some slides are in courtesy.
TFA: A Tunable Finite Automaton for Regular Expression Matching Author: Yang Xu, Junchen Jiang, Rihua Wei, Yang Song and H. Jonathan Chao Publisher: ACM/IEEE.
A Fast Regular Expression Matching Engine for NIDS Applying Prediction Scheme Author: Lei Jiang, Qiong Dai, Qiu Tang, Jianlong Tan and Binxing Fang Publisher:
LaFA Lookahead Finite Automata Scalable Regular Expression Detection Authors : Masanori Bando, N. Sertac Artan, H. Jonathan Chao Masanori Bando N. Sertac.
An Improved DFA for Fast Regular Expression Matching Author : Domenico Ficara 、 Stefano Giordano 、 Gregorio Procissi Fabio Vitucci 、 Gianni Antichi 、 Andrea.
Author : S. Kumar, B. Chandrasekaran, J. Turner, and G. Varghese Publisher : ANCS ‘07 Presenter : Jo-Ning Yu Date : 2011/04/20.
using Deterministic Finite Automata & Nondeterministic Finite Automata
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
Chapter 5 Finite Automata Finite State Automata n Capable of recognizing numerous symbol patterns, the class of regular languages n Suitable for.
Accelerating Multi-Pattern Matching on Compressed HTTP Traffic Dr. Anat Bremler-Barr (IDC) Joint work with Yaron Koral (IDC), Infocom[2009]
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
Advanced Algorithms for Fast and Scalable Deep Packet Inspection Author : Sailesh Kumar 、 Jonathan Turner 、 John Williams Publisher : ANCS’06 Presenter.
Author : Tzu-Fang Sheu,Nen-Fu Huang and Hsiao-Ping Lee Publisher : IEEE Globecom, 2006 Presenter : Tsung-Lin Hsieh Date : 2012/05/16 1.
Chapter 11. Chapter Summary  Introduction to trees (11.1)  Application of trees (11.2)  Tree traversal (11.3)  Spanning trees (11.4)
Deterministic Finite Automata Nondeterministic Finite Automata.
Efficient Signature Matching with Multiple Alphabet Compression Tables Publisher : SecureComm, 2008 Author : Shijin Kong,Randy Smith,and Cristian Estan.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
Advanced Algorithms for Fast and Scalable Deep Packet Inspection
Memory-Efficient Regular Expression Search Using State Merging
Greedy Algorithms TOPICS Greedy Strategy Activity Selection
Presentation transcript:

An Improved Algorithm to Accelerate Regular Expression Evaluation Author: Michela Becchi, Patrick Crowley Publisher: 3rd ACM/IEEE Symposium on Architecture for networking and communications systems, 2007 Presenter: Ching Hsuan Shih Date: 2014/02/26

Outline I. Introduction II. Motivation III. The Proposal IV. Reducing the Alphabet V. Encoding VI. Experimental Evaluation

I. Introduction Signature-based deep packet inspection has taken root as a dominant security mechanism in networking devices and computer systems. Regular expressions are more expressive than simple patterns of strings and therefore able to describe a wider variety of payload signatures. There has been a amount of recent work on implementing regular expressions, particularly with representations based on deterministic finite automata (DFA).

I. Introduction (Cont.) DFAs have attractive properties that explain the attention they have received. They have predictable and acceptable memory bandwidth requirements. For any given regular expression, a DFA with a minimum number of states can be found [3].

I. Introduction (Cont.) DFAs corresponding to large sets of regular expressions containing complex patterns can be prohibitively large in terms of numbers of states and transitions. Yu et al. [15] have proposed segregating rules into multiple groups and evaluating the corresponging DFAs concurrently. Delayed Input DFA (D 2 FA) [9] redundant transitions common to a pair of states with a single default transition.

I. Introduction (Cont.) D 2 FA has three weaknesses It requires a user-provided parameter value which can only be determined experimentally for a given rule-set. It creates a data-structure whose worst-case paths may be traversed for each input character processed. It requires multiple passes over large support data structures during the construction phase. We propose an improved simplified algorithm for building default transitions that addresses the problems above.

II. Motivation In this section, we describe the D 2 FA approach [9]. The basic goal of the D 2 FA is to reduce the amount of memory needed to represent all the state transitions in a DFA.

II. Motivation (Cont.) During the string matching operation, the traversal of D 2 FA will be performed according to the Aho-Corasick algorithm [1], treating default transitions as failure pointers. The heuristic proposed in [9] to build a D 2 FA can be explored systematically as a maximum spanning tree problem on an undirected graph. This maximum spanning tree problem can be solved with Kruskal’s algorithm [5].

II. Motivation (Cont.) After the operation of Kruskal’s algorithm, the root of each tree can be selected. The node having the smallest maximum distance from any vertices within the same tree is chosen. Direct all default transitions towards the root of the default transition tree. In order to limit the maximum default path length, a heuristic is proposed to address this problem by determining a maximum spanning tree forest with bounded diameter.

II. Motivation (Cont.)

III. The Proposal We now take advantage of a simple fact: DFA traversal always starts at a single initial state S 0 We propose a more general compression algorithm which leads to a traversal time bound independent of the maximum default transition path length.

III. The Proposal (Cont.) Definition: For each state s, we define its depth as the minimum number of states visited when moving from s 0 to s in the DFA.

III. The Proposal (Cont.) Lemma: With any string of length N, a 2N time bound is guaranteed on all D 2 FA having only “backwards” transitions. A string of length N implies N labeled transitions to be followed and the number of default transitions is always at least one less than the number of labeled transitions taken. For a string of length N, the total number of state traversals cannot be higher than 2N-1.

III. The Proposal (Cont.) 3.1 Problem Formulation The problem can be now formulated as an instance of maximum spanning tree on a directed graph.

III. The Proposal (Cont.) 3.2 An example

III. The Proposal (Cont.) 3.3 Algorithm The whole problem is reduced to having each state select the state with lower depth having the most number of outgoing transitions in common with it.

III. The Proposal (Cont.)

IV. Reducing the Alphabet The basic idea is the following: In an alphabet ∑, two symbols c i and c j will fall into the same class if they are treated the same way in all DFA states. In other words, given the transition function δ(states, Σ)→states, δ(s,c i )= δ(s,c j ) for each state s belonging to the DFA. In practical scenarios (ASCII alphabet) this table will contain 256 entries, with a maximum width of 1 byte (for heavily compressed alphabets 5-6 bits per character may suffice).

V. Encoding 5.1 Bitmaps A scheme [18] consists of associating a bitmap as large as the alphabet size to each DFA state. Bits corresponding to uncompressed labeled transitions present in the current state can be set to 1; the remaining bits are set to 0. State identifiers can be simply represented through their base address in memory. The length of the necessary bitmaps can substantially decrease after alphabet reduction.

V. Encoding (Cont.) 5.2 Content addressing A technique [16] consists in representing state identifiers with content labels, which are stored in memory as next state transitions. A state content label contains several fields: A state discriminator The list of characters for which a labeled transition is defined An identifier for the default transition state The size of a content label depends on the number of labeled transitions defined for the corresponging state.

VI. Experimental Evaluation

VI. Experimental Evaluation (Cont.)