Authors: Sarang Dharmapurikar, John Lockwood
Published in: IEEE Journal on Selected Areas in Communications, 2006
Presenter: Jo-Ning Yu
Date: 2010/12/29
Outline
* Simple Multi-pattern Matching Algorithm
* Scaling to Arbitrary String Lengths
  * Basic Ideas
  * Data Structures and Algorithm
* Evaluation with Snort
Simple Multi-pattern Matching Algorithm

Notation
* T[i…j] : the substring of T starting at location i and ending at location j.

Method
* Group the strings of S by length and maintain a separate hash table for each length.
* Let L be the length of the longest string in S. When we consider a window of L bytes starting at location i, i.e., T[i…(i + L - 1)], there are L possible prefix strings T[i…(i + L - 1)], T[i…(i + L - 2)], …, T[i…i], each of which must be looked up individually in the corresponding hash table.
* After all these prefixes have been looked up, we move the window forward by a single byte, so that we now consider the bytes from location i + 1 forming the window T[(i+1)…(i+L)].
* For applications such as NIDS, most of the network data does not contain any string of interest, so we use Bloom filters to reduce unsuccessful hash table accesses. We maintain a separate on-chip Bloom filter corresponding to each hash table kept in off-chip memory. Before executing a hash table query, we first query the corresponding on-chip Bloom filter to check whether the string under inspection might be present in the off-chip hash table.
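To make the lookup flow concrete, here is a minimal Python sketch of the idea described above. It is an illustration, not the paper's DetectString implementation: one hash table and one Bloom filter per pattern length, a window that slides one byte at a time, and an off-chip table access only when the on-chip Bloom filter reports a possible match. The `BloomFilter` class and the function names are mine.

```
# Illustrative sketch (not the paper's implementation): per-length hash tables
# guarded by per-length Bloom filters, with a window sliding one byte at a time.
import hashlib

class BloomFilter:
    """Tiny Bloom filter standing in for the on-chip pre-check."""
    def __init__(self, m_bits=1024, k_hashes=3):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits)

    def _positions(self, s):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{s}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, s):
        for pos in self._positions(s):
            self.bits[pos] = 1

    def maybe_contains(self, s):
        return all(self.bits[pos] for pos in self._positions(s))

def build(patterns):
    """Group patterns by length; one (emulated) hash table and Bloom filter per length."""
    tables, blooms = {}, {}
    for p in patterns:
        tables.setdefault(len(p), set()).add(p)            # "off-chip" hash table
        blooms.setdefault(len(p), BloomFilter()).add(p)    # "on-chip" Bloom filter
    return tables, blooms

def scan(text, tables, blooms):
    """Slide the window one byte at a time; probe a hash table only on a Bloom hit."""
    matches = []
    for i in range(len(text)):
        for length, table in tables.items():               # one lookup per pattern length
            window = text[i:i + length]
            if len(window) == length and blooms[length].maybe_contains(window):
                if window in table:                         # off-chip access happens here
                    matches.append((i, window))
    return matches

tables, blooms = build(["attack", "virus", "worm"])
print(scan("benign traffic ... virus payload ...", tables, blooms))  # [(19, 'virus')]
```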
* To increase throughput, use r = 4 parallel machines.
* Hash computation in the DetectString1 algorithm over long strings (e.g., 100 characters) consumes more resources and introduces more latency, so we extend the algorithm to handle arbitrarily long strings.
Scaling to Arbitrary String Lengths

Background
* Aho-Corasick automaton example (shown in the figure).

Basic idea
* Split long strings into multiple fixed-length shorter segments and use the first algorithm to match the individual segments.
* For instance, if we can detect strings of up to four characters using the Bloom filter technique and we would like to detect the string "technically", then we cut this string into three pieces: "tech", "nica", and "lly". These segments can be stitched into a chain and represented as a small state machine with four states q0, q1, q2, and q3.
Construction
* We first segment each string into k-character segments (k = 4 for the purpose of illustration) and a leftover portion, which we call the tail. We treat each of these k-character segments as a symbol.
* There are six unique symbols in the given example, namely {tech, nica, tele, phon, elep, hant}, and the tails associated with these strings are {l, lly, e}; these correspond to the example strings "technical", "technically", "telephone", and "elephant".
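A short sketch of this construction step, using the example set implied by the symbols and tails above; the helper name `segment` is mine.

```
# Sketch of the construction: split each pattern into k-character symbols
# plus a leftover tail (k = 4, as in the example above).
def segment(pattern, k=4):
    full = len(pattern) - len(pattern) % k          # length covered by whole symbols
    symbols = [pattern[i:i + k] for i in range(0, full, k)]
    return symbols, pattern[full:]                  # the tail may be empty

for s in ["technical", "technically", "telephone", "elephant"]:
    print(s, "->", segment(s))
# technical -> (['tech', 'nica'], 'l')
# technically -> (['tech', 'nica'], 'lly')
# telephone -> (['tele', 'phon'], 'e')
# elephant -> (['elep', 'hant'], '')
```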
* For clarity, the figure shows only the failure transitions that lead to non-q0 states (failure transitions back to q0 are omitted).
* The tails do not create any state to which the automaton jumps.
* Tails simply indicate the completion of a string.
* Since this algorithm essentially builds an Aho-Corasick Nondeterministic Finite Automaton (NFA) that jumps ahead by k characters at a time, we refer to this machine as the Jump-ahead Aho-Corasick NFA (JACK-NFA).
Search
* Example 1: Text: "technically", scanned as "tech nica lly".
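The search above can be sketched as a small jump-ahead automaton. The encoding below (states as tuples of consumed symbols, per-state tail sets) and the omission of failure transitions are simplifications of mine, not the paper's exact JACK-NFA representation.

```
# Simplified jump-ahead automaton: states are the chains of k-character symbols
# consumed so far; tails are recorded at the state where a string completes.
# Failure transitions are omitted; a mismatch simply falls back to the start state q0.
def build_machine(patterns, k=4):
    goto, tails = {(): {}}, {(): set()}              # the empty tuple plays the role of q0
    for p in patterns:
        full = len(p) - len(p) % k
        syms = [p[i:i + k] for i in range(0, full, k)]
        state = ()
        for sym in syms:
            nxt = state + (sym,)
            goto.setdefault(state, {})[sym] = nxt
            goto.setdefault(nxt, {})
            tails.setdefault(nxt, set())
            state = nxt
        tails[state].add(p[full:])                   # '' marks a string ending on a boundary
    return goto, tails

def search_aligned(text, goto, tails, k=4):
    """Consume the text k characters at a time and report completed strings."""
    state, found = (), []
    for i in range(0, len(text), k):
        state = goto.get(state, {}).get(text[i:i + k], ())
        for tail in sorted(tails.get(state, ())):    # tails attached to the current state
            if text[i + k:i + k + len(tail)] == tail:
                found.append("".join(state) + tail)
    return found

goto, tails = build_machine(["technical", "technically", "telephone", "elephant"])
print(search_aligned("technically", goto, tails))    # ['technical', 'technically']
```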
Problem
* If a string starts in the middle of a k-character group, it will not be detected. For instance, if we begin scanning the text "xytechnical...", then the machine views the text as "xyte chni cal...", and since "xyte" is neither the beginning of any string nor a valid 4-character segment associated with state q0, the automaton never jumps to a valid state, causing the machine to miss the detection.
Solution
* We deploy k machines, each of which scans the text with a one-byte offset from the previous one. In our example, if we deploy 4 machines, the first machine scans the text as "xyte chni cal…", the second scans it as "ytec hnic al...", the third scans it as "tech nica l…", and the fourth scans it as "echn ical...". Since the string appears at the correct boundary for the third machine, that machine detects it. Therefore, by using k machines in parallel, we never miss the detection of a string.
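A tiny sketch of the alignment argument (it checks substrings directly rather than running the automaton; the function name is mine): exactly one of the k offsets puts the string on a k-character boundary.

```
# Scanning at k byte offsets: one offset aligns the string to a k-character boundary.
def aligned_offsets(text, pattern, k=4):
    """Offsets (0..k-1) whose k-aligned scan sees `pattern` start on a boundary."""
    return [off for off in range(k)
            if any(text.startswith(pattern, i) for i in range(off, len(text), k))]

print(aligned_offsets("xytechnical...", "technical"))   # [2] -> the third machine
```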
* These k machines need not be physical machines; they can be k virtual machines (four in our example) emulated by a single physical machine that consumes one character at a time.
* To shift the text stream by r characters at a time, we use r physical machines, which together emulate the k virtual machines, where r ≤ k (in the figure, two characters are consumed at a time).
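As a rough illustration of how one physical machine can serve k virtual machines while consuming one character per step (the scheduling below is my own sketch, not the paper's hardware design): each incoming character completes a k-character symbol for exactly one virtual machine, namely the one whose scan offset that symbol is aligned to.

```
# One physical machine emulating k virtual machines, one character per step:
# after reading character j, the window ending at j is the next symbol for
# exactly one virtual machine (the one scanning at offset (j - k + 1) mod k).
def emulate(text, k=4):
    schedule = []
    for j in range(k - 1, len(text)):
        start = j - k + 1
        schedule.append((j, start % k, text[start:j + 1]))   # (step, VM id, symbol)
    return schedule

for step, vm, sym in emulate("xytechnical", k=4)[:6]:
    print(f"step {step}: VM{vm} consumes {sym!r}")
# step 3: VM0 consumes 'xyte'
# step 4: VM1 consumes 'ytec'
# step 5: VM2 consumes 'tech'
# step 6: VM3 consumes 'echn'
# step 7: VM0 consumes 'chni'
# step 8: VM1 consumes 'hnic'
```

Because each virtual machine acts only once every k steps, a single physical machine (or r of them, shifting the text by r characters at a time) can keep all k virtual machines up to date.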
Data Structures and Algorithm
* LPM2 can be performed in the same way as the LPM1 algorithm, with the exception that we only have to check the prefixes (tails) associated with the current state of the machine. (Figure: example with k = 4.)
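A small sketch of the restriction described above (the per-state tail table and the function name are stand-ins of mine, not the paper's LPM1/LPM2 data structures): only the tails attached to the current state are candidates, and the longest one that matches the upcoming text wins.

```
# Tail lookup restricted to the current state: the longest matching tail wins.
def match_tail(text, pos, state, state_tails):
    for tail in sorted(state_tails.get(state, ()), key=len, reverse=True):
        if text.startswith(tail, pos):
            return tail
    return None

state_tails = {"q2": ["l", "lly"]}                       # tails attached to state q2 (tech -> nica)
print(match_tail("technically", 8, "q2", state_tails))   # 'lly'
```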
* Implementation of a physical machine (shown in the figure).
Analysis
* We model the efficiency of our architecture in terms of the number of memory accesses required per hash table probe.

Notation
* f_i : false positive probability of Bloom filter i
* p_i : true positive probability of Bloom filter i
* t_s : memory accesses required for a successful search in any hash table
* t_u : memory accesses required for an unsuccessful search in any hash table
* T_i : the cumulative memory accesses required for hash table accesses starting from Bloom filter i down to Bloom filter 1
* The NIDS strings to be searched appear very rarely in the streaming data.
* With a carefully constructed hash table, the accesses required for a successful search and for an unsuccessful search are approximately the same [10]. Hence we assume that t_u = t_s = 1.
* We can simplify the expression above by deriving a pessimistic upper bound on T_i through a theorem stated on the slide (the expression and theorem are not reproduced in these notes).
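As a rough sketch of how the listed quantities combine (my own reading under an independence assumption, not necessarily the paper's exact expression or theorem), the expected number of hash table accesses accumulated from Bloom filter i down to 1 could be written as

```
% Rough sketch only (assumes independent Bloom filter outcomes; not the paper's exact theorem):
T_i \;=\; \sum_{j=1}^{i} \left( p_j\, t_s \;+\; f_j\, t_u \right)
\qquad\text{which, with } t_s = t_u = 1,\ \text{becomes}\qquad
T_i \;=\; \sum_{j=1}^{i} \left( p_j + f_j \right).
```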
Notation (continued)
* r : number of parallel engines
* τ : the time required for one memory access while manipulating a hash entry
* F : the system clock frequency (assumed to be the same as that of the off-chip SRAM memory used to store the hash tables)
* Throughput: the resulting expression appears on the slide (not reproduced here).
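One plausible back-of-the-envelope form using the listed notation (my own sketch, not the slide's formula): if each engine advances its window once per clock cycle and incurs, on average, T_L off-chip accesses per window (T_L as in the sketch above), each costing τ, then

```
% Back-of-the-envelope sketch only, not the slide's formula:
R \;\approx\; \frac{r}{\frac{1}{F} \;+\; T_L\,\tau} \qquad \text{bytes per second.}
```

With the numbers used later in the evaluation (τ = 8 ns, F = 250 MHz, so 1/F = 4 ns), the off-chip accesses dominate whenever T_L·τ is large compared to 4 ns, which is why the Bloom filters' job is to keep T_L small.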
Evaluation with Snort

Notation
* i : Bloom filter number
* h : number of hash functions
* n : number of strings
* m : the optimal amount of memory
* p_1 = 1/100
* τ = 8 ns, F = 250 MHz
Evaluation with Snort - 1
* String lengths of up to 16 bytes (results shown in the figure).
Evaluation with Snort - 2
* k = 16 (results shown in the figure).