Accelerating Multipattern Matching on Compressed HTTP Traffic Published in : IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 3, JUNE 2012 Authors : Bremler-Barr,


Accelerating Multipattern Matching on Compressed HTTP Traffic Published in: IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 3, JUNE 2012 Authors: Bremler-Barr, A.; Interdiscipl. Center, Efi Arazi Sch. of Comput. Sci., Herzlia, Israel; Koral, Y.

Introduction At the heart of almost every modern security tool is a pattern-matching algorithm. Multipattern matching on compressed traffic requires two time-consuming phases: traffic decompression and pattern matching. Most current security tools either do not scan compressed traffic at all or disable the option for compressed traffic.

LZ77 Algorithm The LZ77 algorithm compresses a string that has already appeared in the past (within a sliding window of the last 32 kB of data) by encoding it as a pair (distance, length), where distance is a number in [1, 32768] and length is a number in [3, 258].

Examples
abcdefabcd → abcdef(6, 4)
Test Test</[6;12]
Blah blah blah blah blah! → Blah b[D=5, L=18]!
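The (distance, length) encoding can be checked with a small decoder. This is a minimal sketch, not the DEFLATE bit format: the token representation (plain strings for literals, tuples for pointers) and the name `lz77_decode` are assumptions made for illustration.

```python
def lz77_decode(tokens):
    """Expand a list of literals (str) and pointers ((distance, length) tuples)."""
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.extend(tok)                 # literal bytes are copied as-is
        else:
            distance, length = tok
            for _ in range(length):
                # Copy one byte at a time so that a pointer may overlap
                # the bytes it is itself producing (length > distance).
                out.append(out[-distance])
    return "".join(out)


first = lz77_decode(["abcdef", (6, 4)])
second = lz77_decode(["Blah b", (5, 18), "!"])
print(first)
print(second)
```

The second call shows why the copy is done byte by byte: the pointer has length 18 but distance 5, so it keeps re-reading bytes it has just written.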

Aho–Corasick (AC) algorithm The basic AC algorithm constructs a deterministic finite automaton (DFA) for detecting all occurrences of given patterns by processing the input in a single pass.

AC DFA for patterns “abcd”, “nba”, “nbc”

AC DFA for patterns “arrows”, “row”, “sun”, “under”

Failure Function

Example: arrowsunderows
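The example input can be traced with a small Aho–Corasick matcher. This is a sketch that uses the goto/failure-function form of the automaton rather than a fully expanded DFA transition table; the names `build_ac` and `search` are illustrative.

```python
from collections import deque


def build_ac(patterns):
    """Build trie edges (goto), failure links (fail) and outputs (out) per state."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first pass: a state's failure link points to the longest
    # proper suffix of its prefix that is also a prefix of some pattern.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]   # inherit patterns ending here
            queue.append(t)
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence, in one pass."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i - len(p) + 1, p) for p in out[s])
    return hits


hits = search("arrowsunderows", build_ac(["arrows", "row", "sun", "under"]))
print(hits)
```

Overlapping occurrences such as "row" inside "arrows" are reported because each state inherits the outputs of the state its failure link points to.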

AC DFA for patterns “WOMAN”, “MAN”, “MEAT”, “ANIMAL”

Transition Table of Automata

Failure Function

Failure Function Table

The AC algorithm requires significantly more time than decompression. Decompression reads consecutive memory from the sliding window, so its read-per-byte cost is low. The AC algorithm, in contrast, employs a very large DFA that typically does not fit in cache and is accessed with random memory reads, thus requiring main memory accesses.

Aho–Corasick-based algorithm on Compressed HTTP (ACCH) The basic observation is that if the referred string does not completely contain a matched pattern, then the pointer contains none either. The algorithm may therefore skip scanning the bytes in the uncompressed data where the pointer occurs.

Three Special Cases
1. The pattern starts prior to the pointer, and only its suffix is in the pointer.
2. The pointer contains a pattern prefix, and its remaining bytes occur after the pointer.

In order to detect those patterns, the algorithm needs to scan a few bytes at the pointer's starting and ending points, called the pointer left boundary and right boundary scan areas, respectively. If no matches occurred within the referred string, the algorithm skips the remaining bytes between the pointer boundaries, called the internal area.


3. A pattern ends within the referred string.

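Left Boundary (First Case) The algorithm should continue scanning the pointer bytes (in the uncompressed data) as long as the number of bytes scanned within the pointer is smaller than the current DFA state’s depth. The depth of a state is the length of the pattern prefix it represents, so once the depth is no larger than the number of pointer bytes scanned, that prefix lies entirely inside the pointer and no pattern that started before the pointer can still be in progress.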
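The stopping rule for the left boundary can be sketched as follows. This illustrates only that rule, not the full ACCH algorithm; the automaton is a goto/failure-function Aho–Corasick matcher, and the names `build_ac`, `step` and `scan_left_boundary`, as well as the sample input, are assumptions made for the example.

```python
from collections import deque


def build_ac(patterns):
    """Return (goto, fail, out, depth), each indexed by state; state 0 is the start."""
    goto, fail, out, depth = [{}], [0], [[]], [0]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                depth.append(depth[s] + 1)   # depth = length of the prefix
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out, depth


def step(ac, state, ch):
    """Advance the automaton by one input byte."""
    goto, fail, _, _ = ac
    while state and ch not in goto[state]:
        state = fail[state]
    return goto[state].get(ch, 0)


def scan_left_boundary(ac, state, pointer_bytes):
    """Scan pointer bytes while fewer bytes were scanned than the state's depth."""
    _, _, out, depth = ac
    scanned, matches = 0, []
    while scanned < len(pointer_bytes) and scanned < depth[state]:
        state = step(ac, state, pointer_bytes[scanned])
        scanned += 1
        matches.extend(out[state])
    return scanned, matches, state


ac = build_ac(["abcd", "nba", "nbc"])
state = 0
for ch in "zzab":                      # bytes that precede the pointer
    state = step(ac, state, ch)
scanned, matches, _ = scan_left_boundary(ac, state, "cdzzzzzz")
print(scanned, matches)
```

Here the pattern "abcd" starts before the pointer and ends inside it; the scan stops after three of the eight pointer bytes because the state depth has dropped to zero.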


Right Boundary (Second Case) Each scanned byte is given a status: if the depth of the DFA state at that byte is below some constant CDepth, the byte is marked Uncheck; otherwise it is marked Check. The algorithm locates the last occurrence of an Uncheck status within the referred string. Let unchkPos be the corresponding position within the pointer. The scan is resumed from position unchkPos - CDepth + 2, prior to the pointer end, and the DFA state is set to the start state. Because an Uncheck byte has depth at most CDepth - 1, no pattern prefix still in progress at unchkPos can begin earlier than that position.
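The status marking and the resume position can be sketched as below. The list-based representation, the function names, the rule that Match takes precedence over Check and Uncheck, and the fallback when no Uncheck byte exists are all assumptions made for illustration; the slides do not specify them.

```python
def mark_statuses(depths, pattern_ends, cdepth):
    """Assign one status per scanned byte from its DFA depth.

    depths[i]       -- depth of the DFA state after byte i
    pattern_ends[i] -- True if at least one pattern ends at byte i
    """
    statuses = []
    for d, ends in zip(depths, pattern_ends):
        if ends:
            statuses.append("Match")      # assumed to override the other two
        elif d < cdepth:
            statuses.append("Uncheck")
        else:
            statuses.append("Check")
    return statuses


def right_boundary_resume(statuses, cdepth):
    """Position within the pointer from which scanning restarts at the start state."""
    unchecked = [i for i, s in enumerate(statuses) if s == "Uncheck"]
    if not unchecked:
        return 0                          # assumption: rescan the whole pointer
    unchk_pos = unchecked[-1]
    return max(0, unchk_pos - cdepth + 2)  # clamped to the pointer start


depths = [0, 1, 2, 3, 4, 0, 1, 2, 3]
ends = [False, False, False, False, True, False, False, False, False]
statuses = mark_statuses(depths, ends, cdepth=2)
resume = right_boundary_resume(statuses, cdepth=2)
print(statuses, resume)
```

With CDepth = 2, the last Uncheck byte is at position 6, where the depth is 1, so the prefix in progress starts at position 6 and the scan resumes there.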

Internal Area (Third Case) A byte with a Match status means that one or more patterns end at its location. Let matchPos be the position within the pointer corresponding to the position of the Match status within the referred string. Using the status information, the algorithm can detect whether an entire pattern was referred to by the pointer by scanning a few bytes prior to matchPos, in the same manner as in the case of the right boundary.


Experimental Results Two data sources: traffic captured by a corporate firewall over 15 min, and a list of the most popular Web pages taken from the Alexa Web site. Two signature data sets: one from ModSecurity, an open-source Web application firewall, and the other from Snort, an open-source network intrusion prevention and detection system.


Conclusion Surprisingly, it is faster to do pattern matching on compressed data, even with the penalty of decompression, than to do pattern matching on uncompressed traffic.
