1 Accelerating Multi-Patterns Matching on Compressed HTTP Traffic Authors: Anat Bremler-Barr, Yaron Koral Presenter: Chia-Ming,Chang Date: 2009.9.1 Publisher/Conf.

Presentation transcript:

1 Accelerating Multi-Patterns Matching on Compressed HTTP Traffic. Authors: Anat Bremler-Barr, Yaron Koral. Presenter: Chia-Ming Chang. Publisher/Conf.: IEEE INFOCOM 2009, April 2009. Dept. of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.

2 Outline: 1. Introduction 2. Background 3. Naive Decompression with Aho-Corasick vs. Compressed HTTP based Aho-Corasick 4. Experiment 5. Conclusions

3 Introduction HTTP compression, also known as content encoding, is a publicly defined way to compress textual content transferred from web servers to browsers. This standards-based method of delivering compressed content is built into HTTP 1.1, and most modern browsers that support HTTP 1.1 support GZIP compression.

4 Introduction: GZIP compression (encode). (1) LZ77 compression: a series of bytes (characters) can be compressed if the same series has already appeared earlier in the data. For example, the text "abcdefabcd" is compressed to "abcdef(6,4)", i.e., go back 6 bytes and copy 4 bytes from that point. (2) Huffman coding: in HTTP compression, Huffman coding then encodes both the uncompressed bytes and the pointers (which are represented as numbers).
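The "abcdef(6,4)" example can be reproduced with a few lines of Python. This is a minimal sketch, not GZIP's actual bit-level format: the token representation (an integer for a literal byte, a `(distance, length)` tuple for a pointer) and the name `lz77_decode` are assumptions made for illustration.

```python
def lz77_decode(tokens):
    """Expand a list of LZ77 tokens into the original bytes.

    Hypothetical token format: an int is a literal byte,
    a (distance, length) tuple is a back-reference.
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            distance, length = tok
            if not 1 <= distance <= len(out):
                raise ValueError("pointer reaches before the start of the data")
            start = len(out) - distance
            # Copy byte by byte so a pointer may overlap the bytes it produces.
            for i in range(length):
                out.append(out[start + i])
        else:
            out.append(tok)
    return bytes(out)


# The slide's example: six literals followed by the pointer (6, 4).
tokens = list(b"abcdef") + [(6, 4)]
decoded = lz77_decode(tokens)
```

Here `decoded` is `b"abcdefabcd"`: the pointer goes back 6 bytes to the first "a" and copies 4 bytes.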

5 Introduction: GZIP compression (decode). 1) Remove the HTTP header and store the Huffman dictionary of the specific session in memory. Note that different HTTP sessions have different Huffman dictionaries. 2) Decode the Huffman mapping of each symbol to the original byte or pointer representation, using that session's Huffman dictionary. 3) Decode the LZ77 part. 4) Perform multi-pattern matching on the uncompressed traffic.
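The four steps above can be sketched with Python's standard `zlib` module, which performs the Huffman and LZ77 decoding (steps 2 and 3) internally. The sketch assumes the HTTP header has already been stripped (step 1), and it uses a plain per-pattern `find` as a stand-in for a real multi-pattern matcher (step 4). The function name `scan_gzip_body` and the chunk size are illustrative assumptions.

```python
import gzip
import zlib


def scan_gzip_body(body, patterns, chunk_size=1024):
    """Decompress a gzip-encoded HTTP body and search it for patterns.

    Assumes `body` is the message body only (HTTP header already removed).
    """
    # 16 + zlib.MAX_WBITS makes zlib expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    plain = bytearray()
    # Feed the body in chunks, as packets of a session would arrive.
    for i in range(0, len(body), chunk_size):
        plain += decoder.decompress(body[i:i + chunk_size])
    plain += decoder.flush()

    # Stand-in for multi-pattern matching: one search per pattern.
    matches = []
    for pattern in patterns:
        pos = plain.find(pattern)
        while pos != -1:
            matches.append((pos, pattern))
            pos = plain.find(pattern, pos + 1)
    return sorted(matches)


body = gzip.compress(b"xxabcdyynbazz abcd")
found = scan_gzip_body(body, [b"abcd", b"nba", b"nbc"])
```

Each entry of `found` is a `(start offset, pattern)` pair in the uncompressed data. This is the baseline approach: every decompressed byte is scanned, including bytes that LZ77 pointers merely copied from earlier data.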

6 Background: LZ77 compression. A series of bytes that repeats earlier data (the repeated string) is encoded by the pair (distance, length): (1) distance is a number between 1 and 32,768 (32 KB) that gives the distance in bytes back to the repeated string; (2) length is a number between 3 and 258 that gives the length of the repeated string in bytes. U: uncompressed traffic.

7 Naive Decompression with Aho-Corasick pattern matching. Pattern set = {abcd, nba, nbc}. [Figure labels: distance, length; u => unmatch, m => match]

8 Naive Decompression with Aho-Corasick pattern matching: definitions. (1) Trf: the input, compressed traffic (after Huffman decoding). (2) SWin[1..32KB]: the sliding window of LZ77 (the stored data that pointers refer to). (3) SWin[j]: the information about the uncompressed byte located j bytes before the current byte. (4) FSM(state, byte): the AC FSM receives a state and a byte and returns the next state, where startStateFSM is the initial FSM state (failure function and transition function). (5) Match(state): if state is a "match state", it stores information about the matched pattern; otherwise NULL (output function).
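The FSM(state, byte) and Match(state) definitions map directly onto a textbook Aho-Corasick automaton. The following is a minimal sketch of that standard construction, not the presenters' code; the class name `AhoCorasick`, the method names `fsm` and `match`, and the helper `scan` are assumptions chosen to mirror the slide's notation.

```python
from collections import deque


class AhoCorasick:
    START = 0  # plays the role of startStateFSM

    def __init__(self, patterns):
        self.goto = [{}]   # transition function: state -> {byte: next state}
        self.fail = [0]    # failure function
        self.out = [[]]    # output function: patterns that end at each state
        for pattern in patterns:
            state = self.START
            for byte in pattern:
                if byte not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][byte] = len(self.goto) - 1
                state = self.goto[state][byte]
            self.out[state].append(pattern)
        # Breadth-first pass to fill in failure links and inherited outputs.
        queue = deque(self.goto[self.START].values())
        while queue:
            state = queue.popleft()
            for byte, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and byte not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(byte, self.START)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def fsm(self, state, byte):
        """Return the next state, following failure links when needed."""
        while state and byte not in self.goto[state]:
            state = self.fail[state]
        return self.goto[state].get(byte, self.START)

    def match(self, state):
        """Return the matched patterns for a match state, otherwise None."""
        return self.out[state] or None


def scan(ac, data):
    """Run the automaton over every byte, as the naive algorithm does."""
    state, hits = ac.START, []
    for i, byte in enumerate(data):
        state = ac.fsm(state, byte)
        found = ac.match(state)
        if found:
            hits.extend((i, p) for p in found)
    return hits


ac = AhoCorasick([b"abcd", b"nba", b"nbc"])
hits = scan(ac, b"xnbabcdnbc")
```

Each hit is an `(end offset, pattern)` pair. On the input above, "nba" ends at offset 3, and the failure link from that state lets "abcd" be recognised at offset 6 without rescanning the shared "a".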

9 Naive Decompression with Aho-Corasick pattern matching. [Figure labels: non-pointer area; pointer area (repeated string); Aho-Corasick-based algorithm; decompress traffic]

10 Compressed HTTP based Aho-Corasick: improved method. (1) The key idea is to store data that is produced while scanning the uncompressed traffic. (2) When the pattern matching algorithm encounters traffic that has already been scanned, the stored data is used either to find a possible match or to skip that traffic.

11 Compressed HTTP based Aho-Corasick: definitions. (1) Trf: the input, compressed traffic (after Huffman decoding). (2) SWin[1..32KB]: the sliding window of LZ77 (the stored data that pointers refer to). (3) SWin[j]: the information about the uncompressed byte located j bytes before the current byte. (4) FSM(state, byte): the AC FSM receives a state and a byte and returns the next state, where startStateFSM is the initial FSM state (failure function and transition function). (5) Match(state): if state is a "match state", it stores information about the matched pattern; otherwise NULL (output function).

12 Compressed HTTP based Aho-Corasick: definitions (continued). (6) Depth: the depth of a state s is the number of edges in the shortest simple route from the start state to s in the FSM. (7) CDepth: a constant parameter of the improved algorithm (a threshold). (8) SWin[j]: the information about the j-th byte is a record of two fields: SWin[j].b, the byte, and SWin[j].st, the status.
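The depth definition can be made concrete by building the pattern trie and walking it breadth-first. The sketch below is illustrative only. The slide says a status is stored per byte but does not list the status values, so the three names used here ("match", "check", "uncheck") and the rule that compares depth against CDepth are assumptions. The `terminal` flag also ignores patterns reached only through failure links, which a full automaton would include.

```python
from collections import deque


def build_trie(patterns):
    """Return (children, terminal) lists indexed by state number."""
    children, terminal = [{}], [False]
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if byte not in children[state]:
                children.append({})
                terminal.append(False)
                children[state][byte] = len(children) - 1
            state = children[state][byte]
        terminal[state] = True
    return children, terminal


def state_depths(children):
    """Depth = number of edges on the shortest route from the start state."""
    depth = [0] * len(children)
    queue = deque([0])
    while queue:
        state = queue.popleft()
        for nxt in children[state].values():
            depth[nxt] = depth[state] + 1
            queue.append(nxt)
    return depth


def status(state, depth, terminal, cdepth):
    """Assumed status rule: match state, else compare depth with CDepth."""
    if terminal[state]:
        return "match"
    return "check" if depth[state] >= cdepth else "uncheck"


children, terminal = build_trie([b"abcd", b"nba", b"nbc"])
depth = state_depths(children)
```

With this pattern set the states are numbered in insertion order (0 is the start state, 1 to 4 spell "abcd", 5 to 8 cover "nba" and "nbc"), so `depth` is `[0, 1, 2, 3, 4, 1, 2, 3, 3]`. A record such as SWin[j] could then hold the byte together with the status computed for the state reached after that byte.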

13 Compressed HTTP based Aho-Corasick. [Figure labels: CDepth threshold; improvement]

14 [Figure labels: non-pointer area; pointer area (repeated string); Case 1: left boundary]

15 Experiment

16 Experiment. CDepth = 0 represents the naive algorithm. SNORT => rules published on June 08. ModSecurity => open source web application firewall. Reduced set => the Snort rule set with the rules that have no effect on textual HTML search removed.

17 Conclusion. Our algorithm eliminates up to 75% of data scans, based on information stored in the compressed data, and gains up to a 70% improvement in the performance of the multi-pattern matching algorithm. As far as we know, this is the first paper that analyzes the problem of "on-the-fly" multi-pattern matching on compressed HTTP traffic and suggests a solution.