Shift-based Pattern Matching for Compressed Web Traffic Author: Anat Bremler-Barr, Yaron Koral,Victor Zigdon Publisher: IEEE HPSR,2011 Presenter: Kai-Yang,

Slides:



Advertisements
Similar presentations
Shift-based Pattern Matching for Compressed Web Traffic Presented by Victor Zigdon 1* Joint work with: Dr. Anat Bremler-Barr 1* and Yaron Koral 2 The SPC.
Advertisements

Information Retrieval in Practice
Space-Time Tradeoffs in Software-based Deep Packet Inspection Author: Anat Bremler-Barr, Yotam Harchol, and David Hay Published in Proc. IEEE HPSR 2011.
Massively Parallel Cuckoo Pattern Matching Applied For NIDS/NIPS  Author: Tran Ngoc Thinh, Surin Kittitornkun  Publisher: Electronic Design, Test and.
SIMS-201 Compressing Information. 2  Overview Chapter 7: Compression Introduction Entropy Huffman coding Universal coding.
Data Compressor---Huffman Encoding and Decoding. Huffman Encoding Compression Typically, in files and messages, Each character requires 1 byte or 8 bits.
Lecture 6 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan
Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP Author: Anat Bremler-Barr, Yaron Koral, Shimrit Tzur David, David Hay Publisher:
Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP Anat Bremler-Barr Interdisciplinary Center Herzliya Shimrit Tzur David Interdisciplinary.
Procedures of Extending the Alphabet for the PPM Algorithm Radu Rădescu George Liculescu Polytechnic University of Bucharest Faculty of Electronics, Telecommunications.
Compression & Huffman Codes
Lempel-Ziv Compression Techniques Classification of Lossless Compression techniques Introduction to Lempel-Ziv Encoding: LZ77 & LZ78 LZ78 Encoding Algorithm.
Compression Word document: 1 page is about 2 to 4kB Raster Image of 1 page at 600 dpi is about 35MB Compression Ratio, CR =, where is the number of bits.
Deterministic Memory- Efficient String Matching Algorithms for Intrusion Detection Nathan Tuck, Timothy Sherwood, Brad Calder, George Varghese Department.
CSCI 3 Chapter 1.8 Data Compression. Chapter 1.8 Data Compression  For the purpose of storing or transferring data, it is often helpful to reduce the.
Text Operations: Coding / Compression Methods. Text Compression Motivation –finding ways to represent the text in fewer bits –reducing costs associated.
CS336: Intelligent Information Retrieval
Improved TCAM-based Pre-Filtering for Network Intrusion Detection Systems Department of Computer Science and Information Engineering National Cheng Kung.
1 Accelerating Multi-Patterns Matching on Compressed HTTP Traffic Authors: Anat Bremler-Barr, Yaron Koral Presenter: Chia-Ming,Chang Date: Publisher/Conf.
Document and Query Forms Chapter 2. 2 Document & Query Forms Q 1. What is a document? A document is a stored data record in any form A document is a stored.
Data Representation CS105. Data Representation Types of data: – Numbers – Text – Audio – Images & Graphics – Video.
1 Performing packet content inspection by longest prefix matching technology Authors: Nen-Fu Huang, Yen-Ming Chu, Yen-Min Wu and Chia- Wen Ho Publisher:
Lossless Data Compression Using run-length and Huffman Compression pages
1 Scalable Pattern-Matching via Dynamic Differentiated Distributed Detection (D 4 ) Author: Kai Zheng, Hongbin Lu Publisher: GLOBECOM 2008 Presenter: Han-Chen.
Source Coding Hafiz Malik Dept. of Electrical & Computer Engineering The University of Michigan-Dearborn
Gzip Compression and Decompression 1. Gzip file format 2. Gzip Compress Algorithm. LZ77 algorithm. LZ77 algorithm.Dynamic Huffman coding algorithm.Dynamic.
VPC3: A Fast and Effective Trace-Compression Algorithm Martin Burtscher.
SHOCK: A Worst-Case Ensured Sub-linear Time Pattern Matching Algorithm for Inline Anti-Virus Scanning Author: Nen-Fu Huang, Wen-Yen Tsai Publisher: IEEE.
Lecture 10 Data Compression.
Data Compression For Images. Data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units)
Page 110/6/2015 CSE 40373/60373: Multimedia Systems So far  Audio (scalar values with time), image (2-D data) and video (2-D with time)  Higher fidelity.
1 Analysis of Algorithms Chapter - 08 Data Compression.
Accelerating Multipattern Matching on Compressed HTTP Traffic Published in : IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 3, JUNE 2012 Authors : Bremler-Barr,
Scalable Name Lookup in NDN Using Effective Name Component Encoding
« Performance of Compressed Inverted List Caching in Search Engines » Proceedings of the International World Wide Web Conference Commitee, Beijing 2008)
 The amount of data we deal with is getting larger  Not only do larger files require more disk space, they take longer to transmit  Many times files.
An Improved Algorithm to Accelerate Regular Expression Evaluation Author: Michela Becchi, Patrick Crowley Publisher: 3rd ACM/IEEE Symposium on Architecture.
MA/CSSE 473 Day 24 Student questions Quadratic probing proof
Survey on Improving Dynamic Web Performance Guide:- Dr. G. ShanmungaSundaram (M.Tech, Ph.D), Assistant Professor, Dept of IT, SMVEC. Aswini. S M.Tech CSE.
Leveraging Traffic Repetitions for High- Speed Deep Packet Inspection Author: Anat Bremler-Barr, Shimrit Tzur David, Yotam Harchol, David Hay Publisher:
Multimedia Data Introduction to Lossless Data Compression Dr Sandra I. Woolley Electronic, Electrical.
Compression.  Compression ratio: how much is the size reduced?  Symmetric/asymmetric: time difference to compress, decompress?  Lossless; lossy: any.
Huffman Coding. Huffman codes can be used to compress information –Like WinZip – although WinZip doesn’t use the Huffman algorithm –JPEGs do use Huffman.
Addressing Image Compression Techniques on current Internet Technologies By: Eduardo J. Moreira & Onyeka Ezenwoye CIS-6931 Term Paper.
Lossless Compression CIS 465 Multimedia. Compression Compression: the process of coding that will effectively reduce the total number of bits needed to.
Efficient Processing of Multi-Connection Compressed Web Traffic Yaron Koral 1 with: Yehuda Afek 1, Anat Bremler-Barr 1 * 1 Blavatnik School of Computer.
Recent Results in Combined Coding for Word-Based PPM Radu Rădescu George Liculescu Polytechnic University of Bucharest Faculty of Electronics, Telecommunications.
A Pattern-Matching Scheme With High Throughput Performance and Low Memory Requirement Author: Tsern-Huei Lee, Nai-Lun Huang Publisher: TRANSACTIONS ON.
Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29.
Bit Weaving: A Non-Prefix Approach to Compressing Packet Classifiers in TCAMs Author: Chad R. Meiners, Alex X. Liu, and Eric Torng Publisher: 2012 IEEE/ACM.
Lecture 7 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan
An Improved Multi-Pattern Matching Algorithm for Large-Scale Pattern Sets Author : Zhan Peng, Yu-Ping Wang and Jin-Feng Xue Conference: IEEE 10th International.
SWM: Simplified Wu-Manber for GPU- based Deep Packet Inspection Author: Lucas Vespa, Ning Weng Publisher: The 2012 International Conference on Security.
Accelerating Multi-Pattern Matching on Compressed HTTP Traffic Dr. Anat Bremler-Barr (IDC) Joint work with Yaron Koral (IDC), Infocom[2009]
1 Space-Efficient TCAM-based Classification Using Gray Coding Authors: Anat Bremler-Barr and Danny Hendler Publisher: IEEE INFOCOM 2007 Present: Chen-Yu.
Lampel ZIV (LZ) code The Lempel-Ziv algorithm is a variable-to-fixed length code Basically, there are two versions of the algorithm LZ77 and LZ78 are the.
Computer Sciences Department1. 2 Data Compression and techniques.
Compression and Huffman Coding. Compression Reducing the memory required to store some information. Lossless compression vs lossy compression Lossless.
Data Compression: Huffman Coding in Weiss (p.389)
HUFFMAN CODES.
COMP261 Lecture 22 Data Compression 2.
Applied Algorithmics - week7
Prepared for SEO Analysis Prepared for 17 June 2014.
13 Text Processing Hongfei Yan June 1, 2016.
The Huffman Algorithm We use Huffman algorithm to encode a long message as a long bit string - by assigning a bit string code to each symbol of the alphabet.
DEFLATE Algorithm Kent.
Podcast Ch23d Title: Huffman Compression
Author: Yaron Weinsberg ,Shimrit Tzur-David ,Danny Dolev and Tal Anker
Transport Layer Identification of P2P Traffic
Table 3. Decompression process using LZW
Presentation transcript:

Shift-based Pattern Matching for Compressed Web Traffic Author: Anat Bremler-Barr, Yaron Koral,Victor Zigdon Publisher: IEEE HPSR,2011 Presenter: Kai-Yang, Liu Date: 2011/11/2

INTRODUCTION Two-thirds of the top 1000 most popular sites like Yahoo!, Google, MSN, YouTube, Facebook and others use HTTP compression to enhance the speed of their content downloads. 2

The GZIP Algorithm LZ77 compression LZ77 compression technique is that we can compress a series of bytes (characters) if we spot that this series of bytes has already appeared in the past. The algorithm replaces each repeated string by (distance,length) pair. For example: the text: ‘abcdefgabcde’ can be compressed to: ‘abcdefg(7,5)’; LZ77 refers to the above pair as “pointer” and to uncompressed bytes as “literals”. Huffman Coding- reduce the symbol coding size by encoding frequent symbols with fewer bits. 3

INTRODUCTION Recent work (ACCH algorithm) presents technique for pattern matching on compressed traffic that decompresses the traffic and then uses data from the decompression phase to accelerate the process. We present Shift-based Pattern matching for Compressed traffic algorithm, SPC, that accelerates MWM on compressed traffic. 4

THE MODIFIED WU-MANBER ALGORITHM MWM trims all patterns to their m bytes prefix, where m is the size of the shortest pattern. MWM chooses predefined group of bytes, namely B, to determine the shift value. MWM starts by precomputing two tables: a skip shift table called ShiftTable and a patterns hash table, called Ptrns. The scan is performed using a virtual scan window of size m. The shift value is determined by indexing the ShiftTable with the B bytes suffix of the scan window. 5

6

THE MODIFIED WU-MANBER ALGORITHM 7

SHIFT-BASED PATTERN MATCHING FOR COMPRESSED TRAFFIC (SPC) The bytes referred by the pointers were already scanned; hence, if we have a prior knowledge that an area does not contain patterns, we can skip scanning most of it. Observe that even if no patterns were found when the referred area was scanned, patterns may occur in the boundaries of the pointer. The general method of the algorithm is to use a combined technique that scans uncompressed portions of the data using MWM and skips scanning most of the data represented by the LZ77 pointers. 8

9

10

11

SHIFT-BASED PATTERN MATCHING FOR COMPRESSED TRAFFIC (SPC) 12

EXPERIMENTAL RESULTS Data Set We collected HTTP pages encoded with GZIP taken from a list constructed from the Alexa website that maintains web traffic metrics and top-site lists. Pattern Set Our pattern-sets were gathered from two different sources: ModSecurity, an open source web application firewall and Snort, an open source network intrusion prevention system. 13

SPC Characteristics Analysis In order to understand the impact of B and m we examined the character of skip ratio, Sr, the percentage of characters the algorithm skips. The Snort pattern set contains many short patterns, specifically 410 distinct patterns of length ≤ 3, 539 of length 4 and 381 of length 5. To circumvent this problem we inspected the containing rules. We can eliminate most of the short patterns by using longer pattern within the same rule or relying on specific flow parameters. 14

EXPERIMENTAL RESULTS(Skip Ratio) 15

EXPERIMENTAL RESULTS(Throughput) 16

EXPERIMENTAL RESULTS(Storage) 17