Efficient Processing of Multi-Connection Compressed Web Traffic Yaron Koral 1 with: Yehuda Afek 1, Anat Bremler-Barr 1 * 1 Blavatnik School of Computer.

Slides:



Advertisements
Similar presentations
Data compression. INTRODUCTION If you download many programs and files off the Internet, we have probably encountered.
Advertisements

Shift-based Pattern Matching for Compressed Web Traffic Presented by Victor Zigdon 1* Joint work with: Dr. Anat Bremler-Barr 1* and Yaron Koral 2 The SPC.
Deep Packet Inspection(DPI) Engineering for Enhanced Performance of Network Elements and Security Systems PIs: Dr. Anat Bremler-Barr (IDC) Dr. David.
T.Sharon-A.Frank 1 Multimedia Compression Basics.
Space-Time Tradeoffs in Software-based Deep Packet Inspection Author: Anat Bremler-Barr, Yotam Harchol, and David Hay Published in Proc. IEEE HPSR 2011.
Data Compression CS 147 Minh Nguyen.
Delta Encoding in the compressed domain A semi compressed domain scheme with a compressed output.
Multi-Core Packet Scattering to Disentangle Performance Bottlenecks Yehuda Afek Tel-Aviv University.
Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP Author: Anat Bremler-Barr, Yaron Koral, Shimrit Tzur David, David Hay Publisher:
Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP Anat Bremler-Barr Interdisciplinary Center Herzliya Shimrit Tzur David Interdisciplinary.
Wan Accelerators: Optimizing Network Traffic with Compression Introduction & Motivation Results for Trained Files Another Compression Method Approach &
MCA 2: Multi Core Architecture for Mitigating Complexity Attacks Yaron Koral (TAU) Joint work with: Yehuda Afek (TAU), Anat Bremler-Barr (IDC), David Hay.
Lempel-Ziv Compression Techniques
CSCI 3 Chapter 1.8 Data Compression. Chapter 1.8 Data Compression  For the purpose of storing or transferring data, it is often helpful to reduce the.
Compression JPG compression, Source: Original 10:1 Compression 45:1 Compression.
Text Operations: Coding / Compression Methods. Text Compression Motivation –finding ways to represent the text in fewer bits –reducing costs associated.
Compression & Huffman Codes Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
1 Accelerating Multi-Patterns Matching on Compressed HTTP Traffic Authors: Anat Bremler-Barr, Yaron Koral Presenter: Chia-Ming,Chang Date: Publisher/Conf.
Document and Query Forms Chapter 2. 2 Document & Query Forms Q 1. What is a document? A document is a stored data record in any form A document is a stored.
Chapter 7 Special Section Focus on Data Compression.
Gzip Compression and Decompression 1. Gzip file format 2. Gzip Compress Algorithm. LZ77 algorithm. LZ77 algorithm.Dynamic Huffman coding algorithm.Dynamic.
Memory Allocation CS Introduction to Operating Systems.
Indexing Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata.
Lecture 10 Data Compression.
Chapter 2 Source Coding (part 2)
1 Routing with a clue Anat Bremler-Barr Joint work with Yehuda Afek & Sariel Har-Peled Tel-Aviv University.
Source Coding-Compression
A Low-Cost Memory Remapping Scheme for Address Bus Protection Lan Gao *, Jun Yang §, Marek Chrobak *, Youtao Zhang §, San Nguyen *, Hsien-Hsin S. Lee ¶
Accelerating Multipattern Matching on Compressed HTTP Traffic Published in : IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 3, JUNE 2012 Authors : Bremler-Barr,
« Performance of Compressed Inverted List Caching in Search Engines » Proceedings of the International World Wide Web Conference Commitee, Beijing 2008)
Space-Time Tradeoffs in Software-Based Deep Packet Inspection Anat Bremler-Barr Yotam Harchol ⋆ David Hay IDC Herzliya, Israel Hebrew University, Israel.
Space-Time Tradeoffs in Software-Based Deep Packet Inspection Anat Bremler-Barr Yotam Harchol ⋆ David Hay IDC Herzliya, Israel Hebrew University, Israel.
Shift-based Pattern Matching for Compressed Web Traffic Author: Anat Bremler-Barr, Yaron Koral,Victor Zigdon Publisher: IEEE HPSR,2011 Presenter: Kai-Yang,
Survey on Improving Dynamic Web Performance Guide:- Dr. G. ShanmungaSundaram (M.Tech, Ph.D), Assistant Professor, Dept of IT, SMVEC. Aswini. S M.Tech CSE.
ORange: Multi Field OpenFlow based Range Classifier Liron Schiff Tel Aviv University Yehuda Afek Tel Aviv University Anat Bremler-Barr Inter Disciplinary.
Embedded System Lab. 김해천 Linearly Compressed Pages: A Low- Complexity, Low-Latency Main Memory Compression Framework Gennady Pekhimenko†
Leveraging Traffic Repetitions for High- Speed Deep Packet Inspection Author: Anat Bremler-Barr, Shimrit Tzur David, Yotam Harchol, David Hay Publisher:
Multimedia Data Introduction to Lossless Data Compression Dr Sandra I. Woolley Electronic, Electrical.
Addressing Image Compression Techniques on current Internet Technologies By: Eduardo J. Moreira & Onyeka Ezenwoye CIS-6931 Term Paper.
Positional Data Organization and Compression in Web Inverted Indexes Leonidas Akritidis Panayiotis Bozanis Department of Computer & Communication Engineering,
Bits and Huffman Encoding Please get a piece of paper and a pen and put your name and netid on it. Make sure you can turn in it after class without losing.
+ Selection Sort Method Joon Hee Lee August 12, 2012.
Performance of Compressed Inverted Indexes. Reasons for Compression  Compression reduces the size of the index  Compression can increase the performance.
Faster Approximate String Matching over Compressed Text By Gonzalo Navarro *, Takuya Kida †, Masayuki Takeda †, Ayumi Shinohara †, and Setsuo Arikawa.
Lecture 7 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan
Comp 335 File Structures Data Compression. Why Study Data Compression? Conserves storage space Files can be transmitted faster because there are less.
An Overview of Different Compression Algorithms Their application on compressing inverted files.
Huffman Coding (2 nd Method). Huffman coding (2 nd Method)  The Huffman code is a source code. Here word length of the code word approaches the fundamental.
Accelerating Multi-Pattern Matching on Compressed HTTP Traffic Dr. Anat Bremler-Barr (IDC) Joint work with Yaron Koral (IDC), Infocom[2009]
1 Space-Efficient TCAM-based Classification Using Gray Coding Authors: Anat Bremler-Barr and Danny Hendler Publisher: IEEE INFOCOM 2007 Present: Chen-Yu.
Lampel ZIV (LZ) code The Lempel-Ziv algorithm is a variable-to-fixed length code Basically, there are two versions of the algorithm LZ77 and LZ78 are the.
LZW (Lempel-Ziv-welch) compression method The LZW method to compress data is an evolution of the method originally created by Abraham Lempel and Jacob.
Computer Sciences Department1. 2 Data Compression and techniques.
Compression and Huffman Coding. Compression Reducing the memory required to store some information. Lossless compression vs lossy compression Lossless.
Submitted To-: Submitted By-: Mrs.Sushma Rani (HOD) Aashish Kr. Goyal (IT-7th) Deepak Soni (IT-8 th )
HUFFMAN CODES.
Compression & Huffman Codes
Data Compression.
Applied Algorithmics - week7
Data Compression.
Chapter 7 Special Section
Data Compression CS 147 Minh Nguyen.
Lecture 3: Main Memory.
15 Data Compression Foundations of Computer Science ã Cengage Learning.
Chapter 7 Special Section
DEFLATE Algorithm Kent.
Thesis Presented By Mohammad Abul Kalam Azad C Shabbir Ahmad C Francis Palma Tony C Supervised by S. M. Kamruzzaman Assistant.
Chapter 8 & 9 Main Memory and Virtual Memory
15 Data Compression Foundations of Computer Science ã Cengage Learning.
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Presentation transcript:

Efficient Processing of Multi-Connection Compressed Web Traffic Yaron Koral 1 with: Yehuda Afek 1, Anat Bremler-Barr 1 * 1 Blavatnik School of Computer Sciences Tel-Aviv University, Israel 2 Computer Science Dept. Interdisciplinary Center, Herzliya, Israel ⋆ Supported by European Research Council (ERC) Starting Grant no

Compressed Web Traffic Compressed web traffic increases in popularity

DPI on Compressed Web Traffic Thousands concurrent sessions Uncompressed Traffic Compressed (Mem. Req. 32KB per Session) SpaceTime 80%40% Contribution: Improve

Background: Compressed HTTP uses GZIP Two stage algorithm: LZ77 –Stage 1: LZ77 Goal: reduce string presentation size Technique: repeated strings compression Huffman Coding –Stage 2: Huffman Coding Goal: reduce the symbols code size Technique: frequent symbols  fewer bits

Background: LZ77 Compression Compress repeated strings 32KB –Looking only 32KB back (window) Encode repeated strings by {distance,length} ABCDEFABCD  ABCDEF {6,4} Pointers might be recursive

Background: LZ77 Compression Decompression is cheap –Copy consecutive bytes to buffer –Enjoys cache boost due to spatial locality Compression is expensive –Maintain locations of previous strings triplets –Locate (a larger) prior occurrence of current text... We use this observation later on…

Current state Uncompressed active session buffer New Packet

General Idea: Keep “compressed” buffer Keep buffers in a “compressed” form Uncompress “active session” only Compressed active session buffer New Packet

1 st attempt – use original data Problem: pointers may point out-of buffer boundary Distance between a pointer to its literals may be the entire session! This usually exceeds 32KB… … (200,3)… …(300,5) …abc…hello… … (30,000,4) Packets 32KB Buffer INVALID!

2 nd attempt – re-compress buffer Upon new packet arrival –Unzip old buffer –Unzip packet –Process data –Calculate new buffer boundary –gzip buffer Pro’s: –Space efficient: 83% less memory Con’s: –Time expansive: 20 times slower!!! TIME EXPANSIVE!!

Our solution: Swap Out of boundary Pointers (SOP) Buffer PACKING Technique – Light Compression SOP uses the original GZIP compressed form (as in 1 st attempt) Swap invalid pointer with its referred literals SOP Compared to 2 nd attempt: 2.6% more memory –Space 2.6% more memory 81.4% faster –Time: 81.4% faster

ello …... … abc … … hello …abc…hello… … … (200,3)… … hello …abc…hello… … … (200,3)… …(300,5) …abc…hello… … (30,000,4) (30,000,4)…... (a) Compressed Buffer (b) Uncompressed Buffer (c) SOP Buffer How it works

SOP – Packing Algorithm Upon new packet arrival 1.Unzip old buffer 2.Unzip packet 3.Process data 4.Calculate new buffer boundary 5.Swap out-of boundary pointers with literals New Packet

SOP – Time Considerations Each packet is decompressed several times –Uncompressed size ~ 4.6KB (  32/4.6=6.9) Decompression is cheap! (still…) Gzip Decompression time

SOP-Indexed Keep indices to chunks within buffer Decompress only required chunks SOP-Indexed as compared SOP 5.8% –Space loss: 5.8% 10.3% –Time gained: 10.3% New Packet Chunk Indices

DPI of Compressed Traffic ACCH: Aho-Corasick based algorithm for Compressed HTTP (INFOCOM 2009) General Idea: skip scanning repeated-strings Memory Req.: 2-bit status vector per byte 32KB40KB

Solution: Pack Vector with Data ACCH Algorithm: ACCH Algorithm: skip scanning pointer area DPI Scan Aho-Corasick ACCH 80% Skips around 80% of data scans 25% Status-vector increases space requirement by 25% 80% Skips around 80% of data scans 25% Status-vector increases space requirement by 25%

Experimental Results: Packing Methods

Experimental Results: DPI +Packing Unzip entire session. Avg. Size = 170KB SOP 1.39, 5.17KB ACCH 0.36, 37.4KB SOP+ACCH 0.64, 6.19KB Naïve 1.1, 29KB

Conclusion HTTP compression - gains popularity High memory requirements  ignored by FWs SOP reduces space requirement by over 80%. SOP with ACCH 80% less memory and 40% faster.