1
Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP
Authors: Anat Bremler-Barr, Yaron Koral, Shimrit Tzur David, David Hay
Publisher: IEEE INFOCOM, 2012
Presenter: Kai-Yang, Liu
Date: 2011/12/21
2
INTRODUCTION
Gzip and Deflate work well for compressing each individual response, but in many cases a group of pages shares a large amount of common data. Next-generation compression methods are therefore inter-file: a single dictionary can be referenced by several files.
3
The VCDIFF Compression Algorithm
The VCDIFF encoding process uses three types of instructions, called delta instructions:
ADD(i, str): append to the output the i bytes given in the parameter str.
RUN(i, b): append the byte b to the output i times.
COPY(p, x): copy the interval [p, p + x) from the dictionary to the output.
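The three delta instructions can be illustrated with a small decoder. This is a minimal sketch of the simplified model on this slide, not of the full VCDIFF wire format: the tuple encoding of instructions, the function name apply_delta, and the sample delta are assumptions made for illustration, and COPY is assumed to address the dictionary only.

```python
def apply_delta(dictionary, delta):
    """Expand a list of delta instructions into plain text.

    Each instruction is a tuple (hypothetical encoding for this sketch):
      ("ADD", i, s)   append the i symbols in s
      ("RUN", i, b)   append the symbol b, i times
      ("COPY", p, x)  append dictionary[p : p + x]
    """
    out = []
    for op, a, b in delta:
        if op == "ADD":
            if len(b) != a:
                raise ValueError("ADD length does not match its data")
            out.append(b)
        elif op == "RUN":
            out.append(b * a)
        elif op == "COPY":
            if a < 0 or a + b > len(dictionary):
                raise ValueError("COPY interval outside the dictionary")
            out.append(dictionary[a:a + b])
        else:
            raise ValueError("unknown instruction: " + op)
    return "".join(out)


# Illustrative delta (not the one from the slides) over the slides' dictionary.
DICTIONARY = "DBEAACDBCABC"
delta = [("ADD", 2, "AB"), ("COPY", 0, 5), ("RUN", 3, "A"), ("COPY", 5, 6)]
print(apply_delta(DICTIONARY, delta))  # ABDBEAAAAACDBCAB
```

The point of the paper is to avoid running a decoder like this on every response; the sketch only fixes what the instructions mean.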
4
Example
Dictionary: DBEAACDBCABC
The plain text that should be considered is therefore ABDDBEAAAACDBCABAB CAACBCDBADBC
5
Aho-Corasick Algorithm
Pattern set: {E, BE, BD, BCD, BCAA, CDBCAB}
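For readers who want to see the automaton behind this pattern set, here is a compact sketch of a textbook Aho-Corasick construction (trie, then breadth-first failure links). The function names and the list-based table layout are choices made for this sketch, not taken from the paper, and the sample text is made up.

```python
from collections import deque


def build_automaton(patterns):
    """Return (goto, fail, out) tables; state 0 is the root."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:                 # 1) insert every pattern into a trie
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(pat)
    queue = deque(goto[0].values())      # 2) failure links, breadth-first
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]   # inherit matches of the suffix
            queue.append(t)
    return goto, fail, out


def search(automaton, text):
    """Return (end_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i, pat) for pat in out[s])
    return hits


PATTERNS = ["E", "BE", "BD", "BCD", "BCAA", "CDBCAB"]
automaton = build_automaton(PATTERNS)
print(search(automaton, "XBCDBCABE"))
# [(3, 'BCD'), (7, 'CDBCAB'), (8, 'BE'), (8, 'E')]
```

Matches are reported by the index of their last symbol, which is the convention the later slides appear to use for the Match list.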
6
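The Offline Phase
The dictionary is scanned from its first symbol using the Aho-Corasick algorithm. The scan produces two structures that the online phase uses later:
State array: the automaton state reached after each dictionary symbol (read later as State[p+x-1]).
Match list: the patterns found in the dictionary, together with the positions at which they end.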
7
The Offline Phase
Dictionary: DBEAACDBCABC
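The offline scan can be sketched as follows for the dictionary and pattern set used in the slides. The assumption here is that the State array holds the automaton state after each dictionary symbol and the Match list holds (end position, pattern) pairs; all function and variable names are illustrative. States are printed by their label (the string spelled on the path from the root) so the output is readable.

```python
from collections import deque


def build_automaton(patterns):
    """Textbook Aho-Corasick tables; label[s] is the string that leads to state s."""
    goto, fail, out, label = [{}], [0], [[]], [""]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                label.append(label[s] + ch)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(pat)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out, label


def offline_scan(automaton, dictionary):
    """Scan the dictionary once and record the state and matches per position."""
    goto, fail, out, _ = automaton
    state, match_list, s = [], [], 0
    for j, ch in enumerate(dictionary):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        state.append(s)                                  # State[j]
        match_list.extend((j, pat) for pat in out[s])    # patterns ending at j
    return state, match_list


PATTERNS = ["E", "BE", "BD", "BCD", "BCAA", "CDBCAB"]
automaton = build_automaton(PATTERNS)
state, match_list = offline_scan(automaton, "DBEAACDBCABC")
labels = [automaton[3][s] for s in state]
print(labels)      # ['', 'B', 'BE', '', '', 'C', 'CD', 'CDB', 'CDBC', 'CDBCA', 'CDBCAB', 'BC']
print(match_list)  # [(2, 'BE'), (2, 'E'), (10, 'CDBCAB')]
```

Both structures depend only on the dictionary and the pattern set, so they can be computed once and reused for every response compressed against that dictionary.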
8
Four Kinds of Matches
1. Patterns that are fully contained within an ADD instruction.
2. Patterns that are fully contained within a COPY instruction.
3. Patterns whose prefix is within a COPY instruction.
4. Patterns whose suffix is within a COPY instruction.
9
The Online Phase
The delta file is scanned with the AC algorithm.
ADD instruction: simply scan it by traversing the automaton.
COPY(p, x) instruction:
Step 1: Scan the copied symbols from the dictionary one by one until, when scanning a symbol b_{p+i}, we reach a state in the automaton whose depth is less than or equal to i.
※ Finds all the patterns of the fourth category.
10
The Online Phase
Step 2: Check the Matched list for any patterns in the dictionary that end within the interval [p, p+x).
※ Finds all the patterns of the second category.
Step 3: Obtain the state State[p+x-1]. From that state, follow failure transitions in the automaton until reaching a state s whose depth is less than or equal to x.
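The three steps for a COPY instruction can be put together in a short sketch. It is one reading of the slides rather than the authors' implementation: the stop test in Step 1 is written as "depth is at most the number of copied symbols scanned so far", RUN instructions are omitted, the Matched list is searched linearly, and every name is illustrative. The decode helper exists only to cross-check the result against a plain scan.

```python
from collections import deque


def build(patterns):
    """Aho-Corasick tables: goto, fail, depth, out (patterns ending at a state)."""
    goto, fail, depth, out = [{}], [0], [0], [[]]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                depth.append(depth[s] + 1)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(pat)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, depth, out


def step(ac, s, ch):
    """One automaton transition, following failure links as needed."""
    goto, fail = ac[0], ac[1]
    while s and ch not in goto[s]:
        s = fail[s]
    return goto[s].get(ch, 0)


def offline(ac, dictionary):
    """State array and Matched list ((end position, pattern) pairs)."""
    state, matched, s = [], [], 0
    for j, ch in enumerate(dictionary):
        s = step(ac, s, ch)
        state.append(s)
        matched += [(j, pat) for pat in ac[3][s]]
    return state, matched


def scan_delta(ac, dictionary, state, matched, delta):
    """Return (end position in the plain text, pattern) without decompressing."""
    _, fail, depth, out = ac
    s, pos, hits = 0, 0, []          # pos = plain-text symbols produced so far
    for op, a, b in delta:
        if op == "ADD":              # ("ADD", i, str): ordinary traversal
            for ch in b:
                s = step(ac, s, ch)
                hits += [(pos, pat) for pat in out[s]]
                pos += 1
            continue
        p, x, j = a, b, 0            # ("COPY", p, x)
        while j < x:                 # Step 1: scan until the state fits in the copy
            s = step(ac, s, dictionary[p + j])
            hits += [(pos + j, pat) for pat in out[s]]
            j += 1
            if depth[s] <= j:
                break
        # Step 2: precomputed matches that lie entirely inside the unscanned part
        hits += [(pos + e - p, pat) for e, pat in matched
                 if p + j <= e < p + x and e - len(pat) + 1 >= p]
        if j < x:                    # Step 3: jump to the state at the end of the copy
            s = state[p + x - 1]
            while depth[s] > x:
                s = fail[s]
        pos += x
    return hits


def decode(dictionary, delta):
    """Reference decoder, used only to cross-check the sketch."""
    return "".join(b if op == "ADD" else dictionary[a:a + b] for op, a, b in delta)


PATTERNS = ["E", "BE", "BD", "BCD", "BCAA", "CDBCAB"]
DICT = "DBEAACDBCABC"
ac = build(PATTERNS)
state, matched = offline(ac, DICT)
delta = [("ADD", 1, "B"), ("COPY", 2, 3), ("ADD", 1, "B"), ("COPY", 6, 5), ("COPY", 1, 2)]
print(decode(DICT, delta))                                  # BEAABDBCABBE
print(sorted(scan_delta(ac, DICT, state, matched, delta)))
# [(1, 'BE'), (1, 'E'), (5, 'BD'), (11, 'BE'), (11, 'E')]
```

In this illustrative delta, BE and BD straddle an ADD/COPY boundary and are found in Step 1, while the final BE and E lie inside COPY(1, 2) and come from the Matched list in Step 2. The second COPY skips its last three symbols entirely and resumes from State[10] after one failure transition.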
11-20
Dictionary: DBEAACDBCABC
Match List: (shown step by step as figures on slides 11-20; not reproduced in this transcript)
21
Optimizations
Efficient pattern lookups in the Matched list:
Save the Matched list as a balanced tree.
Add an array of pointers of the dictionary size.
Given a COPY(p, x) instruction, one can cache the corresponding internal matches within [p, p+x-1] in a hash table whose key is “(p, x)”.
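A rough sketch of the lookup and caching ideas follows. It swaps the balanced tree for a sorted list searched with binary search (Python's bisect module), which gives the same logarithmic range lookup for a list that is fixed after the offline phase. The class name, method name, and the (end position, pattern) layout are assumptions made for this sketch.

```python
from bisect import bisect_left, bisect_right


class MatchIndex:
    """Range lookups over the Matched list, with a per-(p, x) cache."""

    def __init__(self, matched):
        # matched: (end_position, pattern) pairs found in the dictionary offline
        self.matched = sorted(matched)
        self.ends = [e for e, _ in self.matched]
        self.cache = {}                      # key: (p, x)

    def internal_matches(self, p, x):
        """Patterns lying entirely inside the dictionary interval [p, p + x - 1]."""
        key = (p, x)
        if key not in self.cache:
            lo = bisect_left(self.ends, p)
            hi = bisect_right(self.ends, p + x - 1)
            self.cache[key] = [(e, pat) for e, pat in self.matched[lo:hi]
                               if e - len(pat) + 1 >= p]
        return self.cache[key]


# Matched list for dictionary DBEAACDBCABC and patterns {E, BE, BD, BCD, BCAA, CDBCAB}
index = MatchIndex([(2, "E"), (2, "BE"), (10, "CDBCAB")])
print(index.internal_matches(0, 5))   # [(2, 'BE'), (2, 'E')]
print(index.internal_matches(2, 3))   # [(2, 'E')]
print(index.internal_matches(5, 6))   # [(10, 'CDBCAB')]
```

The cache pays off when the same COPY(p, x) instruction recurs across responses, since the filtered result is then reused as-is.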
22
REGULAR EXPRESSIONS INSPECTION
Anchors are extracted from the regular expression offline. Our algorithm is then applied to the SDCH-compressed traffic with the anchors as the pattern set.
For example, for the regular expression \d{6}ABCDE\s+123456\d*XYZ$, if we match the anchor ABCDE at position x1 and the anchor XYZ at position x2, the interval [x1-10, x2] should be passed to the regular expression engine for re-examination.
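The interval arithmetic in this example can be checked with a few lines of Python. The sketch assumes that x1 and x2 are the positions of the last symbol of each anchor, which is consistent with the offset of 10 (six digits plus the five letters of ABCDE, minus one). The function name recheck and the sample strings are made up, and Python's re module stands in for the regular expression engine.

```python
import re

REGEX = re.compile(r"\d{6}ABCDE\s+123456\d*XYZ$")


def recheck(text, x1, x2):
    """Re-examine only text[x1 - 10 .. x2] once both anchors have matched.

    x1: index of the last symbol of the anchor ABCDE (assumed convention)
    x2: index of the last symbol of the anchor XYZ
    """
    if x2 != len(text) - 1:
        return False              # '$' requires XYZ to end the input; slicing would hide that
    start = max(0, x1 - 10)       # 6 digits + 5 letters of ABCDE end at x1
    return REGEX.match(text[start:x2 + 1]) is not None


good = "q=123456ABCDE  123456789XYZ"
x1 = good.index("ABCDE") + 4
x2 = good.index("XYZ") + 2
print(recheck(good, x1, x2))      # True

bad = "q=12345ABCDE 123456XYZ"    # only five digits before the anchor
print(recheck(bad, bad.index("ABCDE") + 4, bad.index("XYZ") + 2))   # False
```

Only the window between the anchors reaches the regular expression engine, so traffic in which the anchors never appear is never handed to it.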
23
Experimental Results
Data Sets: We first downloaded the dictionary from google.com and used the 1000 most popular Google search queries.
Pattern Sets: The signature data sets are drawn from a snapshot of Snort rules as of October 2010. We also constructed a synthetic patterns file for each input file.
24-27
Experimental Results (figures on slides 24-27 are not reproduced in this transcript)