SHOCK: A Worst-Case Ensured Sub-linear Time Pattern Matching Algorithm for Inline Anti-Virus Scanning
Authors: Nen-Fu Huang, Wen-Yen Tsai
Publisher: IEEE ICC, 2010
Presenter: Kai-Yang, Liu
Date: 2012/1/4
INTRODUCTION
Challenges of an inline multi-pattern matching algorithm:
It must be fast enough to scan millions of packets in a gigabit environment.
It should have a small memory footprint so that it scales well with the ever-growing set of virus patterns.
It must perform well under a high volume of virus-infected traffic to avoid becoming the bottleneck.
ClamAV
ClamAV provides an anti-virus engine and a regularly updated virus database.
ClamAV virus signatures fall into four categories: basic, regular expression (regex), MD5, and others.
Basic Patterns
Long minimum length (>= 10 bytes)
Average pattern length >= 25 bytes
The Proposed SHOCK Algorithm
The SHOCK (Shift/Hash with Overlap Check) algorithm consists of an offline preprocessing phase and an online pattern matching phase.
The shift table is constructed using the same approach as in the WM (Wu-Manber) algorithm with a block size of two, and the hash value is calculated from the 2-byte prefix of each pattern.
Example
m = 4, B = 2
totorose
Shift table (block: shift):
ot: 0
to: 0
se: 0
oo: 1
os: 1
ro: 2
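The shift-table construction can be sketched as follows. This is a minimal illustration of the Wu-Manber approach with B = 2, not the authors' implementation. The pattern set {toto, toot, rose} is an assumption: the slide does not list its patterns, and this set was chosen only because it reproduces the table shown. Using the two prefix bytes directly as the hash key is also an assumption, since the slides do not give the hash function. The names build_shift_table and build_prefix_hash are hypothetical.

```python
from collections import defaultdict


def build_shift_table(patterns, m, B=2):
    """Wu-Manber style shift table over the first m characters of each pattern."""
    default_shift = m - B + 1  # shift for any block that occurs in no pattern
    shift = {}
    for pat in patterns:
        prefix = pat[:m]
        for end in range(B, m + 1):
            block = prefix[end - B:end]
            dist = m - end  # distance from the block's end to the window's end
            shift[block] = min(shift.get(block, default_shift), dist)
    return shift, default_shift


def build_prefix_hash(patterns, B=2):
    """Group patterns by their B-byte prefix (assumed stand-in for the hash)."""
    buckets = defaultdict(list)
    for pat in patterns:
        buckets[pat[:B]].append(pat)
    return dict(buckets)


# Assumed pattern set; it reproduces the shift values on the slide.
patterns = ["toto", "toot", "rose"]
shift_table, default_shift = build_shift_table(patterns, m=4, B=2)
prefix_hash = build_prefix_hash(patterns)
```

Each block keeps the smallest distance seen across all patterns, which is why "to" and "ot" both end up with shift 0 here.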
When a matched pattern is found, the text may contain another pattern immediately after it whose prefix overlaps the suffix of the pattern just matched.
Example
totorose
Shift table (block: shift):
ot: 0
to: 0
se: 0
oo: 1
os: 1
ro: 2
For a pattern to be stored in the nextPat list of the current pattern, the number of its prefix characters that overlap the suffix of the current pattern must be greater than or equal to P_SP_TH.
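The nextPat rule above can be sketched as a simple offline pass. This is an illustrative quadratic-time version under stated assumptions, not the authors' preprocessing code: the function names, the example patterns, and the choice to count only proper overlaps (shorter than both patterns) are all hypothetical.

```python
def overlap_len(cur, nxt):
    """Longest k where the last k chars of cur equal the first k chars of nxt.

    Assumption: only proper overlaps are counted (k shorter than both patterns).
    """
    max_k = min(len(cur), len(nxt)) - 1
    for k in range(max_k, 0, -1):
        if cur.endswith(nxt[:k]):
            return k
    return 0


def build_next_pat(patterns, sp_th):
    """Map each pattern to the patterns that overlap its suffix by >= sp_th chars."""
    next_pat = {p: [] for p in patterns}
    for cur in patterns:
        for nxt in patterns:
            if overlap_len(cur, nxt) >= sp_th:
                next_pat[cur].append(nxt)
    return next_pat


# Hypothetical patterns, with sp_th standing in for the threshold parameter.
example_patterns = ["abcdefgh", "fghijklm", "ghxyzabc"]
next_pat = build_next_pat(example_patterns, sp_th=3)
```

Here "fghijklm" lands in the nextPat list of "abcdefgh" because they share the three characters "fgh", while "ghxyzabc" is left out because it overlaps by only two.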
Although only a small number of patterns have a long nextPat list when P_SP_TH = 8, they must be handled specially to avoid the worst-case scenario.
A bitmap-offset-indexing structure is used only for those patterns whose nextPat list length is greater than the parameter P_BMAP_TH.
Example
P_SP_TH = 3
back_sh = P_SP_TH - 1 = 2
totorose
EXPERIMENTAL RESULTS