Space-Time Tradeoffs in Software-Based Deep Packet Inspection Anat Bremler-Barr Yotam Harchol ⋆ David Hay IDC Herzliya, Israel Hebrew University, Israel.

Presentation transcript:

Slide 1: Space-Time Tradeoffs in Software-Based Deep Packet Inspection
Anat Bremler-Barr, Yotam Harchol ⋆, David Hay
IDC Herzliya, Israel; Hebrew University, Israel
IEEE HPSR 2011
Parts of this work were supported by European Research Council (ERC) Starting Grant no
⋆ Supported by the Check Point Institute for Information Security

Slide 2: Outline
- Motivation
- Background
- New Compression Techniques
- Experimental Results
- Conclusions

Slide 3: Network Intrusion Detection Systems
Classify packets according to:
- Header fields: source IP and port, destination IP and port, protocol, etc.
- Packet payload (data)
[Figure labels: Internet, IP packet, Deep Packet Inspection]

Slide 4: Deep Packet Inspection: The Environment
- (D)RAM: high capacity, slow memory
- Cache memory: locality-based, low capacity, fast memory

Slide 5: Our Contributions
- Literature assumption: try to fit the data structure in cache, hence the efforts to compress the data structures
- Our paper: is it beneficial?
- In reality, even in a non-compressed implementation, most memory accesses go to the cache
- BUT one can attack the non-compressed implementation by reducing its locality, pushing it out of the cache and making it much slower
- How to mitigate this attack? Compress even further. Our new techniques: 60% less memory

Slide 6: Complexity DoS Attack
- Find a gap between average case and worst case
- Engineer input that exploits this gap
- Launch a Denial of Service attack on the system
[Figure labels: Internet, Real-Life Traffic, Throughput]

Slide 8: Aho-Corasick Algorithm [Aho, Corasick; 1975]
- Build a Deterministic Finite Automaton (DFA)
- Traverse the DFA, byte by byte
- Accepting state: pattern found
Example pattern set: {E, BE, BD, BCD, CDBCAB, BCAA}
[Figure: the DFA for the example, states s0 to s14. For the input BCDBCAB the highlighted states are s0, s2, s5, s6, s9, s10, s11, s12.]
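As a minimal sketch of the three steps on this slide (not the authors' implementation), the Python below builds the full DFA for the example pattern set and scans the example input. The names `build_trie`, `build_dfa` and `search`, the dictionary-based state layout, and the restriction of the alphabet to the five symbols A to E are assumptions made for illustration.

```python
from collections import deque

PATTERNS = ["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"]

def build_trie(patterns):
    """goto[s] maps a symbol to a child state; out[s] holds patterns ending at s."""
    goto, out = [{}], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    return goto, out

def build_dfa(patterns, alphabet):
    """Fill in every (state, symbol) transition so scanning never backtracks."""
    goto, out = build_trie(patterns)
    fail = [0] * len(goto)
    delta = [dict() for _ in goto]
    for a in alphabet:                       # the root loops on unknown symbols
        delta[0][a] = goto[0].get(a, 0)
    queue = deque(goto[0].values())          # breadth-first, shallow states first
    while queue:
        s = queue.popleft()
        for a in alphabet:
            if a in goto[s]:
                t = goto[s][a]
                fail[t] = delta[fail[s]][a]  # longest proper suffix that is a prefix
                out[t] |= out[fail[t]]       # inherit matches ending at that suffix
                delta[s][a] = t
                queue.append(t)
            else:
                delta[s][a] = delta[fail[s]][a]
    return delta, out

def search(delta, out, text):
    """One table lookup per input symbol; returns (start offset, pattern) pairs."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        s = delta[s][ch]
        hits.extend((i - len(p) + 1, p) for p in sorted(out[s]))
    return hits

delta, out = build_dfa(PATTERNS, "ABCDE")
print(len(delta), search(delta, out, "BCDBCAB"))
```

The example yields 15 states, matching s0 to s14 in the figure, and reports BCD at offset 0 and CDBCAB at offset 1 for the slide's input.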

Slide 9: Aho-Corasick Algorithm: Naïve Implementation [Aho, Corasick; 1975]
- Represent the transition function in a table of |Σ|×|S| entries
  - Σ: alphabet
  - S: set of states
- Lookup time: one memory access per input symbol
- Space: in reality, 70MB to gigabytes
[Table: one row per state (S0, S1, ...), one column per symbol (A to E)]
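To make the space figure concrete, here is a small back-of-the-envelope sketch. The state counts are the ones reported later in the experimental setup; the 256-symbol byte alphabet and the 4-byte table entry are assumptions, as are the helper names `table_bytes` and `flat_index`.

```python
def table_bytes(num_states, alphabet_size=256, entry_bytes=4):
    """Size of a dense |Σ| x |S| transition table (entry width is assumed)."""
    return num_states * alphabet_size * entry_bytes

def flat_index(state, symbol, alphabet_size=256):
    """Row-major position of the (state, symbol) entry: a single array access."""
    return state * alphabet_size + symbol

for name, states in [("Snort", 77_182), ("ClamAV (half of signatures)", 745_303)]:
    print(f"{name}: {table_bytes(states) / 1e6:.1f} MB")
```

Under these assumptions the Snort table comes to about 79 MB and the ClamAV table to about 763 MB, which is in line with the "70MB to gigabytes" range on the slide.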

Slide 10: Potential Complexity DoS Attack
1. Exhaustive Traversal Adversarial Traffic
   - Traverses as many states of the automaton as possible
   - Bad locality: bad for the naïve implementation (will not utilize the cache)
[Figure: the example automaton, states s0 to s14]

Slide 11: Alternative Implementation [Aho, Corasick; 1975]
- A failure transition goes to the state that matches the longest suffix of the input so far
- Lookup time: at most two memory accesses per input symbol (via amortized analysis)
- Space: at most the number of symbols in the pattern set, depending on the implementation
[Figure: the example automaton with forward transitions and failure transitions; highlighted states: s10, s5, s7, s0, s1]
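The failure-transition variant can be sketched as follows. This is an illustrative Python version under assumed names (`build_ac`, `scan`); the `accesses` counter treats each forward or failure transition as one memory access, which is a simplification of a real memory layout.

```python
from collections import deque

PATTERNS = ["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"]

def build_ac(patterns):
    """Trie of forward transitions plus one failure pointer per state."""
    goto, out = [{}], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())          # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for a, t in goto[s].items():
            f = fail[s]
            while f and a not in goto[f]:    # climb until a state has a forward edge on a
                f = fail[f]
            fail[t] = goto[f].get(a, 0)
            out[t] |= out[fail[t]]
            queue.append(t)
    return goto, fail, out

def scan(goto, fail, out, text):
    s, hits, accesses = 0, [], 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
            accesses += 1                    # failure transition
        s = goto[s].get(ch, 0)
        accesses += 1                        # forward transition (or stay at the root)
        hits.extend((i - len(p) + 1, p) for p in sorted(out[s]))
    return hits, accesses

goto, fail, out = build_ac(PATTERNS)
hits, accesses = scan(goto, fail, out, "BCDBCAB")
print(hits, accesses)
```

Each forward transition increases the depth by at most one and each failure transition decreases it by at least one, so the number of failure transitions never exceeds the number of input symbols. That is the amortized bound of two accesses per symbol; on the 7-symbol example input the sketch counts 8.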

Slide 12: Potential Complexity DoS Attack
1. Exhaustive Traversal Adversarial Traffic
   - Traverses as many states of the automaton as possible
   - Bad locality: bad for the naïve implementation (will not utilize the cache)
2. Failure-path Traversal Adversarial Traffic
   - Traverses as many failure transitions as possible
   - Bad for a failure-path based automaton (many memory accesses per input symbol)
[Figure: the example automaton, states s0 to s14]

Slide 13: Prior Work: Compress the State Representation
Three encodings of a single state, each holding its forward transitions, a failure pointer and a match flag:
- Lookup Table: one forward entry per symbol (A to E in the example)
- Bitmap Encoded: a bitmap of length |Σ| marks the symbols that have forward transitions; bits can be counted using the popcnt instruction
- Linear Encoded: a size field followed by the list of symbols that have forward transitions
[Figure: the example automaton and the three encodings of one of its states]
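The bitmap encoding can be illustrated with a short sketch. The class name `BitmapState` and its fields are hypothetical, and `bin(...).count("1")` stands in for the hardware popcnt instruction. The example state uses the root's three forward transitions (B to s2, C to s7, E to s1) shown on the pointer-compression slide.

```python
class BitmapState:
    """One state: a |Σ|-bit bitmap plus a packed array of forward targets."""

    def __init__(self, transitions, fail, match=False):
        self.bitmap = 0
        self.targets = []
        for sym in sorted(transitions):      # targets are stored in symbol order
            self.bitmap |= 1 << sym
            self.targets.append(transitions[sym])
        self.fail = fail
        self.match = match

    def forward(self, sym):
        """Return the forward target for sym, or None if the caller must follow fail."""
        if not (self.bitmap >> sym) & 1:
            return None
        below = self.bitmap & ((1 << sym) - 1)           # bits of smaller symbols
        return self.targets[bin(below).count("1")]       # software popcount

root = BitmapState({ord("B"): 2, ord("C"): 7, ord("E"): 1}, fail=0)
print(root.forward(ord("C")), root.forward(ord("A")))
```

The number of set bits below the symbol's position is exactly its index in the packed target array, so the state stores only as many targets as it has forward transitions.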

Slide 15: Path Compression
- One-way branches can be represented using a single state, similarly to PATRICIA tries
- Problem: incoming failure transitions
- Solution: compress only states with no incoming failure transitions
[Figure: the example automaton before and after compression; states s9 to s12 are merged into a single state s9' reached by the path BCAB, which keeps the failure pointers of the intermediate states (B), (BC), (BCA), (BCAB)]
[Chart: relative memory footprint. Tuck et al. (2004): 100%; our path compression: 75%]
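The rule on this slide can be sketched as a search for maximal one-way chains whose states have no incoming failure transition. The code below is an illustrative reading of that rule, not the paper's algorithm: the names `build_ac` and `compressible_chains` are assumptions, and stopping a chain at accepting states is an extra simplification introduced here.

```python
from collections import deque

PATTERNS = ["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"]

def build_ac(patterns):
    """Trie with failure pointers; label[s] is the string spelled by state s."""
    goto, accept, label = [{}], [False], [""]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                accept.append(False)
                label.append(label[s] + ch)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        accept[s] = True
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for a, t in goto[s].items():
            f = fail[s]
            while f and a not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(a, 0)
            queue.append(t)
    return goto, fail, accept, label

def compressible_chains(goto, fail, accept, label):
    """Maximal one-way chains in which no state has an incoming failure pointer."""
    has_incoming_fail = {fail[s] for s in range(1, len(goto))}
    parent = {t: s for s in range(len(goto)) for t in goto[s].values()}

    def mergeable(s):                        # can s absorb its only child?
        if s == 0 or s in has_incoming_fail or accept[s] or len(goto[s]) != 1:
            return False
        (child,) = goto[s].values()
        return child not in has_incoming_fail

    chains = []
    for s in range(1, len(goto)):
        if mergeable(s) and not mergeable(parent[s]):    # s starts a chain
            chain = [s]
            while mergeable(chain[-1]):
                (child,) = goto[chain[-1]].values()
                chain.append(child)
            chains.append([label[x] for x in chain])
    return chains

chains = compressible_chains(*build_ac(PATTERNS))
print(chains)
```

On the example pattern set the only chain is CDB, CDBC, CDBCA, CDBCAB, which corresponds to the four states merged into s9' in the figure and reduces the automaton from 15 to 12 states.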

Slide 16: Pointer Compression
In the Snort IDS pattern-set, 79% of the fail pointers point to states in depths 0, 1, 2:

Depth      Pointers
0 (s0)     13%
1          31%
2          35%
≥ 3        21%

Add two bits to encode the depth of the pointer:
- 00: Depth 0
- 01: Depth 1
- 10: Depth 2
- 11: Depth 3 and deeper

Resulting pointer width:
- Depth ≤ 2: the 16-bit pointer is replaced by the 2 depth bits
- Depth > 2: the 2 depth bits (11) are followed by the 16-bit pointer
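A quick worked calculation shows what this encoding does to the average failure-pointer width. It uses only the depth distribution and the bit widths from the slide; the variable names are illustrative, and the cost of the shared lookup structures and of the other per-state fields is deliberately left out.

```python
# Share of failure pointers by target depth in the Snort pattern set (from the slide).
DEPTH_SHARE = {0: 0.13, 1: 0.31, 2: 0.35, 3: 0.21}   # key 3 stands for depth >= 3
TAG_BITS, POINTER_BITS = 2, 16

def pointer_bits(depth):
    """Bits stored per failure pointer under the 2-bit depth tag scheme."""
    return TAG_BITS if depth <= 2 else TAG_BITS + POINTER_BITS

average = sum(share * pointer_bits(d) for d, share in DEPTH_SHARE.items())
print(f"average failure pointer: {average:.2f} bits instead of {POINTER_BITS}")
```

That is 0.79 × 2 + 0.21 × 18 = 5.36 bits on average, compared with 16 bits for an uncompressed pointer.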

Slide 17: Pointer Compression
Determine the next state from the pointer depth:
- 0: Go to the root
- 1: Use a lookup table indexed by the last symbol
- 2: Use a hash table keyed by the last two symbols
- ≥ 3: Use the stored pointer

Depth 1 lookup table:

Symbol     State
A          -
B          s2
C          s7
D          -
E          s1

Depth 2 hash table: last 2 symbols → next state

[Chart: relative memory footprint. Tuck et al. (2004): 100%; our path compression: 75%; pointer compression: 41%]
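The decoding rule above can be sketched in a few lines. The depth-1 table is the one on the slide. The depth-2 entries for BC (s5) and CD (s8) are read off the example figures, while the numbering of BE and BD as s3 and s4 is an assumption. A Python dict stands in for the hash table, and `encode` and `resolve` are hypothetical helper names.

```python
ROOT = 0
DEPTH1 = {"B": 2, "C": 7, "E": 1}                    # lookup table from the slide
DEPTH2 = {"BC": 5, "CD": 8, "BE": 3, "BD": 4}        # s3/s4 numbering is assumed

def encode(target_depth, target_state):
    """Compressed failure pointer: a 2-bit depth tag, plus the state only if depth >= 3."""
    tag = min(target_depth, 3)
    return (tag, target_state if tag == 3 else None)

def resolve(pointer, last_two):
    """Recover the failure target from the tag and the last two input symbols."""
    tag, stored = pointer
    if tag == 0:
        return ROOT
    if tag == 1:
        return DEPTH1[last_two[-1]]
    if tag == 2:
        return DEPTH2[last_two]
    return stored

# Failure pointer of the state reached after reading "CDBC": its target is BC (s5).
print(resolve(encode(2, 5), "BC"))
```

This works because a failure target at depth d is, by definition, the state spelled by the last d input symbols, so for d ≤ 2 the target can be recomputed from the input instead of being stored.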

Slide 19: Experimental Setup

Test Systems:

              System 1                        System 2
Type          MacBook Pro                     iMac
CPU           Core 2 Duo 2.53GHz dual core    Core i7 2.93GHz quad core
L1 Cache      16KB (data, per core)           16KB (data, per core)
L2 Cache      3MB (shared)                    256KB (per core)
L3 Cache      -                               8MB (shared)

Pattern-Sets:

                                  Snort      ClamAV*
Patterns                          31,094     16,710
States in naïve implementation    77,182     745,303

Real-life traffic logs taken from MIT DARPA.
* We used only half of the ClamAV signatures for our tests.

Slide 20: Space Requirement
[Chart: Memory Footprint [MB]]

Slide 21: Memory Accesses per Input Symbol
[Chart: memory accesses per input symbol]

Slide 22: L1 Data Cache Miss Rate
Intel Core 2 Duo (2 cores), 16KB L1 data cache, 3MB L2 cache
[Chart: L1 Data Cache Miss Rate]

Slide 23: L2 Cache Miss Rate
Intel Core 2 Duo (2 cores), 16KB L1 data cache, 3MB L2 cache
[Chart: L2 Cache Miss Rate, with annotations:]
- Real-life traffic: 0.7% L2 cache miss rate
- Adversarial traffic: 23% L2 cache miss rate
- Maximal L2 miss rate: 0.06%

Slide 24: Space vs. Time
[Chart legend: Our Implementation, Naïve Implementation]

Slide 26: Conclusions
It is crucial to model the cache in software-based Deep Packet Inspection:
- The naïve Aho-Corasick implementation has a huge memory footprint, but works well on real-life traffic due to locality of reference
- The naïve implementation can be easily attacked, making it 7 times slower, even though it has a constant number of memory accesses
We also show new compression techniques:
- 60% less memory than the best prior-art compression
- Stable throughput, better performance under attacks

Questions? Thank you!

Slide 28: Our Contributions
We analyze the Aho-Corasick algorithm, the standard for exact string-matching in Deep Packet Inspection:
- Single memory access per input symbol
- Throughput is independent of the input
We suggest:
- The Aho-Corasick algorithm does not run in constant time (throughput is dependent on the input!)
- A complexity attack on Aho-Corasick that exploits the cache-RAM architecture
- Several new compression techniques: 60% less memory

Slide 29: Our Contributions
Can data structures fit in cache?
- Literature: compress data structures to fit in cache
- Is it always beneficial?
- In reality, even in a non-compressed implementation, most memory accesses are to the cache
- One can attack the non-compressed implementation by reducing locality to get it out of the cache
- Several new compression techniques: 60% less memory

Slide 30: Path Compression
Tuck et al., 2004 (hardware solution):
- Compress one-way branches of some fixed length (e.g. 4) into a single transition
- Add a skip counter to each failure pointer
In software, we can compress one-way branches of any length:
- Problem: unbounded skip counter width
- Solution: compress only states which have no incoming failure transition
Results: 85% fewer states; about 25% space reduction on real-life pattern-sets
[Figures: the example automaton before and after compression. In the fixed-length scheme, failure pointers carry skip counters (Skip 0, Skip 1) and matches are reported on specific symbols (Match on A, Match on B). In our scheme, states s9 to s12 are merged into s9' reached by the path BCAB.]

Slide 31: Leaves Compression
- By definition, leaves have no forward transitions
- Their single purpose is to indicate a match
  - We can push this indication up by adding a bit to each pointer
  - Then, leaves can be eliminated from the automaton by copying their failure transition up
Results: 3% more space reduction; reduces the number of transitions taken
[Figures: the path-compressed example automaton before and after leaf elimination; transitions that indicate a match are marked with an asterisk (E*, D*, BCAB*, A*)]

Slide 32: Pointer Compression
In the Snort IDS pattern-set, 79% of the fail pointers point to states in depths 0, 1, 2. If we compact the representation of these pointers we can significantly reduce the memory footprint.
Original pointer width: log2 |S|
Variable-size pointers:
- Depth 0 (s0), 13% of pointers: go to s0
- Depth 1, 31% of pointers: use the last symbol to find the depth 1 state, using a global lookup table of |Σ| entries that links the last symbol to the next state
- Depth 2, 35% of pointers: use the last two symbols to find the depth 2 state. We cannot use a global table of |Σ|×|Σ| entries, as it is too big. Instead, use a global hash table to map the last two chars to the next state (of depth 2) (In Snort – 1524 hash entries replace pointers)
- Depth > 2 (only 21% of the pointers in Snort): keep the stored pointer
Result: 45% more space reduction over path compression!

Slide 33: Pointer Compression
In the Snort IDS pattern-set, 79% of the fail pointers point to states in depths 0, 1, 2 (depth 0 (s0): 13%, depth 1: 31%, depth 2: 35%).

Example: the compressed state s9' (path BCAB) holds 4 fail-pointers:

Symbol     Fail pointer     Depth     Encoding
B          s0               0         0000
C          s2               1         0010
A          s5               2         0101
B          s13              3

Depth 1 lookup table: A: -, B: s2, C: s7, D: -, E: s1
Depth 2 hash table: last 2 symbols → next state

In Snort:
- 9901 pointers to s0 are compressed to two bits
- Depth 1 lookup table: 256 pointers compress 23,745 pointers
- Depth 2 hash table: 1,524 hash entries compress 26,748 pointers
- Only 21% of the pointers are widened by two bits

[Chart: relative memory footprint. Tuck et al. (2004): 100%; our path compression: 75%; pointer compression: 41%]

Slide 34: Traffic Types
- Real-life traffic logs
  - Taken from MIT DARPA
- Exhaustive Traversal Adversarial Traffic
  - Traverses as many states of the automaton as possible
  - Bad locality: bad for the naïve automaton
- Failure-path Traversal Adversarial Traffic
  - Traverses as many failure transitions as possible
  - Bad for a failure-path based automaton
[Figure: the example automaton, states s0 to s14]

Slide 35: Experimental Results: The Impact of O(1) Lookup Complexity
[Chart legend: Linear Encoding (Compressed), Bitmap Encoding, Lookup Table, Naïve]

Slide 36: Experimental Results: Throughput - Snort
Intel Core 2 Duo, 16KB L1 data cache, 3MB L2 cache, two threads
[Chart: throughput under good-locality traffic vs. adversarial traffic; annotated drops: -88% and -30%]

Slide 37: Experimental Results: Throughput - ClamAV
Intel Core 2 Duo, 16KB L1 data cache, 3MB L2 cache, two threads
[Chart: throughput; annotated drops: -73% and -50%]
Not scalable: 40% slower than Snort

Slide 38: Experimental Results: Throughput
Intel Core 2 Duo, 16KB L1 data cache, 3MB L2 cache
[Charts: Throughput [Mbps] for Snort and for ClamAV, under good-locality traffic vs. adversarial traffic. Series in each chart: Linear Encoding (Compressed), Bitmap Encoding, Lookup Table, Linear Encoding (Non-Compressed), Naïve. Annotated drops: -88% and -30% (Snort), -73% and -50% (ClamAV)]

Slide 39: Experimental Results: Cache
Intel Core 2 Duo (2 cores), 16KB L1 data cache, 3MB L2 cache
[Chart legend: Linear Encoding (Compressed), Bitmap Encoding, Lookup Table, Naïve. Annotations:]
- Real-life traffic: 0.7% L2 cache miss rate
- Adversarial traffic: 23% L2 cache miss rate
- Maximal L2 miss rate: 0.06%

Slide 40: Experimental Results: Is It All Because of the Cache?
Intel Core i7 (4 cores), 16KB L1 data cache, 256KB L2 cache per core, 8MB L3 cache (shared)
- The naïve implementation achieves much higher throughput
- Still, adversarial traffic drops its throughput by 56%
- CPU cores only work on pattern matching. What if they had some more tasks?