
Advanced Algorithms for Fast and Scalable Deep Packet Inspection


1 Advanced Algorithms for Fast and Scalable Deep Packet Inspection
Sailesh Kumar, Jonathan Turner, John Williams

2 Why Regular Expression Acceleration?
RegEx are now widely used:
- Network intrusion detection systems (NIDS)
- Layer 7 switches, load balancing
- Firewalls, filtering, authentication and monitoring
- Content-based traffic management and routing
RegEx matching is expensive:
- Space: large amount of memory
- Bandwidth: requires 1+ state traversal per byte
RegEx is a performance bottleneck:
- In enterprise switches from Cisco and others
- In many security appliances, which use DFAs and 1+ GB of memory yet still achieve only sub-gigabit throughput
Need to accelerate RegEx!

Regular expressions specify signatures in NIDS: security has become a critical issue, and pattern-matching NIDS today use regex. They represent rules in application-layer switches and support application-level load balancing, and they specify rules in firewalls for traffic filtering, metering and monitoring. There are also applications where traffic is routed based on its content; for instance, all video traffic originating from Google can be identified with a specific pattern and then routed according to the service-level agreement. Regex matching is expensive because DFAs are fast but require a lot of memory, while NFAs are slow and cannot meet the performance requirements. We need one memory access per byte, which makes this one of the most expensive packet-processing tasks. Regex is the performance bottleneck in many systems that perform deep packet inspection: Cisco, for example, uses more than a gigabyte of memory and still achieves sub-gigabit throughput. Given the importance of deep packet inspection, there is a clear need to accelerate regex with dedicated regex acceleration engines.
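The "1+ state traversal per byte" cost above is just a transition-table lookup. As a minimal sketch (illustrative only: `build_dfa_ab` and the pattern "ab" are hypothetical, not one of the NIDS rule sets discussed here), a DFA matcher touches one full 256-entry row per input byte:

```python
# Minimal DFA matcher: one transition-table lookup per input byte.
# The three-state DFA below recognizes the pattern "ab" anywhere in
# the input; it is a hypothetical example, not a real NIDS rule.

def build_dfa_ab():
    # States: 0 = start, 1 = just saw 'a', 2 = accepting (saw "ab").
    # Each state stores a full 256-entry row, as on the slide.
    dfa = [[0] * 256 for _ in range(3)]
    for state in range(3):
        dfa[state][ord('a')] = 1   # an 'a' always begins a fresh match
    dfa[1][ord('b')] = 2           # 'a' followed by 'b' accepts
    return dfa

def dfa_match(dfa, accepting, data):
    state = 0
    for byte in data:
        state = dfa[state][byte]   # exactly one lookup per byte
        if state in accepting:
            return True
    return False
```

Even this toy automaton needs 3 x 256 table entries; with thousands of patterns and 50+ distinct transitions per state, this per-state cost is what the compression techniques on the following slides attack.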

3 Can we do better?
Regex matching is well studied in the compiler literature. What's different in networking?
- Construction time versus execution time (grep): traditionally, (construction + execution) time is the metric, but in the networking context execution time is critical
- Also, there may be thousands of patterns
DFAs are fast:
- But they can have an exponentially large number of states
- Algorithms exist to minimize the number of states
- Still: 1) low performance and 2) gigabytes of memory

Regex has been widely studied and used in the past, in particular by the compiler community, so we ask: can we still do better? Is anything wrong with the conventional approach? One concern is that the traditional regex performance benchmark is the construction time of the automaton plus the parsing (execution) time. In the networking context construction time is not that important, which changes the problem: we can afford a few hours or even days to construct an efficient data structure that enables high-speed parsing. A good starting point is clearly the DFA, since it is known to be the fastest way to perform regex matching. The problem is that when rules are complex the DFA becomes large, and when rules contain many closures and unions it explodes in size. It has also been shown that the number of states in a minimized DFA cannot be reduced further. So what can we do? How can we reduce the memory requirements? If we implement a dedicated custom solution such as an ASIC, reducing memory becomes critical, since the available bits are few and the required bandwidth is high.

4 Delayed Input DFA (D2FA), SIGCOMM’06
Many transitions:
- 256 transitions per state
- 50+ distinct transitions per state (real-world datasets)
- Need 50+ words per state
Goal: reduce the number of transitions in a DFA.
[Figure: a five-state DFA (states 1-5, transitions on a, b, c, d) for the three rules a+, b+c, c*d+, with 4 distinct transitions per state. Looking at state pairs, many transitions are common. How can they be removed?]

In this paper we propose methods to represent DFAs much more compactly while preserving parsing performance. Since we cannot reduce the number of states, we reduce the number of transitions. Each state has 256 transitions, and in many networking datasets (Cisco, Bro and Snort rules) there are 50+ distinct transitions per state, which makes table-compression techniques less effective. We need more than 50 words per state, where a word is a state identifier.

5 Delayed Input DFA (D2FA), SIGCOMM’06
Many transitions: 256 per state, 50+ distinct per state in real-world datasets, requiring 50+ words per state. Goal: reduce the number of transitions in a DFA.
[Figure: the five-state DFA for the three rules a+, b+c, c*d+ (4 distinct transitions per state), shown next to an alternative representation of the same automaton with far fewer labeled transitions.]
Fewer transitions, less memory.

6 D2FA Operation
Heavy edges are called default transitions. Take a default transition whenever a labeled transition is missing.
[Figure: the five-state DFA and the corresponding D2FA, side by side.]
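This lookup rule can be sketched as follows (a hypothetical toy automaton over the alphabet {a, b, c, d}, loosely modeled on the five-state example; the exact transition table is an assumption, not the paper's figure). Each state keeps only its distinct labeled transitions plus one default pointer:

```python
# Hypothetical toy D2FA over the alphabet {a, b, c, d}:
# state -> (labeled transitions, default state or None for the root).
d2fa = {
    1: ({'a': 1}, 3),
    2: ({'b': 2, 'c': 5}, 3),
    3: ({'a': 1, 'b': 2, 'c': 3, 'd': 4}, None),  # root of the default tree
    4: ({'d': 4}, 3),
    5: ({'d': 4}, 3),
}

def d2fa_next(state, ch):
    # Follow default transitions until a labeled transition on ch is
    # found; each hop is one extra memory access but consumes no input.
    while True:
        labeled, default = d2fa[state]
        if ch in labeled:
            return labeled[ch]
        if default is None:
            return state            # root covers the whole alphabet here
        state = default

def d2fa_run(start, data):
    state = start
    for ch in data:
        state = d2fa_next(state, ch)
    return state
```

From state 2 on input d there is no labeled transition, so the default edge to state 3 is followed first: two memory accesses for one input byte, which is exactly the overhead CD2FAs remove.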

7 D2FA versus DFA
D2FAs are compact but require multiple memory accesses:
- Up to 20x more memory accesses
- Not desirable in an off-chip architecture
Can D2FAs match the performance of DFAs? YES!
Content Addressed D2FAs (CD2FAs):
- Require only one memory access per byte
- Match the performance of a DFA in a cacheless system
- In systems with a data cache, CD2FAs are 2-3x faster
- CD2FAs are 10x more compact than DFAs

8 Introduction to CD2FA
How to avoid the multiple memory accesses of D2FAs?
- Avoid the lookup that decides whether the default path must be taken
- Avoid traversing the default path
Solution: assign a content label to each state, containing:
- The characters for which it has labeled transitions
- Information about all of its default states
- The characters for which its default states have labeled transitions
[Figure: states V, U and R on a default path. Root R carries the label "all"; U, with labeled transitions on c and d, carries "cd,R"; V, with labeled transitions on a and b, carries "ab,cd,R". Node R is found at location R, node U at hash(c,d,R), and node V at hash(a,b,hash(c,d,R)).]
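The content-addressing idea can be sketched like this (the hash function, label encoding and table size below are illustrative assumptions): a state's memory location is a hash of its content label, so a state's label alone reveals where its default state lives, with no extra lookup.

```python
import hashlib

def addr(chars, default_addr, table_size=1 << 20):
    # Location of a state = hash of its content label, i.e. the
    # characters of its labeled transitions plus its default state's
    # address. The encoding and table size are assumptions.
    label = chars + '|' + str(default_addr)
    h = hashlib.sha256(label.encode()).digest()
    return int.from_bytes(h[:4], 'big') % table_size

# Mirroring the slide: R sits at a known location, U at hash(c,d,R),
# and V at hash(a,b,hash(c,d,R)).
R_addr = 0
U_addr = addr('cd', R_addr)
V_addr = addr('ab', U_addr)
```

Given V's label "ab,cd,R", the traversal can recompute U's address from the label itself, which is how the extra default-path memory accesses disappear.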

9 Introduction to CD2FA (continued)
[Figure: two default trees stored in content-addressed memory, with entries indexed by (state, character) pairs such as (R, a), (R, b), …, (Z, a), (Z, b), (X, p), (X, q), (V, a), (V, b). In the first tree, V (label ab,cd,R) defaults to U (label cd,R), which defaults to root R (label all); U sits at hash(c,d,R) and V at hash(a,b,hash(c,d,R)). In the second, X (label pq,lm,Z) defaults to Y (label lm,Z), which defaults to root Z (label all); X sits at hash(p,q,hash(l,m,Z)).]
Example: the current state is V (label ab,cd,R) and the input character is d. The label shows that V itself has no labeled transition on d but its default state does, at location hash(c,d,R), so a single memory access there yields the next state X (label pq,lm,Z).

10 Construction of CD2FA
We seek to keep the content labels small. Twin objectives:
- Ensure that states have few labeled transitions
- Ensure that default paths are as short as possible
The D2FA construction heuristic based on a maximum-weight spanning tree creates long default paths, and simply limiting default path length leads to less space-efficient D2FAs. We propose a new heuristic, called CRO, to construct D2FAs. It runs in three phases: Construction, Reduction and Optimization. With a default path bound of 2 edges, the CRO algorithm constructs up to 10x more space-efficient D2FAs. CD2FAs are then constructed from these D2FAs.
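As a hedged sketch of the bounded-default-path idea only (this is not the actual CRO heuristic, whose three phases are more involved; the greedy edge selection below is an assumption for illustration): default edges are added between the state pairs sharing the most transitions, and any edge that would stretch a default path beyond the bound is rejected.

```python
def build_default_forest(transitions, bound=2):
    # transitions: {state: {char: next_state}} for a DFA.
    # Returns a default-edge map {child: parent} whose paths are <= bound.
    states = list(transitions)
    pairs = []
    for i, u in enumerate(states):
        for v in states[i + 1:]:
            shared = sum(1 for c, t in transitions[u].items()
                         if transitions[v].get(c) == t)
            if shared:
                pairs.append((shared, u, v))
    pairs.sort(reverse=True)          # most shared transitions first

    default = {}                      # child -> parent (default edge)
    height = {s: 0 for s in states}   # longest default path below s

    def root_and_depth(s):
        d = 0
        while s in default:
            s, d = default[s], d + 1
        return s, d

    for shared, u, v in pairs:
        for child, parent in ((u, v), (v, u)):
            if child in default:
                continue              # child already has a default edge
            r, d = root_and_depth(parent)
            if r == child:
                continue              # would create a cycle
            if d + 1 + height[child] > bound:
                continue              # default path would exceed the bound
            default[child] = parent
            s, h = parent, height[child] + 1
            while True:               # update heights up to the root
                height[s] = max(height[s], h)
                if s not in default:
                    break
                s, h = default[s], h + 1
            break
    return default
```

States connected by a default edge need only store the transitions they do not share with their parent, which is where the space saving comes from.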

11 Memory Mapping in CD2FA
So far we have assumed that hashing is collision free. States such as V (stored at hash(a,b,hash(c,d,R))) and X (stored at hash(p,q,hash(l,m,Z))) must map to distinct memory locations, but in practice two content labels can hash to the same location: a collision.

12 Collision-free Memory Mapping
[Figure: four states with content labels (abc, …), (pqr, …), (lmn, …) and (def, …), and four memory locations. Each state has more than one candidate location, obtained from alternative hashes of its label, e.g. hash(lmn, …) and hash(mln, …), or hash(def, …) and hash(edf, …); the goal is to choose one location per state so that no two states collide.]
We need a systematic approach.

13 Bipartite Graph Matching
- Left nodes are state content labels; right nodes are memory locations
- Goal: map each state label to a unique memory location
- There is an edge for every candidate choice of content label
This is a perfect matching problem. With n left and n right nodes, O(log n) random edges per node are needed; n = 1M implies we need ~20 edges per node. With slight memory over-provisioning, we can uniquely map state labels with far fewer edges. In our experiments, we found a perfect matching without any memory over-provisioning.
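A minimal sketch of this mapping (the salted-hash candidate generation and the parameters k and num_slots are assumptions, not the paper's construction): each label hashes to a few candidate memory locations, its edges in the bipartite graph, and a standard augmenting-path matching assigns every label a unique slot.

```python
import hashlib

def candidates(label, num_slots, k=4):
    # k candidate slots per label, derived from k salted hashes
    # (hypothetical scheme; k and num_slots are tuning assumptions).
    out = []
    for salt in range(k):
        h = hashlib.sha256(f"{salt}:{label}".encode()).digest()
        out.append(int.from_bytes(h[:4], 'big') % num_slots)
    return out

def match_labels(labels, num_slots, k=4):
    edges = {lab: candidates(lab, num_slots, k) for lab in labels}
    slot_of = {}                     # slot -> label currently assigned

    def try_assign(lab, seen):
        # Classic augmenting-path step: claim a free slot, or evict
        # and reroute the current occupant along an alternate edge.
        for s in edges[lab]:
            if s in seen:
                continue
            seen.add(s)
            if s not in slot_of or try_assign(slot_of[s], seen):
                slot_of[s] = lab
                return True
        return False

    for lab in labels:
        if not try_assign(lab, set()):
            return None              # no perfect matching: over-provision
    return {lab: s for s, lab in slot_of.items()}
```

If the matching fails, slightly enlarging num_slots (memory over-provisioning) or raising k (more edges per node) makes success overwhelmingly likely, which mirrors the trade-off stated above.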

14 Memory Reduction Results

15 Throughput Results
[Figure: throughput comparison; with a 4 KB data cache, CD2FAs are 3x faster.]

16 Conclusion
We have proposed CD2FAs, which:
- Match or surpass a DFA in throughput
- Use 10x less memory than a table-compressed DFA
- Rely on a novel randomized memory-mapping algorithm based on maximum matching in a bipartite graph
- Incur zero space overhead and zero bandwidth overhead
Thank you. Questions?

