Memory Compression Algorithms for Networking Features Sailesh Kumar.



2 - Sailesh Kumar - 11/28/2015
Outline
- Regular expression based packet content inspection (main focus)
  - D²FA
  - CD²FA
- Packet header processing
  - HEXA (History-based Encoding, eXecution and Addressing)

3 - Why care about Regular Expressions?
- Widely used
  - Network intrusion detection systems (NIDS)
  - Layer 7 switches, load balancing
  - Firewalls, filtering, authentication and monitoring
  - Content-based traffic management and routing
- Expensive
  - Space: large amount of memory
  - Bandwidth: requires 1+ state traversals per byte
- Performance bottleneck
  - In enterprise switches, etc.
  - Security appliances
    - Use DFAs with 1+ GB of memory, yet still reach only sub-gigabit throughput
  - Need to accelerate RegEx!

4 - Can we do Better?
- Well studied in compiler literature
  - What's different in networking?
  - Can we do better?
- Performance metric (grep)
  - Traditionally, (construction + execution) time is the metric
  - In the networking context, execution time is critical
  - Also, there may be thousands of patterns
- DFAs are fast
  - But can have an exponentially large number of states
  - Algorithms exist to minimize the number of states
  - Still 1) low performance and 2) gigabytes of memory
- How to achieve high performance?
  - Use ASIC/FPGA
    - On-chip memories provide ample bandwidth
    - Volume and need for speed justify a custom solution
  - Limited memory, so a space-efficient representation is needed!

5 - Introduction to Our Approach
- How to represent DFAs more compactly?
  - Can't reduce the number of states
  - How about reducing the number of transitions?
    - 256 transitions per state
    - 50+ distinct transitions per state (real-world datasets)
    - Need at least 50+ words per state
[Figure: DFA for three rules (a+, b+c, c*d+); 4 transitions per state]
Look at state pairs: there are many common transitions. How to remove them?

6 - Introduction to Our Approach
[Figure: the same three-rule DFA (a+, b+c, c*d+; 4 transitions per state) next to an alternative representation of it]
Alternative representation: fewer transitions, less memory

7 - D²FA Operation
[Figure: DFA and D²FA for the three rules (a+, b+c, c*d+), traced on the input stream "a b d"]
- Heavy edges are called default transitions
- Take a default transition whenever a labeled transition is missing
- The DFA and the D²FA visit the same accepting state after consuming a character
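The default-transition rule on this slide can be sketched in a few lines of Python. The three-state automaton over {a, b} below is a made-up example rather than the slide's figure, and the names `DFA`, `LABELED`, `DEFAULT`, `d2fa_step` and `run` are illustrative assumptions.

```python
# Hypothetical 3-state DFA over the alphabet {a, b} (not the slide's example).
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}

# Equivalent D2FA: state 0 keeps all its transitions; states 1 and 2 keep only
# the transitions that differ from state 0 and point to it by default.
LABELED = {0: {"a": 1, "b": 0}, 1: {"b": 2}, 2: {}}
DEFAULT = {0: None, 1: 0, 2: 0}


def d2fa_step(state, ch):
    """Consume one character; return (next_state, default_hops)."""
    hops = 0
    while ch not in LABELED[state]:
        state = DEFAULT[state]  # labeled transition missing: follow default
        hops += 1
    return LABELED[state][ch], hops


def run(text, state=0):
    """Feed a string through the D2FA and return the final state."""
    for ch in text:
        state, _ = d2fa_step(state, ch)
    return state
```

In this toy case the DFA stores 6 transitions while the D2FA stores 3 labeled and 2 default transitions, at the price of an extra hop whenever a labeled transition is missing.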

8 - D²FA Operation
[Figure: the three-rule DFA with two alternative sets of default transition trees]
- Any set of default transitions will suffice as long as there are no cycles of default transitions
- Thus, we need to construct trees of default transitions
- The two alternative sets of default transition trees shown are also correct
  - However, we may traverse 2 default transitions to consume a character
  - Thus, we need to do more work => lower performance
- So, how do we construct space-efficient D²FAs while keeping default paths bounded?
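The two conditions on this slide (no cycles, bounded default paths) can be checked mechanically. The following is a minimal sketch, assuming the default transitions are given as a dictionary from each state to its default target (`None` for a root); the function name is hypothetical.

```python
def default_path_lengths(default):
    """Map each state to the number of default hops needed to reach a root.

    `default[s]` is the default target of state s, or None for a root.
    Raises ValueError if the default transitions contain a cycle.
    """
    lengths = {}
    for start in default:
        path = []
        seen = set()
        s = start
        # Walk up until we reach a root or a state whose length is known.
        while s is not None and s not in lengths:
            if s in seen:
                raise ValueError(f"cycle of default transitions through state {s}")
            seen.add(s)
            path.append(s)
            s = default[s]
        base = 0 if s is None else lengths[s] + 1
        # Unwind: the last state on the path is the one closest to the root.
        for offset, state in enumerate(reversed(path)):
            lengths[state] = base + offset
    return lengths
```

The largest value returned, plus one for the labeled transition itself, gives the worst-case number of transitions traversed per input character.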

9 - D²FA Construction
- Present a systematic approach to construct a D²FA
- Begin with a state-minimized DFA
- Construct the space reduction graph
  - Undirected graph; vertices are the states of the DFA
  - Edges exist between vertices with common transitions
  - Weight of an edge = # of common transitions
[Figure: example DFA from which the space reduction graph is built]
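As a rough illustration of the edge weights, the sketch below counts, for every pair of states, the characters on which both states move to the same next state. The DFA is a hypothetical three-state example, and `space_reduction_graph` is an assumed name; any extra filtering of low-weight edges that the authors may apply is not modeled.

```python
from itertools import combinations


def space_reduction_graph(dfa):
    """Return {(u, v): weight} for every state pair with a common transition.

    `dfa[s][ch]` is the next state of s on character ch. Two states share a
    transition on ch when both go to the same next state on ch.
    """
    edges = {}
    for u, v in combinations(sorted(dfa), 2):
        weight = sum(1 for ch in dfa[u] if dfa[v].get(ch) == dfa[u][ch])
        if weight > 0:
            edges[(u, v)] = weight
    return edges


# Hypothetical 3-state DFA over {a, b}; not the automaton from the slide.
EXAMPLE_DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
```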

10 - D²FA Construction
- Convert certain edges into default transitions
  - A default transition removes w transitions (w = weight of the edge)
  - If we pick high-weight edges => more space reduction
  - Find a maximum weight spanning forest
  - Tree edges become the default transitions
- Problem: the spanning tree may have a very large diameter
  - Longer default paths => lower performance
[Figure: spanning tree over the example DFA with its root marked; # of transitions removed = 11]
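The spanning-forest step can be sketched with a textbook Kruskal's algorithm run on edges in decreasing weight order. This is the unbounded variant described on this slide, not the authors' bounded-diameter heuristic; the function names and the choice of roots are assumptions made for illustration.

```python
def max_weight_spanning_forest(num_states, edges):
    """Kruskal's algorithm with edges taken in order of decreasing weight.

    `edges` is a list of (weight, u, v). Returns (tree_edges, total_weight).
    """
    parent = list(range(num_states))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree, total = [], 0
    for weight, u, v in sorted(edges, key=lambda e: -e[0]):
        ru, rv = find(u), find(v)
        if ru != rv:  # joining two different trees cannot create a cycle
            parent[ru] = rv
            tree.append((u, v))
            total += weight
    return tree, total


def orient_towards_roots(num_states, tree, roots):
    """Turn undirected tree edges into default pointers aimed at the roots."""
    neighbours = {s: [] for s in range(num_states)}
    for u, v in tree:
        neighbours[u].append(v)
        neighbours[v].append(u)
    default = {r: None for r in roots}
    stack = list(roots)
    while stack:
        s = stack.pop()
        for t in neighbours[s]:
            if t not in default:
                default[t] = s
                stack.append(t)
    return default
```

The total weight returned is the number of labeled transitions the default transitions make redundant; the caller is expected to pass one root per tree.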

11 - D²FA Construction
- We need to construct bounded-diameter trees
  - NP-hard
  - A small diameter bound leads to low tree weight
    - Less space-efficient D²FA
  - Time-space trade-off
- We propose a heuristic algorithm based on Kruskal's algorithm to create compact, bounded-diameter D²FAs
  - Details in the SIGCOMM 2006 paper
[Figure: example DFA]

12 - Results
- We ran experiments on
  - Cisco RegEx rules
  - Linux application protocol classifier rules
  - Bro rules
  - Snort rules (subset of rules)
[Table: size of DFA versus D²FA (no default path length bound applied)]

13 - Space-Time Tradeoff
[Plot: space-time tradeoff, with the space-efficient region marked]
- Longer default path => more work but less space
- In the space-efficient region, default paths have length 4+
  - Requires 4+ memory accesses per character
- We propose a memory architecture that enables us to consume one character per clock cycle

14 - Outline
- Regular expression based packet content inspection (main focus)
  - D²FA
  - CD²FA
- Packet header processing
  - HEXA (History-based Encoding, eXecution and Addressing)

15 - D²FA versus DFA
- D²FAs are compact but require multiple memory accesses
  - Up to 20x more memory accesses
  - Not desirable in an off-chip architecture
- Can D²FAs match the performance of DFAs?
  - YES!
  - Content Addressed D²FAs (CD²FA)
- CD²FAs require only one memory access per byte
  - Match the performance of a DFA in a cacheless system
  - In systems with a data cache, CD²FAs are 2-3x faster
- CD²FAs are 10x more compact than DFAs

16 - Introduction to CD²FA, ANCS'06
- How to avoid the multiple memory accesses of D²FAs?
  - Avoid the lookup that decides whether the default path needs to be taken
  - Avoid default path traversal
- Solution: assign a label to each state; the label contains
  - Characters for which it has labeled transitions
  - Information about all of its default states
  - Characters for which its default states have labeled transitions
[Figure: content labels. Root R has transitions on all characters; state U (labeled transitions on c, d) has label "cd,R"; state V (labeled transitions on a, b) has label "ab,cd,R"]
- Find node R at location R
- Find node U at hash(c,d,R)
- Find node V at hash(a,b,hash(c,d,R))

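17 - Introduction to CD²FA
[Figure: two default trees, one rooted at R (states U and V, labels "cd,R" and "ab,cd,R") and one rooted at Z (states Y and X, labels "lm,Z" and "pq,lm,Z"). Memory holds the transitions (R, a), (R, b), ..., (Z, a), (Z, b), ..., (V, a), (V, b), (X, p), (X, q); state locations are hash(c,d,R), hash(a,b,hash(c,d,R)) and hash(p,q,hash(l,m,Z))]
- Current state: V (label = ab,cd,R)
- The transition taken on the input character leads to X (label = pq,lm,Z)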
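A minimal sketch of the content-addressing idea follows, assuming the three states R, U and V with the labels shown on the slides. The next-state entries in `MEMORY` are invented for illustration, and a nested tuple used as a Python dictionary key stands in for the collision-free hash the slides assume.

```python
# A content label is a tuple: the state's own labeled characters, then those of
# each default state on its default path, ending with the root's name.
R = ("R",)
U = ("cd", "R")        # labeled transitions on c, d; default state R
V = ("ab", "cd", "R")  # labeled transitions on a, b; default state U


def address(label):
    """Location of the state with this label: hash(chars, address(default))."""
    if len(label) == 1:
        return label[0]  # the root lives at a fixed location
    # The nested tuple stands in for hash(...); a Python dict key hashes it.
    return (label[0], address(label[1:]))


# Hypothetical next-state labels; each location stores only labeled transitions.
MEMORY = {
    address(R): {"a": V, "b": R, "c": U, "d": R},
    address(U): {"c": U, "d": V},
    address(V): {"a": V, "b": U},
}


def cd2fa_step(label, ch):
    """Consume one character with a single read of MEMORY."""
    # Decide from the label alone which state on the default path owns ch.
    depth = 0
    while depth < len(label) - 1 and ch not in label[depth]:
        depth += 1
    owner = address(label[depth:])
    return MEMORY[owner][ch]  # the one memory access
```

Because the current label already lists which characters each state on the default path handles, the step never reads a state only to discover that its default transition must be followed.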

18 - Construction of CD²FA
- We seek to keep the content labels small
- Twin objectives:
  - Ensure that states have few labeled transitions
  - Ensure that default paths are as short as possible
- The D²FA construction heuristic based on a maximum weight spanning tree creates long default paths
  - Limiting default paths => less space-efficient D²FAs
- Proposed a new heuristic called CRO to construct D²FAs
  - Runs in 3 phases: Construction, Reduction and Optimization
  - With a default path bound of 2 edges, the CRO algorithm constructs up to 10x more space-efficient D²FAs
  - CD²FAs are constructed from these D²FAs

19 - Memory Mapping in CD²FA
[Figure: the two default trees with their content labels mapped to the locations hash(c,d,R), hash(a,b,hash(c,d,R)) and hash(p,q,hash(l,m,Z)); two labels land on the same location: COLLISION]
- We have assumed that hashing is collision free

20 - Collision-free Memory Mapping
[Figure: four states and their candidate locations, e.g. hash(abc, ...), hash(pqr, ...), hash(def, ...) or hash(edf, ...), hash(lmn, ...) or hash(mln, ...)]
- Four states, 4 memory locations
- We need a systematic approach

21 - Bipartite Graph Matching
- Bipartite graph
  - Left nodes are state content labels
  - Right nodes are memory locations
  - Map state labels to unique memory locations
  - An edge for every choice of content label
  - Perfect matching problem
- With n left and n right nodes
  - Need O(log n) random edges per node
  - n = 1M implies we need ~20 edges per node
- If we provide slight memory over-provisioning
  - We can uniquely map state labels with far fewer edges
- In our experiments, we found a perfect matching without memory over-provisioning
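The matching problem on this slide can be illustrated with a standard augmenting-path search. This is a generic textbook sketch, not necessarily the procedure used by the authors; `perfect_matching` and the shape of `choices` are assumptions, and the recursive form is only suitable for small inputs.

```python
def perfect_matching(choices):
    """Assign each label a distinct location from its candidate list.

    `choices[label]` lists the memory locations the label may occupy (the
    edges of the bipartite graph). Returns {label: location}, or None when no
    one-to-one assignment exists.
    """
    owner = {}  # location -> label currently placed there

    def place(label, visited):
        for loc in choices[label]:
            if loc in visited:
                continue
            visited.add(loc)
            # Take a free location, or evict the occupant if it can move.
            if loc not in owner or place(owner[loc], visited):
                owner[loc] = label
                return True
        return False

    for label in choices:
        if not place(label, set()):
            return None
    return {label: loc for loc, label in owner.items()}
```

For example, with `{"A": [0, 1], "B": [0], "C": [1, 2]}` label A is first placed at location 0 and then moved to 1 so that B, which has no other choice, can take 0.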

22 - Memory Reduction Results

23 - Throughput Results
[Plot annotations: 3x faster; 4KB cache]

24 - Outline
- Regular expression based packet content inspection (main focus)
  - D²FA
  - CD²FA
- Packet header processing
  - HEXA (History-based Encoding, eXecution and Addressing)

25 - HEXA, ICNP'07
- HEXA (History-based Encoding, eXecution and Addressing)
  - Challenges the assumption that graph structures must store log2(n)-bit pointers to identify successor nodes
  - Requires only 2-bit versus 20-bit pointers (for 1 million nodes)
- Useful for
  - IP lookup tries (directed acyclic graphs)
  - Simple finite automata such as Aho-Corasick string matchers
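The 20-bit figure follows from the pointer width needed to name one of n nodes, taking n = 10^6 for "1 million nodes":

```latex
\lceil \log_2 n \rceil = \lceil \log_2 10^{6} \rceil = \lceil 19.93\ldots \rceil = 20 \text{ bits per pointer}
```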

26 - Tries - Traditional Implementation

Addr   data (prefix flag, left child, right child)
1      0, 2, 3
2      0, 4, 5
3      1, NULL, 6
4      1, NULL, NULL
5      0, 7, 8
6      1, NULL, NULL
7      0, 9, NULL
8      1, NULL, NULL
9      1, NULL, NULL

- There are nine nodes, so we need 4-bit node identifiers
- A node requires 9 bits
  - Two 4-bit child pointers
  - One flag that indicates whether the node is a prefix
- Total memory = 9 x 9 bits
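The memory figures on this slide can be checked step by step:

```latex
\begin{aligned}
\text{identifier width} &= \lceil \log_2 9 \rceil = 4 \text{ bits} \\
\text{bits per node} &= 2 \times 4 + 1 = 9 \\
\text{total memory} &= 9 \text{ nodes} \times 9 \text{ bits} = 81 \text{ bits}
\end{aligned}
```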

27 - HEXA based Implementation
- Define the HEXA identifier of a node as the path that leads to it from the root
- Notice that these identifiers are unique
- Thus, they can potentially be mapped to unique memory addresses

28 - HEXA based Implementation
- Use hashing to map the HEXA identifier to a memory address
- If we have a minimal perfect hash function f (a function that maps elements to unique locations), then we can store the trie as shown below

f(-) = 4, f(0) = 7, f(1) = 9, f(00) = 2, f(01) = 8, f(11) = 1, f(010) = 5, f(011) = 3, f(0100) = 6

Addr   node mem   Prefix
1      1,0,0      P3
2      1,0,0      P2
3      1,0,0      P4
4      0,1,1
5      0,1,0
6      1,0,0      P5
7      0,1,1
8      0,1,1
9      1,0,1      P1

- Here we use only 3 bits per node in the fast path
[Figure: lookup of an IP address, ending at the prefix we were looking for]
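The lookup on this slide can be sketched as follows. The trie shape and the values of f are taken from the slides; treating the node at path "0100" as a prefix leaf (P5) is an assumption consistent with the table, and the names `TRIE`, `F`, `NODE_MEM`, `PREFIX_MEM` and `longest_prefix_match` are illustrative.

```python
# Trie from the "Traditional Implementation" slide, keyed by HEXA identifier
# (the bit path from the root; "" is the root). Value: prefix name or None.
# Assumption: the node at path "0100" is a prefix leaf (P5).
TRIE = {
    "": None, "0": None, "1": "P1", "00": "P2", "01": None,
    "11": "P3", "010": None, "011": "P4", "0100": "P5",
}

# Minimal perfect hash f from the slide, written out as a lookup table.
F = {"": 4, "0": 7, "1": 9, "00": 2, "01": 8, "11": 1,
     "010": 5, "011": 3, "0100": 6}

# Fast-path memory: 3 bits per node (is_prefix, has_left, has_right).
NODE_MEM = {
    F[path]: (int(prefix is not None), int(path + "0" in TRIE), int(path + "1" in TRIE))
    for path, prefix in TRIE.items()
}
PREFIX_MEM = {F[path]: prefix for path, prefix in TRIE.items() if prefix}


def longest_prefix_match(address_bits):
    """Walk the trie using only the bits consumed so far as the node identity."""
    path, best = "", None
    while True:
        is_prefix, has_left, has_right = NODE_MEM[F[path]]
        if is_prefix:
            best = PREFIX_MEM[F[path]]
        if len(path) == len(address_bits):
            return best
        bit = address_bits[len(path)]
        if (bit == "0" and not has_left) or (bit == "1" and not has_right):
            return best
        path += bit
```

No child pointer is ever stored: the next node's address is recomputed from the path walked so far, which is why the per-node record shrinks to three flag bits.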

29 - Devising One-to-one Mapping
- Finding a minimal perfect hash function is difficult
  - One-to-one mapping is essential for HEXA to work
- Use discriminator bits
  - Attach c bits that we can modify to every HEXA identifier
  - Thus a node can have 2^c choices of identifiers
  - We now need to store these c bits for every child instead of a single flag
- With multiple choices of HEXA identifiers for a node, reduce the problem to bipartite graph matching
  - We need to find a perfect matching in the graph to map nodes to unique memory locations
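To make the discriminator idea concrete, the sketch below enumerates the candidate memory locations of each trie node, one per discriminator value; these candidate sets are the edges of the bipartite graph, and the matching step itself is not shown. CRC-32, the identifier encoding and all names are assumptions, and the all-zero discriminator is skipped on the assumption that it is reserved for "no child".

```python
import zlib


def candidate_locations(path, c, num_locations):
    """Candidate memory locations for one node, one per discriminator value.

    The all-zero discriminator is skipped (assumed reserved for "no child").
    CRC-32 is only a stand-in hash function.
    """
    candidates = {}
    for disc in range(1, 2 ** c):
        identifier = f"{disc:0{c}b}|{path}"  # discriminator bits + HEXA path
        candidates[disc] = zlib.crc32(identifier.encode()) % num_locations
    return candidates


# Candidate sets for the nine trie nodes of the earlier example (c = 2).
PATHS = ["", "0", "1", "00", "01", "11", "010", "011", "0100"]
CHOICES = {p: candidate_locations(p, c=2, num_locations=9) for p in PATHS}
```

Each node ends up with three candidate locations; whether a one-to-one assignment exists for a given hash function and memory size has to be checked by the matching.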

30 - Devising One-to-one Mapping
[Figure: bipartite graph. Left side: the trie nodes, each with four choices of HEXA identifier (a discriminator 00, 01, 10 or 11 attached to the input label, e.g. "-", "0", "1", "00", "01", "11"). Right side: the choices of memory locations given by a hash h of each identifier, e.g. h(00) = 0, h(01) = 4, h(10) = 1, h(11) = 5. A perfect matching picks the appropriate discriminator for every node.]

31 - HEXA based Implementation
- Store each child's discriminator instead of a single flag for every left and right child

Addr   node mem   Prefix
1      1,xx,xx    P3
2      1,xx,xx    P2
3      1,xx,xx    P4
4      0,xx,xx
5
6      1,xx,xx    P5
7      0,xx,xx
8
9      1,xx,xx    P1

- Here we use only 5 bits per node in the fast path

32 - Results
- 3 choices are enough to find a perfect matching
  - Thus 2-bit discriminators (the value 00 is reserved for "no child")
    - Significant reduction: 2 bits per node versus log2(n) bits
[Table: 32 Eatherton tries, each contains k prefixes.]

33 - Incremental Updates
- IP table updates are very frequent
  - When a node is removed and another added, we must ensure that only a few memory operations are needed
- In the new bipartite graph, a new perfect matching can be found
  - Quickly (O(n) time in the worst case, typically constant time)
  - The new matching is only slightly different from the previous matching
    - Typically around 10 different edges; experimental worst case: 18
    - Thus at most 18 memory operations are needed for an update

34 - Questions?