HEXA: Compact Data Structures for Faster Packet Processing
Sailesh Kumar, Jonathan Turner, Patrick Crowley, Michael Mitzenmacher

HEXA
- HEXA (History-based Encoding, eXecution and Addressing) is a novel representation for:
  - IP lookup tries (directed acyclic graphs)
  - Simple finite automata such as Aho-Corasick string matchers
- Space efficient
- Challenges the assumption that graph structures must store log2(n)-bit pointers to identify successor nodes
- Requires only 2-bit pointers versus 20-bit pointers (for 1 million nodes)

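The 20-bit figure quoted for 1 million nodes can be checked with a few lines of arithmetic. This is only an illustrative sketch; the helper name `pointer_bits` is made up for this example and does not come from the slides.

```python
def pointer_bits(num_nodes: int) -> int:
    """Bits needed to give each of num_nodes nodes a distinct pointer value."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, without float rounding.
    return max(1, (num_nodes - 1).bit_length())

n = 1_000_000
conventional = pointer_bits(n)   # width of a conventional successor pointer
hexa = 2                         # width of a HEXA discriminator, as stated on the slide

print(f"{n} nodes: {conventional}-bit pointers vs {hexa}-bit discriminators")
print(f"saving per child pointer: {conventional - hexa} bits")
```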
Tries - Traditional Implementation
Five IP prefixes:
  1*     P1
  00*    P2
  11*    P3
  011*   P4
  0100*  P5
[Figure: binary trie with nodes 1-9; the prefix nodes are labeled P1-P5]
Node memory (prefix flag, left child, right child):
  Addr  data
  1     0, 2, 3
  2     0, 4, 5
  3     1, NULL, 6
  4     1, NULL, NULL
  5     0, 7, 8
  6     1, NULL, NULL
  7     0, 9, NULL
  8     1, NULL, NULL
  9     1, NULL, NULL
- There are nine nodes, so we need 4-bit node identifiers
- Each trie node requires 9 bits in memory:
  - a flag indicating whether the node is a prefix
  - a 4-bit left child pointer
  - a 4-bit right child pointer
- Total memory = 9 x 9 bits = 81 bits

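As a rough illustration of the pointer-based layout, the sketch below rebuilds the node table from the five prefixes and recomputes the memory total. The breadth-first numbering and the names `build_trie`, `table` and `node_bits` are assumptions made for this example; the numbering happens to reproduce the node numbers used on the slide.

```python
PREFIXES = {"1": "P1", "00": "P2", "11": "P3", "011": "P4", "0100": "P5"}

def build_trie(prefixes):
    """Return {node_id: (prefix_flag, left_id, right_id)}, numbered level by level from 1."""
    # Every leading substring of a stored prefix (including the empty one) is a trie node.
    paths = {p[:i] for p in prefixes for i in range(len(p) + 1)}
    # Sorting by (length, bits) gives breadth-first order with the 0-child first.
    order = sorted(paths, key=lambda p: (len(p), p))
    ids = {path: i + 1 for i, path in enumerate(order)}
    # ids.get(...) yields None (NULL) when the child does not exist.
    return {ids[p]: (int(p in prefixes), ids.get(p + "0"), ids.get(p + "1")) for p in order}

table = build_trie(PREFIXES)
pointer_bits = len(table).bit_length()   # 9 node ids plus a NULL code fit in 4 bits
node_bits = 1 + 2 * pointer_bits         # prefix flag + left pointer + right pointer
total_bits = node_bits * len(table)

for addr, row in table.items():
    print(addr, row)
print("bits per node:", node_bits, "total bits:", total_bits)
```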
HEXA based Implementation
[Figure: the same trie for the five prefixes, with edges labeled by input bits 0/1 and the prefix nodes labeled P1-P5]
- Define the HEXA identifier of a node as the path that leads to it from the root
HEXA identifiers (node number, identifier):
  1. -     (root, empty path)
  2. 0
  3. 1
  4. 00
  5. 01
  6. 11
  7. 010
  8. 011
  9. 0100
Properties of HEXA identifiers:
- Unique for every node
- Implicit (need not be stored)
- Can replace node pointers

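To make the definition concrete, the following sketch walks the pointer-based trie of the traditional implementation and derives each node's identifier as its path from the root. The `TRIE` table is transcribed from that slide, with the rows for leaf nodes 6, 8 and 9 filled in as `(1, None, None)`, which is an assumption based on the prefix list; `hexa_identifiers` is a name invented here.

```python
# node -> (prefix flag, left child, right child); leaf rows 6, 8, 9 are assumed.
TRIE = {
    1: (0, 2, 3), 2: (0, 4, 5), 3: (1, None, 6),
    4: (1, None, None), 5: (0, 7, 8), 6: (1, None, None),
    7: (0, 9, None), 8: (1, None, None), 9: (1, None, None),
}

def hexa_identifiers(trie, root=1):
    """Map every node to the bit string spelled by the path from the root."""
    ids = {root: ""}                 # the root has the empty identifier ("-")
    stack = [root]
    while stack:
        node = stack.pop()
        _, left, right = trie[node]
        for bit, child in (("0", left), ("1", right)):
            if child is not None:
                ids[child] = ids[node] + bit   # child id = parent id + input bit
                stack.append(child)
    return ids

ids = hexa_identifiers(TRIE)
for node in sorted(ids):
    print(node, ids[node] or "-")
```

Because a child's identifier is always the parent's identifier plus the consumed input bit, a lookup can regenerate it on the fly, which is why it never has to be stored.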
HEXA based Implementation
- Hash(HEXA identifier) = memory address
- If we have a minimal perfect hash function f (a function that maps each element to a unique location), then we can store the trie as shown below
  f(-) = 4     f(0) = 7     f(1) = 9
  f(00) = 2    f(01) = 8    f(11) = 1
  f(010) = 5   f(011) = 3   f(0100) = 6
  Addr  node mem  Prefix
  1     1,0,0     P3
  2     1,0,0     P2
  3     1,0,0     P4
  4     0,1,1
  5     0,1,0
  6     1,0,0     P5
  7     0,1,1
  8     0,1,1
  9     1,0,1     P1
- We use only 3 bits per node in the fast path:
  - valid prefix flag
  - left child flag
  - right child flag
- Lookup example for IP addr. 1100xxx: begin the lookup at the root node, f(-) = 4. Input bit 1 leads to f(1) = 9 (P1), the next bit 1 leads to f(11) = 1 (P3), and that node has no child for the next bit, so P3 is the prefix we were looking for
- HEXA identifiers are unique for every node, implicit (need not be stored), and can act as memory addresses

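The lookup on this slide can be traced with a short sketch. The mapping `F` copies the f(...) values from the slide; the 3-bit words in `MEMORY` follow from the trie shape, and the words for the leaf nodes are filled in here as an assumption. The function name `lookup` is illustrative only.

```python
# HEXA identifier -> memory address, copied from the slide ("" stands for "-").
F = {"": 4, "0": 7, "1": 9, "00": 2, "01": 8, "11": 1, "010": 5, "011": 3, "0100": 6}

# address -> (valid prefix flag, left child flag, right child flag)
MEMORY = {
    1: (1, 0, 0), 2: (1, 0, 0), 3: (1, 0, 0),
    4: (0, 1, 1), 5: (0, 1, 0), 6: (1, 0, 0),
    7: (0, 1, 1), 8: (0, 1, 1), 9: (1, 0, 1),
}
PREFIX_TABLE = {1: "P3", 2: "P2", 3: "P4", 6: "P5", 9: "P1"}

def lookup(address_bits):
    """Longest-prefix match; the current identifier is rebuilt from the input bits."""
    ident, best, pos = "", None, 0
    while True:
        addr = F[ident]                       # hash the identifier to find the node
        is_prefix, has_left, has_right = MEMORY[addr]
        if is_prefix:
            best = PREFIX_TABLE[addr]         # remember the longest match so far
        if pos == len(address_bits):
            break
        bit = address_bits[pos]
        if (bit == "0" and not has_left) or (bit == "1" and not has_right):
            break                             # no child on this bit: stop
        ident += bit                          # child id = current id + input bit
        pos += 1
    return best

print(lookup("1100"))   # visits addresses 4, 9, 1
```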
Devising One-to-one Mapping
- Finding a minimal perfect hash function is difficult
- A one-to-one mapping is essential for HEXA to work
- Use discriminator bits
  - Attach c bits, which we can modify, to every HEXA identifier
  - Thus a node has 2^c choices of identifiers
  - We now need to store these c bits for every child instead of a single flag
- With multiple choices of HEXA identifiers per node, the problem reduces to bipartite graph matching
  - We need to find a perfect matching in the graph to map nodes to unique memory locations

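One way to picture the reduction is the sketch below, which gives each node 2^c candidate locations and searches for a perfect matching with the standard augmenting-path method. Everything specific here is an assumption for illustration: the hash `h` (SHA-256 reduced modulo the table size), the retry over seeds when a hash function admits no perfect matching, and the function names. The slides do not say which hash function or matching algorithm the authors use.

```python
import hashlib

def h(disc, ident, m, seed=0):
    """Hypothetical hash: map (discriminator, identifier) to one of m locations."""
    data = f"{seed}|{disc}|{ident}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % m

def perfect_matching(idents, c, m, seed=0):
    """Return {identifier: discriminator} giving every node its own slot, or None."""
    owner = {}    # slot -> identifier currently placed there
    choice = {}   # identifier -> chosen discriminator

    def try_place(ident, seen):
        for d in range(2 ** c):               # the 2^c candidate identifiers
            slot = h(d, ident, m, seed)
            if slot in seen:
                continue
            seen.add(slot)
            # Take a free slot, or evict the owner if it can move elsewhere.
            if slot not in owner or try_place(owner[slot], seen):
                owner[slot] = ident
                choice[ident] = d
                return True
        return False

    for ident in idents:
        if not try_place(ident, set()):
            return None
    return choice

IDENTS = ["", "0", "1", "00", "01", "11", "010", "011", "0100"]
for seed in range(1000):                      # assumption: re-seed until a matching exists
    choice = perfect_matching(IDENTS, c=2, m=len(IDENTS), seed=seed)
    if choice is not None:
        break
else:
    raise RuntimeError("no perfect matching found; try more seeds or more slots")

slots = {ident: h(d, ident, len(IDENTS), seed) for ident, d in choice.items()}
for ident in IDENTS:
    print(ident or "-", "discriminator", format(choice[ident], "02b"), "-> slot", slots[ident])
```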
Devising One-to-one Mapping
- Use 2-bit discriminators
- Pick appropriate discriminators to obtain a perfect matching
[Figure: bipartite graph. Left: the nine nodes with their input labels (HEXA identifiers). Middle: the four choices of HEXA identifiers for each node. Right: the choices of memory locations 0-8 given by the hash function h. The edges of the perfect matching are highlighted.]
  Node  Input label  Four choices of HEXA identifiers (discriminator, label)
  1     -            00 -,    01 -,    10 -,    11 -
  2     0            00 0,    01 0,    10 0,    11 0
  3     1            00 1,    01 1,    10 1,    11 1
  4     00           00 00,   01 00,   10 00,   11 00
  5     01           00 01,   01 01,   10 01,   11 01
  6     11           00 11,   01 11,   10 11,   11 11
  7     010          00 010,  01 010,  10 010,  11 010
  8     011          00 011,  01 011,  10 011,  11 011
  9     0100         00 0100, 01 0100, 10 0100, 11 0100
Example choices of memory locations:
  Node 1: h(00) = 0, h(01) = 4, h(10) = 1, h(11) = 5
  Node 2: h(000) = 1, h(010) = 5, h(100) = 2, h(110) = 6

HEXA based Implementation
- Store each child's discriminator instead of a single flag for the left and right children
  Addr  node mem  Prefix
  1     1,xx,xx   P3
  2               P2
  3               P4
  4     0,xx,xx
  5
  6               P5
  7
  8
  9               P1
- Here we use only 5 bits per node in the fast path:
  - valid prefix flag
  - left discriminator
  - right discriminator

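Putting the pieces together, the sketch below builds a 5-bit-per-node table and performs lookups by hashing the stored child discriminator together with the regenerated identifier. It is a minimal illustration under several assumptions that are not in the slides: the hash `h` (SHA-256 modulo the table size), a table of 10 slots for 9 nodes, discriminator value 0 meaning "no child", re-seeding until a matching exists, and a root discriminator kept outside the table.

```python
import hashlib

PREFIXES = {"1": "P1", "00": "P2", "11": "P3", "011": "P4", "0100": "P5"}
M = 10              # assumed table size: 9 nodes plus about 10% spare slots
DISCS = (1, 2, 3)   # usable 2-bit discriminators; 0 is reserved for "no child"

def h(disc, ident, seed):
    """Hypothetical hash of (discriminator, identifier) to a slot in 0..M-1."""
    data = f"{seed}|{disc}|{ident}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % M

def match(idents, seed):
    """Augmenting-path matching; returns {identifier: discriminator} or None."""
    owner, disc_of = {}, {}
    def place(ident, seen):
        for d in DISCS:
            slot = h(d, ident, seed)
            if slot in seen:
                continue
            seen.add(slot)
            if slot not in owner or place(owner[slot], seen):
                owner[slot], disc_of[ident] = ident, d
                return True
        return False
    return disc_of if all(place(i, set()) for i in idents) else None

def build(prefixes):
    idents = sorted({p[:i] for p in prefixes for i in range(len(p) + 1)})
    for seed in range(1000):                  # assumption: re-seed until it works
        disc_of = match(idents, seed)
        if disc_of is not None:
            break
    else:
        raise RuntimeError("no perfect matching found")
    memory, prefix_at = {}, {}
    for ident in idents:
        slot = h(disc_of[ident], ident, seed)
        # 5-bit word: prefix flag, left discriminator, right discriminator.
        memory[slot] = (int(ident in prefixes),
                        disc_of.get(ident + "0", 0), disc_of.get(ident + "1", 0))
        if ident in prefixes:
            prefix_at[slot] = prefixes[ident]
    return memory, prefix_at, disc_of[""], seed

def lookup(bits, memory, prefix_at, root_disc, seed):
    ident, disc, best = "", root_disc, None
    for pos in range(len(bits) + 1):
        slot = h(disc, ident, seed)
        flag, left, right = memory[slot]
        if flag:
            best = prefix_at[slot]
        if pos == len(bits):
            break
        disc = left if bits[pos] == "0" else right
        if disc == 0:                         # no child on this input bit
            break
        ident += bits[pos]
    return best

memory, prefix_at, root_disc, seed = build(PREFIXES)
print(lookup("1100", memory, prefix_at, root_disc, seed))
```

In this sketch the root's discriminator is returned separately because no parent exists to store it; how the original design handles the root is not stated on the slide.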
Results
- 3 choices are sufficient to find a perfect matching (with 10% memory over-provisioning)
- Thus 2-bit discriminators suffice (the value 00 is reserved for "no child")
- Significant reduction: 2 bits per node pointer versus log2(n) bits
- Data set: 32 Eatherton tries, each containing 100-120k prefixes

Incremental Updates
- IP table updates are very frequent
- When a node is removed and another is added, we must ensure that only a few memory operations are needed
- In the new bipartite graph, a new perfect matching can be found quickly (O(n 2^c) time in the worst case, typically constant time)
- The new matching is only slightly different from the previous matching
  - Typically around 10 different edges; experimental worst case: 18
- Thus at most 18 memory operations were needed for an update in the experiments

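A small sketch can show why an update touches only a few entries: adding a node needs a single augmenting-path search, and only the nodes on that path change their discriminator and location. The hash `h`, the table size of 11, the new identifier "10" (a node for a hypothetical prefix 10*), and the seed retry are all assumptions made for this example, not details from the slides.

```python
import hashlib

M, DISCS = 11, (1, 2, 3)   # assumed table size and usable discriminator values

def h(disc, ident, seed):
    """Hypothetical hash of (discriminator, identifier) to a slot in 0..M-1."""
    data = f"{seed}|{disc}|{ident}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % M

def insert(ident, owner, disc_of, seed, seen=None):
    """Place ident using one augmenting path; owner and disc_of change only on success."""
    seen = set() if seen is None else seen
    for d in DISCS:
        slot = h(d, ident, seed)
        if slot in seen:
            continue
        seen.add(slot)
        # Use a free slot, or push the current owner along to another of its choices.
        if slot not in owner or insert(owner[slot], owner, disc_of, seed, seen):
            owner[slot], disc_of[ident] = ident, d
            return True
    return False

def demo():
    old = ["", "0", "1", "00", "01", "11", "010", "011", "0100"]
    for seed in range(1000):                  # assumption: re-seed until both steps succeed
        owner, disc_of = {}, {}
        if not all(insert(i, owner, disc_of, seed) for i in old):
            continue
        before = dict(disc_of)                # matching before the update
        if insert("10", owner, disc_of, seed):
            moved = [i for i in old if disc_of[i] != before[i]]
            return owner, disc_of, moved, seed
    raise RuntimeError("no seed allowed both the initial matching and the update")

owner, disc_of, moved, seed = demo()
print("existing nodes that had to move:", moved)
```

Each moved node would cost a write for its own entry plus a write to the discriminator field in its parent, so the number of memory operations grows with the length of the augmenting path rather than with the size of the table.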
HEXA for Pattern Matching
- HEXA can be used to compress the Aho-Corasick string matching automaton (a directed graph)
- In the future, HEXA may become useful for general finite automata
  - Reg-ex acceleration

Thank you and Questions???