Block Permutations in Boolean Space to Minimize TCAM for Packet Classification
Authors: Rihua Wei, Yang Xu, H. Jonathan Chao
Publisher: IEEE INFOCOM, 2012
Presenter: Jia-Wei,Yo
Date: 2012/2/8
Introduction
Ternary Content Addressable Memories (TCAMs) have been widely used to implement packet classification because of their parallel search capability and constant processing speed.
Introduction
In rule r1, both the source port and the destination port contain the range [1,5], so each must be expanded into three prefixes: “001”, “01*”, and “10*”. Combining the prefix specifications of the two ranges consumes 3x3=9 TCAM entries, which is the well-known range expansion problem.
The paper proposes a novel technique called Block Permutation (BP) to compress the packet classification rules stored in TCAMs.
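The range expansion on this slide can be reproduced with a short sketch. The function name `range_to_prefixes` and the 3-bit port width are assumptions made for illustration; the slide gives only the range and the resulting prefixes.

```python
def range_to_prefixes(lo, hi, width):
    """Split the integer range [lo, hi] into ternary prefixes of `width` bits.

    Illustrative helper (not from the paper): greedily takes the largest
    aligned power-of-two block that starts at `lo` and still fits in the range.
    """
    prefixes = []
    while lo <= hi:
        size = 1
        # Grow the block while it stays aligned at lo and inside [lo, hi].
        while (lo % (size * 2) == 0
               and lo + size * 2 - 1 <= hi
               and size * 2 <= (1 << width)):
            size *= 2
        wild = size.bit_length() - 1      # number of trailing wildcard bits
        fixed = width - wild              # number of leading fixed bits
        bits = format(lo >> wild, "0{}b".format(fixed)) if fixed > 0 else ""
        prefixes.append(bits + "*" * wild)
        lo += size
    return prefixes


# Both port fields of rule r1 hold the range [1, 5] (3-bit width assumed).
src = range_to_prefixes(1, 5, 3)
dst = range_to_prefixes(1, 5, 3)
tcam_entries = len(src) * len(dst)   # cross product of the two prefix lists
print(src, tcam_entries)
```

Each field expands to three prefixes, and the cross product of the two fields gives the nine TCAM entries mentioned above.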
Related work
In Figure 3(b), the rule elements are spread sparsely and no two neighboring rule elements have the same action; thus, no two elements in the Karnaugh table can be merged directly using logic optimization.
Block Permutation
<>
Before        After
Ex: 0110      Ex': 1110
B1: 0001      B1: 0001
B2: 1101      B2': 0101
B3: 0010      B3: 0010
B4: 1110      B4': 0110
B5: ****      B5: ****
=> B1 and B2' merge to B6; B3 and B4' merge to B7.
Block Permutation
Terms and Concepts
1. Block size: The size of a block is the number of points contained in the block. For example, the size of the block “0**1” is 4.
2. Distance: The number of differing counterpart bits in the Boolean representations of two blocks. For example, the distance between the two points “0001” and “1101” is 2; between “0*01” and “01*0” it is 1; and between “0*01” and “0101” it is 0.
3. Direction: If the Boolean representations of two blocks have wildcards (don't-care bits) that all appear in the same bit positions, the two blocks are in the same direction. For example, “0*01” and “0*10” are in the same direction.
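The three definitions above translate directly into small helper functions. This is a minimal sketch: the function names are illustrative, and it assumes that a wildcard position never counts toward the distance, which is consistent with the examples on the slide.

```python
def block_size(block):
    """Number of points covered by a ternary block such as '0**1'."""
    return 2 ** block.count("*")


def distance(a, b):
    """Count positions where both blocks have fixed bits and those bits differ.

    Assumption: a wildcard in either block matches anything, so it adds 0.
    """
    return sum(1 for x, y in zip(a, b) if x != "*" and y != "*" and x != y)


def same_direction(a, b):
    """True if the wildcards of both blocks sit in exactly the same positions."""
    return [c == "*" for c in a] == [c == "*" for c in b]


print(block_size("0**1"))              # 4
print(distance("0001", "1101"))        # 2
print(distance("0*01", "01*0"))        # 1
print(same_direction("0*01", "0*10"))  # True
```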
Terms and Concepts
Target Blocks and Assistant Blocks: A pair of target blocks is two blocks that we aim to merge by a permutation. B6 and B7 are target blocks.
Terms and Concepts
To merge this target pair, we perform the operation “--10<>--11” over two other blocks, “**10” and “**11”. These two blocks are the corresponding assistant blocks.
Exchange rows 10 and 11.
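The effect of the operation “--10<>--11” on the target blocks B6 (“0*01”) and B7 (“0*10”) can be checked with a point-by-point sketch. The helpers `points`, `permute_point`, and `to_block` are hypothetical names introduced here; working on explicit point sets is a simplification for illustration and is not how the paper manipulates blocks.

```python
from itertools import product


def points(block):
    """Enumerate every point (bit string) covered by a ternary block."""
    choices = ["01" if c == "*" else c for c in block]
    return {"".join(p) for p in product(*choices)}


def permute_point(point, left, right):
    """Apply the swap 'left<>right' to one point; '-' matches any bit."""
    def matches(pattern):
        return all(p == "-" or p == b for p, b in zip(pattern, point))

    def rewrite(pattern):
        return "".join(b if p == "-" else p for p, b in zip(pattern, point))

    if matches(left):
        return rewrite(right)   # point lies in the left assistant block
    if matches(right):
        return rewrite(left)    # point lies in the right assistant block
    return point                # untouched by this permutation


def to_block(pts):
    """Return the single ternary block covering exactly pts, else None."""
    pts = sorted(pts)
    block = "".join(col[0] if len(set(col)) == 1 else "*" for col in zip(*pts))
    return block if points(block) == set(pts) else None


b6 = {permute_point(p, "--10", "--11") for p in points("0*01")}
b7 = {permute_point(p, "--10", "--11") for p in points("0*10")}

before = to_block(points("0*01") | points("0*10"))  # not a single block
after = to_block(b6 | b7)                           # merged block
print(before, after)
```

Before the permutation the two targets do not form one block; afterwards B7 has moved to “0*11”, and together with B6 it forms the single block “0**1”.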
Classifier compression
Notation: Wp: assistant block size; tar: target block; p: permutation
Classifier compression
1. GET_TARGET: Try to find all possible targets.
<> (assistant block size: 3)
Target blocks (distance: 2):
B6: 0*01 => B6': 0*00
B7: 0*10 => B7': 0*11
Cannot merge.
Classifier compression
2. EVAL_PERM: This step has two tasks. The first is to search all possible permutations for the targets obtained in the previous step. The second is to determine whether these permutations are worth performing and which one yields the largest compression with the least overhead. The “best” permutation is selected by its net gain: the number of blocks reduced minus the number of new blocks caused by the splitting of existing blocks.
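The selection rule in EVAL_PERM can be sketched as a small scoring function. Everything below is an assumption made for illustration: the `Candidate` fields, the labels p1 to p3, the sample counts, the rule that a permutation is "worth performing" only when its net gain is positive, and the use of the assistant block size as the overhead tie-breaker.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    permutation: str        # hypothetical label for a candidate permutation
    blocks_reduced: int     # blocks removed by merging the target blocks
    blocks_created: int     # new blocks produced by splitting existing blocks
    assistant_size: int     # assumed stand-in for the overhead of the swap


def net_gain(c: Candidate) -> int:
    """Blocks reduced minus new blocks caused by splitting."""
    return c.blocks_reduced - c.blocks_created


def select_best(cands: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick the candidate with the largest net gain, then the smallest overhead."""
    worthwhile = [c for c in cands if net_gain(c) > 0]   # assumed threshold
    if not worthwhile:
        return None
    return max(worthwhile, key=lambda c: (net_gain(c), -c.assistant_size))


# Hypothetical candidates, not taken from the paper.
candidates = [
    Candidate("p1", blocks_reduced=1, blocks_created=0, assistant_size=2),
    Candidate("p2", blocks_reduced=2, blocks_created=2, assistant_size=1),
    Candidate("p3", blocks_reduced=1, blocks_created=0, assistant_size=3),
]
best = select_best(candidates)
print(best.permutation if best else None)
```

Here p2 is rejected because splitting cancels its merges, and p1 is preferred over p3 because both have the same net gain but p1 has the smaller assumed overhead.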
Classifier compression
<>
B4: => 1100 produces two new small blocks, and B4 disappears.
B3: => Invalid
Classifier compression
3. PERFORM: Perform the permutation selected in the EVAL_PERM step to merge the target blocks.
Transformation implementation
Use a pipeline structure to implement the series of transformations: N transformations require an N-stage pipeline.
The one-block structure (a one-stage pipeline) normally requires much less hardware than the pipeline structure, but its single stage has to be very complicated, which greatly reduces the working speed.
The paper proposes a solution called stage-grouping, which reduces the number of stages to trade off speed against cost.
Transformation implementation
Experiment
The experiments ran on a Linux workstation with Intel Xeon E5335 2.0 GHz CPUs. The corresponding transformations were implemented on an Altera Cyclone III FPGA, synthesized with Quartus II. Altera Cyclone was chosen for its low price and appropriate clock rate: this kind of FPGA can run at a clock of 400 MHz or higher, which is enough for the targeted throughput of 100M packets per second.
Parameters: Nr = 150, Wmax = 102, Wmin = 54; implemented in C/C++.
Experiment