Ultra-High Throughput Low-Power Packet Classification


Ultra-High Throughput Low-Power Packet Classification Author: Alan Kennedy and Xiaojun Wang Publisher: IEEE Transactions on Very Large Scale Integration (VLSI) Systems Presenter: Ching Hsuan Shih Date: 2013/10/24

Outline: Introduction; Decision Tree-Based Packet Classification; Algorithmic Modifications; Packet Classification Engine; Performance Results; Conclusion

I. Introduction The current amount of energy used by networking devices worldwide could exceed the yearly output of 21 typical nuclear reactor units [2]. Power consumption should therefore be a key concern when designing any new networking equipment. Network processors are under growing pressure from tasks such as packet fragmentation, reassembly, and classification. Adding extra processing capacity and ramping up clock speeds to gain performance is difficult because of physical limitations in the silicon and tight power budgets. Hardware accelerators offer a way to increase processing capacity while reducing power consumption.

I. Introduction (Cont.) In software: analysis [12] showed that even the best-performing algorithm in terms of throughput, RFC [5], can classify only around 400,000 packets per second. In hardware: TCAM-based approaches can classify packets at core network line speeds, which can exceed 40 Gb/s, but TCAMs are power hungry. The classifier presented here: a modified version of HyperCuts that classifies packets in parallel at speeds of up to 138.56 Gb/s.

II. Decision Tree-Based Packet Classification HyperCuts packet classification algorithm: cuts multiple dimensions at a time. It creates a decision tree by taking a geometric view of a ruleset, with each rule considered to be a hypercube in hyperspace. It works by breaking a ruleset into groups, each containing a small enough number of rules to be searched linearly.
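The geometric view and the leaf-node linear search can be sketched in Python as below. This assumes each rule is stored as one inclusive (lo, hi) range per header field and that rules are listed in priority order; `Rule` and `linear_search` are hypothetical names for illustration, not the paper's data structures.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    """A rule as a hypercube: one inclusive (lo, hi) range per header field."""
    name: str
    ranges: Tuple[Tuple[int, int], ...]

    def matches(self, header: Sequence[int]) -> bool:
        # A packet header is a point; it matches if it lies inside the hypercube.
        return all(lo <= v <= hi for (lo, hi), v in zip(self.ranges, header))


def linear_search(rules: Sequence[Rule], header: Sequence[int]) -> Optional[str]:
    """Return the first (highest-priority) matching rule name, or None."""
    for rule in rules:
        if rule.matches(header):
            return rule.name
    return None


# Two-field toy rules, highest priority first.
r1 = Rule("R1", ((0, 3), (0, 15)))
r2 = Rule("R2", ((0, 15), (4, 7)))
print(linear_search([r1, r2], (1, 7)))   # R1 (both match; R1 has priority)
print(linear_search([r1, r2], (8, 5)))   # R2
print(linear_search([r1, r2], (8, 9)))   # None
```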

II. Decision Tree-Based Packet Classification (Cont.) A. Building a Decision Tree (HyperCuts) 1. Decide values for spfac and binth: spfac controls how many cuts can be made to each root or internal node, and binth limits the number of rules at leaf nodes. In the following example, spfac is 3 and binth is 2.

II. Decision Tree-Based Packet Classification (Cont.) 2. Decide which dimensions to cut: calculate the number of distinct range specifications for each field. In the example, the source and destination IPs both have 6, the source and destination ports both have 4, and the protocol number has 2, giving a mean of (6 + 6 + 4 + 4 + 2) / 5 = 4.4. Fields whose number of distinct range specifications is greater than or equal to this mean are considered for cutting, so the source and destination IPs are chosen.
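The dimension-selection rule reduces to comparing each field's count against the mean. A small sketch using the counts from the example (the field names are hypothetical labels):

```python
from statistics import mean
from typing import Dict, List


def fields_to_cut(distinct_ranges: Dict[str, int]) -> List[str]:
    """Pick fields whose distinct range count is at least the mean count."""
    avg = mean(distinct_ranges.values())
    return [field for field, n in distinct_ranges.items() if n >= avg]


counts = {"src_ip": 6, "dst_ip": 6, "src_port": 4, "dst_port": 4, "protocol": 2}
print(mean(counts.values()))   # 4.4
print(fields_to_cut(counts))   # ['src_ip', 'dst_ip']
```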

II. Decision Tree-Based Packet Classification (Cont.) 3. Decide how many cuts to make: the maximum number of cuts to node i ≤ spfac × sqrt(number of rules at node i), where i is the root or internal node being cut. In this example, the maximum number of cuts that can be made to the root node is 7.9. The number of cuts is limited to a power of 2 for ease of implementation, so at most 4 cuts can be performed. All combinations of cuts between the chosen dimensions are tried: the combinations of cuts to the source and destination IPs are [0, 2], [0, 4], [2, 0], [2, 2], and [4, 0], where 0 means that the field is not cut. The combination [2, 2], which cuts both the source and destination IPs in two, results in the smallest maximum number of rules stored in a child node and is chosen. Note: a small spfac results in fewer cuts to nodes, creating a deep and narrow decision tree that requires less memory but has a longer search time. A larger spfac allows more cuts, resulting in a wide but shallow decision tree that requires more memory but has a shorter search time.
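The cut-count bound and the candidate combinations can be enumerated as follows. This sketch assumes the root holds 7 rules, which is consistent with the stated bound (3 × sqrt(7) ≈ 7.94), and it uses 0 for "no cut" as the slide does. Scoring each combination by its largest resulting child is left out because it needs the full ruleset; the function names are hypothetical.

```python
import itertools
import math
from typing import List, Tuple


def max_cuts(spfac: float, num_rules: int) -> int:
    """HyperCuts bound spfac * sqrt(N), rounded down to a power of two."""
    bound = spfac * math.sqrt(num_rules)
    return 1 << math.floor(math.log2(bound))   # assumes bound >= 1


def cut_combinations(limit: int, dims: int) -> List[Tuple[int, ...]]:
    """Per-dimension cut counts (0 = not cut) whose product is at most limit."""
    choices = [0]
    c = 2
    while c <= limit:
        choices.append(c)
        c *= 2
    combos = []
    for combo in itertools.product(choices, repeat=dims):
        product = math.prod(n or 1 for n in combo)  # an uncut field counts as 1
        if any(combo) and product <= limit:
            combos.append(combo)
    return combos


limit = max_cuts(spfac=3, num_rules=7)   # 3 * sqrt(7) ~= 7.94 -> 4
print(limit)                             # 4
print(cut_combinations(limit, dims=2))   # [(0, 2), (0, 4), (2, 0), (2, 2), (4, 0)]
```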

II. Decision Tree-Based Packet Classification (Cont.) EX. A packet with the header [0001, 0111, 50, 80, UDP] (4-bit toy IP addresses): 1. At the root node, 2 cuts were made to both the source and destination IPs, so only the MSB of each needs to be examined ([0001, 0111] → 0 and 0); these bits are concatenated to form the index 00. 2. At the internal node, 4 cuts were made to the destination IP, so the next 2 bits after the MSB already used at the root (the second and third MSBs of 0111) are examined, giving the index 11. 3. A linear search in the leaf node returns R6 as the matching rule.
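The two index calculations in this example can be reproduced with shifts and masks. The helper `bits` is hypothetical; it extracts `count` bits starting `start` bits below the MSB of a `width`-bit value, which makes explicit that the internal node skips the MSB already consumed at the root.

```python
def bits(value: int, width: int, start: int, count: int) -> int:
    """Return `count` bits of a `width`-bit value, starting `start` bits below the MSB."""
    shift = width - start - count
    return (value >> shift) & ((1 << count) - 1)


src, dst = 0b0001, 0b0111   # 4-bit toy addresses from the example

# Root node: 2 cuts on each IP -> 1 bit (the MSB) from each, concatenated.
root_index = (bits(src, 4, 0, 1) << 1) | bits(dst, 4, 0, 1)

# Internal node: 4 cuts on the destination IP -> the next 2 bits,
# because the MSB was already consumed at the root.
internal_index = bits(dst, 4, 1, 2)

print(format(root_index, "02b"), format(internal_index, "02b"))  # 00 11
```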

II. Decision Tree-Based Packet Classification (Cont.) B. Heuristics Used to Reduce Memory Usage. Node merging: pointers to leaf nodes that contain the same list of rules are modified so that they all point to just one of these leaf nodes. Rule overlap: if a higher-priority rule completely covers another rule within a leaf node's subregion, the covered rule can never be matched and is therefore removed from the leaf node. Pushing common rule subsets upward (not used in the modified HyperCuts of this paper): rules that would otherwise need to be stored in all of an internal or root node's subregions are stored at that node instead. Region compaction.
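The rule overlap heuristic can be sketched as follows, assuming rules are inclusive per-field ranges in priority order and that the leaf's subregion is given the same way. Like the description above, it only removes a rule when a single higher-priority rule covers it on its own; `clip`, `covers`, and `remove_overlap` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[Tuple[int, int], ...]  # inclusive (lo, hi) per field


def clip(box: Box, region: Box) -> Optional[Box]:
    """Intersect a rule with the leaf's subregion; None if they do not overlap."""
    out = []
    for (lo, hi), (rlo, rhi) in zip(box, region):
        lo, hi = max(lo, rlo), min(hi, rhi)
        if lo > hi:
            return None
        out.append((lo, hi))
    return tuple(out)


def covers(outer: Box, inner: Box) -> bool:
    return all(olo <= ilo and ihi <= ohi
               for (olo, ohi), (ilo, ihi) in zip(outer, inner))


def remove_overlap(rules: Sequence[Box], region: Box) -> List[int]:
    """Indices (priority order) of rules that can still match inside region."""
    kept: List[int] = []
    for i, rule in enumerate(rules):
        part = clip(rule, region)
        if part is None:
            continue  # the rule never reaches this leaf
        # Checking kept rules suffices: coverage is transitive.
        if any(covers(clip(rules[j], region), part) for j in kept):
            continue
        kept.append(i)
    return kept


region = ((0, 7), (0, 7))                # the leaf's subregion
rules = [((0, 15), (0, 3)),              # R0, highest priority
         ((2, 5), (1, 2)),               # R1, covered by R0 inside region
         ((0, 7), (4, 15))]              # R2
print(remove_overlap(rules, region))     # [0, 2]
```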

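II. Decision Tree-Based Packet Classification (Cont.) Region compaction: shrinks the region covered by a node to the smallest region that still covers all of the node's rules. Cutting this smaller region results in fewer cuts, thus reducing memory consumption.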

III. Algorithmic Modifications: Cutting Scheme; Compacting of a Region Through Pre-Cutting; Rule Storage; Cut Selection; Memory Organization

III. Algorithmic Modifications (Cont.) 1. Cutting Scheme: three values must be specified before building the decision tree: the number of cuts to be made to the root node (2^n cuts, where n is 1 to 18); the maximum number of cuts that can be made to an internal node (2^m cuts, where m is 1 to 4); and the maximum number of rules that a leaf node can store.

III. Algorithmic Modifications (Cont.) 1. Cutting Scheme: the majority of cuts are made to the root node, resulting in a shallow decision tree. Only a few cuts are made to an internal node, which prevents the decision tree from using too much memory and lets the information needed to traverse an internal node fit in a single memory word. The same method as HyperCuts is used to select the fields to cut and to decide how many cuts to make. One difference is that, at the root node, all combinations of cuts between the chosen fields whose product equals the 2^n limit are tried.
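Trying every root cut combination whose product is exactly 2^n amounts to distributing n cut bits over the chosen fields. A recursive sketch, assuming a field of w bits can receive between 0 and w of those bits (a field cannot be cut more finely than its width); the function name and the field names and widths passed in are illustrative.

```python
from typing import Dict, Iterator, List, Tuple


def root_cut_combinations(n: int, field_bits: Dict[str, int]) -> List[Dict[str, int]]:
    """All per-field cut counts (0 = not cut) whose product is exactly 2**n."""
    names = list(field_bits)

    def split(i: int, remaining: int) -> Iterator[Tuple[int, ...]]:
        # Give field i between 0 and min(remaining, width) cut bits.
        if i == len(names):
            if remaining == 0:
                yield ()
            return
        for k in range(min(remaining, field_bits[names[i]]) + 1):
            for rest in split(i + 1, remaining - k):
                yield (k,) + rest

    return [{f: (1 << k) if k else 0 for f, k in zip(names, ks)}
            for ks in split(0, n)]


combos = root_cut_combinations(4, {"src_ip": 32, "dst_ip": 32})
print(len(combos))              # 5
print(combos[0], combos[-1])    # {'src_ip': 0, 'dst_ip': 16} {'src_ip': 16, 'dst_ip': 0}
```

The recursion prunes on the remaining bit budget, so it stays cheap even with all five header fields chosen and n = 18.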

III. Algorithmic Modifications (Cont.) 2. Compacting of a Region Through Pre-Cutting: Why not use region compaction? It requires floating-point division when a packet traverses the decision tree. It also requires the minimum and maximum values of the area covered in all fields in order to calculate the index of the child node.

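III. Algorithmic Modifications (Cont.) Region Compaction: a packet with a destination IP of 0111 arrives at a node whose compacted destination-IP range is 5 to 7, cut in two. The cut width is d = ((7 − 5) + 1) / 2 = 1.5, and the child index is floor((7 − 5) / 1.5) = floor(1.33) = 1.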

III. Algorithmic Modifications (Cont.) 2. Compacting of a Region Through Pre-Cutting: for a packet with a destination IP of 0111, the child index is calculated simply by using its third MSB as the index, with no division required.
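The two index calculations can be contrasted directly. This sketch assumes the compacted destination-IP range of 5 to 7 from the region compaction example and, for pre-cutting, a right shift of one place followed by a one-bit mask, which selects the third MSB of a 4-bit value; both helpers are hypothetical.

```python
import math


def compacted_index(value: int, lo: int, hi: int, cuts: int) -> int:
    """Region compaction: equal-width cuts over [lo, hi]; needs a division."""
    d = (hi - lo + 1) / cuts          # cut width, generally fractional
    return math.floor((value - lo) / d)


def precut_index(value: int, bpos: int, cut_bits: int) -> int:
    """Pre-cutting: boundaries fall on powers of two, so shift and mask suffice."""
    return (value >> bpos) & ((1 << cut_bits) - 1)


print(compacted_index(0b0111, lo=5, hi=7, cuts=2))  # 1
print(precut_index(0b0111, bpos=1, cut_bits=1))     # 1 (third MSB of 0111)
```

Both give child 1 here, but only the second maps to a shifter and a mask in hardware.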

III. Algorithmic Modifications (Cont.) 3. Rule Storage: the actual rule is stored in the leaf node rather than a pointer to the rule. This causes a small increase in memory consumption for some rulesets and a reduction for others, since pointers to rules no longer need to be stored. It gives a large increase in throughput, because data are presented to the classifier one clock cycle earlier. Encoding scheme: an IP address usually requires 32 bits to store its address and 6 bits to store its mask. The scheme reduces the number of bits required to store the source and destination IPs from 76 bits down to 70 bits, with only a slight increase in the logic needed to decode the information.

III. Algorithmic Modifications (Cont.) Encoding scheme: the 32-bit IP address and 6-bit mask are stored as a 35-bit number. The LSB of the 35 bits is 0 if more than 28 bits of the IP address need to be matched exactly; the remaining bits then hold the 32-bit IP address and 2 bits indicating the number of don't-care bits. Otherwise the LSB is 1, and the remaining bits hold 28 bits of the IP address and 6 bits indicating the number of bits that need to be matched.
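A possible packing of the 35-bit format is sketched below. The slides fix only the meaning of the LSB and the field widths, so placing the address above the count field is an assumption, as are the helper names `encode_prefix` and `prefix_matches`.

```python
def encode_prefix(addr: int, plen: int) -> int:
    """Pack a 32-bit IPv4 prefix (address, length 0-32) into 35 bits."""
    if not 0 <= plen <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    if plen > 28:
        # LSB 0: 32 address bits + 2 bits counting don't-care bits (0-3).
        return (addr << 3) | ((32 - plen) << 1)
    # LSB 1: only the top 28 address bits can matter, + 6-bit prefix length.
    return ((addr >> 4) << 7) | (plen << 1) | 1


def prefix_matches(encoded: int, ip: int) -> bool:
    """Decode the 35-bit value and test a 32-bit IP against the prefix."""
    if encoded & 1 == 0:
        addr = encoded >> 3
        plen = 32 - ((encoded >> 1) & 0b11)
    else:
        addr = (encoded >> 7) << 4
        plen = (encoded >> 1) & 0b111111
    mask = ((1 << plen) - 1) << (32 - plen)   # plen leading ones
    return (ip & mask) == (addr & mask)


subnet = encode_prefix(0xC0A80100, 24)          # 192.168.1.0/24
host = encode_prefix(0x0A000001, 32)            # 10.0.0.1/32
print(subnet < 1 << 35, host < 1 << 35)         # True True
print(prefix_matches(subnet, 0xC0A801FF))       # True
print(prefix_matches(host, 0x0A000002))         # False
```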

III. Algorithmic Modifications (Cont.) 4. Cut Selection (the information used to calculate the index of a child node): the cutting information for each field consists of two pre-computed values. Cuts (also the length of the bit mask for a given field) gives the number of cuts in the field. EX. An 8-bit protocol number, limited to 256 cuts, can only have 0, 2, 4, 8, 16, 32, 64, 128, or 256 cuts performed on it, so a 4-bit number is used for Cuts to represent these nine possible values. BPos is the number of lower bits in a packet field that must be removed by shifting the field right to calculate a child node index. EX. The protocol number requires three bits to store its BPos value, as it may need to be shifted right by 0 to 7 places.

III. Algorithmic Modifications (Cont.) 4. Cut Selection (the information used to calculate the index of a child node): the child node index is generated in two stages. First, a subindex is generated for each field; then these subindices are concatenated to form the final 18-bit index.
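The two-stage index calculation can be sketched with one (cut bits, BPos) pair per field, where the cut-bit count is log2 of the number of cuts. `CutInfo` and `child_index` are hypothetical names, and the split of the 18 root bits in the usage lines (8 bits each from the source and destination IPs, 2 from the protocol) is an illustrative assumption.

```python
from typing import NamedTuple, Sequence


class CutInfo(NamedTuple):
    cut_bits: int   # log2 of the number of cuts, i.e. the bit-mask length
    bpos: int       # low-order bits shifted away before masking


def child_index(fields: Sequence[int], cuts: Sequence[CutInfo]) -> int:
    index = 0
    for value, info in zip(fields, cuts):
        sub = (value >> info.bpos) & ((1 << info.cut_bits) - 1)  # stage 1: subindex
        index = (index << info.cut_bits) | sub                     # stage 2: concatenate
    return index


# Hypothetical 18-bit root split: src IP 8 bits, dst IP 8 bits, protocol 2 bits.
root_cuts = [CutInfo(8, 24), CutInfo(8, 24), CutInfo(2, 6)]
packet = [0xC0A80101, 0x0A000001, 17]    # 192.168.1.1, 10.0.0.1, UDP
idx = child_index(packet, root_cuts)
print(hex(idx), idx < 1 << 18)           # 0x30028 True

# Protocol field: cut bits 0..8 (nine values) and BPos 0..7.
print((8).bit_length(), (7).bit_length())  # 4 3
```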

III. Algorithmic Modifications (Cont.) 5. Memory Organization: memory words are 324 bits wide. The root node requires 18 bits to store each of its child node pointers. Each internal node fits fully in one memory word. Each rule in a leaf node requires 162 bits, so one memory word holds two rules.

III. Algorithmic Modifications (Cont.) 5. Memory Organization: a memory map showing how to store a decision tree with 32 cuts made to the root node, 2 internal nodes, and 4 leaf nodes containing 1 to 6 rules.
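The word counts implied by these widths can be checked with a little arithmetic. The sketch assumes root child pointers are packed densely, 18 to a 324-bit word; the slides do not state how the root pointers are laid out, and the helper names are hypothetical.

```python
import math

WORD_BITS = 324
RULE_BITS = 162          # two rules per word
ROOT_POINTER_BITS = 18


def words_for_root(cuts: int) -> int:
    # Assumes root child pointers are packed densely, 18 per word.
    return math.ceil(cuts * ROOT_POINTER_BITS / WORD_BITS)


def words_for_leaf(num_rules: int) -> int:
    return math.ceil(num_rules * RULE_BITS / WORD_BITS)


print(words_for_root(32))                          # 2
print([words_for_leaf(k) for k in range(1, 7)])    # [1, 1, 2, 2, 3, 3]
```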

IV. Packet Classification Engine Architecture of the Classifier: two modules, a tree traverser that is used to traverse a decision tree, and a leaf node searcher that employs two comparator blocks working in parallel. Information on the decision tree's root node is stored in registers in the tree traverser. This makes it possible for the tree traverser to begin processing a new packet while the previous packet is still being compared with rules in a leaf node.

IV. Packet Classification Engine (Cont.) Architecture of the Classifier: 8 packet classification engines work in parallel on both the Stratix III and the Cyclone III. Rulesets that contain many wildcard rules are broken up into groups; for example, rules with a wildcard source IP can be kept in one group and rules with a wildcard destination IP in another. This helps ensure that the bandwidth of an FPGA's internal memory is better utilized.


IV. Packet Classification Engine (Cont.) Sorter logic block: the sorter logic block registers the Match, NoMatch, and RuleID signals for a classified packet into a chain of registers and multiplexers in series; the register selected depends on the packet ID number. The signals are registered to the output register if they are next in the sequence of results to be output, and stored otherwise. All stored results are shifted toward the output register each time a result that is due to be output appears. As a result, classification results leave the classifier in the same order in which the packets entered it.
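In software, the sorter's behavior corresponds to a reorder buffer keyed by packet ID. The sketch below is a behavioral model only (not the register and multiplexer chain), and it assumes packet IDs start at 0 and increase by one without wrapping; `in_order` is a hypothetical name.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (packet_id, match, rule_id); match=False plays the role of NoMatch.
Result = Tuple[int, bool, int]


def in_order(results: Iterable[Result]) -> Iterator[Result]:
    """Release results in packet-ID order, holding back early finishers."""
    pending: Dict[int, Result] = {}
    next_id = 0
    for res in results:
        pending[res[0]] = res
        # Output every stored result that is now next in the sequence.
        while next_id in pending:
            yield pending.pop(next_id)
            next_id += 1


# Engines finish out of order; results leave in input order.
finished = [(1, True, 4), (0, False, 0), (3, True, 2), (2, True, 7)]
print([r[0] for r in in_order(finished)])   # [0, 1, 2, 3]
```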

IV. Packet Classification Engine (Cont.) Supporting IPv6 Packet Classification: the memory words are widened from 324 bits to 348 bits, with each memory word storing one rule instead of two. The tree traverser uses more logic resources, such as larger multiplexers, and the leaf node searcher needs a larger comparator block. Root and internal nodes require an extra 4 bits to store their cutting information.

V. Performance Results Hardware Implementation Parameters: Stratix III: processes packets at line rates of up to 138.56 Gb/s, even when minimum-sized 40-byte packets arrive back-to-back. Cyclone III: reaches line speeds of up to 70 Gb/s.
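The line rates translate into packet rates as follows, using the same worst case of 40-byte packets arriving back-to-back and counting 8 bits per byte with no extra framing overhead (the helper name is hypothetical):

```python
def packets_per_second(line_rate_bps: float, packet_bytes: int = 40) -> float:
    # Worst case: minimum-sized packets arriving back-to-back.
    return line_rate_bps / (packet_bytes * 8)


stratix = packets_per_second(138.56e9)
cyclone = packets_per_second(70e9)
print(f"{stratix / 1e6:.2f} Mpps, {cyclone / 1e6:.2f} Mpps")  # 433.00 Mpps, 218.75 Mpps
```

At 433 million packets per second, the Stratix III figure is over 1,000 times the roughly 400,000 packets per second quoted for RFC in software.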

V. Performance Results (Cont.) Memory Usage and Worst Case Number of Memory Accesses:

V. Performance Results (Cont.) Evaluation Against Prior Art: in software alone, RFC, HiCuts, HyperCuts, TSS, and EGT-PC can only classify packets at speeds of 400,973, 57,042, 32,242, 10,700, and 7,491 packets/s, respectively. SF: Switch Factor

V. Performance Results (Cont.) Throughput Versus Power Consumption: