Multi-dimensional Packet Classification on FPGA 100 Gbps and Beyond Author: Yaxuan Qi, Jeffrey Fong, Weirong Jiang, Bo Xu, Jun Li, Viktor Prasanna Publisher:

Slides:



Advertisements
Similar presentations
© 2003 Xilinx, Inc. All Rights Reserved Course Wrap Up DSP Design Flow.
Advertisements

Multi-dimensional Packet Classification on FPGA: 100Gbps and Beyond
A HIGH-PERFORMANCE IPV6 LOOKUP ENGINE ON FPGA Author : Thilan Ganegedara, Viktor Prasanna Publisher : FPL 2013.
Fast Firewall Implementation for Software and Hardware-based Routers Lili Qiu, Microsoft Research George Varghese, UCSD Subhash Suri, UCSB 9 th International.
HybridCuts: A Scheme Combining Decomposition and Cutting for Packet Classification Author: Wenjun Li, Xianfeng Li Publisher: 2013 IEEE 21 st Annual Symposium.
Hybrid Data Structure for IP Lookup in Virtual Routers Using FPGAs Authors: Oĝuzhan Erdem, Hoang Le, Viktor K. Prasanna, Cüneyt F. Bazlamaçcı Publisher:
A Ternary Unification Framework for Optimizing TCAM-Based Packet Classification Systems Author: Eric Norige, Alex X. Liu, and Eric Torng Publisher: ANCS.
Pipelined Parallel AC-based Approach for Multi-String Matching Department of Computer Science and Information Engineering National Cheng Kung University,
1 A Memory-Balanced Linear Pipeline Architecture for Trie-based IP Lookup Author: Weirong JiangWeirong Jiang Prasanna, V.K. Prasanna, V.K. Publisher: High-Performance.
400 Gb/s Programmable Packet Parsing on a Single FPGA Authors : Michael Attig 、 Gordon Brebner Publisher: 2011 Seventh ACM/IEEE Symposium on Architectures.
Zheming CSCE715.  A wireless sensor network (WSN) ◦ Spatially distributed sensors to monitor physical or environmental conditions, and to cooperatively.
1 Author: Ioannis Sourdis, Sri Harsha Katamaneni Publisher: IEEE ASAP,2011 Presenter: Jia-Wei Yo Date: 2011/11/16 Longest prefix Match and Updates in Range.
1 A Tree Based Router Search Engine Architecture With Single Port Memories Author: Baboescu, F.Baboescu, F. Tullsen, D.M. Rosu, G. Singh, S. Tullsen, D.M.Rosu,
1 On Constructing Efficient Shared Decision Trees for Multiple Packet Filters Author: Bo Zhang T. S. Eugene Ng Publisher: IEEE INFOCOM 2010 Presenter:
1 Towards Green Routers: Depth- Bounded Multi-Pipeline Architecture for Power-Efficient IP Lookup Author: Weirong Jiang Viktor K. Prasanna Publisher: Performance,
1 Scalable high-throughput SRAM-based architecture for IP-lookup using FPGA Author: Hoang Le; Weirong Jiang; Prasanna, V.K.; Publisher: FPL Field.
Parallel IP Lookup using Multiple SRAM-based Pipelines Authors: Weirong Jiang and Viktor K. Prasanna Presenter: Yi-Sheng, Lin ( 林意勝 ) Date:
1 Energy Efficient Packet Classification Hardware Accelerator Alan Kennedy, Xiaojun Wang HDL Lab, School of Electronic Engineering, Dublin City University.
Study of IP address lookup Schemes
1 DBS A Bit-level Heuristic Packet Classification Algorithm for High Speed Network Author: Baohua Yang, Xiang Wang, Yibo Xue and Jun Li Publisher: International.
Block Permutations in Boolean Space to Minimize TCAM for Packet Classification Authors: Rihua Wei, Yang Xu, H. Jonathan Chao Publisher: IEEE INFOCOM,2012.
High-Performance Packet Classification on GPU Author: Shijie Zhou, Shreyas G. Singapura and Viktor K. Prasanna Publisher: HPEC 2014 Presenter: Gang Chi.
PARALLEL TABLE LOOKUP FOR NEXT GENERATION INTERNET
(TPDS) A Scalable and Modular Architecture for High-Performance Packet Classification Authors: Thilan Ganegedara, Weirong Jiang, and Viktor K. Prasanna.
LayeredTrees: Most Specific Prefix based Pipelined Design for On-Chip IP Address Lookups Author: Yeim-Kuau Chang, Fang-Chen Kuo, Han-Jhen Guo and Cheng-Chien.
FPGA (Field Programmable Gate Array): CLBs, Slices, and LUTs Each configurable logic block (CLB) in Spartan-6 FPGAs consists of two slices, arranged side-by-side.
Wire Speed Packet Classification Without TCAMs ACM SIGMETRICS 2007 Qunfeng Dong (University of Wisconsin-Madison) Suman Banerjee (University of Wisconsin-Madison)
Multi-Field Range Encoding for Packet Classification in TCAM Author: Yeim-Kuan Chang, Chun-I Lee and Cheng-Chien Su Publisher: INFOCOM 2011 Presenter:
1 Towards Practical Architectures for SRAM-based Pipelined Lookup Engines Author: Weirong Jiang, Viktor K. Prasanna Publisher: INFOCOM 2010 Presenter:
Author : Guangdeng Liao, Heeyeol Yu, Laxmi Bhuyan Publisher : Publisher : DAC'10 Presenter : Jo-Ning Yu Date : 2010/10/06.
1 Memory-Efficient and Scalable Virtual Routers Using FPGA Author: Hoang Le, Thilan Ganegedara and Viktor K. Prasanna Publisher: ACM/SIGDA FPGA '11 Presenter:
StrideBV: Single chip 400G+ packet classification Author: Thilan Ganegedara, Viktor K. Prasanna Publisher: HPSR 2012 Presenter: Chun-Sheng Hsueh Date:
1 Power-Efficient TCAM Partitioning for IP Lookups with Incremental Updates Author: Yeim-Kuan Chang Publisher: ICOIN 2005 Presenter: Po Ting Huang Date:
A Smart Pre-Classifier to Reduce Power Consumption of TCAMs for Multi-dimensional Packet Classification Yadi Ma, Suman Banerjee University of Wisconsin-Madison.
Author: Haoyu Song, Murali Kodialam, Fang Hao and T.V. Lakshman Publisher/Conf. : IEEE International Conference on Network Protocols (ICNP), 2009 Speaker:
Author : Weirong Jiang, Yi-Hua E. Yang, and Viktor K. Prasanna Publisher : IPDPS 2010 Presenter : Jo-Ning Yu Date : 2012/04/11.
Memory-Efficient and Scalable Virtual Routers Using FPGA Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan,
Updating Designed for Fast IP Lookup Author : Natasa Maksic, Zoran Chicha and Aleksandra Smiljani´c Conference: IEEE High Performance Switching and Routing.
Binary-tree-based high speed packet classification system on FPGA Author: Jingjiao Li*, Yong Chen*, Cholman HO**, Zhenlin Lu* Publisher: 2013 ICOIN Presenter:
Author: Weirong Jiang and Viktor K. Prasanna Publisher: ACM Symposium on Parallel Algorithms and Architectures, SPAA 2009 Presenter: Chin-Chung Pan Date:
Fast Lookup for Dynamic Packet Filtering in FPGA REPORTER: HSUAN-JU LI 2014/09/18 Design and Diagnostics of Electronic Circuits & Systems, 17th International.
Range Enhanced Packet Classification Design on FPGA Author: Yeim-Kuan Chang, Chun-sheng Hsueh Publisher: IEEE Transactions on Emerging Topics in Computing.
Parallel tree search: An algorithmic approach for multi- field packet classification Authors: Derek Pao and Cutson Liu. Publisher: Computer communications.
Packet Classification Using Multidimensional Cutting Sumeet Singh (UCSD) Florin Baboescu (UCSD) George Varghese (UCSD) Jia Wang (AT&T Labs-Research) Reviewed.
Packet Classification Using Dynamically Generated Decision Trees
High Throughput and Programmable Online Traffic Classifier on FPGA Author: Da Tong, Lu Sun, Kiran Kumar Matam, Viktor Prasanna Publisher: FPGA 2013 Presenter:
Author: Weirong Jiang and Viktor K. Prasanna Publisher: The 18th International Conference on Computer Communications and Networks (ICCCN 2009) Presenter:
Hierarchical packet classification using a Bloom filter and rule-priority tries Source : Computer Communications Authors : A. G. Alagu Priya 、 Hyesook.
Author: Weirong Jiang, Viktor K. Prasanna Publisher: th IEEE International Conference on Application-specific Systems, Architectures and Processors.
Authors : Baohua Yang, Jeffrey Fong, Weirong Jiang, Yibo Xue, and Jun Li. Publisher : IEEE TRANSACTIONS ON COMPUTERS Presenter : Chai-Yi Chu Date.
Optimizing Packet Lookup in Time and Space on FPGA Author: Thilan Ganegedara, Viktor Prasanna Publisher: FPL 2012 Presenter: Chun-Sheng Hsueh Date: 2012/11/28.
Packet Classification Using Multi- Iteration RFC Author: Chun-Hui Tsai, Hung-Mao Chu, Pi-Chung Wang Publisher: 2013 IEEE 37th Annual Computer Software.
Practical Multituple Packet Classification Using Dynamic Discrete Bit Selection Author: Baohua Yang, Fong J., Weirong Jiang, Yibo Xue, Jun Li Publisher:
Hierarchical Hybrid Search Structure for High Performance Packet Classification Authors : O˜guzhan Erdem, Hoang Le, Viktor K. Prasanna Publisher : INFOCOM,
Range Hash for Regular Expression Pre-Filtering Publisher : ANCS’ 10 Author : Masanori Bando, N. Sertac Artan, Rihua Wei, Xiangyi Guo and H. Jonathan Chao.
400 Gb/s Programmable Packet Parsing on a Single FPGA Author: Michael Attig 、 Gordon Brebner Publisher: ANCS 2011 Presenter: Chun-Sheng Hsueh Date: 2013/03/27.
Optimizing Interconnection Complexity for Realizing Fixed Permutation in Data and Signal Processing Algorithms Ren Chen, Viktor K. Prasanna Ming Hsieh.
A Multi-dimensional Packet Classification Algorithm Based on Hierarchical All-match B+ Tree Author: Gang Wang, Yaping Lin*, Jinguo Li, Xin Yao Publisher:
FPGA IMPLEMENTATION OF TCAM Under the guidance of Dr. Hraziia Presented By Malvika ( ) Department of Electronics and Communication Engineering.
Author: Yun R. Qu, Shijie Zhou, and Viktor K. Prasanna Publisher:
High-throughput Online Hash Table on FPGA
Toward Advocacy-Free Evaluation of Packet Classification Algorithms
A SRAM-based Architecture for Trie-based IP Lookup Using FPGA
Power-efficient range-match-based packet classification on FPGA
Large-scale Packet Classification on FPGA
A Trie Merging Approach with Incremental Updates for Virtual Routers
Author: Xianghui Hu, Xinan Tang, Bei Hua Lecturer: Bo Xu
Clustered Hierarchical Search Structure for Large-Scale Packet Classification on FPGA Publisher : Field Programmable Logic and Applications, 2011 Author.
A SRAM-based Architecture for Trie-based IP Lookup Using FPGA
Authors: Ding-Yuan Lee, Ching-Che Wang, An-Yeu Wu Publisher: 2019 VLSI
Presentation transcript:

Multi-dimensional Packet Classification on FPGA 100 Gbps and Beyond Author: Yaxuan Qi, Jeffrey Fong, Weirong Jiang, Bo Xu, Jun Li, Viktor Prasanna Publisher: FPT 2010 Presenter: Chun-Sheng Hsueh Date: 2013/10/23 1

Introduction A FPGA-based architecture which based on HyperSplit targeting 100 Gbps packet classification. Special logic is designed to support two packets to be processed every clock cycle. A node-merging algorithm is proposed to reduce the number of pipeline stages. A leaf-pushing algorithm is designed to control the memory usage. 2

Introduction Software solutions have good flexibility and programmability, but they inherently lack high parallelism and abundant on-chip memory. TCAM-based solutions can reach wire speed performance, they sacrifice scalability, programmability and power efficiency. TCAM-based solutions also suffer from a range-to prefix conversion problem, making it difficult to support large and complex rule sets. 3

Background and Related Work A. Problem Statement: 4

Background and Related Work B. Packet Classification Algorithms: 5

Background and Related Work C. Related Work ◦ Jedhe et al. implemented the DCFL architecture on a Xilinx Virtex-2 Pro FPGA and achieved a throughput of 16 Gbps for 128 rules. ◦ Luo et al. proposed a method called explicit range search to allow more cuts per node than the original HyperCuts algorithm. The tree height is reduced at the cost of extra memory consumption. ◦ Jiang et al. proposed two optimization methods for the HyperCuts algorithm to reduce memory consumption. By deep pipelining, their FPGA implementation can achieve a throughput of 80 Gbps. 6

Architecture Design A. Algorithm Motivation ◦ Algorithm parallelism ◦ Logic complexity ◦ Memory efficiency 7

Architecture Design 8

A. Algorithm Motivation ◦ HyperSplit is well suited to FPGA implementation:  First, the tree structure of the binary search can be pipelined to achieve high throughput of one packet per clock cycle.  Second, the operation at each tree node is simple. Both the value comparison and address fetching can be efficiently implemented with small amount of logic. 9

Architecture Design B. Basic Architecture 10

Architecture Design 11

Architecture Design B. Basic Architecture ◦ When an update is initiated, a write bubble is inserted into the pipeline. Each write bubble is assigned with a stage_identifier. ◦ If the write enable bit is set, the write bubble will use the new content to update the memory at the stage specified by the stage_identifier. 12

Architecture Design C. Design Challenges ◦ Number of pipeline stages:  The pipeline can be long for large rule sets. For example, with 10K ACL rules, the pipeline has 28 stages. ◦ Block RAM usage:  Because the number of nodes is different at each level of the tree, the memory usage at each stage is not equal. In order to support on-the-fly rule update, the size of block RAMs for each pipeline stage needs to be determined during the implementation of the design. 13

ARCHITECTURE OPTIMIZATION A. Pipeline Depth Reduction 14

ARCHITECTURE OPTIMIZATION 15

ARCHITECTURE OPTIMIZATION 16

ARCHITECTURE OPTIMIZATION B. Controlled Block RAM Allocation ◦ First, because the number tree nodes in the top l-level is small (less than 4l), it is not efficient to instantiate the entry block RAMs for those levels. Instead, we allow those entries in distributed RAMs. ◦ Second, because pushing down the leaf nodes does not change the search semantics of the original tree and the number of leaf nodes is comparable to non-leaf nodes, we can reshape the tree by pushing leaf nodes down to lower levels to reduce the memory usage at certain stages. 17

ARCHITECTURE OPTIMIZATION B. Controlled Block RAM Allocation 18

19

PERFORMANCE EVALUATION A. Test bed and data set ◦ Rule sets ranges from 100 to 50,000 and generated by the Classbench with acl1 seeds. ◦ FPGA device used in the tests is the Xilinx Virtex-6, (XC6VSX475T) containing 7,640 Kb Distributed RAM and 38,304 Kb block RAMs. 20

PERFORMANCE EVALUATION B. Node-merging Optimization 21

PERFORMANCE EVALUATION C. Leaf-pushing Optimization 22

PERFORMANCE EVALUATION D. FPGA Performance 23

PERFORMANCE EVALUATION D. FPGA Performance 24