A SRAM-based Architecture for Trie-based IP Lookup Using FPGA


Author: Hoang Le, Weirong Jiang, Viktor K. Prasanna
Publisher/Conf.: 16th International Symposium on Field-Programmable Custom Computing Machines (2008)
Presenter: 林鈺航
Date: 2018/12/26
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C. CSIE CIAL Lab

Background (1/3)
A simple pipelining approach is to map each tree level onto a pipeline stage with its own memory and processing logic. One packet can be processed every clock cycle.

Background (2/3)
However, this approach results in an unbalanced distribution of tree nodes over the pipeline stages. In an unbalanced pipeline, the "fattest" stage, which stores the largest number of tree nodes, becomes a bottleneck.

Background (3/3)
More time is needed to access the larger local memory, which reduces the global clock rate. A fat stage also receives many updates, and during intensive route/rule insertion the fattest stage may suffer memory overflow. Because it is unclear at hardware design time which stage will be the fattest, memory of the maximum size must be allocated for every stage.
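The imbalance described in the Background slides can be illustrated with a small sketch. It builds a binary trie from a handful of bit-string prefixes and counts the nodes per level, which is the per-stage memory load under the simple one-level-per-stage mapping. The prefix set and the function names are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def build_trie(prefixes):
    """Build a binary trie as nested dicts keyed by '0' and '1'."""
    root = {}
    for prefix in prefixes:
        node = root
        for bit in prefix:
            node = node.setdefault(bit, {})
    return root


def nodes_per_level(root):
    """Return the node count at each depth (depth d maps to stage d)."""
    counts = []
    frontier = [root]
    while frontier:
        counts.append(len(frontier))
        frontier = [child for node in frontier for child in node.values()]
    return counts


# Hypothetical toy prefix set, not taken from the slides.
PREFIXES = ["0", "10", "110", "1110", "0100", "0101", "0110", "0111"]

levels = nodes_per_level(build_trie(PREFIXES))
fattest_stage = levels.index(max(levels))
print("nodes per stage:", levels)
print("fattest stage index:", fattest_stage)
```

In this toy example the node count grows with depth, so the last stage holds the most nodes and would dictate the memory size of every stage.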

Method (1/2)
[Figure: a trie partitioned into subtries. Labels: "12 levels", "Sub trie".]

Method (2/2)
[Figure: subtries mapped onto pipeline stages. Labels: "Stage: 1 2 3 4 5 ...".]

BiOLP Architecture (1/2)
BiOLP stands for Bidirectional Optimized Linear Pipeline. The Direction Index Table (DIT) stores the relationship between the subtries and their mapping directions. For any arriving packet p, the initial bits of its IP address are used to look up the DIT and retrieve the information about its corresponding subtrie ST(p). The information includes (1) the distance to the stage where the root of ST(p) is stored, (2) the memory address of the root of ST(p) in that stage, and (3) the mapping direction of ST(p), which leads the packet to the corresponding entrance of the pipeline.
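A DIT lookup along these lines could be sketched as follows. The three fields mirror the list above; everything else is an assumption made for illustration: the use of 12 initial bits (suggested by the "12 levels" label on the Method slide), the dictionary-based table, and the names `DitEntry`, `dit_index`, and `dit_lookup`.

```python
import ipaddress
from typing import NamedTuple, Optional

INITIAL_BITS = 12  # assumption: number of leading address bits that index the DIT


class DitEntry(NamedTuple):
    distance: int      # stages to pass before reaching the root of ST(p)
    root_address: int  # memory address of that root within its stage
    forward: bool      # mapping direction, which selects the pipeline entrance


def dit_index(dst_ip: str, initial_bits: int = INITIAL_BITS) -> int:
    """Extract the leading bits of an IPv4 address as the DIT index."""
    address = int(ipaddress.IPv4Address(dst_ip))
    return address >> (32 - initial_bits)


def dit_lookup(dit: dict, dst_ip: str) -> Optional[DitEntry]:
    """Return the subtrie information for a packet, or None if no subtrie exists."""
    return dit.get(dit_index(dst_ip))


# Hypothetical table with a single subtrie for the 12-bit index 0xC0A.
dit = {0xC0A: DitEntry(distance=2, root_address=17, forward=False)}
entry = dit_lookup(dit, "192.168.1.1")
print(entry)
```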

BiOLP Architecture (2/2)
Each memory entry contains (1) the memory address of the child node and (2) the distance to the stage where the child node is stored. Before a packet moves on to the next stage, its distance value is checked. If it is zero, the memory address of its child node is used to index the memory in the next stage and retrieve the child node content. Otherwise, the packet does nothing in that stage except decrement its distance value by one.
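The distance mechanism can be modelled in software as a loop over stages. This is a simplified sketch, not the hardware design: each stage memory is a dictionary, a node stores an optional next hop plus a `(child_address, child_distance)` pair per branch, and the function name `traverse` and the node layout are assumptions.

```python
def traverse(stages, start_distance, start_address, bits):
    """Walk one packet through the stages and return the last next hop seen."""
    distance, address = start_distance, start_address
    remaining_bits = iter(bits)
    next_hop = None
    for memory in stages:
        if distance > 0:
            # No operation in this stage: only count down the distance.
            distance -= 1
            continue
        node = memory[address]
        if node.get("next_hop") is not None:
            next_hop = node["next_hop"]
        bit = next(remaining_bits, None)
        child = node["children"].get(bit) if bit is not None else None
        if child is None:
            break  # no further child to follow
        address, distance = child
    return next_hop


# Hypothetical three-stage pipeline: the root sits in stage 0 and its
# '1' child sits two stages later, so its stored distance is 1.
stages = [
    {0: {"next_hop": "A", "children": {"1": (5, 1)}}},
    {},
    {5: {"next_hop": "B", "children": {}}},
]
print(traverse(stages, 0, 0, "1"))
print(traverse(stages, 0, 0, "0"))
```

Storing a distance per child is what allows a parent and its child to sit in non-adjacent stages, which the balanced mapping relies on.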

Subtrie Inversion (1/3)
In a trie, there are few nodes at the top levels and many nodes at the leaf level. Hence, we can invert some subtries so that their leaf nodes are mapped onto the first several stages. IFR denotes the inversion factor. A larger inversion factor results in more subtries being inverted.

Subtrie Inversion (2/3)
[Figure: stage mapping after a subtrie is inverted.]

Subtrie Inversion (3/3)
The paper proposes several heuristics for selecting the subtries to be inverted:
Largest leaf: the subtrie with the largest number of leaves.
Least height: the subtrie whose height is the minimum.
Largest leaf per height: the subtrie with the largest value of its number of leaves divided by its height.
Least average depth per leaf: the subtrie with the smallest average depth per leaf, where average depth per leaf is the ratio of the sum of the depths of all the leaves to the number of leaves.
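The four heuristics can be expressed as sort keys over per-subtrie statistics. The sketch below is illustrative only: subtries are nested dicts, height is taken as the maximum leaf depth with the root at depth 0, and the `count` parameter stands in for however the inversion factor determines the number of inverted subtries, which the slides do not spell out.

```python
def leaf_depths(node, depth=0):
    """Return the depth of every leaf below node (root at depth 0)."""
    children = list(node.values())
    if not children:
        return [depth]
    return [d for child in children for d in leaf_depths(child, depth + 1)]


# Each key is built so that a smaller key means "invert this subtrie first".
# The max(..., 1) guard for single-node subtries is an assumption.
HEURISTICS = {
    "largest_leaf": lambda depths: -len(depths),
    "least_height": lambda depths: max(depths),
    "largest_leaf_per_height": lambda depths: -len(depths) / max(max(depths), 1),
    "least_average_depth_per_leaf": lambda depths: sum(depths) / len(depths),
}


def select_for_inversion(subtries, heuristic, count):
    """Pick `count` subtrie names according to the named heuristic."""
    key = HEURISTICS[heuristic]
    ranked = sorted(subtries, key=lambda name: key(leaf_depths(subtries[name])))
    return ranked[:count]


# Hypothetical subtries: "a" is short, "b" has the most leaves, "c" is a chain.
subtries = {
    "a": {"0": {}, "1": {}},
    "b": {"0": {"0": {}, "1": {}}, "1": {"0": {}}},
    "c": {"0": {"0": {"0": {}}}},
}
print(select_for_inversion(subtries, "largest_leaf", 1))
print(select_for_inversion(subtries, "least_average_depth_per_leaf", 2))
```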

Mapping
[Figure: (a) the subtries after inversion is finished; (b) the two sets of subtries: forward subtries and reverse subtries.]

Mapping - Algorithm
Mi is the number of nodes mapped onto stage i. The priority of a trie node is defined as its height if the node is in a forward subtrie, and as its depth if it is in a reverse subtrie. Nodes are popped from the ReadyList in decreasing order of priority. A node whose priority equals the number of remaining stages is regarded as a critical node: if such a node is not mapped onto the current stage, none of its descendants can be mapped later. For forward subtries, a node is pushed into the NextReadyList immediately after its parent is popped. For reverse subtries, a node is not pushed into the NextReadyList until all of its children have been popped.
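A simplified sketch of this mapping, restricted to forward subtries, is given below. It is not the paper's full algorithm. The assumptions are: height counts nodes on the longest downward path (a leaf has height 1), each stage aims for an equal share of the remaining nodes, and critical nodes are mapped even when that share is already reached. All function names are hypothetical.

```python
def height(node):
    """Nodes on the longest downward path; a leaf has height 1 (assumption)."""
    return 1 + max((height(child) for child in node.values()), default=0)


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.values())


def map_forward(roots, num_stages):
    """Return Mi, the number of nodes mapped onto each stage i."""
    ready = [(height(root), root) for root in roots]  # (priority, node)
    remaining_nodes = sum(count_nodes(root) for root in roots)
    mapped_per_stage = []
    for stage in range(num_stages):
        remaining_stages = num_stages - stage
        target = remaining_nodes / remaining_stages
        ready.sort(key=lambda item: item[0], reverse=True)
        next_ready = []
        mapped = 0
        # Keep popping while under the target or while a critical node waits.
        while ready and (mapped < target or ready[0][0] >= remaining_stages):
            _, node = ready.pop(0)
            mapped += 1
            # Children become ready only for later stages.
            next_ready.extend((height(c), c) for c in node.values())
        remaining_nodes -= mapped
        ready.extend(next_ready)
        mapped_per_stage.append(mapped)
    return mapped_per_stage


# Hypothetical forward subtries: a 3-node chain and a root with two leaves.
chain = {"0": {"0": {}}}
bushy = {"0": {}, "1": {}}
print(map_forward([chain, bushy], num_stages=3))
```

For these two toy subtries a plain level-by-level mapping would place 2, 3, and 1 nodes on the three stages, whereas the sketch spreads them evenly.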

Cache-based BiOLP Architecture
At the front is the cache module, which takes in up to four IP addresses at a time. The most recently searched packets are cached. If a cache hit occurs, the packet skips the pipeline. Otherwise, it must go through the pipeline. For IP lookup, only the destination IP of the packet is used to index the cache. A cache update is triggered either when a route update affects a cached entry, or when a packet that previously had a cache miss retrieves its search result from the pipeline. The length of the delay buffer is equal to the sum of the pipeline depth and the queue length. Cache updates use the Least Recently Used (LRU) algorithm.
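The LRU behaviour of the cache can be sketched in software with `collections.OrderedDict`. This models only the replacement policy, not the four-input hardware module. The class name `LookupCache` and the choice to drop an entry on a related route update are assumptions; the slides say only that such an update triggers a cache update.

```python
from collections import OrderedDict


class LookupCache:
    """Destination-IP cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # dst_ip -> next hop, oldest first

    def get(self, dst_ip):
        """Return the cached next hop, or None on a miss."""
        if dst_ip not in self.entries:
            return None
        self.entries.move_to_end(dst_ip)  # mark as most recently used
        return self.entries[dst_ip]

    def update(self, dst_ip, next_hop):
        """Store a result that came back from the pipeline after a miss."""
        self.entries[dst_ip] = next_hop
        self.entries.move_to_end(dst_ip)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used

    def invalidate(self, dst_ip):
        """Assumed reaction to a route update that affects a cached entry."""
        self.entries.pop(dst_ip, None)


cache = LookupCache(capacity=2)
cache.update("10.0.0.1", "port1")
cache.update("10.0.0.2", "port2")
cache.get("10.0.0.1")              # refreshes 10.0.0.1
cache.update("10.0.0.3", "port3")  # evicts 10.0.0.2
print(list(cache.entries))
```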

Route Update
The new content of the memory is computed offline. When an update is initiated, a write bubble is inserted into the pipeline. The direction in which the write bubble is inserted is determined by the direction of the subtrie that it is going to update. When it arrives at the stage prior to the stage to be updated, the write bubble uses its ID to look up the write bubble table (WBT). If the write enable bit is set, the write bubble uses the new content to update the memory location in the next stage.
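A rough software model of a write bubble passing through the pipeline is sketched below. The slides do not give the WBT layout, so the per-stage dictionaries, the entry fields `write_enable`, `address`, and `content`, and the function name `send_write_bubble` are all assumptions made for illustration.

```python
def send_write_bubble(memories, wbts, bubble_id, forward=True):
    """Move one write bubble through the stages and apply its pending writes."""
    order = list(range(len(memories)))
    if not forward:
        order.reverse()  # reverse subtries are entered from the other end
    for here, following in zip(order, order[1:]):
        entry = wbts[here].get(bubble_id)
        if entry is not None and entry["write_enable"]:
            # The write lands in the stage after the one holding the WBT entry.
            memories[following][entry["address"]] = entry["content"]


# Hypothetical three-stage pipeline with one pending update for bubble 7.
memories = [{}, {3: "old node"}, {}]
wbts = [
    {7: {"write_enable": True, "address": 3, "content": "new node"}},
    {},
    {},
]
send_write_bubble(memories, wbts, bubble_id=7)
print(memories[1][3])
```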

Implementation Results (1/5)
[Table: representative routing tables.]

Implementation Results (2/5)
Impact of the inversion heuristics: Four different heuristics are used to invert subtries, with the inversion factor set to 1. According to the results, the least average depth per leaf heuristic performs best.
Impact of the inversion factor: The inversion heuristic used is the largest leaf heuristic. As the inversion factor increases from 0 to 25, the bidirectional mapping changes from top-down to bottom-up. According to the results, perfect memory balance can be achieved with an inversion factor between 4 and 8.

Implementation Results (3/5)
Impact of the input width: The input width is the number of parallel inputs, denoted P. We increased the input width and observed the throughput scalability. With caching, the throughput scales well with the input width.

Implementation Results (4/5)
Impact of the queue size: Queue sizes greater than 16 had little effect on throughput, since the point of diminishing returns is reached at a queue size of 16. Even without the queue, the throughput can reach over 3.65 PPC. Therefore, a small queue of size 16 is sufficient for the 4-width BiOLP architecture.

Implementation Results (5/5)
Impact of the cache size: Caching is effective in improving the throughput. Even with only 1% of the routing entries cached, the throughput reached almost 4 PPC in the 4-width architecture. PPC denotes the number of packets processed per clock cycle.