
1 A Tree Based Router Search Engine Architecture With Single Port Memories Author: Baboescu, F.; Tullsen, D.M.; Rosu, G.; Singh, S. Publisher: Computer Architecture, ISCA '05, Proceedings of the 32nd International Symposium Presenter: Po Ting Huang Date: 2009/12/23

2 Introduction This paper describes a pipeline architecture that provides both high execution throughput and a balanced memory distribution, by dividing the search tree into subtrees, allocating each subtree separately, and allowing searches to begin at any pipeline stage. The architecture is validated by implementing and simulating state-of-the-art solutions for IPv4 lookup, VPN forwarding, and packet classification. The resulting solutions do well in terms of performance, efficiency, and cost.

3 Background Rapid growth in network link rates places a strong demand on high-speed packet forwarding engines. The search engine has become a significant bottleneck for core routers; pipelining can significantly improve its throughput.

4 Memory Allocation Problem For trie-based searching, a simple approach is to map each trie level onto a private pipeline stage. This approach results in an unbalanced distribution of trie nodes over the pipeline stages: a stage storing a larger number of trie nodes needs more time to access its larger memory, and when there is intensive route insertion, the larger stages can overflow.
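As a toy illustration of this imbalance (the prefixes below are made up, not the paper's data), the following sketch builds a tiny binary trie and counts nodes per level; the deepest level ends up with eight times as many nodes as the first, so a level-per-stage mapping loads the deeper stages far more heavily.

```python
# Toy illustration of per-level node imbalance in a binary trie.

class Node:
    def __init__(self):
        self.children = {}        # bit '0'/'1' -> child Node

def insert(root, bits):
    node = root
    for b in bits:
        node = node.children.setdefault(b, Node())

def nodes_per_level(root):
    counts, frontier = [], [root]
    while frontier:
        counts.append(len(frontier))
        frontier = [c for n in frontier for c in n.children.values()]
    return counts

root = Node()
for prefix in ["000", "001", "010", "011", "100", "101", "110", "111"]:
    insert(root, prefix)

print(nodes_per_level(root))      # [1, 2, 4, 8]: stage 3 would hold 8x stage 0's nodes
```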

5 Conventional Solutions Conventional approaches either use complex dynamic memory allocation schemes (dramatically increasing hardware complexity) or over-provision each pipeline stage (wasting memory).

6 First Contribution We introduce our first contribution: an additional degree of freedom for the search operation. We allow a search to start at any stage of the pipeline. For every search, the starting position is picked using a hash function computed over information in the packet header. This applies to both IP lookup and packet classification.
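A minimal sketch of this idea, with assumed names and an arbitrary stand-in hash (the paper does not prescribe CRC32 or an eight-stage depth; both are illustration choices here):

```python
# Pick the pipeline stage where a search starts from a hash of header bits,
# so searches spread across all stages.

import zlib

NUM_STAGES = 8   # assumed pipeline depth for illustration

def start_stage(header_bits: bytes) -> int:
    # Any fixed hash of the header works; CRC32 stands in for whatever
    # hardware hash the design would actually implement.
    return zlib.crc32(header_bits) % NUM_STAGES

# Example: hash the destination IP address bytes.
print(start_stage(bytes([192, 168, 1, 7])))   # some stage in 0..7
print(start_stage(bytes([10, 0, 0, 1])))
```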

7 Subtree Allocation

8 To keep the explanation simple, let us assume that the tree has four subtrees, called s1…s4, and that the depth of each subtree is four levels. We assume this search structure is implemented on a four-stage pipeline whose stages are called p1…p4. The first level of subtree s1 is stored and processed by pipeline stage p1, the second level of s1 by pipeline stage p2, and so on. Subtrees s2, s3, and s4 are laid out in the same rotated fashion, each starting at a different pipeline stage.

9 Subtree Allocation (cont.) By doing so, the pipeline allocates nearly equal amounts of memory to each stage, by virtually allocating a "subtree" to each of the stages.
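A minimal sketch of this wrap-around mapping (all node counts are toy values, not the paper's): level j of subtree s_i is stored in stage (i + j) mod N, so every stage ends up holding one level of every subtree and the per-stage totals even out.

```python
# Wrap-around mapping of subtree levels to pipeline stages.

NUM_STAGES = 4
# nodes_per_level[i][j] = node count of level j of subtree s_(i+1) (toy numbers)
nodes_per_level = [
    [1, 2, 4, 8],   # s1
    [1, 2, 4, 8],   # s2
    [1, 2, 4, 8],   # s3
    [1, 2, 4, 8],   # s4
]

stage_memory = [0] * NUM_STAGES
for i, levels in enumerate(nodes_per_level):
    for j, count in enumerate(levels):
        stage_memory[(i + j) % NUM_STAGES] += count

print(stage_memory)   # [15, 15, 15, 15] -- each stage holds the same total
```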

10 Two Simplifications In practice, we relax the two simplifications made in this illustration. First, we allow more subtrees than pipeline stages (processing elements), which implies that multiple subtrees may share the same starting stage. Second, we allow the maximum depth of each subtree to be less than or equal to the number of pipeline stages.

11 Conflicts However, introducing this new degree of freedom, which allows search tasks to start execution at any pipeline stage, impacts the throughput of the system, because of potential conflicts between new tasks and the ones already in execution.

12 Second Contribution It modifies the regular pipeline structure and behavior as follows. Each pipeline stage works at a frequency f = 2*F, where F is the maximum throughput of the input. All tasks traverse the pipeline twice and are inserted at the first pipeline stage, irrespective of their starting stage (for execution) in the pipeline.

13 Architecture of Ring Pipeline

14 Second Contribution (cont.) Each pipeline stage accommodates two data paths (virtual data paths; they can share the same physical wires). The first data path (represented by the top lines) is active during the odd clock cycles and is used for the first traversal of the pipeline. During this traversal, a task Ti passes through the pipeline until its starting stage i and then executes until the last stage of the pipeline.

15 Second Contribution (cont.) The second data path is traversed during the even cycles and allows the task to continue its execution on the pipeline stages that are left. Once a task finishes executing, its result is propagated to the output through the final stage. For example, see the simulation sketch below.
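As a rough stand-in for the example referenced above, here is a coarse simulation sketch (an assumption-laden model, not the paper's RTL or its actual example). Time is counted in stage-clock cycles (rate f = 2F); a new task is admitted every other cycle (input rate F); every task enters at stage 0, executes at stages at or beyond its start stage on the first traversal and at the remaining stages on the second, and exits through the last stage. The odd/even-cycle sharing of the two data paths is abstracted away, and N and the start stages are made-up values.

```python
N = 4                                        # pipeline stages (assumed)

def executes(start, pos):
    # Positions 0..N-1 form the first traversal, N..2N-1 the second.
    stage, first_pass = pos % N, pos < N
    return stage >= start if first_pass else stage < start

def simulate(start_stages):
    ring = [None] * (2 * N)                  # ring[p]: task currently at position p
    work = {}                                # task id -> stages executed so far
    finished = []
    pending = list(enumerate(start_stages))
    for cycle in range(2 * N + 2 * len(start_stages) + 2):
        if ring[-1] is not None:
            finished.append((ring[-1][0], cycle))   # task leaves via the last stage
        ring = [None] + ring[:-1]            # every task advances one position
        if cycle % 2 == 0 and pending:       # one new task per input slot (F = f/2)
            tid, start = pending.pop(0)
            ring[0], work[tid] = (tid, start), []
        for pos, occ in enumerate(ring):
            if occ is not None and executes(occ[1], pos):
                work[occ[0]].append(pos % N)
    return finished, work

finished, work = simulate([0, 3, 1, 2])      # toy, hash-chosen start stages
print(finished)   # tasks exit in input order, two cycles apart, 2N cycles after entry
print(work)       # every task executed each of the N stages exactly once
```

The output illustrates the guarantees on the next slide: in-order exit, constant latency, and an output rate equal to the input rate, regardless of where each search starts.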

16 Guarantees The scheme guarantees the following: 1) an output rate equal to the input rate; 2) all tasks exit in order; 3) all tasks have a constant latency through the pipeline equal to N*1/F; 4) communication between processors occurs only between neighbors in a linear ordering of the processors. This eliminates both (1) the need for a scheduler for the input and output of tasks and (2) extra communication complexity.

17 Selecting the Subtrees Ideally, the subtrees to be allocated should have relatively equal size (approximately the same number of nodes). We provide an iterative algorithm that takes the original trie as input and, at each step, identifies one subtrie whose node count is closest to a desired value (threshold). The result of the algorithm is a list of tuples; each tuple consists of the root node of a subtrie together with the longest matching prefix of this node.
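A rough sketch of how such an iterative carving step could look. This is only my reading of the slide, with an assumed trie representation, toy prefixes, and arbitrary tie-breaking; it is not the paper's exact algorithm.

```python
# Repeatedly carve out the subtrie whose node count is closest to a threshold,
# recording its root together with the longest matching prefix inherited from
# the ancestors above the cut.

class Node:
    def __init__(self, prefix=""):
        self.prefix = prefix              # bit string that identifies this node
        self.children = {}                # '0'/'1' -> Node
        self.is_prefix = False

def insert(root, bits):
    node = root
    for b in bits:
        node = node.children.setdefault(b, Node(node.prefix + b))
    node.is_prefix = True

def size(node):
    return 1 + sum(size(c) for c in node.children.values())

def carve(root, threshold):
    """Detach and return (subtrie_root, inherited_lmp) for the non-root node
    whose subtrie size is closest to the threshold."""
    best = None                           # (score, node, parent, branch, lmp)
    stack = [(root, None, None, "")]
    while stack:
        node, parent, branch, inherited = stack.pop()
        score = abs(size(node) - threshold)
        if parent is not None and (best is None or score < best[0]):
            best = (score, node, parent, branch, inherited)
        passed = node.prefix if node.is_prefix else inherited
        for b, child in node.children.items():
            stack.append((child, node, b, passed))
    _, node, parent, branch, lmp = best
    del parent.children[branch]           # remove the carved subtrie from the trie
    return node, lmp

root = Node()
for p in ["0", "000", "0010", "0011", "01", "1", "10", "110", "111"]:
    insert(root, p)

subtries = []
while root.children:                      # whatever stays attached to the root at
    subtries.append(carve(root, 4))       # the end would form the final subtrie
for node, lmp in subtries:
    print(f"subtrie rooted at '{node.prefix}', size {size(node)}, inherited LMP '{lmp}'")
```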

18 The Allocation of the Subtrees Our heuristic considers one subtree at a time, randomly picked from the set of subtrees identified by the previous algorithm, and allocates it such that the level of the new subtree requiring the minimum amount of memory is mapped to the pipeline stage that already uses the largest amount of memory.
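A sketch of this placement rule under the wrap-around mapping assumed earlier (the per-level memory numbers are toy values, not the paper's data): each new subtree is rotated so that its lightest level lands on the currently most loaded stage, and the result is compared against the naive level-j-to-stage-j mapping.

```python
import random

NUM_STAGES = 4
stage_memory = [0] * NUM_STAGES           # memory already committed per stage

def allocate(levels):
    """levels[j] = memory needed by level j of the new subtree (len == NUM_STAGES)."""
    lightest = min(range(NUM_STAGES), key=lambda j: levels[j])
    heaviest = max(range(NUM_STAGES), key=lambda s: stage_memory[s])
    offset = (heaviest - lightest) % NUM_STAGES   # level j goes to stage (j+offset)%N
    for j, mem in enumerate(levels):
        stage_memory[(j + offset) % NUM_STAGES] += mem

random.seed(1)
subtrees = [[1, 3, 7, 5], [2, 2, 6, 1], [1, 4, 8, 3], [2, 5, 5, 2]]
random.shuffle(subtrees)                  # the heuristic picks subtrees in random order

naive = [sum(levels[j] for levels in subtrees) for j in range(NUM_STAGES)]
for levels in subtrees:
    allocate(levels)

print("heuristic  :", stage_memory)       # per-stage totals stay reasonably flat
print("level j->j :", naive)              # the naive mapping is far more skewed
```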

19 Evaluation The evaluation focuses on two critical questions: 1) What is the overall waste in memory space due to our new model? 2) What are the maximum throughput and expected latency our scheme can provide? We synthesized in Verilog the computational logic of each pipeline stage for both Eatherton's IP lookup algorithm and the HyperCuts packet classification algorithm, using 0.13um technology.

20 Search Latency and Throughput When our balanced allocation algorithm is applied, we find that all but one of the searches analyzed in this research can be implemented with a memory latency of less than 2 ns. The longest path delay in the computation of the next node address is smaller than 1 ns for both algorithms. Combined with the 2 ns memory access time, this allows a 3 ns execution delay per pipeline stage.

21 Search Latency and Throughput (cont.) Given the architecture of Section 2, a pipeline running at 330 MHz (3 ns per stage) achieves a search throughput of one packet every 6 ns. All searches through the pipeline have a constant latency equal to double the latency of a one-way pipeline traversal. The overall latency of a search operation using the Eatherton algorithm [11] for IPv4 lookup is 8*2*3 ns = 48 ns, assuming an eight-stage pipeline.
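The arithmetic behind slides 20 and 21, restated as a small check (the values are the ones given in the slides; the eight-stage depth is the one assumed there for IPv4 lookup):

```python
stage_cycle_ns = 3                        # ~2 ns memory access + <1 ns next-node logic
stage_freq_mhz = 1000 / stage_cycle_ns    # ~333 MHz; the slide rounds this to 330 MHz
input_period_ns = 2 * stage_cycle_ns      # f = 2F, so one new search every 2 cycles
num_stages = 8                            # eight-stage pipeline assumed for IPv4 lookup

print(f"stage clock ~ {stage_freq_mhz:.0f} MHz")
print(f"throughput  = one search every {input_period_ns} ns")
print(f"latency     = {num_stages} * 2 * {stage_cycle_ns} ns = "
      f"{num_stages * 2 * stage_cycle_ns} ns")
```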

22 Memory Distribution per Pipeline Stage: Evaluation of IP Lookup

23 Evaluation of IP Lookup Without balancing, the evaluated IP prefix table requires almost 11 Mbits of memory in a single stage. As a result, the memory access time increases to about 3.5 ns. In comparison, our new pipeline scheme allocates a maximum of 2.9 Mbits of memory per stage, so the memory access time is reduced to 1.4 ns.

24 Evaluation of IP Lookup (cont.)

25 Evaluation of VPN Forwarding

26 Evaluation of the Packet Classification Algorithm (HyperCuts)