High-performance TCAM-based IP Lookup Engines. Authors: Hui Yu, Jing Chen, Jianping Wang and S.Q. Zheng. Publisher: IEEE INFOCOM 2008. Presenter: 林呈俞


1 High-performance TCAM-based IP Lookup Engines
Authors: Hui Yu, Jing Chen, Jianping Wang and S.Q. Zheng
Publisher: IEEE INFOCOM 2008
Presenter: 林呈俞
Date: 2008/9/24

2 Outline
Introduction
Previous works: MSMB scheme, MSMB-PT scheme, MSMB-LPT scheme
Goals of this paper
Proposed works: M-MSMB-LPT scheme, MSMB-LPT-I scheme
Experimental results

3 Introduction (1/3)
To achieve high IP lookup performance, TCAMs have been proposed for implementing IP lookup accelerators.
In practice, one TCAM-based routing table is shared by multiple packet streams in one line card or in multiple line cards.
Previous works reconfigure a TCAM into several independent blocks: MSMB, MSMB-PT, and MSMB-LPT.

4 Introduction (2/3)
MSMB (Multi-Selector and Multi-Block) scheme
Proposed in [6] to reconfigure a TCAM into several independent blocks so that parallel IP lookup is possible.
With K TCAMs, instead of performing only one lookup in each cycle, all TCAMs can be used concurrently for different lookups.
This system needs M parallel RDs.
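The benefit of independent blocks can be illustrated with a small cycle count. This is only a sketch under stated assumptions: every lookup is already tagged with the block that holds its prefix, each block serves one lookup per cycle, and all lookups in the batch are available at once. The function names are hypothetical and do not come from the paper.

```python
from collections import Counter

def cycles_single_tcam(targets):
    """One undivided TCAM serves one lookup per cycle."""
    return len(targets)

def cycles_multi_block(targets, k):
    """k independent blocks; each block serves one lookup per cycle.

    targets[i] is the (assumed, precomputed) block index of lookup i.
    The batch finishes when the most heavily loaded block is done.
    """
    if any(not 0 <= t < k for t in targets):
        raise ValueError("block index out of range")
    per_block = Counter(targets)
    return max(per_block.values(), default=0)

# Four lookups spread over four blocks finish in one cycle.
spread = [0, 1, 2, 3]
# Two lookups hitting block 0 contend, so a second cycle is needed.
biased = [0, 0, 1, 3]
print(cycles_single_tcam(spread), cycles_multi_block(spread, 4))
print(cycles_single_tcam(biased), cycles_multi_block(biased, 4))
```

The biased batch shows why traffic bias limits the gain: lookups that map to the same block still serialize.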

5 Introduction (3/3)
MSMB-PT (Popular-Prefix Table) scheme
This scheme is based on the temporal locality of packet destinations.
It aims to alleviate the TCAM contention problem caused by traffic bias.
Popular-Prefix Table (PT): caches some of the prefixes recently used by all inputs.

6 MSMB-LPT (Local PT) (1/2)
A flow is a stream of packets that are transmitted as a bursty sequence.
For a given router R, the packets of flows arriving at the same input of R exhibit a bias toward a small set of IP prefixes.
For any bursty traffic period at an input of R, this bias of IP addresses is called the temporal locality of flows.
The major difference between MSMB-LPT and MSMB-PT is as follows:
MSMB-PT: captures temporal locality global to all inputs.
MSMB-LPT: captures temporal locality of flows.
LPT helps to reduce the number of accesses to the TCAM blocks and the number of TCAM contentions.
MSMB-LPT improves on the performance of MSMB-PT by up to 250% (speedup), 80% (hit ratio), 82% (TCAM contention), and 71% (TCAM power consumption).

7 MSMB-LPT (Local PT) (2/2)
Local Popular-Prefix Table (LPT): used to dynamically store recently referenced IP prefixes requested from input i.
Contention Resolver (CR): chooses one request according to a priority scheme and passes it to the TCAM.
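The role of an LPT can be sketched as a small per-input cache placed in front of the TCAM blocks. The sketch below is a simplification under stated assumptions: each packet is represented directly by the prefix it matches (so no longest-prefix matching is modelled), replacement is first-in-first-out, and the class and attribute names are hypothetical.

```python
from collections import deque

class LocalPrefixTable:
    """Per-input table of recently referenced prefixes (FIFO replacement assumed)."""

    def __init__(self, size):
        self.size = size
        self.order = deque()   # insertion order, oldest entry on the left
        self.entries = set()   # fast membership test
        self.hits = 0
        self.misses = 0        # each miss stands for one request sent on to a TCAM block

    def lookup(self, prefix):
        """Return True on an LPT hit; on a miss, record the prefix and return False."""
        if prefix in self.entries:
            self.hits += 1
            return True
        self.misses += 1
        if self.size == 0:
            return False
        if len(self.order) == self.size:
            # Table is full: evict the oldest entry.
            self.entries.discard(self.order.popleft())
        self.order.append(prefix)
        self.entries.add(prefix)
        return False

lpt = LocalPrefixTable(size=2)
for prefix in ["10.0.0.0/8", "10.0.0.0/8", "192.168.0.0/16", "172.16.0.0/12", "10.0.0.0/8"]:
    lpt.lookup(prefix)
print(lpt.hits, lpt.misses)
```

Only the misses reach the TCAM blocks, which is how a bursty input with strong temporal locality produces fewer TCAM accesses and therefore fewer contentions.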

8 Goals of this paper
How to design a TCAM-based IP lookup engine that improves on MSMB-LPT without using more hardware resources and satisfies given performance requirements?
For large m (inputs), how to design a scalable TCAM-based IP lookup engine?
How to find tradeoffs among cost, performance and reliability?

9 Proposed work (1/5)
Definitions: an MSMB-LPT has a configuration (m, n, k):
m inputs
LPT of size n
k TCAM blocks
The total number of prefixes is M (each block contains M/k prefixes).
The parameters m and k are carefully selected to achieve optimized cost and performance.
Are there better MSMB schemes for given m and k?
Two proposed schemes: M-MSMB-LPT and MSMB-LPT-I

10 Proposed work (2/5)
Multiple (M)-MSMB-LPT
For large m (inputs), we propose to use w identical copies of MSMB-LPT of configuration (m', n, k), where m' = m / w.
Input i*m' + j serves as the j-th input of the (i+1)-th MSMB-LPT.
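The input assignment can be written out as a short function. This is a sketch that assumes 1-based global input numbers, i ranging over 0 ~ w-1, j ranging over 1 ~ m', and m divisible by w; the function name is hypothetical.

```python
def assign_input(g, m, w):
    """Map global input g (1-based) to (copy, local_input), both 1-based.

    Inverts g = i*m' + j with m' = m / w, 0 <= i < w and 1 <= j <= m',
    so that input g is the j-th input of the (i+1)-th MSMB-LPT.
    """
    if m % w != 0:
        raise ValueError("m must be divisible by w")
    if not 1 <= g <= m:
        raise ValueError("input number out of range")
    m_prime = m // w
    i, j0 = divmod(g - 1, m_prime)
    return i + 1, j0 + 1

# m = 12 inputs split over w = 2 copies, so m' = 6.
for g in (1, 6, 7, 12):
    print(g, assign_input(g, m=12, w=2))
```

With m = 12 and w = 2, inputs 1 ~ 6 go to the first copy and inputs 7 ~ 12 to the second.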

11 Proposed work (3/5)
Multiple (M)-MSMB-LPT
The w TCAM blocks TCAM j,u (j = 1 ~ w) have the same content as TCAM u in MSMB-LPT.
We say that an M-MSMB-LPT has configuration (m, n, w, k) if it has w MSMB-LPTs of configuration (m', n, k).
In an M-MSMB-LPT scheme, the w MSMB-LPTs operate completely independently.
[Figure: MSMB-LPT j, with inputs (j-1)*m' + 1, (j-1)*m' + 2, ..., j*m', and k CRs and k TCAM blocks.]

12 Proposed work (4/5)
MSMB-LPT-Interleaved TCAMs (MSMB-LPT-I)
An MSMB-LPT-I of configuration (m, n, w, k) has m inputs, LPTs of size n, and wk TCAM blocks that are partitioned into k groups, each called a TCAM bundle.
The w TCAM blocks in the j-th TCAM bundle contain the same content as TCAM j in the MSMB-LPT scheme.
[Figure: Input 1, Input 2, ..., Input m connected to k bundles.]

13 Proposed work (5/5)
[Figure: processes run concurrently for i = 1 ~ m and j = 1 ~ k; n_i-th key from input i.]
The concurrent TCAM search processes are coordinated by the CR, which can be implemented as a round-robin m-to-w selector.
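A round-robin m-to-w selector can be sketched as follows. This is one plausible reading rather than the paper's circuit: in each cycle, up to w of the m inputs that request the same bundle are granted, scanning from a pointer that then moves just past the last granted input. Inputs are numbered from 0 here, and the function name and pointer-update rule are assumptions.

```python
def round_robin_select(requests, m, w, start):
    """Grant up to w of the requesting inputs, scanning round-robin from `start`.

    requests: set of input indices (0 ~ m-1) that want the bundle this cycle.
    Returns (granted inputs in grant order, next start pointer).
    """
    granted = []
    for offset in range(m):
        i = (start + offset) % m
        if i in requests:
            granted.append(i)
            if len(granted) == w:
                break
    # Assumed policy: resume just after the last granted input.
    next_start = (granted[-1] + 1) % m if granted else start
    return granted, next_start

# Three of four inputs contend for a bundle with w = 2 blocks.
granted, ptr = round_robin_select({0, 1, 3}, m=4, w=2, start=0)
print(granted, ptr)
granted, ptr = round_robin_select({0, 1, 3}, m=4, w=2, start=ptr)
print(granted, ptr)
```

In the second cycle the scan starts at input 2, so input 3 is served before inputs 0 and 1 are considered again, which keeps any single input from being starved.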

14 Experimental results (1/9)
We conduct a series of simulations on M-MSMB-LPT and MSMB-LPT-I.
A first-in-first-out (FIFO) replacement policy is used for LPT updates.
Round-robin (RR) arbitration is used for TCAM contention resolution.
Two packet traces are used in the simulations:
1. generated according to the routing table described in [17].
2. derived from actual packet flows given in [19].
The performance of an M-MSMB-LPT is determined by a single component MSMB-LPT.
The performance of MSMB-LPT and M-MSMB-LPT can be derived from the performance of MSMB-LPT-I with configuration (m, n, w, k), where w is labeled # bundles and k is labeled # blocks, as follows:
(m, n, 1, k) = MSMB-LPT with (m, n, k).
(m, n, 1, k) = M-MSMB-LPT with (w*m, n, w, k).
Example: MSMB-LPT-I with (6, n, 1, 4) can be used to indicate the performance of M-MSMB-LPT with (12, n, 2, 4) as well as (18, n, 3, 4).
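The configuration equivalences can be checked with a few lines of arithmetic. The sketch below only restates the two rules from the slide; the function names are hypothetical, and n = 32 is an arbitrary illustrative LPT size, not a value from the paper.

```python
def equivalent_m_msmb_lpt(m, n, k, w):
    """M-MSMB-LPT configuration whose performance matches MSMB-LPT-I (m, n, 1, k)."""
    return (w * m, n, w, k)

def reduce_m_msmb_lpt(m, n, w, k):
    """MSMB-LPT-I configuration that indicates the performance of M-MSMB-LPT (m, n, w, k)."""
    if m % w != 0:
        raise ValueError("m must be divisible by w")
    return (m // w, n, 1, k)

n = 32  # illustrative LPT size only
print(equivalent_m_msmb_lpt(6, n, 4, w=2))
print(equivalent_m_msmb_lpt(6, n, 4, w=3))
print(reduce_m_msmb_lpt(18, n, 3, 4))
```

The two forward calls reproduce the slide's example, mapping (6, n, 1, 4) to (12, n, 2, 4) and (18, n, 3, 4).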

15 Experimental results (2/9)
Performance metrics: TCAM contention ratio, speedup over naïve MSMB, and TCAM utilization.
Quantities used in the metric definitions:
# contentions at TCAM blocks.
Total # key search times.
Total # parallel cycles to complete IP lookup for all packets in a trace.
A_MSMB-LPT-I(j): total # cycles in which TCAM j is searched.

16 Experimental results (3/9) Power consumption

17 Experimental results (4/9) Speedup
[Figure panels: 48 TCAM blocks; 16 TCAM blocks.]

18 Experimental results (5/9) Power consumption

19 Experimental results (6/9)
Contention ratio
36 inputs and 4 TCAM blocks in each bundle.
Increase the number of TCAM bundles.
From 1 to 2
From 4 to
(36, n, w, 4), w = 1, 2, 4, 6

20 Experimental results (7/9)
Given the available TCAM resources, such as
# TCAM bundles: 2
# TCAM blocks in each bundle: 4
it is important to know the expected contention ratio for different numbers of inputs.
(m, n, 2, 4), m = 6, 12, 18,

21 Experimental results (8/9)
Speedup gain from increasing the number of TCAM bundles for a given # inputs.
(36, n, w, 4), w = 1, 2, 4,

22 Experimental results (9/9)
The speedup changes with the number of inputs.
(m, n, 2, 4), m = 6, 12, 18, 36