EaseCAM: An Energy And Storage Efficient TCAM-based IP-Lookup Architecture
Rabi Mahapatra, Texas A&M University


Overview
• Introduction
• Research Goal
• Proposed approach
• Results
• Conclusion & Future work

Introduction
[Figure: router header processing. Labels: IP Lookup, Packet Queue, DRAM, Routing Table, Header Processing, Hdr, Data, IP Address, Next Hop]

Introduction
• Hardware and software solutions exist for IP lookup
• Software solutions are unable to match link speed
• Hardware solutions can accommodate today's link speeds
• TCAMs are the most popular hardware device
  - Consume up to 15 W/chip (4-8 chips)
  - Increased cooling costs and fewer ports

Current Approach
• Power reduction in TCAM
  - Partitioning of the TCAM array [Infocom'03, Hot Interconnect'02]
  - Compaction (minimization) [Micro'02]
• Update techniques [Micro'02]
  - Routing update
  - TCAM updates

Bottleneck with existing approaches
• Power reduction
  - Number of entries enabled is not bounded
  - Does not avoid storing redundant information
• Update
  - Minimization techniques are not incremental
  - Update time is not independent of routing table size

Motivation
• Solution for bounded and reduced power consumption
• Truly incremental routing and TCAM update

Contributions
• A pipelined architecture for IP lookup
• New prefix properties (prefix aggregation and prefix expansion)
• Upper bound on the number of entries enabled (256 x 3)
• Novel page filling, memory management and incremental update techniques

Solution: Prefix properties
• Prefix aggregation
  - [Example prefixes not preserved in transcript] /24 is the LCS for the given set of prefixes (rounded to nearest octet)
  - Prefixes aggregated based on LCS mostly have the same next hop
  - Gives a bound on the number of prefixes minimized (256)
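The aggregation step on this slide can be illustrated with a minimal Python sketch. It assumes that the LCS is the bit prefix shared by every prefix in the set and that "rounded to nearest octet" means rounding the shared length down to a multiple of 8; both are assumptions, since the slide does not spell them out. The function names and the example addresses are hypothetical.

```python
def to_bits(addr):
    """Convert a dotted-quad IPv4 address to a 32-character bit string."""
    return "".join(f"{int(octet):08b}" for octet in addr.split("."))


def from_bits(bits):
    """Convert a 32-character bit string back to dotted-quad form."""
    return ".".join(str(int(bits[i:i + 8], 2)) for i in range(0, 32, 8))


def shared_prefix_len(bit_strings):
    """Number of leading bits that all bit strings have in common."""
    limit = min(len(b) for b in bit_strings)
    for i in range(limit):
        if len({b[i] for b in bit_strings}) > 1:
            return i
    return limit


def lcs_octet(prefixes):
    """Common sub-prefix of (address, length) pairs, rounded DOWN to an octet.

    The rounding direction is an assumption made for this sketch.
    """
    bits = [to_bits(addr)[:length] for addr, length in prefixes]
    length = (shared_prefix_len(bits) // 8) * 8
    return from_bits(bits[0][:length].ljust(32, "0")), length


# Hypothetical prefixes that differ only inside the last octet.
example = [("10.1.2.0", 25), ("10.1.2.128", 26), ("10.1.2.192", 27)]
print(lcs_octet(example))
```

Because the aggregated prefixes differ only within one octet below the LCS, at most 256 distinct octet values can fall under it, which is consistent with the bound of 256 stated on the slide.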

Solution: Prefix properties
TABLE I. Comparison of prefix compaction using the prefix aggregation property and Espresso II for the attcanada and bbnplanet routers
Columns: Router, Total Prefixes, Max Prefix Compaction, Prefix Aggregation based Compaction
Rows: ATT-Canada, BBN-planet [values not preserved in transcript]

Solution: Prefix properties
• Prefix expansion
  - Prefixes having the same length can be minimized
  - To increase minimization, extend prefixes of different lengths to the nearest octet by adding don't-cares
  - Extending to the nearest octet is useful for incremental update
  - Example fragment: 1011XXXX [remaining example bits not preserved in transcript]
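As a small illustration of prefix expansion, the following Python sketch pads a prefix with don't-care symbols up to the next octet boundary, which reproduces the 1011XXXX fragment on the slide. The function names are hypothetical, and the ternary match helper is only a software model of what a TCAM entry does in hardware.

```python
def expand_to_octet(prefix_bits):
    """Pad a bit-string prefix with 'X' (don't-care) up to the next octet boundary."""
    pad = (-len(prefix_bits)) % 8
    return prefix_bits + "X" * pad


def ternary_match(entry, key_bits):
    """True if every non-X position of the entry equals the key bit at that position."""
    return all(e in ("X", k) for e, k in zip(entry, key_bits))


entry = expand_to_octet("1011")
print(entry)
print(ternary_match(entry, "10110110"), ternary_match(entry, "10010110"))
```

After expansion, prefixes of lengths 1 to 8 all occupy one octet, 9 to 16 two octets, and so on, so entries of originally different lengths become candidates for joint minimization.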

Solution: Prefix properties
• Overlapping prefixes
  - Prefixes of length < 8 are not present in the routing table
  - Number of matching prefixes for an IP address is ≤ 25 (at most one per prefix length from 8 to 32)
  - Property is used to selectively enable a bounded number of entries in the TCAM (256 x 3)

Solution: Architecture
• Two-level architecture: w1 bits in the 1st level and 32 - w1 bits in the 2nd level
• Segment size corresponding to the first w1 (8) bits is variable
• Power is bounded by segment size
[Figure: Segmented architecture for routing lookup using TCAM. The 1st level (w1 = 8 bits) selects a variable-sized segment in the 2nd level; segments are labeled 1.x, 2.x, ..., 127.x, 128.x, ..., 254.x, 255.x]
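The two-level split can be modeled in software as follows. This is only a sketch under assumptions: the dictionary-of-lists table layout, the tuple format, and the next-hop names are hypothetical, and a real TCAM compares all enabled entries in parallel rather than in a loop. The point it shows is that only the entries of one segment are ever examined.

```python
def split_address(addr, w1=8):
    """Split a dotted-quad IPv4 address into (first w1 bits, remaining 32 - w1 bits)."""
    value = 0
    for octet in addr.split("."):
        value = (value << 8) | int(octet)
    segment = value >> (32 - w1)
    rest = value & ((1 << (32 - w1)) - 1)
    return segment, rest


def lookup(segments, addr, w1=8):
    """Longest-prefix match restricted to the segment chosen by the first level.

    segments maps a first-level value to a list of (length, value, next_hop),
    where length counts prefix bits AFTER the first w1 bits.
    """
    segment, rest = split_address(addr, w1)
    best_len, best_hop = -1, None
    for length, value, next_hop in segments.get(segment, []):
        if (rest >> (32 - w1 - length)) == value and length > best_len:
            best_len, best_hop = length, next_hop
    return best_hop


# Hypothetical table: 128.194/16 -> hop-A and 128.194.1/24 -> hop-B.
table = {128: [(8, 194, "hop-A"), (16, (194 << 8) | 1, "hop-B")]}
print(lookup(table, "128.194.1.7"), lookup(table, "128.194.2.7"))
```

In this model the number of entries touched per lookup is the size of one segment, which mirrors the slide's statement that power is bounded by segment size.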

Solution: Architecture
• Memory compaction
  - Apply prefix properties to remove redundancies
  - Apply pruning, prefix aggregation and minimization in succession
  - Put all prefixes of length < w1 into the bucket (rarely occurring prefixes)
  - Total number of entries after compaction

Solution: Architecture
• Paged TCAM architecture
  - Group the prefixes of length > w1 based on their LCS
  - The LCS values (cubes) cover the prefixes
  - The cubes correspond to the page ID
  - Prefixes covered by a cube are stored in the actual pages
  - Pages formed using LCS as page ID can result in under-utilization

Architecture Block Diagram
[Figure: block diagram. Labels: Comparator, Page Table 1 ... Page Table I ... Page Table N, Page I, Page I+1, ..., I+Cmax, Bucket, Enable Line; IP address inputs of 32 bits and (32 - w1) bits]

How to avoid under-utilization?
• LCS aggregation
  1. Aggregate prefixes having different LCS by modifying the cube
  2. Set the page size to an optimal value, avoiding pages that are too large or too small
• Observe: the maximum page size can be 256, based on the above property

Solution: Page Filling Algorithm
• Page filling heuristics
  1. Generate cubes such that each covers the maximum number of prefixes and page size < […]
  2. Aggregate the page IDs in the page tables and store them in comparators for a 0th-level lookup
  3. Find the total memory consumed (pages, page tables and comparator) for different values of w1
  4. Get the optimal value of w1 and page size β for which total memory is the least

Solution: Page Filling
• Page filling heuristics ensure:
  - No page has more than β*γ entries, where γ is the page fill factor
  - The number of cubes that cover all the prefixes is minimum
  - Total memory consumption is the least for a specific value of w1 and β

Architecture Block Diagram
[Figure: power-enabled blocks in EaseCAM. Labels: Comparator, Page Table 1 ... Page Table I ... Page Table N, Page I, Page I+1, ..., I+Cmax, Bucket, Enable Line; IP address inputs of 32 bits and (32 - w1) bits]

Solution: Architecture
• Bucket
  - Prefixes of length < w1 are stored in the bucket
  - Word length of the bucket is 32
  - Either the bucket or the pages are searched during each lookup in the 2nd level

Solution: Architecture
• Empirical model for memory
  - α: fraction of total entries in the bucket
  - αf: bucket fill factor
  - γ: page fill factor
  - Cmax: number of page IDs in the page table
  - N: the number of entries
  - Pagemax: total number of pages
  - βw1: the optimal page size
• Minimum memory requirement
  = βw1 * Pagemax * (32 - w1)/32 + Pagemax + Pagemax/Cmax + N * α/αf
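The memory model on this slide can be evaluated directly. The Python sketch below transcribes the four terms of the formula; reading them as pages, page tables, comparators and bucket is an interpretation based on the page-filling slide, not something the formula states, and the parameter values in the example are made up for illustration only.

```python
def min_memory_entries(beta_w1, page_max, w1, c_max, n, alpha, alpha_f):
    """Evaluate the slide's empirical memory model.

    The per-term labels below are an interpretation, not the authors' wording.
    """
    pages = beta_w1 * page_max * (32 - w1) / 32  # page storage, scaled to 32-bit words
    page_tables = page_max                       # one page ID per page
    comparators = page_max / c_max               # one comparator per c_max page IDs
    bucket = n * alpha / alpha_f                 # bucket sized by its fill factor
    return pages + page_tables + comparators + bucket


# Hypothetical parameter values, chosen only to exercise the formula.
total = min_memory_entries(beta_w1=256, page_max=100, w1=8, c_max=10,
                           n=1000, alpha=0.01, alpha_f=0.5)
print(total)
```

Sweeping `w1` and `beta_w1` over candidate values and keeping the minimum would correspond to steps 3 and 4 of the page filling heuristics, provided `page_max` is recomputed for each candidate.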

Incremental Updates
• 100s updates/sec and 10 updates/sec after routing flaps
• Insertion: if length of prefix > w1,
  1. Minimize the prefix and find the new cube
  2. Number of prefixes minimized < […]
  3. Update the page table and comparator if required
  4. Update the TCAM with changed entries
  5. TCAM insertion time and minimization time are bounded

Solution: Incremental Update
• Deletion
  - Delete the prefix from the TCAM
  - Update the page table entry and comparator if required
  - Total number of prefixes minimized < 256
  - TCAM update time is also bounded

Solution: Incremental Update
Comparison of incremental update time
Columns: Router, Total Prefixes, Micro'02 Approach (Size, Time in sec), Proposed Approach (Size, Time in sec)
Rows: attcanada, bbnplanet [values not preserved in transcript]

Solution: Memory Management
• Managing page overflow
  - Reason: lower value of γ
  - Pages with the same cube are recomputed
  - Free pages available in the TCAM are used
  - Comparators are also updated when required

Results
• Power consumption per lookup
[Figures: power consumption per lookup for the bbnplanet router and the attcanada router]

Results
• Case study: memory requirements (γ=1 and α=1)
Reduction in memory requirements
Columns: Router, Raw data (entries), After Compaction (entries), Effect of Architecture (entries)
Rows: Attcanada, Bbnplanet [values not preserved in transcript]

Results: Access time
• Pre-estimation using Cacti 3.0 on CAM structure
Reduction in access time
Columns: Router, Raw data (ns), After Compaction (ns), Effect of Architecture (ns)
Rows: Attcanada, Bbnplanet [values not preserved in transcript]

Results: Power
• Pre-estimation using Cacti 3.0 on CAM structure
Reduction in power
Router      Raw data (W)   After Compaction (W)   Effect of Architecture (W)
Attcanada   14.35          7.38                   0.135
Bbnplanet   15.9           12.31                  0.12
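To put the power figures in proportion, the short Python sketch below computes how many times lower the final figure is than the raw figure for each router. It uses only the numbers in the table above; the dictionary and function names are hypothetical.

```python
# Power estimates in watts, copied from the table: (raw, after compaction, with architecture).
POWER_W = {
    "Attcanada": (14.35, 7.38, 0.135),
    "Bbnplanet": (15.9, 12.31, 0.12),
}


def reduction_factors(power=POWER_W):
    """Ratio of raw power to power with the proposed architecture, per router."""
    return {router: raw / final for router, (raw, _, final) in power.items()}


for router, factor in reduction_factors().items():
    print(f"{router}: about {factor:.1f}x lower than raw")
```

By this ratio, both routers show a reduction of roughly two orders of magnitude, with compaction alone accounting for only a small part of it.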

Conclusion
• Significant reduction in memory consumption based on prefix compaction
• Pipelined architecture to store prefixes to achieve bounded power consumption
• Efficient memory management and incremental update techniques

Future work
• Apply the Cacti model to the TCAM structure
• Identify/design a low-power TCAM cell
• Consider classification together with IP lookup
• Fast on-chip logic minimization
• Explore parallel architectures and algorithms for IP processing

Thank You! Questions?