Jason Klaus, Duncan Elliott Confidential

Jason Klaus Supervisor: Duncan Elliott August 2, 2007 (Confidential)

Prefix CAM
Jason Klaus, Duncan Elliott
Confidential

Outline
  Routing Lookup Tables
  Ternary CAM
  Objective
  Prefix Representation
  Previous Work
  Aside: Binary CAM
  Innovation
  Generating Enable Signal
  Drawbacks
  Simulated Results
  Summary
  References
2/24/2019 Confidential

Routing Lookup Tables
Routing lookup tables map incoming IP addresses to outgoing ports using address prefix rules.
  Ex: 192.168.*.* to port 1
  Ex: 128.5.8.* to port 2
  Ex: 128.5.*.* to port 3
More specific (longer) prefixes are given priority over less specific (shorter) prefixes.
  Ex: 128.5.8.7 is routed to port 2, not port 3
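The priority rule above can be sketched in software. This is only an illustrative model using the three example rules from the slide; the helper names `to_bits` and `lookup_port` are hypothetical, and a linear scan is used for clarity rather than speed.

```python
def to_bits(dotted: str) -> str:
    """Convert dotted decimal octets (e.g. "128.5.8") to a bit string."""
    return "".join(f"{int(octet):08b}" for octet in dotted.split("."))

# Example rules from the slide: the wildcard octets are simply omitted.
ROUTING_TABLE = {
    to_bits("192.168"): 1,   # 192.168.*.*
    to_bits("128.5.8"): 2,   # 128.5.8.*
    to_bits("128.5"): 3,     # 128.5.*.*
}

def lookup_port(address: str, table=ROUTING_TABLE):
    """Return the port of the longest matching prefix, or None if no rule matches."""
    addr_bits = to_bits(address)
    best_prefix = None
    for prefix in table:
        if addr_bits.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return None if best_prefix is None else table[best_prefix]

print(lookup_port("128.5.8.7"))  # matches both 128.5.8.* and 128.5.*.*
```

Here 128.5.8.7 matches two rules, and the longer one (128.5.8.*, port 2) wins.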

Routing Lookup Tables (cont.)
As Internet link speeds continue to increase, so too does the demand for fast, low-power routing lookup tables.
Most current solutions involve Ternary Content Addressable Memory (TCAM).

Ternary CAM
The TCAM stores all prefixes and searches them in parallel for the longest match to a given IP address query.
Each stored prefix has an associated match line.
The match line is charged before/during each query.
The query IP address is compared bitwise with the prefix.
Any mismatching bit discharges the match line.
The longest prefix whose match line is still high is the best match.
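The match-line behaviour described above can be modelled functionally. The sketch below is an assumption-laden toy: it uses 8-bit patterns instead of 32-bit addresses, processes rows sequentially where the hardware works in parallel, and the name `tcam_search` is made up for illustration.

```python
def tcam_search(entries, query):
    """Functional model of a TCAM search.

    entries: list of (pattern, port), where pattern uses '0', '1', '*'.
    query:   bit string of the same width as each pattern.
    Returns the port of the longest matching prefix, or None.
    """
    best_port, best_len = None, -1
    for pattern, port in entries:
        match_line = True                      # match line is precharged high
        for stored, q in zip(pattern, query):
            if stored != "*" and stored != q:  # a mismatching bit...
                match_line = False             # ...discharges the match line
        care_bits = len(pattern) - pattern.count("*")
        if match_line and care_bits > best_len:
            best_port, best_len = port, care_bits
    return best_port

toy_entries = [("1000****", "A"), ("100010**", "B"), ("11******", "C")]
print(tcam_search(toy_entries, "10001011"))
```

Rows "A" and "B" both stay high for this query, and the row with more cared-for bits ("B") is selected.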

Objective
Design a specialized content addressable memory (CAM) used only for IP address prefixes, called a prefix CAM (PCAM).
Reduce the number of transistors required to store and search each IP address prefix without degrading performance or increasing dynamic power consumption.
Fewer transistors reduce manufacturing cost and static power consumption.

Prefix Representation
TCAM solves a more general problem, requiring two bits of SRAM for every bit of IP address prefix.
  Each bit is stored as either a 0, 1, or * (either 0 or 1)
A 32-bit IP address (IPv4) prefix can be uniquely and optimally represented using only 33 bits (n+1 bits).
Store the prefix bits followed by a 1, then pad with 0s to 33 bits.
  Ex: 128.3.5.* becomes (in binary) 10000000.00000011.00000101.10000000.0
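A minimal sketch of the n+1 bit encoding, assuming prefixes are handled as bit strings; the function names `octets_to_bits`, `encode_prefix`, and `decode_prefix` are hypothetical and not taken from the slides.

```python
def octets_to_bits(octets) -> str:
    """Turn a list of known octets, e.g. [128, 3, 5], into a bit string."""
    return "".join(f"{octet:08b}" for octet in octets)

def encode_prefix(prefix_bits: str, width: int = 32) -> str:
    """Store the prefix bits, then a marker 1, then 0 padding to width + 1 bits."""
    if len(prefix_bits) > width or set(prefix_bits) - {"0", "1"}:
        raise ValueError("prefix must be a bit string of at most `width` bits")
    return prefix_bits + "1" + "0" * (width - len(prefix_bits))

def decode_prefix(stored: str) -> str:
    """Recover the prefix: everything before the last 1 in the stored word."""
    return stored[: stored.rindex("1")]

stored = encode_prefix(octets_to_bits([128, 3, 5]))   # 128.3.5.*
print(stored, len(stored))
```

Because the last 1 always marks the end of the prefix, every prefix length from 0 to 32 maps to a distinct 33-bit word.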

Previous Work
Akhbarizadeh, Nourani, Vijayasarathi and Balsara use a similar 33-bit representation for their PCAM design.
They used logic equation optimization techniques to achieve a low transistor count.
  396 transistors per prefix, or 12.375 per bit
  Standard TCAM designs require 16 transistors per bit

Previous Work (cont.)
Unfortunately, this PCAM design has some drawbacks compared to TCAM.
  The match line is loaded with far more transistors
  Bit comparisons sometimes drive more than one transistor, up to three in some cases
As a result, this PCAM suffers degraded performance and increased dynamic power consumption versus a comparable TCAM.

Aside: Binary CAM
A binary CAM (BCAM) is similar to a TCAM except that it does not support wildcards in the mask.
  The query either entirely matches or it doesn't
Each cell compares a single bit, discharging the match line if the stored bit and the query bit differ.
This simplified cell requires only 9 transistors per bit, as opposed to 16 transistors per TCAM cell.

Innovation
Add a single transistor to a BCAM cell which acts as an enable.
  If enable is high, mismatches discharge the match line
  If enable is low, mismatches do not discharge the match line
This transistor adds minimal match line loading, which preserves performance.
The enable signal does not change based on search data, which reduces dynamic power consumption.

Innovation (cont.)
Store prefixes as previously described: significant bits followed by a 1, then padded with 0s.
Scanning from end to start, the first 1 encountered indicates that all remaining bits must be matched.
  If a cell stores a 1, then it should enable all cells before it for matching
  If a cell stores a 0, then it should pass on the enable state it received
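The enable rule above can be expressed as a small behavioural model. This is a sketch under assumptions not stated on the slides: the enable input of the last cell is taken to be low, the query is padded with a 0 for the extra (marker) cell, and a toy 8-bit address width is used. The names `enable_signals` and `pcam_row_matches` are hypothetical.

```python
def enable_signals(stored: str):
    """Ripple the enable from the last cell to the first.

    A cell is enabled for matching only if some cell after it stores a 1.
    """
    enables = [False] * len(stored)
    enable = False                      # assumed enable_in of the last cell
    for i in reversed(range(len(stored))):
        enables[i] = enable             # what this cell receives
        enable = enable or stored[i] == "1"   # what it passes on
    return enables

def pcam_row_matches(stored: str, query: str) -> bool:
    """Return True if the match line of this row stays high for the query."""
    padded = query.ljust(len(stored), "0")    # extra cell is never enabled
    match_line = True                         # precharged high
    for bit, q, en in zip(stored, padded, enable_signals(stored)):
        if en and bit != q:                   # only enabled cells can discharge
            match_line = False
    return match_line

# Toy row: prefix 1000, marker 1, then 0 padding (9 cells for 8-bit addresses).
print(pcam_row_matches("100010000", "10001111"))
```

Only the four cells before the marker are enabled, so the query bits after the prefix cannot discharge the match line.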

Basic Binary CAM Cell
[Figure: cell schematic; labels: Search Line, Bit Line, Word Line, Data, Match Line]

CAM With Enable
[Figure: cell schematic; labels: Search Line, Bit Line, Word Line, Data, Match Line, Enable]

CAM With Cascaded Enable
[Figure: cell schematic; labels: Search Line, Bit Line, Word Line, Data, Match Line, Enable, Logical OR]
Better implementation to come

CAM Cell Cascaded
[Figure: cascaded CAM cells]

Generating Enable Signal
Use a pull-up transistor combined with a transmission gate.
This requires 3 additional transistors per cell.
Total of ~13 transistors per prefix bit, or more precisely 419 transistors per prefix.
[Figure: enable circuit; signals: enable_in, enable_out, data]
The count is not exactly 13 transistors per cell, since the enable_out of the first cell is not needed, and the enable_in for the last cell is stored in an SRAM cell.
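One way to read the pull-up plus transmission gate arrangement is as a two-way selection that is logically equivalent to the OR shown earlier. The wiring assumed here (a stored 1 turns on the pull-up and turns off the transmission gate, a stored 0 does the opposite) is an interpretation, not something the slides spell out; `enable_out` mirrors the signal name on the slide, but the function itself is only a sketch.

```python
def enable_out(enable_in: bool, data: bool) -> bool:
    """Behavioural model of one cell's enable stage (assumed wiring)."""
    if data:
        # Stored 1: pull-up drives enable_out high, transmission gate is off.
        return True
    # Stored 0: transmission gate passes enable_in through unchanged.
    return enable_in

# Enumerate all input combinations and compare against a logical OR.
for e_in in (False, True):
    for d in (False, True):
        print(e_in, d, enable_out(e_in, d), e_in or d)
```

Under this reading, the three added transistors implement the same function as the OR gate, without a full static gate in every cell.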

Drawbacks
Enable signals take time to ripple through the cells from back to front, especially since the delay through chained transmission gates grows much faster than linearly (roughly quadratically) with chain length.
The PCAM requires a delay before searches after one or more consecutive writes.
  Typical routing applications require only 1 update per 100k searches, so this delay is incurred rarely
Some transmission gates could be replaced with a full 6-transistor implementation to break the chain.

Simulated Results
Schematic simulation of a typical TCAM, the previous PCAM, and this proposed PCAM was performed in a 90nm ST digital process.
Typical power consumption was measured as the average power consumed by a single row of cells storing and searching a selection of prefixes and addresses.

Design          Worst Case Mismatch Time   Typical Power Consumption
TCAM            264 ps                     114 μW
Akhbarizadeh    507 ps                     67 μW
Proposed PCAM   345 ps                     56 μW
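To make the comparison easier to read, the figures in the table can be expressed relative to the TCAM baseline. The numbers below are copied from the table; the dictionary layout and the function name `relative_to_tcam` are illustrative choices only.

```python
# (worst case mismatch time in ps, typical power in microwatts), from the table
RESULTS = {
    "TCAM": (264, 114),
    "Akhbarizadeh": (507, 67),
    "Proposed PCAM": (345, 56),
}

def relative_to_tcam(design: str):
    """Return (time change %, power change %) versus the TCAM row."""
    time_ps, power_uw = RESULTS[design]
    base_time, base_power = RESULTS["TCAM"]
    time_change = (time_ps / base_time - 1) * 100
    power_change = (power_uw / base_power - 1) * 100
    return time_change, power_change

for name in ("Akhbarizadeh", "Proposed PCAM"):
    t, p = relative_to_tcam(name)
    print(f"{name}: mismatch time {t:+.1f}%, power {p:+.1f}%")
```

By this measure the proposed PCAM uses roughly half the power of the TCAM while its worst case mismatch time is about 31% longer, compared with about 92% longer for the earlier PCAM.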

Summary
Compared to TCAM designs, the proposed PCAM design requires fewer transistors and reduces total power consumption while maintaining comparable performance for IP address prefix matching.
The only previous PCAM design requires slightly fewer transistors (396 versus 419 per prefix) but degrades performance significantly and consumes more power than the proposed PCAM.

Extensions
Faster than TCAM version, +3 transistors
Beyond 32 bits
  The match line provides a wired-AND for any heterogeneous mix of connected CAM cells
  Source and destination addresses in separate clusters
  Flags for QoS, etc., as pure TCAM

References
M. J. Akhbarizadeh, M. Nourani, D. S. Vijayasarathi, and P. T. Balsara, "PCAM: A Ternary CAM Optimized for Longest Prefix Matching Tasks," Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD 2004), 11-13 Oct. 2004, pp. 6-11.