Prefix CAM
Jason Klaus, Duncan Elliott
Confidential
Outline
- Routing Lookup Tables
- Ternary CAM
- Objective
- Prefix Representation
- Previous Work
- Aside: Binary CAM
- Innovation
- Generating Enable Signal
- Drawbacks
- Simulated Results
- Summary
- References
2/24/2019 Confidential
Routing Lookup Tables
- Routing lookup tables map the destination IP address of each incoming packet to an outgoing port using address prefix rules
  - Ex: 192.168.*.* to port 1
  - Ex: 128.5.8.* to port 2
  - Ex: 128.5.*.* to port 3
- More specific (longer) prefixes take priority over less specific (shorter) prefixes
  - Ex: 128.5.8.7 is routed to port 2, not port 3
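The longest-prefix rule above can be illustrated in software. This is only a sketch: the `RULES` table and the `lookup` function are hypothetical names, and the slide's wildcard rules are rewritten in CIDR notation on the assumption that each `*` stands for one whole octet.

```python
import ipaddress

# Rules from the slide, rewritten in CIDR notation (assumption: each '*' is one octet).
RULES = [
    ("192.168.0.0/16", 1),  # 192.168.*.*
    ("128.5.8.0/24", 2),    # 128.5.8.*
    ("128.5.0.0/16", 3),    # 128.5.*.*
]

def lookup(address, rules=RULES):
    """Return the port of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(address)
    best_len, best_port = -1, None
    for cidr, port in rules:
        net = ipaddress.ip_network(cidr)
        # A rule wins only if it matches and is more specific than the best so far.
        if addr in net and net.prefixlen > best_len:
            best_len, best_port = net.prefixlen, port
    return best_port
```

Here `lookup("128.5.8.7")` selects port 2 because the /24 rule is longer than the /16 rule that also matches. A hardware lookup table checks all rules at once instead of looping.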
Routing Lookup Tables (cont.)
- As Internet link speeds continue to increase, so does the demand for fast, low-power routing lookup tables
- Most current solutions use Ternary Content Addressable Memory (TCAM)
Ternary CAM
- The TCAM stores all prefixes and searches them in parallel for the longest match to a given IP address query
- Each stored prefix has an associated match line
  - Match line is charged before/during each query
  - Query IP address is compared bitwise with the prefix
  - Any mismatching bit discharges the match line
- The longest prefix whose match line is still high is the best match
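A behavioral model may make the match-line mechanism easier to follow. This is a sketch rather than a circuit description: entries are strings over `0`, `1` and `*`, the helper names are hypothetical, and the Python loop stands in for what the hardware does on all rows in parallel.

```python
def to_bits(address):
    """Dotted-quad IPv4 address -> 32-character bit string."""
    return "".join(f"{int(octet):08b}" for octet in address.split("."))

def ternary_entry(address, prefix_len):
    """Keep the first prefix_len bits and store '*' (don't care) for the rest."""
    return to_bits(address)[:prefix_len] + "*" * (32 - prefix_len)

def match_line(entry, query_bits):
    """Start charged (True); any mismatching non-wildcard bit discharges the line."""
    line = True
    for stored, q in zip(entry, query_bits):
        if stored != "*" and stored != q:
            line = False
    return line

def tcam_search(entries, address):
    """entries: list of (ternary string, port). Return the port of the longest match."""
    query = to_bits(address)
    hits = [(e.count("*"), port) for e, port in entries if match_line(e, query)]
    # Fewest wildcards means longest prefix.
    return min(hits)[1] if hits else None
```

For example, with entries built from 128.5.8.0/24 (port 2) and 128.5.0.0/16 (port 3), a query for 128.5.8.7 leaves both match lines high, and the entry with fewer wildcards wins.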
Objective
- Design a specialized content addressable memory (CAM) used only for IP address prefixes
  - Called a prefix CAM (PCAM)
- Reduce the number of transistors required to store and search each IP address prefix without degrading performance or increasing dynamic power consumption
  - Reduces manufacturing costs and static power consumption
Prefix Representation
- TCAM solves a more general problem, requiring two bits of SRAM for every bit of IP address prefix
  - Each bit is stored as either a 0, 1, or * (either 0 or 1)
- A 32-bit IP address (IPv4) prefix can be uniquely and optimally represented using only 33 bits (n+1 bits)
  - Store the prefix bits followed by a 1, then pad with 0s to 33 bits
  - Ex: 128.3.5.* becomes (in binary) 10000000.00000011.00000101.10000000.0
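The n+1-bit representation can be sketched as a pair of helper functions. The names `encode_prefix` and `decode_prefix` are illustrative only, and prefixes are passed as bit strings for clarity.

```python
def encode_prefix(prefix_bits, width=32):
    """Append a marker 1 to the prefix bits, then pad with 0s to width + 1 bits."""
    if len(prefix_bits) > width or set(prefix_bits) - {"0", "1"}:
        raise ValueError("prefix must be at most `width` characters of 0/1")
    return prefix_bits + "1" + "0" * (width - len(prefix_bits))

def decode_prefix(encoded):
    """Drop the padding 0s and the marker 1 (the last 1 in the word)."""
    return encoded[:encoded.rindex("1")]
```

Encoding the 24 prefix bits of 128.3.5.* gives the 33-bit word shown in the slide, and because every encoded word contains exactly one marker, decoding is unambiguous for every prefix length from 0 to 32.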
Previous Work
- Akhbarizadeh, Nourani, Vijayasarathi and Balsara use a similar 33-bit representation for their PCAM design
- They used logic equation optimization techniques to achieve a low transistor count
  - 396 transistors per prefix, or 12.375 per bit
  - Standard TCAM designs require 16 transistors per bit
Previous Work (cont.)
- Unfortunately, this PCAM design has some drawbacks compared to TCAM
  - Match line is loaded with far more transistors
  - Bit comparisons sometimes drive more than one transistor, up to three in some cases
- As a result, this PCAM suffers degraded performance and increased dynamic power consumption versus a comparable TCAM
Aside: Binary CAM
- A binary CAM (BCAM) is similar to a TCAM except that it does not support wildcards
  - The query either matches the stored word entirely or it doesn't
- Each cell compares a single bit, discharging the match line if the stored and query bits differ
- This simplified cell requires only 9 transistors per bit, as opposed to 16 transistors per TCAM cell
Innovation
- Add a single transistor to a BCAM cell which acts as an enable
  - If enable is high, mismatches discharge the match line
  - If enable is low, mismatches do not discharge the match line
- This transistor adds minimal match line loading, preserving performance
- Enable does not change based on search data, reducing dynamic power consumption
Innovation (cont.)
- Store prefixes as previously described: significant bits followed by a 1, then padded with 0s
- Scanning from end to start, the first 1 encountered indicates that all remaining bits must be matched
  - If a cell stores a 1, it should enable all cells before it for matching
  - If a cell stores a 0, it should pass on the enable state it received
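The enable rule can be modeled behaviorally as follows. This is a sketch with hypothetical function names: it treats the stored word as a 33-character bit string and computes the enables with a loop, whereas the hardware lets them ripple through the cells.

```python
def to_bits(address):
    """Dotted-quad IPv4 address -> 32-character bit string."""
    return "".join(f"{int(octet):08b}" for octet in address.split("."))

def cell_enables(stored):
    """Scan from end to start; a cell is enabled if any later cell stores a 1."""
    enables = [False] * len(stored)
    enable = False  # nothing comes after the last cell
    for i in range(len(stored) - 1, -1, -1):
        enables[i] = enable                   # enable state received from later cells
        enable = enable or stored[i] == "1"   # a stored 1 enables every cell before it
    return enables

def pcam_match(stored, query_bits):
    """Match line stays high unless an enabled cell sees a mismatching bit."""
    enables = cell_enables(stored)
    return all(
        not enables[i] or stored[i] == query_bits[i]
        for i in range(len(query_bits))
    )
```

For the stored word encoding 128.3.5.*, only the first 24 cells end up enabled. The marker 1 and the padding are never compared, so 128.3.5.77 matches and 128.3.6.1 does not.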
Basic Binary CAM Cell
[Figure: schematic with labeled signals Search Line (two), Bit Line (two), Word Line, Data (two), Match Line]
CAM With Enable
[Figure: schematic with labeled signals Search Line (two), Bit Line (two), Word Line, Data (two), Match Line, Enable]
CAM With Cascaded Enable
[Figure: schematic with labeled signals Search Line (two), Bit Line (two), Word Line, Data (two), Match Line, Enable, Logical OR]
- Better implementation to come
CAM Cell Cascaded
[Figure: schematic of cascaded CAM cells]
Generating Enable Signal
- Use a pull-up transistor combined with a transmission gate
- This requires 3 additional transistors per cell
- Total of ~13 transistors per prefix bit, or more precisely 419 transistors per prefix
  - Not exactly 13 transistors per cell, since the enable_out of the first cell is not needed, and the enable_in for the last cell is stored in an SRAM cell
[Figure: enable circuit with labeled signals enable_in, enable_out, data]
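One accounting that reproduces the 419 figure is sketched below. The slides do not spell out the breakdown, so two points are assumptions: that dropping the first cell's enable_out saves the 3 enable-generation transistors, and that the extra SRAM cell is a standard 6-transistor cell.

```python
BITS = 32                       # IPv4 prefix width
TCAM_PER_BIT = 16               # from the slides
AKHBARIZADEH_PER_PREFIX = 396   # from the slides
BCAM_CELL = 9                   # from the slides
ENABLE_TRANSISTOR = 1           # the added enable transistor
ENABLE_GENERATION = 3           # pull-up plus transmission gate
SRAM_CELL = 6                   # assumption: standard 6T SRAM cell

# Transistors in one PCAM cell: 9 + 1 + 3 = 13.
pcam_cell = BCAM_CELL + ENABLE_TRANSISTOR + ENABLE_GENERATION

# Assumption: the first cell omits enable generation, and one SRAM cell
# supplies enable_in to the last cell.
proposed_per_prefix = BITS * pcam_cell - ENABLE_GENERATION + SRAM_CELL

tcam_per_prefix = BITS * TCAM_PER_BIT

print(pcam_cell, proposed_per_prefix, tcam_per_prefix)
```

Under these assumptions the proposed design comes to 419 transistors per prefix, against 512 for a 16-transistor-per-bit TCAM and 396 for the previous PCAM.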
Drawbacks
- Enable signals take time to ripple through cells from back to front, especially since the delay of chained transmission gates grows faster than linearly (roughly quadratically) with chain length
- PCAM requires a delay before searches after one or more consecutive writes
  - Typical routing applications require 1 update per 100k searches
- Could replace some transmission gates with a full 6-transistor implementation to break the chain
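A first-order RC (Elmore) estimate shows why breaking the chain helps. The sketch below is not a circuit simulation: it assumes every transmission gate contributes the same resistance and capacitance, and `buffer_delay` is a hypothetical parameter for the delay of each restoring stage that replaces a gate.

```python
def chain_delay(n, rc=1.0):
    """Elmore delay of n identical series RC stages: rc * n * (n + 1) / 2."""
    return rc * n * (n + 1) / 2

def segmented_delay(n, segment, buffer_delay=0.0, rc=1.0):
    """Delay when the chain is split into runs of at most `segment` gates."""
    full, rest = divmod(n, segment)
    runs = full + (1 if rest else 0)
    buffers = max(runs - 1, 0)  # one restoring stage between consecutive runs
    return full * chain_delay(segment, rc) + chain_delay(rest, rc) + buffers * buffer_delay
```

In units of one stage's RC, an unbroken chain of 32 gates costs 528, while four runs of 8 cost 144 plus the delay of three restoring stages. Under this model the delay of an unbroken chain grows with the square of its length.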
Simulated Results
- Schematic simulations of a typical TCAM, the previous PCAM, and the proposed PCAM were performed in a 90 nm ST digital process
- Typical power consumption was measured as the average power consumed by a single row of cells storing and searching a selection of prefixes and addresses

Design          Worst-Case Mismatch Time   Typical Power Consumption
TCAM            264 ps                     114 μW
Akhbarizadeh    507 ps                     67 μW
Proposed PCAM   345 ps                     56 μW
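The relative differences can be read off the simulated figures with a few lines of arithmetic. The numbers below are copied from the results above; the `relative_to` helper is only an illustrative name.

```python
# (worst-case mismatch time in ps, typical power in microwatts), from the slide
RESULTS = {
    "TCAM": (264, 114),
    "Akhbarizadeh": (507, 67),
    "Proposed PCAM": (345, 56),
}

def relative_to(baseline, design, results=RESULTS):
    """Return (time ratio, power ratio) of `design` relative to `baseline`."""
    base_time, base_power = results[baseline]
    time, power = results[design]
    return time / base_time, power / base_power

for name in ("TCAM", "Akhbarizadeh"):
    t, p = relative_to(name, "Proposed PCAM")
    print(f"Proposed PCAM vs {name}: {t:.2f}x mismatch time, {p:.2f}x power")
```

Relative to the TCAM, the proposed PCAM has about 1.31 times the worst-case mismatch time at about 0.49 times the power. Relative to the previous PCAM it has about 0.68 times the mismatch time and 0.84 times the power.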
Summary
- Compared to TCAM designs, the proposed PCAM design requires fewer transistors and reduces total power consumption while maintaining comparable performance for IP address prefix matching
- The only previous PCAM design requires slightly fewer transistors (396 versus 419 per prefix) but is significantly slower and consumes more power
Extensions
- Faster than TCAM version, +3 transistors
- Beyond 32 bits
- Match line provides wired-AND for any heterogeneous mix of CAM cells connected
- Source and destination addresses in separate clusters
- Flags for QoS, etc., as pure TCAM
References
- M.J. Akhbarizadeh, M. Nourani, D.S. Vijayasarathi, P.T. Balsara, "PCAM: A Ternary CAM Optimized for Longest Prefix Matching Tasks," Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD 2004), 11-13 Oct. 2004, pp. 6-11