Content Addressable Memories: Cell Design and Peripheral Circuits
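CAM: Introduction - CAM vs. RAM
  - RAM: Address In selects a location; the stored word appears at Data Out.
  - CAM: Data In is compared against the stored words; the matching location appears at Address Out.
  [Figure: RAM (Address In, Data Out) and CAM (Data In, Address Out) lookup examples]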
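The difference between the two lookups can be sketched in a few lines of Python. This is only a behavioral illustration: the table contents are made up, the loop stands in for a comparison that hardware performs on all words in parallel, and returning the lowest matching address is an assumed priority rule that the slide does not specify.

```python
from typing import List, Optional


def ram_read(memory: List[int], address: int) -> int:
    """RAM: an address goes in, the stored word comes out."""
    return memory[address]


def cam_search(memory: List[int], data: int) -> Optional[int]:
    """CAM: a data word goes in, the address of a matching word comes out.

    The loop models only the result of the parallel compare. Returning the
    lowest matching address is an assumption made for this sketch.
    """
    for address, word in enumerate(memory):
        if word == data:
            return address
    return None  # no stored word matches


# Hypothetical contents, not taken from the slide.
table = [0b1010, 0b0111, 0b1100, 0b0001]

print(ram_read(table, 2))         # word stored at address 2
print(cam_search(table, 0b1100))  # address holding 0b1100
print(cam_search(table, 0b1111))  # None: no match
```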
CAM: Introduction - Binary CAM Cell
  - ML pre-charged to VDD
  - Match: ML remains at VDD
  - Mismatch: ML discharges
  [Figure: binary CAM cell schematic; labels BL1, BL1c, WL, SL1, SL1c, ML, BL1_cell, BL1c_cell, P1, P2, N1 to N8]
CAM: Introduction - Ternary CAM (TCAM)
  [Figure: TCAM search examples; stored words containing X entries are compared against an Input Keyword and the matching locations are flagged as Match]
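A ternary match can be sketched as follows. The assumptions here are loud ones: stored words are written as strings over '0', '1', and 'X', with 'X' treated as a stored don't-care that matches either key bit; the table contents are invented for the example; and all matching addresses are returned rather than a single prioritized one.

```python
from typing import List


def tcam_word_matches(stored: str, key: str) -> bool:
    """True if every stored bit equals the key bit or is 'X' (assumed don't-care)."""
    if len(stored) != len(key):
        raise ValueError("stored word and key must have the same width")
    return all(s == "X" or s == k for s, k in zip(stored, key))


def tcam_search(table: List[str], key: str) -> List[int]:
    """Return the addresses of all stored words that match the key."""
    return [addr for addr, word in enumerate(table) if tcam_word_matches(word, key)]


# Hypothetical 4-bit table, not taken from the slide.
table = ["10X1", "1XXX", "0110", "1001"]

print(tcam_search(table, "1011"))  # words 0 and 1 match
print(tcam_search(table, "0110"))  # only word 2 matches
```

Because several words can match one key, a real TCAM typically follows the match lines with a priority encoder; that stage is outside this sketch.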
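CAM: Introduction - TCAM Cell
  - Global masking: SLs
  - Local masking: BLs
  [Figure: TCAM cell built from two RAM cells (BL1, BL1c, BL2, BL2c, WL) and comparison logic on ML]
  [Table: stored state versus BL1/BL2 encoding, with states including Logic 1, X, and N.A.; bit values not recoverable]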
CAM: Introduction - DRAM based TCAM Cell
  - Higher bit density
  - Slower table update
  - Expensive process
  - Refreshing circuitry
  - Scaling issues (leakage)
  [Figure: DRAM based TCAM cell schematic; labels BL1, BL2, WL, SL1, SL2, ML, BL1_cell, BL2_cell, N3 to N8]
CAM: Introduction - SRAM based TCAM Cell
  - Standard CMOS process
  - Fast table update
  - Large area (16T)
  [Figure: SRAM based TCAM cell schematic; labels BL1, BL1c, BL2, BL2c, WL, SL1, SL2, ML, BL1c_cell, BL2c_cell]
CAM: Introduction - Block diagram of a 256 x 144 TCAM
  [Figure: array of CAM Cell (0) to CAM Cell (143) per word with bit lines BL1c/BL2c per column; SL Drivers drive the Search Lines (SLs) SL1(0)/SL2(0) to SL1(143)/SL2(143); Match Lines (MLs) ML0 to ML255 feed the ML Sense Amplifiers (MLSA), which produce MLSO(0) to MLSO(255)]
CAM: Introduction - Why low-power TCAMs?
  - Parallel search -> very high power (2Mb Sibercore TCAM, 66 MHz, 66 Msps: 3.4 W)
  - IPv6, OC-768 -> larger word size, larger no. of entries -> high power
  - Embedded applications (SoC)
CAM: Introduction - Why high-performance TCAMs?
  - OC-768: 135M packets/s (7.4 ns/packet)
  - Application complexity -> multiple searches
  - IPv6 -> larger word size -> larger search time
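The 7.4 ns figure is just the reciprocal of the packet rate, and it shrinks further when each packet needs more than one search. The short calculation below makes that explicit; the function name and the two-searches-per-packet case are illustrative assumptions, not numbers from the slides.

```python
def search_budget_ns(packets_per_second: float, searches_per_packet: int = 1) -> float:
    """Time available per search, in nanoseconds, at a given packet rate."""
    if searches_per_packet < 1:
        raise ValueError("searches_per_packet must be at least 1")
    ns_per_packet = 1e9 / packets_per_second
    return ns_per_packet / searches_per_packet


OC768_PACKET_RATE = 135e6  # packets per second, from the slide

print(f"{search_budget_ns(OC768_PACKET_RATE):.1f} ns per packet")
# Hypothetical case: two searches per packet halve the budget.
print(f"{search_budget_ns(OC768_PACKET_RATE, 2):.1f} ns per search")
```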
CAM: Design Techniques - Cell Design: 12T Static TCAM cell*
  - ‘0’ is retained by leakage (VWL ~ 200 mV)
  - High density
  - Leakage (3 orders)
  - Noise margin
  - Soft-errors (node S)
  - Unsuitable for READ
  * I. Arsovski, T. Chandler, A. Sheikholeslami, IEEE JSSC, vol. 38, no. 1, pp. 155-158, Jan. 2003
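CAM: Design Techniques - Cell Design: NAND vs. NOR Type CAM
  - NAND-type CAM: low power, but subject to charge-sharing and slow
  [Figure: NAND-type CAM, with CAM Cell (0) to CAM Cell (N) connected in series along ML_NAND into the SA; cell labels M, BL1, BL1c, WL, SL1, SL1c, VDD]
  [Figure: NOR-type CAM, with CAM Cell (0) to CAM Cell (N) connected in parallel on ML_NOR into the SA; label MM]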
CAM: Design Techniques - MLSA Design: Conventional
  - Pre-charge ML to VDD
  - Match: VML = VDD
  - Mismatch: VML = 0
  [Figure: conventional MLSA schematic; labels MM, VDD, PRE, MLSO, ML]
CAM: Design Techniques - MLSA Design: Current Race Sensing*
  [Figure: current-race MLSA schematic; labels RST, RSTc, VDD, ML, MLSO, MLOFF, MATCH, MM, Dummy ML, Delay]
  * I. Arsovski, T. Chandler, A. Sheikholeslami, IEEE JSSC, vol. 38, no. 1, pp. 155-158, Jan. 2003
CAM: Design Techniques - MLSA Design: Current Race Sensing
  - No need to reset SLs in every clock cycle
  - Lower ML voltage swing: (Vth + ∆V) ≈ ½VDD
  - Design trade-off: speed, current, voltage margin
  [Figure: waveforms of ML [0], ML [1], and MLSO [0] with the voltage margin marked]
CAM: Design Techniques - MLSA Design: Charge Redistribution*
  - Fast pre-charge of ML through MREF
  - Mismatch: SP = ‘0’, MLSO = ‘1’
  - IML > IREF > leakage
  - ∆VML: (VREF – Vth)
  - FAST_PRE: high power
  [Figure: charge-redistribution MLSA schematic; labels FAST_PRE, RST, VREF, VDD, SP, MLSO, IREF, ML, CML, CSP, MREF]
  * P. Vlasenko, D. Perry, MOSAID Technologies Inc., US Patent 6717876, April 6, 2004
CAM: Design Techniques - MLSA Design: Charge Injection*
  - Reset ML and pre-charge CINJ
  - Charge share CINJ and CML
  - Match: VML = CINJ x VDD / (CINJ + CML)
  - Mismatch: VML = 0
  - Small ∆VML: poor noise margin
  - Area penalty (CINJ)
  [Figure: charge-injection MLSA schematic; labels VDD, ML, MLSO, CML, OFFSET SA, CHARGE_IN, PRE, CINJ, RST]
  * G. Kasai, Y. Takarabe, K. Furumi, and M. Yoneda, SONY Corp., Proc. IEEE CICC, pp. 387-390, Sep. 2003
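The match-case expression is an ordinary capacitive divider, which also shows why the ML swing is small and why a larger CINJ (more area) is the price of a better margin. The sketch below evaluates it; the capacitance and supply values are illustrative assumptions chosen for round numbers, not figures from the cited paper.

```python
def charge_injection_vml(c_inj: float, c_ml: float, vdd: float) -> float:
    """ML voltage on a match after CINJ (pre-charged to VDD) shares charge with a reset ML.

    Implements VML = CINJ * VDD / (CINJ + CML) from the slide.
    """
    if c_inj <= 0 or c_ml <= 0:
        raise ValueError("capacitances must be positive")
    return c_inj * vdd / (c_inj + c_ml)


# Assumed example values (hypothetical): 20 fF injection cap, 180 fF ML cap, 1.8 V supply.
C_INJ = 20e-15
C_ML = 180e-15
VDD = 1.8

v_match = charge_injection_vml(C_INJ, C_ML, VDD)
print(f"VML on match: {v_match:.2f} V")  # mismatch case is 0 V per the slide

# Doubling CINJ raises the match voltage at the cost of area.
print(f"VML with 2x CINJ: {charge_injection_vml(2 * C_INJ, C_ML, VDD):.2f} V")
```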
CAM: Design Techniques - Low Power: Selective Pre-charge*
  - MLs: two segments
  - If MATCH in pre-search -> main-search
  - No. of bits in pre-search depends on data statistics
  [Figure: ML split into ML1 (PRE-SEARCH, sensed by MLSA1, output MLSO1) and ML2 (MAIN-SEARCH, sensed by MLSA2, output MLSO2)]
  * C. Zukowski and S. Wang, Proc. IEEE ISCAS, pp. 745-770, Jun. 9-12, 1997
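A behavioral sketch of the two-step search follows. It is not the circuit from the cited paper: words are plain binary strings, the pre-search is assumed to cover the first pre_bits bits of each word, and the count of words that reach the main search is used as a rough stand-in for how many ML2 segments would be activated. The table contents are invented.

```python
from typing import List, Tuple


def selective_precharge_search(
    table: List[str], key: str, pre_bits: int
) -> Tuple[List[int], int]:
    """Return (matching addresses, number of words that needed a main search)."""
    matches = []
    main_searches = 0
    for addr, word in enumerate(table):
        # Pre-search: compare only the first pre_bits bits (segment ML1).
        if word[:pre_bits] != key[:pre_bits]:
            continue  # pre-search mismatch: main-search segment stays idle
        # Main search: compare the remaining bits (segment ML2).
        main_searches += 1
        if word[pre_bits:] == key[pre_bits:]:
            matches.append(addr)
    return matches, main_searches


# Hypothetical 4-bit table, not taken from the slide.
table = ["1010", "1001", "0110", "1011"]

print(selective_precharge_search(table, "1011", pre_bits=2))
print(selective_precharge_search(table, "1011", pre_bits=3))
```

With this table, widening the pre-search from 2 to 3 bits filters out one more word before the main search, which is the sense in which the best split depends on the stored data.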
CAM: Design Techniques - Low Power: Dual-ML TCAM*
  - MLSA1 is enabled first
  - MLSA2 is enabled if MLSO1 = ‘1’
  [Figure: TCAM word with CAM Cell (0) to CAM Cell (N), bit lines BL1c/BL2c, search lines SL1(0)/SL2(0) to SL1(N)/SL2(N); ML1 feeds MLSA1 (output MLSO1) and ML2 feeds MLSA2 (output MLSO2)]
  * N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
CAM: Design Techniques - Low Power: Dual-ML TCAM
  - Cap(ML1) = Cap(ML2) = ½ C(ML)
  - Same speed, 50% less energy (ideally!)
  - Parasitic interconnects degrade both speed and energy
  - Additional ML increases coupling capacitance
CAM: Design Techniques - Low Power: Dual-ML TCAM
  Simulation results (144 bits)*
  - Interconnect cap. = 27 fF
  - W/L = 0.6 µm / 0.18 µm

  Quantity    Old     New     Difference
  TS (ns)     8.14    8.46    4%
  E1 (fJ)     769     426     45%
  E2 (fJ)             973     26%

  * N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
CAM: Design Techniques - Low Power: Dual-ML TCAM*
  - EAVG = PML1 x E1 + (1 - PML1) x E2
  - SA1 cannot detect Type I mismatches
  - For M mismatches, PML1 = 1 - (0.5)^M
  [Figure: cell connections SL1, BL1c, ML1]
  [Table: mismatch types (Type I, Type II) versus SL1, SL2, BL1, BL2; bit values not recoverable]
  * N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
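The average-energy expression can be evaluated directly. The sketch below plugs in the simulated Dual-ML energies reported in the deck's 144-bit results (E1 = 426 fJ, E2 = 973 fJ) and compares against the 769 fJ conventional figure. Reading E1 as the energy when ML1 alone catches the mismatch and E2 as the energy when both MLSAs fire is this sketch's interpretation of the formula; the function names are illustrative.

```python
E1_FJ = 426.0   # Dual-ML energy, first reported value (fJ)
E2_FJ = 973.0   # Dual-ML energy, second reported value (fJ)
E_OLD_FJ = 769.0  # conventional single-ML energy (fJ)


def p_ml1(mismatches: int) -> float:
    """PML1 = 1 - (0.5)^M for a word with M mismatching bits."""
    if mismatches < 0:
        raise ValueError("mismatches must be non-negative")
    return 1.0 - 0.5 ** mismatches


def e_avg(mismatches: int, e1: float = E1_FJ, e2: float = E2_FJ) -> float:
    """EAVG = PML1 * E1 + (1 - PML1) * E2."""
    p = p_ml1(mismatches)
    return p * e1 + (1.0 - p) * e2


for m in range(0, 6):
    energy = e_avg(m)
    saving = 100.0 * (1.0 - energy / E_OLD_FJ)
    print(f"M={m}: PML1={p_ml1(m):.3f}  EAVG={energy:.1f} fJ  saving vs. old={saving:+.1f}%")
```

Under these numbers the average energy falls toward E1 as the number of mismatching bits grows, while a fully matching word (M = 0) costs E2, which is more than the conventional design.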
CAM: Design Techniques - Low Power: Dual-ML TCAM*
  * N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
CAM: Design Techniques - Low Power: Hierarchical SLs*
  - 144 bits (5 segments: 8, 34, 34, 34, 34)
  - SLs: multiple blocks (64 words each)
  - ∆VGSL: 0.45 V (VDD = 1.8 V)
  - Logic complexity
  - Search time/latency
  - 64-bit OR gates
  * Pagiamtzis et al., Proc. IEEE CICC, pp. 383-386, Sep. 2003
CAM: Design Techniques - Static Power Reduction
  16T TCAM: Leakage Paths*
  [Figure: 16T TCAM cell schematic with leakage paths for stored ‘1’ and ‘0’; labels WL, BL1, BL1c, BL2, BL2c, SL1, SL2, ML, BL1c_cell, BL2c_cell, N1 to N12, P1 to P4]
  * N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004
CAM: Design Techniques - Static Power Reduction
  - Technology scaling [1]
    - Dimensions: down 30%
    - Dynamic power: down 50%
    - Leakage current: up 5x
  - Architectural level techniques [2, 3]
    - A small portion is enabled
  [1] S. Borkar, IEEE Micro, pp. 23-29, Jul.-Aug. 1999
  [2] K. Pagiamtzis, A. Sheikholeslami, Proc. IEEE CICC, pp. 383-386, Sep. 2003
  [3] G. Kasai, Y. Takarabe, K. Furumi, M. Yoneda, Proc. IEEE CICC, pp. 387-390, Sep. 2003
CAM: Design Techniques - Static Power Reduction
  Leakage current*
  [Figure labels: VDD, ISUB]
  * R. X. Gu, M. I. Elmasry, IEEE JSSC, vol. 31, no. 5, pp. 707-713, May 1996
CAM: Design Techniques - Static Power Reduction
  Side Effects of VDD Reduction in TCAM Cells
  - Speed: no change
  - Dynamic power: no change
  - Robustness: lower VDD reduces the voltage margin (current-race sensing)
  [Figure: waveforms of ML [0], ML [1], and MLSO [0] with the voltage margin marked]
CAM: Design Techniques - Static Power Reduction
  [Figure: voltage margin of a 144-bit TCAM word in 0.18 µm CMOS*]
  * N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004
CAM: Design Techniques - Static Power Reduction
  Effects of Technology Scaling*
  - Berkeley predictive technology model (BPTM)
  * N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004