Class 09 Content Addressable Memories Cell Design and Peripheral Circuits.


1 Class 09 Content Addressable Memories Cell Design and Peripheral Circuits

2 Semiconductor Memory Classification
FIFO: first-in, first-out
LIFO: last-in, first-out (stack)
CAM: content addressable memory

3 Memory Architecture: Decoders
[Figure: decoder must be pitch-matched to the cell array; with a single flat array the word line becomes too long.]

4 2D Memory Architecture
[Figure: memory array of 2^(k-j) rows by m·2^j columns of storage (RAM) cells. One part of the address (row address) drives the row decoder, which selects a word line; the remaining bits (column address) drive the column decoder, which selects the appropriate word from the memory row. Sense amplifiers amplify the bit-line swing, and read/write circuits connect to the m-bit input/output.]

5 3D Memory Architecture
[Figure: array of memory blocks addressed by row, column, and block address, with m-bit input/output.]
Advantages:
1. Shorter word and/or bit lines.
2. The block address activates only one block, saving power.

6 Hierarchical Memory Architecture
[Figure: memory blocks connected by a global data bus, with block selector, global amplifier/driver, and I/O control circuitry; addressed by row, column, and block address.]
Advantages:
- Shorter wires within blocks.
- The block address activates only one block: power management.

7 Read-Write Memories (RAM)
Static (SRAM)
- Data stored as long as supply is applied
- Large (6 transistors per cell)
- Fast
- Differential signal (more reliable)
Dynamic (DRAM)
- Periodic refresh required
- Small (1-3 transistors per cell) but slower
- Single ended (unless a dummy cell is used to generate differential signals)

8 Associative Memory

9 What is CAM?
Content Addressable Memory is a special kind of memory.
Read operation in traditional memory:
- The input is the address of the location whose content we are interested in.
- The output is the content stored at that address.
In a CAM the operation is reversed:
- The input is content associated with something stored in the memory.
- The output is the location (address) where that content is stored.
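A minimal behavioral sketch of this contrast (software only, not the hardware; the stored words are made-up illustration values): a RAM read maps address to content, a CAM search maps content to address.

# RAM: address in, content out. CAM: content in, matching address(es) out.
ram = ["01010101", "00001101", "11001011", "10001101", "00011101", "11000111"]

def ram_read(address):
    # Traditional memory read.
    return ram[address]

def cam_search(content):
    # CAM-style search: return every address whose stored word matches.
    return [addr for addr, word in enumerate(ram) if word == content]

print(ram_read(3))             # '10001101'
print(cam_search("10001101"))  # [3]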

10 Types of CAMs
Binary CAM (BCAM) stores only 0s and 1s.
- Applications: MAC table lookup; Layer 2 security-related VPN segregation.
Ternary CAM (TCAM) stores 0s, 1s, and don't cares.
- Applications: wherever wildcards are needed, such as Layer 3 and Layer 4 classification for QoS and CoS purposes, and IP routing (longest prefix matching; a behavioral sketch follows below).
Available sizes: 1 Mb, 2 Mb, 4.7 Mb, 9.4 Mb, and 18.8 Mb.
CAM entries are structured as multiples of 36 bits rather than 32 bits.
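The sketch below illustrates ternary matching and longest prefix matching in software. The table entries, widths, and port names are invented for illustration; a real TCAM resolves multiple matches with a priority encoder over entries sorted by prefix length, whereas this sketch simply counts the specified (non-X) bits.

# Each TCAM entry is a pattern of '0', '1', and 'X' (don't care).
tcam_table = [
    ("110010XX", "port A"),   # 6 specified prefix bits
    ("1100XXXX", "port B"),   # 4 specified prefix bits
    ("1XXXXXXX", "port C"),   # 1 specified prefix bit (near-default route)
]

def ternary_match(pattern, key):
    # A pattern bit matches if it is 'X' or equals the key bit.
    return all(p == "X" or p == k for p, k in zip(pattern, key))

def longest_prefix_match(key):
    best = None
    for pattern, port in tcam_table:
        if ternary_match(pattern, key):
            prefix_len = len(pattern) - pattern.count("X")
            if best is None or prefix_len > best[0]:
                best = (prefix_len, port)
    return best[1] if best else None

print(longest_prefix_match("11001011"))  # 'port A' (longest matching prefix wins)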

11 CAM: Introduction - CAM vs. RAM
RAM: Address In = 4 -> Data Out = 10001101
  (contents: 0: 01010101, 1: 00001101, 2: 11001011, 3: 10111101, 4: 10001101, 5: 00110111)
CAM: Data In = 10001101 -> Address Out = 3
  (contents: 0: 01010101, 1: 00001101, 2: 11001011, 3: 10001101, 4: 00011101, 5: 11000111)

12 Memory Hierarchy
The overall goal of using a memory hierarchy is to obtain the highest possible average access speed while minimizing the total cost of the entire memory system.
Multiprogramming refers to the existence of many programs in different parts of main memory at the same time.
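As a simplified illustration of this trade-off (a two-level model with hit ratio h to the faster level; not taken from the slide), the average access time and average cost per bit are roughly

T_avg = h * T_1 + (1 - h) * T_2
C_avg = (C_1 * S_1 + C_2 * S_2) / (S_1 + S_2)

where T_i is the access time, C_i the cost per bit, and S_i the size of level i. A hierarchy aims to keep T_avg close to T_1 while C_avg stays close to C_2.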

13 Main memory

14 ROM Chip

15 Memory Address Map
Memory configuration (case study):
- Required: 512 bytes of ROM + 512 bytes of RAM.
- Available: 512-byte ROM chips and 128-byte RAM chips.
The designer of a computer system must calculate the amount of memory required for the particular application and assign it to either RAM or ROM. The interconnection between memory and processor is then established from knowledge of the size of memory needed and the types of RAM and ROM chips available. The addressing of memory can be established by means of a table that specifies the memory address assigned to each chip. This table, called a memory address map, is a pictorial representation of the assigned address space for each chip in the system.
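A hedged sketch of how the map for this case study can be derived: four 128-byte RAM chips cover the 512 bytes of RAM, followed by one 512-byte ROM. Placing RAM below ROM starting at address 0 is one common assignment, not the only possible one.

# Compute hexadecimal address ranges for each chip in order.
chips = [("RAM 1", 128), ("RAM 2", 128), ("RAM 3", 128), ("RAM 4", 128),
         ("ROM",   512)]

base = 0x000
for name, size in chips:
    print(f"{name:6s} {base:04X} - {base + size - 1:04X}")
    base += size
# RAM 1  0000 - 007F
# RAM 2  0080 - 00FF
# RAM 3  0100 - 017F
# RAM 4  0180 - 01FF
# ROM    0200 - 03FF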

16 Memory Address Map

17 Associative Memory
The time required to find an item stored in memory can be reduced considerably if stored data can be identified for access by the content of the data itself rather than by an address. A memory unit accessed by content is called an associative memory or Content Addressable Memory (CAM). This type of memory is accessed simultaneously and in parallel on the basis of data content rather than a specific address or location.
When a word is written into an associative memory, no address is given; the memory is capable of finding an empty, unused location in which to store the word. When a word is to be read from an associative memory, the content of the word, or part of the word, is specified.
The associative memory is uniquely suited to parallel searches by data association. Moreover, searches can be done on an entire word or on a specific field within a word. Associative memories are used in applications where the search time is critical and must be very short.

18 Hardware Organization
[Figure: argument register (A) and key register (K), each n bits wide, feed an associative memory array and logic of m words with n bits per word; an m-bit match register (M) records which words match; input, read, write, and output lines complete the block diagram.]
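A behavioral sketch of this organization (software model, not the circuit): the argument register A holds the word being searched for, the key register K selects which bit positions participate in the comparison, and the match register M gets one bit per stored word. The example values are illustrative.

def associative_search(memory, A, K):
    # memory: list of n-bit words; A, K: n-bit strings.
    # A word's match bit is set when every key-enabled bit equals the argument bit.
    M = []
    for word in memory:
        match = all(k == "0" or a == w for a, k, w in zip(A, K, word))
        M.append(1 if match else 0)
    return M

memory = ["100111100", "101000001"]
A = "101111100"
K = "111000000"            # only the three leftmost bits take part in the search
print(associative_search(memory, A, K))   # [0, 1] -> only the second word matches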

19 Associative Memory of m Words, n Cells per Word
[Figure: array of cells C_ij, with i = 1..m (words) and j = 1..n (bits); argument bits A_1..A_n and key bits K_1..K_n drive each bit column, and each word i produces a match bit M_i.]

20 One Cell of Associative Memory
[Figure: one cell F_ij consists of an S-R flip-flop with input, read, write, and output connections, plus match logic driven by A_j and K_j that contributes to the word's match signal M_i.]

21 Match Logic Circuit
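A commonly used formulation of this match logic (the standard associative-memory expressions, stated here with the notation of the preceding slides) is:

x_j = A_j F_{ij} + A_j' F_{ij}'
M_i = \prod_{j=1}^{n} ( x_j + K_j' )

Here x_j = 1 when bit j of word i agrees with argument bit A_j, and the K_j' term makes any masked bit position (K_j = 0) satisfy the product automatically, so M_i = 1 exactly when every key-enabled bit of word i matches the argument.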

22 CAM: Introduction - Binary CAM Cell
[Figure: binary CAM cell combining an SRAM storage core (transistors P1, P2, N1-N4) accessed via bit lines BL1/BL1c and word line WL, with a comparison network (N5-N8) connected to search lines SL1/SL1c and the match line ML; internal storage nodes BL1_cell and BL1c_cell.]

23 CAM: Introduction - Ternary CAM (TCAM)
[Figure: two TCAM search examples on 8-bit, 6-entry tables. In the first, a ternary input keyword (XXX01101) is searched against the stored data and matches locations 1 and 4. In the second, the stored entries themselves contain don't-care bits (e.g., X0001101, XXXX1101) and the binary input keyword 10001101 again matches locations 1 and 4.]

24 CAM: Introduction - TCAM Cell
- Global masking -> search lines (SLs)
- Local masking -> bit lines (BLs)
Encoding stored in the RAM cell (via BL1/BL2):
BL1  BL2  Logic
0    1    0
1    0    1
1    1    X
0    0    N.A.
[Figure: TCAM cell consisting of a RAM cell on bit lines BL1/BL1c and BL2/BL2c with word line WL, plus comparison logic connected to search lines SL1/SL2 and the match line ML.]
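A short sketch of the two-bit encoding in the table above (following the slide's encoding; some devices use a different convention): each cell stores the pair (BL1, BL2), which encodes 0, 1, or X, and the cell matches the search bit if the stored value is X or equal to it.

ENCODING = {(0, 1): "0", (1, 0): "1", (1, 1): "X"}   # (0, 0) is not used

def cell_matches(bl1, bl2, search_bit):
    stored = ENCODING[(bl1, bl2)]
    return stored == "X" or stored == search_bit

print(cell_matches(1, 1, "0"))   # True  (don't care matches anything)
print(cell_matches(0, 1, "1"))   # False (stored 0, searched 1)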

25 CAM: Introduction - DRAM-based TCAM Cell
- Higher bit density
- Slower table update
- Expensive process
- Refreshing circuitry required
- Scaling issues (leakage)
[Figure: DRAM-based TCAM cell with storage nodes BL1_cell/BL2_cell written via bit lines BL1/BL2 and word line WL, and comparison transistors (N3-N8) connected to search lines SL1/SL2 and the match line ML.]

26 CAM: Introduction - SRAM-based TCAM Cell
- Standard CMOS process
- Fast table update
- Large area (16T)
[Figure: SRAM-based TCAM cell with two SRAM storage cells on bit-line pairs BL1/BL1c and BL2/BL2c sharing word line WL, storage nodes BL1c_cell/BL2c_cell, and comparison logic connected to search lines SL1/SL2 and the match line ML.]

27 CAM: Introduction Block diagram of a 256 x 144 TCAM

28 CAM: Introduction - Why Low-Power TCAMs?
- Parallel search -> very high power
- Larger word size and larger number of entries -> higher power
- Embedded applications (SoC)

29 CAM: Design Techniques - Cell Design: 12T Static TCAM Cell*
- A stored '0' is retained by leakage (V_WL ~ 200 mV)
- Advantage: high density
- Drawbacks: leakage increases (~3 orders of magnitude), reduced noise margin, soft-error susceptibility (node S), unsuitable for READ

30 CAM: Design Techniques - Cell Design: NAND vs. NOR Type CAM
NAND type:
- Low power
- Charge-sharing problems
- Slow
[Figure: the NAND-type CAM connects the cells of a word (cell 0 through cell N) in series on the match line ML_NAND feeding a sense amplifier, while the NOR-type CAM connects the cells in parallel to ML_NOR; both cell types use bit lines BL1/BL1c, word line WL, and search lines SL1/SL1c.]

31 CAM: Design Techniques - MLSA Design: Conventional
- Pre-charge ML to V_DD
- Match -> V_ML = V_DD
- Mismatch -> V_ML = 0
[Figure: conventional match-line sense amplifier; the match line ML is precharged to V_DD through a PRE-controlled device and sensed to produce the match-line sense output MLSO.]

32 CAM: Design Techniques - Low Power: Dual-ML TCAM
- Same speed, 50% less energy (ideally!)
- Parasitic interconnects degrade both speed and energy
- The additional ML increases coupling capacitance

33 CAM: Design Techniques - Static Power Reduction
16T TCAM: Leakage Paths*
[Figure: 16T SRAM-based TCAM cell (transistors N1-N12, P1-P4) annotated with its leakage paths for a stored data pattern, showing bit lines BL1/BL1c and BL2/BL2c, search lines SL1/SL2, word line WL, match line ML, and storage nodes BL1c_cell/BL2c_cell.]
* N. Mohan and M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004.

34 CAM: Design Techniques - Static Power Reduction
Side effects of V_DD reduction in TCAM cells:
- Speed: no change
- Dynamic power: no change
- Robustness degrades: lowering V_DD reduces the voltage margin (current-race sensing)
[Figure: voltage margin between the ML and MLSO waveforms for the mismatch (0) and match (1) cases.]

35 CAM for Routing Table Implementation
CAM can be used as a search engine: we want to find matching contents in a database or table.
Example: a routing table.
Source: http://pagiamtzis.com/cam/camintro.html

36 Simplified CAM Block Diagram
- The input to the system is the search word.
- The search word is broadcast on the search lines.
- A match line indicates whether there was a match between the search word and the stored word.
- The encoder outputs the match location.
- If there are multiple matches, a priority encoder selects the first match.
- The hit signal indicates whether or not any match was found.
- Search words are long, ranging from 36 to 144 bits.
- Table sizes range from a few hundred entries to 32K entries.
- Address space: 7 to 15 bits.
Source: K. Pagiamtzis and A. Sheikholeslami, "Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey," IEEE Journal of Solid-State Circuits, March 2006.
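A behavioral sketch of the output side of this block diagram: the match lines form a bit vector, the priority encoder picks one match, and the hit signal reports whether any match line is active. Which end of the array has the highest priority is a device-specific convention; here the lowest address wins.

def priority_encode(match_lines):
    # match_lines: list of 0/1 values, one per stored word.
    hit = any(match_lines)
    address = match_lines.index(1) if hit else None
    return hit, address

print(priority_encode([0, 0, 1, 0, 1]))   # (True, 2)   -> first match wins
print(priority_encode([0, 0, 0, 0, 0]))   # (False, None) -> miss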

37 CAM Memory Size
- The largest available is around 18 Mbit (single chip).
- Rule of thumb: the largest CAM chip is about half the size of the largest available SRAM chip, since a typical CAM cell consists of two SRAM cells.
- Capacity has grown at an exponential rate.
Source: K. Pagiamtzis and A. Sheikholeslami, "Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey," IEEE Journal of Solid-State Circuits, March 2006.

38 CAM Basics
1. The search-data word is loaded into the search-data register.
2. All match lines are precharged high (temporary match state).
3. Search-line drivers broadcast the search word onto the differential search lines.
4. Each CAM core cell compares its stored bit against the bit on the corresponding search lines.
5. Match lines of words that have at least one mismatched bit discharge to ground.
Source: K. Pagiamtzis and A. Sheikholeslami, "Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey," IEEE Journal of Solid-State Circuits, March 2006.
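A step-by-step behavioral sketch of this NOR-style search (software only): every match line starts precharged to 1, and any cell whose stored bit disagrees with the search bit pulls its line to 0. An 'X' stands for a masked (always-matching) bit in a ternary word; the stored words are illustrative.

def search(stored_words, search_word):
    match_lines = [1] * len(stored_words)          # precharge: temporary match
    for i, word in enumerate(stored_words):
        for stored_bit, search_bit in zip(word, search_word):
            if stored_bit != "X" and stored_bit != search_bit:
                match_lines[i] = 0                  # discharge on any mismatch
                break
    return match_lines

words = ["10110", "10X10", "01101"]
print(search(words, "10110"))   # [1, 1, 0]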

39 CAM Advantages
- They associate the input (comparand) with their memory contents in one clock cycle.
- They are configurable in multiple width and depth formats of search data, which allows searches to be conducted in parallel.
- CAMs can be cascaded to increase the size of the lookup tables they can store.
- New entries can be added to the table, so the CAM can learn associations it did not previously hold.
- They are an appropriate solution for high-speed searching.

40 CAM Disadvantages
- They cost several hundred dollars per CAM, even in large quantities.
- They occupy a relatively large footprint on a card.
- They consume excessive power.
- Generic system engineering problems:
  - Interfacing with the network processor.
  - Handling simultaneous table updates and lookup requests.

41 CAM Structure
- The comparand bus is 72 bits wide and bidirectional.
- The result bus is an output.
- The command bus enables instructions to be loaded into the CAM.
- The CAM has 8 configurable banks of memory.
- The NPU issues a command to the CAM; the CAM then performs an exact match or uses wildcard characters to extract the relevant information.
- There are two sets of mask registers inside the CAM.

42 CAM Structure (continued)
- There are global mask registers, which can remove specific bits from the comparison, and a mask register present in each memory location.
- The search result can be a single output (highest priority) or a burst of successive results.
- The output port is 24 bits wide.
- Flag and control signals report the status of the memory banks; they also enable multiple chips to be cascaded.

43 CAM Features
CAM cascading:
- Up to 8 devices can be cascaded without a performance penalty in search time (72 bits x 512K).
- Up to 32 devices can be cascaded with some performance degradation (72 bits x 2M).
Terminology:
- Initializing the CAM: writing the table into the memory.
- Learning: updating specific table entries.
- Writing a search key to the CAM: the search operation.
Key widths:
- Most CAMs support 72-bit keys and can support wider keys in native hardware.
- Shorter keys can be handled more efficiently at the system level.

44 CAM Latency
- Clock rates are between 66 and 133 MHz; the clock speed determines the maximum search capacity.
- Factors affecting search performance: key size and table size.
- For the system designer, the total latency to retrieve data from the SRAM connected to the CAM is what matters.
- Pipelining and multi-threaded resource allocation can ease the CAM speed requirements.
Source: IDT
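As a rough worked example of "clock speed determines search capacity" (assuming one search is issued per clock cycle, which is typical but device-dependent):

search rate ~ f_clk x (searches per cycle), so 133 MHz x 1 search/cycle ~ 133 million searches per second, and 66 MHz ~ 66 million searches per second.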

45 Management of Tables Inside a CAM
It is important to squeeze as much information as possible into a CAM. Example from a Netlogic application note:
- We want to store 4 tables of 32-bit-wide IP destination addresses.
- The CAM is 128 bits wide.
- If each address is stored directly in a slot, 96 bits of every slot are wasted.
We can instead arrange the 32-bit tables next to each other:
- Every 128-bit slot is partitioned into four 32-bit fields.
- Going from left to right, these hold the 3rd, 2nd, 1st, and 0th tables.
- We use a global mask register to access only one of the tables (a behavioral sketch follows the mask table below):

MASK 3: 00000000 FFFFFFFF FFFFFFFF FFFFFFFF
MASK 2: FFFFFFFF 00000000 FFFFFFFF FFFFFFFF
MASK 1: FFFFFFFF FFFFFFFF 00000000 FFFFFFFF
MASK 0: FFFFFFFF FFFFFFFF FFFFFFFF 00000000
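The sketch below models the idea with 128-bit integers: four 32-bit tables share one slot, and a mask restricts the comparison to one field. Here a 1 bit means "compare this position", which is only this sketch's convention; the actual global mask register encoding depends on the device. The stored field values are invented for illustration.

def field_mask(field):
    # 32 ones aligned to the selected field (0 = rightmost, 3 = leftmost).
    return 0xFFFFFFFF << (32 * field)

def masked_match(stored_slot, search_slot, field):
    mask = field_mask(field)
    return (stored_slot & mask) == (search_slot & mask)

slot = (0xC0A80101 << 96) | (0x0A000001 << 64) | (0xAC100001 << 32) | 0x08080808
print(masked_match(slot, 0xC0A80101 << 96, field=3))   # True:  field 3 matches
print(masked_match(slot, 0x0A000002 << 64, field=2))   # False: field 2 differs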

46 Example Continued
We can still use the per-entry mask register (not the global mask register) to perform longest prefix matching.

