Published by Kennedi Shirah. Modified over 9 years ago.
1
A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing
Sangyeun Cho and Rami Melhem
Dept. of Computer Science, University of Pittsburgh
2
Feb. 6 ’07 – CCW-21
Lookup ops in packet processing
Packet forwarding
  Given an IP address
  Look up a matching prefix in a table (IP table)
  Make sure the chosen prefix is the longest: LPM (Longest Prefix Matching)
Rule-based packet filtering
  Given a set of packet fields
  Look up matching entries in a rule database
Deep packet inspection
  Given a string in the packet payload
  Look up matching entries in a signature database
3
Lookup performance scalability
Lookup performance must match increasing line speeds
  For OC-768, up to 104M packets must be processed per second
  Network traffic has doubled every year [McKeown03]
  Router capacity doubles every 18 months
Capacity pressure
  Routing tables (~200K prefixes in a core router) are growing [RIS]
  The number of firewall rules is increasing; 100K rules are practical [Baboescu04]
  IPv6
Power and thermal issues are already a critical limiting factor in network processing device design [McKeown03]
Two conventional lookup solutions
  Software methods (tries, hash tables, …)
  Hardware methods (TCAM, Bloom filter, …)
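The slide does not say how the 104M packets/s figure is derived. One set of assumptions that reproduces it is a nominal 40 Gb/s line rate and 48-byte packets; both values are assumptions made here for illustration, not numbers from the deck. The sketch also shows the per-packet time budget those assumptions imply.

```python
# Assumed values (not stated on the slide): nominal OC-768 rate and packet size.
LINE_RATE_BPS = 40e9   # assumption: 40 Gb/s nominal line rate
PACKET_BYTES = 48      # assumption: 48-byte packets


def packets_per_second(line_rate_bps, packet_bytes):
    """Packets that arrive per second when the link is fully loaded."""
    return line_rate_bps / (packet_bytes * 8)


def time_budget_ns(line_rate_bps, packet_bytes):
    """Time available to process one packet, in nanoseconds."""
    return 1e9 / packets_per_second(line_rate_bps, packet_bytes)


pps = packets_per_second(LINE_RATE_BPS, PACKET_BYTES)
budget = time_budget_ns(LINE_RATE_BPS, PACKET_BYTES)
print(f"{pps / 1e6:.1f}M packets/s, {budget:.1f} ns per packet")
```

Under these assumptions each lookup has to complete, or be pipelined, within roughly 10 ns.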
4
IP lookup using a trie
Consider an IP address: 0 1 0 0 0 1 1 0
  “Flexibility”
  High memory capacity requirement
  High memory bandwidth requirement
  Not SCALABLE
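A minimal sketch of longest prefix matching on a binary trie, to make the bandwidth concern concrete. The prefix set and the next-hop labels A to D are made up for illustration; the address is the one on the slide. The access counter treats every visited node as one memory access, which is a simplifying assumption.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # '0' or '1' -> TrieNode
        self.next_hop = None  # set only where a stored prefix ends


def insert(root, prefix, next_hop):
    """Store a prefix given as a bit string, e.g. '0100'."""
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.next_hop = next_hop


def longest_prefix_match(root, address_bits):
    """Walk the trie bit by bit, remembering the last prefix seen."""
    node, best, accesses = root, root.next_hop, 0
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break
        accesses += 1                 # one node visit = one memory access
        if node.next_hop is not None:
            best = node.next_hop      # a longer matching prefix
    return best, accesses


root = TrieNode()
# Hypothetical table: prefix -> next hop
for prefix, hop in [("0", "A"), ("0100", "B"), ("01000", "C"), ("10", "D")]:
    insert(root, prefix, hop)

print(longest_prefix_match(root, "01000110"))  # ('C', 5)
```

A single lookup here touches five nodes in sequence, and the count grows with prefix length, which is the memory bandwidth cost the slide points at.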
5
IP lookup using TCAM
Consider an IP address: 0 1 0 0 0 1 1 0
Stored prefixes: 110100*, 110101*, 110111*, 01000*, 01100*, 01101*, 11011*, 0100*, 0110*, 1101*, 10*, 0*
  Sort before storing
  Choose the first among the matched
  High bandwidth, constant-time lookup
  TCAMs are relatively small and expensive
  Power consumption is very high
  Not SCALABLE
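A software model of the sort-then-first-match rule, using the prefixes and the address from the slide. A real TCAM compares all entries in parallel and a priority encoder picks the first hit; the sequential loop below only models the result, and the function names are illustrative.

```python
def matches(entry, address_bits):
    """Ternary compare: the trailing '*' is don't-care for all remaining bits."""
    return address_bits.startswith(entry.rstrip("*"))


def tcam_lookup(table, address_bits):
    """Return the first matching entry, as a priority encoder would."""
    for entry in table:
        if matches(entry, address_bits):
            return entry
    return None


prefixes = ["110100*", "110101*", "110111*", "01000*", "01100*", "01101*",
            "11011*", "0100*", "0110*", "1101*", "10*", "0*"]

# "Sort before storing": longest prefixes first, so the first hit is the LPM.
table = sorted(prefixes, key=lambda e: len(e.rstrip("*")), reverse=True)

print(tcam_lookup(table, "01000110"))  # 01000*
```

The address also matches 0100* and 0*, but the sort order guarantees that the longest match, 01000*, is found first.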
6
CA-RAM – a hybrid approach
Can we do better than the existing conventional schemes?
  Flexibility and search performance
  Exploit optimized RAM designs
CA-RAM combines hashing w/ hardware parallel matching
CA-RAM design goals
  High lookup performance
  Low power consumption
  Smaller chip area per stored datum
  Straightforward system-level integration
7
CA-RAM – Content Addressable RAM
Separate match logic and memory
  Match logic for a single row, not every row
  Allows the use of dense RAM technology
  Enables highly reconfigurable match logic
Keep keys sorted in each row, not in the entire array
[Figure: match logic and memory cells in a conventional CAM/TCAM vs. CA-RAM]
8
Very simple, yet efficient
  Use hashing to store keys in a particular row
  To look up, hash the key and retrieve one row
  Perform matching on the entire row in parallel
  Achieve full content addressability w/o paying the overhead!
[Figure: search key, index generator, rows of stored keys (Key i1, Key i2, …; Key j1, Key j2, …), match processors 1, 2, …]
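The hash, fetch one row, match in parallel sequence can be modeled in a few lines. This is a sketch under stated assumptions, not the authors' design: the class name, the use of the key's low-order bits as the hash, and exact-match integer keys are all choices made here for illustration.

```python
class CaRamModel:
    """Toy model: 2**row_bits rows, each holding up to slots_per_row keys."""

    def __init__(self, row_bits, slots_per_row):
        self.row_bits = row_bits
        self.slots_per_row = slots_per_row
        self.rows = [[] for _ in range(1 << row_bits)]

    def index(self, key):
        # Index generator. Assumption: the hash is just the low-order bits.
        return key & ((1 << self.row_bits) - 1)

    def insert(self, key):
        row = self.rows[self.index(key)]
        if len(row) >= self.slots_per_row:
            return False              # bucket overflow, not handled here
        row.append(key)
        return True

    def lookup(self, key):
        row = self.rows[self.index(key)]          # one memory access
        # Match processors compare every slot of the row at once;
        # any() models the combined match result.
        return any(stored == key for stored in row)


cam = CaRamModel(row_bits=2, slots_per_row=2)
print(cam.insert(5), cam.insert(9), cam.insert(13))  # True True False
print(cam.lookup(9), cam.lookup(6))                  # True False
```

Keys 5, 9 and 13 all hash to row 1 in this toy configuration, so the third insert fails; that is the bucket overflow case.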
9
Pipelined CA-RAM operation
  Step 1: Index generation
  Step 2: Memory access
  Step 3: Key matching
  Step 4: Result forwarding
[Figure: search key, index generator, rows of stored keys (Key i1 … Key i3, Key j1 … Key j3), match processors 1 to 3, result]
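A small schedule model of the four steps named on the slide. It assumes one cycle per stage, one new lookup issued per cycle, and no stalls; none of these timing details are given in the deck.

```python
STAGES = ["index generation", "memory access", "key matching", "result forwarding"]


def schedule(num_lookups):
    """Map each cycle to the (lookup, stage) pairs active in that cycle."""
    timeline = {}
    for lookup in range(num_lookups):
        for offset, stage in enumerate(STAGES):
            timeline.setdefault(lookup + offset, []).append((lookup, stage))
    return timeline


def total_cycles(num_lookups):
    """Cycles needed to finish all lookups in an ideal pipeline."""
    return num_lookups + len(STAGES) - 1 if num_lookups else 0


for cycle, work in sorted(schedule(3).items()):
    print(cycle, work)
print(total_cycles(3))  # 6
```

Once the pipeline is full, one lookup completes per cycle even though each individual lookup still takes four steps.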
10
Dealing w/ bucket overflows
  Careful design of the hash function
  Increase bucket size
  Reduce the load factor (load factor = # of occupied entries / # of total entries)
  Use “chaining”; store overflows in subsequent rows
    Multiple accesses per lookup
  Use a small overflow CAM, accessed in parallel
    Similar to the popular “victim caching” in computer architecture
  Use two-level hashing and employ multiple CA-RAM banks
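A sketch of two of the options above: computing the load factor as defined on the slide, and spilling overflowing keys into a small side structure that is searched alongside the main row. The modulo hash, the Python set standing in for the overflow CAM, and the key values are all illustrative assumptions.

```python
def build(keys, num_rows, slots_per_row):
    """Place keys into rows; keys that do not fit go to the overflow store."""
    rows = [[] for _ in range(num_rows)]
    overflow = set()                    # stands in for a small overflow CAM
    for key in keys:
        row = rows[key % num_rows]      # assumption: simple modulo hash
        if len(row) < slots_per_row:
            row.append(key)
        else:
            overflow.add(key)
    return rows, overflow


def load_factor(rows, slots_per_row):
    """# of occupied entries / # of total entries."""
    occupied = sum(len(row) for row in rows)
    return occupied / (len(rows) * slots_per_row)


def lookup(rows, overflow, key):
    """Row match and overflow match happen in parallel in hardware."""
    return key in rows[key % len(rows)] or key in overflow


keys = list(range(10)) + [16, 20, 24]
rows, overflow = build(keys, num_rows=4, slots_per_row=4)
print(load_factor(rows, 4))   # 0.6875
print(sorted(overflow))       # [20, 24]
```

Every key is still found in a single step, because the overflow store is checked at the same time as the hashed row.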
11
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
  Adapting key size to apps
    Same hardware to support multiple apps or standards
12
Adapting key size
  Adapting key size is straightforward
  Will benefit support for multiple apps/standards
  Select key bits for matching
[Figure: rows of stored keys (Key i1 … Key i3, Key j1 … Key j3) feeding reconfigurable match logic, which outputs match information]
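One way to picture "select key bits for matching" is that the same fetched row is cut into slots of whatever key width the application configures. The sketch below models a row as a bit string; the row contents and function names are made up for illustration.

```python
def split_row(row_bits, key_width):
    """Cut a fetched row into equal-width key slots."""
    if key_width <= 0 or len(row_bits) % key_width:
        raise ValueError("row width must be a multiple of the key width")
    return [row_bits[i:i + key_width] for i in range(0, len(row_bits), key_width)]


def match_row(row_bits, key_width, search_key):
    """One match result per slot, as parallel comparators would produce."""
    return [slot == search_key for slot in split_row(row_bits, key_width)]


row = "0100011010110001"                 # hypothetical 16-bit row
print(match_row(row, 8, "01000110"))     # [True, False]
print(match_row(row, 4, "0110"))         # [False, True, False, False]
```

The same 16-bit row serves as two 8-bit keys or four 4-bit keys; only the configured slot boundaries change.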
13
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
  Adapting key size to apps
    Same hardware to support multiple apps or standards
  Binary and ternary matching
    Some apps require ternary matching, some don’t
14
Supporting binary/ternary matching
  Developed a configurable comparator
  Ternary matching requires 2 bits per symbol
  Supporting different types of matching in different bit positions is feasible
  Consider mask bits or not
[Figure: search key and stored keys with masks (Key i1 / Mask i1, Key j1 / Mask j1, Key i2, Key j2) feeding reconfigurable match logic, which outputs match information]
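A bit-level sketch of a comparator that can either honor or ignore mask bits. Storing a value bit plus a mask bit per position is what makes ternary matching cost 2 bits per symbol. The function name and the convention that a 1 in the mask means "compare this bit" are assumptions made here.

```python
def configurable_match(search_key, stored_key, care_mask, ternary=True):
    """Compare two keys; in ternary mode, ignore positions where the mask is 0."""
    diff = search_key ^ stored_key   # 1 wherever the keys differ
    if ternary:
        diff &= care_mask            # drop differences in don't-care positions
    return diff == 0


# Prefix 01000* stored as an 8-bit value with the low 3 bits as don't-care.
stored = 0b01000000
mask = 0b11111000
search = 0b01000110

print(configurable_match(search, stored, mask, ternary=True))   # True
print(configurable_match(search, stored, mask, ternary=False))  # False
```

Because the mask is per bit, different bit positions of one key can mix exact and don't-care matching.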
15
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
  Adapting key size to apps
    Same hardware to support multiple apps or standards
  Binary and ternary matching
    Some apps require ternary matching, some don’t
  Storing data and keys in a CA-RAM module
    Cuts # of memory accesses for IP lookup by half
16
Simult. key matching & data access
  Data access follows TCAM lookup
  CA-RAM supports data embedding
  Cuts memory traffic & latency by half
  Match key & bypass data
[Figure: search key and stored key/data pairs (Key i1 / Data i1, Key j1 / Data j1, Key i2, Key j2) feeding reconfigurable match logic, which outputs match information & data]
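A sketch that counts memory accesses for the two organizations contrasted on the slide: a TCAM search followed by a separate data read, versus a row that carries keys and data together. The table contents, port labels, and function names are hypothetical.

```python
def tcam_style_lookup(keys, data_memory, search_key):
    """Search the key array, then read the data from a separate memory."""
    accesses = 1                                  # the search itself
    for index, key in enumerate(keys):
        if key == search_key:
            accesses += 1                         # second access: data read
            return data_memory[index], accesses
    return None, accesses


def embedded_lookup(row, search_key):
    """Keys and data share one row, so a hit needs no second access."""
    accesses = 1                                  # one row fetch
    for key, data in row:
        if key == search_key:
            return data, accesses
    return None, accesses


keys = ["01000", "0100", "0"]
data_memory = ["port 3", "port 2", "port 1"]      # hypothetical next hops
row = list(zip(keys, data_memory))

print(tcam_style_lookup(keys, data_memory, "0100"))  # ('port 2', 2)
print(embedded_lookup(row, "0100"))                  # ('port 2', 1)
```

On a hit, the embedded layout needs one access instead of two, which is where the "by half" claim comes from.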
17
CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
  Adapting key size to apps
    Same hardware to support multiple apps or standards
  Binary and ternary matching
    Some apps require ternary matching, some don’t
  Storing data and keys in a CA-RAM module
    Cuts # of memory accesses for IP lookup by half
  Providing range checking capabilities
    Beneficial for rule-based packet filtering
18
Supporting range checking
  (Range checking causes trouble)
  (Entries must be expanded)
  CA-RAM can support range checking efficiently
  Match key & check range
[Figure: search key and stored key/range pairs (Key i1 / Range i1, Key j1 / Range j1) feeding reconfigurable match logic, which outputs match information]
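To illustrate why entries must be expanded: a device that can only match prefixes has to cover a range with several prefix entries, while match logic that can compare against bounds needs one entry. The sketch below uses the standard greedy split of a range into aligned power-of-two blocks; the port range is an illustrative example, not taken from the deck.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the integer range [lo, hi] with the fewest prefix entries."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... and does not run past hi.
        while size > hi - lo + 1:
            size >>= 1
        fixed = width - (size.bit_length() - 1)   # number of fixed leading bits
        head = format(lo >> (width - fixed), "b").zfill(fixed) if fixed else ""
        prefixes.append(head + ("*" if fixed < width else ""))
        lo += size
    return prefixes


def in_range(value, lo, hi):
    """Direct range check: one stored entry, two comparisons."""
    return lo <= value <= hi


# Hypothetical rule field: 16-bit port numbers 1024..65535.
expanded = range_to_prefixes(1024, 65535, 16)
print(len(expanded), expanded)
print(in_range(8080, 1024, 65535))  # True
```

This one range turns into six prefix entries, and a rule with two such fields would need the cross product; a direct bounds check keeps it at one entry.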
19
Evaluation
  We implemented a CA-RAM design (w/ reconfigurability) and evaluated its power and area advantages over state-of-the-art TCAMs
  We experimented with real routing tables to estimate the load factor and the average number of memory accesses per lookup
20
Comparing CA-RAM and TCAM
[Charts: cell area (µm²) @130nm CMOS; power (W) for 4.5Mb @143MHz]
  CA-RAM area advantage: 4.5x~11x
  CA-RAM power advantage: 4x~14x
21
Mapping a large IP routing table
Consider multiple design points: Designs A through F
[Figure: design points with 2,048 rows (32 entries) and 4,096 rows (64 entries); load factors shown: 0.47, 0.40, 0.36, 0.24, 0.36]
22
Mapping a large IP routing table
[Charts: spilled entries and average memory access latency (AMAL) for load factors 0.47, 0.40, 0.36, 0.24, 0.36, under “uniform” and “skewed” traffic]
With a properly chosen load factor, CA-RAM achieves near-constant AMAL
23
Mapping a large IP routing table
CA-RAM is advantageous over TCAM
[Figure: Design B]
24
Conclusions
Compared w/ software methods
  Fewer memory accesses; higher lookup performance
Compared w/ TCAM
  Higher density, matching that of DRAM; allows a large lookup table
  Exceeds the speed of TCAM
  Low power – a critical advantage for cost-effective system design
Reconfigurability
  Can accommodate apps having different key/record sizes, binary vs. ternary searching requirements, range checking, …
  Can adopt new standards much more easily, e.g., IPv6
25
CA-RAM components
[Figure: index generator, rows of stored keys (Key i1, Key i2, …; Key j1, Key j2, …), match processors 1, 2, …, and result bus; dimension labels: C bits, 2^R rows, N bits]