Published by Rey Granby. Modified over 9 years ago.
1
A Search Memory Substrate for High Throughput and Low Power Packet Processing
Sangyeun Cho, Michel Hanna and Rami Melhem
Dept. of Computer Science, University of Pittsburgh
2
Background
  [Figure: network diagram with labels: end user, router, subnet, ISP, Internet]
3
Network packet processing tasks
  Packet forwarding
    Given an IP address
    Look up in a table (IP table) a matching prefix
    Make sure the chosen prefix is the longest: the LPM (Longest Prefix Matching) requirement
  Rule-based packet filtering
    Given a set of packet fields (src/dst IP, src/dst port, protocol, …)
    Look up in a rule database matching entries
  Deep packet inspection
    Given a string in packet payload
    Look up in a signature database matching entries
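The LPM requirement on the slide above can be illustrated with a deliberately naive sketch. The table contents and next-hop labels are hypothetical, and addresses are written as bit strings for readability; this is a reference definition of the problem, not a fast lookup method.

```python
def longest_prefix_match(table, addr_bits):
    """Return (prefix, next_hop) for the longest prefix matching addr_bits.

    table maps prefix bit-strings to next-hop labels (hypothetical data).
    Returns None when no prefix matches.
    """
    best = None
    for prefix, hop in table.items():
        # A prefix matches when the address starts with its bits.
        if addr_bits.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, hop)
    return best


# Hypothetical table; three of its prefixes match 01000110.
table = {"0": "A", "0100": "B", "01000": "C", "10": "D"}
result = longest_prefix_match(table, "01000110")
print(result)  # ('01000', 'C')
```

The linear scan makes the cost of a pure software definition visible: every entry is examined for every packet, which is what tries, TCAMs, and CA-RAM each try to avoid in different ways.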
4
Lookup performance scalability
  Lookup performance must match increasing line speeds
    For OC-768, up to 104M packets must be processed per second
    Network traffic has doubled every year [McKeown03]
    Router capacity doubles every 18 months
  Capacity pressure
    Routing tables (~200K prefixes in a core router) are growing [RIS]
    # of firewall rules increases; 100K rules are practical [Baboescu04]
    IPv6
  Power and thermal issues are already a critical limiting factor in network processing device design [McKeown03]
  Two conventional lookup solutions
    Software methods (tries, hash table, …)
    Hardware methods (TCAM, Bloom filter, …)
5
IP lookup using a trie
  Consider an IP address: 0 1 0 0 0 1 1 0
  “Flexibility”
  High memory capacity requirement
  Low memory bandwidth utilization
  Not scalable
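A minimal sketch of a one-bit-per-level binary trie shows why the number of memory accesses grows with prefix length. The class and function names are hypothetical, and real software routers typically use multibit or compressed tries instead.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # '0' or '1' -> TrieNode
        self.next_hop = None  # set only where a stored prefix ends


def trie_insert(root, prefix, hop):
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.next_hop = hop


def trie_lookup(root, addr_bits):
    """Walk the trie bit by bit; return (best_next_hop, nodes_visited)."""
    node, best, accesses = root, root.next_hop, 0
    for bit in addr_bits:
        node = node.children.get(bit)
        if node is None:
            break
        accesses += 1  # each level costs one dependent memory access
        if node.next_hop is not None:
            best = node.next_hop  # remember the longest match so far
    return best, accesses


root = TrieNode()
for prefix, hop in {"0": "A", "0100": "B", "01000": "C", "10": "D"}.items():
    trie_insert(root, prefix, hop)

print(trie_lookup(root, "01000110"))  # ('C', 5)
```

Each step depends on the previous one, so the five accesses in this example cannot be issued in parallel.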
6
IP lookup using TCAM
  Consider an IP address: 0 1 0 0 0 1 1 0
  Stored prefixes (sort before storing; choose the first among the matched):
    110100*, 110101*, 110111*, 01000*, 01100*, 01101*, 11011*, 0100*, 0110*, 1101*, 10*, 0*
  High bandwidth, constant time lookup
  TCAMs are relatively small, expensive
  Power consumption very high
  Not scalable
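The "sort before storing, choose the first among the matched" behavior can be modeled in software as follows. This is only a functional sketch using the prefixes from the slide: a real TCAM compares all entries simultaneously in hardware, whereas the list comprehension here visits them one by one.

```python
SLIDE_PREFIXES = [
    "110100*", "110101*", "110111*", "01000*", "01100*", "01101*",
    "11011*", "0100*", "0110*", "1101*", "10*", "0*",
]


def build_tcam(prefixes):
    """Store prefixes longest first, so the first hit is the longest match."""
    entries = [p.rstrip("*") for p in prefixes]
    return sorted(entries, key=len, reverse=True)


def tcam_lookup(tcam, addr_bits):
    # Every entry is compared against the key (in parallel in hardware);
    # a priority encoder then picks the first matching entry.
    matches = [entry for entry in tcam if addr_bits.startswith(entry)]
    return matches[0] if matches else None


tcam = build_tcam(SLIDE_PREFIXES)
print(tcam_lookup(tcam, "01000110"))  # 01000
```

Because every stored entry takes part in every lookup, the work per search grows with table size, which is the source of the power cost noted on the slide.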
7
Recap: Why is TCAM inefficient?
  All bits are involved in matching
  Large embedded match logic
  “Large” means more work in this case
8
CA-RAM: a hybrid approach
  Can we do better than the existing schemes?
    Flexibility and search performance
    Exploit optimized RAM designs
    Hardware approach (software too slow)
  CA-RAM combines hashing w/ hardware parallel matching
  CA-RAM design goals
    High lookup performance
    Low power consumption
    Smaller chip area per stored datum
    Straightforward system-level integration
9
CA-RAM: Content Addressable RAM
  Separate match logic and memory
  Match logic for a single row, not every row
  Allows the use of dense RAM technology
  Enables highly reconfigurable match logic
  (Keep keys sorted in each row, not in entire array)
  [Figure: match logic and memory cells, conventional CAM/TCAM vs. CA-RAM]
10
Very simple, yet efficient
  Use hashing to store keys in a particular row
  To look up, hash the key and retrieve one row
  Perform matching on entire row in parallel
  Achieve (full) content addressability w/o paying overhead!
  [Figure: a key enters the index generator, which selects one row of stored keys (Key i1, Key i2, … / Key j1, Key j2, …); match processors 1, 2, … compare the row against the key]
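The hash-then-match-one-row idea can be sketched as a small software model. The class name, the modulo index function, and the fixed row width are assumptions made for illustration; the slides do not specify the hash function, and the per-row comparison that is a loop here is done by parallel match processors in hardware.

```python
class CARAMModel:
    """Toy software model of a CA-RAM array (hypothetical, for illustration)."""

    def __init__(self, num_rows, slots_per_row):
        self.num_rows = num_rows
        self.slots_per_row = slots_per_row
        self.rows = [[] for _ in range(num_rows)]

    def index(self, key):
        # Placeholder index generator; the real hash is a design choice.
        return key % self.num_rows

    def insert(self, key, data):
        row = self.rows[self.index(key)]
        if len(row) >= self.slots_per_row:
            return False  # bucket overflow; must be handled separately
        row.append((key, data))
        return True

    def lookup(self, key):
        row = self.rows[self.index(key)]  # one memory access fetches the row
        # Hardware compares every slot of the fetched row at once.
        hits = [data for stored, data in row if stored == key]
        return hits[0] if hits else None


ram = CARAMModel(num_rows=4, slots_per_row=2)
inserted = [ram.insert(5, "a"), ram.insert(9, "b"), ram.insert(13, "c")]
print(inserted)       # [True, True, False]
print(ram.lookup(9))  # b
```

Keys 5, 9, and 13 all hash to row 1 in this example, so the third insert overflows a two-slot row.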
11
Pipelined CA-RAM operation
  Step 1: Index generation
  Step 2: Memory access
  Step 3: Key matching
  Step 4: Result forwarding
  [Figure: the search key passes through the index generator; the index selects a row (Key j1, Key j2, Key j3); match processors 1 to 3 compare it with the search key and forward the result]
12
Dealing w/ bucket overflows
  Careful design of hash function
  Increase bucket size
  Reduce load factor; load factor = # of occupied entries / # of total entries
    Trade off space for performance
  Use “chaining”; store overflows in subsequent rows
    Multiple accesses per lookup
  Use a small overflow CAM, accessed in parallel
    Similar to popular “victim caching” in computer architecture
  Use two-level hashing and employ multiple CA-RAM banks
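The "chaining" option and the load factor definition can be sketched as below. The function names, the identity hash, and the early-stop rule (valid only because this toy model never deletes keys) are assumptions for illustration rather than details taken from the slides.

```python
def insert_chained(rows, slots, key, hash_fn):
    """Place key in its home row, spilling to subsequent rows when full.

    Returns the number of rows touched.
    """
    n = len(rows)
    start = hash_fn(key) % n
    for probe in range(n):
        row = rows[(start + probe) % n]
        if len(row) < slots:
            row.append(key)
            return probe + 1
    raise MemoryError("all rows are full")


def lookup_chained(rows, slots, key, hash_fn):
    """Return (found, memory_accesses)."""
    n = len(rows)
    start = hash_fn(key) % n
    for probe in range(n):
        row = rows[(start + probe) % n]
        if key in row:
            return True, probe + 1
        if len(row) < slots:
            # A row with free space means the key never spilled past it.
            return False, probe + 1
    return False, n


def load_factor(rows, slots):
    return sum(len(r) for r in rows) / (len(rows) * slots)


rows = [[] for _ in range(4)]
for k in (1, 5, 9, 2):
    insert_chained(rows, 2, k, hash_fn=lambda x: x)

print(lookup_chained(rows, 2, 9, lambda x: x))  # (True, 2)
print(load_factor(rows, 2))                     # 0.5
```

Key 9 spills from its full home row into the next one, so finding it costs two accesses instead of one; that is the "multiple accesses per lookup" cost the slide mentions.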
13
CA-RAM reconfig. opportunities
  Reconfigurable match logic allows:
    Adapting key size to apps
      Same hardware to support multiple apps or standards
14
Adapting key size
  Adapting key size is straightforward
  Will benefit supporting multiple apps/standards
  [Figure: reconfigurable match logic selects key bits for matching across stored keys (Key i1, Key i2, Key i3 / Key j1, Key j2, Key j3) and outputs match information]
15
CA-RAM reconfig. opportunities
  Reconfigurable match logic allows:
    Adapting key size to apps
      Same hardware to support multiple apps or standards
    Binary and ternary matching
      Some apps require ternary matching, some don’t
16
Supporting binary/ternary matching
  Developed configurable comparator
  T-matching requires 2 bits / 1 symbol
  Supporting different types of matching in different bit positions feasible
  [Figure: reconfigurable match logic compares the search key with stored keys (Key i1, Key i2 / Key j1, Key j2) and their masks (Mask i1, Mask j1), either considering the mask bits or not, and outputs match information]
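A configurable comparator of this kind can be sketched with integer bit operations. The encoding below (one value bit plus one "care" bit per symbol) is one common way to spend 2 bits per ternary symbol and is an assumption; the slides do not state the encoding used in the actual design.

```python
def configurable_match(search_key, stored_key, care_mask, width, ternary=True):
    """Compare two width-bit keys.

    In ternary mode only positions whose care_mask bit is 1 are compared;
    in binary mode the mask is ignored and every bit must agree.
    """
    all_bits = (1 << width) - 1
    mask = care_mask if ternary else all_bits
    # XOR marks differing bits; the mask drops don't-care positions.
    return ((search_key ^ stored_key) & mask & all_bits) == 0


# Hypothetical entry for prefix 01000*: value 01000000, care bits 11111000.
stored, care = 0b01000000, 0b11111000

print(configurable_match(0b01000110, stored, care, 8))                 # True
print(configurable_match(0b01000110, stored, care, 8, ternary=False))  # False
```

Because the mask is just another operand, using different masks on different fields gives different matching types in different bit positions.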
17
CA-RAM reconfig. opportunities
  Reconfigurable match logic allows:
    Adapting key size to apps
      Same hardware to support multiple apps or standards
    Binary and ternary matching
      Some apps require ternary matching, some don’t
    Storing data and keys in a CA-RAM module
      Cuts # of memory accesses for IP lookup by half
18
Simult. key matching & data access
  With a TCAM, data access follows the TCAM lookup
  CA-RAM supports data embedding
  Cuts memory traffic & latency by half
  [Figure: reconfigurable match logic matches the search key against stored keys (Key i1, Key i2 / Key j1, Key j2), bypasses the embedded data (Data i1, Data j1), and outputs match information & data]
19
CA-RAM reconfig. opportunities
  Reconfigurable match logic allows:
    Adapting key size to apps
      Same hardware to support multiple apps or standards
    Binary and ternary matching
      Some apps require ternary matching, some don’t
    Storing data and keys in a CA-RAM module
      Cuts # of memory accesses for IP lookup by half
    Providing range checking capabilities
      Beneficial for rule-based packet filtering
20
Supporting range checking
  (Range checking causes trouble in a TCAM)
  (Entries must be expanded)
  CA-RAM can support range checking efficiently
  [Figure: reconfigurable match logic matches the search key against stored keys (Key i1, Key j1), checks the stored ranges (Range i1, Range j1), and outputs match information]
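The contrast between entry expansion and a direct range check can be sketched as follows. The expansion shown is the standard split of a range into aligned prefix blocks; whether this is exactly the expansion the authors have in mind is an assumption, and the port range in the example is hypothetical.

```python
def range_to_prefixes(lo, hi, width):
    """Split [lo, hi] into aligned blocks, returned as (value, prefix_len)."""
    out = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... shrunk until it fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        out.append((lo, width - size.bit_length() + 1))
        lo += size
    return out


def range_match(value, lo, hi):
    """What a range-capable match processor evaluates for one entry."""
    return lo <= value <= hi


# Hypothetical filtering rule on 16-bit ports 1024..65535.
expanded = range_to_prefixes(1024, 65535, 16)
print(len(expanded))                   # 6 prefix entries after expansion
print(range_match(8080, 1024, 65535))  # True, using a single entry
```

One rule with one range becomes six prefix entries here, and a rule with ranges on several fields multiplies the counts, whereas a comparator-based check keeps one entry per rule.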
21
Evaluation
  We implemented a CA-RAM design (w/ reconfigurability) and evaluated its power and area advantages over state-of-the-art TCAMs
  We experimented with real routing tables to estimate the load factor and the average memory accesses per lookup
22
Mapping a large IP routing table
  Consider multiple design points: Designs A, B, C, D, E, F
  [Figure: design points with 2,048 rows (32 entries) or 4,096 rows (64 entries); load factors shown: 0.47, 0.40, 0.36, 0.24, 0.36]
23
Mapping a large IP routing table
  [Charts: spilled entries and average memory access latency (AMAL) under “uniform” and “skewed” traffic, for load factors 0.47, 0.40, 0.36, 0.24, 0.36]
  With a properly chosen load factor, CA-RAM achieves near-constant AMAL
24
Comparing CA-RAM and TCAM
  [Charts: cell area (µm²) @130nm CMOS; power (W) for 4.5Mb @143MHz]
  CA-RAM area advantage: 4.5x~11x
  CA-RAM power advantage: 4x~14x
25
Conclusions
  Compared w/ software methods
    Fewer memory accesses; higher lookup performance
  Compared w/ TCAM
    Higher density, matching that of DRAM; supports a large lookup table
    Exceeds the speed of TCAM
    Low power: a critical advantage for cost-effective system design
  Reconfigurability
    Can accommodate apps having different key/record sizes, binary vs. ternary searching requirements, range checking, …
    Can adopt new standards much more easily, e.g., IPv6
26
Mapping a large IP routing table
  CA-RAM advantageous over TCAM
  [Figure: Design B]
27
CA-RAM components
  Index generator
  Memory array: 2^R rows, each C bits wide, holding keys (Key i1, Key i2, … / Key j1, Key j2, …) of N bits
  Match processors (Match processor 1, Match processor 2, …)
  Result bus