A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing Sangyeun Cho and Rami Melhem Dept. of Computer Science University.

1 A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing
Sangyeun Cho and Rami Melhem
Dept. of Computer Science, University of Pittsburgh

2 Feb. 6 ’07 – CCW-21 Lookup ops in packet processing
• Packet forwarding
  - Given an IP address, look up a matching prefix in a table (IP table)
  - Make sure the chosen prefix is the longest: LPM (Longest Prefix Matching)
• Rule-based packet filtering
  - Given a set of packet fields, look up matching entries in a rule database
• Deep packet inspection
  - Given a string in the packet payload, look up matching entries in a signature database
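To make the LPM requirement concrete, here is a minimal Python sketch of longest prefix matching by exhaustive comparison. The table contents and the function name are illustrative assumptions, not taken from the slides; addresses and prefixes are written as bit strings.

```python
def longest_prefix_match(address_bits, prefixes):
    """Return the longest prefix in `prefixes` that matches `address_bits`, or None."""
    best = None
    for prefix in prefixes:
        if address_bits.startswith(prefix):
            if best is None or len(prefix) > len(best):
                best = prefix
    return best


# Hypothetical 8-bit example table (assumption for illustration only).
table = ["0", "0100", "01000", "10", "1101"]
print(longest_prefix_match("01000110", table))  # prints 01000
```

Three entries match this address ("0", "0100", "01000"); only the longest one is the correct forwarding decision, which is what makes LPM harder than an exact-match lookup.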

3 Lookup performance scalability
• Lookup performance must match increasing line speeds
  - For OC-768, up to 104M packets must be processed per second
  - Network traffic has doubled every year [McKeown03]
  - Router capacity doubles every 18 months
• Capacity pressure
  - Routing tables (~200K prefixes in a core router) are growing [RIS]
  - The number of firewall rules is increasing; 100K rules are practical [Baboescu04]
  - IPv6
• Power and thermal issues are already a critical limiting factor in network processing device design [McKeown03]
• Two conventional lookup solutions
  - Software methods (tries, hash tables, …)
  - Hardware methods (TCAM, Bloom filter, …)
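The slide does not say how the 104M figure is derived. One set of assumptions that reproduces it is a nominal 40 Gb/s line rate with back-to-back 48-byte packets; both constants below are assumptions. The sketch also shows the per-lookup time budget that follows.

```python
LINE_RATE_BPS = 40e9     # assumed nominal OC-768 rate (about 40 Gb/s)
MIN_PACKET_BYTES = 48    # assumed minimum packet size; not stated on the slide

packets_per_second = LINE_RATE_BPS / (MIN_PACKET_BYTES * 8)
budget_ns = 1e9 / packets_per_second

print(f"{packets_per_second / 1e6:.1f} M packets/s")  # prints 104.2 M packets/s
print(f"{budget_ns:.1f} ns per lookup")               # prints 9.6 ns per lookup
```

Under these assumptions, every lookup must complete (or be pipelined) within roughly 10 ns, which is why the number of dependent memory accesses per lookup matters so much.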

4 IP lookup using a trie
• Consider an IP address: 0 1 0 0 0 1 1 0
• “Flexibility”
• High memory capacity requirement
• High memory bandwidth requirement
• Not SCALABLE
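A minimal sketch of a binary (one bit per level) trie lookup follows, to show where the memory bandwidth cost comes from: each step down the trie is a dependent memory access. The class and function names and the prefix table are assumptions for illustration; the trie drawn on the original slide is not recoverable from the transcript.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # maps "0" / "1" to a child TrieNode
        self.is_prefix = False  # True if a stored prefix ends at this node


def insert(root, prefix):
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.is_prefix = True


def lookup(root, address_bits):
    """Return (longest matching prefix or None, number of trie nodes visited)."""
    node, best, visited = root, None, 0
    for depth, bit in enumerate(address_bits, start=1):
        node = node.children.get(bit)
        if node is None:
            break
        visited += 1            # one dependent memory access per level
        if node.is_prefix:
            best = address_bits[:depth]
    return best, visited


root = TrieNode()
for p in ["0", "0100", "01000", "0110", "10", "1101"]:  # hypothetical table
    insert(root, p)

print(lookup(root, "01000110"))  # prints ('01000', 5)
```

Even in this tiny example, a single lookup touches five nodes. With 32-bit or 128-bit addresses the number of sequential accesses grows accordingly unless multi-bit strides are used, and those trade accesses for memory capacity.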

5 IP lookup using TCAM
• Consider an IP address: 0 1 0 0 0 1 1 0
• TCAM contents (sort before storing; choose the first among the matched):
  110100*, 110101*, 110111*, 01000*, 01100*, 01101*, 11011*, 0100*, 0110*, 1101*, 10*, 0*
• High bandwidth, constant-time lookup
• TCAMs are relatively small and expensive
• Power consumption is very high
• Not SCALABLE
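The TCAM behavior described on this slide can be emulated in software as follows. The entry list is the one shown on the slide (with the trailing "*" dropped); the function name and the list-based model of match lines and priority encoding are assumptions of this sketch, not a description of any particular device.

```python
# Prefixes from the slide, already sorted longest first as a TCAM requires.
TCAM_ENTRIES = [
    "110100", "110101", "110111",
    "01000", "01100", "01101", "11011",
    "0100", "0110", "1101",
    "10",
    "0",
]


def tcam_lookup(address_bits, entries=TCAM_ENTRIES):
    """Compare against every entry and return (index, entry) of the first match."""
    # Hardware performs all comparisons at once; this list models the match lines.
    match_lines = [address_bits.startswith(e) for e in entries]
    for index, hit in enumerate(match_lines):  # models the priority encoder
        if hit:
            return index, entries[index]
    return None


print(tcam_lookup("01000110"))  # prints (3, '01000')
```

Because the entries are sorted by decreasing prefix length, the first match is automatically the longest one. The cost is that every entry takes part in every search, which is the source of the high power consumption noted above.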

6 CA-RAM – a hybrid approach
• Can we do better than the existing conventional schemes?
  - Flexibility and search performance
  - Exploit optimized RAM designs
• CA-RAM combines hashing w/ hardware parallel matching
• CA-RAM design goals
  - High lookup performance
  - Low power consumption
  - Smaller chip area per stored datum
  - Straightforward system-level integration

7 CA-RAM – Content Addressable RAM
• Separate match logic and memory
• Match logic for a single row, not every row
• Allows the use of dense RAM technology
• Enables highly reconfigurable match logic
• Keep keys sorted in each row, not in the entire array
[Figure: match logic and memory cells in a conventional CAM/TCAM vs. CA-RAM]

8 Very simple, yet efficient
• Use hashing to store keys in a particular row
• To look up, hash the key and retrieve one row
• Perform matching on the entire row in parallel
• Achieve full content addressability w/o paying overhead!
[Figure labels: key; Index generator; Key i1, Key i2, …; Key j1, Key j2, …; Match processor 1, Match processor 2, …]
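The store and lookup steps above can be sketched in software as follows. This is a behavioral model under stated assumptions, not the authors' hardware design: the class name, the modulo index generator, and the list-based rows are all hypothetical.

```python
class CARam:
    """Behavioral sketch of a hashed, row-organized search memory."""

    def __init__(self, num_rows, keys_per_row):
        self.num_rows = num_rows
        self.keys_per_row = keys_per_row
        self.rows = [[] for _ in range(num_rows)]

    def index(self, key):
        # Hypothetical index generator: low-order bits of the key.
        return key % self.num_rows

    def insert(self, key):
        row = self.rows[self.index(key)]
        if len(row) >= self.keys_per_row:
            return False                 # bucket overflow; not handled in this sketch
        row.append(key)
        return True

    def lookup(self, key):
        row = self.rows[self.index(key)]             # one memory access fetches the row
        matches = [stored == key for stored in row]  # models the parallel comparators
        return any(matches)


cam = CARam(num_rows=8, keys_per_row=4)
for k in (3, 11, 19):
    cam.insert(k)
print(cam.lookup(11), cam.lookup(27))  # prints True False
```

Only the keys in one row are compared per lookup, so the match logic is needed once per row width rather than once per stored entry.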

9 Pipelined CA-RAM operation
• Step 1: Index generation
• Step 2: Memory access
• Step 3: Key matching
• Step 4: Result forwarding
[Figure labels: Search key; Index generator; Index; Key i1, Key i2, Key i3; Key j1, Key j2, Key j3; Match processor 1, Match processor 2, Match processor 3; Result]
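A small simulation can illustrate why pipelining the four steps sustains one lookup per cycle. It assumes each step takes exactly one cycle and that no lookup stalls (for example, on an overflow access); neither assumption is stated on the slide, and the function name is hypothetical.

```python
STAGES = ["index generation", "memory access", "key matching", "result forwarding"]


def simulate(num_lookups):
    """Return a list of (cycle, lookup_id) completion events."""
    pipeline = [None] * len(STAGES)  # pipeline[i] holds the lookup currently in stage i
    completed, cycle, next_id = [], 0, 0
    while len(completed) < num_lookups:
        cycle += 1
        incoming = None
        if next_id < num_lookups:
            incoming, next_id = next_id, next_id + 1
        # Every lookup advances one stage; a new lookup enters the first stage.
        pipeline = [incoming] + pipeline[:-1]
        if pipeline[-1] is not None:
            completed.append((cycle, pipeline[-1]))
    return completed


print(simulate(3))  # prints [(4, 0), (5, 1), (6, 2)]
```

After the four-cycle fill latency, one result emerges every cycle, so n lookups finish in n + 3 cycles under these assumptions.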

10 Dealing w/ bucket overflows
• Careful design of hash function
• Increase bucket size
  - Reduce load factor; load factor = # of occupied entries / # of total entries
• Use “chaining”; store overflows in subsequent rows
  - Multiple accesses per lookup
• Use a small overflow CAM, accessed in parallel
  - Similar to popular “victim caching” in computer architecture
• Use two-level hashing and employ multiple CA-RAM banks
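The “chaining” option and the load factor definition can be illustrated with a short sketch. It spills an overflowing key into the next row that has space and counts how many row fetches each lookup then needs. The hash function, function names, and the tiny key set are assumptions chosen to force overflows; deletions are not modeled.

```python
def build(keys, num_rows, keys_per_row):
    """Place keys by hashing; spill overflows into subsequent rows."""
    rows = [[] for _ in range(num_rows)]
    for key in keys:
        home = key % num_rows                    # hypothetical hash function
        for step in range(num_rows):
            row = rows[(home + step) % num_rows]
            if len(row) < keys_per_row:
                row.append(key)
                break
        else:
            raise ValueError("table full")
    return rows


def accesses(rows, key, keys_per_row):
    """Number of row fetches needed to find `key`, or None if it is absent."""
    home = key % len(rows)
    for step in range(len(rows)):
        row = rows[(home + step) % len(rows)]
        if key in row:
            return step + 1
        if len(row) < keys_per_row:
            return None                          # a non-full row ends the chain
    return None


keys = [0, 4, 8, 12, 1, 5]
rows = build(keys, num_rows=4, keys_per_row=2)
load_factor = len(keys) / (4 * 2)                # occupied entries / total entries
average = sum(accesses(rows, k, 2) for k in keys) / len(keys)
print(rows)         # prints [[0, 4], [8, 12], [1, 5], []]
print(load_factor)  # prints 0.75
```

In this deliberately skewed example the average is about 1.67 row fetches per lookup. A lower load factor or a better-distributed hash leaves more keys in their home rows and pushes the average toward 1.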

11 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards
• …

12 Adapting key size
• Adapting key size is straightforward
• Will benefit supporting multiple apps/standards
[Figure labels: Key i1, Key i2, Key i3; Key j1, Key j2, Key j3; Reconfigurable match logic (select key bits for matching); Match information]
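One way to picture key-size adaptation is to treat a fetched row as a bit string that the match logic slices at a configurable width. The sketch below is a software analogy under that assumption; the function names and the 16-bit row are hypothetical.

```python
def split_row(row_bits, key_width):
    """Slice a fetched row (a bit string) into keys of `key_width` bits each."""
    if len(row_bits) % key_width != 0:
        raise ValueError("row width must be a multiple of the key width")
    return [row_bits[i:i + key_width] for i in range(0, len(row_bits), key_width)]


def match_row(row_bits, search_key):
    """Return the slot indices whose key equals `search_key`."""
    keys = split_row(row_bits, len(search_key))
    return [slot for slot, key in enumerate(keys) if key == search_key]


row = "0100011011010010"           # hypothetical 16-bit row
print(match_row(row, "1101"))      # four 4-bit keys: prints [2]
print(match_row(row, "11010010"))  # two 8-bit keys: prints [1]
```

The same stored bits serve both configurations; only the grouping applied by the match logic changes.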

13 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards
• Binary and ternary matching
  - Some apps require ternary matching, some don’t
• …

14 Supporting binary/ternary matching
• Developed configurable comparator
• Ternary (T-) matching requires 2 bits per symbol
• Supporting different types of matching in different bit positions is feasible
[Figure labels: Search key; Key i1, Mask i1, Key i2; Key j1, Mask j1, Key j2; Reconfigurable match logic (consider mask bits or not); Match information]
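The “2 bits per symbol” cost can be seen in a value/mask encoding, where each ternary symbol is stored as one value bit plus one care bit. The following sketch uses that common encoding as an assumption; the slide does not specify the encoding used by the configurable comparator, and the function names are hypothetical.

```python
def ternary_match(search_key, stored_value, care_mask):
    """True if search_key equals stored_value in every bit where care_mask is 1."""
    return ((search_key ^ stored_value) & care_mask) == 0


def binary_match(search_key, stored_value):
    """Exact match: every bit must agree, so no mask bits are needed."""
    return search_key == stored_value


# Entry 0100* stored as an 8-bit value/mask pair: only the upper four bits matter.
value, mask = 0b01000000, 0b11110000
print(ternary_match(0b01000110, value, mask))  # prints True
print(binary_match(0b01000110, value))         # prints False
```

An application that only needs exact matching can skip the mask bits entirely, so the same row holds roughly twice as many keys.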

15 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards
• Binary and ternary matching
  - Some apps require ternary matching, some don’t
• Storing data and keys in a CA-RAM module
  - Cuts # of memory accesses for IP lookup by half
• …

16 Simult. key matching & data access
• Data access follows TCAM lookup
• CA-RAM supports data embedding
• Cuts memory traffic & latency by half
[Figure labels: Search key; Key i1, Data i1, Key i2; Key j1, Data j1, Key j2; Reconfigurable match logic (match key & bypass data); Match information & Data]
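The halving of memory accesses can be illustrated by counting accesses in two toy models: a key array followed by a separate data RAM read, versus a row that stores each key next to its data. Function names and the example contents are assumptions for illustration.

```python
def tcam_style_lookup(keys, data_ram, search_key):
    """Two steps: match in the key array, then a separate read of the data RAM."""
    accesses = 1                              # search the key array
    if search_key not in keys:
        return None, accesses
    index = keys.index(search_key)
    accesses += 1                             # dependent read of the data RAM
    return data_ram[index], accesses


def embedded_lookup(row, search_key):
    """One step: the fetched row holds (key, data) pairs side by side."""
    accesses = 1                              # fetch the row
    for key, data in row:
        if key == search_key:
            return data, accesses             # data is bypassed straight to the output
    return None, accesses


keys = [0x0A, 0x1B]
data_ram = ["port 1", "port 2"]               # hypothetical next-hop data
row = list(zip(keys, data_ram))

print(tcam_style_lookup(keys, data_ram, 0x1B))  # prints ('port 2', 2)
print(embedded_lookup(row, 0x1B))               # prints ('port 2', 1)
```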

17 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards
• Binary and ternary matching
  - Some apps require ternary matching, some don’t
• Storing data and keys in a CA-RAM module
  - Cuts # of memory accesses for IP lookup by half
• Providing range checking capabilities
  - Beneficial for rule-based packet filtering
• …

18 Supporting range checking
• (Range checking causes troubles)
• (Entries must be expanded)
• CA-RAM can support range checking efficiently
[Figure labels: Search key; Key i1, Range i1; Key j1, Range j1; Reconfigurable match logic (match key & check range); Match information]
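The parenthetical remarks refer to the well-known cost of expressing a range with ternary prefixes: one range can expand into several entries. The sketch below shows the standard range-to-prefix expansion next to a direct comparison. It illustrates the general problem rather than the authors' circuit, and all names are hypothetical.

```python
def range_to_prefixes(lo, hi, width):
    """Split the inclusive range [lo, hi] into a minimal list of ternary prefixes."""
    prefixes = []
    while lo <= hi:
        # Largest aligned power-of-two block that starts at lo and fits in the range.
        size = (lo & -lo) if lo else (1 << width)
        while size > hi - lo + 1:
            size >>= 1
        fixed = width - (size.bit_length() - 1)   # number of non-wildcard bits
        head = format(lo >> (width - fixed), "b").zfill(fixed) if fixed else ""
        prefixes.append(head + "*" * (width - fixed))
        lo += size
    return prefixes


def prefix_matches(prefix, value, width):
    bits = format(value, "b").zfill(width)
    return all(p in ("*", b) for p, b in zip(prefix, bits))


def in_range(value, lo, hi):
    """Direct range check: two comparisons, one stored entry."""
    return lo <= value <= hi


print(range_to_prefixes(1, 14, 4))
# prints ['0001', '001*', '01**', '10**', '110*', '1110']
```

A single 4-bit range needs six ternary entries here, whereas match logic that can compare against stored bounds needs only one entry per range.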

19 Evaluation
• We implemented a CA-RAM design (w/ reconfigurability) and evaluated its power and area advantages over state-of-the-art TCAMs
• We experimented with real routing tables to estimate the load factor and the average memory accesses per lookup

20 Comparing CA-RAM and TCAM
[Chart: cell area (µm²) @130nm CMOS; annotations: 4.5x, 11x]
[Chart: power (W), 4.5Mb @143MHz; annotations: 14x, 4x]
• CA-RAM area advantage: 4.5x~11x
• CA-RAM power advantage: 4x~14x

21 Mapping a large IP routing table
Consider multiple design points: Design A, Design B, Design C, Design D, Design E, Design F
[Figure labels: 2,048 rows, 4,096 rows; (32 entries), (64 entries); load factor = 0.47, 0.40, 0.36, 0.24, 0.36]

22 Mapping a large IP routing table
[Charts: spilled entries and average memory access latency (AMAL) at load factor = 0.47, 0.40, 0.36, 0.24, 0.36, for “uniform” and “skewed” traffic]
• With a properly chosen load factor, CA-RAM achieves near-constant AMAL

23 Mapping a large IP routing table
• CA-RAM is advantageous over TCAM
[Figure label: Design B]

24 Conclusions
• Compared w/ software methods
  - Fewer memory accesses; higher lookup performance
• Compared w/ TCAM
  - Higher density, matching that of DRAM, which allows large lookup tables
  - Exceeds the speed of TCAM
  - Low power: a critical advantage for cost-effective system design
• Reconfigurability
  - Can accommodate apps having different key/record sizes, binary vs. ternary searching requirements, range checking, …
  - Can adopt new standards much more easily, e.g., IPv6

25 CA-RAM components
[Figure labels: Index generator; Key i1, Key i2, …; Key j1, Key j2, …; Match processors (Match processor 1, Match processor 2, …); Result bus; dimensions: C bits, 2^R rows, N bits]
