1 An Efficient, Hardware-based Multi-Hash Scheme for High Speed IP Lookup Hot Interconnects 2008 Socrates Demetriades, Michel Hanna, Sangyeun Cho and Rami Melhem.
2 Background IP Lookup in Core Router [Figure: the IP address of an incoming packet is looked up in a forwarding table that maps IP address prefixes to a next hop / outgoing link; Longest Prefix Matching selects the entry, here Port 2.]
3 Motivation Increasing Internet Traffic High Speed links Optical technology -> link rates ~100Gbps High Speed Routers TCAM-based forwarding engines Larger forwarding tables TCAMs FAIL to scale.
4 IP Lookup Schemes
1. TCAM-based schemes. [idt, netlogic, micron, CoolCAM]
   - Fast and constant lookup time.
   - High cost and power consumption.
2. Trie-based schemes. [Eatherton04, Devroye03,…]
   - Multi-cycle lookup latencies and low worst-case throughput.
   - Performance and scalability are fundamentally tied to the IP address length.
3. Hash-based schemes. [Srinivasan98, Hasan06, Kaxiras05,…]
   - Key-length independent latencies.
   - Easy to implement in hardware.
   - Hashing collisions -> space inefficiency.
   - Hash keys (prefixes) include "don't care" bits, which complicate hashing.
5 Overview Problem: Hash-based schemes can be power and cost efficient but are still space inefficient or slow. Goal: A hardware-based forwarding engine that has: 1. Constant and high speed lookup throughput. 2. Space efficiency. 3. Scalability with increasing forwarding table sizes. 4. Low cost and power consumption. Proposal: A h/w-based multi-hash architecture that sustains high throughput (1 packet lookup per mem cycle) while remaining space and power efficient.
6 Outline Introduction High Speed and Space Efficient Implementation Selecting hashing bits / Dealing with wildcard bits Experimental Evaluation Summary
7 h/w Hash-based IP Lookup [Figure: the key (IP address) enters the hash index generator, which selects one of 2^R rows in a C-way associative memory array; the C keys fetched from that row (key 1 … key c) go to the matching processors (match 1 … match c) and then to the LPM logic.] Much more power efficient scheme compared with TCAM. High Throughput
8 Hash-based IP Lookup example [Figure: the same datapath with 8-bit keys; the fetched row holds prefixes such as 1010**** and 1111****, and the matching entry 1111**** supplies the Next Hop.]
9 Hash-based IP Lookup - LPM [Figure: the same datapath with 8-bit keys; the fetched row holds several prefixes of different lengths, 1010**** among them, and the LPM (Longest Prefix Match) logic picks the longest matching prefix to supply the Next Hop.]
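The matching and LPM step on one fetched row can be sketched in software as follows. This is only an illustration, not the hardware design from the slides: prefixes are written as bit strings with trailing `*` wildcards, and the names `matches`, `prefix_length` and `lookup`, as well as the port labels, are hypothetical.

```python
def matches(prefix, address):
    """True if every non-wildcard bit of the prefix equals the address bit."""
    return len(prefix) == len(address) and all(
        p in ("*", a) for p, a in zip(prefix, address)
    )


def prefix_length(prefix):
    """Number of defined (non-wildcard) leading bits."""
    return len(prefix.rstrip("*"))


def lookup(bucket, address):
    """bucket: list of (prefix, next_hop) pairs fetched from one row.

    Every entry is compared against the address (the matching
    processors); the longest matching prefix wins (the LPM logic).
    """
    hits = [(prefix_length(p), hop) for p, hop in bucket if matches(p, address)]
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[0])[1]


# Example row with 8-bit keys (values are made up for illustration).
bucket = [("1010****", "port 1"), ("101011**", "port 2"), ("1111****", "port 3")]
print(lookup(bucket, "10101101"))
```

In hardware the comparisons run in parallel across the C entries of the row; the loop here only models the result.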
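10 Hash index generation Simple XOR-folding hash function. [Figure: a bit-select mechanism picks N selected bits from the IP prefix or incoming IP address; the N bits split into an F-bit part and an R-bit part (R = N - F), which are combined by a skew and an XOR to give the R-bit hash index.]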
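A minimal sketch of XOR-folding, under stated assumptions: the slide does not spell out how the skew is applied, so here the F-bit part is simply shifted left by `skew` positions and masked to R bits before the XOR. The function names and the bit-position convention (0 = least significant bit) are hypothetical.

```python
def select_bits(address, positions):
    """Concatenate the address bits at the given positions (0 = LSB)."""
    value = 0
    for pos in positions:
        value = (value << 1) | ((address >> pos) & 1)
    return value


def xor_fold(selected, n, r, skew=0):
    """Fold n selected bits into an r-bit index (assumed skew handling).

    The low r bits are XORed with the remaining f = n - r bits, which
    are shifted left by `skew` and truncated to r bits.
    """
    if not 0 < r <= n:
        raise ValueError("need 0 < r <= n")
    mask = (1 << r) - 1
    low = selected & mask
    high = selected >> r  # the f = n - r upper bits
    return low ^ ((high << skew) & mask)


# Select the top four bits of an 8-bit value, then fold a 6-bit value to 4 bits.
print(select_bits(0b10110010, [7, 6, 5, 4]))
print(xor_fold(0b110101, n=6, r=4))
print(xor_fold(0b110101, n=6, r=4, skew=2))
```

Using a different skew per table is one cheap way to obtain several distinct hash functions from the same selected bits.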
11 Inserting / Hashing IP prefixes [Figure: bucket load vs. bucket index for a single hash table, comparing used memory space with total available memory space.] Space Utilization = 30%. Balanced is better.
12 How to improve the utilization of the hash table?
- Powerful hash functions -> complexity -> delay on the critical lookup path.
- Adaptive perfect or semi-perfect hash functions -> the whole routing table must be rehashed periodically, a very time-consuming process.
- Multiple hash functions (MHT) -> increased space efficiency.
Our proposal: a multi-hashing scheme (MHT) in which items are allowed to migrate during the insertion operation.
13 IP prefix insertion (multi-hashing) [Figure: a prefix is hashed by h1, h2 and h3 into three hash tables; used entries are marked.]
14 Hashing IP prefixes: multi-hashing [Figure: bucket load vs. bucket index.] Space utilization: 30% with single hashing (single hash table), 50% with multi-hashing with 3 hash tables.
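Multi-hashing insertion can be sketched as below. The placement policy (pick the least-loaded of the candidate buckets) is an assumption, since the slides do not state the policy; the class name `MultiHashTable` and the toy hash functions are hypothetical.

```python
class MultiHashTable:
    """d hash tables, each with `rows` buckets of `entries_per_row` slots."""

    def __init__(self, hash_functions, rows, entries_per_row):
        self.hash_functions = hash_functions
        self.entries_per_row = entries_per_row
        self.tables = [[[] for _ in range(rows)] for _ in hash_functions]

    def candidates(self, key):
        """One candidate bucket per table, chosen by that table's hash."""
        return [t[h(key)] for t, h in zip(self.tables, self.hash_functions)]

    def insert(self, key):
        """Place the key in the least-loaded candidate bucket (assumed policy)."""
        bucket = min(self.candidates(key), key=len)
        if len(bucket) >= self.entries_per_row:
            return False  # unresolved collision
        bucket.append(key)
        return True


# Toy hash functions over 4 rows (illustrative only).
h1 = lambda k: k % 4
h2 = lambda k: (k // 4) % 4
h3 = lambda k: (k ^ (k >> 3)) % 4

mht = MultiHashTable([h1, h2, h3], rows=4, entries_per_row=1)
print([mht.insert(k) for k in (0, 4, 8)])
```

Keys 0, 4 and 8 all collide under `h1` alone, so a single table with one entry per row could store only one of them; the extra candidate buckets absorb the other two.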
15 Migrations are allowed during the insertion operation. Insertion time? [Figure: three hash tables indexed by h1, h2 and h3.]
16 Hashing prefixes: MHT + migrations Space utilization: (a) single hashing (single hash table): 30%; (b) multi-hashing with 3 hash tables: 50%; (c) multi-hashing with 3 hash tables + migrations: 70%.
17 Crisis: Handling unresolved collisions [Figure: a prefix that cannot be placed in any of the three hash tables (h1, h2, h3) is stored in a Victim TCAM.]
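Insertion with migration and a victim store can be sketched as follows. This is a simplified model under stated assumptions: only a single migration step is attempted (the slides do not give the migration depth or policy), the victim TCAM is modelled as a plain list, and the function name and hash functions are hypothetical.

```python
def insert_with_migration(tables, hash_functions, capacity, victim, key):
    """Insert key; returns "stored", "migrated" or "victim"."""
    slots = [(t, h(key)) for t, h in enumerate(hash_functions)]

    # 1. Use any candidate bucket that still has a free entry.
    for t, row in slots:
        if len(tables[t][row]) < capacity:
            tables[t][row].append(key)
            return "stored"

    # 2. Try to move one resident of a candidate bucket to one of
    #    its own alternative buckets, then take the freed entry.
    for t, row in slots:
        for resident in tables[t][row]:
            for t2, h2 in enumerate(hash_functions):
                alt = tables[t2][h2(resident)]
                if t2 != t and len(alt) < capacity:
                    tables[t][row].remove(resident)
                    alt.append(resident)
                    tables[t][row].append(key)
                    return "migrated"

    # 3. Unresolved collision: fall back to the victim store.
    victim.append(key)
    return "victim"


hashes = [lambda k: k % 2, lambda k: (k // 2) % 2]
tables = [[[], []], [[], []]]  # 2 tables x 2 rows, 1 entry per row
victim = []
print([insert_with_migration(tables, hashes, 1, victim, k) for k in (0, 2, 6, 4)])
```

Since a lookup probes every table anyway, a migrated item remains reachable at its alternative location; lookups would also have to probe the victim store.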
18 Outline Introduction High Speed and Space Efficient Implementation. Selecting Hashing Bits / Dealing with wildcard bits. Experimental Evaluation. Summary
19 Selecting hashing bits from prefixes [Figure: three example 32-bit prefixes of length 8, 16 and 24 bits, with 24, 16 and 8 trailing wildcard bits respectively.]
- No prefix has length < 8 bits.
- Rightmost bits have higher entropy and are more suitable for hashing.
- Routing tables become larger when wildcard bits participate in hashing.
20 Supporting wildcard bits in hashing Current technique: Convert each prefix of length x to a set of new prefixes of length L = x + k, so that the wildcard bits are eliminated up to length L. Then hash the whole new expanded set of prefixes. [Srinivasan et al.] -> Each prefix expands the table by 2^k prefixes. [Figure: example expansion of a prefix of length 16 bits.]
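The expansion step described above can be illustrated with a short sketch. Prefixes are written as bit strings with trailing `*` wildcards; the function name `expand_prefix` is hypothetical and an 8-bit example is used for brevity.

```python
from itertools import product


def expand_prefix(prefix, target_length):
    """Expand a prefix of length x to all prefixes of length L = x + k.

    Returns 2**k prefixes, where k = target_length - x. A prefix that
    is already at least target_length bits long is returned unchanged.
    """
    bits = prefix.rstrip("*")
    k = target_length - len(bits)
    if k <= 0:
        return [prefix]
    width = len(prefix)
    return [
        (bits + "".join(tail)).ljust(width, "*")
        for tail in product("01", repeat=k)
    ]


# A length-4 prefix expanded to length 6 yields 2**2 = 4 prefixes.
print(expand_prefix("1010****", 6))
```

The table growth is exponential in k, which is why the number of wildcard bits that take part in hashing matters.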
21 Control Wildcard Resolution (CWR) [Figure: two bit-selection examples, one with 16 keys to be inserted and one with 4 keys to be inserted.] CWR: Select bits from any carefully predefined positions. -> Allows sensitivity analysis that can find optimal configuration points for maximum space efficiency. -> Faster insertion time per prefix.
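One way to read the effect of bit selection on the number of inserted keys is sketched below. This is an interpretation, not the authors' implementation: it assumes that only wildcard bits falling on the selected hashing positions need to be resolved, so each prefix produces 2^w hash keys, where w is the number of selected positions that are wildcards. The function name, the example prefix and the positions (indexed from the left, 0-based) are hypothetical.

```python
from itertools import product


def cwr_hash_keys(prefix, positions):
    """Hash keys for a prefix when only `positions` feed the hash.

    Wildcards at selected positions are resolved to both 0 and 1;
    wildcards elsewhere are ignored by the hash.
    """
    wild = [p for p in positions if prefix[p] == "*"]
    keys = []
    for fill in product("01", repeat=len(wild)):
        chars = list(prefix)
        for p, bit in zip(wild, fill):
            chars[p] = bit
        keys.append("".join(chars[p] for p in positions))
    return keys


# A 32-bit prefix of length 16 (made-up bits).
prefix = "1011001111110000" + "*" * 16
print(len(cwr_hash_keys(prefix, range(12, 20))))  # 4 selected wildcards
print(len(cwr_hash_keys(prefix, range(10, 18))))  # 2 selected wildcards
```

Shifting the selected window two bits to the left cuts the keys for this prefix from 16 to 4, at the price of using bits that may have lower entropy; this is the trade-off a sensitivity analysis over bit-select configurations would explore.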
22 Outline Introduction High Speed and Space Efficient Implementation. Selecting hashing bits / Dealing with wildcard bits Experimental Evaluation. Summary
23 Lookup Architecture [Figure: a bit-select mechanism takes R+F bits (selected bits for index generation) from the incoming packet's IP address (32 bits) and produces the R-bit hash index; the tag to match is T + F bits (TAG); the match results feed the LPM logic.]
24 Sensitivity Analysis Different bit-select configurations. 1. Advantage over the standard MHT scheme. 2. Very small deviation of the points around the trend line. -> A practical guarantee that the unresolved collisions will not be far from an estimated value.
25 Comparison for h/w based schemes
- Description: TCAM: h/w CAM based; IPStash and New scheme: h/w Hash-based.
- Throughput: TCAM: 1; IPStash: 1/3; New scheme: 1.
- Space Efficiency: TCAM: Best; IPStash: Very Good (state of the art for hash-based); New scheme: Good.
- Power consumption: TCAM: CAM => high consumption per lookup; IPStash: 2.2 mem access per lookup + many row comparators; New scheme: 1 mem access per lookup + few row comparators.
26 Space Efficiency - Comparison Load Factor = Routing table size / Available space capacity
27 Power Consumption Even with load factor = x more power efficient than TCAM - 2x compared with IPStash.
28 Victim TCAM space requirements The percentage of the ‘unresolved collisions’ is an accurate estimator of the victim space that is required for the corresponding load factor.
29 Summary IP Lookup using TCAMs is expensive. Current hash-based approaches are promising but are either space inefficient or limited by low lookup throughput. The proposed h/w-based multi-hash lookup scheme has:
1. High speed lookup throughput. Requires 1 mem access time per packet lookup.
2. Space efficiency. Effective load factor of 70% with < 5% victim TCAM.
3. Low power consumption and cost. 8x less power than dynamic TCAMs. Best among hash-based schemes. Simple and easy hardware implementation.
4. Scalability to future routing table sizes and the IPv6 transition. All methods and techniques used scale well.
30 Questions