EaseCAM: An Energy and Storage Efficient TCAM-based IP-Lookup Architecture. Rabi Mahapatra, Texas A&M University
Overview: Introduction; Research Goal; Proposed Approach; Results; Conclusion & Future Work
Introduction [Figure: IP lookup in a router. Labels: Header Processing, IP Lookup, Routing Table (IP Address, Next Hop), Packet Queue (DRAM), Hdr, Data.]
Introduction: HW and SW solutions for IP lookup. Software solutions are unable to match link speed. Hardware solutions can accommodate today’s link speeds. TCAMs are the most popular hardware device, but they consume up to 15 W/chip (4-8 chips), resulting in increased cooling costs and fewer ports.
Current Approach. Power reduction in TCAM: partitioning of the TCAM array [Infocom’03, Hot Interconnect’02]; compaction (minimization) [Micro’02]. Update techniques [Micro’02]: routing update; TCAM updates.
Bottleneck with existing approaches. Power reduction: the number of entries enabled is not bounded; storing redundant information is not avoided. Update: minimization techniques are not incremental; update time is not independent of routing table size.
Motivation: a solution for bounded and reduced power consumption, with truly incremental routing and TCAM updates.
Contributions: a pipelined architecture for IP lookup; new prefix properties (prefix aggregation and prefix expansion); an upper bound on the number of entries enabled (256 x 3); novel page filling, memory management and incremental update techniques.
Solution: Prefix properties. Prefix aggregation: in the slide's example (prefix values not recoverable), the /24 prefix is the LCS for the given set of prefixes (rounded to the nearest octet). Prefixes aggregated based on LCS mostly have the same next hop. This gives a bound on the number of prefixes minimized (256).
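The aggregation step can be illustrated with a minimal sketch. It assumes that the LCS of a set is the run of leading bits shared by every prefix in the set, and that "rounded to the nearest octet" means truncating that run down to an 8-, 16- or 24-bit boundary. The function name and the sample prefixes are hypothetical and do not come from the slides.

```python
import ipaddress

def octet_rounded_lcs(prefixes):
    """Common leading bits of all prefixes, truncated down to an octet boundary.

    Assumption: this is one plausible reading of the slide's "LCS rounded to
    the nearest octet"; it is not taken from the authors' implementation.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # The shared part cannot be longer than the shortest prefix in the set.
    max_len = min(n.prefixlen for n in nets)
    first = int(nets[0].network_address)
    common = 0
    for bit in range(max_len):
        mask = 1 << (31 - bit)
        if any((int(n.network_address) ^ first) & mask for n in nets):
            break
        common += 1
    length = (common // 8) * 8  # round down to 0, 8, 16 or 24 bits
    netmask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
    return ipaddress.ip_network((first & netmask, length))

# Hypothetical prefixes that share their first 24 bits.
sample = ["192.168.1.0/26", "192.168.1.64/26", "192.168.1.128/25"]
print(octet_rounded_lcs(sample))  # 192.168.1.0/24
```

Because the result is aligned to an octet, at most 256 distinct values of the next octet can fall under one LCS, which is consistent with the bound of 256 quoted on the slide.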
Solution: Prefix properties. TABLE I. Comparison of prefix compaction using the prefix aggregation property and Espresso II for the attcanada and bbnplanet routers. Columns: Router; Total Prefixes; Max Prefix Compaction; Prefix Aggregation based Compaction. Rows: attcanada; bbnplanet. [Table values missing in source.]
Solution: Prefix properties. Prefix expansion: prefixes having the same length can be minimized. To increase minimization, extend prefixes of different lengths to the nearest octet by adding don’t-cares. Extending to the nearest octet is also useful for incremental update. Example: 1011XXXX.
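A small sketch of the expansion step, assuming prefixes are written as bit strings and that don't-care positions are marked with `X` as in the slide's `1011XXXX` example. The function name is hypothetical.

```python
def expand_to_octet(prefix_bits):
    """Pad a binary prefix with 'X' (don't-care) up to the next octet boundary."""
    if not prefix_bits or len(prefix_bits) > 32 or set(prefix_bits) - {"0", "1"}:
        raise ValueError("expected 1 to 32 characters drawn from '0' and '1'")
    padded_len = -(-len(prefix_bits) // 8) * 8  # ceiling to a multiple of 8
    return prefix_bits + "X" * (padded_len - len(prefix_bits))

print(expand_to_octet("1011"))       # 1011XXXX
print(expand_to_octet("101100111"))  # 101100111XXXXXXX
```

After expansion every stored entry has one of only four lengths (8, 16, 24 or 32 bits), so entries that end up with the same length become candidates for joint minimization.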
Solution: Prefix properties. Overlapping prefixes: prefixes of length < 8 are not present in the routing table, so the number of matching prefixes for an IP address is ≤ 25 (at most one per prefix length from 8 to 32). This property is used to selectively enable a bounded number of entries in the TCAM (256 x 3).
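The bound of 25 can be checked by enumeration: for a fixed address there is exactly one candidate prefix per length, and lengths run from 8 to 32. The sketch below assumes IPv4; the function name and the sample address are hypothetical.

```python
import ipaddress

def candidate_prefixes(address, min_len=8, max_len=32):
    """Every prefix of length min_len..max_len that could match `address`."""
    addr = int(ipaddress.IPv4Address(address))
    result = []
    for length in range(min_len, max_len + 1):
        netmask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        result.append(ipaddress.ip_network((addr & netmask, length)))
    return result

candidates = candidate_prefixes("10.1.2.3")
print(len(candidates))                # 25
print(candidates[0], candidates[-1])  # 10.0.0.0/8 10.1.2.3/32
```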
Solution: Architecture. Two-level architecture: w1 bits in the 1st level and 32 - w1 bits in the 2nd level. The size of the segment corresponding to the first w1 (= 8) bits is variable. Power is bounded by the segment size. [Figure: Segmented architecture for routing lookup using TCAM. The 1st level (w1 = 8 bits) selects one of the variable-sized segments 1.x, 2.x, ..., 127.x, 128.x, ..., 254.x, 255.x, which hold the 2nd-level bits.]
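The two-level idea can be modelled in software as follows. This is only a behavioural sketch: a real TCAM compares all enabled entries in parallel, whereas the code scans a list. It assumes every prefix has length ≥ w1, and the function names, routes and next-hop labels are hypothetical.

```python
import ipaddress

W1 = 8  # first-level width, as on the slide

def build_segments(routes):
    """Group (prefix, next_hop) pairs by the first W1 bits of the prefix."""
    segments = {}
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if net.prefixlen < W1:
            raise ValueError("this sketch assumes prefix length >= W1")
        key = int(net.network_address) >> (32 - W1)
        segments.setdefault(key, []).append((net, hop))
    return segments

def lookup(address, segments):
    """Longest-prefix match that searches only the segment selected by level 1."""
    addr = ipaddress.ip_address(address)
    enabled = segments.get(int(addr) >> (32 - W1), [])  # the only entries searched
    matches = [(net.prefixlen, hop) for net, hop in enabled if addr in net]
    return max(matches, key=lambda m: m[0])[1] if matches else None

routes = [("10.0.0.0/8", "A"), ("10.1.0.0/16", "B"), ("192.168.0.0/16", "C")]
segments = build_segments(routes)
print(lookup("10.1.2.3", segments))  # B
print(lookup("10.2.0.1", segments))  # A
```

In this model the work per lookup depends on the size of one segment rather than on the size of the whole table, which mirrors the slide's point that power is bounded by the segment size.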
Solution: Architecture. Memory compaction: apply the prefix properties to remove redundancies; apply pruning, prefix aggregation and minimization in succession; put all prefixes of length < w1 into the bucket (rarely occurring prefixes). Total number of entries after compaction: [expression missing in source].
Solution: Architecture. Paged TCAM architecture: group the prefixes of length > w1 based on their LCS. The LCS values are the cubes that cover the prefixes; each cube corresponds to a page id, and the prefixes covered by a cube are stored in the actual pages. (Pages formed using the LCS as page id can result in under-utilization.)
Architecture Block Diagram [Figure: the IP address feeds a comparator, page tables 1 ... I ... N, pages I, I+1, ..., I+Cmax, and the bucket. Labels: (32 - w1) bits, 32 bits, Enable Line.]
How to avoid under-utilization? LCS aggregation: 1. Aggregate prefixes having different LCS by modifying the cube. 2. Set the page size to an optimal value, avoiding pages that are too large or too small. Observe: the maximum size of a page can be 256, based on the above property.
Solution: Page Filling Algorithm. Page filling heuristics: 1. Generate cubes such that each covers the maximum number of prefixes and page size < [value missing in source]. 2. Aggregate the page IDs in the page tables and store them in comparators for a 0th-level lookup. 3. Find the total memory consumed (pages, page tables and comparator) for different values of w1. 4. Get the optimal value of w1 and page size β for which the total memory is the least.
Solution: Page Filling. The page filling heuristics ensure that: no page has more than β*γ entries, where γ is the page fill factor; the number of cubes that cover all the prefixes is minimum; total memory consumption is the least for a specific value of w1 and β.
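The capacity constraint can be illustrated with a deliberately simple greedy packing. This is not the authors' heuristic and makes no attempt to minimize the number of cubes. It assumes prefixes are equal-length strings over `0`, `1` and `X`, and takes each page's cube to be the position-wise cover of its entries. All names and sample values are hypothetical.

```python
import math

def fill_pages(prefixes, beta, gamma):
    """Pack sorted prefixes into pages of at most floor(beta * gamma) entries."""
    capacity = math.floor(beta * gamma)
    if capacity < 1:
        raise ValueError("beta * gamma must allow at least one entry per page")
    ordered = sorted(prefixes)  # neighbours in sort order share leading bits
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]

def covering_cube(page):
    """Keep a bit where all entries agree; use 'X' wherever they differ."""
    return "".join(col[0] if len(set(col)) == 1 else "X" for col in zip(*page))

# Toy 8-bit entries; beta = 4 and gamma = 0.75 give a capacity of 3 per page.
entries = ["00000001", "00000010", "00000011", "10000000", "10000001"]
pages = fill_pages(entries, beta=4, gamma=0.75)
print([covering_cube(p) for p in pages])  # ['000000XX', '1000000X']
```

Keeping γ below 1 leaves free slots in each page, which trades some memory for room to absorb later insertions.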
Architecture Block Diagram: power-enabled blocks in EaseCAM [Figure: the IP address feeds a comparator, page tables 1 ... I ... N, pages I, I+1, ..., I+Cmax, and the bucket. Labels: (32 - w1) bits, 32 bits, Enable Line.]
Solution: Architecture. Bucket: prefixes of length < w1 are stored in the bucket. The word length of the bucket is 32. Either the bucket or the pages are searched during each lookup in the 2nd level.
Solution: Architecture. Empirical model for memory. α: fraction of total entries in the bucket; α_f: bucket fill factor; γ: page fill factor; C_max: number of page ids in the page table; N: the number of entries; Page_max: total number of pages; β_w1: the optimal page size.
Minimum memory requirement = β_w1 * Page_max * (32 - w1)/32 + Page_max + Page_max/C_max + N * α/α_f
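The model is easy to evaluate directly. The sketch below transcribes the formula term by term. Reading the four terms as pages, page-table entries, comparator entries and bucket entries is an interpretation rather than something the slide states, and the input values are hypothetical, not measurements from the routers studied.

```python
def minimum_memory(beta_w1, page_max, w1, c_max, n, alpha, alpha_f):
    """Evaluate the empirical memory model, in 32-bit-entry equivalents."""
    pages = beta_w1 * page_max * (32 - w1) / 32  # page words are (32 - w1) bits wide
    page_tables = page_max                       # assumed: one page id per page
    comparators = page_max / c_max               # assumed: one comparator per page table
    bucket = n * alpha / alpha_f                 # bucket entries, scaled by fill factor
    return pages + page_tables + comparators + bucket

# Hypothetical inputs, for illustration only.
total = minimum_memory(beta_w1=256, page_max=100, w1=8, c_max=10,
                       n=10000, alpha=0.01, alpha_f=1.0)
print(total)
```

Evaluating this function over a grid of w1 and β values and keeping the minimum corresponds to steps 3 and 4 of the page filling heuristics, provided Page_max is recomputed for each candidate pair.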
Incremental Updates. 100s of updates/sec and 10 updates/sec after routing flaps. Insertion, if the length of the prefix > w1: 1. Minimize the prefix and find the new cube. 2. Number of prefixes minimized < 256. 3. Update the page table and comparator if required. 4. Update the TCAM with the changed entries. 5. TCAM insertion time and minimization time are bounded.
Solution: Incremental Update. Deletion: delete the prefix from the TCAM. Update the page table entry and comparator if required. Total number of prefixes minimized < 256. TCAM update time is also bounded.
Solution: Incremental Update. Comparison of incremental update time. Columns: Router; Total Prefixes; Micro’02 Approach (Size, Time (sec)); Proposed Approach (Size, Time (sec)). Rows: attcanada; bbnplanet. [Table values missing in source.]
Solution: Memory Management. Managing page overflow. Reason: lower value of γ. Pages with the same cube are recomputed. Free pages available in the TCAM are used. Comparators are also updated when required.
Results [Figure: power consumption per lookup. Panels: bbnplanet router, attcanada router.]
Results: Case study. Memory requirements (γ=1 and α=1). Reduction in memory requirements. Columns: Router; Raw data (entries); After Compaction (entries); Effect of Architecture (entries). Rows: attcanada; bbnplanet. [Table values missing in source.]
Results: Access time. Pre-estimation using Cacti 3.0 on a CAM structure. Reduction in access time. Columns: Router; Raw data (ns); After Compaction (ns); Effect of Architecture (ns). Rows: attcanada; bbnplanet. [Table values missing in source.]
Results: Power. Pre-estimation using Cacti 3.0 on a CAM structure. Reduction in power:
Router      Raw data (W)   After Compaction (W)   Effect of Architecture (W)
attcanada   14.35          7.38                   0.135
bbnplanet   15.9           12.31                  0.12
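The size of the reduction can be derived from the reported figures. The snippet below uses only the six power values from the table; the dictionary layout and function names are illustrative.

```python
# Estimated power in watts, copied from the table above.
power_w = {
    "attcanada": {"raw": 14.35, "compacted": 7.38, "architecture": 0.135},
    "bbnplanet": {"raw": 15.9, "compacted": 12.31, "architecture": 0.12},
}

def reduction_factor(before, after):
    """How many times smaller `after` is than `before`."""
    return before / after

def percent_saved(before, after):
    """Relative saving, as a percentage of `before`."""
    return 100.0 * (before - after) / before

for router, p in power_w.items():
    factor = reduction_factor(p["raw"], p["architecture"])
    saved = percent_saved(p["raw"], p["architecture"])
    print(f"{router}: {factor:.1f}x lower, {saved:.2f}% saved")
```

For both routers the final architecture's estimate is more than 100 times lower than the raw-data estimate. Compaction alone accounts for roughly a factor of 1.9 on attcanada and 1.3 on bbnplanet.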
Conclusion: significant reduction in memory consumption based on prefix compaction; a pipelined architecture to store prefixes and achieve bounded power consumption; efficient memory management and incremental update techniques.
Future work: apply the Cacti model to a TCAM structure; identify/design a low-power TCAM cell; consider classification together with IP lookup; fast on-chip logic minimization; explore parallel architectures and algorithms for IP processing.
Thank You! Questions?