Authors: Kang Li, Francis Chang, Wu-chang Feng. Publisher: ICON 2003. Presenter: Yun-Yan Chang. Date: 2010/11/03
Outline
◦ Introduction
◦ Approach
◦ Cache Architecture
◦ Evaluation
Given limited silicon resources, what is the best way to implement a packet classification cache? Determine how to best use the limited resources in three aspects:
◦ Cache associativity
◦ Replacement policy
◦ Hash function
Cache performance is evaluated using trace-driven simulations.
◦ Packet Classification Cache Simulator (PCCS)
◦ Trace data sets: Bell Labs, New Zealand University (OC-3 link)
Fig 1. Flow volume in traces
The cache memory is an N-way set-associative cache, which splits the cache memory into N memory banks. Each memory bank is a direct-mapped cache addressable by the output of the hash function.
Fig 2. Cache Architecture (m: Flow-ID size, k: result size, N: set associativity)
For an N-way set-associative cache, each input FlowID selects N memory entries, one from each memory bank. Each entry contains an m-bit FlowID and a k-bit classification result. The classification result is at least 1 bit for a packet filter, but could be multiple bits.
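The lookup and insertion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the class and method names (`PacketCache`, `lookup`, `insert`) are hypothetical, and Python's built-in `hash` stands in for the paper's XOR-based hash function.

```python
class PacketCache:
    """Sketch of an N-way set-associative packet classification cache."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets  # entries addressable in each bank
        self.ways = ways          # N memory banks
        # banks[w][index] holds (flow_id, result) or None
        self.banks = [[None] * num_sets for _ in range(ways)]

    def _index(self, flow_id):
        # Stand-in hash; the paper uses an XOR-based function.
        return hash(flow_id) % self.num_sets

    def lookup(self, flow_id):
        """Return the cached classification result, or None on a miss."""
        idx = self._index(flow_id)
        for bank in self.banks:   # probe the N candidate entries
            entry = bank[idx]
            if entry is not None and entry[0] == flow_id:
                return entry[1]   # the k-bit classification result
        return None

    def insert(self, flow_id, result):
        idx = self._index(flow_id)
        for bank in self.banks:   # use a free way if one exists
            if bank[idx] is None:
                bank[idx] = (flow_id, result)
                return
        # Set full: a replacement policy chooses the victim way;
        # for simplicity this sketch always evicts way 0.
        self.banks[0][idx] = (flow_id, result)
```

All N candidate entries share the same index, so a hardware implementation can probe the banks in parallel and compare the stored FlowIDs in one step.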
Cache associativity
◦ Focus on the storage costs of the cache.
◦ Direct-mapped
◦ N-way set-associative
◦ Fully associative
Fig 3. Cache associativity
Cache replacement
◦ Determines which entry must be replaced in order to make room for a newly classified flow.
◦ LRU (Least-Recently-Used) replacement
◦ LFU (Least-Frequently-Used) replacement
◦ Probabilistic replacement
Algorithm (alpha = 0.9, h: recent hit ratio):
  update(state)
    if (state == 0)
      h = alpha * h;
    else
      h = alpha * h + (1 - alpha);
  Upon cache hit: update(1)
  Upon cache miss: update(0); replace the entry with probability h
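The probabilistic replacement policy above can be sketched in a few lines: track a recent hit ratio `h` as an exponentially weighted average, and on a miss replace the cached entry only with probability `h`. The class name `ProbabilisticPolicy` is illustrative, not from the paper.

```python
import random

ALPHA = 0.9  # smoothing factor from the slide


class ProbabilisticPolicy:
    """Sketch of the probabilistic cache replacement policy."""

    def __init__(self):
        self.h = 0.0  # estimate of the recent hit ratio

    def update(self, state):
        # state is 1 on a cache hit, 0 on a miss; this is the
        # exponentially weighted average from the slide's algorithm.
        self.h = ALPHA * self.h + (1 - ALPHA) * state

    def on_hit(self):
        self.update(1)

    def on_miss(self):
        """Update h, then decide whether to evict the existing entry."""
        self.update(0)
        return random.random() < self.h  # replace with probability h
```

The intuition is that when the cache is performing well (high `h`), its current contents are valuable and a miss is likely a one-off flow, so eviction becomes more selective only when `h` is low.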
◦ Performance of different cache replacement algorithms using a 4-way associative cache.
Fig. 5. Replacement policies using 4-way caches
Hash function
◦ A critical component of implementing a cache is the hash function used to index into it.
◦ With a traditional hash function such as SHA-1, generating the output takes more than 1000 logic operations (shift, AND, OR, and XOR) on 32-bit words.
◦ To reduce the hash function's size and latency, an XOR-based hash function is designed that operates on the packet header and consumes only 16 logic operations (XOR and shift).
Fig 6. XOR-based hash function (needs 16 logic operations) vs. SHA-1 hash function (needs more than 1000 logic operations)
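The XOR-and-shift idea can be sketched as below. This does not reproduce the exact operation order of the paper's Fig 6; it only illustrates folding the packet-header fields of a flow ID into a narrow cache index using nothing but XOR and shift. The function name `xor_hash` and the field layout are assumptions.

```python
def xor_hash(src_ip, dst_ip, src_port, dst_port, proto, bits=16):
    """Fold a 5-tuple flow ID into a `bits`-wide index using only XOR/shift.

    src_ip, dst_ip: 32-bit addresses; src_port, dst_port: 16-bit ports;
    proto: 8-bit protocol number.
    """
    h = src_ip ^ dst_ip                  # mix the two 32-bit addresses
    h ^= (src_port << 16) | dst_port     # fold in the 16-bit ports
    h ^= proto
    mask = (1 << bits) - 1
    h = (h >> bits) ^ (h & mask)         # fold 32 bits down to `bits`
    return h & mask
```

Because every step is an XOR or a shift, this kind of function maps to a shallow combinational circuit, which is what makes it attractive next to SHA-1's 1000+ operations.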
◦ Comparing performance on a 4-way associative LRU cache, the XOR-based hash function performs almost as well as the SHA-1 hash function.
Fig. 7. Hash performance using 4-way, LRU cache