Published by Nicholas Bruno Watts. Modified over 9 years ago.
1
Authors: Sriram Ramabhadran, George Varghese
Publisher: SIGMETRICS '03
Presenter: Yun-Yan Chang
Date: 2010/12/29
2
Introduction
Previous works
Scheme
◦ LR(T)
◦ Aggregated bitmap
Implementation
Conclusion
3
The paper removes the bottleneck of [1] by proposing a counter management algorithm (CMA) called LR(T) (Largest Recent with threshold T). LR(T) avoids sorting by keeping only a bitmap that tracks the counters that are larger than threshold T.
4
D. Shah, S. Iyer, B. Prabhakar, and N. McKeown
◦ Maintaining statistics counters in router line cards
Proposes a hybrid architecture in which DRAM stores the statistics counters while a small amount of SRAM enables counter updates at line rate.
Proposes a CMA called LCF (Largest Counter First), which picks the counter with the largest value to be updated to DRAM.
5
Architecture
◦ SRAM stores N counters of size m < M bits.
◦ DRAM stores N counters of size M bits.
The SRAM counters hold recent updates and are periodically transferred to the corresponding DRAM counters.
Figure 1. Statistics counter architecture
6
Largest Counter First (LCF)
◦ An algorithm that minimizes the required SRAM size.
  Selects the largest counter; if multiple counters share the largest value, picks one arbitrarily.
  Updates the value of the corresponding counter in DRAM and resets the counter in SRAM.
◦ Bottleneck:
  Sorting is needed to find the largest counter.
  Difficult to implement at high speed.
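The LCF step can be illustrated with a minimal sketch. The names `lcf_flush` and `run_lcf` are hypothetical, the flush is assumed to happen once after every b SRAM updates, and the linear scan with `max` stands in for the sorting step that the slide identifies as the bottleneck; this is not the hardware design from [1].

```python
def lcf_flush(sram, dram):
    """Move the largest SRAM counter into DRAM and reset it in SRAM."""
    # Linear scan for the largest counter; ties resolve to the first index,
    # which is one arbitrary choice among equal values.
    j = max(range(len(sram)), key=lambda i: sram[i])
    dram[j] += sram[j]
    sram[j] = 0
    return j


def run_lcf(updates, n, b):
    """Apply a sequence of counter increments, flushing once every b updates."""
    sram = [0] * n   # small counters holding recent updates
    dram = [0] * n   # full-size counters
    peak = 0         # largest SRAM value observed, i.e. the width SRAM must support
    for t, i in enumerate(updates, start=1):
        sram[i] += 1
        peak = max(peak, sram[i])
        if t % b == 0:
            lcf_flush(sram, dram)
    return sram, dram, peak
```

At any time the true count of counter i is `sram[i] + dram[i]`; `peak` shows how large an SRAM counter had to grow for the given update sequence.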
7
Algorithm description
◦ Let j* be the counter with the largest value among the counters incremented in the last cycle of b updates to SRAM.
◦ If the value c_j* of counter j* is at least T, LR(T) updates counter j* to DRAM.
◦ If c_j* < T, LR(T) updates any counter with value at least T to DRAM.
◦ If no such counter exists, LR(T) updates counter j* to DRAM.
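The four rules above can be sketched in software as follows. The class name `LRT` and its fields are hypothetical, and a Python `set` stands in for the bitmap of counters at or above the threshold; it is a sketch of the decision logic only, not of the hardware implementation.

```python
class LRT:
    """Software sketch of LR(T): one DRAM update after every b SRAM updates."""

    def __init__(self, n, b, threshold):
        self.sram = [0] * n
        self.dram = [0] * n
        self.b = b
        self.threshold = threshold
        self.above = set()    # counters with value >= T (stand-in for the bitmap)
        self.pending = 0      # SRAM updates seen in the current cycle
        self.j_star = None    # largest counter incremented in the current cycle

    def update(self, i):
        """Increment counter i; return the flushed counter index at the end of a cycle."""
        self.sram[i] += 1
        if self.sram[i] >= self.threshold:
            self.above.add(i)
        if self.j_star is None or self.sram[i] > self.sram[self.j_star]:
            self.j_star = i
        self.pending += 1
        if self.pending == self.b:
            return self._flush()
        return None

    def _flush(self):
        j = self.j_star
        # If j* is below T, prefer any counter that is at least T, if one exists.
        if self.sram[j] < self.threshold and self.above:
            j = next(iter(self.above))
        self.dram[j] += self.sram[j]
        self.sram[j] = 0
        self.above.discard(j)
        self.pending = 0
        self.j_star = None
        return j
```

Because counters only grow within a cycle, comparing values at increment time is enough to identify j* at the end of the cycle, so no sorting is needed.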
8
Proof:
◦ Threshold T = 0 allows a simple implementation, while T = b is optimal and minimizes the SRAM size requirement.
◦ LR(0)
  Remembers only the last b updates to SRAM when determining which counter to update to DRAM.
  Theorem 1 bounds the maximum value a counter can reach under LR(0), which implies a minimum SRAM counter size.
9
◦ LR(b)
  The threshold increases from 0 to b.
  b: the time between DRAM accesses.
  Theorem 2 concerns the maximum value a counter can reach under LR(0) and implies that any counter is at most (b − 1)(N − 1).
  The value of a counter cannot be larger than (b − 1) + log_d (N − 1), where
10
To minimize the required storage
◦ Consider a fixed universe U of N elements labelled 1, 2, …, N.
◦ Use a bitmap b_1 b_2 ... b_N to record which elements are contained in set S: b_i is set to 1 if element i ∈ S, and to 0 otherwise.
Implement functions:
◦ add(i): adds element i to set S
◦ delete(i): deletes element i from set S
◦ test(i): tests whether element i belongs to set S
◦ find(): returns any element i that belongs to set S
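The four set operations can be sketched on a flat bitmap as follows. The class name `BitmapSet` is hypothetical, and a single Python integer stands in for the N-bit bitmap; elements are labelled 1 to N as on the slide.

```python
class BitmapSet:
    """Set over the universe {1, ..., n}, stored as one bit per element."""

    def __init__(self, n):
        self.n = n
        self.bits = 0  # bit (i - 1) is 1 exactly when element i is in the set

    def _check(self, i):
        if not 1 <= i <= self.n:
            raise ValueError("element out of range")

    def add(self, i):
        self._check(i)
        self.bits |= 1 << (i - 1)

    def delete(self, i):
        self._check(i)
        self.bits &= ~(1 << (i - 1))

    def test(self, i):
        self._check(i)
        return (self.bits >> (i - 1)) & 1 == 1

    def find(self):
        """Return some element of the set (here the smallest), or None if empty."""
        if self.bits == 0:
            return None
        # bits & -bits isolates the lowest set bit; its bit_length is the label.
        return (self.bits & -self.bits).bit_length()
```

On a flat bitmap held in memory words, find() would have to scan word by word, which is the cost the aggregated structure is meant to avoid.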
11
Figure 2: Aggregated bitmap for N = 128 elements and W = 16 word size.
12
Each group of W bits in the bitmap is aggregated to form a single node.
◦ N: the number of bits in the aggregated bitmap
◦ W: the word size (N and W must be powers of 2)
Total: nodes
Total memory: W
13
Each internal node in the tree contains two fields called lcount and rcount.
◦ lcount is the number of 1s present in its left child
◦ rcount is the number of 1s present in its right child
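A minimal sketch of such a tree is given below. It assumes a binary tree whose leaves are the N/W words of the bitmap and whose internal nodes store lcount and rcount for their left and right subtrees; the class name `AggregatedBitmap`, the heap-style node numbering, and the 0-based element labels are choices made for this illustration rather than details taken from the slides.

```python
class AggregatedBitmap:
    """Bitmap of n bits in words of w bits, with a binary count tree on top."""

    def __init__(self, n=128, w=16):
        assert n % w == 0
        self.w = w
        self.leaves = n // w              # assumed to be a power of two
        self.words = [0] * self.leaves    # leaf nodes: one w-bit word each
        self.lcount = [0] * self.leaves   # internal nodes use indices 1..leaves-1
        self.rcount = [0] * self.leaves

    def test(self, i):
        return (self.words[i // self.w] >> (i % self.w)) & 1 == 1

    def _walk(self, i, delta):
        """Go top-down to the word holding bit i, adjusting counts by delta."""
        lo, hi, node = 0, self.leaves, 1
        target = i // self.w
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if target < mid:
                self.lcount[node] += delta
                node, hi = 2 * node, mid
            else:
                self.rcount[node] += delta
                node, lo = 2 * node + 1, mid
        return lo

    def add(self, i):
        if not self.test(i):              # keep counts exact on repeated adds
            word = self._walk(i, +1)
            self.words[word] |= 1 << (i % self.w)

    def delete(self, i):
        if self.test(i):
            word = self._walk(i, -1)
            self.words[word] &= ~(1 << (i % self.w))

    def find(self):
        """Follow non-zero counts from the root to a set bit; None if empty."""
        lo, hi, node = 0, self.leaves, 1
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.lcount[node] > 0:
                node, hi = 2 * node, mid
            elif self.rcount[node] > 0:
                node, lo = 2 * node + 1, mid
            else:
                return None
        word = self.words[lo]
        if word == 0:
            return None
        return lo * self.w + (word & -word).bit_length() - 1
```

With the counts in place, find() visits one node per level instead of scanning every word, and add() and delete() touch the same single root-to-leaf path.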
14
Pipelined implementation
◦ Each operation proceeds top-down, starting at the root and moving from one level to the next.
◦ At each level of the tree, there is potentially a memory read followed by a memory write.
◦ Storing each level of the tree in a different memory bank permits simultaneous access to all levels of the tree.
15
To implement LR(T), it is necessary to keep track of two things:
◦ The largest value among all counters updated in the last cycle of b updates, along with the corresponding counter j*.
◦ All counters above the threshold T.
Memory accesses for counter operations and bitmap operations proceed in parallel.
16
Every cycle of b updates involves b SRAM update operations and one DRAM update operation.
◦ SRAM update operation
  Two accesses to update the SRAM counter
  Two accesses for add
◦ DRAM update operation
  Two accesses to read and reset the SRAM counter
  Four accesses for delete and find
  Two DRAM accesses to update the DRAM counter
Figure 3: Timing diagram for SRAM and DRAM updates for two successive cycles of b counter updates.
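The per-cycle access budget follows directly from the counts listed above. The helper below is a hypothetical illustration that simply totals them for a given b, keeping the counter memory, the bitmap, and DRAM separate because counter and bitmap accesses proceed in parallel.

```python
def accesses_per_cycle(b):
    """Total memory accesses in one cycle of b SRAM updates plus one DRAM update."""
    # b counter updates at two accesses each, plus read and reset at the DRAM update.
    sram_counter = 2 * b + 2
    # b add operations at two accesses each, plus four accesses for delete and find.
    bitmap = 2 * b + 4
    # One DRAM counter update at two accesses.
    dram = 2
    return sram_counter, bitmap, dram
```

For example, `accesses_per_cycle(4)` totals 10 counter accesses, 12 bitmap accesses, and 2 DRAM accesses per cycle.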
17
Reference system: a million 64-bit counters and a line rate of 10 Gbps, with 10 counter updates per packet.
Table 1: Cost-benefit comparison for different schemes.