Approximating Hit Rate Curves using Streaming Algorithms Nick Harvey Joint work with Zachary Drudi, Stephen Ingram, Jake Wires, Andy Warfield TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA A
Modern Caching Registers, L1, L2, L3 RAM Disk SSD Cloud Storage Proxy CDN Associative map LRU etc. LRU Consistent Hashing... from 1968 CPUs are >1000x faster Disk latency is <10x better Cache misses are increasingly costly
Server Virtualization Modern servers are heavily virtualized How should we allocate the physical cache among virtual servers to improve overall performance? What is “marginal benefit” to giving server more cache?
Hit Rate Curve Hit rate MSR Cambridge “TS” Trace, LRU Policy Fix a particular workload and caching policy If cache size is x, what would hit rate be? HRCs are useful for choosing an appropriate cache size Cache Size (GB) “Elbow” “Knee” “Working Set” Not much marginal benefit of a bigger cache
Hit Rate Curve Real-world HRCs need not be concave or smooth “Marginal benefit” and “Working set” are undefined Cache Size (GB) Hit rate MSR Cambridge “Web” Trace, LRU Policy “Elbow”? “Knee”? “Working Set”?
LRU Caching (1968) Policy: An LRU cache of size x always contains the x most recently requested distinct symbols. A B C A D A B … If cache size >3 then B will still be in the cache during the second request for B. – Second request for B is a hit for cache size x if x>3. Monotonicity: Larger caches include contents of smaller caches. 3 distinct symbols “Reuse Distance”
Mattson’s Algorithm (1970) for computing Hit Rate Curve Keep list of all blocks, sorted by most recent request time. Reuse distance of a request is its position in that list. If distance is d, this request is a hit for all cache sizes ¸ d. Hit rate curve is CDF of reuse distances. Easy exercise: implement in O(m log n) time A B C A D A B … List: AB AC B AA C BD A C BA D C BB A D C Requests: Space is (n) n = # blocks m = length of trace
We ran an optimized C implementation of Mattson on the MSR-Cambridge traces of 13 live servers over 1 week Trace file is 20GB in size, 2.3B requests, 750M blocks (3TB) Processing time: 1 hour RAM usage: 92GB Lesson: Cannot afford linear space to process storage workloads Question:Can we estimate HRCs in sublinear space? Is linear space OK?
Quadratic Space A B C A D A B Requests: Set of all subsequent items: A BB CCC AAA DDDDD AA BBBBB Items seen since first request Items seen since second request Reuse distance is size of oldest set that grows. Hit rate curve is CDF of reuse distances. Reuse Distance = 2 Reuse Distance = 3 Reuse Distance = 1
Quadratic Space A B C A D A B Requests: For t=1,…,m Receive request b t; Find minimum j such that b t is not in j th set Let v j be cardinality of j th set Record a hit at reuse distance v j Insert b t into all previous sets Set of all subsequent items: A BB CCC AAA DDDDD AA v j = 3 j=3
More Abstract Version For t=1,…,m Let v j be cardinality of j th set Receive request b t Let ± j be change in j th set’s cardinality when adding b t For j=2,…,t Record ( ± j - ± j-1 ) hits at reuse distance v j A B C A D A B Requests: Set of all subsequent items: A BB CCC AAA DDDDD AA ±j:±j: ± j - ± j-1 : v j = 3 How should we represent these sets?Hash table? ; Insert b t into all previous sets
Insert Delete Member? Cardinality? Space (in bits) Random Set Data Structures Bloom FilterF 0 Estimator Yes No Yes* No (n) Yes No Yes* O(log n) Operations “Distinct Element Estimator” * allowing some error
Subquadratic Space A B C A D A B Requests: Set of all subsequent items: Items seen since first request Items seen since second request Reuse distance is size of oldest set that grows (cardinality query) Hit rate curve is CDF of reuse distances. F0 Estimator Insert … For t=1,…,m Let v j be value of j th F 0 -estimator Receive request b t Let ± j be change in j th F 0 -estimator when adding b t For j=2,…,t Record ( ± j - ± j-1 ) hits at reuse distance v j
Towards Sublinear Space A B C A Requests: Set of all subsequent items: Note that an earlier F 0 -estimator is a superset of later one Can this be leveraged to achieve sublinear space? F0 Estimator … ¶¶¶
F 0 Estimation [Flajolet-Martin ‘83, Alon-Mattias-Szegedy ‘99, …, Kane-Nelson-Woodruff ‘10] Operations: Insert(x) Cardinality(), with (1+ ² ) multiplicative error Space: log(n)/ ² 2 bits £ ( ² -2 +log n) is optimal log n rows ² -2 columns
F 0 Estimation A B C A D A B … Hash function h (uniform) Hash function g (geometric) Operations: Insert(x), Cardinality() ² -2 columns 1 1 log n rows
F 0 Estimation 11 1 A B C A D A B … Hash function h (uniform) Hash function g (geometric) Operations: Insert(x), Cardinality() ² -2 columns log n rows
F 0 Estimation A B C A D A B … Hash function h (uniform) Hash function g (geometric) Operations: Insert(x), Cardinality() ² -2 columns log n rows
F 0 Estimation A B C A D A B … Hash function h (uniform) Hash function g (geometric) Operations: Insert(x), Cardinality() ² -2 columns log n rows
F 0 Estimation A B C A D A B … Hash function h (uniform) Hash function g (geometric) Operations: Insert(x), Cardinality() ² -2 columns log n rows
F 0 Estimation Suppose we insert n distinct elements # of 1 s in a column is max of ¼ n ² 2 geometric RVs, so ¼ log(n ² 2 ) Averaging over all columns gives a concentrated estimate for log(n ² 2 ) Exponentiating and scaling gives concentrated estimate for n Operations: Insert(x), Cardinality() ² -2 columns log n rows
F 0 Estimation for a chain cf. Sliding Window Estimation word ² -2 columns Operations: Insert(x) Cardinality(t), estimate # distinct elements since t th insert Space: log(n)/ ² 2 words log n rows
F 0 Estimation for a chain 1 1 A B C A D A B … Hash function h (uniform) Hash function g (geometric) ² -2 columns Operations: Insert(x), Cardinality(t) Space: log(n)/ ² 2 words log n rows
21 1 A B C A D A B … Hash function h (uniform) Hash function g (geometric) ² -2 columns F 0 Estimation for a chain Operations: Insert(x), Cardinality(t) log n rows
213 1 A B C A D A B … Hash function h (uniform) Hash function g (geometric) ² -2 columns F 0 Estimation for a chain Operations: Insert(x), Cardinality(t) log n rows
243 4 A B C A D A B … Hash function h (uniform) Hash function g (geometric) ² -2 columns F 0 Estimation for a chain Operations: Insert(x), Cardinality(t) log n rows
A B C A D A B … Hash function h (uniform) Hash function g (geometric) ² -2 columns F 0 Estimation for a chain Operations: Insert(x), Cardinality(t) log n rows
² -2 columns F 0 Estimation for a chain The {0,1}-matrix consisting of all entries ¸ t is the same as the matrix for an F 0 estimator that started at time t. So, for any t, we can estimate # distinct elements since time t. Operations: Insert(x), Cardinality(t) log n rows
Theorem 1: Let C : [n] ! [0,1] be true HRC. Let Ĉ : [n] ! [0,1] be estimated HRC. Using O(B 2 ¢ ² -2 ¢ log(n) ¢ log 2 (m)) bits of space, can get C((j-1) ¢ W) - ² · Ĉ (j ¢ W) · C(j ¢ W)+ ² 8 j =1,…,B Vertical error Horizontal error n = # distinct blocks m = # requests B = # “bins” W = n/B = width of each “bin” Theorem 2: Suppose an algorithm outputs Ĉ satisfying C((j-1) ¢ W) - ² · Ĉ (j ¢ W) · C(j ¢ W)+ ² 8 j =1,…,B Then it must use (B 2 + ² -2 + log(n)) bits of space.
Conclusions “Working set” has no definition. Need to understand the entire “hit rate curve”. Can estimate HRCs in sublinear space, quickly and accurately. Our algorithm has been implemented in the Coho Data product. It is running live at dozens of customer sites. To diagnose cache performance issues, it streams the F 0 -Estimators back to the Coho Data monitoring tools.