Published by Ashlyn Paul. Modified over 9 years ago.
1
Reuse-based Online Models for Caches
Rathijit Sen, David A. Wood
6/20/2013, ACM SIGMETRICS 2013 @ CMU, Pittsburgh, PA
2
The Problem
Caches: power vs. performance
Reconfigurable caches, e.g., IvyBridge
The Problem: Which configuration to select, e.g., to get the best energy efficiency?
[Figure: Core, LLC, DRAM; labels: Miss, Fetch]
3
Cache Performance Prediction
We propose a framework: h = (r · B) · φ
h: hit ratio
r: reuse-distance distribution (novel hardware support)
B: stochastic Binomial matrix
φ: hit function (LRU, PLRU, RANDOM, NMRU)
Case study: Energy-Delay Product (EDP) within 7% of minimum
4
Agenda
The Problem
Framework: h = (r · B) · φ
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)
Hardware support
Case Study
5
Cache Overview
Limited storage
Sets of (usually 64-byte) blocks
#blocks/set = associativity (#ways)
Set index + address tags identify data
[Figure: cache array of blocks with S sets and associativity A; the address tag is compared with stored tags: match (Y) is a hit, no match (N) is a miss]
6
Last-Level Cache (LLC) Workload Variation
[Figure: LLC behavior per workload. Labeled workloads: swim; ammp, blackscholes, bodytrack, fluidanimate, freqmine, swaptions; equake, gafort, wupwise; apache; mgrid; zeus; oltp; jbb; fma3d]
7
Bad configurations hurt!
[Figure: EDP (energy-delay product), minimum vs. maximum; annotations: 27% worse, 218% worse]
8
Problem Summary
Reconfigurable caches
Multiple replacement policies
Goal: Online miss-ratio prediction
[Figure: cache array of blocks with S sets and associativity A]
9
Indexing Assumption
Mapping of unique addresses to cache sets
Assumption: independent, uniform [Smith, 1978]
Unique accesses as Bernoulli trials
(Partial) hashing: POWER4, POWER5, POWER6, Xeon
Simple XOR-based function [similar to Cypher, 2008]
11
Temporal Locality Metrics
Unique Reuse Distance (URD): #unique intervening addresses
  x y z z y x : URD(x) = 2
  URD = Stack Distance [Mattson, 1970] − 1
  Large cache → large distances to track
Absolute Reuse Distance (ARD): #intervening addresses
  x y z z y x : ARD(x) = 4
[Figure: histogram r of P(URD=i) vs. i; open question: size of r?]
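The two metrics can be checked on the slide's own trace. The following is a minimal sketch, not the authors' implementation; the function name `reuse_distances` is made up for illustration, and the quadratic slicing is chosen for clarity rather than speed.

```python
def reuse_distances(trace):
    """Return (urd, ard), one entry per access; None marks a first access."""
    last_seen = {}  # address -> index of its most recent access
    urd, ard = [], []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1:i]
            ard.append(len(between))        # all intervening accesses
            urd.append(len(set(between)))   # distinct intervening addresses
        else:
            ard.append(None)
            urd.append(None)
        last_seen[addr] = i
    return urd, ard


trace = ["x", "y", "z", "z", "y", "x"]
urd, ard = reuse_distances(trace)
print(urd[-1], ard[-1])  # prints: 2 4 (URD and ARD of the final access to x)
```

The final access to x sees the accesses y z z y in between: four accesses (ARD) to two distinct addresses (URD), as on the slide.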
12
Per-set Locality, r(S)
r(S) is “compressed” as S (#sets) increases
Less of the tail is important
[Figure: histogram r of P(URD=i) vs. i, shown for #sets S and for #sets S′ > S]
14
Estimating per-set locality
Generalized stochastic Binomial matrices [Strum, 1977]
r(S) = r(1) · B(1 − 1/S, 1/S)
Composition: r(S′) = r(S) · B(1 − S/S′, S/S′)
B entry (i, k): P(k successes in i trials), i.e., P(k of i to the same set)
[Figure: row vector r (entries P(URD=i), indexed by i) multiplied by matrix B (rows i, columns k); entries with k > i are 0]
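As an illustration of the transformation and its composition property, here is a small sketch under stated assumptions: r is truncated to a short made-up vector, B is built with plain lists, and the helper names `binomial_matrix` and `vec_mat` are hypothetical. It checks numerically that going from r(1) to S = 4 directly matches going through S = 2 first.

```python
from math import comb


def binomial_matrix(n, p):
    """n x n matrix with entry (i, k) = P(k successes in i trials of prob. p)."""
    return [[comb(i, k) * p**k * (1 - p)**(i - k) if k <= i else 0.0
             for k in range(n)]
            for i in range(n)]


def vec_mat(r, B):
    """Row vector times matrix: result[k] = sum_i r[i] * B[i][k]."""
    n = len(r)
    return [sum(r[i] * B[i][k] for i in range(n)) for k in range(n)]


# Made-up whole-trace distribution r(1): P(URD = i) for i = 0..3.
r1 = [0.1, 0.2, 0.3, 0.4]

# Direct: r(4) = r(1) . B(1 - 1/4, 1/4)
r4_direct = vec_mat(r1, binomial_matrix(4, 1 / 4))

# Composed: r(2) = r(1) . B(1/2, 1/2), then r(4) = r(2) . B(1 - 2/4, 2/4)
r2 = vec_mat(r1, binomial_matrix(4, 1 / 2))
r4_composed = vec_mat(r2, binomial_matrix(4, 2 / 4))
```

Both routes give the same vector up to rounding, and probability mass moves toward small distances as S grows, which is the "compression" of r(S) described earlier.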
15
Computation reuse & speedup
“Shorter” tail → smaller matrices
Poisson Approximation
[Figure: r(1) and the per-set distributions r(2^10), r(2^11), r(2^12), r(2^13), r(2^14); labels: “Now: compute”, “Later: hardware support”, “Size?”; histogram r of P(URD=i) vs. i]
16
Size of r(2^10)?
Prediction with r(2^10) limited to URD < n
[Figure: histogram r of P(URD=i) vs. i]
18
Hit Function, φ
φ_k: P(x will hit | URD(x) = k)
Monotonically decreasing model
Intuition: larger URD → same or larger eviction probability
φ_0 = 1, φ_k ≤ φ_(k−1), φ_∞ = 0
[Figure: access to x, then accesses other than x, then x again]
19
Hit Function, φ
Example: A=8
20
Formulating φ
φ(LRU): step function; (r · B) · φ(LRU) [Smith, 1978], [Hill & Smith, 1989]
φ(PLRU): assumes that, on average, traffic is evenly divided between subtrees
φ(RANDOM): estimates #intervening misses using ARD
φ(NMRU): similar to φ(RANDOM) except φ_1 = 1
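The simplest of these, φ(LRU), can be written down directly. The sketch below assumes the standard LRU stack property (an access hits when fewer than A unique addresses to the same set intervene), which gives a step at k = A. The names `phi_lru` and `is_valid_hit_function` are illustrative, and entries beyond the truncated length are treated as 0.

```python
def phi_lru(assoc, n):
    """Step function: phi_k = 1 for k < assoc, 0 otherwise (k = 0..n-1)."""
    return [1.0 if k < assoc else 0.0 for k in range(n)]


def is_valid_hit_function(phi):
    """Check the model constraints: phi_0 = 1 and phi_k <= phi_(k-1)."""
    non_increasing = all(b <= a for a, b in zip(phi, phi[1:]))
    in_range = all(0.0 <= p <= 1.0 for p in phi)
    return phi[0] == 1.0 and non_increasing and in_range


phi = phi_lru(assoc=8, n=12)   # A = 8, truncated to URD < 12
print(is_valid_hit_function(phi))
```

The same checker can be applied to any candidate φ for the other policies, since they share the monotonic model.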
22
Prediction Accuracy
LRU, PLRU (A=2), NMRU (A=2): exact per-set model
Others: approximate per-set model
23
Overheads
r′ = r · B: 6–80 μsec
  Binomial → Poisson approximation for each row of B
h = (r · B) · φ: 20–30 μsec
  Average over 24 configurations
  B applied 8 times
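How the Poisson approximation is applied in the evaluated implementation is not spelled out on the slide. A plausible sketch, assuming the textbook substitution of Binomial(i, p) by Poisson(λ = i·p) for row i of B, is shown below; the helper names and the chosen values of i, p, and n are illustrative only.

```python
from math import comb, exp


def binomial_row(i, p, n):
    """Exact row i of B: P(k successes in i trials) for k = 0..n-1."""
    return [comb(i, k) * p**k * (1 - p)**(i - k) if k <= i else 0.0
            for k in range(n)]


def poisson_row(i, p, n):
    """Poisson(lambda = i * p) approximation of the same row."""
    lam = i * p
    row, term = [], exp(-lam)
    for k in range(n):
        row.append(term)
        term *= lam / (k + 1)   # next term: lam^(k+1) / (k+1)! * exp(-lam)
    return row


i, p, n = 2000, 1 / 1024, 16    # URD 2000 in the whole trace, S = 2^10 sets
exact = binomial_row(i, p, n)
approx = poisson_row(i, p, n)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
```

The Poisson row needs only one exponential and a running product, avoiding large binomial coefficients. For small p = 1/S the error is small: Le Cam's inequality bounds the summed absolute difference by 2·i·p².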
25
Computation reuse & speedup
“Shorter” tail → smaller matrices
Poisson Approximation
[Figure: r(1) and the per-set distributions r(2^10), r(2^11), r(2^12), r(2^13), r(2^14); labels: “Now: compute”, “Later: hardware support” (marked “Now”), “Size=512”; histogram r of P(URD=i) vs. i]
26
Insights
x y z z y x : URD(x) = 2
Unique: “remember” addresses
  Only cardinality, not full addresses
  Bloom filter for compact (approximate) representation
r(2^10) is seen by any set of a cache with S = 2^10
  Filter address stream
[Figure: histogram r of P(URD=i) vs. i]
27
Hardware Support for estimating r(2^10)
[Block diagram: reference address register, set filter, control logic, 1024-bit Bloom filter (2 hash fns), 9-bit counter, 512-entry histogram array; signals: access, filtered access, insert, load, hit, inc, reset, read]
[Flowchart: Start Sample → Addr match? If Y: End Sample. If N: Unique? If Y (not a Bloom filter hit): Remember]
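The sampling flow can be mimicked in software. This is a behavioral sketch only, assuming the address stream has already passed the set filter; the class name `UrdSampler`, the two hash functions, and the policy of starting a new sample on the next access are hypothetical choices, not details taken from the slides.

```python
class UrdSampler:
    """Software model: Bloom filter + saturating counter + histogram."""

    def __init__(self, bloom_bits=1024, hist_entries=512):
        self.bloom_bits = bloom_bits
        self.hist = [0] * hist_entries   # hist[k] = samples with URD = k
        self.reference = None            # reference address register
        self.bloom = 0                   # Bloom filter bits held in an int
        self.count = 0                   # unique addresses remembered so far

    def _hashes(self, addr):
        # Two illustrative hash functions (assumed, not from the source).
        h1 = (addr * 2654435761) % self.bloom_bits
        h2 = ((addr ^ (addr >> 7)) * 40503 + 1) % self.bloom_bits
        return h1, h2

    def access(self, addr):
        if self.reference is None:       # start a sample
            self.reference = addr
            self.bloom = 0
            self.count = 0
            return
        if addr == self.reference:       # address match: end the sample
            self.hist[self.count] += 1
            self.reference = None
            return
        mask = 0
        for h in self._hashes(addr):
            mask |= 1 << h
        if self.bloom & mask != mask:    # not a Bloom hit: unique, remember
            self.bloom |= mask
            if self.count < len(self.hist) - 1:
                self.count += 1          # saturates at the last histogram bin


sampler = UrdSampler()
for a in [100, 200, 300, 300, 200, 100]:   # same shape as x y z z y x
    sampler.access(a)
```

Only the number of unique intervening addresses is kept, so the Bloom filter can stay small; a false positive in the filter would undercount the URD, which is the approximation the slide mentions.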
28
Agenda
The Problem
Framework: h = (r · B) · φ
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)
Hardware support
Case Study + way counters
29
LRU Way Counters [Suh, et al. 2002]
One counter per logical way (stack position)
Determining logical position is hard
  Not totally (re-)ordered with every access
  Heuristics, e.g., for PLRU [Kedzierski, et al. 2010]
Other limitations
  Inclusion property
  Fixed #sets
S′ = S: special case of the reuse framework
S′ ≠ S? Use B, provided enough of the tail of r(S) is available
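For true LRU, where the logical position is known, way counters are easy to model. The following single-set sketch is an illustration of the general idea, with hypothetical function names; it is not the hardware design from the cited work.

```python
def lru_way_counters(trace, assoc):
    """Simulate one LRU set; count hits per stack position (0 = MRU)."""
    stack = []                    # most recently used address first
    counters = [0] * assoc
    misses = 0
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            counters[pos] += 1    # hit at logical way `pos`
            stack.pop(pos)
        else:
            misses += 1
            if len(stack) == assoc:
                stack.pop()       # evict the LRU entry
        stack.insert(0, addr)
    return counters, misses


def hits_with_ways(counters, ways):
    """Hits an LRU cache with fewer ways (same #sets) would have seen."""
    return sum(counters[:ways])


counters, misses = lru_way_counters(["x", "y", "z", "z", "y", "x"], assoc=4)
print(counters, misses, hits_with_ways(counters, 2))  # prints: [1, 1, 1, 0] 3 2
```

Because LRU satisfies the inclusion property, the counters for positions below a smaller associativity give that configuration's hits, but only for the same number of sets. For this single set, the hit position equals the per-set URD of the access.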
30
Min. EDP configuration
EDP within 7% of minimum
Reuse models outperform PLRU way counters in most cases
31
Summary
The Problem: Online miss-rate estimation for reconfigurable caches
We propose a framework: h = (r · B) · φ
h: hit ratio
r: reuse-distance distribution (novel hardware support)
B: stochastic Binomial matrix
φ: hit function (LRU, PLRU, RANDOM, NMRU)
Case study: EDP within 7% of minimum
Future work: More policies, applications/case studies
32
Also in the paper
r: lossy summarization of the address trace
Estimation for ARD
Optimizations for LRU
Conditions for PLRU eviction
More details on models & evaluation
33
Reuse-based Online Models for Caches
Questions?
34
Example LLC performance
OLTP (TPC-C + IBM DB2)
35
Estimating cache performance
Hit ratio = hits/access = ∑_i P(URD=i) · P(hit | URD=i) = r · φ
Miss ratio = misses/access = 1 − hit ratio
Miss rate = misses/instruction = miss ratio × accesses/instruction
[Figure: vector r with entries P(URD=i); vector φ with entries φ_i = P(hit | URD=i)]
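These three definitions chain together in a few lines. The sketch below uses made-up numbers for r, φ, and the access rate; the function name `cache_metrics` is illustrative.

```python
def cache_metrics(r, phi, accesses_per_instruction):
    """Return (hit ratio, miss ratio, miss rate per instruction)."""
    hit_ratio = sum(p * f for p, f in zip(r, phi))    # r . phi
    miss_ratio = 1.0 - hit_ratio
    miss_rate = miss_ratio * accesses_per_instruction
    return hit_ratio, miss_ratio, miss_rate


# Made-up per-set distribution: P(URD = i) for i = 0..3.
# The remaining 0.1 of accesses (cold or very distant reuse) never hit.
r = [0.5, 0.2, 0.1, 0.1]
phi = [1.0, 1.0, 0.0, 0.0]   # e.g., a step function for A = 2
hit, miss, rate = cache_metrics(r, phi, accesses_per_instruction=0.02)
```

Accesses whose reuse distance falls outside the tracked range contribute nothing to the sum, so they are counted as misses automatically.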
36
URD vs ARD
[Figure: accesses between two references to x with URD k: first-time addresses z_0, z_1, z_2, z_3, …, z_(k−1), interleaved with repeats {z_0}*, {z_0, z_1}*, {z_0, z_1, z_2}*, …, {z_0, z_1, z_2, ..., z_(k−1)}*]
Approximation: d_k = d_(k−1) + 1 / ∑_(i=k..∞) r_i