
1 Cache Design
- Cache parameters (organization and placement)
- Cache replacement policy
- Cache performance evaluation methods

2 Cache Parameters
- Cache size: Scache (lines)
- Set number: N (sets)
- Lines per set: K (lines/set)
- Scache = K * N (lines) = K * N * L (bytes), where L is the line size in bytes
- K-way set-associative

3 Trade-offs in Set-Associativity
Fully associative: higher hit ratio and concurrent search, but slow access when the associativity is large.
Direct-mapped: fast access (on a hit), trivial comparison, and a trivial replacement algorithm. However, if two blocks that map into the same cache block frame are used alternately, "thrashing" may occur.

4 Note
Main memory size: Smain (blocks). Cache memory size: Scache (blocks).
Let P = Smain / Scache. Since P >> 1, many memory blocks compete for each cache frame: you need to search, and the average search length is much greater than 1.
Set-associativity provides a trade-off between concurrency in search and average search/access time per block.

5
[Figure: the set-number spectrum, 1 <= N <= Scache: N = 1 is fully associative, intermediate N is K-way set-associative, N = Scache is direct-mapped.]

6 Important Factors in Cache Design
- Address partitioning strategy (three dimensions of freedom)
- Total cache size / memory size
- Workload

7 Address Partitioning
An M-bit address (byte addressing mode) is split into three fields: a tag of M - log2(N) - log2(L) bits (the directory size per entry), log2(N) bits of set number, and log2(L) bits of byte address within a line.
Cache memory size (data part) = N * K * L (bytes).
The set-number field should reduce clustering (randomize accesses across sets).

8
[Figure: the general curve describing cache behavior: miss ratio (y-axis) versus cache size (x-axis). Note: there exists a knee in the curve.]

9
"…the data are sketchy and highly dependent on the method of gathering…" The designer must make critical choices using a combination of "hunches, skills, and experience" as a supplement (a hunch: "a strong intuitive feeling concerning a future event or result").

10 Basic Principle
Study typical workloads, plus intelligent estimates of the others.
Good engineering: a small degree of over-design.
The "30% rule" (A. Smith): each doubling of the cache size reduces misses by about 30%. It is a rough estimate only.
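A minimal sketch of what the 30% rule predicts (the function and numbers are illustrative only):

def misses_after_doubling(base_misses: float, doublings: int) -> float:
    # Each doubling keeps about 70% of the misses, per the 30% rule.
    return base_misses * 0.7 ** doublings

# e.g. going from 16 KB to 128 KB is three doublings:
print(misses_after_doubling(1000, 3))  # ~343 misses remain (0.7^3 = 0.343)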

11 Cache Design Process
"Typical", not "standard". Sensitive to:
- Price-performance
- Technology: main memory access time, cache access time, chip density, bus speed, on-chip cache

12 Cache Design Process
1. Choose the cache size: fix K and L, vary N. Start with a small K (likely K = 1).
2. Choose the line size L: fix the cache size and K, vary L.
3. Choose the associativity K: fix the cache size and L, vary K.
If the resulting K equals the old K, stop; otherwise repeat the steps using the new K. A sketch of this loop appears below.
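Here is one way the loop could look in Python, assuming a hypothetical miss_ratio(N, K, L) obtained from trace-driven simulation (all names are illustrative, and a real design would also weigh cost, not just miss ratio):

def design(miss_ratio, sizes, line_sizes, assoc_choices):
    K, L = 1, 32                    # start with a small K (likely K = 1) and a guess for L
    while True:
        # Step 1: choose the cache size (vary N; K and L fixed).
        # Minimizing miss ratio alone is a simplification; cost matters too.
        size = min(sizes, key=lambda s: miss_ratio(s // (K * L), K, L))
        # Step 2: choose the line size (vary L; size and K fixed)
        L = min(line_sizes, key=lambda l: miss_ratio(size // (K * l), K, l))
        # Step 3: choose the associativity (vary K; size and L fixed)
        new_K = min(assoc_choices, key=lambda k: miss_ratio(size // (k * L), k, L))
        if new_K == K:              # K unchanged: the process has converged
            return size, new_K, L
        K = new_K                   # otherwise repeat the steps with the new K

# toy stand-in for a trace-driven measurement, for demonstration only
toy = lambda N, K, L: 1.0 / (N * L * min(K, 4))
print(design(toy, sizes=[16384, 32768],
             line_sizes=[16, 32, 64], assoc_choices=[1, 2, 4, 8]))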

13 Step 1: Choose Size (fix K and L, vary N)
[Figure: relative number of misses (y-axis) versus cache size N (x-axis).]

14 Step 2: Choose L (fix cache size N*K*L and K, vary L)
[Figure: relative number of misses (y-axis) versus cache line size L (x-axis).]
When the cache size and K are fixed, N * L is constant; choosing a small L increases N and hence increases the directory size.

15 Step 3: Choose K (fix cache size N*K*L and L, vary K)
[Figure: relative number of misses (y-axis) versus cache associativity factor K (x-axis).]
When the cache size and L are fixed, N * K is constant; a larger K means fewer sets N.

16 N: Set Number
Cache directory entries = N * K. Cache size = N * K * L.
Constraint in the selection of N: the page size. (If the set-index and line-offset bits are to fit within the page offset, so that indexing can overlap address translation, N * L must not exceed the page size.)

17 K: Associativity
Bigger K: better miss ratio; K = 4 ~ 8 gets close to the best miss ratio.
Smaller K is better in being faster, cheaper, and simpler.

18 L: Line Size
The atomic unit of transmission. Trade-offs:
- Larger L: smaller miss ratio, larger average delay per transfer, less traffic overhead.
- Smaller L: larger average hardware cost for associative search (more lines, hence a bigger directory), larger possibility of "line crossers".
Workload dependent; typically 16 ~ 128 bytes.

19 Cache Replacement Policy
- FIFO (first-in, first-out)
- LRU (least-recently used)
- OPT (furthest future use): do not retain the line whose next occurrence is furthest in the future.
Note: LRU performance is close to OPT for frequently encountered program structures.
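To make the two policies concrete, here is a small Python sketch (our own illustration, not from the slides) that counts misses for LRU and OPT within a single K-way set; the reference string is the cyclic pattern discussed on the "Problem with LRU" slide below:

def misses(refs, k, policy):
    lines, miss = [], 0
    for i, r in enumerate(refs):
        if r in lines:
            lines.remove(r)           # hit: refresh recency (most recent at the end)
            lines.append(r)
            continue
        miss += 1
        if len(lines) == k:           # set full: pick a victim
            if policy == "LRU":
                victim = lines[0]     # least-recently used sits at the front
            else:                     # "OPT": evict the furthest next use
                future = refs[i + 1:]
                victim = max(lines, key=lambda x: future.index(x)
                             if x in future else len(future) + 1)
            lines.remove(victim)
        lines.append(r)
    return miss

cyclic = list("ABCDEF") * 3           # cyclic pattern, set size 3
print(misses(cyclic, 3, "LRU"), misses(cyclic, 3, "OPT"))  # prints 18 12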

20 Program Structure
for i = 1 to n
  for j = 1 to n
    ...
  endfor
endfor
The last-in, first-out character of such nested loops makes the recent past resemble the near future.

21
An eight-way cache directory maintained with the OPT policy; lines are ordered from nearest to furthest future access:
[a] Initial state for the future reference string A Z B Z C A D E F G H: A B C D E F G H
[b] After the cache hit on line A: B C A D E F G H
[c] After the cache miss to line Z (H, the furthest, is evicted): B Z C A D E F G

22 Why LRU and OPT are Close to Each Other?
LRU looks only at the past; OPT looks only at the future. But the recent past ~ the nearest future. Why? (Consider nested loops.)

23 Problem with LRU
LRU is not good at mimicking sequential/cyclic access. Example: A B C D E F, A B C ..., A B C ..., with a set size of 3.

24 Sequential Access
Cyclic reference string: A B C D E F G, A B C ... G, A B C ... G, ...
Under LRU the set cycles through A B C, then D E F, ...: each line is evicted just before it is reused, so every reference misses. OPT instead keeps lines (A B C) resident across passes and hits on them.

25 Empirical Data
OPT can gain about a 10% ~ 30% improvement over LRU (in terms of miss reduction).

26 A Comparison
OPT has two candidates for replacement: the most recently referenced line and the line to be referenced furthest in the future. LRU has only one: the least-recently used line; it never replaces the most recently referenced one. This creates dead lines in LRU.
Example: suppose line A is discarded by OPT but retained by LRU. Then A is dead for LRU until it is replaced, and since A is the most recently referenced line, K - 1 more misses are generated in that set before A can be ejected. The effective associativity is reduced by 1.

27 Performance Evaluation Methods for Workload
- Analytical modeling
- Simulation
- Measurement

28 Cache Analysis Methods
Hardware monitoring: fast and accurate, yet not fast enough for high-performance machines; cost and flexibility/repeatability are concerns.

29 Cache Analysis Methods (cont'd)
Address traces and a machine simulator:
- slow
- accuracy/fidelity concerns
- cost advantage
- flexibility/repeatability
- OS/other impacts: how to include them?

30 Trace-Driven Simulation for Cache
Workload dependence: difficulty in characterizing the load; no generally accepted model.
Effectiveness: simulation is possible over many parameters; repeatability.
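A minimal trace-driven simulator sketch along these lines, assuming an N-set, K-way, L-byte-line cache with LRU within each set and a trace that is simply a sequence of byte addresses (all names and values are illustrative):

from collections import OrderedDict

def simulate(trace, N, K, L):
    sets = [OrderedDict() for _ in range(N)]   # per set: tag -> None, in LRU order
    misses = 0
    for addr in trace:
        s = (addr // L) % N                    # set index
        tag = addr // (L * N)                  # remaining high-order bits
        if tag in sets[s]:
            sets[s].move_to_end(tag)           # hit: refresh LRU position
        else:
            misses += 1
            if len(sets[s]) == K:              # set full: evict the LRU tag
                sets[s].popitem(last=False)
            sets[s][tag] = None
    return misses

# e.g. a strided scan through 1 MB on a 2-way, 256-set, 32-byte-line cache:
# every access touches a new line here, so all 16384 references miss.
trace = range(0, 1 << 20, 64)
print(simulate(trace, N=256, K=2, L=32))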

31 Problems in Address Traces
- Representativeness of the actual workload (hard): traces cover only milliseconds of the real workload; diversity of user programs.
- Initialization transient: use traces long enough to absorb its impact.
- Inability to properly model multiprocessor effects.

32 An Example
Assume a two-way set-associative cache with 256 sets, so Scache = 2 x 256 = 512 lines. Assume the difficulty of deciding whether to count initialization (cold-start) misses causes up to 512 more misses than actually required, one per line.
With a trace of length 100,000 and a hit rate of 0.99, only 1,000 misses are generated, so those 512 make a big difference!
To keep the 512-miss error below 5% of all misses, we need total misses >= 512 / 5% = 10,240; with a hit rate of 0.99 (miss rate 0.01), the required trace length is >= 10,240 / 0.01 = 1,024,000 references!
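The same arithmetic as a reusable sketch (our own helper, not from the slides):

def min_trace_length(cold_misses, max_error, miss_rate):
    total_misses = cold_misses / max_error    # misses needed to keep the cold-start bias under max_error
    return round(total_misses / miss_rate)    # references needed to generate that many misses

print(min_trace_length(512, 0.05, 0.01))      # -> 1024000, as in the slide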

33 One may not know the cache parameters beforehand
What to do? Make the trace longer than the minimum acceptable length!

34
100,000? Too small. (10 ~ 100) x 10^6 OK? Traces of 1000 x 10^6 or more are being used now.
Note: the trace must produce enough misses to reach the necessary precision. Each quadrupling of the cache size increases the required trace length by roughly a factor of 8. For a 2-million-byte cache: … million references!

