A New Cache Monitoring Scheme for Memory-Aware Scheduling and Partitioning
G. Edward Suh, Srinivas Devadas, Larry Rudolph
Massachusetts Institute of Technology
HPCA-8
Problem
- Memory system performance is critical
- Everyone thinks about their own application:
  - tuning replacement policies
  - software/hardware prefetching
- But modern computer systems execute multiple applications concurrently or simultaneously:
  - Time-shared systems: context switches cause cold misses
  - Multiprocessor systems sharing the memory hierarchy (SMP, SMT, CMP): simultaneous applications compete for cache space
Solutions: Cache Partitioning & Memory-Aware Scheduling
- Cache Partitioning
  - Explicitly manage the allocation of cache space among concurrent/simultaneous processes
  - Each process gets a different benefit from additional cache space
  - Similar to main memory partitioning in the old days (e.g., Stone 1992)
- Memory-Aware Scheduling
  - Choose a set of simultaneous processes that minimizes memory/cache contention
  - Scheduling for SMT systems (Snavely 2000): threads interact in various ways (RUU, functional units, caches, etc.); based on executing and profiling various schedules
  - Admission control for gang scheduling (Batat 2000): based on the footprint of a job (its total memory usage)
BUT… Testing many possible schedules is not viable
- The number of possible schedules increases exponentially with the number of processes
- We need to derive a good schedule from individual process characteristics, so that complexity increases only linearly
- Footprint-based scheduling does not provide enough information:
  - The footprint of a process is often larger than the cache
  - Processes may not need their entire working set in the cache
- Can we find a good schedule for cache performance? What information do we need about each process?
Information a Scheduler/Partitioner Needs
- Characterizing a process: for scheduling and partitioning, we need to know the effect of varying the cache size
  - Multiple performance numbers for different cache sizes
  - Ignore effects other than cache size
- Miss-rate curves, m(c): the cache miss-rate as a function of cache size c (in cache blocks)
  - Assume the process runs in isolation
  - Assume the cache is fully associative
- [Figure: miss-rate vs. cache space (%)]
- Miss-rate curves provide the essential information for scheduling and partitioning
Using Miss-Rate Curves for Partitioning
- What do miss-rate curves tell us about cache allocation?
- [Figure: miss-rate curves of Process A and Process B vs. cache space (%)]
- Given an allocation of cA blocks to Process A and cB blocks to Process B, the total number of cache misses is mA(cA)·refA + mB(cB)·refB
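As a concrete illustration, the objective above can be evaluated for every possible split of the cache. A minimal sketch in Python, with made-up miss-rate curves and reference counts (illustrative values, not the paper's data):

```python
# Sketch: evaluate mA(cA)*refA + mB(cB)*refB for every split of C blocks.
# The curves m_a, m_b and the reference counts are illustrative only.

def total_misses(m_a, m_b, ref_a, ref_b, c_a, c_total):
    """Misses when A gets c_a blocks and B gets the remaining blocks."""
    c_b = c_total - c_a
    return m_a[c_a] * ref_a + m_b[c_b] * ref_b

# m[c] = miss-rate with c cache blocks (index 0 = no cache space)
m_a = [1.0, 0.6, 0.5, 0.45, 0.44, 0.44]
m_b = [1.0, 0.5, 0.2, 0.1, 0.08, 0.07]
ref_a, ref_b = 1000, 2000   # memory references per process per interval

c_total = 5
best = min(range(c_total + 1),
           key=lambda c_a: total_misses(m_a, m_b, ref_a, ref_b, c_a, c_total))
print(best, total_misses(m_a, m_b, ref_a, ref_b, best, c_total))  # 2 700.0
```

With these curves, giving Process A two blocks and Process B three minimizes total misses; exhaustive search like this is only feasible for tiny caches, which motivates the marginal-gain method on the next slide.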
Finding the best allocation
- Use the marginal gain: g(c) = m(c)·ref − m(c+1)·ref
  - The reduction in the number of misses from one additional cache block
- Allocate cache blocks to the processes greedily
- Guaranteed to produce the optimal partition if the m(c) are convex
- Example (marginal gains 987, 409, … for Process A and 2111, 1568, 746, … for Process B; initially no cache block is allocated):
  - Compare marginal gains: 987 < 2111, so allocate a block to Process B
  - Compare marginal gains: 987 < 1568, so allocate a block to Process B
  - Compare marginal gains: 987 > 746, so allocate a block to Process A
  - Compare marginal gains: 409 < 746, so allocate a block to Process B
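The greedy allocation above can be sketched in a few lines of Python, replaying the slide's marginal-gain numbers (the `greedy_partition` helper is a hypothetical illustration, not code from the paper):

```python
# Greedy partitioner: give each successive cache block to the process
# whose next block has the largest marginal gain. This is optimal when
# the miss-rate curves are convex (i.e., the gains are non-increasing).

def greedy_partition(gains, c_total):
    """gains[i][c] = misses saved when process i grows from c to c+1 blocks."""
    alloc = [0] * len(gains)
    for _ in range(c_total):
        p = max(range(len(gains)),
                key=lambda i: gains[i][alloc[i]] if alloc[i] < len(gains[i]) else 0.0)
        alloc[p] += 1
    return alloc

# The slide's example: A's gains start 987, 409; B's start 2111, 1568, 746.
print(greedy_partition([[987, 409], [2111, 1568, 746]], 4))  # [1, 3]
```

After four blocks the allocation is one block for A and three for B, matching the comparison sequence on the slide.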
Partitioning Results
- Partition the L2 cache between two simultaneous processes (SPEC CPU2000 benchmarks: art and mcf)
Intuition for Memory-Aware Scheduling
- How do we schedule 4 processes on a 2-processor system using their individual miss-rate curves?
- [Figure: miss-rate curves of Processes A, B, C, and D vs. cache space (%)]
- The working set is larger than the cache for all processes, and all processes reach a similar miss-rate when given the entire cache
- Curves tend to have a knee: the amount of cache space beyond which the marginal gain diminishes sharply
- Group processes based on their knees: schedule A and C together, and B and D together
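One way to act on this intuition is to detect each curve's knee and pair a small-knee process with a large-knee one, so that each pair's combined demand roughly fits the cache. A rough sketch, where the threshold-based `knee` detector and the sample curves are my own illustrative assumptions rather than the paper's exact method:

```python
# Knee-based pairing sketch: knee(m) finds the smallest cache size beyond
# which extra blocks barely reduce the miss-rate; processes are then paired
# smallest-knee with largest-knee. Threshold and curves are illustrative.

def knee(m, threshold=0.01):
    for c in range(len(m) - 1):
        if m[c] - m[c + 1] < threshold:
            return c
    return len(m) - 1

def pair_processes(curves):
    order = sorted(range(len(curves)), key=lambda i: knee(curves[i]))
    pairs = []
    while len(order) >= 2:
        pairs.append((order.pop(0), order.pop(-1)))  # small knee + large knee
    return pairs

curves = [
    [1.0, 0.2, 0.2, 0.2, 0.2],   # knee at 1 block
    [1.0, 0.8, 0.6, 0.4, 0.2],   # no clear knee (treated as 4)
    [1.0, 0.5, 0.2, 0.2, 0.2],   # knee at 2 blocks
    [1.0, 0.7, 0.4, 0.2, 0.2],   # knee at 3 blocks
]
print(pair_processes(curves))  # [(0, 1), (2, 3)]
```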
Determining the Knee of the Curve
- Use the partitioning technique, but with the available cache resource doubled
- However, we may now need multiple time slices to schedule all processes (2 time slices in our example)
Scheduling Results
- Schedule 6 SPEC CPU benchmarks on 2 processors
Analytical Model (ICS '01)
- Miss-rate curves (or marginal gains) alone may not be enough for optimizing time-shared systems:
  - partitioning among concurrent processes
  - scheduling that considers the effects of context switches
- Use an analytical model to predict cache-sharing effects
- [Figure: 32-KB 8-way set-associative cache; bzip2+gcc+swim+mesa+vortex+vpr+twolf+iu]
BUT… The processes to execute are only known at run-time
- Users decide which applications to run, so scheduling/partitioning decisions must be made at run-time
- The behavior of a process changes over time: applications have phases, so miss-rate curves (and marginal gains) may change over an execution
- Cache configurations differ across systems, so miss-rate curves (and marginal gains) differ across systems
- We therefore need an on-line estimation of miss-rate curves (and marginal gains)
On-Line Estimation of Marginal Gains: Fully-Associative Caches
- Marginal gains can be counted directly from the temporal ordering of cache blocks (the LRU information)
- Use one counter per cache block (or per group of cache blocks), plus one counter for all accesses
- On a hit to the ith most-recently-used (MRU) block, increment the ith counter
- Example: in a fully-associative cache with 4 blocks, a hit on the MRU block increments the 1st counter, and a hit on the 3rd MRU block increments the 3rd counter
- [Figure: marginal-gain counters and access counter being updated on hits at different LRU positions]
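A small software model of this counting scheme (a sketch of the hardware's behavior under LRU, not an implementation from the paper) shows why the counters yield the whole miss-rate curve at once:

```python
# Stack-distance monitor for a fully-associative LRU cache: on a hit at
# LRU stack position i (0 = MRU), increment counter i. The prefix sum of
# the first c counters is the number of hits a c-block cache would have
# seen, so miss-rates for ALL sizes up to num_blocks fall out together.

class StackDistanceMonitor:
    def __init__(self, num_blocks):
        self.stack = []                    # LRU stack, MRU at the front
        self.counters = [0] * num_blocks   # one counter per stack position
        self.accesses = 0                  # counter for all accesses

    def access(self, block):
        self.accesses += 1
        if block in self.stack:
            i = self.stack.index(block)    # LRU stack position of the hit
            self.counters[i] += 1
            self.stack.pop(i)
        elif len(self.stack) == len(self.counters):
            self.stack.pop()               # miss with a full cache: evict LRU
        self.stack.insert(0, block)        # accessed block becomes MRU

    def miss_rate(self, c):
        """Estimated miss-rate m(c) of a c-block fully-associative cache."""
        return 1.0 - sum(self.counters[:c]) / self.accesses

mon = StackDistanceMonitor(4)
for block in [1, 2, 1, 3, 1, 2]:
    mon.access(block)
print(mon.counters, mon.miss_rate(3))  # [0, 2, 1, 0] 0.5
```

Feeding one address trace through the monitor therefore produces m(c) for every cache size at once, which is exactly what the partitioner and scheduler need.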
BUT… Most caches are SET-ASSOCIATIVE
- Except for main memory, caches are usually at most 8-way associative
- Set-associative caches only maintain a temporal ordering within each set; there is no global temporal ordering
- So we cannot use block-by-block temporal ordering to obtain the marginal gains of a fully-associative cache
Way-Counters
- Use the existing LRU information within each set
- One counter per way (a D-way cache needs D counters)
- On a hit to the ith MRU block of its set, increment the ith counter
- Each way-counter then represents the gain of having S more blocks, where S is the number of sets
- [Figure: 4-way set-associative cache with S sets, four way-counters, and an access counter]
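The same counting idea restricted to per-set LRU state can be modeled as follows (a sketch; the address-to-set mapping is a simplifying assumption for illustration):

```python
# Way-counter monitor for a set-associative cache: each set keeps its own
# LRU ordering, and a hit at way-LRU position i (0 = MRU of that set)
# increments the shared counter i. Counter i then approximates the gain
# of having S more cache blocks, where S is the number of sets.

class WayCounterMonitor:
    def __init__(self, num_sets, assoc):
        self.num_sets = num_sets
        self.sets = [[] for _ in range(num_sets)]  # per-set LRU stacks, MRU first
        self.way_counters = [0] * assoc
        self.accesses = 0

    def access(self, addr):
        self.accesses += 1
        s = self.sets[addr % self.num_sets]        # simple modulo set index
        tag = addr // self.num_sets
        if tag in s:
            i = s.index(tag)                       # LRU position within the set
            self.way_counters[i] += 1
            s.pop(i)
        elif len(s) == len(self.way_counters):
            s.pop()                                # miss in a full set: evict LRU
        s.insert(0, tag)                           # block becomes the set's MRU
```

Because only D positions exist per set, the monitor yields just D points of the miss-rate curve, at granularity S blocks, which motivates the finer-grained way+set counters on the next slide.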
Way+Set Counters
- Use more counters for more detailed information
- Additionally maintain LRU information over groups of sets
- On a hit to the ith MRU way of the jth MRU set group, increment counter(i, j)
- [Figure: 2-way set-associative cache with S′ set groups, a temporal ordering of set groups, and way+set counters]
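Extending the previous sketch with an ordering over set groups gives the two-dimensional counters. The grouping granularity here (one group per set) is my own simplifying assumption for illustration:

```python
# Way+set counter sketch: besides per-set way LRU, keep a coarse LRU
# ordering over set groups. A hit at way position i (0 = MRU way) within
# the j-th most-recently-used set group increments counters[j][i],
# giving a finer-grained approximation of the miss-rate curve.

class WaySetCounterMonitor:
    def __init__(self, num_sets, assoc):
        self.num_sets = num_sets
        self.sets = [[] for _ in range(num_sets)]   # per-set LRU stacks, MRU first
        self.group_lru = list(range(num_sets))      # set-group ordering, MRU first
        self.counters = [[0] * assoc for _ in range(num_sets)]

    def access(self, addr):
        set_idx = addr % self.num_sets
        tag = addr // self.num_sets
        s = self.sets[set_idx]
        j = self.group_lru.index(set_idx)           # group's temporal position
        if tag in s:
            i = s.index(tag)                        # way-LRU position of the hit
            self.counters[j][i] += 1
            s.pop(i)
        elif len(s) == len(self.counters[0]):
            s.pop()                                 # miss in a full set: evict LRU
        s.insert(0, tag)                            # block becomes the set's MRU
        self.group_lru.remove(set_idx)              # its group becomes MRU group
        self.group_lru.insert(0, set_idx)
```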
Summary
- Caches should be managed more carefully, considering the effects of space- and time-sharing:
  - cache partitioning
  - memory-aware scheduling
- Miss-rate curves provide highly relevant information for scheduling and partitioning:
  - they let us predict the effect of varying the cache space
  - useful for any tradeoff between performance and space (or power)
- On-line counters can estimate miss-rate curves at run-time:
  - use the temporal ordering of blocks to predict the miss-rates of smaller caches
  - works for both fully-associative and set-associative caches
Partitioning Mechanism
- Modify the LRU replacement policy to enforce the partition
- Count the number of cache blocks each process holds (XA for Process A)
- Try to match XA to the allocated cache space DA
- Replacement:
  - Replace Process A's LRU block if XA > DA
  - Replace Process B's LRU block if XB > DB
  - Replace the standard (global) LRU block if no process is over-allocated
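The replacement rule can be sketched as a victim-selection function (illustrative only; a real cache would apply this among the replacement candidates of one set rather than a global block pool):

```python
# Partition-enforcing victim selection: each process p has a target
# allocation D[p] and a current block count X[p]. On a miss, evict the
# LRU block of an over-allocated process (X[p] > D[p]) if one exists;
# otherwise fall back to the standard (global) LRU victim.

def choose_victim(lru_order, owner, X, D):
    """lru_order: block ids from LRU to MRU; owner[b]: process owning block b."""
    over = {p for p in X if X[p] > D[p]}    # over-allocated processes
    for b in lru_order:                     # scan from the LRU end
        if owner[b] in over:
            return b                        # LRU block of an over-allocated process
    return lru_order[0]                     # no one over-allocated: plain LRU

lru_order = [3, 1, 2, 0]                    # block 3 is LRU, block 0 is MRU
owner = {3: 'A', 1: 'B', 2: 'A', 0: 'B'}
print(choose_victim(lru_order, owner, {'A': 2, 'B': 2}, {'A': 3, 'B': 1}))  # 1
```

Here B holds more blocks than its target, so B's LRU block (1) is evicted even though A owns the globally LRU block; with matching targets, plain LRU applies.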
Scheduling: L2
Way-Counter Implementation
- [Figure: hardware diagram: each set stores valid, tag, data, and way-LRU bits; on a hit, the way-LRU position of the hit block selects which counter to increment, and every reference updates the access counter]
Way+Set Counter Implementation
- [Figure: hardware diagram: adds a set-LRU ordering over set groups; the way-LRU position of the hit block together with the set-LRU position of its group selects the counter to increment]