
1 Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling
Squillante & Lazowska, IEEE TPDS 4(2), February 1993

2 Affinity
● On which processor should the next ready task run?
● It might be more efficient to choose one processor over another (but what does “efficient” mean?)
● Affinity captures this notion of efficiency
● What is affinity?
● Merriam-Webster: “sympathy marked by community of interest”
● It could be based on processor speed/type or resource availability
● This paper considers affinity based on processor caches

3 Cache affinity
● What happens when a task starts on a processor?
● A burst of cache misses
● The number of misses depends on how much of the task’s working set is already in the cache
● Cache sizes are trending upward => longer reload times when tasks are rescheduled
● There are also performance hits due to bus contention and write-invalidations
● How can we reduce these cache misses?
● Run the task on the processor with the most “affinity”
● Why not just glue a task to a processor?

4 Analyzing cache affinity
● This paper explores the solution space in order to gain understanding
● Analytically model cache reload times
● Determine how different scheduling policies perform with affinity information
● Propose policies that make use of affinity information

5 Cache reload time
● Is it significant?
● Well, the paper got published, so…
● Intuitively we believe it might be, but we need evidence
● Experiments
● Task execution time on a cold cache is up to 69% worse than on a warm cache
● When bus contention and write-invalidations are considered, up to 99% worse
● Cache sizes and cache-miss costs keep rising…
● Why do cache sizes keep going up, anyway?

6 Modeling cache behavior
● Terminology
● Cache-reload transient (CRT): the time delay due to the initial burst of cache misses when a task restarts on a processor
● Footprint: the set of cache blocks in active use by a task
● A closed queuing network model is used to model the system (a toy sketch follows below)
● M processors, N tasks, exponentially distributed service times
● Assumes that cache footprints remain fairly static (each task stays in a single “footprint phase”)
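As a rough illustration of the kind of closed system being modeled, here is a minimal discrete-event sketch in Python: N tasks circulate among M processors under FCFS with exponentially distributed service times. The structure (a single ready queue, immediate re-queueing, names like run_fcfs, MU, END_TIME) is an assumption for illustration only and is far simpler than the paper’s analytic model, which also accounts for cache-reload costs.

import heapq
import random

M, N, MU = 4, 8, 1.0          # processors, tasks, service rate (illustrative values)
END_TIME = 10_000.0

def run_fcfs():
    random.seed(0)
    ready = list(range(N))            # all tasks start ready (single FCFS queue)
    free_cpus = list(range(M))
    events = []                       # (completion_time, task, cpu)
    clock, completions = 0.0, 0

    def dispatch():
        # Assign ready tasks to free CPUs, drawing exponential service times.
        while ready and free_cpus:
            t, c = ready.pop(0), free_cpus.pop(0)
            heapq.heappush(events, (clock + random.expovariate(MU), t, c))

    dispatch()
    while events and clock < END_TIME:
        clock, t, c = heapq.heappop(events)
        completions += 1
        free_cpus.append(c)
        ready.append(t)               # closed system: the task re-enters the queue
        dispatch()
    return completions / clock        # throughput

print("FCFS throughput ~", run_fcfs())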

7 Cache-reload transients
● How much of task T’s footprint must be reloaded when T is rescheduled on a processor P?
● How much of the footprint was evicted since T last ran?
● How many tasks ran on P since T last ran?
● Expected cache-reload miss ratio for T
● How much of T’s footprint must be reloaded when T is scheduled on P, as a function of the number of other tasks executed on P since T last ran
● This is a function of two random variables and the footprint size
● The ratio increases rapidly with the number of intervening tasks (see the approximation sketched below)
● So effective scheduling intervention can only happen early, if at all
● Bus interference depends on the scheduling policy
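The paper derives the expected miss ratio from a model of the cache; the snippet below is only a back-of-the-envelope approximation meant to show the shape of the curve. It assumes a cache of C blocks and that each intervening task touches G blocks spread uniformly over the cache, so a cached block of T survives one intervening task with probability 1 - G/C; the constants and the function name are illustrative assumptions, not the paper’s formula.

def expected_reload_fraction(k, C=1024, G=256):
    """Expected fraction of T's footprint to reload after k intervening tasks,
    under the simple independent-eviction assumption described above."""
    survive_one = 1.0 - G / C
    return 1.0 - survive_one ** k

for k in range(6):
    print(k, round(expected_reload_fraction(k), 3))
# 0.0, 0.25, 0.438, 0.578, 0.684, 0.763 — the ratio climbs quickly with the
# number of intervening tasks, which is why scheduling can only help early on.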

8 Scheduling policies
● Abstract policies for evaluating affinity
● FCFS – ignore affinity, use the first available CPU
● Fixed (FP) – tasks are permanently assigned to one CPU
● Last processor (LP) – simple affinity: a CPU looks for tasks it has run before
● Minimum intervening (MI) – each CPU remembers the number of intervening tasks since each task T last ran there; choose the task with the minimum (sketched below)
● Limited minimum intervening (LMI) – only consider a subset of CPUs
● LMI-Routing (LMIR) – minimize (number of intervening tasks + number of tasks already assigned to that CPU)
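A hedged Python sketch of the minimum-intervening bookkeeping: the data layout (intervening[cpu][task]) and the helper names are assumptions for illustration, not the paper’s implementation.

def pick_task_min_intervening(cpu, ready, intervening):
    """A free CPU picks the ready task with the fewest tasks run on it since
    that task last ran there (infinity if the task never ran there)."""
    return min(ready, key=lambda t: intervening[cpu].get(t, float("inf")))

def on_dispatch(cpu, task, intervening):
    """Bookkeeping when `task` starts on `cpu`: every task that previously ran
    on this CPU now has one more intervening task; the chosen task resets to 0."""
    for t in intervening[cpu]:
        intervening[cpu][t] += 1
    intervening[cpu][task] = 0

# Example: on CPU 0, task 7 ran most recently (0 intervening), task 3 two tasks ago.
intervening = {0: {3: 2, 7: 0}}
ready = [3, 5, 7]
print(pick_task_min_intervening(0, ready, intervening))   # -> 7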

9 Evaluation
● Vary the cache-reload transient (CRT) under heavy/light loads and measure throughput
● FCFS is only good for light load and low CRT
● FP is not good for light loads, but as load/CRT increase, CRT costs come to dominate the load-balancing penalties
● LP is very similar to FCFS at light loads, and almost as good as FP at heavy loads
● Even simple affinity information is beneficial
● Others
● MI is better than LP, but requires more state
● LMI requires less state than MI, and its performance is almost as good
● Both MI and LMI ignore fairness, though
● LMIR reduces the variance in response time, improving fairness, with throughput similar to MI

10 Bus traffic evaluation
● Bus contention occurs when tasks are switched
● So minimizing CRT is important
● LP directly minimizes CRT
● Not much better than FCFS at light loads
● Under heavy load, a very significant improvement over FCFS
● FCFS incurs much higher CRT penalties at heavy load

11 Practical policies
● Queue-based
● Use different task queues to represent affinity information
● Priority-based
● Use affinity information as one component when computing task priority
● Computing expected CRT at runtime is expensive – precompute a table of expected CRTs indexed by footprint size (sketched below)
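A minimal sketch of the precomputation idea: build a small table of expected reload costs, indexed here by footprint-size class and intervening-task count for illustration, and consult it when adjusting a task’s priority. The cost model reuses the rough approximation from the earlier sketch; the table shape, constants, and priority formula are assumptions rather than the paper’s actual policy.

CACHE_BLOCKS = 1024
BLOCKS_PER_TASK = 256              # assumed blocks touched by each intervening task
MISS_COST = 50                     # assumed cost (e.g., cycles) per cache miss
FOOTPRINT_CLASSES = [64, 128, 256, 512]
MAX_INTERVENING = 8

# Precomputed expected cache-reload cost for each (footprint class, intervening count).
crt_table = {
    (f, k): f * (1.0 - (1.0 - BLOCKS_PER_TASK / CACHE_BLOCKS) ** k) * MISS_COST
    for f in FOOTPRINT_CLASSES
    for k in range(MAX_INTERVENING + 1)
}

def affinity_priority(base_priority, footprint_class, intervening):
    """Cheaper expected reload on this processor => higher effective priority."""
    k = min(intervening, MAX_INTERVENING)
    return base_priority - crt_table[(footprint_class, k)]

print(affinity_priority(0, 256, 1), affinity_priority(0, 256, 4))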

12 Conclusions
● As with everything else in CS, there are tradeoffs
● The amount of affinity state maintained vs. its marginal effect on performance
● “Greedy” schedulers (which minimize CRT) give high throughput and low response times, but can be unfair and produce high variance in response time
● Adaptive behavior is important
● Adapt to footprint size and system load
● A good example of an “understanding” paper

