Lecture 27: Multiprocessor Scheduling
Last lecture: VMM
- Two old problems revisited: CPU virtualization and memory virtualization
- I/O virtualization
Today
- Issues related to multi-core: scheduling and scalability
The cache coherence problem
Since we have multiple private caches: how do we keep the data consistent across caches?
Each core should perceive memory as a single monolithic array, shared by all the cores.
The cache coherence problem
[Diagram: a multi-core chip with four cores, each with one or more levels of private cache, above a shared main memory. Cores 1 and 2 both cache x=15213; main memory also holds x=15213.]
The cache coherence problem
[Diagram: Core 1 writes x=21660 into its own cache; Core 2 still caches x=15213 and main memory still holds x=15213, assuming write-back caches.]
The cache coherence problem
[Diagram: the starting state again: Cores 1 and 2 both cache x=15213, and main memory holds x=15213.]
The cache coherence problem
[Diagram: Core 1 writes x=21660; with write-through caches, main memory is updated to x=21660, but Core 2 still caches the stale x=15213.]
Solutions for cache coherence
Many solution algorithms and coherence protocols exist.
A simple solution: an invalidation protocol with bus snooping.
Inter-core bus
[Diagram: the four cores and their caches are connected to main memory by a shared inter-core bus on the multi-core chip.]
Invalidation protocol with snooping
- Invalidation: if a core writes to a data item, all copies of that item in other caches are invalidated.
- Snooping: all cores continuously "snoop" (monitor) the bus connecting the cores.
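The protocol can be sketched as a toy simulation. This is an illustrative model only, not real hardware: the `Core` and `Bus` classes are hypothetical, caches are plain dicts, and the caches are assumed write-through.

```python
# Toy snooping-invalidation model with write-through caches on a shared bus.
class Bus:
    def __init__(self, memory):
        self.memory = memory      # main memory, shared by all cores
        self.cores = []           # every core attached to the bus

class Core:
    def __init__(self, bus):
        self.cache = {}           # this core's private cache
        self.bus = bus
        bus.cores.append(self)

    def read(self, addr):
        if addr not in self.cache:                  # miss: fetch from memory
            self.cache[addr] = self.bus.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value
        self.bus.memory[addr] = value               # write-through to memory
        for other in self.bus.cores:                # snoop: invalidate others
            if other is not self:
                other.cache.pop(addr, None)

bus = Bus({"x": 15213})
c1, c2 = Core(bus), Core(bus)
c1.read("x"); c2.read("x")    # both caches now hold x=15213
c1.write("x", 21660)          # c2's copy is invalidated on the bus
print(c2.read("x"))           # 21660 -- the miss refetches the new value
```

Note how Core 2 never sees a stale value: its copy is gone, so the next read misses and pulls the fresh value from memory.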
The cache coherence problem
[Diagram: the starting state for the invalidation example: Cores 1 and 2 both cache x=15213, and main memory holds x=15213.]
The cache coherence problem
[Diagram: Core 1 writes x=21660 and sends an invalidation request on the bus; Core 2's copy is INVALIDATED, and main memory is updated to x=21660, assuming write-through caches.]
The cache coherence problem
[Diagram: Core 2 re-reads x and fetches the new value: both cores now cache x=21660 and main memory holds x=21660, assuming write-through caches.]
Alternative to the invalidation protocol: the update protocol
[Diagram: Core 1 writes x=21660 and broadcasts the updated value on the bus; main memory holds x=21660, but Core 2 still caches x=15213, assuming write-through caches.]
Alternative to the invalidation protocol: the update protocol
[Diagram: after the broadcast, Core 2's copy is updated in place: both cores cache x=21660 and main memory holds x=21660, assuming write-through caches.]
Invalidation vs. update
Multiple writes to the same location:
- invalidation: bus traffic only on the first write
- update: every write must be broadcast (including the new value)
Invalidation generally performs better: it generates less bus traffic.
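The traffic difference can be made concrete with a deliberately simplified model: one core repeatedly writes a line that others have cached, and no other core re-reads it in between. `bus_messages` is a hypothetical helper, not part of any real protocol.

```python
# Simplified bus-traffic model for repeated writes to one cached line.
def bus_messages(num_writes, protocol):
    if protocol == "invalidate":
        # Only the first write broadcasts; other copies are then gone,
        # so later writes hit locally with no bus traffic.
        return 1 if num_writes > 0 else 0
    if protocol == "update":
        # Every write broadcasts the new value to all sharers.
        return num_writes
    raise ValueError(protocol)

print(bus_messages(100, "invalidate"))  # 1
print(bus_messages(100, "update"))      # 100
```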
Programmers still need to worry about concurrency
- Mutexes
- Condition variables
- Lock-free data structures
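As a minimal example of the first item, a mutex protecting a shared counter (in Python, `threading.Lock` plays the role of the mutex; without it, concurrent `counter += 1` updates can be lost):

```python
import threading

counter = 0
lock = threading.Lock()          # mutex protecting `counter`

def increment(n):
    global counter
    for _ in range(n):
        with lock:               # only one thread mutates at a time
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000
```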
Single-Queue Multiprocessor Scheduling (SQMS)
- Reuse the basic framework of single-processor scheduling.
- Put all jobs that need to be scheduled into a single queue.
- If there are two CPUs, pick the best two jobs to run.
Advantage: simple. Disadvantage: does not scale.
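The idea can be sketched in a few lines; `schedule_tick` is a hypothetical round-robin helper, and a real scheduler of course tracks priorities, time slices, and much more state.

```python
from collections import deque

# Toy single-queue multiprocessor scheduler: one shared queue, N CPUs.
def schedule_tick(run_queue, num_cpus):
    """Pick the next `num_cpus` jobs to run this tick (round-robin)."""
    running = [run_queue.popleft()
               for _ in range(min(num_cpus, len(run_queue)))]
    run_queue.extend(running)   # requeue at the tail for a later tick
    return running

q = deque(["A", "B", "C", "D", "E"])
print(schedule_tick(q, 2))   # ['A', 'B']
print(schedule_tick(q, 2))   # ['C', 'D']
```

Note that nothing ties a job to a particular CPU: across ticks the same job lands wherever a CPU is free, which is exactly the affinity problem the next slides discuss.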
SQMS and Cache Affinity
Cache Affinity
Thread migration is costly:
- the execution pipeline must be restarted
- cached data is lost (the thread restarts with a cold cache)
The OS scheduler therefore tries to avoid migration as much as possible: it tends to keep a thread on the same core.
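On Linux, an application can also manage affinity itself. A sketch using the Linux-specific `os.sched_getaffinity`/`os.sched_setaffinity` calls (pid 0 means the calling process); this will not work on macOS or Windows:

```python
import os

allowed = os.sched_getaffinity(0)   # cores the calling process may run on
one_core = {min(allowed)}
os.sched_setaffinity(0, one_core)   # pin: the scheduler will not migrate us
assert os.sched_getaffinity(0) == one_core
```

Pinning preserves cache affinity at the cost of load-balancing flexibility, so it is usually reserved for latency-sensitive or cache-sensitive workloads.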
Multi-Queue Multiprocessor Scheduling (MQMS)
Advantages: scalable, and preserves cache affinity.
Load imbalance
Per-queue loads can become unbalanced; the fix is migration: moving jobs from busier queues to less busy ones.
Work Stealing
- A (source) queue that is low on jobs will occasionally peek at another (target) queue.
- If the target queue is notably fuller than the source, the source "steals" one or more jobs from the target to help balance load.
- The source cannot look around at other queues too often, or the checking overhead defeats the purpose.
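A minimal sketch of the stealing step, with per-CPU deques. The policy choices here (steal only when the local queue is empty, steal half the victim's jobs, pick the victim at random) are illustrative assumptions, not the ones from the lecture.

```python
import random
from collections import deque

def maybe_steal(queues, me, rng=random):
    """If queue `me` is empty, steal half the jobs from a random victim."""
    if queues[me]:
        return                          # not starved, do nothing
    victim = rng.choice([i for i in range(len(queues)) if i != me])
    n = len(queues[victim]) // 2        # steal half to balance load
    for _ in range(n):
        queues[me].append(queues[victim].pop())  # take from the victim's tail

qs = [deque(), deque(["A", "B", "C", "D"])]
maybe_steal(qs, 0)
print(len(qs[0]), len(qs[1]))   # 2 2
```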
Linux Multiprocessor Schedulers
Both approaches have been successful in practice:
- O(1) scheduler and Completely Fair Scheduler (CFS): multiple queues
- BF Scheduler (BFS): a single queue
An Analysis of Linux Scalability to Many Cores This paper asks whether traditional kernel designs can be used and implemented in a way that allows applications to scale
Amdahl's Law
N: the number of threads of execution
B: the fraction of the algorithm that is strictly serial
The theoretical speedup:
    S(N) = 1 / (B + (1 - B) / N)
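Plugging in numbers shows how the serial fraction caps the speedup. The function below is a direct transcription of the formula; the example fractions are arbitrary.

```python
def amdahl_speedup(n_threads, serial_fraction):
    """Theoretical speedup with N threads when fraction B is strictly serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)

# With 10% serial work, even unlimited cores cannot beat 1/B = 10x:
print(round(amdahl_speedup(4, 0.10), 2))     # 3.08
print(round(amdahl_speedup(1000, 0.10), 2))  # 9.91
```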
Scalability Issues
- A global lock on a shared data structure: longer lock wait times
- Shared memory locations: overhead caused by the cache-coherence algorithms
- Tasks competing for a limited-size shared hardware cache: increased cache miss rates
- Tasks competing for shared hardware resources (interconnects, DRAM interfaces): more time wasted waiting
- Too few available tasks: less efficiency
How to avoid/fix
These issues can often be avoided (or limited) using popular parallel programming techniques:
- Lock-free algorithms
- Per-core data structures
- Fine-grained locking
- Cache alignment
- Sloppy counters
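The last item can be sketched as follows: each thread keeps a local count and only pushes it to the global counter when the local value reaches a threshold S, so the global lock is taken rarely. This is a minimal, single-purpose sketch of the sloppy-counter idea; the class name and threshold value are illustrative.

```python
import threading

class SloppyCounter:
    """Approximate, scalable counter: per-thread counts, rare global merges."""
    def __init__(self, threshold=1024):
        self.threshold = threshold
        self.global_count = 0
        self.global_lock = threading.Lock()
        self.local = threading.local()      # per-thread local count

    def increment(self):
        n = getattr(self.local, "count", 0) + 1
        if n >= self.threshold:
            with self.global_lock:          # amortized over S increments
                self.global_count += n
            n = 0
        self.local.count = n

    def flush(self):
        """Fold this thread's leftover local count into the global count."""
        with self.global_lock:
            self.global_count += getattr(self.local, "count", 0)
        self.local.count = 0

c = SloppyCounter(threshold=10)
for _ in range(25):
    c.increment()
c.flush()
print(c.global_count)  # 25
```

The trade-off is exactness: between flushes, `global_count` lags the true total by up to S per thread, which is acceptable for statistics but not for, say, reference counts.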
Current bottlenecks
https://www.usenix.org/conference/osdi10/analysis-linux-scalability-many-cores