1 Process Scheduling in Multiprocessor and Multithreaded Systems Matt Davis CS5354/7/2003

2 Outline  Multiprocessor Systems –Issues in MP Scheduling –How to Allocate Processors –Cache Affinity –Linux MP Scheduling  Simultaneous Multithreaded Systems –Issues in SMT Scheduling –Symbiotic Jobscheduling –SMT and Priorities –Linux SMT Scheduling  Conclusions

3 Multiprocessor Systems  Symmetric Multiprocessing (SMP): –One copy of OS in memory, any CPU can use it –OS must ensure that multiple processors cannot access shared data structures at the same time [Figure: shared-memory multiprocessor with several CPUs connected to a single shared memory]
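To make the shared-data point concrete, here is a minimal user-space sketch (not actual kernel code) of the kind of spinlock an SMP kernel uses to serialize access to a shared structure; the run-queue structure and field names are invented for illustration.

```c
/* Minimal sketch (not kernel code): a test-and-set spinlock serializing
 * access to a shared run queue, as an SMP kernel must do. */
#include <stdatomic.h>

struct runqueue { int nr_running; /* ... other shared scheduler state ... */ };

static atomic_flag rq_lock = ATOMIC_FLAG_INIT;
static struct runqueue rq;

static void rq_enqueue(void)
{
    while (atomic_flag_test_and_set_explicit(&rq_lock, memory_order_acquire))
        ;                            /* spin until the lock is free */
    rq.nr_running++;                 /* critical section: touch shared data */
    atomic_flag_clear_explicit(&rq_lock, memory_order_release);
}
```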

4 Issues in MP Scheduling  Starvation –Number of active parallel threads < number of allocated processors  Overhead –CPU time used to transfer and start various portions of the application  Contention –Multiple threads attempt to use same shared resource  Latency –Delay in communication between processors and I/O devices

5 How to allocate processors  Allocate proportional to average parallelism  Other factors: –System load –Variable parallelism –Min/Max parallelism  Acquire/relinquish processors based on current program needs
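As a toy illustration of the proportional rule above, the sketch below divides a pool of processors among jobs in proportion to their average parallelism; the structure, helper names, and rounding/capping choices are assumptions made for illustration, not taken from the cited work.

```c
/* Toy sketch: split total_cpus among jobs in proportion to each job's
 * average parallelism, capped by its maximum parallelism. */
#include <stdio.h>

struct job { double avg_par; int max_par; };

static void allocate(struct job *jobs, int n, int total_cpus)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += jobs[i].avg_par;

    for (int i = 0; i < n; i++) {
        int share = (int)(total_cpus * jobs[i].avg_par / sum + 0.5);
        if (share > jobs[i].max_par)
            share = jobs[i].max_par;   /* never exceed a job's max parallelism */
        if (share < 1)
            share = 1;                 /* avoid starving a job entirely */
        printf("job %d gets %d CPUs\n", i, share);
    }
}
```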

6 Cache Affinity  While a program runs, data needed is placed in local cache  When job is rescheduled, it will likely access some of the same data  Scheduling jobs where they have “affinity” improves performance by reducing cache penalties

7 Cache Affinity (cont)  Tradeoff between the benefit of reallocating processors and the cost of doing so –Utilization versus cache behavior  Scheduling policies: –Equipartition: a constant number of processors allocated evenly to all jobs. Low overhead. –Dynamic: constantly reallocates processors among jobs to maximize utilization. High utilization.

8 Cache Affinity (cont)  Vaswani and Zahorjan, 1991 –When a processor becomes available, allocate it to the runnable process that last ran on that processor, or to a higher-priority job –If a job requests additional processors, allocate critical tasks to the processors with the highest affinity –If an allocated processor becomes idle, hold it for a short time in case a task with affinity comes along
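A pseudocode-style C sketch of the first rule as stated on this slide: prefer the runnable task that last ran on the freed processor, otherwise fall back on priority. All structures and data here are invented for illustration, not code from the paper.

```c
#include <stddef.h>

struct task {
    int runnable;
    int priority;      /* higher value = higher priority */
    int last_cpu;      /* processor this task last ran on */
};

struct task *pick_task_for(int cpu, struct task *tasks, size_t n)
{
    struct task *best = NULL;
    for (size_t i = 0; i < n; i++) {
        struct task *t = &tasks[i];
        if (!t->runnable)
            continue;
        if (t->last_cpu == cpu)
            return t;                     /* cache affinity wins outright */
        if (!best || t->priority > best->priority)
            best = t;                     /* otherwise track the highest priority */
    }
    return best;
}
```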

9 Vaswani and Zahorjan, 1991  Results showed that utilization, not cache affinity, was the dominant effect on performance –But their algorithm did not degrade performance  Predicted that as processor speeds increase, the significance of cache affinity will also increase  Later studies validated their predictions

10 Linux 2.5 MP Scheduling  Each processor is responsible for scheduling its own tasks –schedule()  After a process switch, check whether the new process should be moved to another CPU running a lower-priority task –reschedule_idle()  Cache affinity –Affinity mask stored in /proc/pid/affinity –sched_setaffinity(), sched_getaffinity()
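A small user-space example of the affinity calls named on this slide, using the glibc wrappers for sched_setaffinity()/sched_getaffinity(); pinning to CPU 0 is an arbitrary choice for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);                       /* restrict this process to CPU 0 */

    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0)
        printf("CPU 0 in affinity mask: %d\n", CPU_ISSET(0, &mask));
    return 0;
}
```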

11 What is SMT?  Simultaneous Multithreading – aka HyperThreading ®  Issue instructions from multiple threads simultaneously on a superscalar processor [Figure: issue slots over time (ALU, FPU, branch, memory) filled by instructions from Thread 1 and Thread 2]

12 Why SMT?  Technique to exploit parallelism in and between programs with minimal additions in chip resources  Operating system treats SMT processor as two separate processors* [Figure: the operating system sees one SMT processor as two logical processors, one per hardware thread]

13 Issues With SMT Scheduling  *Not really separate processors: –Share same caches  MP scheduling attempts to avoid idle processors –SMT-aware scheduler must differentiate between physical and logical processors

14 Symbiotic Jobscheduling  Recent studies from U of Washington –Origin of early research into SMT  OS coschedules jobs to run on hardware threads  # of coscheduled jobs <= SMT level  Occasionally swap out running set to ensure fairness

15 Symbiotic Jobscheduling (cont)  Shared system resources: –Functional units, caches, TLBs, etc.  Coscheduled jobs may interact well… –Few resource conflicts, high utilization  Or they may interact poorly –Many resource conflicts, lower utilization  Choice of coscheduled jobs can have a large impact on system performance

16 Symbiotic Jobscheduling (cont)  Improve symbiosis by coscheduling jobs that get along well  Two phases of the SOS (Sample, Optimize, Symbios) jobscheduler: –Sample – Gather data on current performance –Symbios – Use the computed scheduling configuration

17 Symbiotic Jobscheduling (cont)  Sample phase: –Periodically alter the coscheduled job mix –Record system utilization from hardware performance counter registers  Symbios phase: –Pick the job mix that had the highest utilization  Trade-off between sampling often and sampling infrequently
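A schematic sketch of the sample/symbios alternation described above; the tick counts, mix enumeration, and stub helpers are invented placeholders, not the actual SOS scheduler implementation.

```c
#include <stdlib.h>

#define NUM_MIXES     8     /* candidate coschedule mixes to evaluate */
#define SAMPLE_TICKS  10    /* how long to measure each mix */
#define SYMBIOS_TICKS 500   /* how long to run the best mix before resampling */

/* Stub: run the given job mix for `ticks` scheduling intervals. */
static void run_coschedule(int mix, int ticks) { (void)mix; (void)ticks; }

/* Stub: read utilization from hardware performance counters. */
static double measure_utilization(int mix) { (void)mix; return rand() / (double)RAND_MAX; }

static void sos_loop(void)
{
    for (;;) {
        int best = 0;
        double best_util = -1.0;

        /* Sample phase: try each mix briefly and record its utilization. */
        for (int mix = 0; mix < NUM_MIXES; mix++) {
            run_coschedule(mix, SAMPLE_TICKS);
            double u = measure_utilization(mix);
            if (u > best_util) { best_util = u; best = mix; }
        }

        /* Symbios phase: run the best-performing mix for a longer period. */
        run_coschedule(best, SYMBIOS_TICKS);
    }
}
```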

18 How to Measure Utilization?  IPC is not necessarily the best predictor: –IPC can vary widely over the life of a process –High-IPC threads may unfairly take system resources from low-IPC threads  Other predictors: low # of conflicts, high cache hit rate, diverse instruction mix  Balance: the schedule with the lowest deviation in IPC among coscheduled threads is considered best
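One way to express the balance criterion is the standard deviation of per-thread IPC for a candidate coschedule, as in this small helper; this is an assumption about how "deviation in IPC" would be computed, not code from the paper.

```c
/* Standard deviation of per-thread IPC for one candidate coschedule;
 * a lower value means a more balanced (fairer) job mix. */
#include <math.h>

double ipc_deviation(const double *ipc, int nthreads)
{
    double mean = 0.0, var = 0.0;

    for (int i = 0; i < nthreads; i++)
        mean += ipc[i];
    mean /= nthreads;

    for (int i = 0; i < nthreads; i++)
        var += (ipc[i] - mean) * (ipc[i] - mean);
    var /= nthreads;

    return sqrt(var);
}
```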

19 What About Priorities?  Scheduler estimates the “natural” IPC of a job  If a high-priority job is not meeting the desired IPC, it is scheduled exclusively on the CPU  Provides a truer implementation of priority: –Normal schedulers only guarantee proportional resource sharing, assuming no interaction between jobs
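A sketch of the decision rule described above; the 0.9 threshold and the structure fields are assumptions made purely for illustration.

```c
/* If a high-priority job's observed IPC falls too far below its estimated
 * "natural" IPC, give it the whole physical processor to itself. */
struct prio_job {
    int    high_priority;
    double natural_ipc;    /* estimated IPC when running alone */
    double observed_ipc;   /* measured IPC while coscheduled */
};

int should_run_exclusively(const struct prio_job *j)
{
    if (!j->high_priority)
        return 0;
    return j->observed_ipc < 0.9 * j->natural_ipc;   /* threshold is arbitrary */
}
```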

20 Another Priority Algorithm  SMT hardware fetches instructions to issue from a queue  Scheduler can bias the fetch algorithm to give preference to high-priority threads  Hardware support already exists, so only minimal modifications are needed

21 Symbiosis Performance Results  Without priorities: –Up to 17% improvement  Software-enforced priorities: –Up to 20%, average 8%  Hardware-based priorities: –Up to 30%, average 15%

22 Linux 2.5 SMT Scheduling  Immediate reschedule forced when an HT CPU is executing two idle processes  HT-aware affinity: processes prefer to stay on the same physical CPU  HT-aware load balancing: distinguish logical and physical CPUs when allocating resources
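A tiny sketch of what distinguishing logical from physical CPUs might mean for load balancing: count runnable tasks per physical package rather than per logical CPU. The two-logical-CPUs-per-package assumption and the cpu/2 mapping are illustrative, not the actual kernel patch.

```c
#define NR_LOGICAL   4                     /* logical CPUs visible to the OS */
#define NR_PHYSICAL  (NR_LOGICAL / 2)      /* assume two logical CPUs per package */

int runqueue_len[NR_LOGICAL];              /* runnable tasks per logical CPU */

int physical_load(int package)
{
    int load = 0;
    for (int cpu = 0; cpu < NR_LOGICAL; cpu++)
        if (cpu / 2 == package)            /* logical CPUs sharing this package */
            load += runqueue_len[cpu];
    return load;
}
```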

23 Conclusions  Intelligent allocation of resources can improve performance in parallel systems  Dynamic scheduling of processors in MP systems produces better utilization as processor speeds increase –Cache affinity can help improve throughput  Symbiotic coscheduling of tasks in SMT systems can improve average response time

24 Resources  Kenneth Sevcik, “Characterizations of Parallelism in Applications and Their Use in Scheduling”  Raj Vaswani and John Zahorjan, “The Implications of Cache Affinity on Processor Scheduling for Multiprogrammed, Shared Memory Multiprocessors”  Allan Snavely et al., “Symbiotic Jobscheduling with Priorities for a Simultaneous Multithreading Processor”  Linux MP cache affinity, http://www.tech9.net/rml/linux  Linux Hyperthreading Scheduler, http://www.kernel.org/pub/linux/kernel/people/rusty/Hyperthread_Scheduler_Modifications.html  Daniel Bovet and Marco Cesati, Understanding the Linux Kernel

