Process Scheduling in Multiprocessor and Multithreaded Systems Matt Davis CS5354/7/2003


Outline  Multiprocessor Systems –Issues in MP Scheduling –How to Allocate Processors –Cache Affinity –Linux MP Scheduling  Simultaneous Multithreaded Systems –Issues in SMT Scheduling –Symbiotic Jobscheduling –SMT and Priorities –Linux SMT Scheduling  Conclusions

Multiprocessor Systems  Symmetric Multiprocessing (SMP): –One copy of the OS in memory; any CPU can run it –The OS must ensure that multiple processors cannot access shared data structures at the same time (figure: several CPUs sharing one memory)

Issues in MP Scheduling  Starvation –Number of active parallel threads < number of allocated processors  Overhead –CPU time used to transfer and start various portions of the application  Contention –Multiple threads attempt to use same shared resource  Latency –Delay in communication between processors and I/O devices

How to allocate processors  Allocate proportional to average parallelism  Other factors: –System load –Variable parallelism –Min/Max parallelism  Acquire/relinquish processors based on current program needs
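The proportional-allocation idea above can be sketched as a small simulation. This is an illustrative sketch only: the function name, the job fields, and the rounding/clamping scheme are assumptions, not taken from the cited work.

```python
def allocate_processors(jobs, total_cpus):
    """Split total_cpus among jobs in proportion to each job's
    average parallelism, clamped to its min/max parallelism."""
    total_par = sum(j["avg_parallelism"] for j in jobs)
    alloc = {}
    for j in jobs:
        share = round(total_cpus * j["avg_parallelism"] / total_par)
        # Never give a job fewer CPUs than it can use, nor more than it needs.
        share = max(j["min_parallelism"], min(j["max_parallelism"], share))
        alloc[j["name"]] = share
    return alloc

jobs = [
    {"name": "A", "avg_parallelism": 6, "min_parallelism": 1, "max_parallelism": 8},
    {"name": "B", "avg_parallelism": 2, "min_parallelism": 1, "max_parallelism": 4},
]
print(allocate_processors(jobs, 8))  # {'A': 6, 'B': 2}
```

A real scheduler would also react to system load and to jobs acquiring or relinquishing processors at runtime, as the slide notes.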

Cache Affinity  While a program runs, the data it needs is placed in the local cache  When the job is rescheduled, it will likely access some of the same data  Scheduling jobs where they have “affinity” improves performance by reducing cache-miss penalties

Cache Affinity (cont)  Tradeoff between processor reallocation and cost of reallocation –Utilization versus cache behavior  Scheduling policies: –Equipartition: constant number of processors allocated evenly to all jobs. Low overhead. –Dynamic: constantly reallocates jobs to maximize utilization. High utilization.

Cache Affinity (cont)  Vaswani and Zahorjan, 1991 –When a processor becomes available, allocate it to the runnable process that last ran on that processor, or to a higher-priority job –If a job requests additional processors, allocate critical tasks on the processor with the highest affinity –If an allocated processor becomes idle, hold it for a short time in case a task with affinity comes along
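One plausible reading of the first rule above, where affinity is consulted before priority, can be sketched as follows; the data layout and tie-breaking are illustrative assumptions, not details from the Vaswani/Zahorjan paper.

```python
def pick_next(free_cpu, runnable):
    """When a processor becomes free, prefer a runnable job that
    last ran there (cache affinity); otherwise fall back to the
    highest-priority runnable job."""
    affine = [j for j in runnable if j["last_cpu"] == free_cpu]
    pool = affine if affine else runnable
    return max(pool, key=lambda j: j["priority"])

runnable = [
    {"name": "X", "last_cpu": 0, "priority": 5},
    {"name": "Y", "last_cpu": 1, "priority": 9},
]
print(pick_next(0, runnable)["name"])  # X: affinity wins on CPU 0
print(pick_next(2, runnable)["name"])  # Y: no affinity, highest priority
```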

Vaswani and Zahorjan, 1991  Results showed that utilization, not cache affinity, was the dominant effect on performance –But their algorithm did not degrade performance  Predicted that as processor speeds increase, the significance of cache affinity will also increase  Later studies validated their predictions

Linux 2.5 MP Scheduling  Each processor is responsible for scheduling its own tasks –schedule()  After a process switch, check whether the new process should be transferred to another CPU running a lower-priority task –reschedule_idle()  Cache affinity –Affinity mask stored in /proc/pid/affinity –sched_setaffinity(), sched_getaffinity()
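The affinity syscalls named above are also exposed to userspace; a minimal Linux-only sketch using Python's wrappers (`os.sched_getaffinity` / `os.sched_setaffinity`, which exist only on Linux builds of Python):

```python
import os

if hasattr(os, "sched_getaffinity"):           # Linux only
    mask = os.sched_getaffinity(0)             # CPUs this process may run on
    print("allowed CPUs:", sorted(mask))
    os.sched_setaffinity(0, {min(mask)})       # pin to a single CPU
    print("pinned to:", os.sched_getaffinity(0))
    os.sched_setaffinity(0, mask)              # restore the original mask
```

Pinning a process this way prevents the load balancer from migrating it, trading utilization for cache affinity as discussed on the previous slides.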

What is SMT?  Simultaneous Multithreading – aka HyperThreading®  Issue instructions from multiple threads simultaneously on a superscalar processor (figure: ALU, FPU, branch, and memory issue slots filled by two threads over time)

Why SMT?  Technique to exploit parallelism in and between programs with minimal additions in chip resources  Operating system treats an SMT processor as two separate processors* (figure: one OS dispatching Thread 1 and Thread 2 to what it sees as Processor 1 and Processor 2)

Issues With SMT Scheduling  *Not really separate processors: –Share same caches  MP scheduling attempts to avoid idle processors –SMT-aware scheduler must differentiate between physical and logical processors

Symbiotic Jobscheduling  Recent studies from U of Washington –Origin of early research into SMT  OS coschedules jobs to run on hardware threads  # of coscheduled jobs <= SMT level  Occasionally swap out running set to ensure fairness

Symbiotic Jobscheduling (cont)  Shared system resources: –Functional units, caches, TLBs, etc.  Coscheduled jobs may interact well… –Few resource conflicts, high utilization  Or they may interact poorly… –Many resource conflicts, lower utilization  Choice of coscheduled jobs can have a large impact on system performance

Symbiotic Jobscheduling (cont)  Improve symbiosis by coscheduling jobs that get along well  Two phases of SOS (Sample, Optimize, Symbios) jobscheduler: –Sample – Gather data on current performance –Symbios – Use computed scheduling configuration

Symbiotic Jobscheduling (cont)  Sample phase: –Periodically alter the coscheduled job mix –Record system utilization from hardware performance-counter registers  Symbios phase: –Pick the job mix that had the highest utilization  Trade-off between sampling frequently and sampling rarely
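The sample-then-symbios loop can be sketched as a toy simulation. Everything here is an illustrative assumption: the `measure` function stands in for reading hardware performance counters, and the "jobs that use different functional units mix well" model is only a caricature of real symbiosis.

```python
import itertools

def sos_schedule(jobs, smt_level, measure):
    """Sample phase: try each candidate coschedule and record its
    utilization. Symbios phase: return the best-utilizing mix."""
    best_mix, best_util = None, -1.0
    for mix in itertools.combinations(jobs, smt_level):
        util = measure(mix)            # stand-in for perf-counter reads
        if util > best_util:
            best_mix, best_util = mix, util
    return best_mix

# Toy "measured" utilization: jobs stressing distinct units mix well.
def measure(mix):
    return 1.0 if len({j["unit"] for j in mix}) == len(mix) else 0.5

jobs = [{"name": "int", "unit": "ALU"},
        {"name": "fp", "unit": "FPU"},
        {"name": "int2", "unit": "ALU"}]
best = sos_schedule(jobs, 2, measure)
print([j["name"] for j in best])  # a mix using distinct units
```

A real implementation samples only occasionally (sampling itself costs performance, hence the trade-off the slide mentions) rather than exhaustively enumerating mixes.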

How to Measure Utilization?  IPC is not necessarily the best predictor: –IPC can vary widely over the life of a process –High-IPC threads may unfairly take system resources from low-IPC threads  Other predictors: low conflict counts, high cache hit rate, diverse instruction mix  Balance: the schedule with the lowest deviation in IPC between coschedules is considered best
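The balance criterion above, preferring the coschedule whose per-thread IPCs deviate least, can be sketched directly; the tuple representation and function name are illustrative assumptions.

```python
from statistics import pstdev

def most_balanced(coschedules):
    """Pick the coschedule whose per-thread IPCs deviate least,
    so no thread starves the others of issue bandwidth."""
    return min(coschedules, key=lambda ipcs: pstdev(ipcs))

# Two candidate mixes, each given as the IPCs of its coscheduled threads.
mixes = [(2.1, 0.3), (1.1, 1.0)]
print(most_balanced(mixes))  # (1.1, 1.0): smallest IPC spread
```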

What About Priorities?  Scheduler estimates the “natural” IPC of a job  If a high-priority job is not meeting its desired IPC, it is exclusively scheduled on the CPU  Provides a truer implementation of priority: –Normal schedulers only guarantee proportional resource sharing, assuming no interaction between jobs
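The promotion decision above can be sketched as a predicate. The 90% threshold and the job fields are assumptions chosen for illustration, not values from the cited work.

```python
def needs_exclusive(job, measured_ipc, threshold=0.9):
    """A high-priority job falling below a fraction of its estimated
    'natural' (solo) IPC gets the processor to itself."""
    return job["high_priority"] and measured_ipc < threshold * job["natural_ipc"]

job = {"name": "hp", "high_priority": True, "natural_ipc": 2.0}
print(needs_exclusive(job, 1.5))  # True: 1.5 < 0.9 * 2.0
print(needs_exclusive(job, 1.9))  # False: meeting its target
```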

Another Priority Algorithm:  SMT hardware fetches instructions to issue from a queue  Scheduler can bias the fetch algorithm to give preference to high-priority threads  The hardware already exists, so modifications are minimal

Symbiosis Performance Results  Without priorities: –Up to 17% improvement  Software-enforced priorities: –Up to 20%, average 8%  Hardware-based priorities: –Up to 30%, average 15%

Linux 2.5 SMT Scheduling  Immediate reschedule forced when HT CPU is executing two idle processes  HT-aware affinity: processes prefer same physical CPU  HT-aware load-balancing: distinguish logical and physical CPU in resource allocation

Conclusions  Intelligent allocation of resources can improve performance in parallel systems  Dynamic scheduling of processors in MP systems produces better utilization as processor speeds increase –Cache affinity can help improve throughput  Symbiotic coscheduling of tasks in SMT systems can improve average response time

Resources  Kenneth Sevcik, “Characterizations of Parallelism in Applications and Their Use in Scheduling”  Raj Vaswani and John Zahorjan, “The Implications of Cache Affinity on Processor Scheduling for Multiprogrammed, Shared Memory Multiprocessors”  Allan Snavely et al., “Symbiotic Jobscheduling with Priorities for a Simultaneous Multithreading Processor”  Linux MP cache affinity  Linux Hyperthreading Scheduler, y/Hyperthread_Scheduler_Modifications.html  Daniel Bovet and Marco Cesati, Understanding the Linux Kernel