Synchronization and Scheduling in Multiprocessor Operating Systems

Chapter 10: Synchronization and Scheduling in Multiprocessor Operating Systems
Operating Systems, by Dhananjay Dhamdhere. Copyright © 2008

Introduction
Architecture of Multiprocessor Systems
Issues in Multiprocessor Operating Systems
Kernel Structure
Process Synchronization
Process Scheduling
Case Studies

Architecture of Multiprocessor Systems
Performance of a uniprocessor system depends on CPU performance, memory performance, and caches
Further improvements in system performance can be obtained only by using multiple CPUs

Architecture of Multiprocessor Systems (continued) [figure slide]

Architecture of Multiprocessor Systems (continued)
Use of a cache coherence protocol is crucial to ensure that caches do not contain stale copies of data
Snooping-based approach (bus interconnection): each CPU snoops on the bus to analyze traffic and eliminate stale copies
  Write-invalidate variant: at a write, the CPU updates memory and invalidates copies in other caches
Directory-based approach: a directory contains information about the copies held in caches
TLB coherence is an analogous problem; its solution is the TLB shootdown action
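
To make the write-invalidate idea concrete, here is a deliberately simplified sketch in C (not the protocol of any particular machine): each cache line is modeled as INVALID, SHARED, or MODIFIED, and a write by one CPU broadcasts an invalidation that removes stale copies from all other caches. The cache sizes, state set, and function names are invented for illustration.

    #include <stdio.h>

    #define NCACHES 4
    #define NLINES  8

    /* Simplified per-line states of a write-invalidate protocol (MSI-like). */
    typedef enum { INVALID, SHARED, MODIFIED } line_state;

    static line_state cache[NCACHES][NLINES];   /* state of each line in each cache */

    /* Bus "snoop": every other cache invalidates its copy of the line. */
    static void broadcast_invalidate(int writer, int line) {
        for (int c = 0; c < NCACHES; c++)
            if (c != writer && cache[c][line] != INVALID)
                cache[c][line] = INVALID;       /* stale copy eliminated */
    }

    /* CPU 'id' reads a line: a miss fetches it in SHARED state. */
    static void cpu_read(int id, int line) {
        if (cache[id][line] == INVALID)
            cache[id][line] = SHARED;
    }

    /* CPU 'id' writes a line: other copies are invalidated first. */
    static void cpu_write(int id, int line) {
        broadcast_invalidate(id, line);
        cache[id][line] = MODIFIED;
    }

    int main(void) {
        cpu_read(0, 3);
        cpu_read(1, 3);          /* line 3 is now SHARED in caches 0 and 1 */
        cpu_write(0, 3);         /* cache 1's copy is invalidated */
        printf("cache 1, line 3 state = %d (0 = INVALID)\n", cache[1][3]);
        return 0;
    }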

Architecture of Multiprocessor Systems (continued)
Multiprocessor systems are classified according to the manner of associating CPUs and memory units:
Uniform memory access (UMA) architecture: previously called a tightly coupled multiprocessor; also called a symmetrical multiprocessor (SMP). Examples: the Sequent Balance system and the VAX 8800
Nonuniform memory access (NUMA) architecture. Examples: the HP AlphaServer and the IBM NUMA-Q
No-remote-memory-access (NORMA) architecture. Example: the Hypercube system by Intel; it is actually a distributed system (discussed later)

Architecture of Multiprocessor Systems (continued) [figure slide]

SMP Architecture
SMP systems popularly use a bus or a crossbar switch as the interconnection network
Only one conversation can be in progress over the bus at any time; other conversations are delayed, so CPUs face unpredictable delays in accessing memory and the bus may become a bottleneck
With a crossbar switch, performance is better and switch delays are more predictable
Cache coherence protocols add to the delays
SMP systems therefore do not scale well beyond a small number of CPUs

NUMA Architecture
Actual performance of a NUMA system depends on the nonlocal memory accesses made by processes

Issues in Multiprocessor Operating Systems
Synchronization and scheduling algorithms should be scalable, so that system performance does not degrade as the system grows in size

Kernel Structure
The kernel of a multiprocessor OS for an SMP architecture is called an SMP kernel
Any CPU can execute code in the kernel, and many CPUs can do so in parallel
It is based on two fundamental provisions:
  The kernel is reentrant
  CPUs coordinate their activities through synchronization and interprocessor interrupts

Kernel Structure: Synchronization
Mutex locks are used for synchronization
Locking can be coarse-grained or fine-grained; the tradeoff is simplicity versus loss of parallelism
Deadlocks are an issue in fine-grained locking
Parallelism can be ensured without substantial locking overhead by:
  Using separate locks for different kernel functionalities (see the sketch below)
  Partitioning the data structures of a kernel functionality
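
The granularity tradeoff can be pictured with ordinary pthread mutexes. In the hypothetical sketch below, a coarse-grained kernel serializes updates to two unrelated tables behind one lock, while a fine-grained kernel gives each functionality its own lock so the updates can proceed in parallel; the table and lock names are illustrative, not taken from the book.

    #include <pthread.h>

    /* Coarse-grained: one lock serializes all kernel-table updates. */
    static pthread_mutex_t kernel_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Fine-grained: one lock per functionality allows parallel updates,
     * at the cost of more locks and possible deadlocks if lock ordering
     * is not respected. */
    static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t proc_lock  = PTHREAD_MUTEX_INITIALIZER;

    static int inode_table[128];
    static int proc_table[128];

    void update_inode_coarse(int i, int v) {
        pthread_mutex_lock(&kernel_lock);      /* blocks unrelated proc_table updates too */
        inode_table[i] = v;
        pthread_mutex_unlock(&kernel_lock);
    }

    void update_inode_fine(int i, int v) {
        pthread_mutex_lock(&inode_lock);       /* proc_table updates proceed in parallel */
        inode_table[i] = v;
        pthread_mutex_unlock(&inode_lock);
    }

    void update_proc_fine(int i, int v) {
        pthread_mutex_lock(&proc_lock);
        proc_table[i] = v;
        pthread_mutex_unlock(&proc_lock);
    }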

Kernel Structure: Heap Management
Parallelism in heap management can be provided by maintaining several free lists
If each CPU has its own free list, locking is unnecessary, but this would degrade performance because allocation decisions would not be optimal
Alternative: maintain separate free lists that hold free memory areas of different sizes; a CPU locks only the appropriate free list (sketched below)
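
A minimal sketch of the size-segregated alternative, assuming a fixed set of invented size classes: a CPU locks only the free list for the size class it needs, so allocations of different sizes do not contend with each other.

    #include <pthread.h>
    #include <stddef.h>

    #define NCLASSES 4     /* hypothetical size classes: 64, 256, 1024, 4096 bytes */

    struct free_block { struct free_block *next; };

    /* One free list per size class, each with its own lock, so CPUs
     * allocating different sizes do not contend with each other. */
    static struct free_block *free_list[NCLASSES];
    static pthread_mutex_t list_lock[NCLASSES] = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
    };

    static int size_class(size_t n) {        /* map a request size to a class index */
        if (n <= 64)   return 0;
        if (n <= 256)  return 1;
        if (n <= 1024) return 2;
        return 3;
    }

    void *heap_alloc(size_t n) {
        int c = size_class(n);
        pthread_mutex_lock(&list_lock[c]);   /* lock only this size class */
        struct free_block *b = free_list[c];
        if (b)
            free_list[c] = b->next;
        pthread_mutex_unlock(&list_lock[c]);
        return b;                            /* NULL if the class is empty */
    }

    void heap_free(void *p, size_t n) {
        int c = size_class(n);
        struct free_block *b = p;
        pthread_mutex_lock(&list_lock[c]);
        b->next = free_list[c];
        free_list[c] = b;
        pthread_mutex_unlock(&list_lock[c]);
    }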

Kernel Structure: Scheduling
Scheduling suffers from heavy contention for the mutex locks Lrq and Lawt, because every CPU needs to set and release these locks while scheduling
Alternative: partition the processes into subsets and entrust each subset to a CPU for scheduling; this gives fast scheduling but suboptimal performance (see the sketch below)
An SMP kernel provides graceful degradation
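
The partitioned alternative can be sketched as per-CPU ready queues, each guarded by its own lock, so a CPU normally schedules without touching a shared lock. The structure and function names below are invented for the illustration; load balancing between the queues is omitted.

    #include <pthread.h>
    #include <stddef.h>

    #define NCPUS 8

    struct process { struct process *next; int pid; };

    /* One ready queue (and one lock) per CPU instead of a single queue
     * guarded by a global lock. */
    struct run_queue {
        pthread_mutex_t lock;
        struct process *head;
    };

    static struct run_queue rq[NCPUS];

    void rq_init(void) {
        for (int c = 0; c < NCPUS; c++)
            pthread_mutex_init(&rq[c].lock, NULL);
    }

    /* A CPU picks work from its own queue; no cross-CPU lock contention
     * on the common path, at the cost of possible load imbalance. */
    struct process *pick_next(int cpu) {
        pthread_mutex_lock(&rq[cpu].lock);
        struct process *p = rq[cpu].head;
        if (p)
            rq[cpu].head = p->next;
        pthread_mutex_unlock(&rq[cpu].lock);
        return p;
    }

    void enqueue(int cpu, struct process *p) {
        pthread_mutex_lock(&rq[cpu].lock);
        p->next = rq[cpu].head;
        rq[cpu].head = p;
        pthread_mutex_unlock(&rq[cpu].lock);
    }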

Kernel Structure: NUMA Kernel
CPUs in NUMA systems have different access times for local and nonlocal memory
Each node in a NUMA system has its own separate kernel, which exclusively schedules processes whose address spaces are in the local memory of the node
The concept can be generalized: an application region ensures good performance of an application. It has:
  A resource partition with one or more CPUs
  An instance of the kernel

Process Synchronization

Process Synchronization (continued)
Queued locks may not be scalable
In a NUMA system, spin locks may lead to lock starvation
Sleep locks may be preferred to spin locks if the memory or network traffic densities are high
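
For reference, a spin lock itself is only a few lines. The sketch below is a generic test-and-test-and-set lock written with C11 atomics (not any specific kernel's implementation): a waiter spins on an ordinary read and attempts the atomic exchange only when the lock appears free, which reduces, but does not eliminate, the memory traffic caused by spinning.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Test-and-test-and-set spin lock. A statically allocated lock starts
     * unlocked because static atomic_bool is zero-initialized. */
    typedef struct { atomic_bool locked; } spinlock_t;

    void spin_lock(spinlock_t *l) {
        for (;;) {
            /* Spin on a plain load (cache-local) while the lock is held. */
            while (atomic_load_explicit(&l->locked, memory_order_relaxed))
                ;
            /* Lock looks free: try to grab it with one atomic exchange. */
            if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
                return;     /* exchange returned false: we acquired the lock */
        }
    }

    void spin_unlock(spinlock_t *l) {
        atomic_store_explicit(&l->locked, false, memory_order_release);
    }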

Special Hardware for Process Synchronization
The Sequent Balance system uses a special bus called the system link and interface controller (SLIC) for synchronization
Each CPU in the system has a special 64-bit register; each bit implements a spin lock using the SLIC
Spinning on these bits does not generate memory or network traffic

A Scalable Software Scheme for Process Synchronization
Intended for NUMA and NORMA architectures
Provides scalable performance
Minimizes synchronization traffic to nonlocal memory units (NUMA) and over the network (NORMA)
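
The slide does not give the scheme's code, but the widely known MCS queue lock illustrates the same principle: each waiting CPU spins only on a flag in its own queue node, which can be placed in local memory, and the releaser writes to exactly one waiter's flag, so spinning generates no repeated traffic to nonlocal memory or over the network. The C11 sketch below is offered as an analogy, not as the book's algorithm.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* MCS queue lock: waiters form a queue, and each spins only on a flag
     * in its own node (which can live in local memory on a NUMA machine). */
    typedef struct mcs_node {
        _Atomic(struct mcs_node *) next;
        atomic_bool locked;
    } mcs_node;

    typedef struct { _Atomic(mcs_node *) tail; } mcs_lock;

    void mcs_acquire(mcs_lock *l, mcs_node *me) {
        atomic_store(&me->next, NULL);
        atomic_store(&me->locked, true);
        mcs_node *prev = atomic_exchange(&l->tail, me);  /* join the queue */
        if (prev != NULL) {
            atomic_store(&prev->next, me);
            while (atomic_load(&me->locked))             /* spin on our own flag only */
                ;
        }
    }

    void mcs_release(mcs_lock *l, mcs_node *me) {
        mcs_node *succ = atomic_load(&me->next);
        if (succ == NULL) {
            mcs_node *expected = me;
            /* No visible successor: try to reset the queue to empty. */
            if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
                return;
            /* A successor is linking itself in; wait for its pointer to appear. */
            while ((succ = atomic_load(&me->next)) == NULL)
                ;
        }
        atomic_store(&succ->locked, false);              /* hand the lock to one waiter */
    }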

Process Synchronization (continued)
Scheduling-aware synchronization: the adaptive lock
A process waiting for an adaptive lock spins if the holder of the lock is scheduled to run in parallel
Otherwise, the process is preempted and queued, as in a queued lock
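
A minimal sketch of the adaptive idea, assuming a hypothetical scheduler hook owner_is_running() (stubbed out here): spin while the lock holder is executing on some CPU, otherwise block as a sleep lock would.

    #include <pthread.h>
    #include <stdbool.h>

    /* Hypothetical adaptive lock. The key decision needs scheduler state;
     * owner_is_running() stands in for the query "is the current holder
     * executing on some CPU right now?". */
    struct adaptive_lock {
        pthread_mutex_t m;           /* the blocking (sleep-lock) path */
        volatile int holder_running; /* stub state; a real kernel asks the scheduler */
    };

    static bool owner_is_running(const struct adaptive_lock *l) {
        return l->holder_running != 0;   /* stub: replace with a scheduler query */
    }

    void adaptive_acquire(struct adaptive_lock *l) {
        /* Busy-wait only while the holder is executing, because the lock is
         * then expected to be released soon. */
        while (owner_is_running(l))
            if (pthread_mutex_trylock(&l->m) == 0)
                return;
        /* Holder is preempted or the lock is free: block instead of spinning. */
        pthread_mutex_lock(&l->m);
    }

    void adaptive_release(struct adaptive_lock *l) {
        pthread_mutex_unlock(&l->m);
    }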

Process Scheduling
CPU scheduling decisions affect performance: how, when, and where to schedule processes
Affinity scheduling: schedule a process on a CPU where it has executed in the past
  Gives a good cache hit ratio
  Interferes with load balancing across CPUs (see the sketch below)
In an SMP kernel, CPUs can perform their own scheduling
  Prevents the kernel from becoming a bottleneck
  Leads to scheduling anomalies; correcting them requires shuffling of processes
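
A toy illustration of the tension between affinity and load balancing: prefer the CPU the process last ran on unless that CPU's ready queue is much longer than the least-loaded one. The threshold, fields, and function name are invented for the example.

    #define NCPUS 8
    #define IMBALANCE_LIMIT 2   /* arbitrary threshold for this example */

    struct process { int last_cpu; };

    static int queue_len[NCPUS];    /* ready-queue length per CPU */

    /* Prefer the CPU whose cache may still hold the process's working set,
     * but give it up when that CPU is clearly overloaded. */
    int choose_cpu(const struct process *p) {
        int least = 0;
        for (int c = 1; c < NCPUS; c++)
            if (queue_len[c] < queue_len[least])
                least = c;
        if (queue_len[p->last_cpu] - queue_len[least] <= IMBALANCE_LIMIT)
            return p->last_cpu;     /* affinity wins: likely warm cache */
        return least;               /* load balancing wins */
    }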

Example: Process Shuffling in an SMP Kernel
Process shuffling can be implemented by using the assigned workload table (AWT) and interprocessor interrupts (IPIs)
However, it leads to high scheduling overhead; the effect is more pronounced in a system containing a large number of CPUs

Process Scheduling (continued)
Processes of an application should be scheduled on different CPUs at the same time if they use spin locks for synchronization; this is called coscheduling or gang scheduling
A different approach is required when processes exchange messages using a blocking protocol: in some situations, special efforts should be made not to schedule such processes in the same time slice
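
One common way to picture coscheduling is an Ousterhout-style matrix whose rows are time slices and whose columns are CPUs: all processes of a gang are placed in the same row so they run in the same time slice. The sketch below only fills such a matrix and is illustrative, not a real dispatcher; process ids are assumed to be nonzero.

    #include <stdio.h>

    #define NCPUS   4
    #define NSLICES 3

    /* matrix[slice][cpu] holds a process id, or 0 for an idle slot. */
    static int matrix[NSLICES][NCPUS];

    /* Place every member of a gang in the same time slice so that processes
     * that spin-wait for one another actually run in parallel. Returns the
     * slice used, or -1 if no slice has enough free CPUs. */
    int schedule_gang(const int *pids, int n) {
        for (int s = 0; s < NSLICES; s++) {
            int free_slots = 0;
            for (int c = 0; c < NCPUS; c++)
                if (matrix[s][c] == 0)
                    free_slots++;
            if (free_slots >= n) {
                int placed = 0;
                for (int c = 0; c < NCPUS && placed < n; c++)
                    if (matrix[s][c] == 0)
                        matrix[s][c] = pids[placed++];
                return s;
            }
        }
        return -1;
    }

    int main(void) {
        int gang_a[] = {11, 12, 13};
        int gang_b[] = {21, 22};
        printf("gang A placed in slice %d\n", schedule_gang(gang_a, 3));
        printf("gang B placed in slice %d\n", schedule_gang(gang_b, 2));
        return 0;
    }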

Case Studies
Mach
Linux
SMP Support in Windows

Mach
The Mach OS implements scheduling hints: a thread issues a hint to influence processor scheduling
For example, a hands-off hint relinquishes the CPU in favor of a specific thread

Linux
Multiprocessing support was introduced in the 2.0 kernel with coarse-grained locking; the granularity of locks was made finer in later releases, but the kernel remained nonpreemptible until the 2.6 kernel
The kernel provides:
  Spin locks for locking of data structures
  Reader-writer spin locks
  The sequence lock
  Per-CPU data structures to reduce lock contention
Other features: hard and soft affinity, load balancing
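
The sequence lock mentioned above lets readers proceed without acquiring a lock and simply retry if a writer intervened. Below is a minimal userspace sketch of the pattern using C11 atomics; the Linux kernel's own seqlock_t API differs in detail, and writer serialization is assumed to be handled elsewhere. Sequentially consistent atomics are used throughout to keep the sketch simple.

    #include <stdatomic.h>

    /* Minimal sequence-lock pattern: the writer makes the counter odd while it
     * updates, and readers retry if they saw an odd or changed counter. */
    static atomic_uint seq;
    static atomic_int shared_a, shared_b;     /* data protected by the sequence lock */

    void writer_update(int a, int b) {        /* writers assumed serialized elsewhere */
        atomic_fetch_add(&seq, 1);            /* counter becomes odd */
        atomic_store(&shared_a, a);
        atomic_store(&shared_b, b);
        atomic_fetch_add(&seq, 1);            /* counter even again */
    }

    void reader_snapshot(int *a, int *b) {
        unsigned s1, s2;
        do {
            s1 = atomic_load(&seq);
            *a = atomic_load(&shared_a);
            *b = atomic_load(&shared_b);
            s2 = atomic_load(&seq);
        } while ((s1 & 1u) || s1 != s2);      /* writer was active: retry the read */
    }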

SMP Support in Windows
A hyperthreaded CPU is treated as several logical processors
Spin locks provide mutual exclusion over kernel data; a thread holding a spin lock is never preempted
The queued spinlock uses a scalable software implementation scheme
Memory management uses many free lists for parallel access
The process default processor affinity and the thread processor affinity together define a thread's affinity set (its hard affinity); the ideal processor defines soft affinity for a thread
Windows thus uses both hard and soft affinity
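
For the hard-affinity part, the Win32 API exposes a per-thread affinity mask; the short sketch below pins the calling thread to logical processor 0. This illustrates only the user-visible interface, not the kernel's internal affinity handling.

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        /* Restrict the current thread to logical processor 0 (mask bit 0).
         * The return value is the previous affinity mask, or 0 on failure. */
        DWORD_PTR prev = SetThreadAffinityMask(GetCurrentThread(), 1);
        if (prev == 0)
            printf("SetThreadAffinityMask failed: %lu\n", GetLastError());
        else
            printf("previous affinity mask: 0x%llx\n", (unsigned long long)prev);
        return 0;
    }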

Summary
A multiprocessor OS exploits multiple CPUs in a computer to provide high throughput (for the system), computation speedup (for an application), and graceful degradation (of the OS, when faults occur)
Classification of multiprocessors:
  Uniform memory access (UMA) architecture, also called a symmetrical multiprocessor (SMP)
  Nonuniform memory access (NUMA) architecture
The OS must efficiently schedule user processes in parallel
Issues: kernel structure and synchronization delays

Summary (continued)
Multiprocessor OS algorithms must be scalable
Special kinds of locks are used: spin locks and sleep locks
Important scheduling concepts in multiprocessor OSs:
  Affinity scheduling
  Coscheduling
  Process shuffling