Multiprocessors and Multithreading – classroom slides.


Example use of threads - 1
Figure: (a) a sequential process computes, issues an I/O request, and idles until the result is needed; (b) a multithreaded process splits the work between a compute thread and an I/O thread, so computation overlaps the interval between I/O request and I/O complete.

Example use of threads - 2
Figure: a pipeline of three threads: Digitizer, Tracker, Alarm.

Programming Support for Threads
- creation: pthread_create(top-level procedure, args)
- termination: return from the top-level procedure, or explicit kill
- rendezvous: the creator can wait for its children with pthread_join(child_tid)
- synchronization: mutex locks and condition variables
Figure: (a) before thread creation, only the main thread exists; (b) after the main thread calls thread_create(foo, args), the foo thread runs alongside it.

Sample program - thread create/join

int foo(int n)
{
    .....
    return 0;
}

int main()
{
    int f;
    thread_type child_tid;
    .....
    child_tid = thread_create(foo, &f);
    .....
    thread_join(child_tid);
}

Programming with Threads
- synchronization: for coordination of the threads
- communication: for inter-thread sharing of data
  - threads can be on different processors
  - how to achieve sharing in an SMP?
    - software: the OS keeps all threads of a process in the same address space
    - hardware: shared memory and coherent caches
Figure: a producer thread and a consumer thread sharing a buffer.

Need for Synchronization

digitizer()
{
    image_type dig_image;
    int tail = 0;

    loop {
        if (bufavail > 0) {
            grab(dig_image);
            frame_buf[tail mod MAX] = dig_image;
            tail = tail + 1;
            bufavail = bufavail - 1;
        }
    }
}

tracker()
{
    image_type track_image;
    int head = 0;

    loop {
        if (bufavail < MAX) {
            track_image = frame_buf[head mod MAX];
            head = head + 1;
            bufavail = bufavail + 1;
            analyze(track_image);
        }
    }
}

Problem?

Shared data structure
Figure: digitizer and tracker share frame_buf[0..99] and the counter bufavail; the digitizer executes bufavail = bufavail - 1 while the tracker executes bufavail = bufavail + 1; head points to the first valid filled frame in frame_buf, tail to the first empty spot.

Synchronization Primitives
- lock and unlock
  - mutual exclusion among threads
  - busy-waiting vs. blocking
  - pthread_mutex_trylock: no blocking
  - pthread_mutex_lock: blocking
  - pthread_mutex_unlock

Fix number 1 - with locks

digitizer()
{
    image_type dig_image;
    int tail = 0;

    loop {
        thread_mutex_lock(buflock);
        if (bufavail > 0) {
            grab(dig_image);
            frame_buf[tail mod MAX] = dig_image;
            tail = tail + 1;
            bufavail = bufavail - 1;
        }
        thread_mutex_unlock(buflock);
    }
}

tracker()
{
    image_type track_image;
    int head = 0;

    loop {
        thread_mutex_lock(buflock);
        if (bufavail < MAX) {
            track_image = frame_buf[head mod MAX];
            head = head + 1;
            bufavail = bufavail + 1;
            analyze(track_image);
        }
        thread_mutex_unlock(buflock);
    }
}

Problem?

Fix number 2

digitizer()
{
    image_type dig_image;
    int tail = 0;

    loop {
        grab(dig_image);
        thread_mutex_lock(buflock);
        while (bufavail == 0) do nothing;
        thread_mutex_unlock(buflock);
        frame_buf[tail mod MAX] = dig_image;
        tail = tail + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail - 1;
        thread_mutex_unlock(buflock);
    }
}

tracker()
{
    image_type track_image;
    int head = 0;

    loop {
        thread_mutex_lock(buflock);
        while (bufavail == MAX) do nothing;
        thread_mutex_unlock(buflock);
        track_image = frame_buf[head mod MAX];
        head = head + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail + 1;
        thread_mutex_unlock(buflock);
        analyze(track_image);
    }
}

Problem?

Fix number 3

digitizer()
{
    image_type dig_image;
    int tail = 0;

    loop {
        grab(dig_image);
        while (bufavail == 0) do nothing;
        frame_buf[tail mod MAX] = dig_image;
        tail = tail + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail - 1;
        thread_mutex_unlock(buflock);
    }
}

tracker()
{
    image_type track_image;
    int head = 0;

    loop {
        while (bufavail == MAX) do nothing;
        track_image = frame_buf[head mod MAX];
        head = head + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail + 1;
        thread_mutex_unlock(buflock);
        analyze(track_image);
    }
}

Problem?

Condition variables
- pthread_cond_wait: block for a signal
- pthread_cond_signal: signal one waiting thread
- pthread_cond_broadcast: signal all waiting threads

Wait and signal with condition variables
Figure: (a) wait before signal: T1 calls cond_wait(c, m), blocks, and is resumed when T2 calls cond_signal(c); (b) wait after signal: T2's cond_signal(c) arrives before T1's cond_wait(c, m), so the signal is lost and T1 is blocked forever.

Fix number 4 - cond var

digitizer()
{
    image_type dig_image;
    int tail = 0;

    loop {
        grab(dig_image);
        thread_mutex_lock(buflock);
        if (bufavail == 0)
            thread_cond_wait(buf_not_full, buflock);
        thread_mutex_unlock(buflock);
        frame_buf[tail mod MAX] = dig_image;
        tail = tail + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail - 1;
        thread_cond_signal(buf_not_empty);
        thread_mutex_unlock(buflock);
    }
}

tracker()
{
    image_type track_image;
    int head = 0;

    loop {
        thread_mutex_lock(buflock);
        if (bufavail == MAX)
            thread_cond_wait(buf_not_empty, buflock);
        thread_mutex_unlock(buflock);
        track_image = frame_buf[head mod MAX];
        head = head + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail + 1;
        thread_cond_signal(buf_not_full);
        thread_mutex_unlock(buflock);
        analyze(track_image);
    }
}

This solution is correct so long as there is exactly one producer and one consumer.

Gotchas in programming with cond vars

acquire_shared_resource()
{
    thread_mutex_lock(cs_mutex);                    /* T3 is here */
    if (res_state == BUSY)
        thread_cond_wait(res_not_busy, cs_mutex);   /* T2 is here */
    res_state = BUSY;
    thread_mutex_unlock(cs_mutex);
}

release_shared_resource()
{
    thread_mutex_lock(cs_mutex);
    res_state = NOT_BUSY;                           /* T1 is here */
    thread_cond_signal(res_not_busy);
    thread_mutex_unlock(cs_mutex);
}

State of waiting queues
Figure: (a) before T1 signals, T3 waits on cs_mutex and T2 waits on res_not_busy; (b) after T1 signals, T2 has moved to the cs_mutex queue behind T3, so T3 may acquire the mutex (and the resource) before T2 runs.

Defensive programming - retest the predicate

acquire_shared_resource()
{
    thread_mutex_lock(cs_mutex);                    /* T3 is here */
    while (res_state == BUSY)
        thread_cond_wait(res_not_busy, cs_mutex);   /* T2 is here */
    res_state = BUSY;
    thread_mutex_unlock(cs_mutex);
}

release_shared_resource()
{
    thread_mutex_lock(cs_mutex);
    res_state = NOT_BUSY;                           /* T1 is here */
    thread_cond_signal(res_not_busy);
    thread_mutex_unlock(cs_mutex);
}

Threads as a software structuring abstraction
Figure: (a) dispatcher model: a dispatcher thread hands requests from a mailbox to worker threads; (b) team model: peer threads each pick up work from a mailbox; (c) pipelined model: threads form a series of stages.

Threads and OS
Traditional OS: DOS
- memory layout
- protection between user and kernel?
Figure: the DOS address space, with the user program's code and data alongside DOS's code and data, split into User and Kernel regions.

Unix
- memory layout
- protection between user and kernel?
- PCB?
Figure: the Unix address spaces, with each process's (P1, P2) code and data in user space, and the kernel's code and data plus the per-process PCBs in kernel space.

Programs in these traditional OSes are single threaded:
- one PC per program (process), one stack, one set of CPU registers
- if a process blocks (say on disk I/O, network communication, etc.), then there is no progress for the program as a whole

MT Operating Systems
How widespread is support for threads in OSes? Digital Unix, Sun Solaris, Win95, Win NT, Win XP
Process vs. thread?
- in a single-threaded program, the state of the executing program is contained in a process
- in a multithreaded program, the state of the executing program is contained in several "concurrent" threads

Process vs. Thread
- computational state (PC, registers, ...) for each thread
- how is it different from process state?
Figure: user space holds processes P1 and P2 with their code and data; threads T1, T2, T3 run inside a process; kernel space holds the kernel's code and data and one PCB per process.

Figure: memory layout of (a) a single-threaded program, with code, global data, heap, and one stack, versus (b) a multithreaded program, with one stack per thread (stack1-stack4).

Threads
- share the address space of the process
- cooperate to get the job done
Are threads concurrent?
- may be, if the box is a true multiprocessor
- share the same CPU on a uniprocessor
Is threaded code different from non-threaded code?
- protection for data shared among threads
- synchronization among threads

Threads Implementation
User-level threads
- OS independent
- scheduler is part of the runtime system
- thread switch is cheap (save PC, SP, registers)
- scheduling is customizable, i.e., more app control
- a blocking call by one thread blocks the whole process

Figure: user-level threads. Each process (P1, P2) contains its own threads library with threads T1-T3, a thread ready_q, and mutex/cond_var state; the kernel sees only processes and keeps a process ready_q (P1, P2, P3).

Figure: in process P1, the currently executing thread makes a blocking call to the OS; the kernel performs an upcall to the threads library so that another thread can be scheduled.

Solutions to the blocking problem in user-level threads
- non-blocking versions of all system calls
- a polling wrapper in the scheduler for such calls
Switching among user-level threads
- yield voluntarily
- how to make it preemptive? a timer interrupt from the kernel triggers the switch

Kernel-level threads
- expensive thread switch
- makes sense for blocking calls by threads
- kernel becomes complicated: process vs. thread scheduling
- thread packages become non-portable
Problems common to user- and kernel-level threads
- libraries
- the solution is to have thread-safe wrappers for such library calls

Figure: kernel-level threads. A thread-level scheduler in the kernel manages each process's threads (e.g., T1-T3 of P1 and P2), while a process-level scheduler manages the process ready_q (P1, P2, P3).

Solaris Threads
Three kinds: user, lwp, kernel
- user: any number can be created and attached to lwp's
- one-to-one mapping between lwp's and kernel threads
- kernel threads are known to the OS scheduler
- if a kernel thread blocks, the associated lwp and user-level threads block as well

Solaris threads
Figure: user-level threads (T1-T3) in processes P1-P3 are multiplexed onto lwp's, and each lwp is backed by a kernel thread.

Thread safe libraries

/* original version */
void *malloc(size_t size)
{
    .....
    return (memory_pointer);
}

/* thread safe version */
mutex_lock_type cs_mutex;

void *malloc(size_t size)
{
    thread_mutex_lock(cs_mutex);
    .....
    thread_mutex_unlock(cs_mutex);
    return (memory_pointer);
}

Synchronization support
- lock
  - test-and-set instruction

SMP
Figure: several CPUs and the input/output subsystem connected over a shared bus to shared memory.

SMP with per-processor caches
Figure: each CPU has its own cache sitting between it and the shared bus to shared memory.

Cache consistency problem
Figure: threads T1-T3 on processors P1-P3 each hold a copy of location X in their caches, with X also present in shared memory.

X -> X’ Shared Memory P1 Shared bus X -> inv P2 X -> inv P3 T1T2 T3 (b) write-invalidate protocol invalidate -> X -> X’ Shared Memory P1 Shared bus X -> X’ P2 X -> X’ P3 T1T2 T3 (c) write-update protocol update -> Two possible solutions

Given the following details about an SMP (symmetric multiprocessor):
- cache coherence protocol: write-invalidate
- cache-to-memory policy: write-back
Initially the caches are empty, and memory locations A and B contain 10 and 5, respectively.
Consider the following timeline of memory accesses from processors P1, P2, and P3. What are the contents of the caches and memory?

Time (in increasing order):
T1: Load A
T2: Load A
T3: Load A
T4: Store #40, A
T5: Store #30, B

What is multithreading?
- a technique allowing a program to do multiple tasks
Is it a new technique?
- it has existed since the 70's (concurrent Pascal, Ada tasks, etc.)
Why now?
- the emergence of SMPs in particular
- "the time has come for this technology"

Multithreading is attractive because it allows concurrency between I/O and user processing even in a uniprocessor box.
Figure: the threads of a process interleaved on a uniprocessor.

Multiprocessor: First Principles
- processors, memories, interconnection network
- classification: SISD, SIMD, MIMD, MISD
- message-passing MPs: e.g., IBM SP2
- shared address space MPs
  - cache coherent (CC)
    - SMP: a bus-based CC MIMD machine; several vendors: Sun, Compaq, Intel, ...
    - CC-NUMA: SGI Origin 2000
  - non-cache coherent (NCC): Cray T3D/T3E

What is an SMP?
- multiple CPUs in a single box sharing all the resources, such as memory and I/O
Is an SMP more cost effective than two uniprocessor boxes?
- yes: a dual-processor SMP costs roughly 20% more than a uniprocessor
- so even a modest speedup for a program on a dual-processor SMP over a uniprocessor makes it worthwhile