Programming with Shared Memory

Programming with Shared Memory
Threads, accessing shared data, critical sections
ITCS4145/5145, Parallel Programming, B. Wilkinson, Feb 6, 2013, slides8a.ppt

Shared memory multiprocessor system
A single address space exists: each memory location is given a unique address within a single range of addresses, and any memory location can be accessed by any of the processors (cores). Multicore processors are of this type, as are multiprocessor servers such as coit-grid05.uncc.edu, which have both multicore processors and multiple such processors.
[Figure: processors/cores connected to memory locations with addresses 1, 2, 3, ...]

Programming a Shared Memory System
Generally more convenient and efficient than message passing. Can take advantage of shared memory for holding data, rather than explicit message passing to share data. However, access to shared data by different processors needs to be carefully controlled, usually explicitly by the programmer. Shared memory systems have been around for a long time, but with the advent of multi-core systems it has become very important to be able to program them.

Methods for Programming Shared Memory Multiprocessors
Using heavyweight processes.
Using threads explicitly, e.g. Pthreads, Java threads.
Using a sequential programming language such as C, supplemented with compiler directives and libraries for specifying parallelism, e.g. OpenMP. The underlying mechanism in OpenMP is thread-based.
Using a "parallel programming" language, e.g. Ada, UPC – not popular.
We will look mostly at thread APIs and OpenMP.

(Heavyweight) Processes
Basically a self-contained program having its own allocation of memory, stack, registers, instruction pointer, and other resources. Operating systems are often based upon the notion of a process: processor time is shared between processes, switching from one process to another. Switching might occur at regular intervals or when an active process becomes delayed, which offers the opportunity to de-schedule processes blocked from proceeding for some reason, e.g. waiting for an I/O operation to complete. The process is the basic execution unit in MPI; each process runs on a separate core or processor, or several share one.

Fork pattern
As used to dynamically create a process from a process. Both the parent program sequence and the "forked" child program sequence execute at the same time if resources are available.
[Figure: parent process forks a child process; parent and child program sequences then proceed together over time]
Although the general concept of a fork does not require it, the child process created by the Linux fork is a replica of the parent program, with the same instructions and variable declarations even prior to the fork. However, the child process only starts at the fork, and both parent and child then execute onwards together.

Multiple and nested forks pattern
Both the main program and the forked program sequences execute at the same time if resources are available.
[Figure: parent program sequence spawning multiple and nested "forked" child program sequences]

Fork-join pattern
Both the main program and the "forked" program sequence execute at the same time if resources are available.
[Figure: main program forks a "forked" program sequence and later joins it]
An explicit "join" is placed in the calling parent program. The parent will not proceed past this point until the child has terminated; the join acts as a barrier synchronization point for both sequences. The child can terminate before the join is reached, but if not, the parent will wait for it to terminate.

UNIX System Calls
No join routine – use exit() to exit from the child process and wait() to wait for the child to complete:

    pid = fork();
    if (pid == 0) {
        // code to be executed by child
    } else {
        // code to be executed by parent
    }
    if (pid == 0)
        exit(0);      // child exits
    else
        wait(0);      // parent waits for child (join)
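For reference, a minimal self-contained sketch of this fork/exit/wait pattern (the printed messages are illustrative only, not from the original slides):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();                 /* create child process */
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {                     /* code executed by child */
            printf("child: pid %d\n", (int)getpid());
            exit(0);                        /* child exits */
        } else {                            /* code executed by parent */
            wait(NULL);                     /* "join": wait for child to finish */
            printf("parent: child has terminated\n");
        }
        return 0;
    }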

Using processes in shared memory programming
The concept could be used for shared memory parallel programming, but it is not much used because of the overhead of process creation and of not being able to share data directly between processes.

Threads
A separate program sequence that can be executed separately by a processor core, usually within a process. Threads share the memory space and global variables, but have their own instruction pointer and stack. An OS will manage the threads within each process. Example: my desktop i7-3770 quad-core processor supports 8 threads simultaneously (hyperthreading).

Threads in shared memory programming
A common approach, either directly creating threads (a low-level approach) or indirectly (for example through OpenMP).

Really low level – Pthreads
IEEE Portable Operating System Interface (POSIX) standard. Fork-join pattern.
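A minimal sketch of this fork-join pattern with Pthreads (the thread function and its message are illustrative only):

    #include <pthread.h>
    #include <stdio.h>

    void *worker(void *arg) {                       /* "forked" program sequence */
        printf("hello from a thread\n");
        return NULL;
    }

    int main(void) {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);   /* fork */
        /* main program continues here, concurrently with worker */
        pthread_join(tid, NULL);                    /* join */
        return 0;
    }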

Pthreads detached threads
Threads that are not joined are called detached threads. When detached threads terminate, they are destroyed and their resources released. Fork pattern.
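A minimal sketch of the detached (fork-only) case, assuming an illustrative worker function; the sleep is only a crude way to keep the process alive long enough for the detached thread to run:

    #include <pthread.h>
    #include <unistd.h>

    void *worker(void *arg) {
        /* ... do some work ... */
        return NULL;                   /* thread resources released on termination */
    }

    int main(void) {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);
        pthread_detach(tid);           /* mark thread as detached; it is never joined */
        sleep(1);                      /* crude: give the detached thread time to finish */
        return 0;
    }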

Thread pool pattern
Very common to need a group of threads to start together, or to be used together from one execution point. This is the underlying structure of OpenMP: a group of threads is readied to be allocated work and brought into service. Whether the threads already exist or are created just then is an implementation detail; "thread pool" implies the threads are already created, which is probably best as it eliminates thread-creation overhead.
[Figure: main program activating thread sequences from a pool of waiting threads, with a synchronization point as in the fork-join pattern]
As in the fork-join pattern there is a synchronization point; similarly, synchronization can be removed in a thread pool (the "nowait" clause in OpenMP).
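A minimal OpenMP sketch of this structure (the printed message is illustrative only; compile with the compiler's OpenMP flag, e.g. -fopenmp for gcc):

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        #pragma omp parallel             /* team of threads executes this block */
        {
            printf("hello from thread %d\n", omp_get_thread_num());
        }                                /* implicit synchronization (join) here */
        return 0;
    }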

Some basic issues in writing shared memory programs
1. Sharing data

Accessing Shared Data
Accessing shared data needs careful control. Consider two processes, each of which is to add one to a shared data item, x. Location x is read, x + 1 is computed, and the result is written back to the location:

One interleaving of the machine instructions gives the intended result:

    Instruction    Process/thread 1    Process/thread 2
    x = x + 1;     read x
                   compute x + 1
                   write to x
                                       read x
                                       compute x + 1
                                       write to x

    Get x = x + 2 finally.

Another interleaving loses an update:

    Instruction    Process/thread 1    Process/thread 2
    x = x + 1;     read x
                                       read x
                   compute x + 1
                   write to x
                                       compute x + 1
                                       write to x

    Get x = x + 1 finally.
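A minimal Pthreads sketch that exhibits this problem (variable names and iteration count are illustrative only; on most machines the final value is usually less than the expected total because increments are lost):

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    int x = 0;                              /* shared data item */

    void *add_one_many_times(void *arg) {
        for (int i = 0; i < N; i++)
            x = x + 1;                      /* unprotected read / compute / write */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, add_one_many_times, NULL);
        pthread_create(&t2, NULL, add_one_many_times, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("x = %d (expected %d)\n", x, 2 * N);
        return 0;
    }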

Critical Section
A mechanism for ensuring that only one process accesses a particular resource at a time. A critical section is a section of code for accessing the resource; arrange that only one such critical section is executed at a time. This mechanism is known as mutual exclusion. The concept also appears in operating systems.

Locks
The simplest mechanism for ensuring mutual exclusion of critical sections. A lock is a 1-bit variable that is 1 to indicate that a process has entered the critical section and 0 to indicate that no process is in the critical section. It operates much like a door lock: a process coming to the "door" of a critical section and finding it open may enter the critical section, locking the door behind it to prevent other processes from entering. Once the process has finished the critical section, it unlocks the door and leaves.

Control of critical sections through busy waiting
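The slide's code is not reproduced in the transcript. As a rough sketch, busy waiting can be implemented by spinning on an atomically tested-and-set flag (this uses C11 atomics; the original slides may assume a different primitive):

    #include <stdatomic.h>

    atomic_flag lock = ATOMIC_FLAG_INIT;        /* clear = open, set = locked */

    void enter_critical(void) {
        while (atomic_flag_test_and_set(&lock))
            ;                                   /* busy wait (spin) until lock opens */
    }

    void leave_critical(void) {
        atomic_flag_clear(&lock);               /* open the lock again */
    }

    int main(void) {
        enter_critical();
        /* critical section */
        leave_critical();
        return 0;
    }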

Pthreads Lock routines
Locks are implemented in Pthreads with mutually exclusive lock variables, or "mutex" variables:

    ...
    pthread_mutex_lock(&mutex1);
    /* critical section */
    pthread_mutex_unlock(&mutex1);
    ...

If a thread reaches a mutex lock and finds it locked, it will wait for the lock to open. If more than one thread is waiting for the lock to open, the system selects one thread to be allowed to proceed when it does open. Only the thread that locks a mutex can unlock it, and the same mutex variable must be used in both calls.
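A minimal self-contained sketch: the unprotected increment from the earlier example, now protected by a mutex (variable names and counts are illustrative only):

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    int x = 0;
    pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;

    void *add_one_many_times(void *arg) {
        for (int i = 0; i < N; i++) {
            pthread_mutex_lock(&mutex1);      /* enter critical section */
            x = x + 1;
            pthread_mutex_unlock(&mutex1);    /* leave critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, add_one_many_times, NULL);
        pthread_create(&t2, NULL, add_one_many_times, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("x = %d (always %d)\n", x, 2 * N);
        return 0;
    }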

Condition Variables
Often, a critical section is to be executed only if a specific global condition exists; for example, if a certain value of a variable has been reached. With locks alone, the global variable would need to be examined at frequent intervals ("polled") within a critical section – a very time-consuming and unproductive exercise. This can be overcome by introducing so-called condition variables.

Pthread Condition Variables
Pthreads arrangement for signal and wait. Notes: signals are not remembered – threads must already be waiting for a signal to receive it. pthread_cond_wait() unlocks mutex1 so that it can be used by another thread, and relocks it after the thread is woken up. The value of c is checked in both threads.
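The slide's code is not reproduced in the transcript; a rough sketch of the usual arrangement, assuming an illustrative counter c on which one thread waits for the value 0:

    #include <pthread.h>
    #include <stdio.h>

    pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cond1  = PTHREAD_COND_INITIALIZER;
    int c = 10;                                   /* shared counter */

    void *waiter(void *arg) {                     /* acts when c reaches 0 */
        pthread_mutex_lock(&mutex1);
        while (c != 0)                            /* value of c checked here ... */
            pthread_cond_wait(&cond1, &mutex1);   /* releases mutex1 while waiting */
        printf("c reached 0\n");                  /* mutex1 is held again here */
        pthread_mutex_unlock(&mutex1);
        return NULL;
    }

    void *decrementer(void *arg) {
        for (int i = 0; i < 10; i++) {
            pthread_mutex_lock(&mutex1);
            c--;
            if (c == 0)                           /* ... and checked here too */
                pthread_cond_signal(&cond1);
            pthread_mutex_unlock(&mutex1);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, waiter, NULL);
        pthread_create(&t2, NULL, decrementer, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }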

Critical Sections Serializing Code
High-performance programs should have as few critical sections as possible, because their use can serialize the code. Suppose all processes happen to come to their critical sections together: they will execute their critical sections one after the other, and in that situation the execution time becomes almost that of a single processor.

Illustration

Deadlock
Can occur with two processes when one requires a resource held by the other, and the other requires a resource held by the first.

Deadlock
Deadlock can also occur in a circular fashion, with several processes each holding a resource wanted by another.
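A minimal sketch of how such a deadlock arises with two Pthreads mutexes standing in for two resources (the locking order shown is exactly what one should avoid):

    #include <pthread.h>

    pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;   /* resource 1 */
    pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;   /* resource 2 */

    void *thread_a(void *arg) {
        pthread_mutex_lock(&r1);      /* holds r1 ... */
        pthread_mutex_lock(&r2);      /* ... and waits for r2 */
        pthread_mutex_unlock(&r2);
        pthread_mutex_unlock(&r1);
        return NULL;
    }

    void *thread_b(void *arg) {
        pthread_mutex_lock(&r2);      /* holds r2 ... */
        pthread_mutex_lock(&r1);      /* ... and waits for r1: deadlock possible */
        pthread_mutex_unlock(&r1);
        pthread_mutex_unlock(&r2);
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);        /* may never return if deadlock occurs */
        pthread_join(b, NULL);
        return 0;
    }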

Pthreads pthread_mutex_trylock routine
Pthreads offers one routine that can test whether a lock is actually closed without blocking the thread: pthread_mutex_trylock(). It will lock an unlocked mutex and return 0, or will return EBUSY if the mutex is already locked – it might find a use in overcoming deadlock.
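A minimal sketch of using pthread_mutex_trylock() to back off rather than block, continuing the two-resource example above (the retry loop is an illustrative strategy, not taken from the slides):

    #include <pthread.h>

    pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;

    /* acquire both resources; release and retry rather than deadlock */
    void acquire_both(void) {
        for (;;) {
            pthread_mutex_lock(&r1);
            if (pthread_mutex_trylock(&r2) == 0)
                return;                       /* got both locks */
            pthread_mutex_unlock(&r1);        /* r2 busy: back off and retry */
        }
    }

    int main(void) {
        acquire_both();
        /* ... use both resources ... */
        pthread_mutex_unlock(&r2);
        pthread_mutex_unlock(&r1);
        return 0;
    }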

Semaphores
A semaphore s is a positive integer (including zero) operated upon by two operations:
P operation on semaphore s: waits until s is greater than zero, then decrements s by one and allows the process to continue.
V operation on semaphore s: increments s by one and releases one of the waiting processes (if any).

P and V operations are performed indivisibly. The mechanism for activating waiting processes is implicit in the P and V operations; though the exact algorithm is not specified, it is expected to be fair. Processes delayed by P(s) are kept in abeyance until released by a V(s) on the same semaphore. Devised by Dijkstra in 1968: the letter P is from the Dutch word passeren, meaning "to pass"; the letter V is from the Dutch word vrijgeven, meaning "to release".

Mutual exclusion of critical sections can be achieved with one semaphore having the value 0 or 1 (a binary semaphore), which acts as a lock variable, but the P and V operations include a process scheduling mechanism:

    Process 1              Process 2              Process 3
    Noncritical section    Noncritical section    Noncritical section
    ...                    ...                    ...
    P(s)                   P(s)                   P(s)
    Critical section       Critical section       Critical section
    V(s)                   V(s)                   V(s)
    ...                    ...                    ...

General semaphore (or counting semaphore)
Can take on positive values other than zero and one. Provides, for example, a means of recording the number of "resource units" available or used. Can solve producer/consumer problems – more on that in operating system courses. Semaphore routines exist for UNIX processes. They do not exist in Pthreads as such, though they can be written, and do exist in the real-time extension to Pthreads.
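As a rough sketch, the POSIX (real-time extension) semaphore routines can protect a critical section in the same way as P and V, with sem_wait corresponding to P and sem_post to V (the worker function is illustrative only):

    #include <semaphore.h>
    #include <pthread.h>

    sem_t s;                          /* binary semaphore acting as a lock */

    void *worker(void *arg) {
        sem_wait(&s);                 /* P(s): wait until s > 0, then decrement */
        /* critical section */
        sem_post(&s);                 /* V(s): increment s, release a waiter */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        sem_init(&s, 0, 1);           /* initial value 1 -> binary semaphore */
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        sem_destroy(&s);
        return 0;
    }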

Monitor
A suite of procedures that provides the only way to access a shared resource. Only one process can use a monitor procedure at any instant. Could be implemented using a semaphore or lock to protect entry, i.e.:

    monitor_proc1() {
        lock(x);
        // monitor body
        unlock(x);
        return;
    }

A version of a monitor exists in Java threads – see later.

Program example
To sum the elements of an array, a[1000]:

    int sum, i, a[1000];
    sum = 0;
    for (i = 0; i < 1000; i++)
        sum = sum + a[i];

Pthreads program example
n threads are created, each taking numbers from the list to add to their local partial sums. When all numbers have been taken, the threads add their partial sums to a shared location sum. A shared location global_index is used by each thread to select the next element of a[]; after global_index is read, it is incremented in preparation for the next element to be read. Both global_index and sum are modified within critical sections.
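The code itself is not included in the transcript; a rough sketch of the algorithm just described, with the array size, thread count, and sample data as assumptions:

    #include <pthread.h>
    #include <stdio.h>

    #define ARRAY_SIZE  1000
    #define NUM_THREADS 4

    int a[ARRAY_SIZE];
    int global_index = 0;                       /* next element to take - shared */
    int sum = 0;                                /* final sum - shared */
    pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;

    void *slave(void *arg) {
        int local_index, partial_sum = 0;
        do {
            pthread_mutex_lock(&mutex1);        /* critical section: take next index */
            local_index = global_index;
            global_index++;
            pthread_mutex_unlock(&mutex1);

            if (local_index < ARRAY_SIZE)
                partial_sum += a[local_index];
        } while (local_index < ARRAY_SIZE);

        pthread_mutex_lock(&mutex1);            /* critical section: add partial sum */
        sum += partial_sum;
        pthread_mutex_unlock(&mutex1);
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];
        for (int i = 0; i < ARRAY_SIZE; i++)
            a[i] = i + 1;                       /* sample data: 1, 2, ..., 1000 */
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, slave, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);
        printf("sum = %d\n", sum);
        return 0;
    }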

Questions