Shared Memory Programming via Posix threads


Shared Memory Programming via Posix threads Laxmikant Kale CS433

Shared Address Space Model
- All memory is accessible to all processes
- Processes are mapped to processors, typically by a symmetric OS
- Coordination among processes: by sharing variables
- Avoid "stepping on toes": using locks and barriers

Running Example: computing pi
- Area of a circle of radius r: π·r·r
- Ratio of the area of the circle to that of the enclosing square: π/4
- Method: compute a set of random number pairs (in the range 0-1) and count the number of pairs that fall inside the circle; the ratio gives us an estimate for π/4
- In parallel: let each processor compute a different set of random number pairs (in the range 0-1) and count the number of pairs that fall inside the circle

Pi on shared memory

int count;
Lock countLock;

piFunction(int myProcessor) {
  seed s = makeSeed(myProcessor);
  for (i = 0; i < 100000/P; i++) {
    x = random(s);
    y = random(s);
    if (x*x + y*y < 1.0) {
      lock(countLock);
      count++;
      unlock(countLock);
    }
  }
  barrier();
  if (myProcessor == 0)
    printf("pi=%f\n", 4.0*count/100000);
}

main() {
  countLock = createLock();
  parallel(piFunction);
}

The system needs to provide the functions for locks, barriers, and thread (or process) creation.

How fast will this run?
- Assume a perfect shared memory machine (i.e. no problem scaling up because of limited bandwidth to memory)
- But locks are a sequential bottleneck: if you have lots of processors, at any given time you will find most of them in the queue waiting for the lock
- We are doing very little work inside the "locked" critical section, but obtaining the lock is expensive (we will revisit "why?" later)
Can we analyze the performance more precisely?
- Let Tw be the time for computing outside the lock
- Let Tc be the time in getting the lock, doing the critical section work, and unlocking
- Let P be the number of processors

Analysis: How fast will this run?
Can we analyze the performance more precisely? (Recall: Tw is the time computing outside the lock, Tc is the time to get the lock, do the critical section work, and unlock; P is the number of processors.)

Analysis: How fast will this run?
One case is when the lock is the bottleneck; the other case is when the work section is larger than P*Tc. Write expressions for the completion time in both cases. (Tw: work; Tc: critical section.)
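One way to sketch the two cases, under the simplifying assumption that every one of the n iterations enters the critical section (in the pi program only a fraction ~π/4 do, which scales Tc's contribution accordingly):

```latex
T \;\approx\;
\begin{cases}
\dfrac{n}{P}\,(T_w + T_c), & T_w \ge (P-1)\,T_c
  \quad \text{(work dominates; lock waits are hidden)}\\[8pt]
n\,T_c, & T_w < (P-1)\,T_c
  \quad \text{(lock-bound; the $n$ critical sections serialize)}
\end{cases}
```

In the lock-bound case adding processors does not help at all: the critical sections form a sequential chain of total length n·Tc.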

Pi on shared memory: efficient version. Each thread accumulates into a private counter and takes the lock only once:

int count;
Lock countLock;

piFunction(int myProcessor) {
  int c = 0;
  seed s = makeSeed(myProcessor);
  for (i = 0; i < 100000/P; i++) {
    x = random(s);
    y = random(s);
    if (x*x + y*y < 1.0)
      c++;
  }
  lock(countLock);
  count += c;
  unlock(countLock);
  barrier();
  if (myProcessor == 0)
    printf("pi=%f\n", 4.0*count/100000);
}

Real SAS systems
- Posix threads (Pthreads) is a standard for threads-based shared memory programming
- Shared memory calls: just a few, normally standard calls
- In addition, lower-level calls: fetch-and-inc, fetch-and-add

Posix Threads on Origin 2000
Shared memory programming on Origin 2000: important calls

Thread creation and joining:
  pthread_create(pthread_t *threadID, const pthread_attr_t *attr, void *(*functionName)(void *), void *arg);
  pthread_join(pthread_t threadID, void **result);
Locks:
  pthread_mutex_t lock;
  pthread_mutex_lock(&lock);
  pthread_mutex_unlock(&lock);
Condition variables:
  pthread_cond_t cv;
  pthread_cond_init(&cv, (pthread_condattr_t *) 0);
  pthread_cond_wait(&cv, &cv_mutex);
  pthread_cond_broadcast(&cv);
Semaphores, and other calls: follow the web link on the class web page for detailed documentation.

Computing pi (Pthreads): Declarations

/* pgm.c */
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>

#define nThreads 4
#define nSamples 1000000

typedef struct _shared_value {
  pthread_mutex_t lock;
  int value;
} shared_value;

shared_value sval;

Function in each thread

void *doWork(void *id) {
  size_t tid = (size_t) id;
  int nsucc, ntrials, i;
  ntrials = nSamples / nThreads;
  nsucc = 0;
  srand48((long) tid);
  for (i = 0; i < ntrials; i++) {
    double x = drand48();
    double y = drand48();
    if ((x*x + y*y) <= 1.0)
      nsucc++;
  }
  pthread_mutex_lock(&(sval.lock));
  sval.value += nsucc;
  pthread_mutex_unlock(&(sval.lock));
  return 0;
}

Main function: init lock(s), create threads, wait for threads to complete

int main(int argc, char *argv[]) {
  pthread_t tids[nThreads];
  size_t i;
  double est;
  pthread_mutex_init(&(sval.lock), NULL);   /* init lock(s) */
  sval.value = 0;
  printf("Creating Threads\n");
  for (i = 0; i < nThreads; i++)            /* create threads */
    pthread_create(&tids[i], NULL, doWork, (void *) i);
  printf("Created Threads... waiting for them to complete\n");
  for (i = 0; i < nThreads; i++)            /* wait for threads to complete */
    pthread_join(tids[i], NULL);
  printf("Threads Completed...\n");
  est = 4.0 * ((double) sval.value / (double) nSamples);
  printf("Estimated Value of PI = %lf\n", est);
  exit(0);
}

Compiling: Makefile

# Makefile
# for Solaris
FLAGS = -mt
# for Origin 2000
# FLAGS =

pgm: pgm.c
	cc -o pgm $(FLAGS) pgm.c -lpthread

clean:
	rm -f pgm *.o *~

So, do we understand the prog. model? Consider the following code:

Processor A:
  a = 1;
  if (b == 0) {
    if (z == 0) { z = 1; t = 1; }
    a = 0;
  }

Processor B:
  b = 1;
  if (a == 0) {
    if (z == 0) { z = 2; t = 2; }
    b = 0;
  }

(The slide also showed per-processor instruction timelines with cycle numbers: store a, load b, load z, store z, store t for A, and store b, load a, load z, store z, store t for B.)

Expectation: if (z, t) began as (0, 0), they can end as (0, 0), (1, 1), or (2, 2), but not (1, 2) or (2, 1). If each processor allows its instructions to complete out of order (as long as its own results are consistent), the result can be wrong. For example: the store from processor A may get delayed.

Sequential consistency
- So, we want to state that the implementation should disallow such reordering (of one processor's instructions) as seen by other processors
- I.e. it is not enough for processor A to issue its operations in order; they must be seen as completed by others in the same order
- But we don't want to restrict the freedom of the processor any more than really necessary, or speed will suffer
- Sequential consistency: a parallel program should behave as if there were one processor and one memory (and no cache); i.e. the results should be as if the instructions were interleaved in some order

Sequential consistency
- More precisely: the operational semantics behave as if there were a single FIFO queue of memory operations coming from all processors (and there is no cache)
- The architect must keep this contract in mind while building a machine, but the programmer gets a concrete understanding of what to expect from their programs, and it agrees with their intuitions (for most people)
- The architect is NOT required to build such a FIFO queue, just to make sure the system behaves as if there were one

Another example:

Proc 1:          Proc 2:
a = 1;           while (b == 0) ;  /* wait */
b = 1;           print a;

We should not see a 0 printed, right? But a and b may be in different memory modules (or caches), and the change in b may become "visible" to the second process before the change in a. Sequential consistency forces the machine (designer) to make a visible before b is visible.