Multicore Programming (Parallel Computing) HP Training Session 5 Prof. Dan Connors

Shared Memory Programming
- A program is a collection of threads of control; in some languages, threads can be created dynamically, mid-execution.
- Each thread has a set of private variables, e.g., local stack variables.
- Each thread also has a set of shared variables, e.g., static variables, shared common blocks, or the global heap.
- Threads communicate implicitly by writing and reading shared variables.
- Threads coordinate by synchronizing on shared variables.

[Figure: processors P0, P1, ..., Pn, each with a private memory holding its own copy of i (8, 2, 5), all reading and writing a variable s in shared memory.]
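As a concrete illustration, here is a minimal sketch (not from the slides; the function name worker and the two-thread count are illustrative) of shared versus private variables in a Pthreads program:

    #include <pthread.h>
    #include <stdio.h>

    int s = 42;                      /* shared: one copy, visible to every thread */

    void *worker(void *arg) {
        long id = (long)arg;         /* private: each thread's own stack copy */
        printf("thread %ld sees shared s = %d\n", id, s);
        return NULL;
    }

    int main(void) {
        pthread_t t[2];
        for (long i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        return 0;
    }

Both threads read the same s, but each has its own id on its own stack.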

Shared Memory "Code" for Computing a Sum

    static int s = 0;

    Thread 1                          Thread 2
    for i = 0, n/2-1                  for i = n/2, n-1
        s = s + sqr(A[i])                 s = s + sqr(A[i])

The problem is a race condition on variable s in the program. A race condition (or data race) occurs when:
- two processors (or two threads) access the same variable, and at least one does a write;
- the accesses are concurrent (not synchronized), so they could happen simultaneously.

Shared Memory Code for Computing a Sum

    static int s = 0;

    Thread 1                              Thread 2
    ...                                   ...
    compute f(A[i]) and put in reg0       compute f(A[i]) and put in reg0
    reg1 = s                              reg1 = s
    reg1 = reg1 + reg0                    reg1 = reg1 + reg0
    s = reg1                              s = reg1
    ...                                   ...

- Assume A = [3,5], f is the square function, and s = 0 initially.
- For this program to work, s should be 34 at the end, but it may be 34, 9, or 25.
- The atomic operations are the individual reads and writes: you never see half of one number, but the += operation as a whole is not atomic.
- All computation happens in (private) registers.
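The interleaving is easy to set up. Below is a runnable sketch (my own construction, not the slide's code; sum_half is a hypothetical name) in which each thread performs exactly the unsynchronized read-modify-write shown above:

    #include <pthread.h>
    #include <stdio.h>

    static int s = 0;                 /* shared accumulator */
    static int A[2] = {3, 5};

    void *sum_half(void *arg) {
        int i = (int)(long)arg;
        /* Compiles to: reg1 = s; reg1 += A[i]*A[i]; s = reg1.
           Another thread can slip in between the read and the write. */
        s = s + A[i] * A[i];
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, sum_half, (void *)0);
        pthread_create(&t2, NULL, sum_half, (void *)1);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("s = %d (expected 34; 9 or 25 are possible)\n", s);
        return 0;
    }

With this little work per thread the race rarely fires in practice, but nothing rules it out.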

Corrected Code for Computing a Sum

    static int s = 0;
    static lock lk;

    Thread 1                              Thread 2
    local_s1 = 0                          local_s2 = 0
    for i = 0, n/2-1                      for i = n/2, n-1
        local_s1 = local_s1 + sqr(A[i])       local_s2 = local_s2 + sqr(A[i])
    lock(lk);                             lock(lk);
    s = s + local_s1                      s = s + local_s2
    unlock(lk);                           unlock(lk);

- Since addition is associative, it's OK to rearrange the order. Right? (Floating-point numbers are not precisely associative.)
- Most computation is on private variables.
- Sharing frequency is also reduced, which might improve speed.
- But without the lock/unlock pair there is still a race condition on the update of shared s.
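A runnable Pthreads version of the same idea might look like this (a sketch under my own naming; the slides do not give this code). Each thread accumulates into a private local and holds a mutex only for the single update of the shared total:

    #include <pthread.h>
    #include <stdio.h>

    #define N 4
    static int A[N] = {3, 5, 2, 1};
    static int s = 0;
    static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;

    void *sum_half(void *arg) {
        int start = (int)(long)arg;
        int local = 0;                        /* private partial sum */
        for (int i = start; i < start + N / 2; i++)
            local += A[i] * A[i];
        pthread_mutex_lock(&lk);              /* protect the one shared update */
        s += local;
        pthread_mutex_unlock(&lk);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, sum_half, (void *)0);
        pthread_create(&t2, NULL, sum_half, (void *)(long)(N / 2));
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("s = %d\n", s);                /* always 9 + 25 + 4 + 1 = 39 */
        return 0;
    }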

Parallel Programming with Threads

Overview of POSIX Threads
- POSIX: Portable Operating System Interface for UNIX; an interface to operating system utilities.
- PThreads: the POSIX threading interface.
  - System calls to create and synchronize threads.
  - Should be relatively uniform across UNIX-like OS platforms.
- PThreads contain support for:
  - creating parallelism;
  - synchronizing;
  - no explicit support for communication, because shared memory is implicit (a pointer to shared data is passed to a thread).

Forking POSIX Threads

Signature:

    int pthread_create(pthread_t *,
                       const pthread_attr_t *,
                       void * (*)(void *),
                       void *);

Example call:

    errcode = pthread_create(&thread_id, &thread_attribute,
                             &thread_fun, &fun_arg);

- thread_id is the thread id or handle (used to halt, etc.).
- thread_attribute holds various attributes; standard default values are obtained by passing a NULL pointer.
- thread_fun is the function to be run (takes and returns void*).
- fun_arg is an argument that can be passed to thread_fun when it starts.
- errcode will be set nonzero if the create operation fails.

Simple Threading Example

    #include <pthread.h>
    #include <stdio.h>

    /* pthread_t: opaque handle type used to refer to a thread */
    void *SayHello(void *foo) {
        printf("Hello, world!\n");
        return NULL;
    }

    int main() {
        pthread_t threads[16];
        int tn;
        for (tn = 0; tn < 16; tn++) {
            pthread_create(&threads[tn], NULL, SayHello, NULL);
        }
        for (tn = 0; tn < 16; tn++) {
            pthread_join(threads[tn], NULL);
        }
        return 0;
    }

Compile using gcc -lpthread (with modern GCC, gcc -pthread is preferred).

Simple Threading Alternative Example

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    void *print_message_function(void *ptr) {
        char *message = (char *) ptr;
        printf("%s \n", message);
        return NULL;
    }

    int main() {
        pthread_t thread1, thread2;
        char *message1 = "Thread 1";
        char *message2 = "Thread 2";
        int iret1, iret2;

        /* Create independent threads, each of which will execute the function */
        iret1 = pthread_create(&thread1, NULL, print_message_function, (void *) message1);
        iret2 = pthread_create(&thread2, NULL, print_message_function, (void *) message2);

        /* Wait till threads are complete before main continues. Unless we
           wait, we run the risk of executing an exit which will terminate
           the process and all threads before the threads have completed. */
        pthread_join(thread1, NULL);
        pthread_join(thread2, NULL);

        printf("Thread 1 returns: %d\n", iret1);
        printf("Thread 2 returns: %d\n", iret2);
        exit(0);
    }

Loop Level Parallelism
Many applications have parallelism in loops. With threads:

    ... A[n];
    for (int i = 0; i < n; i++)
        ... pthread_create(square, ..., &A[i]);

Problems:
- The overhead of thread creation is nontrivial, and square is not enough work for a separate thread (it takes much less time to square a number than to create a thread).
- Unlike the original example, this would overwrite A[i]; how would you do this if you wanted a separate result array? (See the sketch below.)
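One common answer is to give each thread a block of iterations and a separate place to write results. The following sketch (my own names, square_range and struct task; not the slides' code) writes to a distinct result array and amortizes creation cost over many elements:

    #include <pthread.h>
    #include <stdio.h>

    #define N 8
    static double A[N], result[N];

    struct task { int lo, hi; };            /* half-open range [lo, hi) */

    void *square_range(void *arg) {
        struct task *t = arg;
        for (int i = t->lo; i < t->hi; i++)
            result[i] = A[i] * A[i];        /* separate output array: A survives */
        return NULL;
    }

    int main(void) {
        for (int i = 0; i < N; i++) A[i] = i;
        pthread_t tid[2];
        struct task tasks[2] = { {0, N / 2}, {N / 2, N} };
        for (int t = 0; t < 2; t++)
            pthread_create(&tid[t], NULL, square_range, &tasks[t]);
        for (int t = 0; t < 2; t++)
            pthread_join(tid[t], NULL);
        for (int i = 0; i < N; i++)
            printf("%.0f ", result[i]);
        printf("\n");
        return 0;
    }

Note tasks lives on main's stack, which is safe here only because main joins both threads before returning.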

Thread Model
- A thread is defined as an independent stream of instructions that can be scheduled to run as such by the operating system.
- Comparison to processes: processes require a fair amount of overhead, as they contain information about program resources and program execution state, including:
  - process ID, environment, and working directory;
  - program instructions, registers, stack, and heap;
  - file descriptors, signal actions, shared libraries, and inter-process communication tools (message queues, pipes, semaphores, or shared memory).

Process vs. Thread Model
A thread is "lightweight" because most of the overhead has already been paid through the creation of its process; a thread duplicates only the essential resources it needs to be independently schedulable.

Thread Model
- All threads have access to the same global, shared memory.
- Threads also have their own private data.
- Programmers are responsible for synchronizing access to (protecting) globally shared data.

Shared Data and Threads
- Variables declared outside of main are shared.
- Objects allocated on the heap may be shared (if a pointer to them is passed).
- Variables on the stack are private: passing pointers to these around to other threads can cause problems.
- Sharing is often done by creating a large "thread data" struct, passed into all threads as an argument (see the sketch after the example).

Simple example:

    char *message = "Hello World!\n";
    pthread_create(&thread1, NULL, (void *)&print_fun, (void *)message);
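Here is a sketch of the "thread data" struct idiom (the names struct thread_data and print_fun are chosen for illustration): allocating the struct on the heap avoids handing threads pointers into the creator's stack frame.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct thread_data { int id; const char *message; };

    void *print_fun(void *arg) {
        struct thread_data *d = arg;
        printf("thread %d: %s\n", d->id, d->message);
        free(d);                 /* heap-allocated, so the thread can own it */
        return NULL;
    }

    int main(void) {
        pthread_t t[2];
        for (int i = 0; i < 2; i++) {
            struct thread_data *d = malloc(sizeof *d);  /* heap, not our stack */
            d->id = i;
            d->message = "Hello World!";
            pthread_create(&t[i], NULL, print_fun, d);
        }
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        return 0;
    }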

(Details: Setting Attribute Values)
Once an initialized attribute object exists, changes can be made. For example:

To change the stack size for a thread to 8192 bytes (before calling pthread_create):

    pthread_attr_setstacksize(&my_attributes, (size_t)8192);

To get the stack size:

    size_t my_stack_size;
    pthread_attr_getstacksize(&my_attributes, &my_stack_size);

Other attributes:
- Detached state: set if no other thread will use pthread_join to wait for this thread (improves efficiency).
- Guard size: use to protect against stack overflow.
- Inherit scheduling attributes (from creating thread): or not.
- Scheduling parameter(s): in particular, thread priority.
- Scheduling policy: FIFO or Round Robin.
- Contention scope: with what threads does this thread compete for a CPU.
- Stack address: explicitly dictate where the stack is located.
- Lazy stack allocation: allocate on demand (lazy) or all at once, "up front".
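Putting the attribute lifecycle together, a sketch (my own arrangement; the 1 MiB size is illustrative, since 8192 bytes is below PTHREAD_STACK_MIN on many systems):

    #include <pthread.h>

    void *work(void *arg) { (void)arg; return NULL; }

    int main(void) {
        pthread_attr_t my_attributes;
        pthread_attr_init(&my_attributes);            /* initialize before changing */
        pthread_attr_setstacksize(&my_attributes, (size_t)1 << 20);
        pthread_attr_setdetachstate(&my_attributes, PTHREAD_CREATE_DETACHED);

        pthread_t t;
        pthread_create(&t, &my_attributes, work, NULL);  /* detached: no join */
        pthread_attr_destroy(&my_attributes);         /* safe once create returns */

        pthread_exit(NULL);   /* end main without terminating the detached thread */
    }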

Basic Types of Synchronization: Barrier
Barrier: global synchronization.
- Especially common when running multiple copies of the same function in parallel: SPMD, "Single Program Multiple Data".
- Simple use of barriers: all threads hit the same one.

    work_on_my_problem();
    barrier;
    get_data_from_others();
    barrier;

- More complicated: barriers on branches (or loops), where every path must reach the barrier.

    if (tid % 2 == 0) {
        work1();
        barrier;
    } else {
        barrier;
    }

Creating and Initializing a Barrier
To (dynamically) initialize a barrier, use code similar to this (which sets the number of threads to 3):

    pthread_barrier_t b;
    pthread_barrier_init(&b, NULL, 3);

The second argument specifies an object attribute; using NULL yields the default attributes.

To wait at a barrier, a thread executes:

    pthread_barrier_wait(&b);

This barrier could instead have been statically initialized by assigning an initial value created using the macro PTHREAD_BARRIER_INITIALIZER(3).

Note: barriers are not in all pthreads implementations. (A complete example follows.)
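A complete runnable sketch (my own construction; phase_worker and the three-thread count are illustrative, and it relies on the optional barrier support noted above):

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 3
    static pthread_barrier_t b;

    void *phase_worker(void *arg) {
        long id = (long)arg;
        printf("thread %ld: phase 1 done\n", id);
        pthread_barrier_wait(&b);   /* no one starts phase 2 until all finish phase 1 */
        printf("thread %ld: phase 2\n", id);
        return NULL;
    }

    int main(void) {
        pthread_barrier_init(&b, NULL, NTHREADS);
        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, phase_worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        pthread_barrier_destroy(&b);
        return 0;
    }

All "phase 1" lines print before any "phase 2" line, regardless of scheduling.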

Basic Types of Synchronization: Mutexes
Mutexes: mutual exclusion, aka locks.
- Threads work mostly independently but need to access a common data structure:

    lock *l = alloc_and_init();   /* shared */
    acquire(l);
    access data
    release(l);

- Semaphores give guarantees on "fairness" in getting the lock, but embody the same idea of mutual exclusion.
- Locks only affect the processors using them: pair-wise synchronization.

Mutexes in POSIX Threads
To create a mutex:

    #include <pthread.h>
    pthread_mutex_t amutex = PTHREAD_MUTEX_INITIALIZER;
    /* or, dynamically: */
    pthread_mutex_init(&amutex, NULL);

To use it:

    pthread_mutex_lock(&amutex);
    pthread_mutex_unlock(&amutex);

To deallocate a mutex:

    int pthread_mutex_destroy(pthread_mutex_t *mutex);

Multiple mutexes may be held, but acquiring them in inconsistent order can lead to deadlock:

    thread1        thread2
    lock(a)        lock(b)
    lock(b)        lock(a)
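A minimal working example (my own; the counter and loop count are illustrative) of a mutex protecting a shared read-modify-write:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t amutex = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;

    void *bump(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&amutex);    /* enter critical section */
            counter++;                      /* now effectively atomic */
            pthread_mutex_unlock(&amutex);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump, NULL);
        pthread_create(&t2, NULL, bump, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld (always 200000)\n", counter);
        return 0;
    }

The deadlock sketched above is avoided by always acquiring multiple locks in one agreed-upon global order.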

Summary of Programming with Threads
- POSIX Threads are based on OS features:
  - can be used from multiple languages (need the appropriate header);
  - familiar language for most of the program;
  - the ability to share data is convenient.
- Pitfalls:
  - data race bugs are very nasty to find because they can be intermittent;
  - deadlocks are usually easier to find, but can also be intermittent.
- Researchers are looking at transactional memory as an alternative.
- OpenMP is commonly used today as an alternative.

Pthread Performance
- The primary motivation for Pthreads is performance gains.
- Compared to the cost of creating and managing a process, a thread can be created with much less operating system overhead.
- Managing threads requires fewer system resources.

Example table (50,000 process/thread creations; real/user/sys timings for fork() vs. pthread_create(); the numeric values did not survive transcription, but the slide's point is that pthread_create() is far cheaper than fork() on every platform):

    Platform                                fork()             pthread_create()
                                            real  user  sys    real  user  sys
    AMD 2.4 GHz Opteron (8 cpus/node)
    IBM 1.9 GHz POWER5 p5-575 (8 cpus)
    IBM 1.5 GHz POWER4 (8 cpus/node)
    INTEL 2.4 GHz Xeon (2 cpus/node)
    INTEL 1.4 GHz Itanium2 (4 cpus/node)
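A rough way to reproduce this kind of measurement (a sketch, not the benchmark behind the slide's table; it times 50,000 thread create+join pairs):

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define N 50000
    void *noop(void *arg) { (void)arg; return NULL; }

    int main(void) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++) {
            pthread_t t;
            pthread_create(&t, NULL, noop, NULL);
            pthread_join(t, NULL);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d thread create+join pairs: %.2f s\n", N, secs);
        return 0;
    }

A fork() counterpart replaces the loop body with fork(), _exit(0) in the child, and waitpid() in the parent; comparing the two totals shows the gap the table reports.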