COMP7330/7336 Advanced Parallel and Distributed Computing POSIX Thread API Dr. Xiao Qin Auburn University

Slides:



Advertisements
Similar presentations
Pthreads & Concurrency. Acknowledgements  The material in this tutorial is based in part on: POSIX Threads Programming, by Blaise Barney.
Advertisements

Multi-core Programming Programming with Posix Threads.
Shared Address Space Computing: Programming Alistair Rendell See Chapter 6 or Lin and Synder, Chapter 7 of Grama, Gupta, Karypis and Kumar, and Chapter.
Computer Architecture II 1 Computer architecture II Programming: POSIX Threads OpenMP.
8-1 JMH Associates © 2004, All rights reserved Windows Application Development Chapter 10 - Supplement Introduction to Pthreads for Application Portability.
B.Ramamurthy1 POSIX Thread Programming 2/14/97 B.Ramamurthy.
Programming Shared Address Space Platforms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel.
Threads? Threads allow us to have multiple tasks active at the same time in one executable –think of a server handling multiple connections Each thread.
CSCI-6964: High Performance Parallel & Distributed Computing (HPDC) AE 216, Mon/Thurs 2-3:20 p.m. Pthreads (reading Chp 7 thru 7.9) Prof. Chris Carothers.
UNIT -6 PROGRAMMING SHARED ADDRESS SPACE PLATFORMS THREAD BASICS PREPARED BY:-H.M.PATEL.
Jaehyuk Huh Computer Science, KAIST
Pthread (continue) General pthread program structure –Encapsulate parallel parts (can be almost the whole program) in functions. –Use function arguments.
Introduction to Pthreads. Pthreads Pthreads is a POSIX standard for describing a thread model, it specifies the API and the semantics of the calls. Model.
Multi-threaded Programming with POSIX Threads CSE331 Operating Systems Design.
Threads and Thread Control Thread Concepts Pthread Creation and Termination Pthread synchronization Threads and Signals.
Parallel Programming with PThreads. Threads  Sometimes called a lightweight process  smaller execution unit than a process  Consists of:  program.
CS345 Operating Systems Threads Assignment 3. Process vs. Thread process: an address space with 1 or more threads executing within that address space,
What is a thread? process: an address space with 1 or more threads executing within that address space, and the required system resources for those threads.
Programming with POSIX* Threads Intel Software College.
Programming Shared Address Space Platforms Carl Tropper Department of Computer Science.
Super computers By: Lecturer \ Aisha Dawood. Overview of Programming Models Programming models provide support for expressing concurrency and synchronization.
Pthreads: A shared memory programming model
1 Pthread Programming CIS450 Winter 2003 Professor Jinhua Guo.
IT 325 Operating systems Chapter6.  Threads can greatly simplify writing elegant and efficient programs.  However, there are problems when multiple.
Pthreads.
Threads Tutorial #7 CPSC 261. A thread is a virtual processor Each thread is provided the illusion that it owns a core – Copy of the registers – It is.
CS252: Systems Programming Ninghui Li Based on Slides by Prof. Gustavo Rodriguez-Rivera Topic 11: Thread-safe Data Structures, Semaphores.
U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science Software Systems Advanced Synchronization Emery Berger and Mark Corner University.
Chapter 6 P-Threads. Names The naming convention for a method/function/operation is: – pthread_thing_operation(..) – Where thing is the object used (such.
Programming Shared Address Space Platforms Adapted from Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar ``Introduction to Parallel Computing'',
Super computers By: Lecturer \ Aisha Dawood. Overview of Programming Models Programming models provide support for expressing concurrency and synchronization.
PThread Synchronization. Thread Mechanisms Birrell identifies four mechanisms commonly used in threading systems –Thread creation –Mutual exclusion (mutex)
Thread Basic Thread operations include thread creation, termination, synchronization, data management Threads in the same process share:  Process address.
COMP7330/7336 Advanced Parallel and Distributed Computing Pthread RW Locks - Implementation Dr. Xiao Qin Auburn University
Working with Pthreads. Operations on Threads int pthread_create (pthread_t *thread, const pthread_attr_t *attr, void * (*routine)(void*), void* arg) Creates.
Programming with Threads. Threads  Sometimes called a lightweight process  smaller execution unit than a process  Consists of:  program counter 
COMP7330/7336 Advanced Parallel and Distributed Computing Midterm Exam - Review Dr. Xiao Qin Auburn University
COMP7330/7336 Advanced Parallel and Distributed Computing OpenMP: Programming Model Dr. Xiao Qin Auburn University
pThread synchronization
COMP7330/7336 Advanced Parallel and Distributed Computing POSIX Thread API Dr. Xiao Qin Auburn University
Case Study: Pthread Synchronization Dr. Yingwu Zhu.
Tutorial 4. In this tutorial session we’ll see Threads.
COMP7330/7336 Advanced Parallel and Distributed Computing Pthread RW Locks – An Application Dr. Xiao Qin Auburn University
COMP7330/7336 Advanced Parallel and Distributed Computing Thread Basics Dr. Xiao Qin Auburn University
Auburn University
Auburn University
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Exploratory Decomposition Dr. Xiao Qin Auburn.
Programming Shared-memory Machines
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Thread Basics Dr. Xiao Qin Auburn University.
PThreads.
Principles of Operating Systems Lecture 11
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Principles of Message-Passing Programming.
Boost String API & Threads
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Pthread Mutex Dr. Xiao Qin Auburn University.
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Mapping Techniques Dr. Xiao Qin Auburn University.
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Improving Barrier Performance Dr. Xiao Qin.
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing A bug in the rwlock program Dr. Xiao Qin.
Multithreading Tutorial
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Principles of Message-Passing Programming.
Dr. Xiao Qin Auburn University
Thread synchronization
Programming Shared Address Space Platforms
Lecture 14: Pthreads Mutex and Condition Variables
Programming Shared Address Space Platforms
Multithreading Tutorial
Pthread Prof. Ikjun Yeom TA – Mugyo
Multithreading Tutorial
Multithreading Tutorial
Tutorial 4.
POSIX Threads(pthreads)
Presentation transcript:

COMP7330/7336 Advanced Parallel and Distributed Computing POSIX Thread API Dr. Xiao Qin Auburn University Slides are adopted from Drs. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

Review: Thread Basics: Creation Pthreads provides two basic functions for specifying concurrency in a program: #include int pthread_create ( pthread_t *thread_handle, const pthread_attr_t *attribute, void * (*thread_function)(void *), void *arg); int pthread_join ( pthread_t thread, void **ptr); 2

Thread Basics: Termination A thread can terminate in three ways : If the thread returns from its start routine. pthread_join() If it is canceled by some other thread. The function used here is pthread_cancel(). If its calls pthread_exit() function from within itself. 3

Computing Pi: Basic Idea Generate random numbers in a unit length square. Count the number of points that fall within the largest circle inscribed in the square. Because the area of the circle (pi*r^2) equals to pi/4, the fraction of random points that fall in the circle should approach to pi/4. 4

Computing Pi #include #define MAX_THREADS 512 void *compute_pi (void *);.... main() {... pthread_t p_threads[MAX_THREADS]; pthread_attr_t attr; pthread_attr_init (&attr); for (i=0; i< num_threads; i++) { hits[i] = i; pthread_create(&p_threads[i], &attr, compute_pi, (void *) &hits[i]); } for (i=0; i< num_threads; i++) { pthread_join(p_threads[i], NULL); total_hits += hits[i]; }... } 5

void *compute_pi (void *s) { int seed, i, *hit_pointer; double rand_no_x, rand_no_y; int local_hits; hit_pointer = (int *) s; seed = *hit_pointer; local_hits = 0; for (i = 0; i < sample_points_per_thread; i++) { rand_no_x =(double)(rand_r(&seed))/(double)((2<<14)- 1); rand_no_y =(double)(rand_r(&seed))/(double)((2<<14)- 1); if (((rand_no_x - 0.5) * (rand_no_x - 0.5) + (rand_no_y - 0.5) * (rand_no_y - 0.5)) < 0.25) local_hits ++; seed *= i; } *hit_pointer = local_hits; pthread_exit(0); } 6

Performance Notes Note the use of the function rand_r (instead of superior random number generators such as drand48 ). Executing this on a 4-processor SGI Origin, we observe a 3.91 fold speedup at 32 threads. This corresponds to a parallel efficiency of 0.98! We can also modify the program slightly to observe the effect of false-sharing. The program can also be used to assess the secondary cache line size. 7

Execution time of the compute_pi program. 8

Synchronization Primitives in Pthreads When multiple threads attempt to manipulate the same data item, the results can often be incoherent if proper care is not taken to synchronize them. Consider: /* each thread tries to update variable best_cost as follows */ if (my_cost < best_cost) best_cost = my_cost; Assume that there are two threads, the initial value of best_cost is 100, and the values of my_cost are 50 and 75 at threads t1 and t2. Depending on the schedule of the threads, the value of best_cost could be 50 or 75! The value 75 does not correspond to any serialization of the threads. 9

Mutual Exclusion The code in the previous example corresponds to a critical segment; i.e., a segment that must be executed by only one thread at any time. Critical segments in Pthreads are implemented using mutex locks. Mutex-locks have two states: locked and unlocked. At any point of time, only one thread can lock a mutex lock. A lock is an atomic operation. A thread entering a critical segment first tries to get a lock. It goes ahead when the lock is granted. 10

Mutual Exclusion The Pthreads API provides the following functions for handling mutex-locks: int pthread_mutex_lock ( pthread_mutex_t *mutex_lock); int pthread_mutex_unlock ( pthread_mutex_t *mutex_lock); int pthread_mutex_init ( pthread_mutex_t *mutex_lock, const pthread_mutexattr_t *lock_attr); 11

How to use locks in pthread? We can now write our previously incorrect code segment as: pthread_mutex_t minimum_value_lock;... main() {.... pthread_mutex_init(&minimum_value_lock, NULL);.... } void *find_min(void *list_ptr) {.... pthread_mutex_lock(&minimum_value_lock); if (my_min < minimum_value) minimum_value = my_min; /* and unlock the mutex */ pthread_mutex_unlock(&minimum_value_lock); } 12

Producer-Consumer Using Locks The producer-consumer scenario imposes the following constraints: The producer thread must not overwrite the shared buffer when the previous task has not been picked up by a consumer thread. The consumer threads must not pick up tasks until there is something present in the shared data structure. Individual consumer threads should pick up tasks one at a time. 13

How does the producer behave? pthread_mutex_t task_queue_lock; int task_available;... main() {.... task_available = 0; pthread_mutex_init(&task_queue_lock, NULL);.... } void *producer(void *producer_thread_data) {.... while (!done()) { inserted = 0; create_task(&my_task); while (inserted == 0) { pthread_mutex_lock(&task_queue_lock); if (task_available == 0) { insert_into_queue(my_task); task_available = 1; inserted = 1; } pthread_mutex_unlock(&task_queue_lock); } 14

How does the consumer behave? Why we need a mutex? void *consumer(void *consumer_thread_data) { int extracted; struct task my_task; /* local data structure declarations */ while (!done()) { extracted = 0; while (extracted == 0) { pthread_mutex_lock(&task_queue_lock); if (task_available == 1) { extract_from_queue(&my_task); task_available = 0; extracted = 1; } pthread_mutex_unlock(&task_queue_lock); } process_task(my_task); } 15

Normal Mutex vs. Recursive Mutex Normal mutex deadlocks if a thread that already has a lock tries a second lock on it. A recursive mutex allows a single thread to lock a mutex as many times as it wants. It simply increments a count on the number of locks. A lock is relinquished by a thread when the count becomes zero. 16

Error-Check Mutexes An error check mutex reports an error when a thread with a lock tries to lock it again (as opposed to deadlocking in the first case, or granting the lock, as in the second case). The type of the mutex can be set in the attributes object before it is passed at time of initialization. 17

Overheads of Locking Locks represent serialization points. What happens if we encapsulate large segments of the program within locks? Reduce idling overhead: int pthread_mutex_trylock ( pthread_mutex_t *mutex_lock); pthread_mutex_trylock is typically much faster than pthread_mutex_lock on typical systems since it does not have to deal with queues associated with locks for multiple threads waiting on the lock. 18

Alleviating Locking Overhead (Example) /* Finding k matches in a list */ void *find_entries(void *start_pointer) { /* This is the thread function */ struct database_record *next_record; int count; current_pointer = start_pointer; do { next_record = find_next_entry(current_pointer); count = output_record(next_record); } while (count < requested_number_of_records); } int output_record(struct database_record *record_ptr) { int count; pthread_mutex_lock(&output_count_lock); output_count ++; count = output_count; pthread_mutex_unlock(&output_count_lock); if (count <= requested_number_of_records) print_record(record_ptr); return (count); } 19