Presentation is loading. Please wait.

Presentation is loading. Please wait.

Parallel Programming with PThreads. Threads  Sometimes called a lightweight process  smaller execution unit than a process  Consists of:  program.

Similar presentations


Presentation on theme: "Parallel Programming with PThreads. Threads  Sometimes called a lightweight process  smaller execution unit than a process  Consists of:  program."— Presentation transcript:

1 Parallel Programming with PThreads

2 Threads  Sometimes called a lightweight process  smaller execution unit than a process  Consists of:  program counter  register set  stack space  Threads share:  memory space  code section  OS resources(open files, signals, etc.)  Sometimes called a lightweight process  smaller execution unit than a process  Consists of:  program counter  register set  stack space  Threads share:  memory space  code section  OS resources(open files, signals, etc.)

3 Threads

4 POSIX Threads  Thread API available on many OS’s  #include  cc myprog.c –o myprog -lpthread  Thread creation  int pthread_create(pthread_t * thread, pthread_attr_t * attr, void * (*start_routine)(void *), void * arg);  Thread termination  void pthread_exit(void *retval);  Waiting for Threads  int pthread_join(pthread_t th, void **thread_return);  Thread API available on many OS’s  #include  cc myprog.c –o myprog -lpthread  Thread creation  int pthread_create(pthread_t * thread, pthread_attr_t * attr, void * (*start_routine)(void *), void * arg);  Thread termination  void pthread_exit(void *retval);  Waiting for Threads  int pthread_join(pthread_t th, void **thread_return);

5

6 Thread Issues  Timing  False Sharing  Variables are not shared but are so close together, they are within the same cache line.  writes to a shared cache line invalidate other processes caches.  Locking Overhead  Timing  False Sharing  Variables are not shared but are so close together, they are within the same cache line.  writes to a shared cache line invalidate other processes caches.  Locking Overhead

7 Thread Timing  Sc enario 1:  Thread T1 creates thread T2  T2 requires data from T1 that will be placed in global memory  T1 places the data in global memory after it creates T2  Assumes T1 will be able to place the data before T2 starts or before T2 needs the data  Scenario 2:  T1 creates T2 and needs to pass data to T2  T1 gives T2 a pointer to a variable on its stack  What happens if / when T1 finishes?  Sc enario 1:  Thread T1 creates thread T2  T2 requires data from T1 that will be placed in global memory  T1 places the data in global memory after it creates T2  Assumes T1 will be able to place the data before T2 starts or before T2 needs the data  Scenario 2:  T1 creates T2 and needs to pass data to T2  T1 gives T2 a pointer to a variable on its stack  What happens if / when T1 finishes?

8 Thread Timing  Set up all requirements before creating the thread  It is possible that the created thread will start and run to completion before the creating thread gets scheduled again  Producer – Consumer relationships  make sure the data is placed before it is needed (Don’t rely on timing)  make sure the data is there before consuming (Don’t rely on timing)  make sure the data lasts until all potential consumers have consumed the data (Don’t rely on timing)  Use synchronization primitives to enforce order  Set up all requirements before creating the thread  It is possible that the created thread will start and run to completion before the creating thread gets scheduled again  Producer – Consumer relationships  make sure the data is placed before it is needed (Don’t rely on timing)  make sure the data is there before consuming (Don’t rely on timing)  make sure the data lasts until all potential consumers have consumed the data (Don’t rely on timing)  Use synchronization primitives to enforce order

9 False Sharing Location ALocation B Cache Line Location ALocation B Cache Memory Read A Write A (invalidate cache line) Location ALocation B Cache P1P2 Read B Write B (invalidate cache line) Variables are not shared but are so close together, they are within the same cache line.

10 Effects of False Sharing

11 Synchronization Primitives  int pthread_mutex_init( pthread_mutex_t *mutex_lock, const pthread_mutexattr_t *lock_attr);  int pthread_mutex_lock( pthread_mutex_t *mutex_lock);  int pthread_mutex_unlock( pthread_mutex_t *mutex_lock);  int pthread_mutex_trylock( pthread_mutex_t *mutex_lock);  int pthread_mutex_init( pthread_mutex_t *mutex_lock, const pthread_mutexattr_t *lock_attr);  int pthread_mutex_lock( pthread_mutex_t *mutex_lock);  int pthread_mutex_unlock( pthread_mutex_t *mutex_lock);  int pthread_mutex_trylock( pthread_mutex_t *mutex_lock);

12 #include void *find_min(void *list_ptr) pthread_mutex_t minimum_value_lock; int minimum_value, partial_list_size; main(){ minimum_value = MIN_INT; pthread_init(); pthread_mutex_init(&minimum_value_lock, NULL); /*inititalize lists etc, create and join threads*/ } void *find_min(void *list_ptr){ int *partial_list_ptr, my_min = MIN_INT, i; partial_list_ptr = (int *)list_ptr; for (i = 0; i < partial_list_size; i++) if (partial_list_ptr[i] < my_min) my_min = partial_list_ptr[i]; pthread_mutex_lock(minimum_value_lock); if (my_min < minimum_value) minimum_value = my_min; pthread_mutex_unlock(minimum_value_lock); pthread_exit(0); }

13 Locking Overhead  Serialization points  Minimize the size of critical sections  Be careful  Rather than wait, check if lock is available  pthread_mutex_trylock  If already locked, will return EBUSY  Will require restructuring of code  Serialization points  Minimize the size of critical sections  Be careful  Rather than wait, check if lock is available  pthread_mutex_trylock  If already locked, will return EBUSY  Will require restructuring of code

14 /* Finding k matches in a list */ void *find_entries(void *start_pointer) { /* This is the thread function */ struct database_record *next_record; int count; current_pointer = start_pointer; do { next_record = find_next_entry(current_pointer); count = output_record(next_record); } while (count < requested_number_of_records); } int output_record(struct database_record *record_ptr) { int count; pthread_mutex_lock(&output_count_lock); output_count ++; count = output_count; pthread_mutex_unlock(&output_count_lock); if (count <= requested_number_of_records) print_record(record_ptr); return (count); }

15 /* rewritten output_record function */ int output_record(struct database_record *record_ptr) { int count; int lock_status; lock_status=pthread_mutex_trylock(&output_count_lock); if (lock_status == EBUSY) { insert_into_local_list(record_ptr); return(0); } else { count = output_count; output_count += number_on_local_list + 1; pthread_mutex_unlock(&output_count_lock); print_records(record_ptr, local_list, requested_number_of_records - count); return(count + number_on_local_list + 1); }

16 Mutex features/issues  Limited to just mutex  Use posix semaphores for more control  Can only have one process in mutex  What if you get in and then realize things aren’t quite ready yet?  Must exit the mutex and start over  Can we avoid going to the back of the line?  Limited to just mutex  Use posix semaphores for more control  Can only have one process in mutex  What if you get in and then realize things aren’t quite ready yet?  Must exit the mutex and start over  Can we avoid going to the back of the line?

17 Condition Variables for Synchronization  A condition variable allows a thread to block itself until specified data reaches a predefined state.  A condition variable is associated with a predicate.  When the predicate becomes true, the condition variable is used to signal one or more threads waiting on the condition.  A single condition variable may be associated with more than one predicate.  A condition variable always has a mutex associated with it.  A thread locks this mutex and tests the predicate defined on the shared variable.  If the predicate is not true, the thread waits on the condition variable associated with the predicate using the function pthread_cond_wait.  A condition variable allows a thread to block itself until specified data reaches a predefined state.  A condition variable is associated with a predicate.  When the predicate becomes true, the condition variable is used to signal one or more threads waiting on the condition.  A single condition variable may be associated with more than one predicate.  A condition variable always has a mutex associated with it.  A thread locks this mutex and tests the predicate defined on the shared variable.  If the predicate is not true, the thread waits on the condition variable associated with the predicate using the function pthread_cond_wait.

18 Using Condition Variables Main Thread * Declare and initialize global data/variables which require synchronization (such as "count") * Declare and initialize global data/variables which require synchronization (such as "count") * Declare and initialize a condition variable object * Declare and initialize a condition variable object * Declare and initialize an associated mutex * Declare and initialize an associated mutex * Create threads A and B to do work * Create threads A and B to do work Thread A * Do work up to the point where a certain condition must occur (such as "count" must reach a specified value) * Do work up to the point where a certain condition must occur (such as "count" must reach a specified value) * Lock associated mutex and check value of a global variable * Lock associated mutex and check value of a global variable * Call pthread_cond_wait() to perform a blocking wait for signal from Thread-B. * Call pthread_cond_wait() to perform a blocking wait for signal from Thread-B. Note that a call to pthread_cond_wait() automatically and atomically unlocks Note that a call to pthread_cond_wait() automatically and atomically unlocks the associated mutex variable so that it can be used by Thread-B. the associated mutex variable so that it can be used by Thread-B. * When signalled, wake up. Mutex is automatically and atomically locked. * When signalled, wake up. Mutex is automatically and atomically locked. * Explicitly unlock mutex * Explicitly unlock mutex * Continue * Continue Thread B * Do work * Do work * Lock associated mutex * Lock associated mutex * Change the value of the global variable that Thread-A is waiting upon. * Change the value of the global variable that Thread-A is waiting upon. * Check value of the global Thread-A wait variable. If it fulfills the desired condition, signal Thread-A. * Check value of the global Thread-A wait variable. If it fulfills the desired condition, signal Thread-A. * Unlock mutex. * Unlock mutex. * Continue * Continue

19 Condition Variables for Synchronization  Pthreads provides the following functions for condition variables: int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex); int pthread_cond_signal(pthread_cond_t *cond); int pthread_cond_broadcast(pthread_cond_t *cond); int pthread_cond_init(pthread_cond_t *cond, const pthread_condattr_t *attr); int pthread_cond_destroy(pthread_cond_t *cond);  Pthreads provides the following functions for condition variables: int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex); int pthread_cond_signal(pthread_cond_t *cond); int pthread_cond_broadcast(pthread_cond_t *cond); int pthread_cond_init(pthread_cond_t *cond, const pthread_condattr_t *attr); int pthread_cond_destroy(pthread_cond_t *cond);

20 Producer-Consumer Using Locks pthread_mutex_t task_queue_lock; int task_available;... main() {.... task_available = 0; pthread_mutex_init(&task_queue_lock, NULL);.... } void *producer(void *producer_thread_data) {.... while (!done()) { inserted = 0; create_task(&my_task); while (inserted == 0) { pthread_mutex_lock(&task_queue_lock); if (task_available == 0) { insert_into_queue(my_task); task_available = 1; inserted = 1; } pthread_mutex_unlock(&task_queue_lock); } pthread_mutex_t task_queue_lock; int task_available;... main() {.... task_available = 0; pthread_mutex_init(&task_queue_lock, NULL);.... } void *producer(void *producer_thread_data) {.... while (!done()) { inserted = 0; create_task(&my_task); while (inserted == 0) { pthread_mutex_lock(&task_queue_lock); if (task_available == 0) { insert_into_queue(my_task); task_available = 1; inserted = 1; } pthread_mutex_unlock(&task_queue_lock); }

21 Producer-Consumer Using Locks void *consumer(void *consumer_thread_data) { int extracted; struct task my_task; /* local data structure declarations */ while (!done()) { extracted = 0; while (extracted == 0) { pthread_mutex_lock(&task_queue_lock); if (task_available == 1) { extract_from_queue(&my_task); task_available = 0; extracted = 1; } pthread_mutex_unlock(&task_queue_lock); } process_task(my_task); } void *consumer(void *consumer_thread_data) { int extracted; struct task my_task; /* local data structure declarations */ while (!done()) { extracted = 0; while (extracted == 0) { pthread_mutex_lock(&task_queue_lock); if (task_available == 1) { extract_from_queue(&my_task); task_available = 0; extracted = 1; } pthread_mutex_unlock(&task_queue_lock); } process_task(my_task); }

22 Producer-Consumer Using Condition Variables pthread_cond_t cond_queue_empty, cond_queue_full; pthread_mutex_t task_queue_cond_lock; int task_available; /* other data structures here */ main() { /* declarations and initializations */ task_available = 0; pthread_init(); pthread_cond_init(&cond_queue_empty, NULL); pthread_cond_init(&cond_queue_full, NULL); pthread_mutex_init(&task_queue_cond_lock, NULL); /* create and join producer and consumer threads */ } pthread_cond_t cond_queue_empty, cond_queue_full; pthread_mutex_t task_queue_cond_lock; int task_available; /* other data structures here */ main() { /* declarations and initializations */ task_available = 0; pthread_init(); pthread_cond_init(&cond_queue_empty, NULL); pthread_cond_init(&cond_queue_full, NULL); pthread_mutex_init(&task_queue_cond_lock, NULL); /* create and join producer and consumer threads */ }

23 Producer-Consumer Using Condition Variables void *producer(void *producer_thread_data) { int inserted; while (!done()) { create_task(); pthread_mutex_lock(&task_queue_cond_lock); while (task_available == 1) pthread_cond_wait(&cond_queue_empty, &task_queue_cond_lock); insert_into_queue(); task_available = 1; pthread_cond_signal(&cond_queue_full); pthread_mutex_unlock(&task_queue_cond_lock); } void *producer(void *producer_thread_data) { int inserted; while (!done()) { create_task(); pthread_mutex_lock(&task_queue_cond_lock); while (task_available == 1) pthread_cond_wait(&cond_queue_empty, &task_queue_cond_lock); insert_into_queue(); task_available = 1; pthread_cond_signal(&cond_queue_full); pthread_mutex_unlock(&task_queue_cond_lock); }

24 Producer-Consumer Using Condition Variables void *consumer(void *consumer_thread_data) { while (!done()) { pthread_mutex_lock(&task_queue_cond_lock); while (task_available == 0) pthread_cond_wait(&cond_queue_full, &task_queue_cond_lock); my_task = extract_from_queue(); task_available = 0; pthread_cond_signal(&cond_queue_empty); pthread_mutex_unlock(&task_queue_cond_lock); process_task(my_task); } void *consumer(void *consumer_thread_data) { while (!done()) { pthread_mutex_lock(&task_queue_cond_lock); while (task_available == 0) pthread_cond_wait(&cond_queue_full, &task_queue_cond_lock); my_task = extract_from_queue(); task_available = 0; pthread_cond_signal(&cond_queue_empty); pthread_mutex_unlock(&task_queue_cond_lock); process_task(my_task); }

25 Condition Variables  Rather than just signaling one blocked thread, we can signal all  int pthread_cond_broadcast(pthread_cond_t *cond)  Can also have a timeout  int pthread_cond_timedwait( pthread_cond_t *cond, pthread_mutex_t *mutex, const struct timespec *abstime)  Rather than just signaling one blocked thread, we can signal all  int pthread_cond_broadcast(pthread_cond_t *cond)  Can also have a timeout  int pthread_cond_timedwait( pthread_cond_t *cond, pthread_mutex_t *mutex, const struct timespec *abstime)

26 #include #define NUM_THREADS 3 #define TCOUNT 10 #define COUNT_LIMIT 12 int count = 0; int thread_ids[5] = {0,1,2,3,4}; pthread_mutex_t count_mutex; pthread_cond_t count_threshold_cv; int main(int argc, char *argv[]) { int i, rc; pthread_t threads[5]; pthread_attr_t attr; /* Initialize mutex and condition variable objects */ pthread_mutex_init(&count_mutex, NULL); pthread_cond_init (&count_threshold_cv, NULL); /* For portability, explicitly create threads in a joinable state */ pthread_attr_init(&attr); pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); pthread_create(&threads[4], &attr, watch_count, (void *)&thread_ids[4]); pthread_create(&threads[3], &attr, watch_count, (void *)&thread_ids[3]); pthread_create(&threads[2], &attr, watch_count, (void *)&thread_ids[2]); pthread_create(&threads[1], &attr, inc_count, (void *)&thread_ids[1]); pthread_create(&threads[0], &attr, inc_count, (void *)&thread_ids[0]); /* Wait for all threads to complete */ for (i = 0; i < NUM_THREADS; i++) { pthread_join(threads[i], NULL); } printf ("Main(): Waited on %d threads. Done.\n", NUM_THREADS); /* Clean up and exit */ pthread_attr_destroy(&attr); pthread_mutex_destroy(&count_mutex); pthread_cond_destroy(&count_threshold_cv); pthread_exit (NULL); } #include #define NUM_THREADS 3 #define TCOUNT 10 #define COUNT_LIMIT 12 int count = 0; int thread_ids[5] = {0,1,2,3,4}; pthread_mutex_t count_mutex; pthread_cond_t count_threshold_cv; int main(int argc, char *argv[]) { int i, rc; pthread_t threads[5]; pthread_attr_t attr; /* Initialize mutex and condition variable objects */ pthread_mutex_init(&count_mutex, NULL); pthread_cond_init (&count_threshold_cv, NULL); /* For portability, explicitly create threads in a joinable state */ pthread_attr_init(&attr); pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); pthread_create(&threads[4], &attr, watch_count, (void *)&thread_ids[4]); pthread_create(&threads[3], &attr, watch_count, (void *)&thread_ids[3]); pthread_create(&threads[2], &attr, watch_count, (void *)&thread_ids[2]); pthread_create(&threads[1], &attr, inc_count, (void *)&thread_ids[1]); pthread_create(&threads[0], &attr, inc_count, (void *)&thread_ids[0]); /* Wait for all threads to complete */ for (i = 0; i < NUM_THREADS; i++) { pthread_join(threads[i], NULL); } printf ("Main(): Waited on %d threads. Done.\n", NUM_THREADS); /* Clean up and exit */ pthread_attr_destroy(&attr); pthread_mutex_destroy(&count_mutex); pthread_cond_destroy(&count_threshold_cv); pthread_exit (NULL); }

27 void *inc_count(void *idp) { int j,i; double result=0.0; int *my_id = idp; for (i=0; i < TCOUNT; i++) { pthread_mutex_lock(&count_mutex); count++; /* Check the value of count and signal waiting thread when condition is reached. Note that this occurs while mutex is locked. */ if (count == COUNT_LIMIT) { pthread_cond_broadcast(&count_threshold_cv); printf("inc_count(): thread %d, count = %d Threshold reached.\n", *my_id, count); } printf("inc_count(): thread %d, count = %d, unlocking mutex\n", *my_id, count); pthread_mutex_unlock(&count_mutex); /* Do some work so threads can alternate on mutex lock */ for (j=0; j < 1000; j++) result = result + (double)random(); } pthread_exit(NULL); } void *inc_count(void *idp) { int j,i; double result=0.0; int *my_id = idp; for (i=0; i < TCOUNT; i++) { pthread_mutex_lock(&count_mutex); count++; /* Check the value of count and signal waiting thread when condition is reached. Note that this occurs while mutex is locked. */ if (count == COUNT_LIMIT) { pthread_cond_broadcast(&count_threshold_cv); printf("inc_count(): thread %d, count = %d Threshold reached.\n", *my_id, count); } printf("inc_count(): thread %d, count = %d, unlocking mutex\n", *my_id, count); pthread_mutex_unlock(&count_mutex); /* Do some work so threads can alternate on mutex lock */ for (j=0; j < 1000; j++) result = result + (double)random(); } pthread_exit(NULL); } void *watch_count(void *idp) { int *my_id = idp; printf("Starting watch_count(): thread %d\n", *my_id); /* Lock mutex and wait for signal. Note that the pthread_cond_wait routine will automatically and atomically unlock mutex while it waits. Also, note that if COUNT_LIMIT is reached before this routine is run by the waiting thread, the loop will be skipped to prevent pthread_cond_wait from never returning. */ pthread_mutex_lock(&count_mutex); if (count < COUNT_LIMIT) { pthread_cond_wait(&count_threshold_cv, &count_mutex); printf("watch_count(): thread %d Condition signal received.\n", *my_id); } pthread_mutex_unlock(&count_mutex); pthread_exit(NULL); } void *watch_count(void *idp) { int *my_id = idp; printf("Starting watch_count(): thread %d\n", *my_id); /* Lock mutex and wait for signal. Note that the pthread_cond_wait routine will automatically and atomically unlock mutex while it waits. Also, note that if COUNT_LIMIT is reached before this routine is run by the waiting thread, the loop will be skipped to prevent pthread_cond_wait from never returning. */ pthread_mutex_lock(&count_mutex); if (count < COUNT_LIMIT) { pthread_cond_wait(&count_threshold_cv, &count_mutex); printf("watch_count(): thread %d Condition signal received.\n", *my_id); } pthread_mutex_unlock(&count_mutex); pthread_exit(NULL); } int count = 0; int thread_ids[5] = {0,1,2,3,4}; pthread_mutex_t count_mutex; pthread_cond_t count_threshold_cv; int count = 0; int thread_ids[5] = {0,1,2,3,4}; pthread_mutex_t count_mutex; pthread_cond_t count_threshold_cv;

28 Composite Synchronization Constructs  By design, Pthreads provide support for a basic set of operations.  Higher level constructs can be built using basic synchronization constructs.  Consider Read-Write Locks and Barriers  By design, Pthreads provide support for a basic set of operations.  Higher level constructs can be built using basic synchronization constructs.  Consider Read-Write Locks and Barriers

29 Barriers  A barrier holds a thread until all threads participating in the barrier have reached it.  Some versions of the Pthreads library support barriers (not required)  pthread_barrier_t barr;  attributes = NULL  pthread_barrier_init(&barr, attributes, nthreads)  pthread_barrier_wait(&barr)  pthread_barrier_destroy(&barr)  pthread barriers are available on most Linux implementations.  A barrier holds a thread until all threads participating in the barrier have reached it.  Some versions of the Pthreads library support barriers (not required)  pthread_barrier_t barr;  attributes = NULL  pthread_barrier_init(&barr, attributes, nthreads)  pthread_barrier_wait(&barr)  pthread_barrier_destroy(&barr)  pthread barriers are available on most Linux implementations.

30 Barrier Implementation  Barriers can be implemented using a counter, a mutex and a condition variable.  A single integer is used to keep track of the number of threads that have reached the barrier.  If the count is less than the total number of threads, the threads execute a condition wait.  The last thread entering (and setting the count to the number of threads) wakes up all the threads using a condition broadcast.  Barriers can be implemented using a counter, a mutex and a condition variable.  A single integer is used to keep track of the number of threads that have reached the barrier.  If the count is less than the total number of threads, the threads execute a condition wait.  The last thread entering (and setting the count to the number of threads) wakes up all the threads using a condition broadcast.

31 Barriers typedef struct { pthread_mutex_t count_lock; pthread_cond_t ok_to_proceed; int count; } mylib_barrier_t; void mylib_init_barrier(mylib_barrier_t *b) { b -> count = 0; pthread_mutex_init(&(b -> count_lock), NULL); pthread_cond_init(&(b -> ok_to_proceed), NULL); } typedef struct { pthread_mutex_t count_lock; pthread_cond_t ok_to_proceed; int count; } mylib_barrier_t; void mylib_init_barrier(mylib_barrier_t *b) { b -> count = 0; pthread_mutex_init(&(b -> count_lock), NULL); pthread_cond_init(&(b -> ok_to_proceed), NULL); }

32 Barriers void mylib_barrier (mylib_barrier_t *b, int num_threads) { pthread_mutex_lock(&(b -> count_lock)); b -> count ++; if (b -> count == num_threads) { b -> count = 0; pthread_cond_broadcast(&(b -> ok_to_proceed)); } else while (pthread_cond_wait(&(b -> ok_to_proceed), &(b -> count_lock)) != 0); pthread_mutex_unlock(&(b -> count_lock)); } void mylib_barrier (mylib_barrier_t *b, int num_threads) { pthread_mutex_lock(&(b -> count_lock)); b -> count ++; if (b -> count == num_threads) { b -> count = 0; pthread_cond_broadcast(&(b -> ok_to_proceed)); } else while (pthread_cond_wait(&(b -> ok_to_proceed), &(b -> count_lock)) != 0); pthread_mutex_unlock(&(b -> count_lock)); }

33 Barriers  Linear barrier.  The trivial lower bound on execution time of this function is O(n) for n threads.  This implementation of a barrier can be speeded up using multiple barrier variables organized in a tree.  Use n/2 condition variable-mutex pairs for implementing a barrier for n threads.  At the lowest level, threads are paired up and each pair of threads shares a single condition variable-mutex pair.  Once both threads arrive, one of the two moves on, the other one waits.  This process repeats up the tree.  This is called a log barrier and its runtime grows as O(log p).  Linear barrier.  The trivial lower bound on execution time of this function is O(n) for n threads.  This implementation of a barrier can be speeded up using multiple barrier variables organized in a tree.  Use n/2 condition variable-mutex pairs for implementing a barrier for n threads.  At the lowest level, threads are paired up and each pair of threads shares a single condition variable-mutex pair.  Once both threads arrive, one of the two moves on, the other one waits.  This process repeats up the tree.  This is called a log barrier and its runtime grows as O(log p).

34 Barrier  Execution time of 1000 sequential and logarithmic barriers as a function of number of threads on a 32 processor SGI Origin 2000.

35 Barriers  Use pthread condition variables and mutexes.  Is this the best way?  Forces a thread to sleep and give up the processor  Rather than give up the processor, just wait on a variable.  busy wait  Will it be faster?  Use pthread condition variables and mutexes.  Is this the best way?  Forces a thread to sleep and give up the processor  Rather than give up the processor, just wait on a variable.  busy wait  Will it be faster?

36 QbusyBarrier void qbusy_init_barrier(qbusy_barrier_t *b, int nthreads) { b -> wait = (int *)malloc(nthreads * sizeof(int)); b -> count = 0; pthread_mutex_init(&(b -> count_lock), NULL); } void qbusy_barrier (qbusy_barrier_t *b, int iproc, int num_threads) { int i; float *tmp; if (num_threads == 1) return; b->wait[iproc] = 1; pthread_mutex_lock(&(b -> count_lock)); b -> count ++; if (b -> count == num_threads) { pthread_mutex_unlock(&(b -> count_lock)); /* Now release the hounds */ for (i = 0; i < num_threads; i++) b->wait[i] = 0; } else { pthread_mutex_unlock(&(b -> count_lock)); while(b->wait[iproc]); } void qbusy_init_barrier(qbusy_barrier_t *b, int nthreads) { b -> wait = (int *)malloc(nthreads * sizeof(int)); b -> count = 0; pthread_mutex_init(&(b -> count_lock), NULL); } void qbusy_barrier (qbusy_barrier_t *b, int iproc, int num_threads) { int i; float *tmp; if (num_threads == 1) return; b->wait[iproc] = 1; pthread_mutex_lock(&(b -> count_lock)); b -> count ++; if (b -> count == num_threads) { pthread_mutex_unlock(&(b -> count_lock)); /* Now release the hounds */ for (i = 0; i < num_threads; i++) b->wait[i] = 0; } else { pthread_mutex_unlock(&(b -> count_lock)); while(b->wait[iproc]); }

37 Read-Write Locks  In many applications, a data structure is read frequently but written infrequently.  For such applications, we should use read-write locks.  A read lock is granted when there are other threads that may already have read locks.  If there is a write lock on the data (or if there are queued write locks), the thread performs a condition wait.  If there are multiple threads requesting a write lock, they must perform a condition wait.  In many applications, a data structure is read frequently but written infrequently.  For such applications, we should use read-write locks.  A read lock is granted when there are other threads that may already have read locks.  If there is a write lock on the data (or if there are queued write locks), the thread performs a condition wait.  If there are multiple threads requesting a write lock, they must perform a condition wait.

38 Read-Write Locks  The lock data type mylib_rwlock_t holds the following:  a count of the number of readers,  the writer (a 0/1 integer specifying whether a writer is present),  a condition variable readers_proceed that is signaled when readers can proceed,  a condition variable writer_proceed that is signaled when one of the writers can proceed,  a count pending_writers of pending writers, and  a mutex read_write_lock associated with the shared data structure  The lock data type mylib_rwlock_t holds the following:  a count of the number of readers,  the writer (a 0/1 integer specifying whether a writer is present),  a condition variable readers_proceed that is signaled when readers can proceed,  a condition variable writer_proceed that is signaled when one of the writers can proceed,  a count pending_writers of pending writers, and  a mutex read_write_lock associated with the shared data structure

39 Read-Write Locks typedef struct { int readers; int writer; pthread_cond_t readers_proceed; pthread_cond_t writer_proceed; int pending_writers; pthread_mutex_t read_write_lock; } mylib_rwlock_t; void mylib_rwlock_init (mylib_rwlock_t *l) { l -> readers = l -> writer = l -> pending_writers = 0; pthread_mutex_init(&(l -> read_write_lock), NULL); pthread_cond_init(&(l -> readers_proceed), NULL); pthread_cond_init(&(l -> writer_proceed), NULL); } typedef struct { int readers; int writer; pthread_cond_t readers_proceed; pthread_cond_t writer_proceed; int pending_writers; pthread_mutex_t read_write_lock; } mylib_rwlock_t; void mylib_rwlock_init (mylib_rwlock_t *l) { l -> readers = l -> writer = l -> pending_writers = 0; pthread_mutex_init(&(l -> read_write_lock), NULL); pthread_cond_init(&(l -> readers_proceed), NULL); pthread_cond_init(&(l -> writer_proceed), NULL); }

40 Read-Write Locks void mylib_rwlock_rlock(mylib_rwlock_t *l) { /* if there is a write lock or pending writers, perform condition wait.. else increment count of readers and grant read lock */ pthread_mutex_lock(&(l -> read_write_lock)); while ((l -> pending_writers > 0) || (l -> writer > 0)) pthread_cond_wait(&(l -> readers_proceed), &(l -> read_write_lock)); l -> readers ++; pthread_mutex_unlock(&(l -> read_write_lock)); } void mylib_rwlock_rlock(mylib_rwlock_t *l) { /* if there is a write lock or pending writers, perform condition wait.. else increment count of readers and grant read lock */ pthread_mutex_lock(&(l -> read_write_lock)); while ((l -> pending_writers > 0) || (l -> writer > 0)) pthread_cond_wait(&(l -> readers_proceed), &(l -> read_write_lock)); l -> readers ++; pthread_mutex_unlock(&(l -> read_write_lock)); }

41 Read-Write Locks void mylib_rwlock_wlock(mylib_rwlock_t *l) { /* if there are readers or writers, increment pending writers count and wait. On being woken, decrement pending writers count and increment writer count */ pthread_mutex_lock(&(l -> read_write_lock)); while ((l -> writer > 0) || (l -> readers > 0)) { l -> pending_writers ++; pthread_cond_wait(&(l -> writer_proceed), &(l -> read_write_lock)); } l -> pending_writers --; l -> writer ++; pthread_mutex_unlock(&(l -> read_write_lock)); } void mylib_rwlock_wlock(mylib_rwlock_t *l) { /* if there are readers or writers, increment pending writers count and wait. On being woken, decrement pending writers count and increment writer count */ pthread_mutex_lock(&(l -> read_write_lock)); while ((l -> writer > 0) || (l -> readers > 0)) { l -> pending_writers ++; pthread_cond_wait(&(l -> writer_proceed), &(l -> read_write_lock)); } l -> pending_writers --; l -> writer ++; pthread_mutex_unlock(&(l -> read_write_lock)); }

42 Read-Write Locks void mylib_rwlock_unlock(mylib_rwlock_t *l) { /* if there is a write lock then unlock, else if there are read locks, decrement count of read locks. If the count is 0 and there is a pending writer, let it through, else if there are pending readers, let them all go through */ pthread_mutex_lock(&(l -> read_write_lock)); if (l -> writer > 0) l -> writer = 0; else if (l -> readers > 0) l -> readers --; pthread_mutex_unlock(&(l -> read_write_lock)); if ((l -> readers == 0) && (l -> pending_writers > 0)) pthread_cond_signal(&(l -> writer_proceed)); else if (l -> readers > 0) pthread_cond_broadcast(&(l -> readers_proceed)); } void mylib_rwlock_unlock(mylib_rwlock_t *l) { /* if there is a write lock then unlock, else if there are read locks, decrement count of read locks. If the count is 0 and there is a pending writer, let it through, else if there are pending readers, let them all go through */ pthread_mutex_lock(&(l -> read_write_lock)); if (l -> writer > 0) l -> writer = 0; else if (l -> readers > 0) l -> readers --; pthread_mutex_unlock(&(l -> read_write_lock)); if ((l -> readers == 0) && (l -> pending_writers > 0)) pthread_cond_signal(&(l -> writer_proceed)); else if (l -> readers > 0) pthread_cond_broadcast(&(l -> readers_proceed)); }

43 Semaphores  Synchronization tool provided by the OS  Integer variable and 2 operations  Wait(s) while (s <= 0) do noop; /*sleep*/ s = s - 1;  Signal(s) s = s + 1;  All modifications to s are atomic  Synchronization tool provided by the OS  Integer variable and 2 operations  Wait(s) while (s <= 0) do noop; /*sleep*/ s = s - 1;  Signal(s) s = s + 1;  All modifications to s are atomic

44 The critical section problem repeat wait(mutex) critical section signal(mutex) remainder section until false Shared semaphore: mutex = 1;

45 Using semaphores  Two processes P1 and P2  Statements S1 and S2  S2 must execute only after S1  Two processes P1 and P2  Statements S1 and S2  S2 must execute only after S1 P1: S1; signal(synch); P2: wait(synch); S2;

46 Bounded Buffer Solution repeat produce an item in nextp wait(empty); wait(mutex); add nextp to the buffer signal(mutex); signal(full); until false repeat wait(full); wait(mutex); remove an item from buffer place it in nextc signal(mutex); signal(empty); consume the item in nextc until false Shared semaphore: empty = n, full = 0, mutex = 1;

47 Readers - Writers (priority?) Shared Semaphore mutex=1, wrt = 1; Shared integerreadcount = 0; wait(wrt); write to the data object signal(wrt); wait(mutex); readcount = readcount + 1; if (readcount == 1) wait(wrt); signal(mutex); read the data wait(mutex); readcount = readcount - 1; if (readcount == 0) signal(wrt); signal(mutex);

48 Readers – Writers (priority?) wait (outerQ) wait (rsem) wait (rmutex) readcnt++ if (readcnt == 1) wait (wsem) signal(rmutex) signal (rsem) signal (outerQ) READ wait (rmutex) readcnt--; if (readcnt == 0) signal (wsem) signal (rmutex) wait (wsem) writecnt++; if (writecnt == 1) wait (rsem) signal (wsem) wait (wmutex) WRITE signal (wmutex) wait (wsem) writecnt--; if (writecnt == 0) signal (rsem) signal (wsem) outerQ, rsem, rmutex, wmutex, wsem: = 1

49 Unix Semaphores  Are a generalization of the counting semaphores (more operations are permitted).  A semaphore includes:  the current value S of the semaphore  number of processes waiting for S to increase  number of processes waiting for S to be 0  System calls  semget creates an array of semaphores  semctl allows for the initialization of semaphores  semop performs a list of operations: one on each semaphore (atomically)  Are a generalization of the counting semaphores (more operations are permitted).  A semaphore includes:  the current value S of the semaphore  number of processes waiting for S to increase  number of processes waiting for S to be 0  System calls  semget creates an array of semaphores  semctl allows for the initialization of semaphores  semop performs a list of operations: one on each semaphore (atomically)

50 Unix Semaphores  Each operation to be done is specified by a value sop.  Let S be the semaphore value  if sop > 0: (signal operation)  S is incremented and process awaiting for S to increase are awaken  if sop = 0:  If S=0: do nothing  if S!=0, block the current process on the event that S=0  if sop < 0: (wait operation)  if S >= | sop | then S = S - | sop | then if S <=0 wait  Each operation to be done is specified by a value sop.  Let S be the semaphore value  if sop > 0: (signal operation)  S is incremented and process awaiting for S to increase are awaken  if sop = 0:  If S=0: do nothing  if S!=0, block the current process on the event that S=0  if sop < 0: (wait operation)  if S >= | sop | then S = S - | sop | then if S <=0 wait

51


Download ppt "Parallel Programming with PThreads. Threads  Sometimes called a lightweight process  smaller execution unit than a process  Consists of:  program."

Similar presentations


Ads by Google