Slide 1: Shared-memory Parallel Programming
Taura Lab, M1: Yuuki Horita
Slide 2: Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary
Slide 3: Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary
Slide 4: Parallel Programming Models
- Message Passing Model: presented by Imatake-kun just now
- Shared Memory Model: memory is shared among all process elements
  - Multiprocessors (SMP, SunFire, ...), DSM (Distributed Shared Memory)
  - Process elements can communicate with each other through the shared memory
Slide 5: Shared Memory Model
[Figure: several PEs all connected to a single shared memory]
Slide 6: Shared Memory Model — Advantages
- Simplicity: no need to think about where the computation's data is located
- Fast communication (on a multiprocessor): no network is needed for inter-process communication
- Dynamic load sharing: easy, for the same reason as simplicity
Slide 7: Shared-Memory Parallel Programming
- Multi-thread programming: Pthreads
- OpenMP: a parallel programming model for shared-memory multiprocessors
Slide 8: Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary
Slide 9: Sample Sequential Program — FDM (Finite Difference Method)

    loop {   /* repeat, e.g., until convergence */
      for (i = 1; i < N-1; i++) {
        for (j = 1; j < N-1; j++) {
          a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j] + a[i][j]);
        }
      }
    }

(The sweep updates only interior points, so the stencil never reads outside the array; the boundary values stay fixed. A runnable version follows.)
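A minimal, self-contained version of this kernel. The grid size, iteration count, and initial condition are not given on the slide, so the values below are assumptions for illustration only:

    #include <stdio.h>

    #define N     8      /* grid size (assumed) */
    #define STEPS 100    /* number of sweeps (assumed) */

    double a[N][N];

    int main(void) {
        /* assumed initial condition: boundary = 1.0, interior = 0.0 */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = (i == 0 || j == 0 || i == N-1 || j == N-1) ? 1.0 : 0.0;

        /* in-place sweeps over the interior points, as on the slide */
        for (int step = 0; step < STEPS; step++)
            for (int i = 1; i < N-1; i++)
                for (int j = 1; j < N-1; j++)
                    a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] +
                                     a[i-1][j] + a[i+1][j] + a[i][j]);

        printf("a[N/2][N/2] = %f\n", a[N/2][N/2]);
        return 0;
    }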
Slide 10: Parallelization Procedure
- Decomposition: sequential computation → tasks
- Assignment: tasks → process elements
- Orchestration: communication and synchronization among process elements
- Mapping: process elements → processors
Slide 11: Parallelize the Sequential Program — Decomposition
Decompose the loop nest into tasks (one iteration of the outer loop is a task):

    loop {
      for (i = 1; i < N-1; i++) {      /* ← a task */
        for (j = 1; j < N-1; j++) {
          a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j] + a[i][j]);
        }
      }
    }
Slide 12: Parallelize the Sequential Program — Assignment
Divide the tasks equally among the process elements (PEs).
[Figure: blocks of tasks assigned to each PE]
Slide 13: Parallelize the Sequential Program — Orchestration
The PEs need to communicate and to synchronize (the stencil reads rows that neighboring PEs write).
Slide 14: Parallelize the Sequential Program — Mapping
[Figure: PEs mapped onto the processors of a multiprocessor]
Slide 15: Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary
Slide 16: Multi-thread Programming
- A process element is a thread (cf. a process)
- Memory is shared among all threads created by the same process
- Threads can therefore communicate with each other through shared memory
Slide 17: Fork-Join Model
- The program starts with a single main thread (serialized section)
- Fork: the main thread creates new threads (parallelized section)
- Join: the other threads join the main thread, and the main thread continues processing alone (serialized section)
Slide 18: Libraries for Thread Programming
- Pthreads (C/C++): pthread_create(), pthread_join()
- Java: Thread class / Runnable interface
Slide 19: Pthreads API (fork/join)

    pthread_t                       /* thread variable */

    pthread_create(
        pthread_t *thread,          /* thread variable */
        pthread_attr_t *attr,       /* thread attributes (NULL for defaults) */
        void *(*func)(void *),      /* start function */
        void *arg);                 /* argument passed to the start function */

    pthread_join(
        pthread_t thread,           /* thread to wait for */
        void **thread_return);      /* receives the thread's return value */
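A small sketch of these two calls, including retrieval of the thread's return value through thread_return. The worker function, its argument, and the returned constant are invented for illustration:

    #include <pthread.h>
    #include <stdio.h>

    /* start function: takes void*, returns void* */
    static void *worker(void *arg) {
        int id = *(int *)arg;
        printf("worker %d running\n", id);
        return (void *)42;            /* handed back through pthread_join */
    }

    int main(void) {
        pthread_t tid;
        int id = 0;
        void *ret;

        pthread_create(&tid, NULL, worker, &id);  /* fork */
        pthread_join(tid, &ret);                  /* join; collect return value */
        printf("worker returned %ld\n", (long)ret);
        return 0;
    }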
Slide 20: Pthreads Parallel Programming (before)

    #include ...

    void do_sequentially(void) {
        /* sequential execution */
    }

    int main() {
        ...
        do_sequentially();   /* the call we want to parallelize */
        ...
    }
Slide 21: Pthreads Parallel Programming (after)

    #include ...
    #include <pthread.h>

    void *do_in_parallel(void *arg) {   /* start functions take and return void* */
        /* parallel execution */
        return NULL;
    }

    int main() {
        pthread_t tid;
        ...
        pthread_create(&tid, NULL, do_in_parallel, NULL);  /* run in a new thread... */
        do_in_parallel(NULL);                              /* ...and in the main thread */
        pthread_join(tid, NULL);                           /* wait for the new thread */
        ...
    }

Compile and link with the pthread library (e.g., cc -pthread prog.c).
Slide 22: Exclusive Access Control

    int sum = 0;
    thread_A() { sum++; }
    thread_B() { sum++; }

sum++ is a read-modify-write, so the two increments can interleave and lose an update:

    Thread A              Thread B
    a ← read sum   (0)
                          b ← read sum   (0)
    a = a + 1      (1)
                          b = b + 1      (1)
    write a → sum
                          write b → sum

sum starts at 0; after both increments, sum = 1 instead of 2.
Slide 23: Pthreads API (Exclusive Access Control)

Variable:
    pthread_mutex_t

Initialization:
    pthread_mutex_init(pthread_mutex_t *mutex,
                       pthread_mutexattr_t *mutexattr);

Lock / unlock:
    pthread_mutex_lock(pthread_mutex_t *mutex);
    pthread_mutex_unlock(pthread_mutex_t *mutex);
Slide 24: Exclusive Access Control (fixed)

    int sum = 0;
    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, NULL);

    thread_A() {
        pthread_mutex_lock(&mutex);
        sum++;
        pthread_mutex_unlock(&mutex);
    }

    thread_B() {
        pthread_mutex_lock(&mutex);
        sum++;
        pthread_mutex_unlock(&mutex);
    }

The lock serializes the two increments:

    Thread A          Thread B
    acquire lock
    sum++
    release lock
                      acquire lock
                      sum++
                      release lock
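Put together as a runnable program. The iteration count is chosen arbitrarily, and PTHREAD_MUTEX_INITIALIZER stands in for the explicit init call:

    #include <pthread.h>
    #include <stdio.h>

    #define NITER 100000    /* iterations per thread (assumed) */

    static int sum = 0;
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

    static void *increment(void *arg) {
        for (int i = 0; i < NITER; i++) {
            pthread_mutex_lock(&mutex);    /* enter critical section */
            sum++;
            pthread_mutex_unlock(&mutex);  /* leave critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, increment, NULL);
        pthread_create(&b, NULL, increment, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("sum = %d (expected %d)\n", sum, 2 * NITER);
        return 0;
    }

Without the lock, the printed sum would usually fall short of the expected value, exactly as in the interleaving on slide 22.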
Slide 25: Pthreads API (Condition Variable)

Variable:
    pthread_cond_t

Initialization:
    pthread_cond_init(pthread_cond_t *cond,
                      pthread_condattr_t *condattr);

Waiting and signaling:
    pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
    pthread_cond_broadcast(pthread_cond_t *cond);
    pthread_cond_signal(pthread_cond_t *cond);
Slide 26: Condition Wait

Thread A (waiter): acquire the lock, then sleep until the condition is satisfied. pthread_cond_wait releases the lock while sleeping and reacquires it on wakeup (triggered by pthread_cond_broadcast or pthread_cond_signal):

    pthread_mutex_lock(&mutex);
    while (/* condition is not satisfied */) {
        pthread_cond_wait(&cond, &mutex);
    }
    pthread_mutex_unlock(&mutex);

Thread B (notifier): update the condition under the lock, then wake the waiters:

    pthread_mutex_lock(&mutex);
    update_condition();
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mutex);
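A minimal runnable instance of this pattern; the ready flag and the function names are invented for illustration:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
    static int ready = 0;                 /* the condition */

    static void *waiter(void *arg) {
        pthread_mutex_lock(&mutex);
        while (!ready)                    /* recheck: wakeups may be spurious */
            pthread_cond_wait(&cond, &mutex);
        printf("condition satisfied\n");
        pthread_mutex_unlock(&mutex);
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, waiter, NULL);

        pthread_mutex_lock(&mutex);
        ready = 1;                        /* update the condition... */
        pthread_cond_broadcast(&cond);    /* ...and wake all waiters */
        pthread_mutex_unlock(&mutex);

        pthread_join(t, NULL);
        return 0;
    }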
Slide 27: Synchronization

A barrier for the sample program, built from a mutex and a condition variable (a self-contained version follows):

    n = 0;
    ...
    pthread_mutex_lock(&mutex);
    n++;                                   /* this thread has arrived */
    while (n < nthreads) {
        pthread_cond_wait(&cond, &mutex);  /* wait for the others */
    }
    pthread_cond_broadcast(&cond);         /* last arrival wakes everyone */
    pthread_mutex_unlock(&mutex);
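Wrapped into a self-contained program. As on the slide, this is a single-use barrier (a reusable one would also need a generation counter); the thread count is arbitrary:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
    static int n = 0;                    /* number of threads that have arrived */

    static void barrier(void) {
        pthread_mutex_lock(&mutex);
        n++;
        while (n < NTHREADS)
            pthread_cond_wait(&cond, &mutex);
        pthread_cond_broadcast(&cond);   /* the last arrival releases the rest */
        pthread_mutex_unlock(&mutex);
    }

    static void *work(void *arg) {
        long id = (long)arg;
        printf("thread %ld before barrier\n", id);
        barrier();                       /* no thread passes until all arrive */
        printf("thread %ld after barrier\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, work, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }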
Slide 28: Characteristics of Pthreads
- Describing exclusive access control and synchronization by hand is tedious
- Deadlocks are easy to introduce
- Parallelizing a given sequential program is still hard
Slide 29: Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary
Slide 30: What is OpenMP?
A specification of a set of compiler directives, library routines, and environment variables that can be used to specify shared-memory parallelism in Fortran and C/C++ programs.
- Fortran ver. 1.0 API: Oct. 1997
- C/C++ ver. 1.0 API: Oct. 1998
Slide 31: Background of OpenMP
- The spread of shared-memory multiprocessors
- The need for common directives for shared-memory multiprocessors: each vendor had been providing its own, different set of directives
- The need for a simpler, more flexible interface for developing parallel applications: Pthreads makes it hard for developers to write them
Slide 32: OpenMP API
- Directives
- Library routines
- Environment variables
Slide 33: Directives

    C/C++:    #pragma omp directive_name ...
    Fortran:  !$OMP directive_name ...

If the user's compiler does not support OpenMP, the directives are ignored, so the same program can still be executed as a sequential program.
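This graceful degradation can be observed with the standard _OPENMP macro, which OpenMP-conforming compilers define; a small sketch:

    #include <stdio.h>

    int main(void) {
        #pragma omp parallel        /* ignored by non-OpenMP compilers */
        {
    #ifdef _OPENMP
            printf("compiled with OpenMP support\n");   /* printed once per thread */
    #else
            printf("running sequentially\n");           /* printed once */
    #endif
        }
        return 0;
    }

Compiled with OpenMP enabled (e.g., gcc -fopenmp) the region runs on a team of threads; compiled without, the pragma is ignored and the program runs sequentially.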
Slide 34: Parallel Region
The part of the program executed in parallel by a team of threads:

    #pragma omp parallel
    {
        /* parallel region */
    }

- Threads are created at the beginning of the parallel region
- They join at the end of the parallel region
Slide 35: Parallel Region (threads)
Number of threads:
- omp_get_num_threads(): get the current number of threads
- omp_set_num_threads(int nthreads): set the number of threads to nthreads
- environment variable OMP_NUM_THREADS

Thread ID (0 to number of threads − 1):
- omp_get_thread_num(): get the calling thread's ID

(A sketch combining these routines follows.)
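A small sketch of these routines in use; the thread count 4 is arbitrary:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        omp_set_num_threads(4);               /* request 4 threads */

        #pragma omp parallel
        {
            int id  = omp_get_thread_num();   /* 0 .. nthreads-1 */
            int nth = omp_get_num_threads();  /* team size inside the region */
            printf("thread %d of %d\n", id, nth);
        }
        /* outside a parallel region, omp_get_num_threads() returns 1 */
        return 0;
    }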
Slide 36: Work-Sharing Constructs
Specify how tasks are assigned inside a parallel region:
- for: share the iterations of a loop among the threads
- sections: share independent sections among the threads
- single: execute a block by only one thread
(sections and single are shown in the sketch after this list)
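A minimal sketch of sections and single; the printed messages are placeholders for real work:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            #pragma omp sections         /* each section goes to one thread */
            {
                #pragma omp section
                printf("section 1 by thread %d\n", omp_get_thread_num());

                #pragma omp section
                printf("section 2 by thread %d\n", omp_get_thread_num());
            }                            /* implicit barrier here */

            #pragma omp single           /* exactly one thread executes this */
            printf("single by thread %d\n", omp_get_thread_num());
        }
        return 0;
    }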
Slide 37: Example of Work Sharing

Sequential:

    for (i = 1; i < N-1; i++) {
      for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j] + a[i][j]);
      }
    }

Work-shared with parallel + for:

    omp_set_num_threads(4);
    #pragma omp parallel
    #pragma omp for
    for (i = 1; i < N-1; i++) {
      for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j] + a[i][j]);
      }
    }

Or with the combined directive:

    omp_set_num_threads(4);
    #pragma omp parallel for
    for (i = 1; i < N-1; i++) {
      /* same loop nest as above */
    }

Caution: i and j are declared outside the region, so by default they are shared; the threads then conflict on the indices, which slows the computation (and corrupts it). Slide 39 fixes this with private(i, j).
Slide 38: Data Scoping Attributes
Specify data scoping on a parallel construct or a work-sharing construct:
- shared(var_list): the variables in var_list are shared among the threads
- private(var_list): each thread gets its own copy of the variables
- reduction(operator : var_list): the variables are private within the construct, and the private copies are combined with operator after the construct; see the sketch after this list
  e.g., #pragma omp for reduction(+: sum)
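A sketch of reduction in use, summing an array; N and the array contents are placeholders:

    #include <stdio.h>

    #define N 1000

    int main(void) {
        double x[N], sum = 0.0;
        for (int i = 0; i < N; i++)
            x[i] = 1.0;                 /* placeholder data */

        /* each thread sums into a private copy of sum;
           the copies are combined with + at the end of the loop */
        #pragma omp parallel for reduction(+: sum)
        for (int i = 0; i < N; i++)
            sum += x[i];

        printf("sum = %f\n", sum);      /* 1000.0 */
        return 0;
    }

Without the reduction clause, the shared sum would suffer exactly the lost-update race shown on slide 22.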
Slide 39: Example of Data Scoping Attributes

    omp_set_num_threads(4);
    #pragma omp parallel for private(i, j)
    for (i = 1; i < N-1; i++) {
      for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j] + a[i][j]);
      }
    }

Each thread now has its own copy of i and j, eliminating the conflicts on the indices.
Slide 40: Synchronization
- barrier: wait until all threads reach this line
      #pragma omp barrier
- critical: execute a block exclusively
      #pragma omp critical [(name)]
      { ... }
- atomic: update a scalar variable atomically
      #pragma omp atomic
      ...
(critical and atomic are contrasted in the sketch after this list)
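A sketch contrasting critical and atomic; the counter names are invented. critical protects an arbitrary block, while atomic applies only to a single scalar update:

    #include <stdio.h>

    int main(void) {
        int hits = 0, ticks = 0;

        #pragma omp parallel
        {
            #pragma omp critical
            {
                hits++;        /* arbitrary block, one thread at a time */
            }

            #pragma omp atomic
            ticks++;           /* single scalar update, done atomically */
        }

        /* both counters end up equal to the number of threads */
        printf("hits = %d, ticks = %d\n", hits, ticks);
        return 0;
    }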
Slide 41: Synchronization (Pthreads vs. OpenMP)
The barrier from the sample program:

Pthreads:

    pthread_mutex_lock(&mutex);
    n++;
    while (n < nthreads) {
        pthread_cond_wait(&cond, &mutex);
    }
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mutex);

OpenMP:

    #pragma omp barrier
Slide 42: Summary of OpenMP
- Incremental parallelization of sequential programs
- Portability
- Easier to write parallel applications than with Pthreads or MPI
Slide 43: Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary
Slide 44: Message Passing Model vs. Shared Memory Model

                  Message Passing    Shared Memory
    Architecture  any                SMP or DSM
    Programming   difficult          easier
    Performance   good               better (SMP) / worse (DSM)
    Cost          less expensive     very expensive (e.g., SunFire 15K: $4,140,830)
Slide 45: Thank you!