Shared-Memory Paradigm & OpenMP
FDI 2007 Track Q
Day 3 – Morning Session
Characteristics of Shared-Memory Machines
Modest number of processors, e.g., 8, 16, or 32.
One bus, one memory, one address space.
Major issues: bus contention, cache coherency, synchronization.
Incremental parallelization is easy.
NUMA vs. UMA: NUMA scales better, but with increased latency for some data.
Important trends in computer architecture, multicore in particular, imply that shared-memory parallelism will be increasingly important.

Notes on the major issues: with OpenMP, you don’t have to worry about the first two (bus contention and cache coherency), and even synchronization isn’t usually a problem, at least as far as correctness goes. But performance? That can definitely be important.
Example of bus contention and cache coherency overhead:
    for i = 1, n do in parallel: a[i] = b[i] + c[i]
Note the benefit of chunking: giving each thread a contiguous chunk of iterations keeps different threads from repeatedly writing to the same cache line.
NUMA is not that common right now. Dual core.
Hybrid codes? Not a first priority, but they may become increasingly important.
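The chunking remark above can be sketched in C as follows. This is a minimal illustration, not code from the talk: the function name vector_add and the chunk parameter are assumptions introduced here. With schedule(static, chunk), each thread receives blocks of chunk consecutive iterations, so neighboring elements of a[] are mostly written by the same thread.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): element-wise a = b + c.
   `chunk` is an assumed tuning parameter: the number of consecutive
   iterations handed to one thread at a time. */
void vector_add(double *a, const double *b, const double *c, int n, int chunk)
{
    int i;

    assert(n >= 0 && chunk > 0);

    /* If the compiler is not told to enable OpenMP, this pragma is
       ignored and the loop simply runs serially. */
    #pragma omp parallel for schedule(static, chunk) shared(a, b, c, n) private(i)
    for (i = 0; i < n; i++) {
        a[i] = b[i] + c[i];
    }
}
```

A chunk size that is a multiple of the number of array elements per cache line (8 doubles for a 64-byte line, which is an assumption about the hardware) is a reasonable starting point; the best value has to be measured.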
OpenMP
Compiler directives, library routines, and environment variables for specifying shared-memory, thread-based parallelism.
Specified for Fortran and C/C++.
Supported by many compilers.
Directives allow work sharing, synchronization, and sharing and privatizing of data.
Directives are ignored by the compiler unless a command-line option is specified.
What’s a thread? An independent flow of control within a process; all threads of a process share its address space.
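The three ingredients named above (directives, library routines, environment variables) can be seen together in a small sketch. The function name report_threads is hypothetical. The _OPENMP macro is defined by the compiler only when OpenMP is enabled, so the same source also builds as an ordinary serial program.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* declarations of the OpenMP library routines */
#endif

/* Hypothetical helper: runs one parallel region, lets every thread
   print its id, and returns the number of threads in the team. */
int report_threads(void)
{
    int team_size = 1;

    #pragma omp parallel
    {
        int id = 0;   /* declared inside the region, so private to each thread */
#ifdef _OPENMP
        id = omp_get_thread_num();
        /* One thread records the team size; the others wait at the
           implicit barrier that ends the single construct. */
        #pragma omp single
        team_size = omp_get_num_threads();
#endif
        printf("hello from thread %d\n", id);
    }

    assert(team_size >= 1);
    return team_size;
}
```

With GCC the directives are enabled by the -fopenmp option; other compilers use different flags. The number of threads can then be chosen at run time through the OMP_NUM_THREADS environment variable.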
OpenMP
Other aspects of the OpenMP model:
Explicit, user-defined parallelism
SPMD
Fork/join model: only a master thread is executing when outside a parallel region. The parallel directive causes multiple threads to be started (or continued), each executing all or a part of the specified block.
For further information:
www.openmp.org
OpenMP Quick Guide (pdf)
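The fork/join model described on this slide can be made visible by asking how many threads are executing before, inside, and after a parallel region. The sketch below is illustrative only; fork_join_demo is a hypothetical name, and the serial fallback for omp_get_num_threads exists just so the code also builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (assumption of this sketch) for builds without OpenMP. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: stores the team size seen before, inside, and
   after a parallel region in counts[0], counts[1], and counts[2]. */
void fork_join_demo(int counts[3])
{
    int inside = 1;

    assert(counts != 0);

    counts[0] = omp_get_num_threads();   /* serial part: master thread only */

    #pragma omp parallel                 /* fork: a team of threads starts */
    {
        #pragma omp master               /* only the master writes the result */
        inside = omp_get_num_threads();
    }                                    /* join: implicit barrier, team ends */

    counts[1] = inside;
    counts[2] = omp_get_num_threads();   /* serial again: master thread only */
}
```

Outside the region the count is 1; inside it is whatever team size the runtime chose, which may also be 1 on a single-core machine.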
Loop-based Parallelism in OpenMP
    #pragma omp parallel for [clause [clause ...]]
    for ( … ) { }
where clause is one of the following:
    private (list)
    shared (list)
    copyin (list)
    firstprivate (list)
    lastprivate (list)
    reduction (operator: list)
    ordered
    schedule (kind [, chunk_size])
    nowait (accepted on a standalone for directive only, not on the combined parallel for)
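A few of these clauses can be tried out in a small sketch. The function fill_squares and its parameters are hypothetical names chosen for illustration. firstprivate gives each thread its own copy of offset initialized from the original, lastprivate copies the value from the sequentially last iteration back out, and schedule(dynamic, 4) hands out iterations four at a time.

```c
#include <assert.h>

/* Hypothetical helper: sets out[i] = offset + i*i for i = 0..n-1 and
   returns the value computed by the sequentially last iteration. */
int fill_squares(int *out, int n, int offset)
{
    int i;
    int last = 0;

    assert(n > 0);

    #pragma omp parallel for schedule(dynamic, 4) \
        firstprivate(offset) lastprivate(last)
    for (i = 0; i < n; i++) {
        last = offset + i * i;   /* each thread writes its own private copy */
        out[i] = last;           /* out is shared; every i is written once */
    }

    /* After the loop, `last` holds the value from iteration i = n-1,
       regardless of which thread executed that iteration. */
    return last;
}
```

Called as fill_squares(out, 10, 5), the function should return 5 + 9*9 and leave the corresponding values in out[], whether or not OpenMP is enabled.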
Loop-based Parallelism in OpenMP: Example
    sum = 0;
    #pragma omp parallel for \
        private( ... ) \
        shared ( ... ) \
        reduction ( ... )
    for (i=0; i<n; i++) {
        k = 2*i-1;    /* note: k = -1 when i = 0 */
        j = k + 1;
        x = a[k] * sin(pi*b[k]);
        c[i] = func(a[j], b[j], x) * beta;
        sum = sum + c[i]*c[i];
    }
Loop-based Parallelism in OpenMP: A 2nd Example
Monte Carlo estimate of pi: generate random coordinates (x,y) in the unit square and count how many land inside the unit circle. The fraction of points that land inside approximates pi/4.
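A minimal sketch of this example follows. The slide does not say how the random numbers are generated, so the generator here is an assumption: a SplitMix64-style hash of the draw number, which avoids any shared generator state and gives the same result for any number of threads. The names uniform01 and estimate_pi are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed generator (not from the slides): the SplitMix64 output for
   draw number n of the stream starting at seed, mapped to [0, 1). */
static double uniform01(uint64_t seed, uint64_t n)
{
    uint64_t z = seed + (n + 1) * 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    z ^= z >> 31;
    return (double)(z >> 11) * (1.0 / 9007199254740992.0);   /* top 53 bits / 2^53 */
}

/* Estimates pi from n points in the unit square [0,1) x [0,1). */
double estimate_pi(long n, uint64_t seed)
{
    long i;
    long hits = 0;

    assert(n > 0);

    /* Each thread counts into a private copy of hits; the copies are
       added together when the loop ends. */
    #pragma omp parallel for reduction(+: hits)
    for (i = 0; i < n; i++) {
        /* x and y are declared inside the loop, hence private. */
        double x = uniform01(seed, 2 * (uint64_t)i);
        double y = uniform01(seed, 2 * (uint64_t)i + 1);
        if (x * x + y * y <= 1.0) {
            hits++;
        }
    }
    return 4.0 * (double)hits / (double)n;
}
```

Calling a library generator with hidden global state, such as rand(), from inside the parallel loop would not be safe in general, which is the reason for the stateless generator in this sketch. The statistical error of the estimate shrinks roughly as one over the square root of n.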