1 Parallel Processing (CS 730)
Lecture 5: Shared Memory Parallel Programming with OpenMP*
Jeremy R. Johnson
*Parts of this lecture were derived from chapters 1-2 of Chandra et al.
September 4, 1997
Oct. 23, 2002 Parallel Processing

2 Introduction
Objective: To further study the shared memory model of parallel programming, and to introduce the OpenMP standard for shared memory parallel programming.
Topics:
- Introduction to OpenMP (hello.c, hello.f)
- Loop-level parallelism
- Shared vs. private variables
- Synchronization (implicit and explicit)
- Parallel regions

3 OpenMP
- Extension to FORTRAN and C/C++ using a shared memory model
- Uses directives (comments in FORTRAN, pragmas in C/C++) that are ignored without compiler support; some library support required
- Shared memory model:
  - parallel regions
  - loop-level parallelism
  - implicit thread model
  - communication via shared address space
  - private vs. shared variables (declaration)
  - explicit synchronization via directives (e.g. critical)
  - library routines for returning thread information (e.g. omp_get_num_threads(), omp_get_thread_num())
  - environment variables used to provide system info (e.g. OMP_NUM_THREADS)

4 Benefits
- Provides incremental parallelism
- Small increase in code size
- Simpler model than message passing
- Easier to use than a thread library
- With hardware and compiler support, smaller granularity than message passing

5 Further Information
- Adopted as a standard in 1997; initiated by SGI
- www.openmp.org
- Chandra, Dagum, Kohr, Maydan, McDonald, Menon, "Parallel Programming in OpenMP", Morgan Kaufmann Publishers, 2001.

6 Shared vs. Distributed Memory
[Diagram: shared memory shows processors P0 ... Pn all connected to a single Memory; distributed memory shows processors P0 ... Pn, each with a local memory M0 ... Mn, connected by an Interconnection Network]

7 Shared Memory Programming Model
- Shared memory programming does not require physically shared memory, so long as there is support for logically shared memory (in either hardware or software).
- With logically shared memory, the cost of a memory access may differ depending on the physical location of the data.
- UMA: uniform memory access
  - SMP: symmetric multiprocessor, typically memory connected to the processors via a bus
- NUMA: non-uniform memory access
  - typically physically distributed memory connected via an interconnection network

8 IBM S80
An SMP with up to 24 processors (RS64 III processors)
Name: Goopi.coe.drexel.edu
Machine type: S80 12-way with 8 Gb RAM
Specifications:
- 2 x 6-way 450 MHz RS64 III processor cards, 8 Mb L2 cache
- 2 x 4096 Mb memory
- 9 x 18.2 Gb Ultra SCSI hot-swappable hard disk drives
Name: bagha.coe.drexel.edu
Machine type: 44P Model, 4-way with 2 Gb RAM
- 2 x 2-way 375 MHz POWER3-II processors, 4 Mb L2 cache
- 4 x 512 Mb SDRAM DIMMs
- 2 x 9.1 Gb Ultra SCSI HDD

9 hello.c

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int n;
    n = atoi(argv[1]);
    omp_set_num_threads(n);
    printf("Number of threads = %d\n", omp_get_num_threads());
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        if (id == 0)
            printf("Hello World from %d\n", id);
    }
    exit(0);
}

10 hello.f

      program hello
      integer omp_get_thread_num, omp_get_num_threads
      print *, "Hello parallel world from threads"
      print *, "Num threads = ", omp_get_num_threads()
!$omp parallel
      print *, omp_get_thread_num()
!$omp end parallel
      print *, "Back to the sequential world"
      end

11 Compiling and Executing OpenMP Programs on the IBM S80
- To compile a C program with OpenMP directives:
  cc_r -qsmp=omp hello.c -o hello
- To compile a Fortran program with OpenMP directives:
  xlf_r -qsmp=omp hello.f -o hello
- The environment variable OMP_NUM_THREADS controls the number of threads used in OpenMP parallel regions. It can be set from the C shell:
  setenv OMP_NUM_THREADS <count>
  where <count> is a positive integer.
- On Sun machines:
  cc -xopenmp hello.c -o hello

12 Parallel Loop

      subroutine saxpy(z, a, x, y, n)
      integer i, n
      real z(n), a, x(n), y(n)
      do i = 1, n
        z(i) = a * x(i) + y(i)
      end do
      return
      end

With the parallel do directive:

      subroutine saxpy(z, a, x, y, n)
      integer i, n
      real z(n), a, x(n), y(n)
!$omp parallel do
      do i = 1, n
        z(i) = a * x(i) + y(i)
      end do
      return
      end

13 Execution Model
- The master thread runs the sequential parts of the program.
- At a parallel region, threads are created implicitly (master and slave threads).
- At the end of the parallel region there is an implicit barrier synchronization; afterwards only the master thread continues.

14 More Complicated Example

      real*8 x, y
      integer i, j, m, n, maxiter
      integer depth(*,*)
      integer mandel_val
      maxiter = 200
      do i = 1, m
        do j = 1, m
          x = i/real(m)
          y = j/real(n)
          depth(j,i) = mandel_val(x, y, maxiter)
        end do
      end do

15 Parallel Loop

!$omp parallel do private(j, x, y)
      maxiter = 200
      do i = 1, m
        do j = 1, m
          x = i/real(m)
          y = j/real(n)
          depth(j,i) = mandel_val(x, y, maxiter)
        end do
      end do
!$omp end parallel do

(Note: the assignment to maxiter sits between the directive and the do loop; a parallel do directive must immediately precede the loop, so this placement is an error.)

16 Parallel Loop

      maxiter = 200
!$omp parallel do private(j, x, y)
      do i = 1, m
        do j = 1, m
          x = i/real(m)
          y = j/real(n)
          depth(j,i) = mandel_val(x, y, maxiter)
        end do
      end do
!$omp end parallel do

17 Explicit Synchronization

      maxiter = 200
      total_iters = 0
!$omp parallel do private(j, x, y)
      do i = 1, m
        do j = 1, m
          x = i/real(m)
          y = j/real(n)
          depth(j,i) = mandel_val(x, y, maxiter)
!$omp critical
          total_iters = total_iters + depth(j,i)
!$omp end critical
        end do
      end do
!$omp end parallel do

18 Reduction Variables

      maxiter = 200
      total_iters = 0
!$omp parallel do private(j, x, y)
!$omp+ reduction(+:total_iters)
      do i = 1, m
        do j = 1, m
          x = i/real(m)
          y = j/real(n)
          depth(j,i) = mandel_val(x, y, maxiter)
          total_iters = total_iters + depth(j,i)
        end do
      end do
!$omp end parallel do

