Hybrid Parallel Programming with the Paraguin compiler

1 Hybrid Parallel Programming with the Paraguin compiler

2 The Paraguin compiler can also create hybrid programs
Because the compilation goes through mpicc, the OpenMP pragmas are passed through unchanged into the resulting source code.
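As a point of reference, a hand-written hybrid program has the same two levels: MPI processes across computers and OpenMP threads inside each process. The following is a minimal illustrative sketch, not code generated by Paraguin:

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int rank, tid;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* OpenMP splits the work of each MPI process among its cores. */
    #pragma omp parallel private(tid) num_threads(4)
    {
        tid = omp_get_thread_num();
        printf("<pid %d>: tid = %d\n", rank, tid);
    }

    MPI_Finalize();
    return 0;
}

Compiled with mpicc -fopenmp, each MPI process spawns 4 threads, which matches the <pid, tid> output format used on the later slides.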

3 Compiling

First we need to compile to source code:

scc -DPARAGUIN -D__x86_64__ matrixmult.c -.out.c

This produces the translated source file matrixmult.out.c. Then we can compile it with MPI and OpenMP:

mpicc -fopenmp matrixmult.out.c -o matrixmult.out

4 Hybrid Matrix Multiplication using Paraguin
#pragma paraguin begin_parallel
#pragma paraguin scatter a
#pragma paraguin bcast b

#pragma paraguin forall
for (i = 0; i < N; i++) {
    #pragma omp parallel for private(tID, j, k) num_threads(4)
    for (j = 0; j < N; j++) {
        c[i][j] = 0.0;
        for (k = 0; k < N; k++) {
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
    }
}

The i loop will be partitioned among the computers.
The j loop will be partitioned among the 4 cores within each computer.
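For reference, the hybrid structure produced here corresponds roughly to the following hand-written MPI/OpenMP sketch. The function name, the fixed N, the block-row distribution, and the use of static local arrays are assumptions for illustration; the code Paraguin actually generates will differ in detail:

#include <mpi.h>

#define N 256                 /* assumed matrix dimension */

/* Sketch: root holds the full a, b, c; N is assumed divisible by np. */
void hybrid_matmul(double a[N][N], double b[N][N], double c[N][N], int np)
{
    static double a_local[N][N], c_local[N][N];  /* only the first N/np rows are used */
    int i, j, k, rows = N / np;

    /* b is broadcast to every computer; a is scattered by rows. */
    MPI_Bcast(b, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatter(a, rows * N, MPI_DOUBLE, a_local, rows * N, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    for (i = 0; i < rows; i++) {              /* i loop: split across computers */
        #pragma omp parallel for private(j, k) num_threads(4)
        for (j = 0; j < N; j++) {             /* j loop: split across 4 cores */
            c_local[i][j] = 0.0;
            for (k = 0; k < N; k++)
                c_local[i][j] += a_local[i][k] * b[k][j];
        }
    }

    MPI_Gather(c_local, rows * N, MPI_DOUBLE, c, rows * N, MPI_DOUBLE,
               0, MPI_COMM_WORLD);
}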

5 Debug Statements

<pid 0, thread 1>: c[0][1] += a[0][0] * b[1][0]
<pid 0, thread 1>: c[0][1] += a[0][1] * b[1][1]
<pid 0, thread 1>: c[0][1] += a[0][2] * b[1][2]
<pid 0, thread 2>: c[0][2] += a[0][0] * b[2][0]
<pid 0, thread 2>: c[0][2] += a[0][1] * b[2][1]
<pid 0, thread 2>: c[0][2] += a[0][2] * b[2][2]
<pid 1, thread 1>: c[1][1] += a[1][0] * b[1][0]
<pid 1, thread 1>: c[1][1] += a[1][1] * b[1][1]
<pid 1, thread 1>: c[1][1] += a[1][2] * b[1][2]
<pid 2, thread 1>: c[2][1] += a[2][0] * b[1][0]
<pid 2, thread 1>: c[2][1] += a[2][1] * b[1][1]
<pid 2, thread 1>: c[2][1] += a[2][2] * b[1][2]
<pid 0, thread 0>: c[0][0] += a[0][0] * b[0][0]

6 Debug Statements

<pid 0, thread 0>: c[0][0] += a[0][1] * b[0][1]
<pid 0, thread 0>: c[0][0] += a[0][2] * b[0][2]
<pid 2, thread 0>: c[2][0] += a[2][0] * b[0][0]
<pid 2, thread 0>: c[2][0] += a[2][1] * b[0][1]
<pid 2, thread 0>: c[2][0] += a[2][2] * b[0][2]
<pid 1, thread 0>: c[1][0] += a[1][0] * b[0][0]
<pid 1, thread 0>: c[1][0] += a[1][1] * b[0][1]
<pid 1, thread 0>: c[1][0] += a[1][2] * b[0][2]
<pid 1, thread 2>: c[1][2] += a[1][0] * b[2][0]
<pid 1, thread 2>: c[1][2] += a[1][1] * b[2][1]
<pid 1, thread 2>: c[1][2] += a[1][2] * b[2][2]
<pid 2, thread 2>: c[2][2] += a[2][0] * b[2][0]
<pid 2, thread 2>: c[2][2] += a[2][1] * b[2][1]
<pid 2, thread 2>: c[2][2] += a[2][2] * b[2][2]

7 Sobel Edge Detection

Given an image, the problem is to detect where the “edges” are in the picture.

8 Sobel Edge Detection

9 Sobel Edge Detection Algorithm
/* 3x3 Sobel masks. */
GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1;
GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2;
GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1;

GY[0][0] =  1; GY[0][1] =  2; GY[0][2] =  1;
GY[1][0] =  0; GY[1][1] =  0; GY[1][2] =  0;
GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1;

for (x = 0; x < N; ++x) {
    for (y = 0; y < N; ++y) {
        sumx = 0;
        sumy = 0;

        // handle image boundaries
        if (x == 0 || x == (h-1) || y == 0 || y == (w-1))
            sum = 0;
        else {

10 Sobel Edge Detection Algorithm
            // x gradient approximation
            for (i = -1; i <= 1; i++)
                for (j = -1; j <= 1; j++)
                    sumx += (grayImage[x+i][y+j] * GX[i+1][j+1]);

            // y gradient approximation
            for (i = -1; i <= 1; i++)
                for (j = -1; j <= 1; j++)
                    sumy += (grayImage[x+i][y+j] * GY[i+1][j+1]);

            // gradient magnitude approximation
            sum = (abs(sumx) + abs(sumy));
        }
        edgeImage[x][y] = clamp(sum);
    }
}

There are no loop-carried dependencies. Therefore, this is a Scatter/Gather pattern.
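The clamp() function is not shown on the slides. A minimal version, assuming 8-bit gray values, could be:

/* Assumed helper: saturate a gradient magnitude to the 0..255 range
   of an 8-bit pixel. */
int clamp(int value)
{
    if (value < 0)
        return 0;
    if (value > 255)
        return 255;
    return value;
}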

11 Sobel Edge Detection Algorithm
Inputs (that need to be broadcast or scattered):
GX and GY arrays
grayImage array
w and h (width and height)

There are 4 nested loops (x, y, i, and j).
The final answer is the array edgeImage.
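One plausible set of declarations matching these inputs is sketched below; the element types and the fixed size N are assumptions, not taken from the slides:

#define N 512                 /* assumed image dimension (square image) */

int GX[3][3], GY[3][3];       /* Sobel masks, filled in at run time */
int grayImage[N][N];          /* input gray-level image, broadcast to all processes */
int edgeImage[N][N];          /* result gathered back to the master process */
int w, h;                     /* image width and height */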

12 Hybrid Sobel Edge Detection using Paraguin
#pragma paraguin begin_parallel

/* 3x3 Sobel masks. */
GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1;
GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2;
GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1;

GY[0][0] =  1; GY[0][1] =  2; GY[0][2] =  1;
GY[1][0] =  0; GY[1][1] =  0; GY[1][2] =  0;
GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1;

#pragma paraguin bcast grayImage w h

13 Hybrid Sobel Edge Detection using Paraguin
#pragma paraguin forall
#pragma omp parallel for private(x,y,i,j,sumx,sumy,sum) shared(w,h) num_threads(4)
for (x = 0; x < N; ++x) {
    for (y = 0; y < N; ++y) {
        sumx = 0;
        sumy = 0;
        ...
        edgeImage[x][y] = clamp(sum);
    }
}
;
#pragma paraguin gather edgeImage
#pragma paraguin end_parallel

The x loop is partitioned first among the computers, and then the iterations given to each computer are partitioned again among its 4 cores.

14 What does not work with Paraguin
Syntax:

#pragma omp parallel
structured_block

Example:

#pragma omp parallel private(tID) num_threads(4)
{
    tID = omp_get_thread_num();
    printf("<pid %d>: tid = %d\n", __guin_rank, tID);
}

Very important: the opening brace must be on a new line.

15 What does not work with Paraguin
The SUIF compiler removes the braces because they are not associated with a control structure; a #pragma is not a control structure, but rather a pre-processor directive. After compiling with scc:

#pragma omp parallel private(tID) num_threads(4)
tID = omp_get_thread_num();
printf("<pid %d>: tid = %d\n", __guin_rank, tID);

The braces are removed, so the OpenMP directive now applies only to the single statement that follows it.

16 The Fix

The trick is to put in a control structure that basically does nothing:

dummy = 0;
#pragma omp parallel private(tID) num_threads(4)
if (dummy == 0)
{
    tID = omp_get_thread_num();
    printf("<pid %d>: tid = %d\n", __guin_rank, tID);
}

Using “if (1)” does not work. With the dummy variable, the if statement will always be true, so the behavior is unchanged, and scc leaves this code basically intact.

17 Result

<pid 1>: tid = 1
<pid 1>: tid = 2
<pid 1>: tid = 3
<pid 2>: tid = 2
<pid 0>: tid = 3
<pid 1>: tid = 0
<pid 2>: tid = 3
<pid 3>: tid = 2
<pid 3>: tid = 1
<pid 0>: tid = 2
<pid 2>: tid = 0
<pid 0>: tid = 1
<pid 3>: tid = 0
<pid 3>: tid = 3
<pid 0>: tid = 0

18 Questions

