1
Hybrid Parallel Programming with the Paraguin compiler
2
The Paraguin compiler can also create hybrid programs
Because Paraguin uses mpicc, it passes OpenMP pragmas through to the resulting source, so the generated MPI program can also use OpenMP threads within each process
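A hybrid program in this sense is simply an MPI program that also contains OpenMP pragmas, so each MPI process (one per computer) runs several threads (one per core). A minimal hand-written example, not generated by Paraguin, might look like this:

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int rank;

    MPI_Init(&argc, &argv);                 /* one MPI process per computer */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

#pragma omp parallel num_threads(4)         /* 4 OpenMP threads per process */
    printf("<pid %d>: tid = %d\n", rank, omp_get_thread_num());

    MPI_Finalize();
    return 0;
}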
3
Compiling
First we compile the annotated program to MPI source code:
scc -DPARAGUIN -D__x86_64__ matrixmult.c -.out.c
Then we compile that source with MPI and OpenMP:
mpicc -fopenmp matrixmult.out.c -o matrixmult.out
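The resulting executable is launched like any other MPI program, for example with something like mpiexec -n 8 ./matrixmult.out (the exact launcher and host configuration depend on the local MPI installation); the number of OpenMP threads per process is fixed here by the num_threads(4) clause in the source.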
4
Hybrid Matrix Multiplication using Paraguin
#pragma paraguin begin_parallel
#pragma paraguin scatter a
#pragma paraguin bcast b
#pragma paraguin forall
for (i = 0; i < N; i++) {
#pragma omp parallel for private(tID, j, k) num_threads(4)
    for (j = 0; j < N; j++) {
        c[i][j] = 0.0;
        for (k = 0; k < N; k++) {
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
    }
}

The i loop is partitioned among the computers (MPI processes).
The j loop is partitioned among the 4 cores (OpenMP threads) within a computer.
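For comparison, the following is a rough, hand-written MPI + OpenMP sketch of what the pragmas above request. It is only an illustration: it assumes N is divisible by the number of processes and a simple block distribution of rows, and Paraguin's actual generated code will differ in its details.

#include <mpi.h>

#define N 512                              /* hypothetical matrix size */

double a[N][N], b[N][N], c[N][N];          /* full matrices on the master */
double aLoc[N][N], cLoc[N][N];             /* per-process row blocks (oversized for simplicity) */

int main(int argc, char *argv[])
{
    int rank, np, i, j, k;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    int rowsPerProc = N / np;              /* assumes N is divisible by np */

    if (rank == 0) {
        /* ... initialize a and b on the master ... */
    }

    /* scatter a: each process receives a block of rows of a */
    MPI_Scatter(a, rowsPerProc * N, MPI_DOUBLE,
                aLoc, rowsPerProc * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* bcast b: every process receives all of b */
    MPI_Bcast(b, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* forall: the i loop is split across processes; within each
       process OpenMP splits the j loop among 4 threads */
    for (i = 0; i < rowsPerProc; i++) {
#pragma omp parallel for private(j, k) num_threads(4)
        for (j = 0; j < N; j++) {
            cLoc[i][j] = 0.0;
            for (k = 0; k < N; k++)
                cLoc[i][j] += aLoc[i][k] * b[k][j];
        }
    }

    /* gather: collect the row blocks of c on the master */
    MPI_Gather(cLoc, rowsPerProc * N, MPI_DOUBLE,
               c, rowsPerProc * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc -fopenmp, this runs the i loop across MPI processes and the j loop across 4 threads per process, which is the same two-level partitioning described above.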
5
Debug Statements
<pid 0, thread 1>: c[0][1] += a[0][0] * b[1][0]
<pid 0, thread 1>: c[0][1] += a[0][1] * b[1][1]
<pid 0, thread 1>: c[0][1] += a[0][2] * b[1][2]
<pid 0, thread 2>: c[0][2] += a[0][0] * b[2][0]
<pid 0, thread 2>: c[0][2] += a[0][1] * b[2][1]
<pid 0, thread 2>: c[0][2] += a[0][2] * b[2][2]
<pid 1, thread 1>: c[1][1] += a[1][0] * b[1][0]
<pid 1, thread 1>: c[1][1] += a[1][1] * b[1][1]
<pid 1, thread 1>: c[1][1] += a[1][2] * b[1][2]
<pid 2, thread 1>: c[2][1] += a[2][0] * b[1][0]
<pid 2, thread 1>: c[2][1] += a[2][1] * b[1][1]
<pid 2, thread 1>: c[2][1] += a[2][2] * b[1][2]
<pid 0, thread 0>: c[0][0] += a[0][0] * b[0][0]
6
Debug Statements
<pid 0, thread 0>: c[0][0] += a[0][1] * b[0][1]
<pid 0, thread 0>: c[0][0] += a[0][2] * b[0][2]
<pid 2, thread 0>: c[2][0] += a[2][0] * b[0][0]
<pid 2, thread 0>: c[2][0] += a[2][1] * b[0][1]
<pid 2, thread 0>: c[2][0] += a[2][2] * b[0][2]
<pid 1, thread 0>: c[1][0] += a[1][0] * b[0][0]
<pid 1, thread 0>: c[1][0] += a[1][1] * b[0][1]
<pid 1, thread 0>: c[1][0] += a[1][2] * b[0][2]
<pid 1, thread 2>: c[1][2] += a[1][0] * b[2][0]
<pid 1, thread 2>: c[1][2] += a[1][1] * b[2][1]
<pid 1, thread 2>: c[1][2] += a[1][2] * b[2][2]
<pid 2, thread 2>: c[2][2] += a[2][0] * b[2][0]
<pid 2, thread 2>: c[2][2] += a[2][1] * b[2][1]
<pid 2, thread 2>: c[2][2] += a[2][2] * b[2][2]
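In this small run (apparently a 3 x 3 example with 3 MPI processes), the debug output shows the two-level partitioning directly: the row index i of c matches the MPI process id, and the column index j matches the OpenMP thread number within that process.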
7
Sobel Edge Detection
Given an image, the problem is to detect where the “edges” are in the picture.
8
Sobel Edge Detection
9
Sobel Edge Detection Algorithm
/* 3x3 Sobel masks. */
GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1;
GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2;
GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1;

GY[0][0] =  1; GY[0][1] =  2; GY[0][2] =  1;
GY[1][0] =  0; GY[1][1] =  0; GY[1][2] =  0;
GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1;

for (x = 0; x < N; ++x) {
    for (y = 0; y < N; ++y) {
        sumx = 0;
        sumy = 0;

        // handle image boundaries
        if (x == 0 || x == (h-1) || y == 0 || y == (w-1))
            sum = 0;
        else {
10
Sobel Edge Detection Algorithm
            // x and y gradient approximations
            for (i = -1; i <= 1; i++) {
                for (j = -1; j <= 1; j++) {
                    sumx += (grayImage[x+i][y+j] * GX[i+1][j+1]);
                    sumy += (grayImage[x+i][y+j] * GY[i+1][j+1]);
                }
            }

            // gradient magnitude approximation
            sum = (abs(sumx) + abs(sumy));
        }

        edgeImage[x][y] = clamp(sum);
    }
}

There are no loop-carried dependencies. Therefore, this is a Scatter/Gather pattern.
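The clamp() function is not defined on these slides; a typical definition, assumed here for completeness, saturates the gradient magnitude to the 0-255 range of an 8-bit pixel:

/* Hypothetical clamp(): not shown on the slides */
int clamp(int value)
{
    if (value < 0)
        return 0;
    if (value > 255)
        return 255;
    return value;
}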
11
Sobel Edge Detection Algorithm
Inputs (that need to be broadcast or scattered):
GX and GY arrays
grayImage array
w and h (width and height)
There are 4 nested loops (x, y, i, and j).
The final answer is the array edgeImage.
12
Hybrid Sobel Edge Detection using Paraguin
#pragma paraguin begin_parallel

/* 3x3 Sobel masks. */
GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1;
GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2;
GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1;

GY[0][0] =  1; GY[0][1] =  2; GY[0][2] =  1;
GY[1][0] =  0; GY[1][1] =  0; GY[1][2] =  0;
GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1;

#pragma paraguin bcast grayImage w h
13
Hybrid Sobel Edge Detection using Paraguin
#pragma paraguin forall
#pragma omp parallel for private(x, y, i, j, sumx, sumy, sum) shared(w, h) num_threads(4)
for (x = 0; x < N; ++x) {
    for (y = 0; y < N; ++y) {
        sumx = 0;
        sumy = 0;
        ...
        edgeImage[x][y] = clamp(sum);
    }
}
;
#pragma paraguin gather edgeImage
#pragma paraguin end_parallel

The x loop is partitioned first among the computers (MPI processes) and then again among the 4 cores (OpenMP threads) within each computer.
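To see the two-level split concretely, here is a small standalone sketch (hand-written, not Paraguin output) that prints which MPI rank and OpenMP thread would handle each row x. It assumes a block distribution of rows across ranks and OpenMP's default static schedule, which may not match what Paraguin actually generates.

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

#define N 16                                   /* small hypothetical row count */

int main(int argc, char *argv[])
{
    int rank, np;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    /* First level: each rank gets a contiguous block of rows. */
    int rowsPerProc = N / np;                  /* assumes N divisible by np */
    int first = rank * rowsPerProc;
    int last  = first + rowsPerProc;

    /* Second level: OpenMP splits that block among 4 threads. */
#pragma omp parallel for num_threads(4)
    for (int x = first; x < last; x++)
        printf("row %2d -> rank %d, thread %d\n",
               x, rank, omp_get_thread_num());

    MPI_Finalize();
    return 0;
}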
14
What does not work with Paraguin
Syntax:
#pragma omp parallel
structured_block

Example:
#pragma omp parallel private(tID) num_threads(4)
{
    tID = omp_get_thread_num();
    printf("<pid %d>: tid = %d\n", __guin_rank, tID);
}

Very important: the opening brace must be on a new line.
15
What does not work with Paraguin
The SUIF compiler removes the braces because they are not associated with a control structure. A #pragma is not a control structure, but rather a preprocessor directive.

After compiling with scc:

#pragma omp parallel private(tID) num_threads(4)
tID = omp_get_thread_num();
printf("<pid %d>: tid = %d\n", __guin_rank, tID);

The braces have been removed, so only the first statement remains part of the OpenMP parallel region.
16
The Fix
The trick is to put in a control structure that basically does nothing:

dummy = 0;
#pragma omp parallel private(tID) num_threads(4)
if (dummy == 0)
{
    tID = omp_get_thread_num();
    printf("<pid %d>: tid = %d\n", __guin_rank, tID);
}

“if (1)” does not work (presumably because SUIF folds away the constant condition and then strips the braces). The test on dummy is always true, so this code is essentially left intact.
17
Result
<pid 1>: tid = 1
<pid 1>: tid = 2
<pid 1>: tid = 3
<pid 2>: tid = 2
<pid 0>: tid = 3
<pid 1>: tid = 0
<pid 2>: tid = 3
<pid 3>: tid = 2
<pid 3>: tid = 1
<pid 0>: tid = 2
<pid 2>: tid = 0
<pid 0>: tid = 1
<pid 3>: tid = 0
<pid 3>: tid = 3
<pid 0>: tid = 0
18
Questions