Paraguin Compiler Examples.

1 Paraguin Compiler Examples

2 Examples
Matrix Addition (the complete program)
Traveling Salesman Problem (TSP)
Sobel Edge Detection

3 Matrix Addition The complete program

4 Matrix Addition (complete)
#define N 512

#ifdef PARAGUIN
typedef void* __builtin_va_list;
#endif

#include <stdio.h>
#include <math.h>
#include <sys/time.h>

print_results(char *prompt, float a[N][N]);

int main(int argc, char *argv[])
{
    int i, j, error = 0;
    float a[N][N], b[N][N], c[N][N];
    char *usage = "Usage: %s file\n";
    FILE *fd;
    double elapsed_time;
    struct timeval tv1, tv2;

5 Matrix Addition (complete)
    if (argc < 2) {
        fprintf (stderr, usage, argv[0]);
        error = -1;
    }
    // Only try to open the file if an argument was actually given.
    if (!error && (fd = fopen (argv[1], "r")) == NULL) {
        fprintf (stderr, "%s: Cannot open file %s for reading.\n",
                 argv[0], argv[1]);
        error = -1;
    }

    #pragma paraguin begin_parallel
    #pragma paraguin bcast error
    if (error) return -1;
    #pragma paraguin end_parallel

6 Matrix Addition (complete)
    // Read input from file for matrices a and b.
    // The I/O is not timed because this I/O needs
    // to be done regardless of whether this program
    // is run sequentially on one processor or in
    // parallel on many processors. Therefore, it is
    // irrelevant when considering speedup.
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            fscanf (fd, "%f", &a[i][j]);
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            fscanf (fd, "%f", &b[i][j]);
    fclose (fd);

7 Matrix Addition (complete)
    ;
    #pragma paraguin begin_parallel
    // This barrier is here so that we can take a time stamp
    // once we know all processes are ready to go.
    #pragma paraguin barrier
    #pragma paraguin end_parallel

    // Take a time stamp
    gettimeofday(&tv1, NULL);

    #pragma paraguin begin_parallel
    #pragma paraguin scatter a b

    // Parallelize the following loop nest assigning iterations
    // of the outermost loop (i) to different partitions.
    #pragma paraguin forall
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            c[i][j] = a[i][j] + b[i][j];
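The forall above hands iterations of the outer i loop to different processes. A rough sketch of what a contiguous (block) split of the rows could look like follows. The helper name block_range and the ceil(n / np) rule are illustrative assumptions, not taken from the Paraguin documentation.

```c
#include <assert.h>

/* Hypothetical helper: compute the half-open row range [*lo, *hi)
   that process `rank` out of `np` would own when n rows are split
   into contiguous blocks of ceil(n / np) rows each. */
void block_range(int rank, int np, int n, int *lo, int *hi)
{
    assert(np > 0 && rank >= 0 && rank < np && n >= 0);
    int chunk = (n + np - 1) / np;   /* ceil(n / np) */
    *lo = rank * chunk;
    *hi = *lo + chunk;
    if (*lo > n) *lo = n;            /* trailing processes may get no rows */
    if (*hi > n) *hi = n;
}
```

With N = 512 and 4 processes, each process would own 128 consecutive rows of a, b, and c under this rule.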

8 Matrix Addition (complete)
    ;
    #pragma paraguin gather c
    #pragma paraguin end_parallel

    // Take a time stamp. This won't happen until after the master
    // process has gathered all the results from the other processes.
    gettimeofday(&tv2, NULL);

    // tv_usec is in microseconds, so divide by 1000000.0 to get seconds.
    elapsed_time = (tv2.tv_sec - tv1.tv_sec) +
                   ((tv2.tv_usec - tv1.tv_usec) / 1000000.0);
    printf ("elapsed_time=\t%lf (seconds)\n", elapsed_time);

    // print result
    print_results("C = ", c);
}
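The elapsed-time arithmetic can be isolated in a small helper. This is a minimal sketch; the function name elapsed_seconds is an assumption made for illustration and does not appear in the slides.

```c
#include <assert.h>
#include <sys/time.h>

/* Difference between two gettimeofday() stamps, in seconds.
   tv_usec counts microseconds, so it is scaled by 1e-6. */
double elapsed_seconds(struct timeval start, struct timeval end)
{
    assert(start.tv_usec >= 0 && start.tv_usec < 1000000);
    assert(end.tv_usec >= 0 && end.tv_usec < 1000000);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1000000.0;
}
```

The microsecond difference may be negative (when end.tv_usec is smaller than start.tv_usec); adding it to the whole-second difference still gives the right total.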

9 Matrix Addition (complete)
print_results(char *prompt, float a[N][N])
{
    int i, j;

    printf ("\n\n%s\n", prompt);
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            printf(" %.2f", a[i][j]);
        }
        printf ("\n");
    }
    printf ("\n\n");
}

10 Matrix Addition
After compiling with the command (all on one line):
scc -DPARAGUIN -D__x86_64__ matrixadd.c -cc mpicc -o matrixadd.out
This produces: matrixadd.out

After compiling with the command:
scc -DPARAGUIN -D__x86_64__ matrixadd.c -.out.c
This produces: matrixadd.out.c (MPI source code)

11 Traveling Salesman Problem (TSP)

12 The Traveling Salesman Problem is simply to find the shortest circuit (Hamiltonian circuit) that visits every city in a set of cities exactly once

13 This problem falls into the class of “NP-hard” problems
What that means is that there is no known “polynomial” time (“big-oh” of a polynomial) algorithm that can solve it.
The most direct way to solve it exactly is to compare the distances of all possible Hamiltonian circuits.
But there are N! possible circuits of N cities.

14 Yes, heuristics can be applied to find a “good” solution fast, but there’s no guarantee it is the best.
The “brute force” algorithm is to consider all possible permutations of the N cities.
First we’ll fix the first city, since there are N equivalent circuits where we rotate the cities.
A circuit and its reverse have the same length, but that’s hard to account for, so we will consider the reverse directions to be different circuits.

15 If we number the cities from 0 to N-1, and 0 is the origination city, then the possible permutations of 4 cities are:
0->1->2->3->0
0->1->3->2->0
0->2->3->1->0
0->2->1->3->0
0->3->1->2->0
0->3->2->1->0
Notice that some permutations are the reverse of others. These are equivalent circuits.
Since we are fixing the origination city, there are (N-1)! permutations instead of N!.
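The (N-1)! count can be checked by brute force. The sketch below enumerates every circuit that starts at city 0 and counts them. The names count_circuits and count_from and the MAX_CITIES limit are illustrative assumptions, not code from the slides.

```c
#include <assert.h>

#define MAX_CITIES 12   /* illustrative limit, not from the slides */

/* Recursively place cities at positions pos..n-1 and return how many
   complete circuits were generated. used[c] marks cities that already
   appear in perm[0..pos-1]. */
static long count_from(int perm[], int used[], int n, int pos)
{
    if (pos == n)
        return 1;                    /* one complete circuit */
    long count = 0;
    for (int c = 1; c < n; c++) {
        if (used[c])
            continue;
        used[c] = 1;
        perm[pos] = c;
        count += count_from(perm, used, n, pos + 1);
        used[c] = 0;
    }
    return count;
}

/* Number of circuits over n cities that start at city 0. */
long count_circuits(int n)
{
    assert(n >= 1 && n <= MAX_CITIES);
    int perm[MAX_CITIES] = {0};
    int used[MAX_CITIES] = {0};
    used[0] = 1;                     /* city 0 is fixed as the origin */
    return count_from(perm, used, n, 1);
}
```

For 4 cities this counts the 6 circuits listed above, which is (4-1)!.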

16 We can compute the distances between all pairs of locations (O(N^2))
This is the input.
[Figure: table of pairwise distances between City 0, City 1, City 2, and City 3]

17 Problem: Iterating through the possible permutations is recursive, but we need a straightforward for loop to parallelize
Solution: Use a for loop to assign the first two cities
Since city 0 is fixed, there are n-1 choices for city 1 and n-2 choices for city 2
That means there are (n-1)(n-2) = n^2 - 3n + 2 combinations of the first two cities

18 Assignment of cities 0-2
N = n*n - 3*n + 2; // (n-1)(n-2)
perm[0] = 0;
for (i = 0; i < N; i++) {
    perm[1] = i / (n-2) + 1;
    perm[2] = i % (n-2) + 1;
    ...
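The index arithmetic can be tried on its own. The sketch below wraps it in a hypothetical helper, first_two_cities (a name invented here), and includes the adjustment that the forall slide applies so that positions 1 and 2 never hold the same city.

```c
#include <assert.h>

/* Map a loop index p in [0, (n-1)(n-2)) to a distinct pair of cities
   for positions 1 and 2; city 0 is fixed at position 0. */
void first_two_cities(int p, int n, int *c1, int *c2)
{
    assert(n >= 3 && p >= 0 && p < (n - 1) * (n - 2));
    *c1 = p / (n - 2) + 1;   /* n-1 choices: 1 .. n-1 */
    *c2 = p % (n - 2) + 1;   /* n-2 choices: 1 .. n-2 ... */
    if (*c2 >= *c1)
        (*c2)++;             /* ... shifted past c1 so the two differ */
}
```

For n = 4 the six values of p give the pairs (1,2), (1,3), (2,1), (2,3), (3,1), (3,2), which are exactly the first two hops of the six circuits listed earlier.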

19
// This structure is used for the "minloc" reduction. We want to find the
// minimum distance as well as which processor found it so that we can
// get the final minimum circuit.
struct {
    float minDist;
    int rank;
} myAnswer, resultAnswer;

int main(int argc, char *argv[])
{
    int i, j, k, N, p;
    int perm[MAX_NUM_CITIES], minPerm[MAX_NUM_CITIES+1];
    float D[MAX_NUM_CITIES][MAX_NUM_CITIES];
    float dist, minDist, finalMinDist;
    int abort;
To use minloc, we need a value to minimize and the location (rank)

20
abort = processArgs(argc, argv);
if (!abort) {
    // Read in the lower triangular matrix of relative distances
    // between cities. The values are mirrored across the diagonal.
    for (i = 0; i < n; i++) {
        D[i][i] = 0.0f;
        for (j = 0; j < i; j++) {
            fscanf (fd, "%f", &D[i][j]);
            D[j][i] = D[i][j];
        }
    }
} else {
    if (n <= 1)
        printf ("0 0 0\n");
    else
        n = 0;
}
#pragma paraguin begin_parallel
#pragma paraguin bcast abort
if (abort) return -1;

21
#pragma paraguin bcast n D
perm[0] = 0;
minDist = 9.0e10; // Larger than any circuit length we expect to see
if (n == 2) {
    perm[1] = 1; // If n = 2, then N = 0, and we are done.
    minPerm[0] = perm[0];
    minPerm[1] = perm[1];
    minDist = computeDist(D, n, perm);
}
N = n*n - 3*n + 2; // N = (n-1)(n-2)
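The slides call computeDist but do not show its body. One plausible version, written as a sketch under that caveat, sums the distance of each hop and then closes the circuit back to the first city. The value chosen for MAX_NUM_CITIES is also an assumption.

```c
#include <assert.h>

#define MAX_NUM_CITIES 16   /* assumed value; the slides do not show it */

/* Length of the circuit perm[0] -> perm[1] -> ... -> perm[n-1] -> perm[0],
   looked up in the symmetric distance matrix D. */
float computeDist(float D[MAX_NUM_CITIES][MAX_NUM_CITIES], int n,
                  const int perm[])
{
    assert(n >= 1 && n <= MAX_NUM_CITIES);
    float dist = 0.0f;
    for (int i = 0; i < n - 1; i++)
        dist += D[perm[i]][perm[i + 1]];
    dist += D[perm[n - 1]][perm[0]];   /* close the circuit */
    return dist;
}
```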

22
#pragma paraguin forall
for (p = 0; p < N; p++) {
    perm[1] = p / (n-2) + 1;
    perm[2] = p % (n-2) + 1;
    if (perm[2] >= perm[1])
        perm[2]++;

    // After cities 0, 1, and 2 have been determined, initialize the
    // rest of the cities for the first permutation.
    initialize(perm, n, 3);
    do {
        dist = computeDist(D, n, perm);
        // Keep the shortest circuit seen thus far.
        if (minDist > dist) {
            minDist = dist;
            for (i = 0; i < n; i++)
                minPerm[i] = perm[i];
        }
    } while (increment(perm,n)); // Move to the next permutation
}
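The helpers initialize and increment are not shown on the slides. The sketch below is one way they might work, assuming the tail of perm is stepped through in lexicographic order; the FIXED constant and both function bodies are assumptions made for illustration.

```c
#include <assert.h>

#define FIXED 3   /* positions 0..2 are set before the inner loop */

/* Fill perm[start..n-1] with the cities not already in perm[0..start-1],
   in ascending order (the lexicographically first arrangement). */
void initialize(int perm[], int n, int start)
{
    assert(start >= 0 && start <= n);
    int pos = start;
    for (int city = 0; city < n; city++) {
        int used = 0;
        for (int k = 0; k < start; k++)
            if (perm[k] == city)
                used = 1;
        if (!used && pos < n)
            perm[pos++] = city;
    }
}

/* Advance perm[FIXED..n-1] to its next lexicographic permutation.
   Returns 1 on success, 0 when the tail was already the last one. */
int increment(int perm[], int n)
{
    int i = n - 2;
    while (i >= FIXED && perm[i] > perm[i + 1])
        i--;                         /* find the rightmost ascent */
    if (i < FIXED)
        return 0;                    /* tail is fully descending: done */
    int j = n - 1;
    while (perm[j] < perm[i])
        j--;                         /* rightmost element larger than perm[i] */
    int tmp = perm[i]; perm[i] = perm[j]; perm[j] = tmp;
    for (int lo = i + 1, hi = n - 1; lo < hi; lo++, hi--) {
        tmp = perm[lo]; perm[lo] = perm[hi]; perm[hi] = tmp;
    }
    return 1;
}
```

Returning 0 once the tail is exhausted matches the do ... while (increment(perm,n)) loop on the slide, which stops when no further permutation exists.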

23
myAnswer.minDist = minDist;
myAnswer.rank = __guin_rank;
#pragma paraguin reduce minloc myAnswer resultAnswer
#pragma paraguin bcast resultAnswer
// If the current processor is the one that found the solution,
// then report the solution.
if (__guin_rank == resultAnswer.rank) {
    printf (" %f ", minDist);
    for (i = 0; i < n; i++)
        printf ("%d ", minPerm[i]);
    printf ("%d\n", minPerm[0]);
}
#pragma paraguin end_parallel
Reminder of the structure used for the reduction:
struct { float minDist; int rank; } myAnswer, resultAnswer;
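What a minloc reduction computes can be shown sequentially, without any message passing. The sketch below folds one (value, rank) pair per process into a single result; ties go to the lower rank, which is how MPI's MPI_MINLOC behaves. The struct tag min_loc and both function names are illustrative assumptions.

```c
#include <assert.h>

/* Same shape as the structure on the slides: a value and its owner. */
struct min_loc {
    float minDist;
    int   rank;
};

/* Combine two (value, rank) pairs the way a minloc reduction does:
   keep the smaller value; on a tie keep the lower rank. */
struct min_loc minloc_combine(struct min_loc a, struct min_loc b)
{
    if (b.minDist < a.minDist)
        return b;
    if (b.minDist == a.minDist && b.rank < a.rank)
        return b;
    return a;
}

/* Fold the answers of np processes into a single result. */
struct min_loc minloc_reduce(const struct min_loc answers[], int np)
{
    assert(np >= 1);
    struct min_loc result = answers[0];
    for (int r = 1; r < np; r++)
        result = minloc_combine(result, answers[r]);
    return result;
}
```

The rank in the result tells every process who holds the winning minPerm, which is why only that process prints the circuit.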

24 Demonstration

25 Sobel Edge Detection

26 Sobel Edge Detection Given an image, the problem is to detect where the “edges” are in the picture

27 Sobel Edge Detection

28 Sobel Edge Detection Algorithm
/* 3x3 Sobel masks. */
GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1;
GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2;
GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1;

GY[0][0] = 1; GY[0][1] = 2; GY[0][2] = 1;
GY[1][0] = 0; GY[1][1] = 0; GY[1][2] = 0;
GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1;

for(x=0; x < N; ++x){
    for(y=0; y < N; ++y){
        sumx = 0; sumy = 0;
        // handle image boundaries
        if(x==0 || x==(h-1) || y==0 || y==(w-1))
            sum = 0;
        else{

29 Sobel Edge Detection Algorithm
            //x gradient approx
            for(i=-1; i<=1; i++)
                for(j=-1; j<=1; j++)
                    sumx += (grayImage[x+i][y+j] * GX[i+1][j+1]);
            //y gradient approx
            for(i=-1; i<=1; i++)
                for(j=-1; j<=1; j++)
                    sumy += (grayImage[x+i][y+j] * GY[i+1][j+1]);
            //gradient magnitude approx
            sum = (abs(sumx) + abs(sumy));
        }
        edgeImage[x][y] = clamp(sum);
    }
}
There are no loop-carried dependencies. Therefore, this is a Scatter/Gather pattern.
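For experimenting outside the slides, the algorithm can be put into one sequential function. This is a sketch under stated assumptions: the 5x5 image size, the function name sobel, and the clamp body (limit to 0..255) are not taken from the slides, which never show clamp.

```c
#include <assert.h>
#include <stdlib.h>

#define H 5   /* illustrative image height */
#define W 5   /* illustrative image width  */

/* Assumed behavior: limit a magnitude to the 0..255 range of a pixel. */
static int clamp(int v)
{
    assert(v >= 0);                  /* a sum of absolute values */
    return v > 255 ? 255 : v;
}

void sobel(int gray[H][W], int edge[H][W])
{
    static const int GX[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
    static const int GY[3][3] = { { 1, 2, 1}, { 0, 0, 0}, {-1, -2, -1} };

    for (int x = 0; x < H; x++) {
        for (int y = 0; y < W; y++) {
            int sum = 0;             /* boundary pixels stay 0 */
            if (x > 0 && x < H - 1 && y > 0 && y < W - 1) {
                int sumx = 0, sumy = 0;
                for (int i = -1; i <= 1; i++)
                    for (int j = -1; j <= 1; j++) {
                        sumx += gray[x + i][y + j] * GX[i + 1][j + 1];
                        sumy += gray[x + i][y + j] * GY[i + 1][j + 1];
                    }
                sum = abs(sumx) + abs(sumy);
            }
            edge[x][y] = clamp(sum);
        }
    }
}
```

Each edge[x][y] depends only on the input image, never on another output pixel, which is what makes the x loop safe to split across processes.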

30 Loop Carried Dependency
A Loop-Carried Dependency is when a value is computed in one iteration and used in another
for (i = 1; i < n; i++) {
    a[i] = f(a[i-1]);
}
<1>: a[1] = f(a[0]);
<2>: a[2] = f(a[1]);
<3>: a[3] = f(a[2]);
<4>: a[4] = f(a[3]);
...
If we run this loop as a forall, then interprocessor communication is needed
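One way to see the difference is to run the iterations in a different order, since a forall gives no ordering guarantee between iterations on different processes. In the sketch below (all names are illustrative assumptions), reversing the order changes the result of the dependent loop but not of the independent one.

```c
#include <assert.h>

static int f(int v) { return 2 * v + 1; }   /* any function will do */

/* Loop-carried dependency: iteration i reads a[i-1], which iteration
   i-1 wrote. `reverse` runs the iterations in the opposite order. */
void dependent_loop(int a[], int n, int reverse)
{
    assert(n >= 1);
    for (int k = 1; k < n; k++) {
        int i = reverse ? n - k : k;
        a[i] = f(a[i - 1]);
    }
}

/* No dependency: iteration i touches only b[i] and c[i]. */
void independent_loop(const int b[], int c[], int n, int reverse)
{
    assert(n >= 1);
    for (int k = 0; k < n; k++) {
        int i = reverse ? n - 1 - k : k;
        c[i] = f(b[i]);
    }
}
```

Starting from a = {1, 0, 0, 0, 0}, the forward order yields a[4] = 31 while the reverse order yields a[4] = 1, because the later iterations read values that have not been computed yet.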

31 Sobel Edge Detection Algorithm
Inputs (that need to be broadcast or scattered):
GX and GY arrays
grayImage array
w and h (width and height)
There are 4 nested loops (x, y, i, and j)
The final answer is the array edgeImage

32 Sobel Edge Detection Algorithm
#pragma paraguin begin_parallel

/* 3x3 Sobel masks. */
GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1;
GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2;
GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1;

GY[0][0] = 1; GY[0][1] = 2; GY[0][2] = 1;
GY[1][0] = 0; GY[1][1] = 0; GY[1][2] = 0;
GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1;

// These are the inputs
#pragma paraguin bcast grayImage w h

// Partition the x loop (outermost loop) using cyclic scheduling
#pragma paraguin forall 1
for(x=0; x < N; ++x){
    for(y=0; y < N; ++y){
        sumx = 0; sumy = 0;
        ...
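Cyclic scheduling deals iterations out round-robin instead of in one contiguous block per process. A minimal sketch follows, assuming the 1 in forall 1 is a chunk size of one iteration. That reading, and the helper name cyclic_owner, are assumptions rather than something the slides state.

```c
#include <assert.h>

/* Hypothetical helper: which process runs iteration `iter` when
   iterations are dealt out round-robin in chunks of `chunk`. */
int cyclic_owner(int iter, int chunk, int np)
{
    assert(iter >= 0 && chunk >= 1 && np >= 1);
    return (iter / chunk) % np;
}
```

With a chunk of 1 and 4 processes, rows 0, 4, 8, ... go to process 0, rows 1, 5, 9, ... to process 1, and so on.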

33 Sobel Edge Detection Algorithm
        ...
        edgeImage[x][y] = clamp(sum);
    }
}
;
// Gather all elements of the edgeImage array
#pragma paraguin gather edgeImage

34 Questions
