1
Parallelizing Iterative Computation for Multiprocessor Architectures
Peter Cappello
2
What is the problem?
Create programs for multiprocessor units (MPUs):
– Multicore processors
– Graphics processing units (GPUs)
3
For whom is it a problem? The compiler designer.
Application Program → Compiler → Executable → CPU: EASY
4
For whom is it a problem? The compiler designer.
Application Program → Compiler → Executable → MPU: HARD
5
For whom is it a problem? The application programmer.
Application Program → Compiler → Executable → MPU
6
Complex machine consequences
– The programmer needs to be highly skilled
– Programming is error-prone
These consequences imply...
Increased parallelism ⇒ increased development cost!
7
Amdahl’s Law
The speedup of a program is bounded by its inherently sequential part.
(http://en.wikipedia.org/wiki/Amdahl's_law)
If
– a program needs 20 hours using one CPU, and
– 1 hour of that cannot be parallelized,
then
– minimum execution time ≥ 1 hour
– maximum speedup ≤ 20.
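A minimal sketch (my addition, not on the slide) of how this bound falls out of Amdahl's formula, with sequential fraction s = 1/20:

    // Sketch: Amdahl's-law bound for the 20-hour example above.
    public class AmdahlBound {
        // s = inherently sequential fraction; p = number of processors
        static double speedup(double s, int p) { return 1.0 / (s + (1.0 - s) / p); }

        public static void main(String[] args) {
            double s = 1.0 / 20.0;                                    // 1 of the 20 hours is sequential
            System.out.println("upper bound   = " + 1.0 / s);         // ≈ 20.0
            System.out.println("19 processors = " + speedup(s, 19));  // ≈ 10.0
        }
    }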
8
[Figure: Amdahl's-law speedup curves, from http://en.wikipedia.org/wiki/Amdahl's_law]
9
Parallelization opportunities
Scalable parallelism resides in 2 sequential program constructs:
– Divide-and-conquer recursion (sketched below)
– Iterative statements (for)
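For the first of these, a minimal fork/join sketch (my illustration; the class name and threshold are assumptions, not from the talk):

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Illustrative divide-and-conquer parallel sum using Java's fork/join framework.
    class SumTask extends RecursiveTask<Long> {
        private final long[] a;
        private final int lo, hi;
        SumTask(long[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }

        @Override protected Long compute() {
            if (hi - lo <= 1000) {                    // small problem: solve sequentially
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += a[i];
                return sum;
            }
            int mid = (lo + hi) / 2;                  // divide
            SumTask left = new SumTask(a, lo, mid);
            left.fork();                              // conquer the halves in parallel
            long right = new SumTask(a, mid, hi).compute();
            return left.join() + right;               // combine
        }
    }
    // usage: long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));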
10
2 schools of thought
– Create a general solution (address everything somewhat well)
– Create a specific solution (address one thing very well)
11
Focus on iterative statements (for)

float[] x = new float[n];
float[] b = new float[n];
float[][] a = new float[n][n];
...
for ( int i = 0; i < n; i++ ) {
    b[i] = 0;
    for ( int j = 0; j < n; j++ )
        b[i] += a[i][j] * x[j];
}
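For comparison (my addition, not on the slide): each iteration of the outer loop above is independent of the others, so one way to parallelize it in Java is with a parallel stream, as in this method sketch:

    import java.util.stream.IntStream;

    // Sketch: each b[i] is an independent dot product, so the outer loop can run in parallel.
    static void matVec(float[][] a, float[] x, float[] b) {
        IntStream.range(0, b.length).parallel().forEach(i -> {
            float sum = 0;
            for (int j = 0; j < x.length; j++)
                sum += a[i][j] * x[j];
            b[i] = sum;
        });
    }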
12
Matrix-Vector Product
b = Ax, illustrated with a 3×3 matrix A:
b1 = a11*x1 + a12*x2 + a13*x3
b2 = a21*x1 + a22*x2 + a23*x3
b3 = a31*x1 + a32*x2 + a33*x3
13
[Figures, slides 13 to 16: the 3×3 matrix-vector product laid out at its index points, then progressively embedded along SPACE and TIME axes, with the x values flowing through the array step by step]
17
Matrix Product
C = AB, illustrated with 2×2 matrices:
c11 = a11*b11 + a12*b21
c12 = a11*b12 + a12*b22
c21 = a21*b11 + a22*b21
c22 = a21*b12 + a22*b22
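For reference (my addition, not on the slide), the same computation written as an iterative statement is the usual triple loop:

    // n x n matrix product C = A * B as nested for loops.
    static float[][] matMul(float[][] a, float[][] b) {
        int n = a.length;
        float[][] c = new float[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }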
18
[Figures, slides 18 to 20: the 2×2 matrix product laid out over its row, col, and k indices, then embedded in space (S) and time (T)]
21
Declaring an iterative computation
– Index set
– Data network
– Functions
– Spacetime embedding
22
Declaring an Index set
I1: { (i, j) : 1 ≤ i ≤ j ≤ n }
I2: { (i, j) : 1 ≤ i ≤ n, 1 ≤ j ≤ n }
[Figure: the two index sets plotted in the (i, j) plane]
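Written as loop nests (my illustration, not from the slides; "visit" is a placeholder for whatever is computed at index point (i, j)):

    // I1: 1 <= i <= j <= n (triangular index set)
    static void enumerateI1(int n) {
        for (int i = 1; i <= n; i++)
            for (int j = i; j <= n; j++)
                visit(i, j);
    }

    // I2: 1 <= i <= n, 1 <= j <= n (full rectangular index set)
    static void enumerateI2(int n) {
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= n; j++)
                visit(i, j);
    }

    static void visit(int i, int j) { /* body of the iterative computation */ }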
23
Declaring a Data network
D1: x: [-1, 0]; b: [0, -1]; a: [0, 0];
D2: x: [-1, 0]; b: [-1, -1]; a: [0, -1];
[Figure: the dependence arcs for x, b, and a at an index point, for D1 and D2]
24
Declaring an Index set + Data network
I1: 1 ≤ i ≤ j ≤ n
D1: x: [-1, 0]; b: [0, -1]; a: [0, 0];
[Figure: the dependence arcs of D1 drawn at each point of the triangular index set I1 in the (i, j) plane]
25
Declaring the Functions
F1:
    float x’ (float x)                     { return x; }
    float b’ (float b, float x, float a)   { return b + a*x; }
F2:
    char x’ (char x)                       { return x; }
    boolean b’ (boolean b, char x, char a) { return b && a == x; }
26
Declaring a Spacetime embedding
E1:
– space = -i + j
– time = i + j
E2:
– space1 = i
– space2 = j
– time = i + j
[Figure: the index points replotted on (space, time) axes for E1 and on (space1, space2, time) axes for E2]
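One hypothetical way such a declaration could be written down in Java (the interface, record, and class names below are my illustrative assumptions, not the talk's notation; only I1, D1, and E1 are shown):

    import java.util.List;

    // Hypothetical sketch of the 4-tuple: index set, data network, functions, spacetime embedding.
    interface IndexSet     { boolean contains(int i, int j); }
    interface SpacetimeMap { int time(int i, int j); int space(int i, int j); }
    record Dependence(String variable, int di, int dj) {}   // one data-network arc

    class IterativeSpecSketch {
        // I1: { (i, j) : 1 <= i <= j <= n }
        static IndexSet i1(int n) {
            return (i, j) -> 1 <= i && i <= j && j <= n;
        }

        // D1: x flows along [-1, 0]; b along [0, -1]; a stays in place [0, 0]
        static final List<Dependence> D1 = List.of(
            new Dependence("x", -1, 0),
            new Dependence("b",  0, -1),
            new Dependence("a",  0,  0));

        // E1: time = i + j, space = -i + j (projection onto a linear processor array)
        static final SpacetimeMap E1 = new SpacetimeMap() {
            public int time (int i, int j) { return i + j;  }
            public int space(int i, int j) { return -i + j; }
        };

        // (The functions component F1 would be declared similarly, e.g. as functional interfaces.)
    }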
27
Declaring an iterative computation
Upper triangular matrix-vector product: UTMVP = (I1, D1, F1, E1)
[Figure: the resulting spacetime diagram (time vs. space)]
28
Declaring an iterative computation
Full matrix-vector product: (I2, D1, F1, E1)
[Figure: the resulting spacetime diagram (time vs. space)]
29
Declaring an iterative computation
Convolution (polynomial product): (I2, D2, F1, E1)
[Figure: the resulting spacetime diagram (time vs. space)]
30
Declaring an iterative computation
String pattern matching: (I2, D2, F2, E1)
[Figure: the resulting spacetime diagram (time vs. space)]
31
Declaring an iterative computation
Pipelined string pattern matching: (I2, D2, F2, E2)
[Figure: the resulting spacetime diagram (time vs. space1 and space2)]
32
Iterative computation specification
A declarative specification:
– is a 4-dimensional design space (actually 5-dimensional: the space embedding is independent of the time embedding)
– facilitates reuse of design components.
33
Starting with an existing language…
Can infer:
– Index set
– Data network
– Functions
Cannot infer:
– Space embedding
– Time embedding
34
Spacetime embedding
– Start with it as a program annotation.
– More advanced: the compiler optimizes it, based on a figure of merit given as a program annotation.
35
Work
– Work out details of the notation
– Implement in Java, C, Matlab, HDL, …
– Map the virtual processor network to an actual processor network:
  – Java: map processors to Threads, [links to Channels] (see the sketch below)
  – GPU: map processors to GPU processing elements
(Challenge: the spacetime embedding depends on the underlying architecture)
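A rough sketch of the Java mapping mentioned above (my illustration: one Thread per virtual processor, with a BlockingQueue standing in for a channel/link):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Sketch: one virtual processor per Thread; one link per BlockingQueue "channel".
    class ChannelDemo {
        public static void main(String[] args) {
            BlockingQueue<Float> link = new ArrayBlockingQueue<>(16);

            Thread producer = new Thread(() -> {             // upstream processor
                try {
                    for (int j = 0; j < 4; j++) link.put((float) j);
                    link.put(Float.NaN);                     // end-of-stream marker
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });

            Thread consumer = new Thread(() -> {             // downstream processor
                try {
                    float sum = 0, v;
                    while (!Float.isNaN(v = link.take())) sum += v;
                    System.out.println("sum = " + sum);      // 6.0
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });

            producer.start();
            consumer.start();
        }
    }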
36
Work…
The output of one iterative computation is the input to another.
Develop a notation for specifying composite iterative computations?
37
Thanks for listening! Questions?