1 CS 584 Lecture 20
• Assignment
  – Glenda program
• Project proposal is coming up! (March 13)
  » 2 pages text + 1 page plan of action
  » 3 references
• No class March 13
  – Put your project proposal in my box.
  – Paper presentations on March 11 (Tom Abbott)

2 Module Composition

3 Case Study: Matrix Multiply
• Goal: data-distribution neutral
• Three basic ways to distribute
  – row
  – column
  – submatrix
• Question:
  – Does our library need different algorithms?

4 Analytical Model
• Compare the two algorithms
• Ignore the computation costs
• What are the communication costs?

5 One-Dimensional Decomposition
• Each processor "owns" the black portion
• To compute the owned portion of the answer, each processor requires all of A.
• This affects data distribution.
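A minimal serial sketch of the one-dimensional case may help here. It assumes the "owned" portion is a block of columns of B and of the answer C (the figure is not available in this transcript, so that choice is an assumption), and the names `owned_columns` and `one_d_multiply` are hypothetical. It shows why each task reads every row of A.

```python
# Sketch: 1-D (column-block) decomposition of C = A * B, simulated serially.
# The column-block choice and all helper names are illustrative assumptions.

def matmul(a, b):
    """Plain triple-loop product of two list-of-lists matrices."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def owned_columns(rank, n, p):
    """Column indices owned by task `rank` when n columns are split over p tasks."""
    width = n // p  # assumes p divides n evenly
    return range(rank * width, (rank + 1) * width)

def one_d_multiply(a, b, p):
    """Compute C = A * B one task at a time, each task filling its own columns."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for rank in range(p):
        cols = owned_columns(rank, n, p)
        # The task's local slice of B: only its own columns.
        b_local = [[b[k][j] for j in cols] for k in range(n)]
        # Every row of A is used here, so the task needs all of A.
        c_local = matmul(a, b_local)
        for i in range(n):
            for offset, j in enumerate(cols):
                c[i][j] = c_local[i][offset]
    return c

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
C = one_d_multiply(A, B, p=2)
```

In a real message-passing program the full copy of A would have to be gathered from the other P - 1 tasks, which is where the communication cost of this decomposition comes from.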

6 1-D Decomp.
\[ T = (P - 1)\left(t_s + t_w \frac{N^2}{P}\right) \]

7 Two-Dimensional Decomposition
• Requires less data per processor
• Algorithm can be performed stepwise.

8
• Broadcast an A submatrix to the other processors in the row.
• Compute.
• Rotate the B submatrix upwards.

9 Algorithm
set B' = B_local
for j = 0 to sqrt(P) - 1
    in each row i, the [(i + j) mod sqrt(P)]th task broadcasts A' = A_local to the other tasks in the row
    accumulate A' * B'
    send B' to upward neighbor
done
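The broadcast-and-rotate steps can be checked with a small serial simulation. This is only a sketch under stated assumptions: the tasks form a sqrt(P) x sqrt(P) grid, P is a perfect square that divides the matrix evenly, and the helper names (`split_blocks`, `two_d_multiply`, `reference_matmul`) are hypothetical. It runs sqrt(P) accumulate steps, which is what the full product needs, and replaces real message passing with list indexing.

```python
import math

def reference_matmul(a, b):
    """Plain triple-loop product, used only to check the result."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def block_matmul_add(acc, a, b):
    """acc += a * b for square list-of-lists blocks of equal size."""
    m = len(a)
    for i in range(m):
        for j in range(m):
            acc[i][j] += sum(a[i][k] * b[k][j] for k in range(m))

def split_blocks(mat, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    m = len(mat) // q  # assumes q divides n evenly
    return [[[row[c * m:(c + 1) * m] for row in mat[r * m:(r + 1) * m]]
             for c in range(q)] for r in range(q)]

def two_d_multiply(a, b, p):
    """Simulate the 2-D broadcast/rotate algorithm on p tasks (p a perfect square)."""
    n = len(a)
    q = math.isqrt(p)               # tasks are arranged as a q x q grid
    m = n // q
    a_loc = split_blocks(a, q)      # A_local for each task (never moves)
    b_cur = split_blocks(b, q)      # B' = B_local
    c_loc = [[[[0] * m for _ in range(m)] for _ in range(q)] for _ in range(q)]
    for step in range(q):
        for i in range(q):
            # In row i, task (i, (i + step) mod q) broadcasts its A block.
            a_bcast = a_loc[i][(i + step) % q]
            for j in range(q):
                block_matmul_add(c_loc[i][j], a_bcast, b_cur[i][j])
        # Send B' to the upward neighbor: task (i, j) receives from task (i + 1, j).
        b_cur = [[b_cur[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    # Stitch the per-task C blocks back into one n x n matrix.
    return [[c_loc[i // m][j // m][i % m][j % m] for j in range(n)]
            for i in range(n)]

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
C = two_d_multiply(A, B, p=4)
```

At step s, task (i, j) holds the A block from column (i + s) mod sqrt(P) and the B block from row (i + s) mod sqrt(P), so over all steps it accumulates every block product that contributes to its part of C.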

10 2-D Decomp.
\[ T = \sqrt{P}\left(1 + \tfrac{1}{2}\log P\right)\left(t_s + t_w \frac{N^2}{P}\right) \]

11 Redistribution
• If we have only one algorithm, we may need to redistribute the data
• How much does this cost?

12 Redistribution
\[ T = \left(\sqrt{P} - 1\right)\left(t_s + t_w \frac{N^2}{P\sqrt{P}}\right) \]

13 Analysis
• Performance analysis reveals that the two-dimensional decomposition is always better.
• So our matrix multiply only needs one algorithm
  – Might need a redistribution algorithm to be totally data-distribution neutral
• However, this is not the best algorithm.


15 Systolic Algorithm
\[ T = 2\left(\sqrt{P} - 1\right)\left(t_s + t_w \frac{N^2}{P}\right) \]
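To get a feel for the communication-cost expressions in this lecture, they can be evaluated for sample parameters. The sketch below is an assumption-laden illustration: the formulas are my reading of the slides (with t_s the per-message startup cost, t_w the per-word transfer cost, and the logarithm taken base 2 for a tree broadcast), and the values of N, P, t_s and t_w are hypothetical.

```python
import math

def cost_1d(n, p, t_s, t_w):
    """1-D decomposition: P - 1 messages of N^2 / P words each."""
    return (p - 1) * (t_s + t_w * n * n / p)

def cost_2d(n, p, t_s, t_w):
    """2-D decomposition: sqrt(P) steps, each a row broadcast plus one shift."""
    q = math.sqrt(p)
    return q * (1 + math.log2(q)) * (t_s + t_w * n * n / p)  # log2(q) = (1/2) log2(P)

def cost_redistribute(n, p, t_s, t_w):
    """Redistribution: sqrt(P) - 1 messages of N^2 / (P * sqrt(P)) words each."""
    q = math.sqrt(p)
    return (q - 1) * (t_s + t_w * n * n / (p * q))

def cost_systolic(n, p, t_s, t_w):
    """Systolic algorithm: sqrt(P) - 1 steps, two block messages per step."""
    q = math.sqrt(p)
    return 2 * (q - 1) * (t_s + t_w * n * n / p)

# Hypothetical machine and problem parameters, chosen only for illustration.
N, P, T_S, T_W = 1024, 16, 100.0, 1.0

costs = {
    "1-D": cost_1d(N, P, T_S, T_W),
    "2-D": cost_2d(N, P, T_S, T_W),
    "2-D + redistribution": cost_2d(N, P, T_S, T_W) + cost_redistribute(N, P, T_S, T_W),
    "systolic": cost_systolic(N, P, T_S, T_W),
}

for name, value in costs.items():
    print(f"{name:>22}: {value:.0f}")
```

For these sample values the ordering matches the slides: the 2-D algorithm costs less than the 1-D one even with redistribution added, and the systolic algorithm costs less still. Other values of P can be tried by changing the constants.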
