Joe Hummel, PhD, U. of Illinois, Chicago; Loyola University Chicago


1 Joe Hummel, PhD
  U. of Illinois, Chicago
  Loyola University Chicago
  joe@joehummel.net

2 Class: "Introduction to CS for Engineers"
  Lang: C/C++
  Focus: programming basics, vectors, matrices
  Timing: present this after introducing 2D arrays…

3 Yes, it's boring, but…
  ◦ everyone understands the problem
  ◦ good example of triply-nested loops
  ◦ non-trivial computation

    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
          C[i][j] += (A[i][k] * B[k][j]);

  1500x1500 matrix: 2.25M elements → 32 seconds…
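The fragment on this slide can be wrapped into a complete function. What follows is a minimal sketch, not the presenter's code: the element type (`double`), the `Matrix` alias, and the name `multiply_naive` are assumptions made for illustration. It also makes explicit a detail the fragment leaves out: C has to start at zero, because the inner loop accumulates with `+=`. For N = 1500 the innermost statement runs 1500³ = 3,375,000,000 times, which is why the computation is non-trivial.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: square matrices of doubles (the slides do not state the type).
using Matrix = std::vector<std::vector<double>>;

// Naive i-j-k matrix multiply, same loop order as the slide. Returns C = A * B.
Matrix multiply_naive(const Matrix& A, const Matrix& B) {
    const std::size_t N = A.size();
    assert(B.size() == N);
    for (std::size_t r = 0; r < N; r++)
        assert(A[r].size() == N && B[r].size() == N);

    // C must be zero-initialized because the loop body uses +=.
    Matrix C(N, std::vector<double>(N, 0.0));

    for (std::size_t i = 0; i < N; i++)
        for (std::size_t j = 0; j < N; j++)
            for (std::size_t k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];

    return C;
}
```

Calling `multiply_naive(A, B)` on two N×N matrices returns a new N×N result and leaves the inputs untouched.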

4 Matrix multiply is great candidate for multicore
  ◦ embarrassingly-parallel
  ◦ easy to parallelize via outermost loop

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
          C[i][j] += (A[i][k] * B[k][j]);

  1500x1500 matrix: Quad-core CPU → 8 seconds…
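The parallel fragment on this slide can be sketched as a complete function. The `Matrix` alias, the `double` element type, and the name `multiply_parallel` are assumptions for illustration. The outer loop is safe to split across threads because iteration `i` writes only row `i` of C, so no two threads ever update the same element. With GCC or Clang the pragma takes effect when compiling with `-fopenmp`; without that flag it is ignored and the function simply runs serially.

```cpp
#include <cassert>
#include <vector>

// Assumption: square matrices of doubles (the slides do not state the type).
using Matrix = std::vector<std::vector<double>>;

// i-j-k matrix multiply with the outermost loop divided among threads.
Matrix multiply_parallel(const Matrix& A, const Matrix& B) {
    const int N = static_cast<int>(A.size());
    assert(static_cast<int>(B.size()) == N);

    // Zero-initialized result; each thread fills in its own rows.
    Matrix C(N, std::vector<double>(N, 0.0));

    #pragma omp parallel for
    for (int i = 0; i < N; i++)          // rows of C are split across threads
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];

    return C;
}
```

The timings on the slide (32 seconds down to 8 on a quad-core CPU) correspond to a 4x speedup, which is what an embarrassingly parallel loop on four cores would ideally give.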

5 Parallelism alone is not enough…

  HPC == Parallelism + Memory Hierarchy − Contention

  Expose parallelism
  Maximize data locality: network, disk, RAM, cache, core
  Minimize interaction: false sharing, locking, synchronization
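To make the "minimize interaction" term concrete, here is a small sketch that is not from the slides. It sums a vector two ways; both function names are hypothetical. The first version takes a lock on every update, so the threads spend most of their time waiting on each other. The second uses an OpenMP `reduction` clause, which gives each thread a private accumulator and combines them once at the end.

```cpp
#include <vector>

// Contended: every addition goes through a critical section (a lock),
// so the threads effectively run one at a time.
double sum_with_critical(const std::vector<double>& v) {
    double sum = 0.0;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical
        sum += v[i];
    }
    return sum;
}

// Low contention: each thread accumulates into its own private copy of
// sum; the copies are combined once after the loop.
double sum_with_reduction(const std::vector<double>& v) {
    double sum = 0.0;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

Both functions compute the same total, but in parallel runs the order of the floating-point additions can differ, so results may disagree in the last bits for general inputs. The matrix multiply on the previous slide needs neither a lock nor a reduction, because each thread writes to its own rows of C.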

6 What's the other half of the chip? Cache!
  Implications?
  ◦ No one implements MM this way
  ◦ Rewrite to use loop interchange, and access B row-wise…

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
      for (int k = 0; k < N; k++)
        for (int j = 0; j < N; j++)
          C[i][j] += (A[i][k] * B[k][j]);

  1500x1500 matrix: Quad-core + cache → 2 seconds…
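The loop interchange on this slide can be sketched as a complete function. The `Matrix` alias, the `double` element type, and the name `multiply_interchanged` are assumptions for illustration. In the i-j-k order, the inner loop reads `B[k][j]` with `k` varying, which walks down a column of B and touches a different row on every step. In the i-k-j order, the inner loop varies `j`, so it walks along row `k` of B and row `i` of C, both of which are contiguous in memory and therefore cache-friendly. Hoisting `A[i][k]` into a local is an optional extra, not something the slide shows.

```cpp
#include <cassert>
#include <vector>

// Assumption: square matrices of doubles (the slides do not state the type).
using Matrix = std::vector<std::vector<double>>;

// i-k-j matrix multiply: same arithmetic as i-j-k, but the inner loop
// accesses B and C row-wise.
Matrix multiply_interchanged(const Matrix& A, const Matrix& B) {
    const int N = static_cast<int>(A.size());
    assert(static_cast<int>(B.size()) == N);

    Matrix C(N, std::vector<double>(N, 0.0));

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            const double a = A[i][k];     // constant across the inner loop
            for (int j = 0; j < N; j++)
                C[i][j] += a * B[k][j];   // row k of B, row i of C
        }

    return C;
}
```

The interchange is legal because each `C[i][j]` still receives the same N products, only in a different loop nesting. The slide's timings (8 seconds down to 2) suggest a further 4x gain from locality alone, for roughly 16x over the original serial version.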

