© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2012 ECE408/CS483, ECE 498AL, University of Illinois, Urbana-Champaign ECE408 / CS483 Applied Parallel Programming.


1 ECE408 / CS483 Applied Parallel Programming Lecture 24: Application Case Study – Electrostatic Potential Calculation Part 2

2 Objective
To learn how to apply parallel programming techniques to an application
– Thread coarsening for more work efficiency
– Data structure padding for reduced divergence
– Memory access locality and pre-computation techniques

3 Outline of A Fast Sequential Code
for all z {
    for all atoms { pre-compute dz^2 }
    for all y {
        for all atoms { pre-compute dy^2 (+ dz^2) }
        for all x {
            for all atoms { compute contribution to current x,y,z point using pre-computed dy^2 + dz^2 }
        }
    }
}
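The outline above can be sketched in C++ as follows. This is a minimal illustration, not the lecture's actual code: the `Atom` struct, the function name `potential_map`, the uniform grid `spacing`, the grid origin at (0, 0, 0), and the x-fastest grid layout are all assumptions made for the example. It accumulates charge / distance for every grid point and assumes no atom sits exactly on a grid point.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical atom record: position plus charge.
struct Atom { float x, y, z, q; };

// Adds each atom's contribution (q / r) to every point of an nx*ny*nz grid.
// The grid is assumed to be stored with x varying fastest, then y, then z.
void potential_map(std::vector<float>& grid, int nx, int ny, int nz,
                   float spacing, const std::vector<Atom>& atoms)
{
    const std::size_t natoms = atoms.size();
    std::vector<float> dz2(natoms);   // temporary: dz^2 per atom
    std::vector<float> dyz2(natoms);  // temporary: dy^2 + dz^2 per atom

    for (int k = 0; k < nz; ++k) {
        const float z = spacing * static_cast<float>(k);
        // Pre-compute dz^2 once per z slice.
        for (std::size_t n = 0; n < natoms; ++n) {
            const float dz = z - atoms[n].z;
            dz2[n] = dz * dz;
        }
        for (int j = 0; j < ny; ++j) {
            const float y = spacing * static_cast<float>(j);
            // Pre-compute dy^2 + dz^2 once per row.
            for (std::size_t n = 0; n < natoms; ++n) {
                const float dy = y - atoms[n].y;
                dyz2[n] = dy * dy + dz2[n];
            }
            for (int i = 0; i < nx; ++i) {
                const float x = spacing * static_cast<float>(i);
                float energy = 0.0f;
                // Innermost loop only needs dx^2 plus the cached value.
                for (std::size_t n = 0; n < natoms; ++n) {
                    const float dx = x - atoms[n].x;
                    energy += atoms[n].q / std::sqrt(dx * dx + dyz2[n]);
                }
                const std::size_t idx =
                    (static_cast<std::size_t>(k) * ny + j) * nx + i;
                grid[idx] += energy;
            }
        }
    }
}
```

In this sketch the innermost loop performs one subtraction, one multiply-add, one square root and one division per atom, because the y and z terms have been hoisted into the two temporary arrays.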

4 More Thoughts on Fast Sequential Code
Need temporary arrays for pre-calculated dz^2 and dy^2 + dz^2 values
So, why does this code have better cache behavior on CPUs?

5 Reuse Distance Calculation for More Computation Efficiency

6 Thread Coarsening

7 A Compute Efficient Gather Kernel

8 Thread Coarsening for More Computation Efficiency

9 Performance Comparison

10 More Work is Needed to Feed a GPU

11 ANY QUESTIONS?

