Published by Jordan Mills. Modified over 9 years ago.
1
Ohio State Univ
Effective Automatic Parallelization of Stencil Computations*
Sriram Krishnamoorthy¹, Muthu Baskaran¹, Uday Bondhugula¹, Atanas Rountev¹, J. Ramanujam², P. Sadayappan¹
¹ The Ohio State University
² Louisiana State University
* Work supported by NSF
2
Introduction
Stencil computations
  Sweep through a large data set
  Multiple time iterations
  Simple load-balanced schedule
Tiling: essential to improve data locality
  Dependences between tiles
  Pipelined execution
  Skewed iteration spaces: load imbalance
Solution: adjust the tiling to re-enable concurrent execution
3
Motivation
FOR t = 0 TO T-1
  FOR i = 1 TO N-1
    A[t,i] = (A[t,i-1] + A[t,i] + A[t,i+1]) / 3
[Figure: iteration space with axes t and i]
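A minimal Python sketch of this stencil may make the dependence structure concrete. It assumes the update writes time step t+1 from time step t (the slide text shows A[t,i] on both sides, which may be an extraction artifact), and it keeps the two boundary cells fixed. The names `jacobi_1d` and `DEPS` are hypothetical.

```python
# Dependence vectors (dt, di) under the assumed reading: point (t+1, i)
# reads points (t, i-1), (t, i) and (t, i+1).
DEPS = [(1, -1), (1, 0), (1, 1)]

def jacobi_1d(initial, steps):
    """Return the time-expanded array A[t][i] for t = 0..steps."""
    n = len(initial)
    A = [list(initial)]
    for t in range(steps):
        prev = A[t]
        nxt = prev[:]                      # boundary cells 0 and n-1 stay fixed
        for i in range(1, n - 1):
            nxt[i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3
        A.append(nxt)
    return A
```

Under this reading every cell of row t+1 depends only on row t, so all cells of one time step can be computed concurrently. That is the simple load-balanced schedule the introduction refers to.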
4
Notation
Iteration space B: n-dimensional polyhedron
Dependences D: n-dimensional vectors
Hyperplanes H: n-dimensional normal vectors
Each tile is bounded by pairs of hyperplanes
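As an illustration of this notation, the sketch below stores dependences and hyperplane normals as plain tuples. Two things here are assumptions rather than slide content: the legality test h·d ≥ 0 (the standard condition for tiles that can execute atomically) and the helper names `dot`, `is_legal_tiling` and `tile_of`. The dependence vectors are those of the 1-D stencil, read as writing step t+1 from step t.

```python
def dot(h, d):
    """Inner product of a hyperplane normal and a vector."""
    return sum(x * y for x, y in zip(h, d))

def is_legal_tiling(hyperplanes, deps):
    """Standard legality test: no dependence may point backward
    across any tiling hyperplane (h . d >= 0 for all h, d)."""
    return all(dot(h, d) >= 0 for h in hyperplanes for d in deps)

def tile_of(point, hyperplanes, sizes):
    """Tile coordinates of a point: one index per hyperplane family,
    obtained by slicing h . p into slabs of the given size."""
    return tuple(dot(h, point) // s for h, s in zip(hyperplanes, sizes))

# Assumed stencil dependences (dt, di) and two candidate tilings.
STENCIL_DEPS = [(1, -1), (1, 0), (1, 1)]
SKEWED = [(1, 0), (1, 1)]        # time hyperplane plus a skewed space hyperplane
RECTANGULAR = [(1, 0), (0, 1)]   # axis-aligned tiles
```

With these definitions the skewed pair passes the test while the rectangular pair fails it, because (0, 1)·(1, -1) is negative. This is why time tiling of stencils leads to skewed iteration spaces.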
5
Approach
Check for concurrent start in the non-tiled iteration space
Identify the hyperplanes that inhibit concurrent start in the tiled space
Replace one face for each inhibiting pair
  Overlapped tiling: replace the “back face”
  Split tiling: replace the “front face”
6
Concurrent Start: Before Tiling
Condition: a boundary that does not carry any dependence
7
Inter-tile Dependences
Shift vectors
  Tile traversal order
  Normal to all other hyperplanes
Hyperplane carries dependence
  A dependence “pokes” through
Inter-tile dependence vector
  Shift vector
  Corresponding hyperplane carries dependence
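One way to see inter-tile dependences is to enumerate them by brute force on a small domain. The sketch below is not the analysis from the talk. It assumes the 1-D stencil dependences (step t+1 written from step t) and a skewed tiling with normals (1, 0) and (1, 1), and the names `tile_of` and `inter_tile_deps` are hypothetical.

```python
from itertools import product

DEPS = [(1, -1), (1, 0), (1, 1)]   # assumed stencil dependences (dt, di)
H = [(1, 0), (1, 1)]               # assumed tiling hyperplane normals

def tile_of(p, sizes):
    """Tile coordinates of point p = (t, i) for the hyperplanes in H."""
    return tuple((h[0] * p[0] + h[1] * p[1]) // s for h, s in zip(H, sizes))

def inter_tile_deps(T, N, sizes):
    """Collect every tile-to-tile offset crossed by some dependence
    whose source and destination both lie in [0, T) x [0, N)."""
    found = set()
    for t, i in product(range(T), range(N)):
        for dt, di in DEPS:
            dst = (t + dt, i + di)
            if 0 <= dst[0] < T and 0 <= dst[1] < N:
                a, b = tile_of((t, i), sizes), tile_of(dst, sizes)
                if a != b:
                    found.add((b[0] - a[0], b[1] - a[1]))
    return found

print(sorted(inter_tile_deps(8, 16, (2, 4))))
```

The offset (0, 1) in the result links neighbouring tiles within the same time band, so the tiles of a band cannot all start together and must run as a pipeline.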
8
Concurrent Start Inhibition
Concurrent start is possible in the original iteration space along a boundary
But that boundary carries an inter-tile dependence
[Formula annotations from the slide: a boundary has concurrent start; S_j is an inter-tile dependence; that boundary carries an inter-tile dependence]
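The inhibition can be observed on a toy example by counting how many nodes of the dependence graph have no predecessor, first with single iterations as nodes and then with tiles as nodes. This is an illustrative sketch, not the formal condition from the slide. It assumes the 1-D stencil dependences (step t+1 written from step t) and a skewed tiling with time tile size 2 and space tile size 4, and all function names are hypothetical.

```python
from itertools import product

DEPS = [(1, -1), (1, 0), (1, 1)]   # assumed stencil dependences (dt, di)

def levels(nodes, edges):
    """Earliest start level of each node: 0 if it has no predecessor,
    otherwise one more than its latest predecessor."""
    preds = {n: set() for n in nodes}
    for src, dst in edges:
        preds[dst].add(src)
    memo = {}
    def level(n):
        if n not in memo:
            memo[n] = 1 + max((level(p) for p in preds[n]), default=-1)
        return memo[n]
    return {n: level(n) for n in nodes}

def concurrent_start_count(T, N, node_of):
    """Number of nodes that can start immediately when iterations
    of [0, T) x [0, N) are grouped by node_of."""
    points = list(product(range(T), range(N)))
    nodes = {node_of(p) for p in points}
    edges = set()
    for t, i in points:
        for dt, di in DEPS:
            q = (t + dt, i + di)
            if 0 <= q[0] < T and 0 <= q[1] < N:
                a, b = node_of((t, i)), node_of(q)
                if a != b:
                    edges.add((a, b))
    return sum(1 for v in levels(nodes, edges).values() if v == 0)

untiled = concurrent_start_count(4, 8, lambda p: p)
skewed = concurrent_start_count(4, 8, lambda p: (p[0] // 2, (p[0] + p[1]) // 4))
print(untiled, skewed)
```

In the untiled space every point on the t = 0 boundary can start at once, whereas after skewed tiling only a single tile has no predecessor.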
9
Companion Hyperplane
Hyperplane that destroys the inter-tile dependence
Swivel a hyperplane “backward”
Dependences carried by the original hyperplane are “neutralized”
  Incoming dependences become non-incoming
  Outgoing dependences become non-outgoing
10
Overlapped Tiling
Replace the “back face” with the companion hyperplane
Additional region is shared with the preceding tile
  The region of the preceding tile that caused the dependence
Each new tile is independent of the preceding tile (“do-all” parallelism)
Increased computation cost; communication volume
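The idea can be sketched for the 1-D stencil as follows. This is a simplified illustration under stated assumptions, not the authors' generated code: the stencil is read as writing step t+1 from step t, boundary cells stay fixed, and each tile recomputes a halo on both sides rather than on a single replaced face. The names `step`, `reference` and `overlapped` are hypothetical.

```python
def step(a):
    """One stencil sweep; the first and last cell are left unchanged."""
    b = a[:]
    for i in range(1, len(a) - 1):
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3
    return b

def reference(a, steps):
    """Plain untiled execution, used only for comparison."""
    for _ in range(steps):
        a = step(a)
    return a

def overlapped(a, steps, width, tt):
    """Time-tiled execution in which every tile is self-contained.

    Each tile owns cells [lo, hi) and advances k <= tt time steps on a
    private copy that is widened by k cells per side. The extra cells
    are recomputed by the neighbouring tiles as well, which is the
    added computation cost, but no tile waits for another one.
    """
    n = len(a)
    cur = a[:]
    done = 0
    while done < steps:
        k = min(tt, steps - done)
        nxt = cur[:]
        for lo in range(1, n - 1, width):      # tiles are independent (do-all)
            hi = min(lo + width, n - 1)
            elo, ehi = max(0, lo - k), min(n, hi + k)
            local = cur[elo:ehi]               # tile plus overlap region
            for _ in range(k):
                local = step(local)            # stale edges creep in 1 cell per step
            nxt[lo:hi] = local[lo - elo:hi - elo]   # keep only the owned cells
        cur = nxt
        done += k
    return cur
```

The private copy is exactly wide enough that the stale values entering from its edges never reach the owned cells within k steps, so the result matches the untiled sweep.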
11
Split Tiling
Replace the “front face” with the companion hyperplane
Tile is split into independent and dependent regions
Execute the independent region, followed by the dependent region
Increased number of communications
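A two-phase schedule in this spirit can be sketched for the 1-D stencil. The sketch below is a common textbook-style form of split tiling, not necessarily the construction used in the talk. It assumes the stencil writes step t+1 from step t, that boundary cells stay fixed, that the tile width is at least twice the time tile size, and that the interior divides evenly into tiles. The names `split_tiled` and `reference` are hypothetical.

```python
def reference(a, steps):
    """Plain untiled execution, used only for comparison."""
    for _ in range(steps):
        b = a[:]
        for i in range(1, len(a) - 1):
            b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3
        a = b
    return a

def split_tiled(a, steps, width, tt):
    n = len(a)
    # Assumptions of this sketch, checked explicitly.
    assert width >= 2 * tt and (n - 2) % width == 0
    cur = a[:]
    done = 0
    while done < steps:
        k = min(tt, steps - done)
        # U[s][i] holds cell i after s steps of this time block.
        U = [cur[:] for _ in range(k + 1)]
        starts = range(1, n - 1, width)

        def update(s, i):
            U[s][i] = (U[s - 1][i - 1] + U[s - 1][i] + U[s - 1][i + 1]) / 3

        # Phase 1: independent regions. Each tile shrinks by one cell per
        # step on every side that faces another tile, so it never reads a
        # value produced by a neighbouring tile.
        for lo in starts:
            hi = lo + width
            for s in range(1, k + 1):
                left = lo if lo == 1 else lo + s
                right = hi if hi == n - 1 else hi - s
                for i in range(left, right):
                    update(s, i)

        # Phase 2: dependent regions. The gaps around each internal tile
        # boundary b are filled using the results of phase 1.
        for b in starts[1:]:
            for s in range(1, k + 1):
                for i in range(b - s, b + s):
                    update(s, i)

        cur = U[k]
        done += k
    return cur
```

All phase 1 regions are mutually independent, and so are all phase 2 regions, so each phase can run in parallel across tiles. The price is an extra exchange between the two phases, which corresponds to the increased number of communications noted above.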
12
Experimental Evaluation
Cluster
  2.8 GHz dual-processor Opteron 254
  1 MB L2 cache; 4 GB RAM
  Linux 2.6.9; Intel compiler (icc) -O3
Comparison
  Two pipelined schedules: along space and along time
  1000 time steps
  1 to 32 processors
13
Pipelined Execution: Parameters
Space tile size: 1000
Time tile size: 16
64000 elements; 32 processors
14
Performance with Problem Size
15
Weak Scaling
Problem size = #procs * 20000
Horizontal line: linear scaling
16
Conclusion
Time tiling of stencils: crucial for data locality
  Might inhibit concurrent execution
Presented: two approaches to enabling concurrent execution
Ongoing work: modeling the relative benefits of the two approaches
17
Thank You!