
1 Porting Irregular Reductions on Heterogeneous CPU-GPU Configurations. Xin Huo, Vignesh T. Ravi, Gagan Agrawal. Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210. 2011 18th International Conference on High Performance Computing (HiPC). Presented by Po-Ting Liu, 2013/02/21

2 Outline: Introduction, Irregular Reductions, Single-Level Partitioning, Multi-level Partitioning Framework, Experimental Results, Conclusions

3 Outline: Introduction, Irregular Reductions, Single-Level Partitioning, Multi-level Partitioning Framework, Experimental Results, Conclusions

4 Introduction: Trend of heterogeneous architectures

5 Introduction

6 Introduction: Challenges – Irregular applications – Dividing work between CPU and GPU

7 Outline: Introduction, Irregular Reductions, Single-Level Partitioning, Multi-level Partitioning Framework, Experimental Results, Conclusions

8 Irregular Reductions [Figure panels: Regular Reduction, Irregular Reduction]

9 Irregular Reductions: Codes from many scientific and engineering domains contain loops with irregular reductions. Applications – Computational Fluid Dynamics (CFD) – Molecular Dynamics (MD)

10 Irregular Reductions: Irregular → indirect access [Figure labels: Input, Output, Index]
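
The indirect access on this slide can be illustrated with a minimal sketch. The array names IA, RA, X and Y are borrowed from the labels on the later partitioning slide; the toy mesh and the per-edge formula are made up for illustration and are not taken from the presentation.

```python
# Hypothetical toy mesh: 4 nodes, 4 edges (illustrative data only).
IA = [(0, 1), (1, 2), (2, 3), (3, 0)]   # indirection array: edge -> (node, node)
X = [1.0, 2.0, 3.0, 4.0]                # per-node input
Y = [0.5, 0.5, 1.0, 1.0]                # per-edge input


def irregular_reduction(IA, X, Y, num_nodes):
    """Accumulate one contribution per edge into both of its end nodes."""
    RA = [0.0] * num_nodes               # reduction (output) array
    for e, (a, b) in enumerate(IA):
        force = Y[e] * (X[a] - X[b])     # made-up per-edge computation
        RA[a] += force                   # the write target is known only through IA
        RA[b] -= force
    return RA


RA = irregular_reduction(IA, X, Y, 4)
print(RA)
```

Because the write locations come from IA at run time, two edges may update the same node, which is what makes the loop hard to parallelize directly.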

11 Outline: Introduction, Irregular Reductions, Single-Level Partitioning, Multi-level Partitioning Framework, Experimental Results, Conclusions

12 Single-Level Partitioning: Computation space (edge) – Coalesced accesses – No data reuse – Ex: IA, Y. Reduction space (node) – Data reuse – No coalesced accesses – Ex: RA, X

13 Single-Level Partitioning: Two partitioning choices. Computation Space – Partition on edges. Reduction Space – Partition on nodes

14 Single-Level Partitioning: Computation Space Partitioning (CSP) [Figure labels: 16 nodes, 20 nodes]

15 Single-Level Partitioning: CSP viewed as a scatter [Figure labels: Partition 1, Partition 2, …; In, Out]
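
A minimal sketch of computation space partitioning, assuming a made-up 4-node mesh and per-edge formula (neither is from the slides). Edges are split into equal chunks, each partition scatters into a private copy of the nodes it touches, and a final step combines the private copies.

```python
# Hypothetical toy data, not from the presentation.
IA = [(0, 1), (1, 2), (2, 3), (3, 0)]   # edge -> (node, node)
X = [1.0, 2.0, 3.0, 4.0]                # per-node input
Y = [0.5, 0.5, 1.0, 1.0]                # per-edge input


def csp(IA, X, Y, num_nodes, num_parts):
    """Partition the edges evenly; each partition owns a private output copy."""
    chunk = (len(IA) + num_parts - 1) // num_parts
    partials = []
    for p in range(num_parts):
        local = {}                       # only the nodes this partition touches
        for e in range(p * chunk, min((p + 1) * chunk, len(IA))):
            a, b = IA[e]
            force = Y[e] * (X[a] - X[b])
            local[a] = local.get(a, 0.0) + force
            local[b] = local.get(b, 0.0) - force
        partials.append(local)
    # Combination step: nodes replicated across partitions are merged here.
    RA = [0.0] * num_nodes
    for local in partials:
        for node, value in local.items():
            RA[node] += value
    return RA, partials


RA, partials = csp(IA, X, Y, num_nodes=4, num_parts=2)
replicated = set(partials[0]) & set(partials[1])
print(RA, replicated)
```

Each partition processes the same number of edges, but nodes 0 and 2 appear in both private copies, so they must be combined afterwards.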

16 Single-Level Partitioning: Reduction Space Partitioning (RSP). White node: Output. Black node: Input. [Figure labels: 16 edges, 25 edges]

17 Single-Level Partitioning: RSP viewed as a gather [Figure labels: Partition 2, Partition 4, …; In, Out]
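
A minimal sketch of reduction space partitioning, again on a made-up 4-node mesh with a made-up per-edge formula. Nodes are split into contiguous blocks, and each partition gathers every edge that touches one of its nodes but updates only the nodes it owns. A real implementation would probably precompute the edge list of each partition instead of scanning all edges; the scan here is a simplification.

```python
# Hypothetical toy data, not from the presentation.
IA = [(0, 1), (1, 2), (2, 3), (3, 0)]   # edge -> (node, node)
X = [1.0, 2.0, 3.0, 4.0]                # per-node input
Y = [0.5, 0.5, 1.0, 1.0]                # per-edge input


def rsp(IA, X, Y, num_nodes, num_parts):
    """Partition the nodes; each partition writes only to nodes it owns."""
    block = (num_nodes + num_parts - 1) // num_parts
    RA = [0.0] * num_nodes
    edges_done = []                      # edges computed by each partition
    for p in range(num_parts):
        lo, hi = p * block, min((p + 1) * block, num_nodes)
        count = 0
        for e, (a, b) in enumerate(IA):
            owns_a, owns_b = lo <= a < hi, lo <= b < hi
            if not (owns_a or owns_b):
                continue                 # edge does not touch this partition
            force = Y[e] * (X[a] - X[b]) # recomputed if the edge crosses partitions
            count += 1
            if owns_a:
                RA[a] += force
            if owns_b:
                RA[b] -= force
        edges_done.append(count)
    return RA, edges_done


RA, edges_done = rsp(IA, X, Y, num_nodes=4, num_parts=2)
print(RA, edges_done)
```

No combination step is needed because output ranges are disjoint, but the two partitions together compute 6 edge contributions for a 4-edge mesh: the crossing edges are replicated work.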

18 Single-Level Partitioning
CSP. Advantage: – Load balance on computation. Disadvantages: – Unequal output size in each partition – Replicated elements – Combination cost
RSP. Advantages: – Balanced output elements – Partitions are independent of each other – No combination cost. Disadvantages: – Imbalanced computation – Replicated work

19 Outline: Introduction, Irregular Reductions, Single-Level Partitioning, Multi-level Partitioning Framework, Experimental Results, Conclusions

20 Multi-level Partitioning Framework [Figure label: RSP]

21 Detailed work at each partition level

22 Runtime Support and Schemes [Figure labels: Task Scheduling, Second-level Partitioning, Computation, Output]
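
The slides name task scheduling here and a work-stealing strategy later, but do not spell out the policy. The following is a generic work-stealing simulation, not the authors' scheduler: each device has its own queue of partitions, and an idle device steals from the tail of the other queue. The function name, task ids and per-task costs are all hypothetical.

```python
from collections import deque


def run_with_stealing(cpu_tasks, gpu_tasks, cpu_cost, gpu_cost):
    """Simulate two devices draining their queues, stealing when idle."""
    queues = {"cpu": deque(cpu_tasks), "gpu": deque(gpu_tasks)}
    cost = {"cpu": cpu_cost, "gpu": gpu_cost}    # assumed time per task
    clock = {"cpu": 0, "gpu": 0}                 # time at which each device is free
    done = {"cpu": [], "gpu": []}
    while queues["cpu"] or queues["gpu"]:
        dev = min(clock, key=clock.get)          # device that becomes idle first
        other = "gpu" if dev == "cpu" else "cpu"
        if queues[dev]:
            task = queues[dev].popleft()         # take own work from the head
        else:
            task = queues[other].pop()           # steal from the other's tail
        clock[dev] += cost[dev]
        done[dev].append(task)
    return done, max(clock.values())


# Assumed example: the GPU is three times faster per partition than the CPU.
done, makespan = run_with_stealing([0, 1, 2, 3], [4, 5, 6, 7], cpu_cost=3, gpu_cost=1)
print(done, makespan)
```

In this toy run the GPU finishes its own four partitions early and steals two from the CPU, so both devices finish at time 6 instead of the CPU alone needing 12.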

23 Outline: Introduction, Irregular Reductions, Single-Level Partitioning, Multi-level Partitioning Framework, Experimental Results, Conclusions

24 Experimental Results: Experimental Environment – CPU: two Intel Xeon E5520 quad-core CPUs at 2.27 GHz (8 cores, 8 threads) – GPU: NVIDIA Tesla C2050 (Fermi), 1.15 GHz, 448 cores (14 SMs x 32 cores) – Applications: Euler (EU), based on Computational Fluid Dynamics (CFD); Molecular Dynamics (MD)

25 Experimental Results: Scalability of Irregular Applications [Figure panels: Molecular Dynamics (MD), Euler (EU); size labels: 0.3 GB, 2.6 GB, 5.3 GB, 1.8 GB, 2.7 GB, 3.4 GB]

26 Experimental Results: Trade-offs between CSP and RSP – MD on CPUs

27 Experimental Results: Trade-offs between CSP and RSP – MD on GPU

28 Experimental Results: Benefits from Pipelining – MD on CPUs + GPU

29 Experimental Results: Benefits from Pipelining – EU on CPUs + GPU

30 Experimental Results: Benefits from Work Stealing Strategy

31 Experimental Results: Performance benefits from using CPU and GPU simultaneously [Figure labels: 8, 21, 26, 5, 14, 16]

32 Outline: Introduction, Irregular Reductions, Single-Level Partitioning, Multi-level Partitioning Framework, Experimental Results, Conclusions

33 Conclusions: Porting irregular reduction applications to heterogeneous architectures. Multi-level Partitioning Framework – Reduction space partitioning – Pipeline scheme – Work stealing. An efficient framework with good scalability

