Published by Dale Shaw. Modified over 9 years ago.
1
Porting Irregular Reductions on Heterogeneous CPU-GPU Configurations
Xin Huo, Vignesh T. Ravi, Gagan Agrawal
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210
2011 18th International Conference on High Performance Computing (HiPC)
Presented by Po-Ting Liu, 2013/02/21
2
Outline
– Introduction
– Irregular Reductions
– Single-Level Partitioning
– Multi-level Partitioning Framework
– Experimental Results
– Conclusions
4
Introduction
Trend of heterogeneous architectures
5
Introduction
6
Introduction
Challenges
– Irregular applications
– Dividing work between CPU and GPU
8
Irregular Reductions
Regular Reduction vs. Irregular Reduction
9
Irregular Reductions
Codes from many scientific and engineering domains contain loops with irregular reductions.
Applications
– Computational Fluid Dynamics (CFD)
– Molecular Dynamics (MD)
10
Irregular Reductions
Irregular → Indirect access through an index array
Figure labels: Input, Index, Output
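The contrast between a regular and an irregular reduction can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names, the edge list `edges` standing in for the index (indirection) array, and the choice to add each edge value to both endpoints are assumptions made for the example.

```python
def regular_reduction(values):
    """Every iteration updates the same, statically known location."""
    total = 0.0
    for v in values:
        total += v
    return total


def irregular_reduction(num_nodes, edges, edge_values):
    """Each iteration updates locations named by an index array.

    edges[i] = (u, v) plays the role of the index (indirection) array,
    edge_values[i] is the input, and the returned list is the output.
    Names and the update rule are hypothetical, chosen for illustration.
    """
    out = [0.0] * num_nodes
    for (u, v), val in zip(edges, edge_values):
        # The target elements are only known at run time, via the index.
        out[u] += val
        out[v] += val
    return out


edges = [(0, 1), (1, 2), (0, 2)]
edge_values = [1.0, 2.0, 3.0]
total = regular_reduction(edge_values)
per_node = irregular_reduction(3, edges, edge_values)
```

Because the update targets depend on the contents of `edges`, two iterations may write the same output element, which is what makes parallelizing the loop harder than in the regular case.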
12
Single-Level Partitioning
Computation space (edge)
– Coalesced accesses
– No data reuse
– Ex: IA, Y
Reduction space (node)
– Data reuse
– No coalesced accesses
– Ex: RA, X
13
Single-Level Partitioning
Two partitioning choices
Computation Space
– Partition on edges
Reduction Space
– Partition on nodes
14
Single-Level Partitioning
Computation Space Partitioning (CSP)
Figure labels: 16 nodes, 20 nodes
15
Single-Level Partitioning
CSP from a scatter viewpoint
Figure labels: Partition 1, Partition 2, …; In, Out
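The scatter view of CSP might look like the following Python sketch. It assumes the edges are split into equal contiguous chunks and that each partition accumulates into its own private output copy, which is then combined. The function name `csp_reduce`, the dictionary-based private copies, and the update rule are hypothetical choices for illustration, not the paper's implementation.

```python
def csp_reduce(num_nodes, edges, edge_values, num_parts):
    """Computation space partitioning: split the edges, scatter to nodes."""
    chunk = (len(edges) + num_parts - 1) // num_parts
    partials = []
    for p in range(num_parts):
        lo, hi = p * chunk, min((p + 1) * chunk, len(edges))
        local = {}  # private output copy for this partition
        for (u, v), val in zip(edges[lo:hi], edge_values[lo:hi]):
            local[u] = local.get(u, 0.0) + val
            local[v] = local.get(v, 0.0) + val
        partials.append(local)

    # Combination step: nodes touched by several partitions are merged here.
    out = [0.0] * num_nodes
    for local in partials:
        for node, acc in local.items():
            out[node] += acc
    return out, partials


edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
edge_values = [1.0, 2.0, 3.0, 4.0]
out, partials = csp_reduce(4, edges, edge_values, num_parts=2)
replicated = set(partials[0]) & set(partials[1])
```

Each partition processes the same number of edges, but nodes 0 and 2 appear in both private copies, so the output must be merged after the partitions finish.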
16
Single-Level Partitioning
Reduction Space Partitioning (RSP)
White node: Output
Black node: Input
Figure labels: 16 edges, 25 edges
17
Single-Level Partitioning
RSP from a gather viewpoint
Figure labels: Partition 2, Partition 4, …; In, Out
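The gather view of RSP can be sketched in Python as follows. The sketch assumes nodes are split into contiguous ranges and that each partition scans for the edges touching a node it owns. The name `rsp_reduce` and the brute-force edge scan are hypothetical simplifications for illustration only.

```python
def rsp_reduce(num_nodes, edges, edge_values, num_parts):
    """Reduction space partitioning: split the nodes, gather from edges."""
    chunk = (num_nodes + num_parts - 1) // num_parts
    out = [0.0] * num_nodes
    edges_per_part = []
    for p in range(num_parts):
        lo, hi = p * chunk, min((p + 1) * chunk, num_nodes)
        processed = 0
        for (u, v), val in zip(edges, edge_values):
            owns_u = lo <= u < hi
            owns_v = lo <= v < hi
            if not (owns_u or owns_v):
                continue
            processed += 1  # an edge crossing partitions is counted twice
            # Only nodes owned by this partition are written.
            if owns_u:
                out[u] += val
            if owns_v:
                out[v] += val
        edges_per_part.append(processed)
    return out, edges_per_part


edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
edge_values = [1.0, 2.0, 3.0, 4.0]
out, edges_per_part = rsp_reduce(4, edges, edge_values, num_parts=2)
```

No two partitions write the same output element, so no combination step is needed. The cost is that the two partitions together process six edges for a four-edge input, because edges that cross the partition boundary are handled by both sides.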
18
Single-Level Partitioning
CSP
Advantage:
– Load balance on computation
Disadvantage:
– Unequal output size in each partition
– Replicated elements
– Combination cost
RSP
Advantage:
– Balanced output elements
– Partitions are independent of each other
– Avoids combination cost
Disadvantage:
– Imbalance on computation
– Replicated work
20
Multi-level Partitioning Framework
Figure label: RSP
21
Detailed work at each partition level
22
Runtime Support and Schemes
Figure labels: Task Scheduling, Second-level Partitioning, Computation, Output
24
Experimental Results
Experimental Environment
– CPU
  Two Intel 2.27 GHz quad-core Xeon E5520 CPUs (8 cores, 8 threads)
– GPU
  NVIDIA Tesla C2050 GPU (Fermi), 1.15 GHz, 448 cores (14 SMs × 32 cores)
– Applications
  Euler (EU), based on Computational Fluid Dynamics (CFD)
  Molecular Dynamics (MD)
25
Experimental Results
Scalability of Irregular Applications
Panels: Molecular Dynamics (MD), Euler (EU)
Figure labels: 0.3 GB, 2.6 GB, 5.3 GB, 1.8 GB, 2.7 GB, 3.4 GB
26
Experimental Results
Trade-offs between CSP and RSP
– MD on CPUs
27
Experimental Results
Trade-offs between CSP and RSP
– MD on GPU
28
Experimental Results
Benefits from Pipelining
– MD on CPUs + GPU
29
Experimental Results
Benefits from Pipelining
– EU on CPUs + GPU
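Why pipelining can help is easiest to see with a small timing model in Python. The slides do not spell out which stages are overlapped, so this sketch assumes a two-stage pipeline in which partitioning of one chunk overlaps with computation on the previous chunk. The function names and the stage times are hypothetical, not measurements from the paper.

```python
def serial_time(partition_times, compute_times):
    """All partitioning finishes before any computation starts."""
    return sum(partition_times) + sum(compute_times)


def pipelined_time(partition_times, compute_times):
    """Two-stage pipeline over chunks (assumed stages, see lead-in).

    Chunk i can be computed once it has been partitioned and the
    computation of chunk i-1 has finished.
    """
    part_done = 0  # time at which the current chunk is partitioned
    comp_done = 0  # time at which the current chunk is computed
    for p, c in zip(partition_times, compute_times):
        part_done += p
        comp_done = max(comp_done, part_done) + c
    return comp_done


partition_times = [2, 2, 2]  # hypothetical cost per chunk
compute_times = [3, 3, 3]    # hypothetical cost per chunk
t_serial = serial_time(partition_times, compute_times)
t_pipelined = pipelined_time(partition_times, compute_times)
```

In this model only the partitioning of the first chunk stays on the critical path. The rest is hidden behind computation as long as computing a chunk takes at least as long as partitioning one.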
30
Experimental Results
Benefits from Work Stealing Strategy
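A generic work-stealing scheme between two devices of different speed can be simulated in Python. This is a sketch of the general technique under stated assumptions, not the scheduler from the paper: each device starts with its own queue of partitions, an idle device takes a task from the tail of the other queue, and `slowdown` (time units per unit of work) is a hypothetical stand-in for device speed.

```python
from collections import deque


def simulate_work_stealing(task_costs, slowdown, initial_split):
    """Simulate two devices that steal tasks from each other when idle.

    task_costs[i] is the work in task i, slowdown[d] is the time device d
    needs per unit of work, and the first `initial_split` tasks start on
    device 0. Returns (finish_time, tasks_done_per_device).
    """
    queues = [
        deque(range(initial_split)),
        deque(range(initial_split, len(task_costs))),
    ]
    clock = [0, 0]
    done = [[], []]
    while queues[0] or queues[1]:
        # The device that becomes free first takes the next task.
        d = 0 if clock[0] <= clock[1] else 1
        if queues[d]:
            task = queues[d].popleft()
        else:
            task = queues[1 - d].pop()  # steal from the other queue's tail
        clock[d] += task_costs[task] * slowdown[d]
        done[d].append(task)
    return max(clock), done


# Eight equal tasks split 4/4; device 0 is three times slower than device 1.
finish, done = simulate_work_stealing([1] * 8, slowdown=[3, 1], initial_split=4)
```

With a static 4/4 split the slower device would need 12 time units for its share. In the simulation the faster device steals two tasks and both devices finish at time 6.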
31
Experimental Results
Performance benefits from using CPU and GPU simultaneously
Figure labels: 8, 21, 26, 5, 14, 16
33
Conclusions
Porting irregular reduction applications to heterogeneous architectures
Multi-level Partitioning Framework
– Reduction space partitioning
– Pipeline scheme
– Work stealing
An efficient framework with good scalability