1 Automatic CPU-GPU Communication Management and Optimization
Thomas B. Jablin, Prakash Prabhu, James A. Jablin, Nick P. Johnson, Stephen R. Beard, David I. August
Princeton University, Brown University
PLDI 2011
2 Contents
- Introduction
- Runtime Library
- Communication Management
- Evaluation
3 Introduction
- Data must be copied between the CPU and the GPU
- Weaknesses of manual communication management:
  - Added transfer latency and delays
  - Error-prone
  - Unnecessary copying and computation
4 Introduction
[Figure: application structure — host code (int main()) invoking an OpenCL kernel (__kernel void opencl_kernel())]
5 Cyclic Communication Pattern
- Data is copied back and forth on every iteration of a repeated loop, limiting performance
- Inspector-executor systems:
  - Communication management for clusters with distributed memory
  - Break a loop into an inspector, a scheduler, and an executor
  - The inspector determines which array offsets the program reads or writes during each iteration
  - The executor then computes the loop iterations in parallel
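The inspector-executor split described above can be sketched in plain C. This is an illustrative sketch only — the function names and data layout are hypothetical and do not reflect any real inspector-executor system's API: the inspector records which offsets an irregular (indirectly indexed) loop will touch, and the executor then performs the actual work using that precomputed schedule.

```c
#include <assert.h>
#include <stddef.h>

/* Inspector: walk the loop's index expressions and record which
 * offset of the data array each iteration will access, without
 * performing the computation itself. */
static void inspect(const int *idx, size_t n, size_t *touched) {
    for (size_t i = 0; i < n; i++)
        touched[i] = (size_t)idx[i];   /* indirect access pattern */
}

/* Executor: run the loop body using the precomputed schedule.
 * Because the access pattern is now known, these iterations could
 * be partitioned or scheduled in parallel. */
static void execute(double *data, const size_t *touched, size_t n) {
    for (size_t i = 0; i < n; i++)
        data[touched[i]] += 1.0;       /* the original loop body */
}
```

In a real system the scheduler sits between these two phases, using the inspected offsets to partition iterations across workers without conflicts.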
6 Introduction
7 Motivation
- Weaknesses of manual communication management:
  - Time-consuming
  - Error-prone
  - Limited applicability
  - Repeated copying across loop iterations
- Goal: CGCM improves performance through
  - Fully automatic communication management
  - Removal of cyclic communication dependences
  - A supporting run-time library for users
8 Runtime Library
- The CGCM run-time library enables automatic CPU-GPU communication and optimization
- It determines which bytes to transfer
- Two parts of CGCM:
  - Memory allocation tracking
  - CPU-GPU mapping semantics
9 Runtime Library
- Memory allocation
  - Intercepts malloc(), calloc()
  - The run-time library records information about heap, stack, and global (pointer) allocations
  - Stores the size of each allocation unit in a balanced binary tree
  - Can resolve any pointer into a block to its containing allocation unit
- Transfers live data when a pointer reaches GPU execution
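The allocation-unit lookup described above can be sketched as follows. This is a hypothetical sketch, not CGCM's actual implementation: the paper's runtime uses a balanced tree keyed by base address, while this sketch substitutes a sorted array searched with the same logarithmic-time logic. Given any interior pointer, it recovers the containing allocation unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One tracked allocation: its base address and size.
 * (Names and layout are illustrative, not CGCM's.) */
typedef struct { uintptr_t base; size_t size; } AllocUnit;

/* Find the unit containing address p, or NULL if p points into
 * no tracked allocation. Assumes units[] is sorted by base. */
static const AllocUnit *find_unit(const AllocUnit *units, size_t n,
                                  uintptr_t p) {
    size_t lo = 0, hi = n;
    while (lo < hi) {                 /* binary search, like the tree */
        size_t mid = lo + (hi - lo) / 2;
        if (units[mid].base > p) hi = mid;
        else lo = mid + 1;
    }
    if (lo == 0) return NULL;         /* p is below every unit */
    const AllocUnit *u = &units[lo - 1];
    return (p < u->base + u->size) ? u : NULL;
}
```

This is why the runtime can transfer a whole allocation unit even when the program only passes a pointer into the middle of it.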
10 Runtime Library
- CPU-GPU mapping semantics use three functions: map(), unmap(), release()
- Mapping:
  - Copies memory from the CPU to the GPU, allocating GPU memory if necessary
  - Returns a GPU pointer
- Unmapping:
  - Updates the CPU allocation unit from its corresponding GPU allocation unit
  - Returns the CPU pointer
- Releasing:
  - Releases the corresponding GPU allocation unit, freeing it if necessary
  - Returns the CPU pointer
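The map/unmap/release semantics above can be sketched in host-only C. Everything here is a hedged stand-in: the real CGCM runtime targets CUDA device memory (cudaMalloc/cudaMemcpy), which this sketch simulates with a second host buffer, and the signatures and reference-counting details are assumptions for illustration, not CGCM's actual API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One CPU<->GPU mapping of a single allocation unit. */
typedef struct {
    void  *cpu, *gpu;   /* corresponding CPU and "GPU" buffers */
    size_t size;
    int    refs;        /* map()/release() reference count */
} Mapping;

/* map(): allocate GPU space if needed, copy CPU -> GPU,
 * return the GPU pointer. */
static void *map_unit(Mapping *m, void *cpu, size_t size) {
    if (m->refs++ == 0) {
        m->cpu = cpu; m->size = size;
        m->gpu = malloc(size);        /* stands in for cudaMalloc */
    }
    memcpy(m->gpu, m->cpu, m->size);  /* stands in for H -> D copy */
    return m->gpu;
}

/* unmap(): copy GPU -> CPU so the CPU unit sees kernel results;
 * return the CPU pointer. */
static void *unmap_unit(Mapping *m) {
    memcpy(m->cpu, m->gpu, m->size);  /* stands in for D -> H copy */
    return m->cpu;
}

/* release(): drop one reference; free the GPU unit when unused. */
static void release_unit(Mapping *m) {
    if (--m->refs == 0) { free(m->gpu); m->gpu = NULL; }
}
```

The reference count is what later lets the compiler nest and hoist map() calls safely: a unit stays resident on the GPU until every matching release() has run.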
11 Runtime Library
12 Communication Management
- Manual communication management:
  - A common source of errors in parallelization
  - Limits applicability
- Communication management in CGCM:
  - Maintains a list of live data reaching GPU code, such as global variables and CPU pointers, called "flows"
  - Labels operations as "flows" when they load such pointers
  - After identifying the flows, transfers data from the CPU to the GPU via map() or mapArray()
  - After the function call, the compiler inserts unmap() or unmapArray() for each flow
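Conceptually, the compiler pass described above wraps each GPU function call in communication calls for every pointer ("flow") reaching it. The sketch below is a hypothetical illustration of the resulting code shape — the names, the stand-in device buffer, and the kernel are all invented for the example and do not match CGCM's generated code.

```c
#include <assert.h>
#include <string.h>

static double dev_buf[8];             /* stand-in for device memory */

/* Stand-ins for map()/unmap() on one flow. */
static double *map_flow(double *cpu, size_t n) {
    memcpy(dev_buf, cpu, n * sizeof *cpu);  /* H -> D */
    return dev_buf;
}
static void unmap_flow(double *cpu, size_t n) {
    memcpy(cpu, dev_buf, n * sizeof *cpu);  /* D -> H */
}

/* Stand-in for a GPU kernel: double every element. */
static void fake_kernel(double *d, size_t n) {
    for (size_t i = 0; i < n; i++) d[i] *= 2.0;
}

/* What the compiler conceptually emits around a kernel call: */
static void launch(double *a, size_t n) {
    double *d_a = map_flow(a, n);  /* inserted: transfer live-in data  */
    fake_kernel(d_a, n);           /* original call, now on GPU data   */
    unmap_flow(a, n);              /* inserted: transfer live-out data */
}
```

The key point is that the programmer's call site is unchanged; the map/unmap pair is inserted mechanically for each flow the analysis finds.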
13 Optimizing CPU-GPU Communication
- Map promotion
  - Scans each region (a function or loop body) containing GPU code
  - Captures all calls to the CGCM run-time library in the region
  - Hoists map() calls above the loop, copying each map() call before the region
  - Turns a cyclic communication pattern into an acyclic one, keeping data on the GPU across iterations
- Alloca promotion
  - Similar logic to map promotion
  - Manages local variables within their parent stack frames
- Glue kernels
  - Optimize small CPU code regions between GPU regions
  - Improve the applicability of map promotion
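Map promotion can be illustrated with a before/after sketch. Again, all names here are hypothetical and the "device" is a host buffer: the point is only that the unoptimized loop transfers data on every iteration, while the promoted loop transfers twice in total.

```c
#include <assert.h>
#include <string.h>

static int dev[4];                       /* stand-in for device memory */
static int copies;                       /* counts H<->D transfers */

static int *map_buf(int *cpu)   { memcpy(dev, cpu, sizeof dev); copies++; return dev; }
static void unmap_buf(int *cpu) { memcpy(cpu, dev, sizeof dev); copies++; }
static void kernel(int *d)      { for (int i = 0; i < 4; i++) d[i] += 1; }

/* Before promotion: cyclic pattern, 2 transfers per iteration. */
static void naive(int *a, int iters) {
    for (int i = 0; i < iters; i++) {
        kernel(map_buf(a));
        unmap_buf(a);
    }
}

/* After map promotion: map() hoisted above the loop, unmap()
 * sunk below it — 2 transfers total, data stays GPU-resident. */
static void promoted(int *a, int iters) {
    int *d = map_buf(a);
    for (int i = 0; i < iters; i++)
        kernel(d);
    unmap_buf(a);
}
```

Both versions compute the same result; only the number of CPU-GPU transfers changes, which is exactly the cyclic-to-acyclic rewrite the slide describes.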
14 Evaluation
- CPU: Intel Core 2 Quad, 2.40 GHz
- GPU: NVIDIA GeForce GTX 480
- OpenCL: version 3.2
- CUDA: version 2.0
- Benchmark suites: PolyBench, Rodinia, DOALL, PARSEC, etc.
15 Evaluation Results
- Program speedup over sequential CPU-only execution, comparing inspector-executor, optimized CGCM, unoptimized CGCM, and manual management
16 Evaluation
17 Conclusion
- CGCM is the first fully automatic system for managing and optimizing CPU-GPU communication
- CGCM has two parts: a run-time library and an optimizing compiler
- CGCM achieves a 5.3x speedup over the best sequential CPU-only execution