Utilization of GPUs for General Computing
Presenter: Charlene DiMeglio
Paper: "Aspects of GPU for General Purpose High Performance Computing," Suda, Reiji, et al.
Overview
- Problem: We want to use the GPU for tasks other than graphics, but the costs can be high
- Solution: Improve the CUDA drivers
- Results: Compared to a node of a supercomputer, it is worth it
- Conclusion: These improvements make using GPGPUs more feasible
Problem: The Need for Computation Power
Why GPUs?
- GPUs are not being fully realized as a resource, often sitting idle when not used for graphics
- Better performance for less power compared to CPUs
What's the issue? Cost:
- Efficient scheduling: timing data loads with their uses
- Memory management: using the small amount of available memory effectively
- Loads and stores: waiting for memory transfers, which take hundreds of cycles
Solutions
- Brook+ by AMD, Larrabee by Intel
- CUDA by NVIDIA: greatest technological maturity at the time
- Paper investigates the existing technology and suggests improvements
- Architecture: 30 multiprocessors, 8 streaming processors each, 16 KB shared memory per multiprocessor
NVIDIA's Tesla C1060 GPU vs. Hitachi HA8000-tc/RS425 (T2K) Supercomputer
T2K: fastest supercomputer in Japan at the time

                               T2K           C1060
  Cores / MPs                  16            30
  Clock frequency              2.3 GHz       1.3 GHz
  Single SIMD vector length    4             32
  Single-precision peak        294 Gflops    933 Gflops
  Main memory                  32 GB         4 GB
  Memory single peak           -             -
  Cost                         ~$40,000      ~$2,500
  Power                        300 W         200 W
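The slide's table can be reduced to two efficiency ratios, which are what make the GPGPU case compelling. A minimal sketch (the numbers are the slide's own; the derived ratios are just arithmetic, not figures from the paper):

```python
# Price and power efficiency implied by the comparison table.
# Gflops figures are the single-precision peaks from the slide.
t2k   = {"gflops": 294, "cost_usd": 40_000, "power_w": 300}
c1060 = {"gflops": 933, "cost_usd": 2_500,  "power_w": 200}

for name, m in (("T2K", t2k), ("C1060", c1060)):
    print(name,
          round(m["gflops"] / m["cost_usd"], 3), "Gflops/$,",
          round(m["gflops"] / m["power_w"], 2), "Gflops/W")
```

On these numbers the C1060 delivers roughly 50x the peak Gflops per dollar and over 4x the peak Gflops per watt of the T2K node, which is the "worth it" claim in the overview.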
Issues to Overcome
- High SIMD vector length
- Small main memory size
- High register spill cost
- No L2 cache; only read-only texture caches
Methods to Hide Latency
- If computation time between communications > communication latency, it is worth sending the data over to the GPU
- Increasing the bandwidth and size of messages makes the constant term in the latency overhead relatively smaller
- Efficient use of registers to prevent spills
- Deciding where to do the work, GPU vs. CPU: work sharing
- Minimizing divergent warps using the atomic operations found in CUDA
  - Divergent warps occur when threads within a warp must follow both branch paths
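The first bullet is a break-even condition, which can be made concrete with a toy cost model. A sketch, not from the paper: the latency, bandwidth, and timing constants below are illustrative assumptions, chosen only to show the shape of the trade-off.

```python
# Toy offload model: a transfer costs a constant latency plus size/bandwidth,
# so offloading pays only when compute time amortizes the transfer overhead.

def transfer_time(bytes_moved, latency_s=10e-6, bandwidth_bps=4e9):
    """PCIe-style transfer: fixed latency plus size divided by bandwidth."""
    return latency_s + bytes_moved / bandwidth_bps

def worth_offloading(cpu_time_s, gpu_compute_s, bytes_in, bytes_out):
    """Offload only if GPU compute plus both transfers beats the CPU time."""
    gpu_total = transfer_time(bytes_in) + gpu_compute_s + transfer_time(bytes_out)
    return gpu_total < cpu_time_s

# A large, compute-heavy task amortizes the transfer cost...
print(worth_offloading(cpu_time_s=1.0, gpu_compute_s=0.1,
                       bytes_in=100e6, bytes_out=100e6))   # True
# ...while a tiny task is dominated by the constant latency term.
print(worth_offloading(cpu_time_s=20e-6, gpu_compute_s=5e-6,
                       bytes_in=1e3, bytes_out=1e3))       # False
```

The same model also explains the second bullet: as message size grows, the constant latency term shrinks relative to the size-dependent term.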
Results
- Variable-sized multi-round data transfer scheduling
- [Chart: performance vs. number of rounds]
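The idea behind multi-round transfer scheduling is that splitting one large host-to-GPU transfer into rounds lets computation on round i overlap the transfer of round i+1. A minimal sketch of why an intermediate round count wins (equal-sized rounds for simplicity; the paper's scheduler uses variable sizes, and all rate constants here are assumed):

```python
LATENCY = 10e-6       # per-transfer constant overhead, seconds (assumed)
BANDWIDTH = 4e9       # transfer rate, bytes/second (assumed)
COMPUTE_RATE = 8e9    # processing rate, bytes/second (assumed)

def transfer(nbytes):
    return LATENCY + nbytes / BANDWIDTH

def pipelined_time(total_bytes, rounds):
    """Total time when each round's compute overlaps the next round's transfer."""
    chunk = total_bytes / rounds
    t_xfer = transfer(chunk)
    t_comp = chunk / COMPUTE_RATE
    # The first chunk must fully arrive; then the slower stage dominates each
    # overlapped step; the last chunk's compute cannot overlap anything.
    return t_xfer + (rounds - 1) * max(t_xfer, t_comp) + t_comp

total = 64e6  # 64 MB
for k in (1, 8, 32, 256):
    print(k, "rounds:", pipelined_time(total, k))
```

Too few rounds means no overlap; too many means the per-round latency constant dominates, so total time is minimized at an intermediate number of rounds, which is presumably what the chart on this slide showed.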
Results
- Use of atomic instructions in CUDA to minimize latency from divergent warps
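The divergence cost being minimized can be modeled simply: when threads in a warp disagree on a branch, the hardware executes both sides serially. A sketch of that cost model (an illustrative simulation, not CUDA itself):

```python
# Model of SIMT branch divergence: within one 32-thread warp, a divergent
# if/else forces the warp to execute every path that at least one thread takes.

WARP_SIZE = 32

def warp_branch_cost(predicates, then_cost, else_cost):
    """Cycles a warp spends on an if/else, given each thread's predicate."""
    cost = 0
    if any(predicates):        # some thread takes the 'then' side
        cost += then_cost
    if not all(predicates):    # some thread takes the 'else' side
        cost += else_cost
    return cost

uniform   = [True] * WARP_SIZE                       # all threads agree
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]   # threads disagree

print(warp_branch_cost(uniform, 10, 10))     # 10: only one path executed
print(warp_branch_cost(divergent, 10, 10))   # 20: both paths executed serially
```

Restructuring code so that all threads of a warp follow a single path, for example by replacing divergent per-thread updates with CUDA atomic operations such as `atomicAdd`, avoids paying for both sides.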
Conclusion
- CUDA gives programmers the ability to harness the power of the GPU for general uses
- The improvements presented make this option more feasible
- Strategic use of GPGPUs as a resource will improve speed and efficiency
- However, the material presented is mainly theoretical, without much strong data to back it up
- More suggestions than implementations; promotes GPGPU use