Utilization of GPUs for General Computing
Presenter: Charlene DiMeglio
Paper: Aspects of GPU for General Purpose High Performance Computing, Suda, Reiji, et al.

Overview
- Problem: We want to use the GPU for things other than graphics, but the costs can be high
- Solution: Improve the CUDA drivers
- Results: Compared to a node of a supercomputer, worth it
- Conclusion: These improvements make using GPGPUs more feasible

Problem: Need more computation power
- Why GPUs?
  - GPUs are not being fully utilized as a resource, often sitting idle when not being used for graphics
  - Better performance for less power compared to CPUs
- What’s the issue? Cost.
  - Efficient scheduling: timing data loads with their uses
  - Memory management: using the small amount of available memory effectively
  - Loads and stores: waiting for memory transfers, which take hundreds of cycles

Solutions
- Brook+ by AMD, Larrabee by Intel
- CUDA by NVIDIA
  - Greatest technological maturity at the time
- The paper investigates existing technology and suggests improvements
(Diagram labels: 30 multiprocessors, 8 streaming processors, 16 KB)

NVIDIA’s Tesla C1060 GPU vs. Hitachi HA8000-tc/RS425 (T2K) Supercomputer
- T2K: fastest supercomputer in Japan

                                        T2K           C1060
Cores/MPs                               16            30
Clock frequency                         2.3 GHz       1.3 GHz
Single-precision SIMD vector length     4             32
Single-precision peak                   294 Gflops    933 Gflops
Main memory                             32 GB         4 GB
Memory / single peak (GB per Gflops)    0.109         0.004
Cost                                    ~$40,000      ~$2,500
Power                                   300 W         200 W
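
The comparison is easier to read as ratios. The following is a minimal sketch that derives performance per watt, performance per dollar, and memory per unit of peak performance from the figures above. The dictionary layout and function name are illustrative assumptions, and the derived numbers are plain arithmetic on the slide's values, not measurements.

```python
# Figures copied from the comparison above; everything else is derived arithmetic.
specs = {
    "T2K": {"gflops": 294.0, "memory_gb": 32.0, "cost_usd": 40000.0, "power_w": 300.0},
    "C1060": {"gflops": 933.0, "memory_gb": 4.0, "cost_usd": 2500.0, "power_w": 200.0},
}


def derived_ratios(spec):
    """Return efficiency ratios for one machine (hypothetical helper)."""
    return {
        "gflops_per_watt": spec["gflops"] / spec["power_w"],
        "gflops_per_kusd": spec["gflops"] / (spec["cost_usd"] / 1000.0),
        "gb_per_gflops": spec["memory_gb"] / spec["gflops"],
    }


ratios = {name: derived_ratios(spec) for name, spec in specs.items()}
for name, r in ratios.items():
    print(name, {key: round(value, 3) for key, value in r.items()})
```

On these numbers the C1060 delivers roughly 4.7 Gflops per watt against roughly 1.0 for the T2K, while holding far less memory per unit of peak performance, which is the trade-off behind the issues listed next.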

Issues to Overcome
- High SIMD vector length
- Small main memory size
- High register spill cost
- No L2 cache, only read-only texture caches

Methods to Hide Away Latency

- Computation time between communications > communication latency
  - Worth sending the data over to the GPU
- Increasing bandwidth and message size makes the constant term in the overhead latency seem smaller
- Efficient use of registers to prevent spills
- Deciding what work to do where (GPU vs. CPU): work sharing
- Minimizing divergent warps using atomic operations found in CUDA
  - A divergent warp occurs when its threads take different branch paths, so the warp must follow both paths
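
The first two points can be illustrated with a simple linear cost model, where a transfer costs a fixed latency plus size divided by bandwidth. This is a sketch under that assumption only. The function names and the link parameters are hypothetical and are not taken from the paper.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Linear cost model: constant latency plus size over bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def latency_fraction(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Share of the total transfer time taken by the constant latency term."""
    return latency_s / transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s)


def latency_is_hidden(compute_time_s, latency_s):
    """The rule of thumb from the list above: computation between
    communications must exceed the communication latency."""
    return compute_time_s > latency_s


# Hypothetical link parameters, chosen only for illustration.
LATENCY_S = 10e-6   # 10 microseconds per transfer
BANDWIDTH = 5e9     # 5 GB/s

fractions = [latency_fraction(size, LATENCY_S, BANDWIDTH)
             for size in (1_000, 1_000_000, 100_000_000)]
for fraction in fractions:
    print(round(fraction, 4))
```

In this model the latency term accounts for almost all of the cost of a 1 KB message and a negligible share of a 100 MB message, which is why larger messages make the constant overhead seem smaller.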

Results
- Variable-sized multi-round data transfer scheduling
(Figure omitted; axis label: number of rounds)
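
The slide does not reproduce the scheduling algorithm, so the following is only a toy pipeline model of multi-round transfers, not the paper's method. It assumes that one transfer and one computation can proceed at a time, that the transfer of the next round overlaps with computation on the current one, and that each round pays a fixed latency. All names and parameters are hypothetical.

```python
def pipeline_time(round_sizes, latency_s, bandwidth_bytes_per_s, compute_s_per_byte):
    """Finish time when each round's transfer overlaps the previous round's compute."""
    transfer_done = 0.0
    compute_done = 0.0
    for size in round_sizes:
        # Transfers run back to back, each paying the fixed latency.
        transfer_done += latency_s + size / bandwidth_bytes_per_s
        # Compute on a round starts once its data has arrived and the
        # previous round's compute has finished.
        compute_done = max(compute_done, transfer_done) + size * compute_s_per_byte
    return compute_done


def uniform_rounds(total_bytes, n_rounds):
    """Split the data into equally sized rounds (variable sizes also work)."""
    return [total_bytes / n_rounds] * n_rounds


# Hypothetical parameters, chosen only for illustration.
TOTAL_BYTES = 100_000_000
LATENCY_S = 100e-6
BANDWIDTH = 5e9
COMPUTE_S_PER_BYTE = 2e-10

times = {
    n: pipeline_time(uniform_rounds(TOTAL_BYTES, n), LATENCY_S, BANDWIDTH, COMPUTE_S_PER_BYTE)
    for n in (1, 10, 1000)
}
for n, t in times.items():
    print(n, round(t, 4))
```

With these made-up parameters, ten rounds finish sooner than a single round because transfer and compute overlap, while a thousand rounds are slower than both because the per-round latency dominates. `pipeline_time` accepts any list of round sizes, so variable-sized schedules can be tried the same way.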

Results
- Use of atomic instructions in CUDA to minimize latency
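
The atomic-based technique itself is not spelled out in the slides and is not reproduced here. As background on why divergent warps are worth avoiding, the sketch below uses a simple serialization model: a warp executes each branch path once if at least one of its threads takes it. The cycle counts and function names are hypothetical, and the warp size of 32 matches the C1060's SIMD vector length.

```python
WARP_SIZE = 32  # threads per warp


def warp_cycles(takes_branch, then_cycles, else_cycles):
    """Cycles for one warp: every path taken by any thread is executed once."""
    cycles = 0
    if any(takes_branch):
        cycles += then_cycles
    if not all(takes_branch):
        cycles += else_cycles
    return cycles


def kernel_cycles(predicate, n_threads, then_cycles, else_cycles):
    """Sum of warp costs when thread t takes the branch iff predicate(t)."""
    total = 0
    for start in range(0, n_threads, WARP_SIZE):
        stop = min(start + WARP_SIZE, n_threads)
        flags = [predicate(t) for t in range(start, stop)]
        total += warp_cycles(flags, then_cycles, else_cycles)
    return total


# Branch on thread parity: every warp contains both paths and diverges.
interleaved = kernel_cycles(lambda t: t % 2 == 0, 128, 10, 10)
# Branch on warp parity: same split of work, but each warp follows one path.
grouped = kernel_cycles(lambda t: (t // WARP_SIZE) % 2 == 0, 128, 10, 10)
print(interleaved, grouped)
```

Both versions send half of the threads down each path, yet the interleaved layout costs twice as many cycles in this model because every warp has to execute both paths.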

Conclusion
- CUDA gives programmers the ability to harness the power of the GPU for general uses.
- The improvements presented make this option more feasible.
- Strategic use of GPGPUs as a resource will improve speed and efficiency.
- However, the presented material is mainly theoretical, with little strong data to back it up.
  - More suggestions than implementations, promoting GPGPU use
