Published by Claude Higgins. Modified over 9 years ago.
1
CUDA-3
3/12/2013, Computer Engg, IIT(BHU)
2
GPGPU
● General-purpose computation using the GPU in applications other than 3D graphics
  – GPU accelerates the critical path of the application
● Data-parallel algorithms leverage GPU attributes
  – Large data arrays, streaming throughput
  – Fine-grain SIMD parallelism
  – Low-latency floating point (FP) computation
3
GPGPU Constraints
● Dealing with graphics API
  – Working with the corner cases of the graphics API
● Addressing modes
  – Limited texture size/dimension
● Shader capabilities
  – Limited outputs
● Instruction sets
  – Lack of integer and bit ops
● Communication limited
  – Between pixels
  – Scatter: a[i] = p
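The "Scatter: a[i] = p" constraint is easier to see next to its counterpart, gather. The following is a minimal CPU-side sketch in Python, not GPU code; the function names `gather` and `scatter` are illustrative assumptions and are not taken from the slides. In a gather, each output element reads from a computed location; in a scatter, each element writes to a computed location, which is the pattern the slide lists as limited under the graphics API.

```python
def gather(src, idx):
    # Gather: output element i READS from a computed location, out[i] = src[idx[i]].
    return [src[j] for j in idx]


def scatter(values, idx, size, fill=0):
    # Scatter: element i WRITES to a computed location, out[idx[i]] = values[i].
    out = [fill] * size
    for i, j in enumerate(idx):
        out[j] = values[i]
    return out


gathered = gather([10, 20, 30], [2, 0, 1])
scattered = scatter([10, 20, 30], [2, 0, 1], 3)
```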
4
CUDA
● General-purpose programming model
  – User kicks off batches of threads on the GPU
  – GPU = dedicated super-threaded, massively data-parallel co-processor
● Targeted software stack
  – Compute-oriented drivers, language, and tools
5
CUDA
● Driver for loading computation programs into GPU
  – Standalone driver, optimized for computation
  – Interface designed for compute: graphics-free API
  – Data sharing with OpenGL buffer objects
  – Guaranteed maximum download & readback speeds
  – Explicit GPU memory management
6
Parallel Computing on a GPU
● NVIDIA GPU Computing Architecture
  – Via a separate HW interface
  – In laptops, desktops, workstations, servers
● 8-series GPUs deliver 50 to 200 GFLOPS on compiled parallel C applications
7
Parallel Computing on a GPU
● GPU parallelism is doubling every year
● Programming model scales transparently
● Programmable in C with CUDA tools
● Multithreaded SPMD model uses application data parallelism and thread parallelism
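As a rough illustration of the SPMD model, the sketch below emulates a kernel launch on the CPU in Python. Every "thread" runs the same function and uses its block and thread indices to pick the one element it owns. The names `saxpy_kernel` and `launch` are assumptions made for this sketch, and the nested loop only stands in for the hardware scheduler; real CUDA code would be written in C and run these threads in parallel.

```python
def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    # Same program for every thread; only the indices differ (SPMD).
    i = block_idx * block_dim + thread_idx
    if i < len(x):  # guard: the last block may have spare threads
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    # Sequential stand-in for a grid of blocks, each holding block_dim threads.
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, t, block_dim, *args)


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # round up so every element is covered
launch(saxpy_kernel, grid_dim, block_dim, 2.0, x, y, out)
```

Because the element index is derived only from the block and thread indices, the same kernel covers a larger array by launching more blocks, which is one way to read "scales transparently".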
8
CPU vs GPU
11
● GPU baseline speedup is approximately 60x
● For 500,000 particles that is a reduction in calculation time from 33 minutes to 33 seconds!
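The two figures on this slide agree with each other, as a quick check in Python shows. The variable names are illustrative only; the timings are the ones quoted on the slide.

```python
cpu_seconds = 33 * 60   # 33 minutes, as quoted on the slide
gpu_seconds = 33        # 33 seconds, as quoted on the slide
speedup = cpu_seconds / gpu_seconds
print(f"speedup = {speedup:.0f}x")
```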
12
Conclusion
● Without optimization we already got an amazing speedup on CUDA
● N² algorithm is “made” for CUDA
● Optimizations are hard to predict in advance (tradeoffs)
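The slides do not show the particle computation itself, so the following Python sketch assumes a simple gravity-like all-pairs interaction in 2D to illustrate why an N² algorithm maps well to a GPU. The function name `all_pairs_accel`, the force law, and the `softening` term are all assumptions. The point is structural: the outer loop iterations are independent, so each particle i could be handled by its own thread.

```python
def all_pairs_accel(pos, mass, softening=1e-3):
    # pos: list of (x, y) tuples; mass: list of floats (assumed inputs).
    n = len(pos)
    acc = [(0.0, 0.0)] * n
    for i in range(n):            # independent per i: one GPU thread per particle
        xi, yi = pos[i]
        ax = ay = 0.0
        for j in range(n):        # every particle interacts with every other: N^2 work
            if i == j:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            r2 = dx * dx + dy * dy + softening  # softening avoids division by zero
            inv_r3 = r2 ** -1.5
            ax += mass[j] * dx * inv_r3
            ay += mass[j] * dy * inv_r3
        acc[i] = (ax, ay)
    return acc


acc = all_pairs_accel([(0.0, 0.0), (1.0, 0.0)], [1.0, 1.0])
```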
13
Conclusion
● There are ways to dynamically distribute workloads across a fixed number of blocks
● Biggest problem: how to handle dynamic results in global memory
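The slides do not say which distribution scheme was used. One common approach, sketched here in Python as an assumption, is a shared work counter: a fixed number of blocks repeatedly claim the next unprocessed item, and results of unknown length are appended to a preallocated buffer through a shared output cursor. On a GPU both counters would need atomic updates (CUDA provides `atomicAdd`); the name `run_fixed_blocks` and the `capacity` parameter are hypothetical.

```python
def run_fixed_blocks(work_items, num_blocks, process, capacity):
    next_item = 0               # shared work counter (atomic on a GPU)
    out = [None] * capacity     # preallocated stand-in for a global-memory buffer
    out_count = 0               # shared output cursor (atomic on a GPU)
    handled = [0] * num_blocks  # how many items each block claimed
    active = True
    while active:
        active = False
        for b in range(num_blocks):       # emulate the blocks taking turns
            if next_item >= len(work_items):
                continue
            item = work_items[next_item]  # claim the next item
            next_item += 1
            handled[b] += 1
            active = True
            for r in process(item):       # number of results is not known in advance
                if out_count >= capacity:
                    raise OverflowError("result buffer too small")
                out[out_count] = r
                out_count += 1
    return out[:out_count], handled


# Each work item k produces k results, so the output size varies per item.
results, handled = run_fixed_blocks([3, 0, 2, 1], 2, lambda k: [k] * k, capacity=8)
```

In this sequential emulation the blocks take turns, so the split looks even; on real hardware the counter would let blocks that finish early claim more items. The `capacity` check makes the second bullet concrete: the buffer has to be sized before the number of results is known.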
14
Uses
CUDA has provided a benefit for many applications. Here is a list of some:
● Seismic Database: 66x to 100x speedup, http://www.headwave.com
● Molecular Dynamics: 21x to 100x speedup, http://www.ks.uiuc.edu/Research/vmd
● MRI processing: 245x to 415x speedup, http://bic-test.beckman.uiuc.edu
● Atmospheric Cloud Simulation: 50x speedup, http://www.cs.clemson.edu/~jesteel/clouds.html
15
References
– CUDA, Supercomputing for the Masses, by Rob Farber.
  ● http://www.ddj.com/architect/207200659
– CUDA, Wikipedia.
  ● http://en.wikipedia.org/wiki/CUDA
– CUDA for developers, NVIDIA.
  ● http://www.nvidia.com/object/cuda_home.html#
– Download CUDA manual and binaries.
  ● http://www.nvidia.com/object/cuda_get.html