Published by Myra Strickland. Modified over 7 years ago.
1
GPU and CUDA: General Purpose Computing on Graphics Processing Units
2
Introduction: GPGPU is a technique for using GPUs to do work that is traditionally handled by the CPU. Why? Programmability, precision, and performance.
3
Motivation? Computational Power! GPUs are FAST!
CPUs follow Moore's Law, but GPU speed has been increasing faster than Moore's Law. GPUs gain roughly a 2x speed increase per year, while CPUs gain only about 1.5x per year.
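As a rough illustration of what those rates imply: assuming both growth factors held steady and both chips started from the same speed (neither assumption comes from the slide), the GPU's lead after n years compounds as follows. The symbols S_GPU and S_CPU are introduced here only for the example.

```latex
\[
\frac{S_{\text{GPU}}(n)}{S_{\text{CPU}}(n)}
  = \left(\frac{2}{1.5}\right)^{n}
  = \left(\frac{4}{3}\right)^{n},
\qquad
\left(\frac{4}{3}\right)^{5} = \frac{1024}{243} \approx 4.2
\]
```

Under those assumptions, a GPU would be about four times faster than a CPU after five years.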
4
GPUs Getting Faster, Fast!
Specialized nature of GPUs: transistors go to computation, not cache. Economics: the huge video game industry means more money for development.
5
GPUs are Flexible and Precise
Modern GPUs are programmable: programmable pixel and vertex engines, high-level language support. Modern GPUs support high precision: 32-bit floating point throughout the pipeline, high enough for many (not all) applications.
6
Awesome Potential: the performance and flexibility of GPUs make them an attractive platform for general purpose computation. Clusters. Cheaper high performance computers for schools. Insane number of gigaflops.
7
CUDA: developed by NVIDIA
An architecture that enables the use of standard programming languages on NVIDIA graphics cards: C for CUDA, plus third-party wrappers for Python, Fortran, Java, and MATLAB.
8
CUDA allows the latest NVIDIA cards to have an open architecture, like a normal CPU. But a GPU is a parallel “many-core” architecture that can keep thousands of threads in flight at once, which enables huge performance benefits.
9
The Latest: CUDA works with all NVIDIA GPUs from the G8x series onwards, including GeForce, Quadro, and Tesla. Programs written for the G8x series will work on all future GPUs. The Tesla chip is designed specifically for CUDA programming.
10
CUDA - Advantages: scattered reads, shared memory,
faster downloads and readbacks, and full support for integer and bitwise operations.
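To make the shared memory and integer/bitwise points concrete, here is a minimal sketch of a per-block sum. The names `blockSum`, `sumOnGpu`, and `THREADS` are made up for this example and do not come from the slides or the SDK; the block size is assumed to be a power of two.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define THREADS 256  // threads per block; assumed to be a power of two

// Each block sums its slice of the input in fast on-chip shared memory.
__global__ void blockSum(const int *in, int *out, int n) {
    __shared__ int tile[THREADS];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0;  // pad the last block with zeros
    __syncthreads();
    // Tree reduction: halve the active range each step (integer shift).
    for (int stride = blockDim.x >> 1; stride > 0; stride >>= 1) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0];
}

// Hypothetical host helper: returns the total, or -1 on any CUDA error.
long long sumOnGpu(const int *hIn, int n) {
    int blocks = (n + THREADS - 1) / THREADS;
    int *dIn = NULL, *dOut = NULL;
    int *hOut = (int *)malloc(blocks * sizeof(int));
    long long total = -1;
    if (hOut &&
        cudaMalloc((void **)&dIn, n * sizeof(int)) == cudaSuccess &&
        cudaMalloc((void **)&dOut, blocks * sizeof(int)) == cudaSuccess &&
        cudaMemcpy(dIn, hIn, n * sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess) {
        blockSum<<<blocks, THREADS>>>(dIn, dOut, n);
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(hOut, dOut, blocks * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess) {
            total = 0;
            for (int b = 0; b < blocks; ++b) total += hOut[b];  // add the partial sums
        }
    }
    cudaFree(dIn);
    cudaFree(dOut);
    free(hOut);
    return total;
}

int main() {
    const int n = 1024;
    static int data[n];
    for (int i = 0; i < n; ++i) data[i] = i + 1;  // 1, 2, ..., 1024
    printf("sum = %lld\n", sumOnGpu(data, n));     // 1024 * 1025 / 2 = 524800
    return 0;
}
```

The partial sums are added on the CPU to keep the sketch short; a real program might run a second reduction pass on the GPU instead.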
11
Limitations: deviation from IEEE standards.
Bottleneck between CPU and GPU. Threads should be running in groups of at least 32 for best performance.
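The "groups of 32" figure is the warp size. One way to respect it, sketched below, is to round a requested block size up to a whole number of warps. The helper `roundUpToWarp` is a made-up name for this example, and the fallback value of 32 is an assumption used only when no device can be queried.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: smallest multiple of `warp` that is >= `threads`.
int roundUpToWarp(int threads, int warp) {
    return ((threads + warp - 1) / warp) * warp;
}

int main() {
    int warp = 32;  // assumed fallback if the device query fails
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) == cudaSuccess) {
        warp = prop.warpSize;  // ask the device rather than hard-coding
    }
    // With a warp size of 32, a request for 100 threads becomes 128.
    printf("warp size %d, block size %d\n", warp, roundUpToWarp(100, warp));
    return 0;
}
```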
12
Process Flow
1. Copy the data from main memory to GPU memory.
2. The CPU instructs the GPU to start processing.
3. The GPU executes in parallel on each core.
4. Copy the result from GPU memory back to main memory.
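The four steps above map onto a handful of CUDA runtime calls. The following is a minimal sketch, not the presenter's code: the kernel `vecAdd` and the helper `addOnGpu` are names invented for this example, and the block size of 256 is an arbitrary choice.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Step 3: every thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper that walks through the four steps; true on success.
bool addOnGpu(const float *hA, const float *hB, float *hC, int n) {
    size_t bytes = (size_t)n * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;
    bool ok = cudaMalloc((void **)&dA, bytes) == cudaSuccess &&
              cudaMalloc((void **)&dB, bytes) == cudaSuccess &&
              cudaMalloc((void **)&dC, bytes) == cudaSuccess;
    // Step 1: copy the inputs from main memory to GPU memory.
    ok = ok && cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice) == cudaSuccess
            && cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Step 2: the CPU launches the kernel with enough blocks to cover n.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // Step 4: copy the result back; this copy waits for the kernel to finish.
    ok = ok && cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}

int main() {
    const int n = 1000;
    static float a[n], b[n], c[n];
    for (int i = 0; i < n; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }
    if (!addOnGpu(a, b, c, n)) { printf("CUDA error\n"); return 1; }
    printf("c[10] = %.1f\n", c[10]);  // 10 + 20 = 30.0
    return 0;
}
```

Assuming the file is saved as `vec_add.cu` (a name chosen here), it can be built with `nvcc vec_add.cu -o vec_add`.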
13
Threaded A multithreaded program is partitioned into blocks of threads that execute independently from each other, so that a GPU with more cores will automatically execute the program in less time than a GPU with fewer cores.
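A small sketch of what block independence means in practice: the kernel below uses a grid-stride loop, so the same code gives the same answer whether it is launched with one block or sixteen. The names `scale` and `scaleOnGpu` and the block counts are assumptions made for this example; how much faster the larger launch runs depends on the hardware.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: each thread handles i, i + stride, i + 2 * stride, ...
// so correctness does not depend on how many blocks are launched.
__global__ void scale(float *x, float k, int n) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        x[i] *= k;
}

// Hypothetical helper: scales h in place using `blocks` blocks of 128 threads.
bool scaleOnGpu(float *h, float k, int n, int blocks) {
    size_t bytes = (size_t)n * sizeof(float);
    float *d = NULL;
    bool ok = cudaMalloc((void **)&d, bytes) == cudaSuccess &&
              cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scale<<<blocks, 128>>>(d, k, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d);
    return ok;
}

int main() {
    const int n = 4096;
    static float few[n], many[n];
    for (int i = 0; i < n; ++i) few[i] = many[i] = (float)i;
    // Same program, different amounts of parallelism.
    if (!scaleOnGpu(few, 3.0f, n, 1) || !scaleOnGpu(many, 3.0f, n, 16)) {
        printf("CUDA error\n");
        return 1;
    }
    int mismatches = 0;
    for (int i = 0; i < n; ++i) if (few[i] != many[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);  // prints "mismatches: 0"
    return 0;
}
```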
14
My GPU: way better than a CPU. NVIDIA GeForce 8800 GTS
Number of multiprocessors: 12. Number of cores: 96. Total memory: 320 MB
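Figures like these can be read from the card at run time. The sketch below uses the runtime's device query; `listDevices` is a name made up for this example. The runtime reports the multiprocessor count but not a core count, which depends on the architecture (on this card, 12 multiprocessors with 8 cores each gives the 96 above).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints basic facts for each CUDA device it finds
// and returns the number of devices (0 if none or on error).
int listDevices() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Multiprocessors: %d\n", prop.multiProcessorCount);
        printf("  Total global memory: %zu MB\n",
               prop.totalGlobalMem / (1024 * 1024));
        printf("  Warp size: %d\n", prop.warpSize);
    }
    return count;
}

int main() {
    if (listDevices() == 0) {
        printf("No CUDA-capable device found\n");
        return 1;
    }
    return 0;
}
```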
15
Installation: started by installing Fedora 9.
Download the latest NVIDIA driver, the SDK, and the toolkit. Install the driver, install the toolkit, set the paths, then install the SDK. If all goes well, your computer should be ready to compile and run CUDA programs. Of course, it wasn't.
16
Setting the Paths: set $PATH and the library path ($LD_LIBRARY_PATH).
Set $PATH: add the line export PATH=$PATH:/usr/local/cuda/bin to the ~/.bashrc file. Set the library path: edit the /etc/ld.so.conf file and add /usr/local/cuda/lib, then run ldconfig so the change takes effect.
17
Problems. First: needed to install binutils. Second: needed to install gcc.
Third: needed to install make. Fourth: needed to install freeglut-devel. Fifth: the system wasn't seeing the nvcc binary, even though everything was installed and the path was right. Architecture? The arch command reported the chip architecture (64-bit), but my OS architecture was 32-bit, and everything I had downloaded and installed was for 64-bit.
18
More Problems: once I fixed the architecture problem, I was able to run some of the sample programs. The ones with a graphical display gave errors until I installed libXi-devel and libXmu-devel. Now everything works perfectly.
19
Using CUDA: a compiler driver (nvcc) simplifies the process of compiling C.
Simple and familiar. The SDK comes with several sample test programs to run, stored in the projects folder of the SDK.
20
Using CUDA (continued)
Pick a desired test project. Use the make command to compile the project; the results are stored in bin/linux/release/. Run the program with ./program. The results printed to the screen are just the time it took to run the program.
21
Current and Future Uses: accelerated rendering of 3D graphics,
real-time cloth simulation, distributed calculations, medical analysis simulations, physical simulations, and accelerated encryption/decryption and compression.
22
Some Examples: ssh to fang, then ssh to gw (i.e., the gateway)
ssh to cd to the NVIDIA/projects directory Pick a project to make Run the project
23
My Future Work: for my master's project, make an HPC system with many Tesla GPUs.
Make a cluster of GPU computers. Make a GPU server...?