GPU and CUDA: General Purpose Computing on Graphics Processing Units
Introduction
GPGPU is a technique for using GPUs to do work that is traditionally handled by the CPU. Why? Programmability, precision, and performance.
Motivation? Computational Power!
GPUs are FAST! CPU speed has tracked Moore's Law, but GPU speed has been increasing faster than Moore's Law: GPUs have gained roughly 2x in speed per year, while CPUs have gained only about 1.5x per year.
GPUs Getting Faster, Fast!
Specialization: because GPUs are specialized, they spend their transistors on computation rather than on cache. Economics: the huge video game industry means more money for development.
GPUs are Flexible and Precise
Modern GPUs are programmable, with programmable pixel and vertex engines and high-level language support. Modern GPUs also support high-precision 32-bit floating point throughout the pipeline, which is high enough for many (but not all) applications.
Awesome Potential
The performance and flexibility of GPUs make them an attractive platform for general-purpose computation: GPU clusters, cheaper high-performance computers for schools, and a huge number of gigaflops.
CUDA
Developed by NVIDIA, CUDA is an architecture that enables the use of standard programming languages on NVIDIA graphics cards: C for CUDA, plus third-party wrappers for Python, Fortran, Java, and MATLAB.
CUDA
CUDA opens the latest NVIDIA cards to general programming, much like a normal CPU. Unlike a CPU, however, a GPU is a parallel “many-core” architecture that can keep thousands of threads in flight at once, which enables huge performance benefits.
The Latest
CUDA works with all NVIDIA GPUs from the G8x series onwards, including GeForce, Quadro, and Tesla. Programs written for the G8x series are intended to run on later GPUs as well. The Tesla line is designed specifically for GPU computing with CUDA.
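CUDA - Advantages
Scattered reads (code can read from arbitrary addresses in memory), shared memory (a fast on-chip region shared by the threads of a block), faster downloads to and readbacks from the GPU, and full support for integer and bitwise operations.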
Limitations
Floating-point arithmetic deviates from the IEEE 754 standard in some respects; the transfer between CPU and GPU is a bottleneck; and threads should run in groups of at least 32 (a warp) for best performance.
Process Flow
1. Copy data from main memory to GPU memory.
2. The CPU instructs the GPU to start processing.
3. The GPU executes the work in parallel on each core.
4. Copy the result from GPU memory back to main memory.
Threaded
A multithreaded program is partitioned into blocks of threads that execute independently of each other, so a GPU with more cores will automatically execute the program in less time than a GPU with fewer cores.
My GPU
Way better than a CPU: NVIDIA GeForce 8800 GTS. Number of multiprocessors: 12. Number of cores: 96. Total memory: 320 MB.
Installation
I started by installing Fedora 9.
1. Download the latest NVIDIA driver, the SDK, and the toolkit.
2. Install the driver.
3. Install the toolkit.
4. Set the paths.
5. Install the SDK.
If all goes well, your computer should be ready to compile and run CUDA programs. Of course, it doesn't.
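Setting the Paths
Set $PATH: export PATH=$PATH:/usr/local/cuda/bin, and also add this line to the ~/.bashrc file so it persists.
Set the library search path: instead of setting $LD_LIBRARY_PATH, edit the /etc/ld.so.conf file, add the line /usr/local/cuda/lib, and run ldconfig as root so the change takes effect.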
Problems
First: needed to install binutils. Second: needed to install gcc. Third: needed to install make. Fourth: needed to install freeglut-devel. Fifth: the system wasn't seeing the nvcc binary, even though everything was installed and the path was right. Architecture? The arch command reported a 64-bit chip architecture, but my OS was 32-bit, and everything I had downloaded and installed was for 64-bit.
More Problems
Once I fixed the architecture problem, I was able to run some of the sample programs, but the ones with a graphical display gave errors. I had to install libXi-devel and libXmu-devel. Now everything works perfectly.
Using CUDA
CUDA provides a compiler driver, nvcc, that simplifies the process of compiling C for CUDA: simple and familiar. The SDK comes with several sample test programs to run, stored in the projects folder of the SDK.
Using CUDA (continued)
1. Pick a desired test project.
2. Use the make command to compile the project; the results are stored in bin/linux/release/.
3. Run the program with ./program.
The results printed to the screen are just the time it took to run the program.
Current and Future Uses
Accelerated rendering of 3D graphics, real-time cloth simulation, distributed calculations, medical analysis simulations, physical simulations, and accelerated encryption/decryption and compression.
Some Examples
1. ssh to fang.
2. ssh to gw (i.e., the gateway).
3. ssh to root@192.168.166.218.
4. cd to the NVIDIA/projects directory.
5. Pick a project to make.
6. Run the project.
My Future Work
For my master's project: build an HPC machine with many Tesla GPUs, build a cluster of GPU computers, or build a GPU server...?