General Purpose computing on Graphics Processing Units


GPU and CUDA: General Purpose Computing on Graphics Processing Units

Introduction GPGPU is a technique for using GPUs to do work that is traditionally handled by the CPU. Why? Programmability, precision, and performance.

Motivation? Computational Power! GPUs are FAST! It has been shown that while CPUs follow Moore's Law, GPU speed increases much faster: GPUs have gained roughly a 2x speed increase per year, versus only about 1.5x per year for CPUs.

GPUs Getting Faster, Fast! The specialized nature of the GPU: transistors are spent on computation, not cache. Economics: the huge video game industry means more money for development.

GPUs are Flexible and Precise Modern GPUs are programmable: programmable pixel and vertex engines, with high-level language support. Modern GPUs support 32-bit floating point throughout the pipeline, which is precise enough for many (not all) applications.

Awesome Potential The performance and flexibility of GPUs make them an attractive platform for general-purpose computation. Clusters: cheaper high-performance computers for schools, and an insane number of gigaflops.

CUDA Developed by NVIDIA, an architecture that enables the use of standard programming languages on their graphics cards. C for CUDA, plus third-party wrappers: Python, Fortran, Java, MATLAB.

CUDA Allows the latest NVIDIA cards to expose an open architecture like a normal CPU. But a GPU is a parallel "many-core" architecture, capable of running thousands of threads at once, which enables huge performance benefits.

The Latest CUDA works with all NVIDIA GPUs from the G8x series onwards, including GeForce, Quadro, and Tesla. Programs written for the G8x series will work on all future GPUs. The Tesla chip is designed specifically for CUDA programming.

CUDA - Advantages Scattered reads Shared memory Faster downloads to and readbacks from the GPU Full support for integer and bitwise operations
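A minimal sketch of the shared-memory advantage (the kernel name and the 64-element size are illustrative, not from the slides): each thread does a scattered read into fast on-chip `__shared__` memory, and the block then cooperates to reverse the data.

```cuda
// Sketch: reverse a 64-element array within one block using shared memory.
__global__ void reverse64(int *d)
{
    __shared__ int s[64];      // fast on-chip shared memory, visible to the block
    int t = threadIdx.x;
    s[t] = d[t];               // scattered read from global into shared memory
    __syncthreads();           // wait until every thread in the block has loaded
    d[t] = s[63 - t];          // write back in reverse order
}
```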

Limitations Some deviations from the IEEE 754 floating-point standard. The bus between CPU and GPU can be a bottleneck. Threads should run in groups of at least 32 (a warp) for best performance.

Process Flow 1. Copy data from main memory to GPU memory. 2. The CPU instructs the GPU to start processing. 3. The GPU executes in parallel across its cores. 4. Copy the result from GPU memory back to main memory.
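The four steps above can be sketched as a complete host program (the kernel `addOne` and the problem size are made up for illustration):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void addOne(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;                    // step 3: cores work in parallel
}

int main(void)
{
    const int n = 1024;
    float h[1024], *d;
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice); // step 1
    addOne<<<n / 256, 256>>>(d, n);                              // step 2
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost); // step 4
    cudaFree(d);
    printf("h[0] = %f\n", h[0]);
    return 0;
}
```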

Threaded A multithreaded program is partitioned into blocks of threads that execute independently from each other, so that a GPU with more cores will automatically execute the program in less time than a GPU with fewer cores.
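One common way to get that automatic scaling is a grid-stride loop (a sketch, not from the slides): because blocks execute independently, the same launch is correct whether the GPU runs 2 blocks at a time or 16, and a GPU with more cores simply finishes sooner.

```cuda
// Sketch: each thread strides through the array by the total grid width,
// so the kernel works for any grid size the hardware can schedule.
__global__ void scale(float *d, float k, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += gridDim.x * blockDim.x)   // stride by the whole grid
        d[i] *= k;
}
```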

My GPU Way better than a CPU! NVIDIA GeForce 8800 GTS Number of Multiprocessors: 12 Number of Cores: 96 Total Memory: 320 MB

Installation Started by installing Fedora 9. Download the latest NVIDIA driver, the SDK, and the toolkit. Install the driver, install the toolkit, set the paths, then install the SDK. If all goes well, your computer should be ready to compile and run CUDA programs. Of course, it doesn't go that smoothly.

Setting the Paths Set $PATH: add export PATH=$PATH:/usr/local/cuda/bin to the ~/.bashrc file. Set the library path: add /usr/local/cuda/lib to the /etc/ld.so.conf file so the loader can find the CUDA libraries.
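As a shell sketch of that setup (assuming the default /usr/local/cuda install location):

```shell
# Add the CUDA binaries (nvcc etc.) to the search path:
echo 'export PATH=$PATH:/usr/local/cuda/bin' >> ~/.bashrc

# Tell the dynamic linker where the CUDA libraries live:
echo '/usr/local/cuda/lib' | sudo tee -a /etc/ld.so.conf
sudo ldconfig
```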

Problems First: needed to install binutils. Second: install gcc. Third: needed to install make. Fourth: needed to install freeglut-devel. Fifth: the nvcc binary wasn't being found, even though everything was installed and the path was right. Architecture? Used the arch command and got the chip architecture (64-bit), but my OS architecture is 32-bit, and everything I had downloaded and installed was for 64-bit.

More Problems Once I fixed the architecture problem I was able to run some of the sample programs. I would get errors on the ones with a graphical representation; had to install libXi-devel and libXmu-devel. Now everything works perfectly.

Using CUDA nvcc is a compiler driver that simplifies the process of compiling C code: simple and familiar. The SDK comes with several sample test programs to run, stored in the projects folder of the SDK.

Using CUDA (continued) Pick a desired test project. Use the make command to compile the project; results are stored in bin/linux/release/. Run the program: ./program. The result printed to the screen is just the time it took to run the program.
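The build-and-run cycle above looks roughly like this (the project name matrixMul is a hypothetical example; the directory layout matches the CUDA SDK of that era):

```shell
cd NVIDIA_CUDA_SDK/projects/matrixMul   # pick a test project
make                                    # compiled binary lands in bin/linux/release/
../../bin/linux/release/matrixMul       # run it; it prints its timing
```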

Current and Future Uses Accelerated rendering for 3D glasses Real-time cloth simulation Distributed calculations Medical analysis simulations Physical simulations Accelerated encryption/decryption and compression

Some Examples ssh to fang, then ssh to gw (i.e., the gateway), then ssh to root@192.168.166.218. cd to the NVIDIA/projects directory, pick a project to make, and run the project.

My Future Work For my master's project: build an HPC system with many Tesla GPUs, build a cluster of GPU computers, or build a GPU server...?