Lecture 1: Introduction

CS179: GPU Programming, Lecture 1: Introduction

Today
- Course summary
- Administrative details
- Brief history of GPU computing
- Introduction to CUDA

Course Summary: GPU Programming
- What:
  - GPU: graphics processing unit, a highly parallel processor
  - APIs for accelerated hardware
- Why:
  - Parallel processing

Course Summary: Why GPU?
How many cores, exactly?
- GeForce 8800 Ultra (2007): 128
- GeForce GTX 260 (2008): 192
- GeForce GTX 295 (2009): 480*
- GeForce GTX 480 (2010): 480
- GeForce GTX 590 (2011): 1024*
- GeForce GTX 690 (2012): 3072*
- GeForce GTX Titan Z (2014): 5760*
* These cards shipped with 2 GPUs in them, effectively doubling the core count.

Course Summary: Why GPU? What kinds of speedups do we get?

Course Summary: Overview
- What will you learn?
  - CUDA
  - Parallelizing problems
  - Optimizing GPU code
  - CUDA libraries
- What will we not cover?
  - OpenGL
  - C/C++

Administrative: Course Details
- CS179: GPU Programming
- Website: http://courses.cms.caltech.edu/cs179/
- Course Instructors/TAs:
  - Connor DeFanti (cdefanti@caltech.edu)
  - Kevin Yuh (kyuh@caltech.edu)
- Overseeing Instructor: Al Barr (barr@cs.caltech.edu)
- Class time: MWF 5:00-5:55PM

Administrative: Assignments
- Homework: 8 assignments
  - Each worth 10% of your grade (100 pts. each)
- Final Project: 2 weeks for a custom final project
  - Details are up to you!
  - 20% of your grade (200 pts.)

Administrative: Assignments
- Assignments will be due Wednesday, 5PM
  - Extensions may be granted… Talk to the TAs beforehand!
- Office Hours: located in 104 ANB
  - Connor: Tuesday, 8-10PM
  - Kevin: Monday, 8-10PM

Administrative: Assignments
- Doing the assignments:
  - CUDA-capable machine required!
    - Must have an NVIDIA GPU
  - Setting up the environment can be tricky
  - Three options:
    - DIY with your own setup
    - Use the provided instructions with the given environment
    - Use the lab machines

Administrative: Assignments
- Submitting assignments:
  - Due date: Wednesday 5PM
  - Submit the assignment as .tar/.zip, or similar
  - Include a README file!
    - Name, compilation instructions, answers to conceptual questions on the sets, etc.
  - Submit all assignments to cdefanti@caltech.edu
- Receiving graded assignments:
  - Graded assignments should come back about 1 week after submission
  - We will email you back with your grade and comments

GPU History: Early Days
- Before GPUs:
  - All graphics ran on the CPU
  - Each pixel drawn in series
  - Super slow! (CS171, anyone?)
- Early GPUs:
  - 1980s: Blitters (fixed image sprites) allowed fast image memory transfer
  - 1990s: Introduction of DirectX and OpenGL
    - Brought the fixed function pipeline for rendering

GPU History: Early Days
- Fixed Function Pipeline:
  - “Fixed” OpenGL states
    - Phong or Gouraud shading?
    - Render as wireframe or solid?
  - Very limiting, made early games look similar

GPU History: Shaders
- Early 2000s: shaders introduced
  - Allow for much more interesting shading models

GPU History: Shaders
- Shaders: expanded the world of rendering greatly
  - Vertex shaders: apply operations per-vertex
  - Fragment shaders: apply operations per-pixel
  - Geometry shaders: apply operations to add new geometry

GPU History: Shaders
- These are great when dealing with graphics data…
  - Vertices, faces, pixels, etc.
- What about general purpose?
  - Can trick the GPU
  - DirectX “compute” shader may be an option
  - Anything slicker?

GPU History: CUDA
- 2007: NVIDIA introduces CUDA
  - C-style programming API for the GPU
  - Easier to do GPGPU
  - Easier memory handling
  - Better tools, libraries, etc.

GPU History: CUDA
- New advantages on the table:
  - Scattered reads
  - Shared memory
  - Faster memory transfer to/from the GPU

GPU History: Other APIs
- Plenty of other APIs exist for GPGPU
  - OpenCL/WebCL
  - DirectX Compute Shader
  - Other

Using the GPU
- Highly parallelizable parts of computational problems

A simple problem…
- Add two arrays: A[] + B[] -> C[]
- On the CPU:
    (allocate memory for C)
    For (i from 1 to array length)
        C[i] <- A[i] + B[i]
- Operates sequentially… can we do better?
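The sequential pseudocode above might look like the following in host code. This is a minimal sketch, not the course's own code: the function name `add_arrays_cpu` and the use of `std::vector<float>` are assumptions made for illustration. It is plain C++, so it also compiles as the host part of a `.cu` file.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential CPU version: a single loop handles one element at a time.
// C/C++ arrays are 0-based, so the loop runs from 0 to n-1 rather than
// from 1 to n as in the slide's pseudocode.
// (add_arrays_cpu is a hypothetical name used only for this sketch.)
std::vector<float> add_arrays_cpu(const std::vector<float>& a,
                                  const std::vector<float>& b) {
    assert(a.size() == b.size());    // both inputs must have the same length
    std::vector<float> c(a.size());  // allocate memory for C
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}
```

Every iteration is independent of the others, which is what makes this loop a good candidate for parallel execution.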

A simple problem…
- On the CPU (multi-threaded):
    (allocate memory for C)
    Create # of threads equal to number of cores on processor (around 2, 4, perhaps 8)
    (Allocate portions of A, B, C to each thread...)
    ...
    In each thread,
        For (i from beginning region of thread)
            C[i] <- A[i] + B[i]
            //lots of waiting involved for memory reads, writes, ...
    Wait for threads to synchronize...
- Slightly faster: 2-8x (slightly more with other tricks)
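One way to sketch the multi-threaded CPU version is with `std::thread`, giving each thread a contiguous region of the arrays. The function name `add_arrays_threaded` and the chunking scheme are assumptions for illustration; the slides do not specify a threading library. On some toolchains, host threads need an extra linker flag such as `-pthread`.

```cuda
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Multi-threaded CPU version: split the index range into one contiguous
// chunk per thread, run the same loop in each thread, then join them all.
// (add_arrays_threaded is a hypothetical name used only for this sketch.)
std::vector<float> add_arrays_threaded(
        const std::vector<float>& a, const std::vector<float>& b,
        unsigned num_threads = std::thread::hardware_concurrency()) {
    assert(a.size() == b.size());
    if (num_threads == 0) num_threads = 1;  // hardware_concurrency() may return 0

    const std::size_t n = a.size();
    std::vector<float> c(n);                // allocate memory for C
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // round up

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;            // no elements left to hand out
        workers.emplace_back([&a, &b, &c, begin, end] {
            for (std::size_t i = begin; i < end; ++i) c[i] = a[i] + b[i];
        });
    }
    for (std::thread& w : workers) w.join();  // wait for threads to synchronize
    return c;
}
```

Each thread writes a disjoint region of `c`, so no locking is needed; the only synchronization point is the final `join`.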

A simple problem…
- How many threads? How does performance scale?
- Context switching:
  - High penalty on the CPU!

A simple problem…
- On the GPU:
    (allocate memory for A, B, C on GPU)
    Create the “kernel”: each thread will perform one (or a few) additions
    Specify the following kernel operation:
        For (all i's assigned to this thread)
            C[i] <- A[i] + B[i]
    Start ~20000 (!) threads
    Wait for threads to synchronize...
- Speedup: Very high! (e.g. 10x, 100x)

GPU: Strengths Revealed
- Parallelism
- Low context switch penalty!
  - We can “cover up” performance loss by creating more threads!

GPU Computing: Step by Step
1. Set up inputs on the host (CPU-accessible memory)
2. Allocate memory for inputs on the GPU
3. Copy inputs from host to GPU
4. Allocate memory for outputs on the host
5. Allocate memory for outputs on the GPU
6. Start GPU kernel
7. Copy output from GPU to host
(Copying can be asynchronous)
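The steps above can be sketched with the CUDA runtime API as follows. This is an illustrative sketch under stated assumptions, not the lecture's own code: the names `add_kernel`, `add_arrays_gpu` and `CUDA_CHECK`, the block size of 512, and the use of synchronous `cudaMemcpy` are all choices made here. The final `cudaFree` calls are not on the slide but are needed to release device memory.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (hypothetical helper).
#define CUDA_CHECK(call)                                                    \
    do {                                                                    \
        cudaError_t err_ = (call);                                          \
        if (err_ != cudaSuccess) {                                          \
            std::fprintf(stderr, "CUDA error: %s\n",                        \
                         cudaGetErrorString(err_));                         \
            std::exit(EXIT_FAILURE);                                        \
        }                                                                   \
    } while (0)

// Each thread adds one element; the bounds check guards the last block.
__global__ void add_kernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Step 1: the caller has already set up the inputs a and b on the host.
std::vector<float> add_arrays_gpu(const std::vector<float>& a,
                                  const std::vector<float>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    const size_t bytes = a.size() * sizeof(float);
    if (n == 0) return std::vector<float>();

    // Step 2: allocate memory for inputs on the GPU.
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));
    // Step 3: copy inputs from host to GPU.
    CUDA_CHECK(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));
    // Step 4: allocate memory for outputs on the host.
    std::vector<float> c(a.size());
    // Step 5: allocate memory for outputs on the GPU.
    CUDA_CHECK(cudaMalloc(&d_c, bytes));
    // Step 6: start the GPU kernel with enough blocks to cover n elements.
    const int threads = 512;
    const int blocks = (n + threads - 1) / threads;
    add_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
    CUDA_CHECK(cudaGetLastError());
    // Step 7: copy output from GPU to host (this call waits for the kernel).
    CUDA_CHECK(cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost));

    // Not on the slide: release the device allocations.
    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    CUDA_CHECK(cudaFree(d_c));
    return c;
}
```

The asynchronous copies the slide mentions would use `cudaMemcpyAsync` with streams instead; the synchronous calls are used here only to keep the sketch short.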

GPU: Internals
- Blocks: groups of threads
  - Can cooperate via shared memory
  - Can synchronize with each other
  - Max size: 512 or 1024 threads (hardware-dependent)
- Warps: subgroups of threads within a block
  - Execute “in-step”
  - Size: 32 threads
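To make the block and warp sizes concrete, the sketch below computes how many warps a block occupies and, where a GPU is available, prints the limits the device reports. The helper names `warps_per_block` and `print_block_limits` are hypothetical; `cudaGetDeviceProperties` and the `maxThreadsPerBlock` and `warpSize` fields come from the CUDA runtime API.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Number of warps needed to hold one block, rounding up: a block whose size
// is not a multiple of the warp size still occupies a final, partly filled warp.
inline int warps_per_block(int threads_per_block, int warp_size = 32) {
    assert(threads_per_block > 0 && warp_size > 0);
    return (threads_per_block + warp_size - 1) / warp_size;
}

// Print the block and warp limits reported by the given device.
// Returns false if the query fails (for example, when no GPU is present).
inline bool print_block_limits(int device = 0) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    std::printf("%s: max %d threads per block, warp size %d, "
                "%d warps in a full block\n",
                prop.name, prop.maxThreadsPerBlock, prop.warpSize,
                warps_per_block(prop.maxThreadsPerBlock, prop.warpSize));
    return true;
}
```

With the sizes quoted on the slide, a 512-thread block holds 16 warps and a 1024-thread block holds 32. Choosing a block size that is a multiple of 32 avoids a partly empty last warp.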

GPU: Internals (diagram labels: block, SIMD processing unit, warp)

The Kernel
- Our “parallel” function
- Simple implementation (won’t work for lots of values)

Indexing
- Can get a block ID and a thread ID within the block
- Combining the two gives a unique thread ID!
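The slide code for the kernel and its indexing did not survive in this transcript, so the following is a reconstruction under assumptions rather than the original. It contrasts a kernel that indexes by `threadIdx.x` alone (one plausible reading of the "simple implementation" that fails for many values) with one that combines block ID and thread ID. All function names are hypothetical; `blockIdx`, `blockDim` and `threadIdx` are CUDA built-ins.

```cuda
#include <cuda_runtime.h>

// Unique index of a thread across the whole launch (1D case).
// Marked __host__ __device__ so the arithmetic can also be checked on the CPU.
__host__ __device__ inline unsigned global_thread_id(unsigned block_id,
                                                     unsigned block_size,
                                                     unsigned thread_id) {
    return block_id * block_size + thread_id;
}

// Simple version: indexes by threadIdx.x alone, so it is only correct when
// launched with a single block, which caps n at the maximum block size.
__global__ void add_single_block(const float* a, const float* b, float* c,
                                 unsigned n) {
    unsigned i = threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Indexed version: block ID and thread ID together give every thread in the
// grid its own element, so n can exceed the size of one block.
__global__ void add_indexed(const float* a, const float* b, float* c,
                            unsigned n) {
    unsigned i = global_thread_id(blockIdx.x, blockDim.x, threadIdx.x);
    if (i < n) c[i] = a[i] + b[i];  // threads past the end do nothing
}
```

For example, thread 5 of block 2 with 256 threads per block handles element 2 * 256 + 5 = 517.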

Calling the Kernel …

Calling the Kernel (2)
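The launch code from these two slides is not present in the transcript. As a hedged sketch of what calling a kernel generally involves, the example below picks a grid size that covers all elements and launches with the `<<<blocks, threads>>>` syntax. The names `add_kernel`, `blocks_needed` and `launch_add`, and the default of 512 threads per block, are assumptions; `d_a`, `d_b` and `d_c` are assumed to be device pointers that already hold `n` floats each.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Each thread adds one element; the bounds check guards the last block.
__global__ void add_kernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Number of blocks needed so that blocks * threads_per_block >= n.
inline int blocks_needed(int n, int threads_per_block) {
    assert(n > 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Launch the kernel on device arrays and wait for it to finish.
// Returns the first CUDA error encountered, or cudaSuccess.
inline cudaError_t launch_add(const float* d_a, const float* d_b, float* d_c,
                              int n, int threads_per_block = 512) {
    const int blocks = blocks_needed(n, threads_per_block);
    add_kernel<<<blocks, threads_per_block>>>(d_a, d_b, d_c, n);
    cudaError_t err = cudaGetLastError();  // catches invalid launch settings
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // wait for threads to finish
}
```

For the roughly 20000 elements mentioned earlier, 512 threads per block gives 40 blocks, or 20480 threads in total; the bounds check in the kernel makes the 480 surplus threads do nothing.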