CIS 565: GPU Programming and Architecture Original Slides by: Suresh Venkatasubramanian Updates by Joseph Kider and Patrick Cozzi.

Lecture 1: Introduction


Administrivia
- Meeting: Monday and Wednesday, 1:30-3:00pm, Towne 309
- Recorded lectures upon request
- Website:

Administrivia
- Instructor: Joseph Kider

Administrivia
- Teaching Assistant: Qing Sun

Administrivia
- Prerequisites
  - CIS 460: Introduction to Computer Graphics
  - CIS 501: Computer Architecture
  - Most important: C/C++ and OpenGL

CIS 534: Multicore Programming and Architecture
Course Description
This course is a pragmatic examination of multicore programming and the hardware architecture of modern multicore processors. Unlike the sequential single-core processors of the past, utilizing a multicore processor requires programmers to identify parallelism and write explicitly parallel code. Topics covered include: the relevant architectural trends and aspects of multicores, approaches for writing multicore software by extracting data parallelism (vectors and SIMD), thread-level parallelism, and task-based parallelism, efficient synchronization, and program profiling and performance tuning. The course focuses primarily on mainstream shared-memory multicores with some coverage of graphics processing units (GPUs). Cluster-based supercomputing is not a focus of this course.
Several programming assignments and a course project will provide students first-hand experience with programming, experimentally analyzing, and tuning multicore software. Students are expected to have a solid understanding of computer architecture and strong programming skills (including experience with C/C++).
We will not overlap very much

What is GPU (Parallel) Computing
- Parallel computing: using multiple processors to...
  - More quickly perform a computation, or
  - Perform a larger computation in the same time
- PROGRAMMER expresses parallelism
- Clusters of computers: MPI, networks, cloud computing... (NOT COVERED)
- Shared memory multiprocessor, called "multicore" when on the same chip (CIS 534 MULTICORE)
- GPU: graphics processing units (COURSE FOCUS, CIS 565)
Slide courtesy of Milo Martin

Administrivia
- Course Overview
  - System and GPU architecture
  - Real-time graphics programming with OpenGL and GLSL
  - General purpose programming with CUDA and OpenCL
  - Problem domain: up to you
  - Hands-on

Administrivia
- Goals
  - Program massively parallel processors:
    - High performance
    - Functionality and maintainability
    - Scalability
  - Gain Knowledge
    - Parallel programming principles and patterns
    - Processor architecture features and constraints
    - Programming API, tools, and techniques

Administrivia
- Grading
  - Homeworks (4-5): 40%
  - Paper Presentation: 10%
  - Final Project: 40% + 5%
  - Final: 10%

Administrivia
- Bonus days: five per person
  - No-questions-asked one-day extension
  - Multiple bonus days can be used on the same assignment
  - Can be used for most, but not all assignments
- Strict late policy: if not turned in by
  - 11:59pm of due date: 25% deduction
  - 2 days late: 50%
  - 3 days late: 75%
  - 4 or more days: 100%
- Add a Readme when using bonus days

Administrivia
- Academic Honesty
  - Discussion with other students, past or present, is encouraged
  - Any reference to assignments from previous terms or web postings is unacceptable
  - Any copying of non-trivial code is unacceptable
    - Non-trivial = more than a line or so
    - Includes reading someone else’s code and then going off to write your own.

Administrivia
- Academic Honesty
  - Penalties for academic dishonesty:
    - Zero on the assignment for the first occasion
    - Automatic failure of the course for repeat offenses

Administrivia
- Textbook: None
- Related graphics books:
  - Graphics Shaders
  - OpenGL Shading Language
  - GPU Gems
- Related general GPU books:
  - Programming Massively Parallel Processors
  - Patterns for Parallel Programming

Administrivia
- Do I need a GPU?
  - Yes: NVIDIA GeForce 8 series or higher
  - No
    - Moore 100b - NVIDIA GeForce 9800s
    - SIG Lab - NVIDIA GeForce 8800s, two GeForce 480s, and one Fermi Tesla

Administrivia
- Demo: What GPU do I have?
- Demo: What version of OpenGL/CUDA/OpenCL does it support?

Aside: This class is about 3 things
- PERFORMANCE
- Ok, not really
  - Also about correctness, “-abilities”, etc.
- Nitty-gritty, real-world wall-clock performance
  - No Proofs!
Slide courtesy of Milo Martin

Exercise
- Parallel Sorting
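The slide leaves the approach open. As one possible starting point (not necessarily the solution discussed in class), here is a minimal sketch of odd-even transposition sort, written sequentially in Python. The function name is made up for illustration. Within each phase every compare-and-swap touches a disjoint pair of elements, so on parallel hardware all pairs of a phase could run at the same time.

```python
def odd_even_transposition_sort(values):
    """Sort by alternating even and odd compare-and-swap phases.

    The inner loop is written sequentially here, but its iterations are
    independent of each other within one phase, which is what makes the
    algorithm a candidate for parallel execution.
    """
    data = list(values)
    n = len(data)
    for phase in range(n):            # n phases are enough for n elements
        start = phase % 2             # even phases: (0,1),(2,3)...; odd: (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


print(odd_even_transposition_sort([5, 2, 9, 1, 5, 6]))
```

With n processors each phase takes constant time, so the whole sort takes about n steps, at the price of roughly n*n/2 comparisons in total.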

Credits
- David Kirk (NVIDIA)
- Wen-mei Hwu (UIUC)
- David Luebke
- Wolfgang Engel
- Etc. etc.

What is a GPU? GPU: Graphics Processing Unit Processor that resides on your graphics card. GPUs allow us to achieve the unprecedented graphics capabilities now available in games

What is a GPU?
- Demo: NVIDIA GTX 400
- Demo: Triangle throughput

Why Program the GPU ? Chart from:

Why Program the GPU?
- Compute
  - Intel Core i7 – 4 cores – 100 GFLOP
  - NVIDIA GTX280 – 240 cores – 1 TFLOP
- Memory Bandwidth
  - System Memory – 60 GB/s
  - NVIDIA GT200 – 150 GB/s
- Install Base
  - Over 200 million NVIDIA G80s shipped
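To put the compute and bandwidth figures above side by side, here is a short sketch that derives a few ratios from them. It uses only the numbers quoted on the slide, treated as peak values. The variable names are made up for illustration.

```python
# Peak figures as quoted on the slide (not measurements).
cpu_gflops, cpu_cores = 100.0, 4       # Intel Core i7
gpu_gflops, gpu_cores = 1000.0, 240    # NVIDIA GTX280 (1 TFLOP)
sys_bw_gbs = 60.0                      # system memory bandwidth
gpu_bw_gbs = 150.0                     # NVIDIA GT200 memory bandwidth

compute_ratio = gpu_gflops / cpu_gflops        # how much more peak compute
bandwidth_ratio = gpu_bw_gbs / sys_bw_gbs      # how much more peak bandwidth

# Peak GFLOP per core: the GPU wins through core count, not per-core speed.
cpu_gflops_per_core = cpu_gflops / cpu_cores
gpu_gflops_per_core = gpu_gflops / gpu_cores

# Floating-point operations available per byte of memory traffic at peak.
cpu_flops_per_byte = cpu_gflops / sys_bw_gbs
gpu_flops_per_byte = gpu_gflops / gpu_bw_gbs

print(f"compute: {compute_ratio:.1f}x, bandwidth: {bandwidth_ratio:.1f}x")
print(f"GFLOP/core: CPU {cpu_gflops_per_core:.1f}, GPU {gpu_gflops_per_core:.2f}")
print(f"FLOP/byte:  CPU {cpu_flops_per_byte:.2f}, GPU {gpu_flops_per_byte:.2f}")
```

On these figures compute grows 10x while bandwidth grows only 2.5x, so a program has to do more arithmetic per byte fetched to approach the GPU's peak than it does on the CPU.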

How did this happen?
- Games demand advanced shading
- Fast GPUs = better shading
- Need for speed = continued innovation
- The gaming industry has overtaken the defense, finance, oil and healthcare industries as the main driving factor for high performance processors.

GPU = Fast co-processor?
- GPU speed increasing at cubed-Moore’s Law.
- This is a consequence of the data-parallel streaming aspects of the GPU.
- GPUs are cheap! Put a couple together, and you can get a super-computer.
NYT May 26, 2003: TECHNOLOGY; From PlayStation to Supercomputer for $50,000: National Center for Supercomputing Applications at University of Illinois at Urbana-Champaign builds supercomputer using 70 individual Sony Playstation 2 machines; project required no hardware engineering other than mounting Playstations in a rack and connecting them with high-speed network switch
So can we use the GPU for general-purpose computing?

Yes! Wealth of applications
- Voronoi Diagrams
- Data Analysis
- Motion Planning
- Geometric Optimization
- Physical Simulation
- Matrix Multiplication
- Conjugate Gradient
- Sorting and Searching
- Force-field simulation
- Particle Systems
- Molecular Dynamics
- Graph Drawing
- Signal Processing
- Database queries
- Range queries
- ... and graphics too!!
- Image Processing
- Radar, Sonar, Oil Exploration
- Finance
- Planning
- Optimization

When does “GPU=fast co-processor” work ? Real-time visualization of complex phenomena The GPU (like a fast parallel processor) can simulate physical processes like fluid flow, n-body systems, molecular dynamics In general: Massively Parallel Tasks

When does “GPU=fast co-processor” work ? Interactive data analysis For effective visualization of data, interactivity is key

When does “GPU=fast co-processor” work ? Rendering complex scenes Procedural shaders can offload much of the expensive rendering work to the GPU. Still not the Holy Grail of “80 million triangles at 30 frames/sec*”, but it helps. * Alvy Ray Smith, Pixar. Note: NVIDIA Quadro 5000 is calculated to push 950 million triangles per second

Stream Programming
- A stream is a sequence of data (could be numbers, colors, RGBA vectors,…)
- A kernel is a (fragment) program that runs on each element of a stream, generating an output stream (pixel buffer).
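The stream and kernel definitions above can be sketched in a few lines of Python. This is only an illustration of the programming model, not GPU code. `run_kernel` and `invert` are hypothetical names, and the list comprehension stands in for hardware that would process the elements in parallel.

```python
from typing import Callable, Iterable, List, Tuple

RGBA = Tuple[float, float, float, float]


def run_kernel(kernel: Callable[[RGBA], RGBA], stream: Iterable[RGBA]) -> List[RGBA]:
    """Apply the kernel independently to every element of the input stream.

    No element depends on any other, so the order of evaluation is free;
    that independence is what a GPU exploits.
    """
    return [kernel(element) for element in stream]


def invert(color: RGBA) -> RGBA:
    """Example kernel: invert the color channels, leave alpha untouched."""
    r, g, b, a = color
    return (1.0 - r, 1.0 - g, 1.0 - b, a)


input_stream = [(0.0, 0.25, 1.0, 1.0), (0.5, 0.5, 0.5, 0.5)]
output_stream = run_kernel(invert, input_stream)
print(output_stream)
```

The kernel sees exactly one element at a time and has no way to look at its neighbors, which mirrors the restriction a fragment program works under.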

Stream Programming
- Kernel = vertex/fragment shader
- Input stream = stream of vertices, primitives, or fragments
- Output stream = frame buffer or other buffer (transform feedback)
- Multiple kernels = multi-pass rendering sequence on the GPU.
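The "multiple kernels = multi-pass" point above can be illustrated with a small sketch in which the output buffer of one pass becomes the input stream of the next. All names here (`run_pass`, `multipass`, and the three toy kernels) are made up for illustration. Real passes would be shaders writing to a frame buffer or another buffer.

```python
from functools import reduce


def run_pass(kernel, stream):
    """One rendering pass: run a single kernel over the whole stream."""
    return [kernel(x) for x in stream]


def multipass(kernels, stream):
    """Chain passes: each pass reads the buffer written by the previous one."""
    return reduce(lambda buffer, kernel: run_pass(kernel, buffer), kernels, stream)


def scale(x):
    return 2.0 * x


def bias(x):
    return x + 1.0


def clamp(x):
    return min(max(x, 0.0), 4.0)


result = multipass([scale, bias, clamp], [-1.0, 0.5, 3.0])
print(result)
```

Each extra pass adds another full sweep over the data, which is one reason the number of kernels shows up as a cost factor for stream programs.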

To program the GPU, one must think of it as a (parallel) stream processor.

What is the cost of a stream program?
- Number of kernels
  - Readbacks from the GPU to main memory are expensive, and so is transferring data to the GPU.
- Complexity of kernel
  - More complexity takes longer to move data through a rendering pipeline
- Number of memory accesses
  - Non-local memory access is expensive
- Number of branches
  - Divergent branches are expensive
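To make the last cost factor concrete, here is a deliberately simplified model of why divergent branches are expensive. It assumes a group of lanes that executes in lockstep and must run both sides of an `if` whenever the lanes disagree. The group width of 8 and the cycle counts are hypothetical, and real hardware differs in the details.

```python
def simd_branch_cost(conditions, cost_then, cost_else):
    """Cycles spent by one lockstep group on `if c: THEN else: ELSE`.

    If every lane agrees, only the taken side is executed. If the lanes
    diverge, the group executes both sides one after the other, with the
    inactive lanes masked off.
    """
    runs_then = any(conditions)
    runs_else = not all(conditions)
    return (cost_then if runs_then else 0) + (cost_else if runs_else else 0)


GROUP_WIDTH = 8                      # hypothetical lanes per group
COST_THEN, COST_ELSE = 10, 30        # hypothetical cycle counts

coherent = simd_branch_cost([True] * GROUP_WIDTH, COST_THEN, COST_ELSE)
divergent = simd_branch_cost([True] * (GROUP_WIDTH - 1) + [False], COST_THEN, COST_ELSE)

print(coherent, divergent)
```

In this model a single lane taking the other path makes the whole group pay for both sides, which is why arranging data so that neighboring elements branch the same way can matter.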

What will this course cover ?

1. Stream Programming Principles
- OpenGL programmable pipeline
- The principles of stream hardware
- How do we program with streams?

2. Shaders and Effects
- How do we compute complex effects found in today’s games? Examples:
  - Parallax Mapping
  - Reflections
  - Skin and Hair
  - Particle Systems
  - Deformable Mesh
  - Morphing
  - Animation

3. GPGPU / GPU Computing
- How do we use the GPU as a fast co-processor?
  - GPGPU Languages: CUDA and OpenCL
  - High Performance Computing
  - Numerical methods and linear algebra:
    - Inner products
    - Matrix-vector operations
    - Matrix-Matrix operations
  - Sorting
  - Fluid Simulations
  - Fast Fourier Transforms
  - Graph Algorithms
  - And More…
- At what point does the GPU become faster than the CPU for matrix operations? For other operations?
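One way to start reasoning about the break-even question above is a back-of-the-envelope model. The model itself is an assumption made for illustration and does not come from the slides: CPU time is pure compute, while GPU time is a fixed launch overhead plus data transfer plus compute. The function name, the 5 GB/s transfer rate, and the 10 microsecond overhead are hypothetical. The 100 GFLOP and 1 TFLOP rates are the peak figures quoted earlier in the lecture.

```python
import math


def break_even_elements(flops_per_elem, bytes_per_elem,
                        cpu_flops, gpu_flops, transfer_bw, fixed_overhead_s):
    """Problem size above which the modeled GPU time beats the modeled CPU time.

    Model (an assumption, see the text):
      t_cpu(n) = n * flops_per_elem / cpu_flops
      t_gpu(n) = fixed_overhead_s
                 + n * (bytes_per_elem / transfer_bw + flops_per_elem / gpu_flops)
    Returns math.inf when the GPU never catches up in this model.
    """
    cpu_time_per_elem = flops_per_elem / cpu_flops
    gpu_time_per_elem = bytes_per_elem / transfer_bw + flops_per_elem / gpu_flops
    saving_per_elem = cpu_time_per_elem - gpu_time_per_elem
    if saving_per_elem <= 0:
        return math.inf
    return fixed_overhead_s / saving_per_elem


CPU_FLOPS = 100e9       # peak figure from the lecture
GPU_FLOPS = 1000e9      # peak figure from the lecture
TRANSFER_BW = 5e9       # hypothetical host-to-device rate, bytes/s
OVERHEAD_S = 10e-6      # hypothetical fixed cost per GPU invocation

# Vector add: 1 FLOP per element, 12 bytes moved (two float inputs, one output).
light = break_even_elements(1, 12, CPU_FLOPS, GPU_FLOPS, TRANSFER_BW, OVERHEAD_S)
# A heavier kernel: 1000 FLOPs per element, 8 bytes moved.
heavy = break_even_elements(1000, 8, CPU_FLOPS, GPU_FLOPS, TRANSFER_BW, OVERHEAD_S)

print(light, round(heavy))
```

Under these assumptions the low-intensity operation never pays for its transfers, while the compute-heavy one wins once the stream has more than roughly 1,350 elements. Measured numbers on real hardware would replace every constant here.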

4. Optimizations
- How do we use the full potential of the GPU?
- What tools are there to analyze the performance of our algorithms?
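Before reaching for a profiler, the simplest analysis tool is a wall-clock timer. The sketch below shows one common pattern, taking the best of several runs to reduce noise. It uses only Python's standard library, and `best_wall_clock` and `sum_of_squares` are hypothetical names. When timing real GPU work, the host generally has to wait for the device to finish before stopping the clock, because kernel launches return before the work completes.

```python
import time


def best_wall_clock(fn, *args, repeats=5):
    """Run fn(*args) several times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result


def sum_of_squares(n):
    """Stand-in workload to be timed."""
    return sum(i * i for i in range(n))


seconds, value = best_wall_clock(sum_of_squares, 100_000)
print(f"best of 5: {seconds * 1e3:.3f} ms")
```

Reporting the minimum rather than the mean is a design choice: it estimates how fast the code can run when the machine is not busy with something else.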

What we want you to get out of this course!
1. Understanding of the GPU as a graphics pipeline
2. Understanding of the GPU as a high performance compute device
3. Understanding of GPU architectures
4. Programming in GLSL, CUDA, and OpenCL
5. Exposure to many core graphics effects performed on GPUs
6. Exposure to many core parallel algorithms performed on GPUs