Lecture 1: Introduction

CS 179: GPU Programming. Lecture 1: Introduction. Images: http://en.wikipedia.org, http://www.pcper.com, http://northdallasradiationoncology.com/, GPU Gems (Nvidia)

The Problem Are our computers fast enough? Source: XKCD Comics (http://xkcd.com/676/)

The Problem Are our computers really fast enough? http://lauraskelton.github.io/images/posts/5deepnetworklayer.png http://www.dmi.unict.it/nicosia/research/proteinFolding3.png http://www.cnet.com/

The Problem What does it mean to “solve” a computational problem?

The CPU: The “central processing unit”. Traditionally, applications use the CPU for primary calculations. Powerful, general-purpose capabilities. R&D -> Moore’s Law! Established technology. Image: Wikimedia Commons, Intel_CPU_Pentium_4_640_Prescott_bottom.jpg

The GPU: Designed for our “graphics”. For “graphics problems”, much faster than the CPU! What about other problems?

This course in 30 seconds: For certain problems, use the GPU instead of the CPU. Images: http://www.nvidia.com, Wikimedia Commons: Intel_CPU_Pentium_4_640_Prescott_bottom.jpg

This course in 60 seconds: GPU: hundreds of cores (vs. 2, 4, or 8 cores on a CPU). Good for highly parallelizable problems: speedups of 10x, 100x, or more.

Questions What is a GPU? What is a parallelizable problem? What does GPU-accelerated code look like? Who cares?

Outline Motivations Brief history “A simple problem” “A simple solution” Administrivia

GPUs – The Motivation: Screens! 1e5 to 1e7 pixels per screen. Refresh rate: ~60 Hz. Total: ~1e7 to 1e9 pixels/sec! (Very approximate; orders of magnitude only.)
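The estimate above is simply pixels per frame multiplied by frames per second. A minimal sketch of that arithmetic follows; the function name `pixels_per_second` and the sample resolutions in the note below are illustrative assumptions, not part of the lecture.

```cpp
#include <cassert>
#include <cstdint>

// Pixels that must be produced every second to keep a screen refreshed.
// (Hypothetical helper; the name and signature are assumptions for illustration.)
inline std::uint64_t pixels_per_second(std::uint64_t width,
                                       std::uint64_t height,
                                       std::uint64_t refresh_hz) {
    assert(refresh_hz > 0);
    return width * height * refresh_hz;
}
```

For example, a 1920x1080 screen at 60 Hz needs 124,416,000 pixels per second (about 1.2e8), which sits inside the slide's order-of-magnitude range.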

GPUs – The Motivation: Lots of calculations are “the same”! e.g. Raytracing. Goal: trace light rays and calculate object interaction to produce a realistic image. Images: Superquadric Cylinders, exponent 0.1, yellow glass balls, Barr, 1981; Watt, 3D Computer Graphics (from http://courses.cms.caltech.edu/cs171/)

GPUs – The Motivation: Lots of calculations are “the same”! e.g. Raytracing:
    for all pixels (i,j):
        calculate ray point and direction in 3D space
        if ray intersects object:
            calculate lighting at closest object
            store color of (i,j)
Image: Superquadric Cylinders, exponent 0.1, yellow glass balls, Barr, 1981
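To make the per-pixel loop concrete, here is a heavily simplified CPU sketch. Everything specific in it is an assumption chosen for brevity: a single sphere centred at the origin, parallel (orthographic) rays travelling along -z, a light shining along the viewing direction, and brightness 0 for rays that miss. The point is only that `shade_pixel` needs nothing but its own (i,j), so every pixel could be computed independently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Shade one pixel. Assumed scene: one sphere of the given radius at the origin,
// viewed with parallel rays along -z. Returns brightness in [0, 1]; 0 means a miss.
float shade_pixel(int i, int j, int width, int height, float radius) {
    // Ray origin: map the pixel centre into the square [-1, 1] x [-1, 1].
    float px = 2.0f * (i + 0.5f) / width - 1.0f;
    float py = 2.0f * (j + 0.5f) / height - 1.0f;

    // Intersection test: the ray hits the sphere only if it passes within `radius`
    // of the z axis.
    float d2 = px * px + py * py;
    if (d2 > radius * radius) return 0.0f;

    // Closest hit point is on the near (+z) side of the sphere.
    float pz = std::sqrt(radius * radius - d2);

    // Lighting: Lambertian term between the surface normal and the light direction.
    Vec3 normal = { px / radius, py / radius, pz / radius };
    Vec3 to_light = { 0.0f, 0.0f, 1.0f };
    float lambert = dot(normal, to_light);
    return lambert > 0.0f ? lambert : 0.0f;
}

// The loop from the slide: the same calculation for every pixel (i, j).
std::vector<float> render(int width, int height, float radius) {
    assert(width > 0 && height > 0 && radius > 0.0f);
    std::vector<float> image(static_cast<std::size_t>(width) * height);
    for (int j = 0; j < height; ++j)
        for (int i = 0; i < width; ++i)
            image[static_cast<std::size_t>(j) * width + i] =
                shade_pixel(i, j, width, height, radius);
    return image;
}
```

Because no iteration reads another iteration's result, the two nested loops could be replaced by one thread per pixel without changing the output.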

GPUs – The Motivation: Lots of calculations are “the same”! e.g. Simple shading:
    for all pixels (i,j):
        replace previous color with new color according to rules
Image: "Example of a Shader" by TheReplay - Taken/shaded with YouFX webcam software, composited next to each other in Photoshop. Licensed under CC BY-SA 3.0 via Wikipedia - http://en.wikipedia.org/wiki/File:Example_of_a_Shader.png#/media/File:Example_of_a_Shader.png
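As one possible instance of "replace previous color with new color according to rules", the sketch below uses color inversion as the rule. The rule, the `Pixel` layout, and the function names are assumptions chosen for illustration; the slide does not specify any particular shader.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Pixel { std::uint8_t r, g, b; };

// The "rule" (assumed here): invert each channel.
// The new color depends only on this pixel's previous color.
Pixel invert(Pixel p) {
    return { static_cast<std::uint8_t>(255 - p.r),
             static_cast<std::uint8_t>(255 - p.g),
             static_cast<std::uint8_t>(255 - p.b) };
}

// Apply the same rule to every pixel of the image, in place.
void shade(std::vector<Pixel>& image) {
    for (Pixel& p : image) p = invert(p);
}
```

Any rule with this shape (one pixel in, one pixel out) has the same property: the iterations are independent of each other.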

GPUs – The Motivation: Lots of calculations are “the same”! e.g. Transformations (camera, perspective, …):
    for all vertices (x,y,z) in scene:
        obtain new vertex (x’,y’,z’) = T(x,y,z)
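A common way to represent T is a 4x4 matrix applied to the vertex in homogeneous coordinates; the slide does not say which representation it has in mind, so that choice is an assumption here, as are the names `Mat4`, `transform`, `transform_all`, and `translation`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major 4x4 matrix

// Apply T to one vertex using homogeneous coordinates (x, y, z, 1).
Vertex transform(const Mat4& T, const Vertex& v) {
    const float in[4] = { v.x, v.y, v.z, 1.0f };
    float out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            out[r] += T[r][c] * in[c];
    assert(out[3] != 0.0f);  // the divide below is what makes perspective work
    return { out[0] / out[3], out[1] / out[3], out[2] / out[3] };
}

// The loop from the slide: the same transformation for every vertex in the scene.
std::vector<Vertex> transform_all(const Mat4& T, const std::vector<Vertex>& scene) {
    std::vector<Vertex> result;
    result.reserve(scene.size());
    for (const Vertex& v : scene) result.push_back(transform(T, v));
    return result;
}

// Example T: a pure translation by (dx, dy, dz).
Mat4 translation(float dx, float dy, float dz) {
    Mat4 m{};  // all zeros
    for (std::size_t k = 0; k < 4; ++k) m[k][k] = 1.0f;
    m[0][3] = dx;
    m[1][3] = dy;
    m[2][3] = dz;
    return m;
}
```

Translating the vertex (1, 1, 1) by (1, 2, 3) with this sketch gives (2, 3, 4); each vertex is processed without reference to any other.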

Outline Motivations Brief history “A simple problem” “A simple solution” This course

GPUs – Brief History Fixed-function pipelines Pre-set functions, limited options http://gamedevelopment.tutsplus.com/articles/the-end-of-fixed-function-rendering-pipelines-and-how-to-move-on--cms-21469 Source: Super Mario 64, by Nintendo

GPUs – Brief History Shaders Could implement one’s own functions! GLSL (C-like language) Could “sneak in” general-purpose programming! http://minecraftsix.com/glsl-shaders-mod/

GPUs – Brief History: CUDA (Compute Unified Device Architecture): general-purpose parallel computing platform for NVIDIA GPUs. OpenCL (Open Computing Language): general heterogeneous computing framework. … Accessible as extensions to C! (and other languages…)

GPUs Today “General-purpose computing on GPUs” (GPGPU)

Demonstrations

Outline Motivations Brief history “A simple problem” “A simple solution” This course

A simple problem… Add two arrays: A[] + B[] -> C[]
On the CPU:
    float *C = malloc(N * sizeof(float));
    for (int i = 0; i < N; i++)
        C[i] = A[i] + B[i];
Operates sequentially… can we do better?
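The CPU loop above is a fragment: A, B, and N are assumed to exist already. A self-contained sketch follows, written as C++ so that it matches the other examples. The function name `add_arrays` and the null check are additions, and the explicit cast on `malloc` is required in C++ although C does not need it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Sequential element-wise addition: C[i] = A[i] + B[i] for i in [0, N).
// Returns a malloc'd array that the caller must release with free(),
// or nullptr if the allocation fails.
float* add_arrays(const float* A, const float* B, std::size_t N) {
    assert(A != nullptr && B != nullptr);
    float* C = static_cast<float*>(std::malloc(N * sizeof(float)));
    if (C == nullptr) return nullptr;
    for (std::size_t i = 0; i < N; i++)
        C[i] = A[i] + B[i];  // one addition at a time, in index order
    return C;
}
```

A caller would pass two arrays of equal length, read the result, and then call `free` on the returned pointer.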

A simple problem… On the CPU (multi-threaded, pseudocode):
    (allocate memory for C)
    Create # of threads equal to number of cores on processor (around 2, 4, perhaps 8)
    (Indicate portions of A, B, C to each thread...)
    ...
    In each thread,
        For (i in this thread's region)
            C[i] <- A[i] + B[i]
            //lots of waiting involved for memory reads, writes, ...
    Wait for threads to synchronize...
Slightly faster: 2-8x (slightly more with other tricks)
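One way to turn the multi-threaded pseudocode into real code is with `std::thread` from the C++ standard library. This is a sketch under assumptions: the lecture does not name a threading API, and the contiguous-chunk split and the names `add_range` and `add_arrays_threaded` are choices made here for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Work done by one thread: add the elements in [begin, end).
void add_range(const float* A, const float* B, float* C,
               std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        C[i] = A[i] + B[i];
}

// Split the arrays into contiguous chunks, one chunk per thread.
std::vector<float> add_arrays_threaded(
        const std::vector<float>& A, const std::vector<float>& B,
        unsigned num_threads = std::thread::hardware_concurrency()) {
    assert(A.size() == B.size());
    if (num_threads == 0) num_threads = 1;  // hardware_concurrency() may report 0

    const std::size_t n = A.size();
    std::vector<float> C(n);                // (allocate memory for C)
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        // Indicate this thread's portion of A, B, C.
        const std::size_t begin = std::min(n, static_cast<std::size_t>(t) * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;            // nothing left to hand out
        workers.emplace_back(add_range, A.data(), B.data(), C.data(), begin, end);
    }
    for (std::thread& w : workers) w.join();  // wait for threads to synchronize
    return C;
}
```

The threads never write to the same element, so no locking is needed; on some toolchains the program must be linked with thread support (for example `-pthread`).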

A simple problem… How many threads? How does performance scale? Context switching: high penalty on the CPU, low penalty on the GPU.

A simple problem… On the GPU:
    (allocate memory for A, B, C on GPU)
    Create the “kernel”: each thread will perform one (or a few) additions
    Specify the following kernel operation:
        For (all i's assigned to this thread)
            C[i] <- A[i] + B[i]
    Start ~20000 (!) threads
    Wait for threads to synchronize...
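The phrase "all i's assigned to this thread" can be made concrete with a CPU emulation. The sketch below is not GPU code: it runs the logical threads one after another purely to show which indices each one would own. The strided assignment (thread t takes i = t, t + T, t + 2T, ...) is one common scheme and is an assumption here, since the slide does not say how indices are handed out; the function names are also hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Body of the "kernel" for one logical thread: handle every
// total_threads-th element, starting at this thread's own id.
void add_kernel_emulated(const float* A, const float* B, float* C, std::size_t n,
                         std::size_t thread_id, std::size_t total_threads) {
    for (std::size_t i = thread_id; i < n; i += total_threads)
        C[i] = A[i] + B[i];
}

// Emulated "launch": on a GPU these calls would run concurrently;
// here they simply run in sequence.
std::vector<float> launch_emulated(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   std::size_t total_threads) {
    assert(A.size() == B.size() && total_threads > 0);
    std::vector<float> C(A.size());
    for (std::size_t tid = 0; tid < total_threads; ++tid)
        add_kernel_emulated(A.data(), B.data(), C.data(), A.size(),
                            tid, total_threads);
    return C;
}
```

With 20000 logical threads and 50000 elements, each thread performs two or three additions, matching the slide's "one (or a few) additions" per thread.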

GPU: Strengths Revealed. Parallelism / lots of cores. Low context switch penalty! We can “cover up” performance loss by creating more threads!

Outline Motivations Brief history “A simple problem” “A simple solution” This course

GPU Computing: Step by Step
    1. Set up inputs on the host (CPU-accessible memory)
    2. Allocate memory for inputs on the GPU
    3. Allocate memory for outputs on the host
    4. Allocate memory for outputs on the GPU
    5. Copy inputs from host to GPU
    6. Start GPU kernel
    7. Copy output from GPU to host
(Copying can be asynchronous)
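The ordering of these steps can be rehearsed on the CPU with a mock "device". The sketch below is not CUDA: `DeviceBuffer`, `copy_to_device`, `copy_to_host`, and `add_kernel` are hypothetical stand-ins, and the only thing they model is that device memory is separate from host memory and is reached through explicit copies. In real CUDA code the allocation and copy steps are typically done with `cudaMalloc` and `cudaMemcpy`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for GPU memory: host code only touches it through explicit copies.
struct DeviceBuffer {
    std::vector<float> storage;
    explicit DeviceBuffer(std::size_t n) : storage(n) {}
};

void copy_to_device(DeviceBuffer& dst, const std::vector<float>& src) {
    assert(dst.storage.size() == src.size());
    std::copy(src.begin(), src.end(), dst.storage.begin());
}

void copy_to_host(std::vector<float>& dst, const DeviceBuffer& src) {
    assert(dst.size() == src.storage.size());
    std::copy(src.storage.begin(), src.storage.end(), dst.begin());
}

// Stand-in for the kernel: reads and writes device buffers only.
void add_kernel(const DeviceBuffer& a, const DeviceBuffer& b, DeviceBuffer& c) {
    for (std::size_t i = 0; i < c.storage.size(); ++i)
        c.storage[i] = a.storage[i] + b.storage[i];
}

std::vector<float> gpu_style_add(const std::vector<float>& host_a,
                                 const std::vector<float>& host_b) {
    assert(host_a.size() == host_b.size());
    const std::size_t n = host_a.size();   // 1. inputs are set up on the host
    DeviceBuffer dev_a(n), dev_b(n);       // 2. allocate inputs on the "GPU"
    std::vector<float> host_c(n);          // 3. allocate output on the host
    DeviceBuffer dev_c(n);                 // 4. allocate output on the "GPU"
    copy_to_device(dev_a, host_a);         // 5. copy inputs host -> "GPU"
    copy_to_device(dev_b, host_b);
    add_kernel(dev_a, dev_b, dev_c);       // 6. start the kernel
    copy_to_host(host_c, dev_c);           // 7. copy output "GPU" -> host
    return host_c;
}
```

The mock copies synchronously, so it does not model the asynchronous copying that the slide mentions.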

The Kernel Our “parallel” function Simple implementation

Indexing Can get a block ID and thread ID within the block: Unique thread ID!
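In CUDA, the unique thread ID for a one-dimensional launch is conventionally computed as `blockIdx.x * blockDim.x + threadIdx.x`. The code shown on the original slide is not present in this transcript, so the sketch below is a CPU emulation of that arithmetic with plain function arguments standing in for the built-in variables; the function names and the visit-count check are additions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unique global index, mirroring blockIdx.x * blockDim.x + threadIdx.x.
std::size_t global_thread_id(std::size_t block_id, std::size_t threads_per_block,
                             std::size_t thread_in_block) {
    assert(thread_in_block < threads_per_block);
    return block_id * threads_per_block + thread_in_block;
}

// Smallest number of blocks such that blocks * threads_per_block >= n.
std::size_t blocks_needed(std::size_t n, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Walk over every (block, thread) pair and count how often each element is visited.
std::vector<int> visit_counts(std::size_t n, std::size_t threads_per_block) {
    std::vector<int> counts(n, 0);
    const std::size_t blocks = blocks_needed(n, threads_per_block);
    for (std::size_t b = 0; b < blocks; ++b) {
        for (std::size_t t = 0; t < threads_per_block; ++t) {
            const std::size_t id = global_thread_id(b, threads_per_block, t);
            if (id < n) ++counts[id];  // bounds check: the last block may overshoot n
        }
    }
    return counts;
}
```

For 1000 elements and 256 threads per block, 4 blocks are needed; 24 threads in the last block fall past the end of the array, which is why the bounds check matters.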

Calling the Kernel

Calling the Kernel (2)

Summary For many highly parallelizable problems… GPU offers massive performance increase! Making difficult problems easy Putting impossible problems within reach

Outline Motivations Brief history “A simple problem” “A simple solution” This course

This Course: General topics: GPU computing/parallelization (audio, linear algebra, medical engineering, machine learning, finance, …); CUDA (parallel computing platform); libraries, optimizations, etc. Prerequisites: C/C++ knowledge.

Administrivia: Course Instructors/TAs: Kevin Yuh (kyuh@caltech.edu), Eric Martin (emartin@caltech.edu). Overseeing Instructor: Al Barr (barr@cs.caltech.edu). Class time: ANB 107, MWF 3:00 PM. Website: http://courses.cms.caltech.edu/cs179/

Course Requirements, Option 1: Homework: 7 assignments, each worth 10% of grade. Due Wednesdays, 3 PM (changed from 5 PM on 4/3/2015). Final project: 3-week project, 30% of grade.

Course Requirements, Option 2: Homework: 5 assignments, each worth 10% of grade. Due Wednesdays, 3 PM (changed from 5 PM on 4/3/2015). Final project: 5-week project, 50% of grade. Difference: exchange sets 6 and 7 for more time on the project.

Projects: Topic: your choice! Project scale: 5-week projects are significantly more extensive. Solo or pairs: expectations set accordingly. Idea generation: keep your eyes open! Talk to us. We hope to bring guests!

Administrivia Collaboration policy: Discuss ideas and strategies freely, but all code must be your own “50 foot rule” (in spirit) – don’t consult your code when helping others with their code

Administrivia: Office Hours: located in ANB 104. Kevin: Mondays, 9-11 PM. Eric: Tuesdays, 7-9 PM. Extensions on request: talk to TAs.

Machines: Primary machines (multi-GPU, remote access): haru.caltech.edu, mako.caltech.edu (pending). E-mail me your preferred username! Change your password separately on each machine (once mako is up), using the passwd command.

Machines: Secondary (CMS) machines: mx.cms.caltech.edu, minuteman.cms.caltech.edu. Use your CMS login. Not all assignments work here!

Machines: Alternative: use your own! (Harder.) Must have an NVIDIA CUDA-capable GPU. Virtual machines won’t work! Exception: machines with I/O MMU virtualization and certain GPUs. Special requirements for hybrid/Optimus systems and Mac/OS X. Setup is difficult! (But we have some instructions.) May need to modify assignment makefiles.

Final remarks for the day… "Three RAAF FA-18 Hornets in formation after refueling" by U.S. Air Force photo by Senior Airman Matthew Bruch - http://www.flickr.com/photos/pacificairforces/8472331771/in/photostream. Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Three_RAAF_FA-18_Hornets_in_formation_after_refueling.jpg#/media/File:Three_RAAF_FA-18_Hornets_in_formation_after_refueling.jpg

Welcome to the course!