CS 179: GPU Programming Lecture 1: Introduction

Images: http://en.wikipedia.org, http://www.pcper.com, http://northdallasradiationoncology.com/, GPU Gems (NVIDIA)

Administration
- Covered topics: (GP)GPU computing/parallelization, C++, CUDA (parallel computing platform)
- TAs: cs179.ta@gmail.com for set submission and extension requests
  - Aadyot Bhatnagar (abhatnag@caltech.edu)
  - Tyler Port (tport@caltech.edu)
- Website: http://courses.cms.caltech.edu/cs179/
- Overseeing instructor: Al Barr (barr@cs.caltech.edu)
- Class time: ANB 107, MWF 3:00 PM; recitations on Fridays

Course Requirements
- Fill out this survey: https://goo.gl/forms/RZiUFBGYs2GKYEFA2
- Fill out this when2meet for office hours ASAP: https://www.when2meet.com/?6806202-GXLXT
- Homework: 6 weekly assignments, each worth 10% of the grade
- Final project: 4-week project, 40% of the grade
- P/F students must receive at least 60% on every assignment AND the final project

Homework
- Due on Wednesdays before class (3 PM)
- First set out April 4th, due April 11th
- Collaboration policy: discuss ideas and strategies freely, but all code must be your own
- Do not look up prior years' solutions or reference solution code from GitHub without prior TA approval
- Office hours: located in ANB 104; times TBA (will be announced before the first set is out)
- Extensions: ask a TA for one if you have a valid reason

Projects
- Topic of your choice; we will also provide many options
- Teams of up to 2 people (2-person teams will be held to higher expectations)
- Requirements: project proposal, progress report(s), and final presentation
- More info later…

Machines
- Primary GPU machines (currently being set up; you will receive a user account after emailing cs179.ta@gmail.com):
  - Titan: titan.cms.caltech.edu (SSH and Mosh available)
  - Haru: haru.cms.caltech.edu
  - Maki: maki.caltech.edu
- Secondary machines (these use your CMS login; NOTE: not all assignments work on these machines):
  - mx.cms.caltech.edu
  - minuteman.cms.caltech.edu
- Change your password from the temporary one we send you (use the passwd command)

Machines (2)
- Alternative: use your own machine
  - Must have an NVIDIA CUDA-capable GPU
  - Virtual machines won't work (exception: machines with I/O MMU virtualization and certain GPUs)
  - Special requirements for hybrid/Optimus systems and Mac OS X
- The setup guide on the website is outdated; do not follow the 2016 instructions

The CPU
- The "Central Processing Unit"
- Traditionally, applications use the CPU for primary calculations
- General-purpose capabilities; established technology
- Usually equipped with 8 or fewer powerful cores
- Optimal for concurrent processes, but not for large-scale parallel computations
(Image: Wikimedia Commons, Intel_CPU_Pentium_4_640_Prescott_bottom.jpg)

The GPU
- The "Graphics Processing Unit"
- Relatively new technology, designed for parallelizable problems
- Initially created specifically for graphics
- Became more capable of general computations

GPUs – The Motivation
Raytracing:
    for all pixels (i, j):
        calculate ray point and direction in 3D space
        if ray intersects object:
            calculate lighting at closest object
            store color of (i, j)
(Image: Superquadric cylinders, exponent 0.1, yellow glass balls, Barr, 1981)

Example: Add two arrays
On the CPU: A[] + B[] -> C[]

    float *C = (float *) malloc(N * sizeof(float));
    for (int i = 0; i < N; i++)
        C[i] = A[i] + B[i];
    return C;

This operates sequentially… can we do better?

A simple problem…
On the CPU (multi-threaded, pseudocode):

    (allocate memory for C)
    Create a number of threads equal to the number of cores on the processor (around 2, 4, perhaps 8)
    (Indicate portions of A, B, C to each thread...)
    ...
    In each thread:
        for (i in the region assigned to this thread):
            C[i] <- A[i] + B[i]
        // lots of waiting involved for memory reads, writes, ...
    Wait for threads to synchronize...

This is slightly faster: about 2-8x (slightly more with other tricks)
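The multi-threaded pseudocode above can be sketched in C++ with std::thread; the helper name add_arrays_threaded and the chunking strategy are illustrative, not from the slides:

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Split the index range [0, N) across hardware threads; each thread
// adds its own contiguous slice of A and B into C.
void add_arrays_threaded(const float *A, const float *B, float *C, int N) {
    unsigned num_threads = std::thread::hardware_concurrency();
    if (num_threads == 0) num_threads = 4;  // fallback if the count is unknown
    int chunk = (N + (int) num_threads - 1) / (int) num_threads;

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; t++) {
        int begin = (int) t * chunk;
        int end = std::min(N, begin + chunk);
        if (begin >= end) break;             // no work left for this thread
        threads.emplace_back([=] {
            for (int i = begin; i < end; i++)
                C[i] = A[i] + B[i];          // each thread handles its own slice
        });
    }
    for (auto &th : threads) th.join();      // wait for threads to synchronize
}
```

Note that thread creation and joining have real overhead on the CPU, which is part of why the speedup tops out around the core count.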

A simple problem…
- How many threads? How does performance scale?
- Context switching: the act of switching which thread is being processed
  - High penalty on the CPU
  - Not an issue on the GPU

A simple problem…
On the GPU:

    (allocate memory for A, B, C on the GPU)
    Create the "kernel": each thread will perform one (or a few) additions
    Specify the following kernel operation:
        for all i's (indices) assigned to this thread:
            C[i] <- A[i] + B[i]
    Start ~20,000 (!) threads
    Wait for threads to synchronize...

GPU: Strengths Revealed Emphasis on parallelism means we have lots of cores This allows us to run many threads simultaneously with no context switches

GPU Computing: Step by Step
1. Setup inputs on the host (CPU-accessible memory)
2. Allocate memory for outputs on the host
3. Allocate memory for inputs on the GPU
4. Allocate memory for outputs on the GPU
5. Copy inputs from host to GPU
6. Start the GPU kernel (a function that executes on the GPU)
7. Copy output from GPU to host
NOTE: Copying can be asynchronous, and unified memory management is available
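The steps above can be sketched in CUDA C++ as follows; this assumes a kernel named add_kernel (defined elsewhere), and the sizes and names are illustrative:

```cuda
// Assumes: __global__ void add_kernel(const float*, const float*, float*, int);
int N = 1 << 20;
size_t size = N * sizeof(float);

// Steps 1-2: setup inputs and outputs on the host
float *h_A = (float *) malloc(size);   // fill with input data
float *h_B = (float *) malloc(size);
float *h_C = (float *) malloc(size);   // will hold the result

// Steps 3-4: allocate memory on the GPU
float *d_A, *d_B, *d_C;
cudaMalloc(&d_A, size);
cudaMalloc(&d_B, size);
cudaMalloc(&d_C, size);

// Step 5: copy inputs from host to GPU
cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

// Step 6: launch the kernel
int threads_per_block = 512;
int blocks = (N + threads_per_block - 1) / threads_per_block;
add_kernel<<<blocks, threads_per_block>>>(d_A, d_B, d_C, N);

// Step 7: copy the output back to the host
cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

// Clean up
cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
free(h_A); free(h_B); free(h_C);
```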

The Kernel
- Our "parallel" function
- Given to each thread
- Simple implementation:
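The implementation on this slide was shown as an image; a standard version of this kernel, with one addition per thread, looks like:

```cuda
// Each thread computes one element of C from its global index.
__global__ void add_kernel(const float *A, const float *B, float *C, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global thread ID
    if (i < N)   // guard: the grid may contain more threads than elements
        C[i] = A[i] + B[i];
}
```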

Indexing
- Can get a block ID and a thread ID within the block: a unique thread ID!
(References: https://cs.calvin.edu/courses/cs/374/CUDA/CUDA-Thread-Indexing-Cheatsheet.pdf, https://en.wikipedia.org/wiki/Thread_block)
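Combining these built-ins gives the global index; a common extension is the grid-stride loop, which matches the earlier pseudocode's "all i's assigned to this thread" when N exceeds the total thread count (the kernel name here is illustrative):

```cuda
__global__ void add_kernel_stride(const float *A, const float *B, float *C, int N) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // unique thread ID in the grid
    int stride = blockDim.x * gridDim.x;              // total number of threads launched
    for (int i = idx; i < N; i += stride)             // each thread handles several i's
        C[i] = A[i] + B[i];
}
```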

Calling the Kernel
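This slide's code was shown as an image; a typical launch uses the <<<blocks, threads>>> syntax, with the block count rounded up so every element is covered (kernel and variable names here are illustrative):

```cuda
int threads_per_block = 512;
// Round up: ceil(N / threads_per_block) blocks
int blocks = (N + threads_per_block - 1) / threads_per_block;
add_kernel<<<blocks, threads_per_block>>>(d_A, d_B, d_C, N);
```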

Calling the Kernel (2)
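The content of this slide was also an image; kernel launches are asynchronous, so host code typically checks for launch errors and synchronizes afterward. A sketch of that pattern:

```cuda
add_kernel<<<blocks, threads_per_block>>>(d_A, d_B, d_C, N);

cudaError_t err = cudaGetLastError();   // catches invalid launch configurations
if (err != cudaSuccess)
    printf("Launch failed: %s\n", cudaGetErrorString(err));

cudaDeviceSynchronize();                // wait for the kernel to finish
```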

Questions?

GPUs – Brief History
- Initially based on graphics-focused fixed-function pipelines
- Pre-set functions, limited options
(See: http://gamedevelopment.tutsplus.com/articles/the-end-of-fixed-function-rendering-pipelines-and-how-to-move-on--cms-21469; image: Super Mario 64, by Nintendo)

GPUs – Brief History
- Shaders
  - Could implement one's own functions!
  - GLSL (a C-like language)
  - Could "sneak in" general-purpose programming!
- Vulkan/OpenCL is the modern multiplatform general-purpose GPU compute system, but we won't be covering it in this course
(Image: http://minecraftsix.com/glsl-shaders-mod/)

GPUs – Brief History
- "General-purpose computing on GPUs" (GPGPU)
- Hardware has gotten good enough that a GPU is basically a mini-supercomputer
- CUDA (Compute Unified Device Architecture): a general-purpose parallel computing platform for NVIDIA GPUs
- Vulkan/OpenCL (Open Computing Language): general heterogeneous computing frameworks
- Both are accessible as extensions to various languages (if you're into Python, check out Theano and PyCUDA)
- Supercomputers are tons of processing cores sharing the workload of a parallelized task; GPUs are in fact often at the core of most modern supercomputers