© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2011 ECE408/CS483, University of Illinois, Urbana-Champaign 1 Graphic Processing Processors (GPUs) Parallel.

Slides:



Advertisements
Similar presentations
ECE 598HK Computational Thinking for Many-core Computing Lecture 2: Many-core GPU Performance Considerations © Wen-mei W. Hwu and David Kirk/NVIDIA,
Advertisements

Prepared 5/24/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.
Intro to GPU’s for Parallel Computing. Goals for Rest of Course Learn how to program massively parallel processors and achieve – high performance – functionality.
GPU System Architecture Alan Gray EPCC The University of Edinburgh.
GPUs. An enlarging peak performance advantage: –Calculation: 1 TFLOPS vs. 100 GFLOPS –Memory Bandwidth: GB/s vs GB/s –GPU in every PC and.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Chapter.
Back-Projection on GPU: Improving the Performance Wenlay “Esther” Wei Advisor: Jeff Fessler Mentor: Yong Long April 29, 2010.
Processor Design 5Z032 Henk Corporaal Eindhoven University of Technology 2011.
A Performance and Energy Comparison of FPGAs, GPUs, and Multicores for Sliding-Window Applications From J. Fowers, G. Brown, P. Cooke, and G. Stitt, University.
Accelerating Machine Learning Applications on Graphics Processors Narayanan Sundaram and Bryan Catanzaro Presented by Narayanan Sundaram.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, ECE 498AL, University of Illinois, Urbana-Champaign ECE408 / CS483 Applied Parallel Programming.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, University of Illinois, Urbana-Champaign 1 ECE408 / CS483 Applied Parallel Programming.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Programming Massively Parallel Processors.
Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU Presented by: Ahmad Lashgar ECE Department, University of Tehran.
GPGPU overview. Graphics Processing Unit (GPU) GPU is the chip in computer video cards, PS3, Xbox, etc – Designed to realize the 3D graphics pipeline.
Introduction Course Overview and Basic understanding of Computer Architecture.
Training Program on GPU Programming with CUDA 31 st July, 7 th Aug, 14 th Aug 2011 CUDA Teaching UoM.
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lectures 7: Threading Hardware in G80.
Programming Massively Parallel Processors Using CUDA & C++AMP Lecture 1 - Introduction Wen-mei Hwu, Izzat El Hajj CEA-EDF-Inria Summer School 2013.
Chapter 2 Computer Clusters Lecture 2.3 GPU Clusters for Massive Paralelism.
Computationally Efficient Histopathological Image Analysis: Use of GPUs for Classification of Stromal Development Olcay Sertel 1,2, Antonio Ruiz 3, Umit.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.
By Arun Bhandari Course: HPC Date: 01/28/12. GPU (Graphics Processing Unit) High performance many core processors Only used to accelerate certain parts.
Introduction to CUDA (1 of 2) Patrick Cozzi University of Pennsylvania CIS Spring 2012.
©Wen-mei W. Hwu and David Kirk/NVIDIA Urbana, Illinois, August 2-5, 2010 VSCSE Summer School Proven Algorithmic Techniques for Many-core Processors Lecture.
© David Kirk/NVIDIA and Wen-mei W. Hwu, 1 Programming Massively Parallel Processors Lecture Slides for Chapter 1: Introduction.
GPU Programming and Architecture: Course Overview Patrick Cozzi University of Pennsylvania CIS Spring 2012.
© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 18-22, 2008 VSCSE Summer School 2008 Accelerators for Science and Engineering Applications:
YOU LI SUPERVISOR: DR. CHU XIAOWEN CO-SUPERVISOR: PROF. LIU JIMING THURSDAY, MARCH 11, 2010 Speeding up k-Means by GPUs 1.
Programming Concepts in GPU Computing Dušan Gajić, University of Niš Programming Concepts in GPU Computing Dušan B. Gajić CIITLab, Dept. of Computer Science.
GPU computing and CUDA Marko Mišić
©Wen-mei W. Hwu and David Kirk/NVIDIA Urbana, Illinois, August 2-5, 2010 VSCSE Summer School Proven Algorithmic Techniques for Many-core Processors Lecture.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 CS 395 Winter 2014 Lecture 17 Introduction to Accelerator.
1 ECE408/CS483 Applied Parallel Programming Lecture 10: Tiled Convolution Analysis © David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al University.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, ECE 498AL, University of Illinois, Urbana-Champaign ECE408 / CS483 Applied Parallel Programming.
FIGURE 11.1 Mapping between OpenCL and CUDA data parallelism model concepts. KIRK CH:11 “Programming Massively Parallel Processors: A Hands-on Approach.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 CMPS 5433 Dr. Ranette Halverson Programming Massively.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 Basic Parallel Programming Concepts Computational.
Chun-Yuan Lin Introduction to Computer Graphics 2015/11/11 1 Ch00.
Hardware Acceleration Using GPUs M Anirudh Guide: Prof. Sachin Patkar VLSI Consortium April 4, 2008.
1 Ceng 545 GPU Computing. Grading 2 Midterm Exam: 20% Homeworks: 40% Demo/knowledge: 25% Functionality: 40% Report: 35% Project: 40% Design Document:
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lectures 8: Threading Hardware in G80.
1)Leverage raw computational power of GPU  Magnitude performance gains possible.
May 8, 2007Farid Harhad and Alaa Shams CS7080 Overview of the GPU Architecture CS7080 Final Class Project Supervised by: Dr. Elias Khalaf By: Farid Harhad.
GPU Programming Shirley Moore CPS 5401 Fall 2013
Chun-Yuan Lin Brief of GPU&CUDA. What is GPU? Graphics Processing Units 2015/12/16 2 GPU.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483/498AL, University of Illinois, Urbana-Champaign 1 ECE 408/CS483 Applied Parallel Programming.
© David Kirk/NVIDIA and Wen-mei W. Hwu, University of Illinois, Urbana-Champaign 1 CS/EE 217 GPU Architecture and Parallel Programming Project.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Spring 2010 Lecture 13: Basic Parallel.
Debunking the 100X GPU vs. CPU Myth An Evaluation of Throughput Computing on CPU and GPU Present by Chunyi Victor W Lee, Changkyu Kim, Jatin Chhugani,
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lecture 15: Basic Parallel Programming Concepts.
Lecture 8 : Manycore GPU Programming with CUDA Courtesy : SUNY-Stony Brook Prof. Chowdhury’s course note slides are used in this lecture note.
Introduction to CUDA 1 of 2 Patrick Cozzi University of Pennsylvania CIS Fall 2014.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 GPU.
GPGPU introduction. Why is GPU in the picture Seeking exa-scale computing platform Minimize power per operation. – Power is directly correlated to the.
1 CS/EE 217 GPU Architecture and Parallel Programming Lecture 9: Tiled Convolution Analysis © David Kirk/NVIDIA and Wen-mei W. Hwu,
Graphic Processing Units Presentation by John Manning.
Introduction to Computer Graphics
CS/EE 217 – GPU Architecture and Parallel Programming
ECE408 / CS483 Applied Parallel Programming Lecture 1: Introduction
CS/EE 217 – GPU Architecture and Parallel Programming
Programming Massively Parallel Processors Lecture Slides for Chapter 9: Application Case Study – Electrostatic Potential Calculation © David Kirk/NVIDIA.
Introduction to Heterogeneous Parallel Computing
EE 147 – GPU Computing and Programming
CSE 502: Computer Architecture
Presentation transcript:

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, University of Illinois, Urbana-Champaign 1 Graphic Processing Processors (GPUs) Parallel Programming Lecture 1: Introduction

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, University of Illinois, Urbana-Champaign 2 Goals Learn how to program massively parallel processors and achieve –high performance –functionality and maintainability –scalability across future generations of such processors Acquire technical knowledge required to achieve the above goals –principles and patterns of parallel algorithms –processor architecture features and constraints –programming API, tools and techniques

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, University of Illinois, Urbana-Champaign 3 Web Resources Web site: –Handouts and origional lecture slides/recordings –Textbook updates, documentation, software resources

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, University of Illinois, Urbana-Champaign 4 Textbooks 1.D. Kirk and W. Hwu, “Programming Massively Parallel Processors – A Hands-on Approach,” Morgan Kaufman Publisher, 2010, ISBN NVIDIA, NVidia CUDA C Programming Guide, version 4.0, NVidia, 2011 (reference book)

Refernce: GPU Architecture John L. Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers Inc., fifth edition, 2012, Section 4.4, “Graphics Processing Units”, pp © David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, University of Illinois, Urbana-Champaign 5

An enlarging peak performance advantage: –Calculation: 1 TFLOPS vs. 100 GFLOPS –Memory Bandwidth: GB/s vs GB/s –GPU in every PC and workstation – massive volume and potential impact Performance Advantage of GPUs Courtesy: John Owens ©Wen-mei W. Hwu and David Kirk/NVIDIA, Barcelona, July ,

GPU computing is catching on. 280 submissions to GPU Computing Gems book –110 articles included in two volumes ©Wen-mei W. Hwu and David Kirk/NVIDIA, Barcelona, July 18-22, 2011 Financial Analysis Scientific Simulation Engineering Simulation Data Intensive Analytics Medical Imaging Digital Audio Processing Computer Vision Digital Video Processing Biomedical Informatics Electronic Design Automation Statistical Modeling Ray Tracing Rendering Interactive Physics Numerical Methods 7

©Wen-mei W. Hwu and David Kirk/NVIDIA, Barcelona, July , 2011 DRAM Cache ALU Control ALU DRAM CPU GPU CPUs and GPUs have fundamentally different design philosophies. 8

CPUs: Latency Oriented Design Large caches –Convert long latency memory accesses to short latency cache accesses Sophisticated control –Branch prediction for reduced branch latency –Data forwarding for reduced data latency Powerful ALU –Reduced operation latency ©Wen-mei W. Hwu and David Kirk/NVIDIA, Barcelona, July 18-22, Cache ALU Control ALU DRAM CPU

GPUs: Throughput Oriented Design Small caches –To boost memory throughput Simple control –No branch prediction –No data forwarding Energy efficient ALUs –Many, long latency but heavily pipelined for high throughput Require massive number of threads to tolerate latencies ©Wen-mei W. Hwu and David Kirk/NVIDIA, Barcelona, July 18-22, DRAM GPU

Winning Applications Use Both CPU and GPU CPUs for sequential parts where latency matters –CPUs can be 10+X faster than GPUs for sequential code GPUs for parallel parts where throughput wins –GPUs can be 10+X faster than CPUs for parallel code ©Wen-mei W. Hwu and David Kirk/NVIDIA, Barcelona, July 18-22,

A Common GPU Usage Pattern ©Wen-mei W. Hwu and David Kirk/NVIDIA, Barcelona, July , 2011 A desirable approach considered impractical –Due to excessive computational requirement –But demonstrated to achieve domain benefit –Convolution filtering (e.g. bilateral Gaussian filters), De Novo gene assembly, etc. Use GPUs to accelerate the most time-consuming aspects of the approach –Kernels in CUDA or OpenCL –Refactor host code to better support kernels Rethink the domain problem 12