Programming with CUDA and Parallel Algorithms, WS09
Waqar Saleem, Jens Müller

Organization
People: Waqar Saleem and Jens Müller, Room 3335, Ernst-Abbe-Platz 2. The course is conducted in English and is worth 6 credit points. Module type: Wahl/Wahlpflicht (elective/compulsory elective), theoretical/practical.

Organization
Meetings before the winter break: Tuesday 12-14 in CZ 129, and Thursday 16-18 in CZ 129 (every second week, starting next week). Exercises: Wednesday 8-10 in CZ 125, starting tomorrow in the computer pool.

The course
The course has two parts. Before the winter break there are lectures and assignments, and you need at least 50% in the assignments to qualify for... After the break there are group projects. Each group chooses a project or is assigned one, meets with us regularly, and presents its project at the end of the semester.

Assignments
Over the assignments you build up a minimal ray tracer on the GPU. You first implement a basic ray tracer on the CPU and then port it to the GPU. Finally, you make it more interesting and efficient by using CUDA concepts. A basic framework will be provided, including the scene format, a set of scenes, and an introduction to ray tracing concepts.

Requirements
You need a strong background in C programming. You also need to be familiar with your OS, which includes modifying default settings, writing and understanding Makefiles, and using compiler flags and options.

Course content
Parallel programming models and platforms; GPGPU; GPGPU on NVIDIA cards with CUDA (architecture and programming model); OpenCL.

Today
Organization; a brief introduction to parallel programming and CUDA; a short introduction to ray tracing.

Growth of Compute Capability
Moore's law: "the number of transistors that can be placed... on an integrated circuit [doubles] approximately every two years" (source: Wikipedia).
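
Written as a formula, Moore's law describes exponential growth with a doubling period of about two years. The sketch below assumes a doubling period of exactly two years and a starting transistor count N_0 at year t = 0. Under those assumptions, ten years of growth give roughly a 32-fold increase:

```latex
\begin{aligned}
N(t) &\approx N_0 \cdot 2^{t/2}, \qquad t \text{ in years},\\
\frac{N(10)}{N_0} &\approx 2^{10/2} = 2^{5} = 32.
\end{aligned}
```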

Growth of Compute Capability
[Figure: Moore's law (source: Wikipedia)]

Need for increasing compute capability
Problems keep getting more complex, for example the progression from text editing to image editing to video editing. Current hardware capability is never enough, and it is impractical to stop development at the current state of the art.

Barriers to growth
Transistor size has a natural limit: the size of an atom. In addition, packing more transistors into the same area increases power consumption and heat dissipation.

Solution: Parallel architectures

Parallel architectures
Multiple Instruction, Multiple Data (MIMD): multi-threaded and multi-core architectures, clusters, grids.
Single Instruction, Multiple Data (SIMD): the Cell processor, GPUs (Graphics Processing Units), clusters, grids.
Parallel programming is how we write programs for these parallel architectures.
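
A loop whose iterations are all independent illustrates the SIMD idea. The same instruction (here a multiply-add) is applied to every element, so the iterations could run in lockstep on SIMD lanes or as separate GPU threads. The sketch below is a minimal sequential C version of this data-parallel pattern. It uses the common SAXPY operation (y = a*x + y) as an illustrative choice; SAXPY is not part of the slides.

```c
#include <stddef.h>

/* SAXPY: y[i] = a * x[i] + y[i] for every i.
 * Each iteration reads and writes only element i, so no iteration
 * depends on another. The same operation is applied to multiple data,
 * which is the pattern SIMD hardware and GPUs exploit. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```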

GPU architecture
The GPU architecture is simpler than that of MIMD processors such as multi-core CPUs, with little overhead for instruction scheduling, branch prediction, etc. Subsequent figures are from the NVIDIA CUDA Programming Guide unless mentioned otherwise.

GPU architecture
Because the architecture is simpler, more transistors go to data processing rather than to caching and flow control. This gives GPUs higher peak performance than CPUs on highly parallel workloads.

General Purpose computing on GPU (GPGPU)
GPGPU is attractive because of the raw power of GPUs. It was traditionally hard because GPU programming was closely tied to graphics. The simplicity of the GPU architecture also limits the kinds of problems suitable for GPGPU, or at least requires some problems to be reformulated.
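
Summing an array is a typical example of such a reformulation. The obvious loop (sum += a[i]) makes every iteration depend on the previous one. A pairwise (tree) reduction instead needs about log2(n) passes, and all additions within a pass are independent, so they could be spread over parallel threads. The sketch below is a sequential C version of the reformulated algorithm. The function name is illustrative, and it overwrites its input array.

```c
#include <stddef.h>

/* Pairwise (tree) reduction: returns the sum of a[0..n-1].
 * In each pass, element i absorbs element i + stride. The additions in
 * one pass touch disjoint elements, so they are independent and could
 * run in parallel; only the passes themselves are sequential.
 * Note: the array is modified in place. */
int tree_sum(int *a, size_t n)
{
    if (n == 0) {
        return 0;
    }
    for (size_t stride = 1; stride < n; stride *= 2) {
        /* All iterations of this inner loop are independent. */
        for (size_t i = 0; i + stride < n; i += 2 * stride) {
            a[i] += a[i + stride];
        }
    }
    return a[0];
}
```

On a GPU, each iteration of the inner loop would become one thread, with a synchronization point between passes.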

GPGPU for the masses*
Freeing the GPU from graphics: NVIDIA CUDA and ATI Stream offer a C-like programming interface to the GPU.
* Knowledge of the underlying architecture is still required to achieve peak performance.

Freeing parallel programming
OpenCL: code once, run anywhere (single-core, multi-core, GPU, ...). Platform details are transparent to the user. OpenCL is supported by major vendors (Apple, Intel, AMD, NVIDIA, ...), and ATI and NVIDIA have released OpenCL drivers for their cards.

This course
Chiefly CUDA: NVIDIA-specific, mature, well documented, with easily available literature.
Some OpenCL: an open standard, very new, with limited documentation available, but with concepts very similar to CUDA.
No ATI Stream.

CUDA, Compute Unified Device Architecture
Software: a C-like programming interface to the GPU.
Hardware: the hardware that supports this programming model.

CUDA hardware model

CUDA programming model
CPU = host, GPU = device, unit of work = thread.
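
In CUDA, the host launches a kernel on the device as a grid of thread blocks. Each thread usually derives a global index from its coordinates as blockIdx.x * blockDim.x + threadIdx.x. A real kernel needs a GPU, so the sketch below uses plain C on the host instead: nested loops emulate a one-dimensional launch. The names launch_1d, thread_fn, scale_thread and blocks_for are hypothetical and only mimic the CUDA built-ins.

```c
/* Hypothetical per-thread function: receives the emulated CUDA
 * coordinates (block index, block size, thread index) plus user data. */
typedef void (*thread_fn)(unsigned block_idx, unsigned block_dim,
                          unsigned thread_idx, void *data);

/* Emulate a 1D launch of grid_dim blocks with block_dim threads each.
 * On a real GPU these calls would run concurrently; here they run one
 * after another on the host. */
void launch_1d(unsigned grid_dim, unsigned block_dim, thread_fn fn, void *data)
{
    for (unsigned b = 0; b < grid_dim; ++b)
        for (unsigned t = 0; t < block_dim; ++t)
            fn(b, block_dim, t, data);
}

/* Example "kernel" arguments and body: double every array element. */
struct scale_args { float *v; unsigned n; };

void scale_thread(unsigned block_idx, unsigned block_dim,
                  unsigned thread_idx, void *data)
{
    struct scale_args *args = data;
    /* Same arithmetic as blockIdx.x * blockDim.x + threadIdx.x. */
    unsigned i = block_idx * block_dim + thread_idx;
    /* The grid may contain more threads than elements: guard the access. */
    if (i < args->n)
        args->v[i] *= 2.0f;
}

/* Number of blocks needed to cover n elements (rounded up). */
unsigned blocks_for(unsigned n, unsigned block_dim)
{
    return (n + block_dim - 1) / block_dim;
}
```

For 10 elements and 4 threads per block, blocks_for gives 3 blocks. The grid therefore has 12 threads, and the bounds check keeps the last two from touching memory.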


Ray tracing
Ray tracing is a method for rendering a given scene. It casts rays from a camera into the scene, computes the ray intersections with the scene geometry, and renders the resulting pixel image (source: Wikipedia).
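
The core of the intersection step is a ray-primitive test. The sketch below uses a sphere, the usual first primitive in a toy ray tracer; this is an assumption, since the course's scene format is not described here. Substituting the ray p(t) = o + t*d into |p - c|^2 = r^2 gives a quadratic in t. The function only decides whether there is a root with t > 0. The nearest hit distance would be the smaller positive root, which needs a square root and is omitted to keep the sketch minimal. The vec3 type and the function names are hypothetical.

```c
#include <stdbool.h>

typedef struct { double x, y, z; } vec3;

static vec3 sub(vec3 a, vec3 b) { return (vec3){ a.x - b.x, a.y - b.y, a.z - b.z }; }
static double dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Does the ray o + t*d (d nonzero) hit the sphere with center c and
 * radius r for some t > 0? Substituting the ray into |p - c|^2 = r^2:
 *   a t^2 + b t + k = 0,  a = d.d,  b = 2 d.(o - c),  k = |o - c|^2 - r^2.
 * Real roots exist iff the discriminant b^2 - 4ak >= 0. */
bool hits_sphere(vec3 o, vec3 d, vec3 c, double r)
{
    vec3 oc = sub(o, c);
    double a = dot(d, d);
    double b = 2.0 * dot(d, oc);
    double k = dot(oc, oc) - r * r;
    if (b * b - 4.0 * a * k < 0.0)
        return false;            /* the ray's line misses the sphere */
    /* k < 0: the origin is inside the sphere, so one root is positive.
     * Otherwise the roots have product k/a >= 0 and sum -b/a, so a
     * positive root exists iff b < 0 (sphere in front of the origin). */
    return k < 0.0 || b < 0.0;
}
```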

Ray tracer complexity
A ray tracer can be made arbitrarily complex. It can recursively compute intersections for reflected, refracted and shadow rays, and it can account for diffuse lighting. It can also handle multiple light sources and light sources other than point lights, as well as textures and object materials.
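
Each secondary ray needs its own direction. For a mirror reflection, the incoming direction d is reflected about the unit surface normal n as r = d - 2 (d . n) n. The small C sketch below assumes that n is already normalized; the vec3 type and the function names are hypothetical.

```c
typedef struct { double x, y, z; } vec3;

static double dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Mirror reflection of direction d about the unit normal n:
 * r = d - 2 (d . n) n. The component of d along n flips sign and the
 * tangential component is kept. n must have length 1. */
vec3 reflect(vec3 d, vec3 n)
{
    double k = 2.0 * dot(d, n);
    return (vec3){ d.x - k * n.x, d.y - k * n.y, d.z - k * n.z };
}
```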

Coding a ray tracer
A ray tracer is relatively easy to code on the CPU. The same intersection function is simply called recursively on secondary rays, so the CPU code stays fairly simple. Coding one on the GPU is tricky, because the GPGPU models available at the time of this course did not yet support recursion.
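
A common workaround is to turn the recursion into a loop. Suppose each hit contributes a local intensity plus a reflectivity-weighted contribution from its reflected ray. The recursion then only accumulates a sum, so a loop with a running weight (the product of reflectivities so far) computes the same result. The simplified C sketch below follows a single reflected ray per hit and uses scalar intensities. The arrays local and refl are hypothetical stand-ins for values that real intersection and shading code would compute at each bounce. Full ray trees, where each hit spawns both a reflected and a refracted ray, need an explicit stack instead.

```c
/* Recursive formulation, as one would write it on the CPU:
 * I(k) = local[k] + refl[k] * I(k + 1), stopping after max_depth bounces. */
double shade_recursive(const double *local, const double *refl,
                       int depth, int max_depth)
{
    if (depth >= max_depth)
        return 0.0;
    return local[depth] +
           refl[depth] * shade_recursive(local, refl, depth + 1, max_depth);
}

/* Equivalent loop without recursion: 'weight' is the product of all
 * reflectivities along the path so far. */
double shade_iterative(const double *local, const double *refl, int max_depth)
{
    double result = 0.0;
    double weight = 1.0;
    for (int depth = 0; depth < max_depth; ++depth) {
        result += weight * local[depth];
        weight *= refl[depth];
    }
    return result;
}
```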

This course
Build a trivial ray tracer on the CPU that computes view rays only (part of tomorrow's exercise). Then port it to the GPU, and finally add complexity to your GPU ray tracer.
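
Computing view rays means generating one primary ray per pixel. The pinhole-camera sketch below makes several assumptions: the camera sits at the origin and looks down the negative z axis, and the image plane lies at distance 1. The caller passes tan(vertical FOV / 2) precomputed, for example 1.0 for a 90-degree field of view. These conventions are common in toy ray tracers but are not taken from the course framework. The vec3 type and view_ray_dir are hypothetical names.

```c
typedef struct { double x, y, z; } vec3;

/* Primary (view) ray direction through the center of pixel (px, py)
 * for a width x height image. Camera at the origin, looking along -z,
 * image plane at z = -1; tan_half_fov is tan(vertical FOV / 2).
 * Pixel rows grow downward, so y is flipped. The direction is not
 * normalized; normalize it if later code relies on unit length. */
vec3 view_ray_dir(int px, int py, int width, int height, double tan_half_fov)
{
    double aspect = (double)width / (double)height;
    /* Map pixel centers to normalized device coordinates in [-1, 1]. */
    double ndc_x = 2.0 * (px + 0.5) / width - 1.0;
    double ndc_y = 1.0 - 2.0 * (py + 0.5) / height;
    return (vec3){ ndc_x * aspect * tan_half_fov,
                   ndc_y * tan_half_fov,
                   -1.0 };
}
```

Every pixel's ray is independent of all others, so when the tracer is ported, this function maps naturally onto one CUDA thread per pixel.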

Reminders
Exercise session tomorrow. Register on CAJ.

See you next time!