
Using Graphics Processors for Real-Time Global Illumination UK GPU Computing Conference 2011 Graham Hazel

What this talk is about

Plan for the talk: Who we are and what we do; why it's a GPGPU problem; technical details and results; videos.

Who we are and what we do: Geomerics is a middleware company. We license our technology to game developers. Our first product is Enlighten, a real-time radiosity solution.

Real-time radiosity pipeline. Goal: compute light bounces in real time. Scene geometry is fixed; light sources can move; small objects can move and are lit correctly, but don't bounce light.

Real-time radiosity pipeline 1. Sample the direct lighting

Real-time radiosity pipeline 2. Compute one bounce

Real-time radiosity pipeline 3. Resample the bounce as input and repeat
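Taken together, steps 1-3 form a simple iteration: the bounce output of one pass becomes the lighting input of the next. Below is a minimal CUDA sketch of that loop, assuming a precomputed list of form-factor links per output sample; all names and the data layout are illustrative, not Enlighten's actual implementation.

// Illustrative sketch only; not Enlighten's real data structures or kernels.
#include <cuda_runtime.h>
#include <utility>

struct Link { int src; float weight; };   // precomputed form-factor link: source sample + weight

// One bounce: each output sample gathers weighted light over its precomputed links.
__global__ void computeOneBounce(const float3* in, const Link* links, const int* linkOffset,
                                 float3* out, int numSamples)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numSamples) return;
    float3 sum = make_float3(0.f, 0.f, 0.f);
    for (int l = linkOffset[i]; l < linkOffset[i + 1]; ++l) {
        sum.x += links[l].weight * in[links[l].src].x;
        sum.y += links[l].weight * in[links[l].src].y;
        sum.z += links[l].weight * in[links[l].src].z;
    }
    out[i] = sum;
}

// Host loop: 'input' initially holds the sampled direct lighting (step 1);
// each pass computes one bounce (step 2) and feeds it back as input (step 3).
void runBounces(float3* input, float3* output, const Link* links, const int* linkOffset,
                int numSamples, int numBounces)
{
    int threads = 256, blocks = (numSamples + threads - 1) / threads;
    for (int b = 0; b < numBounces; ++b) {
        computeOneBounce<<<blocks, threads>>>(input, links, linkOffset, output, numSamples);
        std::swap(input, output);   // bounce output becomes the next pass's input
    }
    cudaDeviceSynchronize();
}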

A very brief history of console hardware...?

Game console observations: Custom hardware is the norm. GPUs yield more FLOPS per dollar. PC games already use GPGPU.

Why is this a GPGPU problem? The GPU is the fastest platform available today. Is it future-proof?

Technical details: How CUDA and DirectX share the GPU; optimisations; performance.

How to share memory resources: The DirectX device must be specified when creating the CUDA context. DirectX resources can then be registered for use by CUDA with cu[da]GraphicsD3D9RegisterResource, cu[da]GraphicsD3D10RegisterResource or cu[da]GraphicsD3D11RegisterResource.
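A minimal sketch of device setup and registration with the CUDA runtime API as it stood around 2011; the variable names, texture and flags are assumptions for illustration.

#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>
#include <d3d11.h>

// Assumed to exist elsewhere in the engine.
extern ID3D11Device*    d3dDevice;
extern ID3D11Texture2D* lightingTexture;

cudaGraphicsResource* cudaLightingTex = nullptr;

void initCudaInterop()
{
    // Tell CUDA which DirectX device it will share the GPU with
    // (required before any interop in the CUDA 3.x/4.x runtime).
    cudaD3D11SetDirect3DDevice(d3dDevice);

    // Register the DirectX texture so CUDA kernels can access it later.
    cudaGraphicsD3D11RegisterResource(&cudaLightingTex, lightingTexture,
                                      cudaGraphicsRegisterFlagsNone);
}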

How to share memory resources: Graphics resources can be mapped and unmapped using cu[da]GraphicsMapResources and cu[da]GraphicsUnmapResources. These are analogous to the DirectX calls device->Lock and device->Unlock.
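Per frame, the pattern then looks roughly like the sketch below; launchRadiosityKernels is a placeholder for the actual solver work.

#include <cuda_runtime.h>

extern cudaGraphicsResource* cudaLightingTex;            // registered at startup
void launchRadiosityKernels(cudaArray* lightingArray);   // placeholder for the solver

void updateRadiosity(cudaStream_t stream)
{
    // Map: CUDA owns the resource from here; DirectX must not touch it.
    cudaGraphicsMapResources(1, &cudaLightingTex, stream);

    cudaArray* lightingArray = nullptr;
    cudaGraphicsSubResourceGetMappedArray(&lightingArray, cudaLightingTex, 0, 0);

    launchRadiosityKernels(lightingArray);

    // Unmap: hand the resource back to DirectX for rendering.
    cudaGraphicsUnmapResources(1, &cudaLightingTex, stream);
}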

How to share computational resources: CUDA and rendering can't run simultaneously, and a context switch between DirectX and CUDA is expensive.

Timeshare the GPU: within each frame (frame start to frame end), render the shadow maps, context switch, compute the radiosity using CUDA, context switch, then render the scene with the radiosity results.
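In engine terms the frame above might be structured roughly as follows; the function names are placeholders, and updateRadiosity is the interop sketch from the previous slide.

#include <cuda_runtime.h>

void renderShadowMaps();                     // DirectX work (assumed)
void renderScene();                          // DirectX work (assumed)
void updateRadiosity(cudaStream_t stream);   // interop sketch shown earlier

// Hypothetical frame loop matching the timeline above. The CUDA section is kept
// contiguous so there is only one DirectX -> CUDA -> DirectX round trip per frame.
void renderFrame()
{
    renderShadowMaps();   // DirectX: render shadow maps / sample the direct lighting
    updateRadiosity(0);   // context switch to CUDA, run the radiosity solve, switch back
    renderScene();        // DirectX: draw the scene using the radiosity results
}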

Advantages of running on the GPU: Read from graphics resources directly. Write to graphics resources directly. Zero latency.

Disadvantage of running on the GPU: Taking time away from rendering.

Optimisations: Compute the light samples and their sum in the same kernel.
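A generic sketch of that fusion, assuming one block per surface sample and a power-of-two block size; sampleLight and the data layout are illustrative assumptions. The point is that the per-light samples and their sum stay in on-chip shared memory, so no second kernel launch or global-memory round trip is needed.

#include <cuda_runtime.h>

// Illustrative point light: (x, y, z, intensity) with inverse-square falloff.
__device__ float3 sampleLight(float4 light, float3 pos)
{
    float dx = light.x - pos.x, dy = light.y - pos.y, dz = light.z - pos.z;
    float f = light.w / fmaxf(dx * dx + dy * dy + dz * dz, 1e-4f);
    return make_float3(f, f, f);
}

// One block per surface sample; blockDim.x assumed to be a power of two.
// Launch as: sampleAndSumLights<<<numSamples, 256, 256 * sizeof(float3)>>>(...)
__global__ void sampleAndSumLights(const float4* lights, int numLights,
                                   const float3* samplePositions, float3* outSums)
{
    extern __shared__ float3 partial[];
    float3 pos = samplePositions[blockIdx.x];

    // Each thread samples a strided subset of the lights.
    float3 acc = make_float3(0.f, 0.f, 0.f);
    for (int l = threadIdx.x; l < numLights; l += blockDim.x) {
        float3 c = sampleLight(lights[l], pos);
        acc.x += c.x; acc.y += c.y; acc.z += c.z;
    }
    partial[threadIdx.x] = acc;
    __syncthreads();

    // Tree reduction in shared memory produces the sum inside the same kernel.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x].x += partial[threadIdx.x + stride].x;
            partial[threadIdx.x].y += partial[threadIdx.x + stride].y;
            partial[threadIdx.x].z += partial[threadIdx.x + stride].z;
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) outSums[blockIdx.x] = partial[0];
}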

Compute Capability Confession: The input lighting kernel is too big and spills registers to global memory. This kills performance on pre-Fermi cards.
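Two standard knobs for register pressure, shown as a general CUDA illustration rather than the fix actually used in Enlighten: cap registers per kernel with __launch_bounds__, or per compilation unit with an nvcc flag. The numbers below are placeholders to be tuned by profiling.

// Per-kernel cap: at most 256 threads per block and at least 4 resident blocks
// per SM, which bounds the registers the compiler may allocate per thread.
__global__ void __launch_bounds__(256, 4) inputLightingKernel()
{
    // ... kernel body ...
}

// Per-file cap via the compiler instead:
//   nvcc -maxrregcount=32 input_lighting.cu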

Optimisations: The radiosity solver uses the usual CUDA techniques: pack threads into blocks, reorder the precomputed data, use shared memory.

Compute Capability Confession #2: The radiosity solver has to gather a large amount of lighting data from memory, so the L1/L2 cache is key to performance. Pre-Fermi cards only have a texture cache.
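On pre-Fermi hardware the usual workaround, shown generically below, is to route the read-only gather through the texture cache by binding the lighting data to a texture reference (the pre-CUDA-5 style API appropriate to this era; all names here are illustrative, not Enlighten's).

#include <cuda_runtime.h>

// Texture reads are cached even on pre-Fermi GPUs, which lack an L1/L2 for
// ordinary global loads; on Fermi and later this is largely unnecessary.
texture<float4, cudaTextureType1D, cudaReadModeElementType> lightTex;

__global__ void gatherLighting(const int* srcIndex, float4* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(lightTex, srcIndex[i]);   // cached gather
}

void bindLighting(const float4* deviceLightData, size_t numBytes)
{
    // Bind the linear device buffer to the texture reference before launching.
    cudaBindTexture(0, lightTex, deviceLightData, numBytes);
}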

Results. Test scene: Arches, with a high output resolution (~91,000 pixels).

Results
CPU: Intel Core i7, hand-optimised vector intrinsics. Single thread: 33ms; whole CPU (8 cores): 4.1ms. Doesn't measure CPU-GPU copies.
GPU: NVIDIA GTX ms (+ 0.5ms CPU). Measures all of the required work.

Videos: Dockyard demo; current research.

Thank you for listening! Questions?