Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.

2 Background, Outline
• Stanford Graphics / Architecture Research
• CPU, GPU trends
• And collision?
• Two research areas:
  – HW/SW Interface, Programming Model
  – Future Graphics API

3 Problem Statement
• Drive efficient development and execution in many-/multi-core systems.
• Support homogeneous, heterogeneous cores.
• Inform future hardware
Status Quo:
• GPU Pipeline (Good for GL, otherwise hard)
• CPU (No guidance, fast is hard)

4 GRAMPS
• Software defined graphs
• Producer-consumer, data-parallelism
• Initial focus on rendering
[Figure: two example graphs. Rasterization Pipeline: stages Rasterize, Shade, FB Blend; queues Input Fragment Queue, Output Fragment Queue; output Frame Buffer. Ray Tracing Pipeline: stages Camera, Intersect, Shade, FB Blend; queues Ray Queue, Ray Hit Queue, Fragment Queue; output Frame Buffer. Legend: Thread Stage, Shader Stage, Fixed-func Stage, Queue, Stage Output.]
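The idea of a software-defined graph of stages connected by queues can be sketched in a few lines of Python. This is only an illustration of the producer-consumer pattern named on the slide, not the GRAMPS API: the stage functions, queue names, toy hit test, and the `run_graph` driver are all hypothetical. Note that `intersect` may emit zero outputs for an input, a small instance of the data-dependent behavior the model is meant to handle.

```python
from collections import deque

# Hypothetical stage functions: each consumes one item and yields zero or more outputs.
def camera(pixel):
    x, y = pixel
    yield {"pixel": pixel, "dir": (x, y, 1.0)}

def intersect(ray):
    # Toy scene (assumption): only rays from even x coordinates hit anything.
    if ray["pixel"][0] % 2 == 0:
        yield {"pixel": ray["pixel"], "t": 1.0}

def shade(hit):
    yield {"pixel": hit["pixel"], "color": 1.0 / (1.0 + hit["t"])}

def build_ray_graph():
    """Setup step: declare the queues and wire stages between them."""
    queues = {name: deque() for name in ("pixels", "rays", "hits", "fragments")}
    stages = [
        (camera, "pixels", "rays"),
        (intersect, "rays", "hits"),
        (shade, "hits", "fragments"),
    ]
    return stages, queues

def run_graph(stages, queues, frame_buffer):
    """Serial stand-in for a scheduler: run stages until no queue has work left."""
    progress = True
    while progress:
        progress = False
        for fn, src, dst in stages:
            while queues[src]:
                item = queues[src].popleft()
                queues[dst].extend(fn(item))
                progress = True
    # Final "FB Blend" step: write the surviving fragments to the frame buffer.
    while queues["fragments"]:
        frag = queues["fragments"].popleft()
        frame_buffer[frag["pixel"]] = frag["color"]
    return frame_buffer
```

A real runtime would execute stages concurrently and bound queue sizes; the serial loop here only shows how data flows through the graph.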

5 As a GPU Evolution
• Not (too) radical for ‘graphics’
• Like fixed → programmable shading
  – Pipeline undergoing massive shake up
  – Diversity of new parameters and use cases
• Bigger picture than ‘graphics’
  – Rendering is more than GL/D3D
  – Compute is more than rendering
  – Larrabee has no innate pipeline

6 As a Compute Evolution
• Sounds like streaming: Execution graphs, kernels, data-parallelism
• Streaming: “squeeze out every FLOP”
  – Goals: bulk transfer, arithmetic intensity
  – Intensive static analysis, custom chips (mostly)
  – Bounded space, data access, execution time
• GRAMPS: “interesting apps are irregular”
  – Goals: Dynamic, data-dependent code
  – Aggregate work at run-time
  – Heterogeneous commodity platforms
  – Naturally supports streaming when applicable

7 GRAMPS’ Role
• A ‘graphics pipeline’ is now an app!
• GRAMPS models parallel state machines.
• Compared to status quo:
  – More flexible than a GPU pipeline
  – More guidance than bare metal
  – Portability in between
  – Not domain specific

8 GRAMPS Interfaces
• Host/Setup: Create execution graph
• Thread: Stateful, singleton
• Shader: Data-parallel, auto-instanced
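The difference between the three interfaces can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `ThreadStage`, `shader_stage`, `square`, and `host_setup_and_run` are hypothetical names, and a thread pool stands in for whatever mechanism a real runtime would use to instance shader work.

```python
from concurrent.futures import ThreadPoolExecutor

class ThreadStage:
    """Stateful singleton: one instance sees every input and keeps state across them."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def consume(self, value):
        self.total += value
        self.count += 1

def shader_stage(kernel, inputs, workers=4):
    """Data-parallel stage: the runtime (here a thread pool) runs the kernel per element."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, inputs))

def square(x):
    # Stateless kernel: its output depends only on its input, so instances are independent.
    return x * x

def host_setup_and_run(values):
    """Host/setup role: create the stages and wire the shader output into the thread stage."""
    accumulator = ThreadStage()
    for out in shader_stage(square, values):
        accumulator.consume(out)
    return accumulator
```

The point of the split is that the programmer writes `square` once and never decides how many copies run, while the stateful accumulation stays in a single, explicitly serial stage.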

9 What We’ve Built (System)

10 GRAMPS Scheduler
• Tiered Scheduler
• ‘Fat’ cores: per-thread, per-core
• ‘Micro’ cores: shared hw scheduler
• Top level: tier N

11 What We’ve Built (Apps)
[Figure: two application graphs. Direct3D Pipeline (with Ray-tracing Extension): stages IA 1 … IA N, VS 1 … VS N, RO, Rast, Trace, PS, PS2, OM; queues Input Vertex Queue 1 … N, Primitive Queue 1 … N, Primitive Queue, Sample Queue Set, Ray Queue, Ray Hit Queue, Fragment Queue; buffers Vertex Buffers, Frame Buffer. Ray-tracing Pipeline: stages Camera, Sampler, Tiler, Intersect, Shade, FB Blend; queues Tile Queue, Sample Queue, Ray Queue, Ray Hit Queue, Fragment Queue; output Frame Buffer. Legend: Thread Stage, Shader Stage, Fixed-func, Queue, Stage Output, Push Output.]

12 Initial Results
• Queues are small, utilization is good

13 GRAMPS Visualization

14 GRAMPS Visualization

15 GRAMPS Portability
• Portability really means performance.
• Less portable than GL/D3D
  – GRAMPS graph is hardware sensitive
• More portable than bare metal
  – Enforces modularity
  – Best case, just works
  – Worst case, saves boilerplate

16 High-level Challenges
• Is GRAMPS a suitable GPU evolution?
  – Enable pipeline competitive with bare metal?
  – Enable innovation: advanced / alternative methods?
• Is GRAMPS a good parallel compute model?
  – Map well to hardware, hardware trends?
  – Support important apps?
  – Concepts influence developers?

17 What’s Next for GRAMPS?
• Implementation: scheduling, simulation details
• Model:
  – Graph modification (state change)
  – Blocking calls (join)
  – Intra/inter-stage synchronization primitives
  – Data sharing / ref-counting
• Workloads: REYES, physics, others?
• Develop new graphics pipelines…

18 “Real-Time REYES”

19 Just Build It
Build a real-time REYES pipeline… that is tightly integrated with ray tracing for global effects.

20 What does real-time REYES mean? (to us)
• Smooth surfaces via adaptive tessellation
  – Everything is a displaced subdivision surface
• Shade on surface, prior to rasterization
• Stochastic rasterization for motion blur and DOF
• Order-independent transparency
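As a point of reference for the last item above, order-independent transparency means the resolved pixel color must not depend on the order in which fragments arrive. One common way to get that property, sketched below in Python, is to buffer the fragments for a pixel, sort them by depth, and composite back to front with the "over" operator. The slides do not say which method the project uses; the `resolve_pixel` function and its fragment layout are hypothetical.

```python
def resolve_pixel(fragments, background=(0.0, 0.0, 0.0)):
    """Composite fragments given as (depth, (r, g, b), alpha) in arbitrary arrival order."""
    color = background
    # Sort farthest first, then blend each fragment over the running result.
    for _depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = tuple(alpha * c + (1.0 - alpha) * b for c, b in zip(rgb, color))
    return color
```

For example, a half-transparent red fragment at depth 1 in front of an opaque blue fragment at depth 2 resolves to the same color whichever one is submitted first.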

21 [Figure: side-by-side pipeline comparison. REYES stage labels: Split, Dice, Shade, Rasterize, Z Test, Blend/Resolve, Displace, Early Z. OpenGL/Direct3D stage labels: Tessellate (xbox), Early Z, Frag Shade, Z Test, Blend/Resolve, Vertex Shade, Rasterize.]

22 REYES Tessellation
Split primitive into smaller primitives until a “GOOD” grid can be created.

26 Grids
GOOD GRID =
  – Max polygon area < 1 pixel
  – All polys about the same size
  – Bounded # polys per grid
Regular parametric sampling of primitive surface (like Xbox 360).
Compact representation for many adjacent polygons.
Grids provide SIMD efficiency and bulk processing benefits.
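The split-until-good-grid loop and two of the grid criteria above can be sketched as follows. This is a deliberately simplified model, not the project's tessellator: a primitive is reduced to a flat rectangle measured in screen pixels, `MAX_POLYS_PER_GRID` is an arbitrary illustrative bound, and the "all polys about the same size" criterion is trivially satisfied because a flat rectangle is diced uniformly.

```python
import math

MAX_POLYS_PER_GRID = 256  # hypothetical bound on polygons per grid

def dice_resolution(width_px, height_px, max_area=1.0):
    """Return (nu, nv) such that every quad of the nu x nv grid covers less than max_area pixels."""
    side = math.sqrt(max_area)
    # floor(x / side) + 1 is strictly greater than x / side, so each quad edge is shorter than side.
    nu = math.floor(width_px / side) + 1
    nv = math.floor(height_px / side) + 1
    return nu, nv

def split_and_dice(width_px, height_px):
    """Split a patch until each piece can be diced into a grid meeting the size and count bounds."""
    grids = []
    work = [(width_px, height_px)]
    while work:
        w, h = work.pop()
        nu, nv = dice_resolution(w, h)
        if nu * nv <= MAX_POLYS_PER_GRID:
            grids.append((w, h, nu, nv))       # good grid: dice it
        elif w >= h:
            work += [(w / 2, h), (w / 2, h)]   # too many polys: split the longer axis
        else:
            work += [(w, h / 2), (w, h / 2)]
    return grids
```

A 40 x 10 pixel patch would need 41 x 11 quads, which exceeds the bound, so it is split once into two 20 x 10 patches that each dice into a 21 x 11 grid. Real surfaces are curved and displaced, so estimating the screen-space size of a piece is the hard part that this sketch assumes away.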

27 [Figure: side-by-side pipeline comparison. REYES stage labels: Split, Dice, Shade, Rast/Crack Fix, Z Test, Blend/Resolve, Displace, Early Z. OpenGL/Direct3D stage labels: Tessellate (xbox), Early Z, Frag Shade, Z Test, Blend/Resolve, Vertex Shade, Rast.]

28 What does real-time REYES mean? (to us)
• Smooth surfaces via adaptive tessellation
  – Splitting is irregular (and serial)
  – Crack fixing
• Shade on surface, prior to rasterization
  – We feel confident about this
  – But most “work” done before moving to raster space… hmm
• Stochastic rasterization for motion blur and DOF
  – Many tiny polygons → parallel rasterization
  – SIMD tricky
• Order-independent transparency
  – Not unique to REYES

29 Shading in a Hybrid System
• Evaluate displacement (due to REYES or on demand for ray tracing)
• Shade grids
• Shade ray hits
• Looking forward… shade quads too?
One shading system or two or three?

30 This Project is Really About
• Re-architecting REYES pipeline for real-time performance (for throughput architectures like LRB)
• Hybrid rendering: study interoperability of advanced techniques (REYES + ray tracing + maybe Direct3D)
  – Hybrid shading system
  – Understand workload balance
• Hybrid pipeline interface: real-time, retained mode
• Pursuit of more flexible, advanced graphics pipelines

31 Questions?