Polygon Rendering on a Stream Architecture John D. Owens, William J. Dally, Ujval J. Kapasi, Scott Rixner, Peter Mattson, Ben Mowery Concurrent VLSI Architecture.

Presentation transcript:

Polygon Rendering on a Stream Architecture John D. Owens, William J. Dally, Ujval J. Kapasi, Scott Rixner, Peter Mattson, Ben Mowery Concurrent VLSI Architecture Group Computer Systems Laboratory Stanford University

Today’s Best Hardware
Commercial hardware:
- Fast
- Cheap
- Ubiquitous
Flexibility limited
OpenGL scenes: Programmable streams deliver comparable performance.
Frame from Quake 3 Arena, © id Software

Today’s Best Software
Today’s software solutions:
- Powerful and flexible
- Slow
OpenGL scenes: Streams deliver 20x performance.
Frame from A Bug’s Life, © Pixar Animation Studios, 1998

The Vision
Performance of a special-purpose processor + programmability of a general-purpose processor
“Real-Time Renderman”

Outline
- What is stream processing?
- The Imagine architecture
- Polygon rendering on a stream architecture
- Results
- Conclusions

Kernels and Streams
A stream is a set of elements of an arbitrary datatype.
A computational kernel operates on streams.
[Diagram labels: Kernel, Streams, Transform]

Stream Processing
All data is streams!
2 levels of programming:
- Stream-level code
- Kernel-level code
[Diagram: Transform, Shader, and Zcompare kernels with a Z Buffer and a Color Buffer; stream labels: z, color, offset]
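The two levels of programming can be illustrated with a minimal Python sketch. It is not Imagine's actual programming interface: the names `transform_kernel`, `project_kernel`, and `run_geometry` are hypothetical, and Python generators stand in for hardware streams. Kernel-level code does the same work on every element; stream-level code only decides which kernels are chained together.

```python
from typing import Iterable, Iterator, List, Tuple

Vec4 = Tuple[float, float, float, float]
Mat4 = List[List[float]]


def transform_kernel(vertices: Iterable[Vec4], m: Mat4) -> Iterator[Vec4]:
    """Kernel-level code: multiply every stream element by the same 4x4 matrix."""
    for v in vertices:
        yield tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def project_kernel(vertices: Iterable[Vec4]) -> Iterator[Vec4]:
    """Kernel-level code: perspective divide applied to every element."""
    for x, y, z, w in vertices:
        yield (x / w, y / w, z / w, 1.0)


def run_geometry(vertices: Iterable[Vec4], m: Mat4) -> List[Vec4]:
    """Stream-level code: wire the output stream of one kernel into the next."""
    return list(project_kernel(transform_kernel(vertices, m)))
```

Each kernel consumes one stream and produces another, so the producer-consumer structure of the pipeline is visible directly in `run_geometry`.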

Media Apps and Streams
- Producer-consumer locality
- High arithmetic requirements
- Homogeneous computation
  - Efficient control
  - Data parallelism
… poor match for microprocessors
[Diagram: Transform, Shader, and Zcompare kernels with a Z Buffer and a Color Buffer; stream labels: z, color, offset]

The Imagine Architecture

Bandwidth Hierarchy
Peak BW:
- SDRAM to Stream Register File: 4 GB/s
- Stream Register File to ALU clusters: 32 GB/s
- Within the ALU clusters: 544 GB/s
SIMD/VLIW control
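A quick calculation makes the steps of the hierarchy concrete. It assumes the three peak figures on the slide belong to memory, the stream register file, and the ALU clusters in that order, which is a reading of the flattened diagram; the dictionary keys are illustrative labels only.

```python
# Peak bandwidths from the slide, in GB/s (level assignment is an assumption).
peak_bw_gb_s = {"SDRAM": 4, "SRF": 32, "ALU clusters": 544}

# Express every level relative to off-chip memory bandwidth.
base = peak_bw_gb_s["SDRAM"]
ratios = {level: bw / base for level, bw in peak_bw_gb_s.items()}

for level, ratio in ratios.items():
    print(f"{level}: {ratio:g}x memory bandwidth")
```

Under that reading, each level of the hierarchy supplies roughly an order of magnitude more bandwidth than the one above it, which is why keeping intermediate streams out of memory matters.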

Cluster Organization

Imagine Stats & Status
- 0.59 cm² CMOS chip
- 500 MHz
- Circuits/Logic: expected completion 9/15/00
- Tapeout: expected Q4/2000
- Fab: TI GS30KA process (0.15 µm drawn)

Polygon Rendering Outline
- Overview of OpenGL pipeline
- How we map OpenGL into streams & kernels
- How stream operations are sequenced
- How kernels are mapped onto Imagine
  - Use of stream recirculation
  - Detail of 3 steps in the pipeline:
    - Matrix transformation
    - Scan conversion
    - Enforcing ordering in composite stage

OpenGL Pipeline Overview
Stages: Application, Geometry, Rasterization, Composite, Image
OpenGL:
- Has state
- Requires immediate mode
- Respects ordering

Pipeline Detail
Input Data
Geometry: Transform, GLShader, Primitive Assembly, Cull, Project
Rasterization: Spanprep, Spangen, Spanrast, Texture Lookup
Composite: Hash, Z Lookup, Zcompare, Compact, Color/Z Write
Sort / Merge
Image

Pipeline Stream Datatypes
Geometry: Transform, GLShader, Primitive Assembly, Cull, Project
Rasterization: Spanprep, Spangen, Spanrast, Texture Lookup
Composite: Hash, Z Lookup, Zcompare, Compact, Color/Z Write
Sort / Merge
Image
Stream datatypes: vertices, triangles, spans, fragments, offsets, depths
Most data is floating point.

Stream Recirculation
Strip-mining
Memory accesses:
- Initial load of vertices
- Lookup of color/z/texture
- Writeback of color/z
All other data accesses are local to the SRF
[Diagram: Memory, SRF, and Clusters, with Transform, Shader, and Zcompare kernels, a Z Buffer, and a Color Buffer; stream labels: z, color, offset]
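Strip-mining can be sketched as follows, under loud assumptions: `strip_mine` and `render` are hypothetical helper names, Python lists stand in for streams, and the batch size is arbitrary rather than derived from the real SRF capacity. The point is only the access pattern: one load per batch, intermediate streams passed kernel to kernel without returning to memory, one writeback per batch.

```python
from typing import Callable, Iterator, List, Sequence


def strip_mine(elements: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Split a long input stream into batches small enough for on-chip storage."""
    for start in range(0, len(elements), batch_size):
        yield elements[start:start + batch_size]


def render(vertices: Sequence, batch_size: int,
           kernels: List[Callable[[Sequence], Sequence]]) -> list:
    """Run every kernel over one batch before touching the next batch."""
    framebuffer_writes = []
    for batch in strip_mine(vertices, batch_size):
        stream = batch                      # initial load from memory
        for kernel in kernels:
            stream = kernel(stream)         # intermediate stream stays "in the SRF"
        framebuffer_writes.extend(stream)   # writeback to memory
    return framebuffer_writes
```

A call such as `render(vertices, 256, [xform, project, rasterize])` would process the scene 256 vertices at a time; the kernel names there are placeholders for whatever callables the caller supplies.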

Stream and Kernel Flow
[Timeline figure, excerpt from ADVS-1 run. Rows: CLUSTERS, MEM STR 0, MEM STR 1. Kernels: xform, project, assemble, rasterize, zcompare. Memory operations: Z load, Z store, Color store, Texture load, Vertex load for next batch]

Mapping Xform to Imagine
[Diagram: Transform kernel mapped onto memory (RAM), the SRF, and the clusters]

Mapping Spanrast to Imagine
[Diagram: Spanrast kernel mapped onto memory, the SRF, and the clusters]

Enforcing ordering
General sort possible
- But too expensive
Hash much cheaper!
- Hash function: 12 bits
  - Low 6 bits of x, low 6 bits of y
- Hash table: 2^12 entries
  - 2 bits/entry
  - 16 words/scratchpad/cluster
Compact: Enforces ordering constraint
[Diagram labels: Hash, Sort, Merge, Compact, Zcompare]
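The 12-bit hash can be sketched in a few lines. Several details are assumptions: the slide does not say which coordinate occupies the high bits, so y is placed there arbitrarily; one boolean per entry replaces the 2 bits per entry on the slide; and `split_conflicts` is a hypothetical helper showing one way a hash could flag fragments that might touch the same pixel, not the kernel sequence used on Imagine.

```python
from typing import List, Tuple

Fragment = Tuple[int, int, str]  # (x, y, payload); payload is illustrative


def fragment_hash(x: int, y: int) -> int:
    """12-bit hash: low 6 bits of y in the high half, low 6 bits of x in the low half."""
    return ((y & 0x3F) << 6) | (x & 0x3F)


def split_conflicts(fragments: List[Fragment]) -> Tuple[List[Fragment], List[Fragment]]:
    """Separate fragments whose hash is new in this batch from possible conflicts.

    Both output lists keep the input order, so conflicting fragments can be
    composited later without reordering them relative to each other.
    """
    seen = [False] * (1 << 12)  # 2^12 entries, one flag each in this sketch
    first, conflicting = [], []
    for frag in fragments:
        h = fragment_hash(frag[0], frag[1])
        if seen[h]:
            conflicting.append(frag)
        else:
            seen[h] = True
            first.append(frag)
    return first, conflicting
```

Because only the low 6 bits of each coordinate are used, pixels 64 apart in x or y collide. Such false conflicts cost an extra pass but never violate ordering, which is the trade the slide makes against a general sort.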

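Image Composition
[Diagram: Zcompare kernel mapped onto memory (RAM), the SRF, and the clusters. Input stream: offset, z, color. Z Buffer supplies z for each offset; surviving z and color are written to the Z Buffer and the Color Buffer]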
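The composite step can be sketched as a gather, a compare, and a scatter. The function names are hypothetical, fragments are `(offset, z, color)` tuples, and a "nearer wins" less-than depth test is assumed; the slide does not state the comparison function.

```python
from typing import List, Tuple

Fragment = Tuple[int, float, str]  # (framebuffer offset, depth, color)


def z_lookup(fragments: List[Fragment], z_buffer: List[float]) -> List[float]:
    """Gather the stored depth at each fragment's offset."""
    return [z_buffer[offset] for offset, _, _ in fragments]


def zcompare(fragments: List[Fragment], old_depths: List[float]) -> List[Fragment]:
    """Keep only fragments nearer than what the Z buffer already holds."""
    return [f for f, old in zip(fragments, old_depths) if f[1] < old]


def write(survivors: List[Fragment], z_buffer: List[float], color_buffer: List[str]) -> None:
    """Scatter surviving depths and colors back to the buffers."""
    for offset, z, color in survivors:
        z_buffer[offset] = z
        color_buffer[offset] = color


def composite(fragments: List[Fragment], z_buffer: List[float], color_buffer: List[str]) -> None:
    write(zcompare(fragments, z_lookup(fragments, z_buffer)), z_buffer, color_buffer)
```

This batch-wise form is only correct when no two fragments in a batch share an offset, since all depths are read before any are written. That restriction is one way to see why the pipeline needs the ordering step described on the "Enforcing ordering" slide.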

Benchmarks
- ADVS-1: 62k vertices as point-sampled polygons (SPECviewperf Advanced Visualizer)
- ADVS-8: mipmapped version of ADVS-1
- Sphere: 82k lit, Gouraud-shaded triangles; 3 positional lights
- Fill: 20k mipmapped 25-pixel triangles
[Images: ADVS, Sphere]

Experimental setup
Comparison systems:
- Microsoft opengl32.dll (sustained)
- NVIDIA Quadro (sustained)
- NVIDIA Quadro (peak)
Test system: 450 MHz PIII Xeon, NT 4.0
For comparison:
- Low-overhead trace player (no application overhead)
- Average over hundreds of frames (no startup costs)
- Disabled vsync

Results Summary

Stream-level Performance
Computation, not memory, bound
- Highest memory system occupancy: 58.7%
- Cluster occupancy: 94.3%
- Reuse
5.6 GOPS on Sphere
[Timeline figure rows: CLUSTERS, MEM STR 0, MEM STR 1]

Imagine Kernel Breakdown
Majority of time is in rasterization
- ADVS-8 has 2.5x the ops/frame of ADVS-1
[Chart: ADVS-8]

Future Directions
Extend generality of OpenGL pipeline
- Add more complex scenes
Programmable shading and lighting
- Straightforward to add per-vertex/per-fragment ops
- Eliminate multipass
- Goal: “Toolbox” of flexible elements
Non-polygon rendering: raytracing, IBR, …
Scalability: multi-Imagine implementations

Conclusions
Streams: Powerful primitive
Stream architectures: Enable high performance
Flexibility of general-purpose processor
- 20x better frame rates than commercial software
Performance of special-purpose processor
- Comparable frame rates to commercial hardware

Acknowledgements
DARPA
Industrial sponsors
- Texas Instruments
- Intel Corporation
Matthew Eldridge and Kekoa Proudfoot
Brian Towles and Brucek Khailany
Anonymous reviewers for helpful comments
The US Passport Office
- same-day turnaround!