

Stream Processing. Main references: “Comparing Reyes and OpenGL on a Stream Architecture” (2002) and “Polygon Rendering on a Stream Architecture” (2000). Department of Computer Science, University of Virginia.

The Stream Programming Model: The Main Idea. Data streams (Stream 1 to Stream 4) are fed through a programmable kernel; Stream 1 comes out of the kernel as transformed data.
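
As a rough illustration of the main idea, the sketch below models a stream as a Python list and a kernel as a function applied to every element of that stream. The names `run_kernel` and `scale_kernel` are hypothetical and chosen only for this example; they are not part of Imagine's programming tools.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def run_kernel(kernel: Callable[[T], U], stream: Iterable[T]) -> List[U]:
    """Apply one kernel to every element of an input stream.

    The kernel sees one element at a time and keeps no state between
    elements, so the elements could be processed in parallel.
    """
    return [kernel(element) for element in stream]


def scale_kernel(vertex: tuple) -> tuple:
    """Hypothetical kernel: scale a 3D vertex by a factor of 2."""
    x, y, z = vertex
    return (2 * x, 2 * y, 2 * z)


stream_1 = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
transformed_stream_1 = run_kernel(scale_kernel, stream_1)
```

Here `transformed_stream_1` plays the role of "Stream 1 transformed data": the input stream is left untouched and a new output stream is produced.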

The Stream Programming Model: Chaining Kernels. Example: the geometry stage of the OpenGL pipeline. Input vertexes pass through the Transform, Shade, Assemble, Cull and Project kernels, then go toward the rasterization stage.
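
Kernel chaining can be sketched as function composition over streams: the output stream of one kernel is the input stream of the next. The five kernels below reuse the names from the slide, but their bodies are simplified placeholders written for this example (a fixed translation, a constant grey level, back-face culling with the eye at the origin, a perspective divide); they are assumptions, not the kernels used on Imagine.

```python
from functools import reduce


def transform(vertices):
    # Hypothetical model-view transform: move the scene 5 units down -z.
    return [(x, y, z - 5.0) for (x, y, z) in vertices]


def shade(vertices):
    # Placeholder per-vertex shading: attach a constant grey level.
    return [(position, 0.5) for position in vertices]


def assemble(vertices):
    # Group every three consecutive vertices into one triangle.
    return [tuple(vertices[i:i + 3]) for i in range(0, len(vertices) - 2, 3)]


def cull(triangles):
    # Back-face culling in eye space (eye at the origin): keep a triangle
    # when its counter-clockwise normal points toward the eye.
    kept = []
    for tri in triangles:
        (a, _), (b, _), (c, _) = tri
        u = [b[i] - a[i] for i in range(3)]
        v = [c[i] - a[i] for i in range(3)]
        normal = (u[1] * v[2] - u[2] * v[1],
                  u[2] * v[0] - u[0] * v[2],
                  u[0] * v[1] - u[1] * v[0])
        if sum(normal[i] * a[i] for i in range(3)) < 0:
            kept.append(tri)
    return kept


def project(triangles):
    # Perspective divide onto the plane z = -1 (camera looks down -z).
    return [tuple(((x / -z, y / -z), grey) for ((x, y, z), grey) in tri)
            for tri in triangles]


def run_pipeline(kernels, stream):
    # Feed the output stream of each kernel into the next one.
    return reduce(lambda s, kernel: kernel(s), kernels, stream)


GEOMETRY_STAGE = [transform, shade, assemble, cull, project]

input_vertices = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),  # counter-clockwise
    (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0),  # clockwise
]
triangles = run_pipeline(GEOMETRY_STAGE, input_vertices)
```

Of the two input triangles only the counter-clockwise one survives the cull kernel, so `triangles` holds a single projected triangle.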

The Stream Programming Model: Hardware Implementation, the Imagine Stream Processor. The parts of the chip, by role: communicate with the host and issue operations; transfer data between parts of the chip; local storage and reuse of intermediate streams; store kernel code; execute one kernel at a time; connection with other Imagine chips.

The Stream Programming Model: Homogeneous Data Type for Efficiency. When one programmable kernel receives Stream 5 (data type 1) and Stream 6 (data type 2), its code has to branch on the type. Code: if (data type == data type 1) {...} if (data type == data type 2) {...}

The Stream Programming Model: Homogeneous Data Type for Efficiency. A data sort groups the data by type first: streams of data type 1 (Stream 5, Stream 7) go to Programmable Kernel 1 and streams of data type 2 (Stream 6) go to Programmable Kernel 2.
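
The data sort can be sketched as follows: a mixed stream is split into one homogeneous stream per data type, and each homogeneous stream is handed to a kernel that contains no type test. The type tags `"type1"` and `"type2"` and the two kernel bodies are made up for this illustration.

```python
from collections import defaultdict


def sort_by_type(mixed_stream):
    """Split a mixed stream into one homogeneous stream per data type."""
    streams = defaultdict(list)
    for data_type, payload in mixed_stream:
        streams[data_type].append(payload)
    return dict(streams)


def kernel_1(stream):
    # Handles only data type 1, so no branch on the type is needed.
    return [value + 1 for value in stream]


def kernel_2(stream):
    # Handles only data type 2.
    return [value * 10 for value in stream]


KERNEL_FOR_TYPE = {"type1": kernel_1, "type2": kernel_2}

mixed = [("type1", 1), ("type2", 2), ("type1", 3)]
homogeneous = sort_by_type(mixed)
results = {data_type: KERNEL_FOR_TYPE[data_type](stream)
           for data_type, stream in homogeneous.items()}
```

The branch has not disappeared; it has moved out of the kernels into the sort, which runs once per element instead of inside every kernel.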

Advantages of a Stream Processor: Programmability. Efficient shading. Example of OpenGL inefficiency: 1. Draw the plane. 2. Draw the cube. 3. Redraw the cube. Redraw the complete scene to obtain correct shadow on one object.

Advantages of a Stream Processor: Programmability. Efficient shading. Hardware implementation of new APIs. API example: Pixar’s RenderMan (Reyes Image Rendering Architecture).

Advantages of a Stream Processor: Producer-Consumer Locality Capture. Example: OpenGL pipeline inefficiency. Vertexes enter the geometry stage, assembled triangles go to the rasterization stage, fragments go to the composite stage, and pixels come out. The geometry stage stalls.

Advantages of a Stream Processor: Producer-Consumer Locality Capture. Example: OpenGL stream implementation. Vertex streams feed the geometry kernels, triangle streams feed the rasterization kernels, fragment streams feed the composite kernels, and pixel streams come out.

Advantages of a Stream Processor: Flexible Resource Allocation. Example: OpenGL pipeline inefficiency. While the geometry stage is busy with vertexes, the rasterization and composite stages stall: waste of hardware capacity.

Advantages of a Stream Processor: Flexible Resource Allocation. Example: OpenGL stream implementation. Vertex streams are processed by geometry, rasterization and composite kernels. No waste: kernels are pieces of code running on the same hardware!

Advantages of a Stream Processor: Pipeline Reordering. Example: blending off in the OpenGL pipeline. In this part of the rasterization/composite stage, fragments go through the texture kernel, the blending kernel and the depth kernel. Many fragments are needlessly textured.

Advantages of a Stream Processor: Pipeline Reordering. Example: blending off in the OpenGL pipeline. With the blending kernel gone, only the texture kernel and the depth kernel remain in this part of the rasterization/composite stage, and we can reorder the pipeline so that fragments are depth-tested before they are textured.
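
The effect of the reordering can be sketched by counting texture lookups in both orders. Everything below is a toy model written for this example: fragments are dictionaries, the "texture kernel" only tags the fragment, and the depth test keeps the smallest `z` per pixel. None of these names come from the Imagine implementation.

```python
def texture_kernel(fragment, stats):
    # Stand-in for an expensive texture lookup; counts how often it runs.
    stats["lookups"] += 1
    return {**fragment, "color": "texel@%d" % fragment["id"]}


def depth_passes(fragment, depth_buffer):
    # A fragment is visible if it is closer than what the pixel already holds.
    return fragment["z"] < depth_buffer.get(fragment["pixel"], float("inf"))


def run(fragments, depth_first):
    stats = {"lookups": 0}
    depth_buffer, color_buffer = {}, {}
    for fragment in fragments:
        if depth_first and not depth_passes(fragment, depth_buffer):
            continue  # rejected before any texturing
        shaded = texture_kernel(fragment, stats)
        if not depth_first and not depth_passes(shaded, depth_buffer):
            continue  # textured for nothing
        depth_buffer[shaded["pixel"]] = shaded["z"]
        color_buffer[shaded["pixel"]] = shaded["color"]
    return color_buffer, stats["lookups"]


fragments = [
    {"id": 0, "pixel": (0, 0), "z": 1.0},
    {"id": 1, "pixel": (0, 0), "z": 3.0},
    {"id": 2, "pixel": (0, 0), "z": 2.0},
]
image_texture_first, lookups_texture_first = run(fragments, depth_first=False)
image_depth_first, lookups_depth_first = run(fragments, depth_first=True)
```

Both orders produce the same image, but the texture-first order performs a lookup for every fragment while the depth-first order skips the hidden ones. The saving depends on arrival order: fragments arriving back to front would all pass the depth test and all be textured.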

Advantages of a Stream Processor: Obvious Scalability. Data-level parallelism: fragments are distributed over several copies of the texture kernel.

Advantages of a Stream Processor: Obvious Scalability. Functional parallelism: the texture kernel, the blending kernel and the depth kernel run side by side.
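
Data-level parallelism can be sketched by dealing the elements of one stream out to several identical kernel instances and merging the results back in order. The function names and the round-robin distribution are assumptions made for this example, and the loop runs the "clusters" one after another; it only shows that the split is possible because the kernel treats every element independently.

```python
def split_round_robin(stream, n_clusters):
    # Element i goes to cluster i mod n_clusters.
    return [stream[i::n_clusters] for i in range(n_clusters)]


def run_data_parallel(kernel, stream, n_clusters):
    parts = split_round_robin(stream, n_clusters)
    # Each cluster runs the same kernel on its own share of the stream.
    partial = [[kernel(element) for element in part] for part in parts]
    # Interleave the per-cluster outputs back into the original order.
    output = [None] * len(stream)
    for cluster, part in enumerate(partial):
        output[cluster::n_clusters] = part
    return output


squares = run_data_parallel(lambda x: x * x, [1, 2, 3, 4, 5], n_clusters=2)
```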

Imagine’s Performance: That looks great!

Imagine’s Performance: “Interaction between host processor and graphics subsystem not modeled” in Imagine. “Many hardware-accelerated systems are limited by the bus between the processor and the graphics subsystem”.

Imagine’s Performance: “Imagine’s clock rate is also significantly higher (500 MHz vs. 120 MHz)”.

Imagine’s Performance: But the comparison is still “instructive”. “Running our tests on commercial systems gives a sense of relative complexity”. [Figure: NVIDIA Quadro and Imagine relative performance; frame rate normalized to the sphere test.]

Conclusions on Imagine Performance, Year 2000: “Implementing polygon rendering on a stream processor allows performance approaching that of special-purpose graphics hardware while at the same time providing the flexibility traditionally associated with a software-only implementation”.

Conclusions on Imagine Performance, Year 2002: “The lack of specialization hurts Imagine’s performance compared to modern graphics processors”. “When comparing graphics algorithms, [the lack of specialization] does make Imagine performance-neutral to the algorithms employed”.

Comparing Reyes and OpenGL on a Stream Architecture: Why? The two sit at opposite ends of the frame speed versus frame complexity/quality trade-off. Speed. OpenGL: interactive (50 frames per second). Reyes: fast enough to compute the frames of a 2-hour movie in one year (1 frame every 3 minutes, or roughly 0.0055 frames per second). Quality/complexity. OpenGL: variable. Reyes: indistinguishable from live-action motion picture photography; as complex as real scenes.
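
The "1 frame every 3 minutes" figure can be checked with a few lines of arithmetic. The film frame rate of 24 frames per second is an assumption made here; the slides do not state it.

```python
FILM_FRAMES_PER_SECOND = 24  # assumption: standard cinema frame rate

movie_frames = 2 * 60 * 60 * FILM_FRAMES_PER_SECOND   # frames in a 2-hour movie
minutes_per_year = 365 * 24 * 60                       # rendering budget

minutes_per_frame = minutes_per_year / movie_frames
rendered_frames_per_second = 1 / (minutes_per_frame * 60)

# Ratio between the interactive OpenGL target (50 frames per second)
# and the Reyes rendering rate.
speed_gap = 50 / rendered_frames_per_second
```

Under that assumption a movie has 172,800 frames, which leaves a little over 3 minutes per frame and puts the two targets roughly four orders of magnitude apart in frame rate.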

The OpenGL Pipeline: Command specification (object space). glBegin(GL_TRIANGLES); glColor3f(0.5, 0.8, 0.9); glVertex3f(5., 0.4, 100.); glVertex3f(0.6, 101., 102.); glVertex3f(2., 5., 6.); glEnd(); etc.

The OpenGL Pipeline: Per-vertex operations (eye space): lighting, shading. Programmable stage.

The OpenGL Pipeline: Assembly (eye space).

The OpenGL Pipeline: Per-primitive operations (eye space): clip and project.

The OpenGL Pipeline: Rasterization (screen space): interpolation.

The OpenGL Pipeline: Rasterization (screen space): fragment generation.

The OpenGL Pipeline: Per-fragment operations (screen space): texturing and blending. Programmable stage.

The OpenGL Pipeline: Composite (screen space): visibility filter.

The Reyes Pipeline: Command specification (object space): fractals, graftals, Bézier surfaces, etc.

The Reyes Pipeline: Tessellation (eye space, with knowledge of screen space). Big primitives are split into smaller ones, which are then diced into micropolygons 1/2 pixel in size. Example: a sphere is split into patches; the patches are diced into grids of micropolygons.
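
The split-then-dice loop can be sketched on a deliberately simplified primitive: an axis-aligned rectangle whose bounds are already known in pixels. A real Reyes renderer bounds and splits curved surfaces in parameter space; the rectangle, the `MAX_GRID` limit of 16 x 16 micropolygons and the function name are assumptions made for this example. Only the 1/2-pixel micropolygon size comes from the slide.

```python
import math

MAX_GRID = 16        # assumption: at most 16 x 16 micropolygons per grid
MICROPOLYGON = 0.5   # micropolygon edge in pixels (from the slide)


def split_and_dice(patch):
    """Turn a patch into a list of grids.

    patch is (x0, y0, x1, y1), its screen-space bounds in pixels.
    Each returned grid is (patch, nu, nv): a patch small enough to dice,
    with the number of micropolygons along each side.
    """
    x0, y0, x1, y1 = patch
    nu = math.ceil((x1 - x0) / MICROPOLYGON)
    nv = math.ceil((y1 - y0) / MICROPOLYGON)
    if nu <= MAX_GRID and nv <= MAX_GRID:
        return [(patch, nu, nv)]          # small enough: dice into a grid
    if nu >= nv:                          # too big: split the longer side
        xm = (x0 + x1) / 2
        halves = [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    else:
        ym = (y0 + y1) / 2
        halves = [(x0, y0, x1, ym), (x0, ym, x1, y1)]
    return [grid for half in halves for grid in split_and_dice(half)]


grids = split_and_dice((0.0, 0.0, 16.0, 8.0))
micropolygon_count = sum(nu * nv for _, nu, nv in grids)
```

A 16 x 8 pixel patch is too large for one grid, so it is split once and each half is diced into 16 x 16 micropolygons.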

The Reyes Pipeline: Flat shading, texturing and blending of the 1/2-pixel micropolygons (eye space). Programmable stage.

The Reyes Pipeline: Jittering, or stochastic sampling, to eliminate artifacts (screen space). Each pixel is divided into 16 subpixels, and the sample in each subpixel gets a random displacement.
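
Jittered sampling can be sketched as follows: the pixel is cut into a 4 x 4 grid of subpixels (16 in total, as on the slide) and one sample is placed at a random position inside each subpixel. The function name and the use of Python's `random.Random` as the source of randomness are choices made for this example.

```python
import random


def jittered_samples(pixel_x, pixel_y, grid=4, rng=None):
    """Return grid * grid sample positions inside one pixel.

    The pixel covers [pixel_x, pixel_x + 1) x [pixel_y, pixel_y + 1).
    Each subpixel receives exactly one sample, displaced randomly
    inside that subpixel.
    """
    rng = rng or random.Random()
    cell = 1.0 / grid
    samples = []
    for j in range(grid):
        for i in range(grid):
            samples.append((pixel_x + (i + rng.random()) * cell,
                            pixel_y + (j + rng.random()) * cell))
    return samples


samples = jittered_samples(3, 7, rng=random.Random(0))
```

Because every subpixel still gets one sample, coverage stays even, while the random displacement breaks up the regular pattern that causes aliasing artifacts.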

The Reyes Pipeline: Depth filtering to obtain the final image (screen space).

Difference between OpenGL and Reyes
Programming stages. OpenGL: two. Reyes: one. Reyes hardware implementation is easier.
Texture access. OpenGL: mipmapping (non-coherent texture access). Reyes: coherent texture access. Reyes saves computation and memory bandwidth.
Primitives. OpenGL: triangles. Reyes: micropolygons. Reyes advantages: easy storage of primitives, load balance, parallelization. OpenGL advantage: work factorization for shading and lighting. Triangle size gets smaller and smaller in modern graphics scenes.
High-order data types. OpenGL: not supported. Reyes: supported (e.g. Bézier surfaces). Reyes reduces the necessary bandwidth between host CPU and graphics card.

Implementation on the Stream Processor: OpenGL modifications: programmable shader added; barycentric rasterizer algorithm instead of scanline algorithm. Reyes modifications: no supersampling; micropolygon size is not half a pixel anymore.
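
For readers unfamiliar with the term, a barycentric rasterizer tests each pixel center against the three triangle edges and, for covered pixels, obtains interpolation weights as a by-product. The sketch below is a generic textbook version written for this transcript, not the rasterizer used on Imagine; it scans the whole `width` x `height` grid for simplicity, and the function names are hypothetical.

```python
def edge(a, b, p):
    # Twice the signed area of triangle (a, b, p); positive when p lies
    # to the left of the directed edge a -> b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return (x, y, (w0, w1, w2)) for each covered pixel.

    The triangle must be counter-clockwise. (w0, w1, w2) are the
    barycentric weights of the pixel center with respect to v0, v1, v2.
    """
    area = edge(v0, v1, v2)
    fragments = []
    if area <= 0:
        return fragments  # degenerate or clockwise triangle
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # pixel center
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                fragments.append((x, y, (w0, w1, w2)))
    return fragments


def interpolate(weights, a0, a1, a2):
    # Interpolate a per-vertex attribute (color, depth, ...) at a fragment.
    w0, w1, w2 = weights
    return w0 * a0 + w1 * a1 + w2 * a2


fragments = rasterize((0.0, 0.0), (4.0, 0.0), (0.0, 4.0), width=4, height=4)
```

Every pixel is tested independently of its neighbours, which suits a data-parallel kernel better than a scanline walk that carries state from one pixel to the next.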

Implementation on the Stream Processor: [Diagram: frame speed versus frame complexity/quality. The OpenGL and Reyes positions are replaced by an enhanced OpenGL implementation and a degraded Reyes implementation.]

Implementation on the Stream Processor: The OpenGL implementation runs on the Isim simulator, which models the complete Imagine architecture. The Reyes implementation runs on the Idebug simulator, which does not model kernel stalls, does not model cluster occupancy effects, and has an increased size of dynamically addressable memory. How to compare the results? Results of Idebug multiplied by 20%.

Results

Conclusion: “When comparing graphics algorithms, [the lack of specialization] does make Imagine performance-neutral to the algorithms employed”. “Our Reyes implementation made slight changes to the simulated Imagine hardware [...] Having a larger [size of addressable memory] was vital for kernel efficiency”.

Conclusion: “Imagine is an appropriate platform for comparing different rendering algorithms toward an eventual goal of high-performance hardware implementation.”

Conclusion: “Continued work in the area of efficient and powerful subdivision algorithm is necessary to allow a Reyes pipeline to demonstrate comparable performance to its OpenGL counterpart.”