A Sorting Classification of Parallel Rendering Molnar et al., 1994.

Presentation transcript:

A Sorting Classification of Parallel Rendering Molnar et al., 1994

Exploiting Parallelism
- Functional parallelism (pipelining)
- Data parallelism
  - Object parallelism
  - Image parallelism

Adapting the pipeline to parallel rendering
- Geometry processing
  - Transform and lighting (T&L)
  - Clipping
- Rasterization
  - Scan conversion
  - Shading
  - Visibility
  - etc.

Parallel rendering as a sorting problem
- Sort-first: sorts during geometry processing; distributes “raw” primitives
- Sort-middle: sorts between geometry processing and rasterization; distributes screen-space primitives
- Sort-last: sorts during rasterization; distributes pixels/fragments

Sort-first
- Primitives are initially assigned to renderers arbitrarily
- A pre-transformation determines which screen regions each primitive covers
- Primitives are then redistributed over the network to the correct renderer
- From that point on, that renderer performs the work of the entire pipeline for that primitive
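The redistribution step can be sketched as a bucketing pass over screen-space bounding boxes. This is only an illustrative sketch, not the paper's implementation: it assumes the screen is split into a uniform grid of tiles with one renderer per tile, and the names `tiles_covered` and `redistribute` are hypothetical.

```python
from collections import defaultdict

def tiles_covered(bbox, tile_w, tile_h, tiles_x, tiles_y):
    """Return the (tx, ty) tiles touched by an inclusive pixel bounding box."""
    x0, y0, x1, y1 = bbox
    # Clamp to the tile grid; a fully off-screen box yields no tiles.
    tx0, ty0 = max(0, x0 // tile_w), max(0, y0 // tile_h)
    tx1, ty1 = min(tiles_x - 1, x1 // tile_w), min(tiles_y - 1, y1 // tile_h)
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]

def redistribute(bboxes, tile_w, tile_h, tiles_x, tiles_y):
    """Map each tile (assumed: one renderer per tile) to the primitive ids it must render."""
    buckets = defaultdict(list)
    for prim_id, bbox in enumerate(bboxes):
        for tile in tiles_covered(bbox, tile_w, tile_h, tiles_x, tiles_y):
            buckets[tile].append(prim_id)
    return dict(buckets)

# Two primitives on a 4 x 4 grid of 64 x 64 tiles (made-up numbers).
buckets = redistribute([(10, 10, 20, 20), (60, 60, 70, 70)], 64, 64, 4, 4)
```

In this example the second primitive straddles a tile corner, so it is sent to four renderers; that duplication is the overlap overhead discussed next.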

Sort-first analysis
Overhead:
- Overlap factor
- Pre-transform cost
- Primitive distribution cost (depends on the fraction of primitives being redistributed)
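The overlap factor can be estimated in closed form. The sketch below uses a common estimate for a primitive whose bounding box is w by h pixels, placed at a uniformly random position on a screen tiled into W by H pixel regions, ignoring screen-edge effects. The function name and the sample numbers are assumptions for illustration.

```python
def overlap_factor(w, h, W, H):
    """Expected number of W x H regions overlapped by a w x h bounding box.

    Assumes a uniformly random box position and ignores screen edges:
    in each dimension a span of length w crosses on average 1 + w/W regions.
    """
    return ((W + w) / W) * ((H + h) / H)

small = overlap_factor(16, 16, 64, 64)   # small primitive relative to the region
large = overlap_factor(64, 64, 64, 64)   # primitive as large as a region
```

Under these assumptions a primitive as large as a region is processed by four renderers on average, so shrinking regions to improve load balance raises the overlap cost.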

Sort-first analysis
Pros:
- Low communication requirements when tessellation or oversampling is high, or when inter-frame coherence is exploited
- Processors implement the entire rendering pipeline for a given screen region
Cons:
- Susceptible to load imbalance (clumping)
- Exploiting coherence is difficult

Sort-middle
- Primitives are initially assigned arbitrarily
- Primitives are fully transformed, lit, etc., by the geometry processor to which they are initially assigned
- Transformed primitives are distributed over the network to the rasterizer assigned to their region of the screen

Sort-middle analysis
Overhead:
- Display primitive distribution cost
- Tessellation factor

Sort-middle analysis
Pros:
- Redistribution occurs at a “natural” place
Cons:
- High communication cost if the tessellation factor (T) is high
- Susceptible to load imbalance in the same way as sort-first

Sort-last
- Defers sorting until the end (imagine that)
- Renderers operate independently until the visibility stage
- Fragments are transmitted over the network to compositing processors to resolve visibility
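Resolving visibility at the compositing processors amounts to a per-pixel depth test across the renderers' outputs. A minimal sketch, assuming opaque surfaces, a smaller depth meaning nearer, and images stored as flat lists of (depth, color) tuples; `z_composite` and `EMPTY` are hypothetical names.

```python
def z_composite(image_a, image_b):
    """Merge two partial images pixel by pixel, keeping the nearer fragment."""
    return [a if a[0] <= b[0] else b for a, b in zip(image_a, image_b)]

# Pixels a renderer never touched hold an infinitely distant, colorless fragment.
EMPTY = (float("inf"), None)

renderer_0 = [(0.5, "red"), EMPTY, (0.9, "red")]
renderer_1 = [(0.7, "blue"), (0.2, "blue"), EMPTY]
final = z_composite(renderer_0, renderer_1)
```

Because the merge is associative for opaque fragments, partial images can be combined in any grouping, which is what lets the renderers run independently until this stage.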

Sort-last “flavors”
- SL-sparse: only distributes pixels actually produced by rasterization
- SL-full: stores and transfers a full image from each renderer

Sort-last analysis
Overhead:
- Boils down to the number of fragments that cross the wire times their size
- SL-sparse overhead depends on the number of pixels generated per frame
- SL-full overhead depends on the number of processors used (which is only indirectly related to the number of primitives)
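A back-of-the-envelope comparison makes the two dependencies concrete. This is a rough sketch only: it assumes every renderer in SL-full ships one full frame per output frame, and all sizes and counts below are made-up illustrative values, not figures from the paper.

```python
def sl_sparse_bytes(fragments_per_frame, bytes_per_fragment):
    """Per-frame traffic when only rasterized fragments cross the network."""
    return fragments_per_frame * bytes_per_fragment

def sl_full_bytes(num_renderers, width, height, bytes_per_pixel):
    """Per-frame traffic when every renderer ships a full image."""
    return num_renderers * width * height * bytes_per_pixel

# Assumed values: 2M fragments of 12 bytes (color, depth, screen address)
# versus 16 renderers each sending a 1280 x 1024 image at 8 bytes per pixel.
sparse = sl_sparse_bytes(2_000_000, 12)
full = sl_full_bytes(16, 1280, 1024, 8)
```

The SL-sparse figure grows with scene complexity, while the SL-full figure is fixed by the renderer count and image size regardless of how many primitives are drawn.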

Sort-last analysis
Pros:
- Renderers implement the full pipeline and are independent until pixel merging
- Less prone to load imbalance
- Very scalable (with SL-full at least)
Cons:
- Pixel traffic can be extremely high

So which is better?
- It depends. (Surprise, surprise.)
- Which ones can be best matched to hardware capabilities?
- Number of primitives, tessellation factor, coherence, etc., are all considerations.
- Many tradeoffs.

Future work Hybrid approaches seem promising

Designing Graphics Architectures Around Scalability and Communication Eldridge, 2001

Much of the same
Things I’m going to skim over:
- Graphics pipeline overview
- Types of parallelism
  - Similar to Molnar’s description
  - Note, though, that it calls out memory as a communications medium over time
- Pomegranate architecture
  - To be presented (by me) later this term

What’s new
So what do we get out of this paper that interests us in today’s discussion?
- Further subdivision in our taxonomy
- Some pretty graphs to look at ;)

Revised Taxonomy: Overview

Molnar        Revised       Pipeline-centric   Data-centric
Sort-first    Sort-first    Geometry           Vertex
Sort-middle   Sort-middle   Rasterization      Primitive
SL-sparse     SL-fragment   Fragment           Fragment
SL-full       SL-image      Display            Sample

Sort-last revisited
- Eldridge claims that “lumping together” SL-sparse and SL-full is unwarranted, as they are not closely related
- Fragment sorting does not suffer the same disadvantages as image-composition sorting

Sort-first
Sort-first retained-mode
- Exploits frame-to-frame coherence
- Challenging to implement
- Sensitive to the amount of motion, amount of instancing, etc.
- Example: Princeton’s Scalable Display Wall

Sort-first
Sort-first immediate-mode
- Can use unmodified commodity graphics accelerators as rendering pipes
- The only change is a software layer inserted between the app and the many-in-place-of-one renderers
- Example: WireGL (cough =-)

Sort-middle
Sort-middle interleaved
- Uses interleaved image parallelism
- Each primitive is broadcast to all rasterizers
- Example: SGI’s InfiniteReality
- Problem: geometry performance is not scalable

Sort-middle
Sort-middle tiled
- Uses tiled image parallelism
- Primitives are routed rather than broadcast
- Further subdivided:
  - Primitive order (e.g., Stanford’s Argus, which uses one thread per screen tile)
  - Tile order (e.g., Pixel-Planes 5, where tiles are processed sequentially)
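The difference between broadcasting and routing follows from how pixels are assigned to rasterizers. A small sketch, assuming four rasterizers arranged either as a 2 x 2 pixel interleave or as a 2 x 2 grid of 64 x 64 tiles; the function names and layouts are illustrative, not taken from any of the systems named above.

```python
def owner_interleaved(x, y, nx, ny):
    """Rasterizer owning pixel (x, y) under an nx x ny pixel interleave."""
    return (y % ny) * nx + (x % nx)

def owner_tiled(x, y, tile_w, tile_h, tiles_x):
    """Rasterizer owning pixel (x, y) when the screen is cut into tiles."""
    return (y // tile_h) * tiles_x + (x // tile_w)

def rasterizers_touched(bbox, owner):
    """Set of rasterizers owning at least one pixel of an inclusive bounding box."""
    x0, y0, x1, y1 = bbox
    return {owner(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)}

bbox = (0, 0, 7, 7)  # a small 8 x 8 pixel primitive
interleaved = rasterizers_touched(bbox, lambda x, y: owner_interleaved(x, y, 2, 2))
tiled = rasterizers_touched(bbox, lambda x, y: owner_tiled(x, y, 64, 64, 2))
```

Even this small primitive touches every rasterizer in the interleaved layout, which is why broadcast is the natural choice there, while the tiled layout lets it be routed to a single rasterizer.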

Sort-last
Sort-last fragment
- Overlap factor guaranteed to be 1
- But because geometry processing and rasterization are tied, there is no control over load balancing for rasterization
- Examples: Evans & Sutherland Freedom Series, Kubota Denali

Sort-last
Sort-last image composition
- Forgoes ordered semantics
- Load imbalance is extended to the fragment processors
- However, fixed bandwidth is required (depends on the size of the output rather than the input)

Hybrids
- WireGL + Lightning-2
- VC-1

Parallel Rendering Crockett, 1995

What’s left to say?
- This paper provides mostly a conceptual overview
- Looks at other factors such as hardware details (shared vs. distributed memory, SIMD vs. MIMD, etc.)
- Mentions binary swap
- Looks at non-rasterizing rendering methods (e.g., ray tracing)
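For readers unfamiliar with binary swap, the idea can be simulated in a few lines. This is a single-process sketch under stated assumptions, not Crockett's presentation of it: the renderer count is a power of two, images are flat lists of (depth, color) tuples, and "sending" is modeled by reading the partner's list directly. The function name is hypothetical.

```python
def binary_swap(images):
    """Simulate binary-swap depth compositing and return the gathered final image."""
    p, n = len(images), len(images[0])
    assert p & (p - 1) == 0, "this sketch needs a power-of-two renderer count"
    imgs = [list(im) for im in images]
    regions = [(0, n)] * p  # pixel range [lo, hi) each rank is still responsible for
    step = 1
    while step < p:
        next_imgs = [list(im) for im in imgs]
        next_regions = list(regions)
        for rank in range(p):
            partner = rank ^ step   # partners differ in exactly one bit
            lo, hi = regions[rank]  # the partner holds the same range this round
            mid = (lo + hi) // 2
            keep = (lo, mid) if rank & step == 0 else (mid, hi)
            # Composite the partner's copy of the kept half; the other half is handed off.
            for i in range(*keep):
                next_imgs[rank][i] = min(imgs[rank][i], imgs[partner][i],
                                         key=lambda frag: frag[0])
            next_regions[rank] = keep
        imgs, regions = next_imgs, next_regions
        step *= 2
    # After log2(p) rounds each rank owns a disjoint piece of the final image.
    final = [None] * n
    for rank in range(p):
        lo, hi = regions[rank]
        final[lo:hi] = imgs[rank][lo:hi]
    return final

images = [
    [(0.9, "a"), (0.1, "a"), (0.5, "a"), (0.7, "a")],
    [(0.2, "b"), (0.8, "b"), (0.6, "b"), (0.3, "b")],
    [(0.4, "c"), (0.5, "c"), (0.1, "c"), (0.9, "c")],
    [(0.6, "d"), (0.3, "d"), (0.2, "d"), (0.05, "d")],
]
result = binary_swap(images)
```

Each round halves the region a renderer is responsible for, so the amount exchanged per renderer shrinks every round and all renderers stay busy throughout the composite.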