Parallel Graphics Rendering Matthew Campbell Senior, Computer Science

Overview
- Motivation
- Three categories of parallel rendering
- Our approach
- Results
- Questions

Motivation
- PC graphics cards are getting faster at an exponential rate.
- PC graphics boards are much cheaper than proprietary SGI hardware.
  - Geforce4 FX = $ (130 Mtris/sec)
  - SGI Onyx 300 = $145,000 (80 Mtris/sec)
- Maintenance costs are lower.
  - Replacement parts are easy to get.
  - PCs are not as complicated as proprietary hardware.

Parallel Rendering
- String together numerous PCs with good graphics boards and render the models in parallel.
  - Increased performance
  - Better technology tracking
- Three groups of algorithms:
  - Sort-First
  - Sort-Middle
  - Sort-Last

Rendering Pipeline
- Transformation stage (3D world space!)
  - Per-vertex operations
  - Primitive assembly
- Rasterization stage (2D image space!)
  - Per-fragment operations
  - Texture mapping

Parallel Rendering – Sort Last
- Distribute polygons round-robin, so that each processor receives an equal load.
- Pass each polygon through the entire rendering pipeline (transformation and rasterization; see the previous slide).
- Each CPU now holds a full-screen image, but each image is incomplete: it contains only that CPU's share of the polygons, so polygons that should be hidden may be visible.
- Solution: image composition.

Sort Last – Image Composition
- Each CPU holds a frame buffer with a color value for each pixel and a depth buffer with a Z value for each pixel.
- Composition: given two such images, compute the color of the pixel at each screen coordinate.
  - Compare the depth buffer values at each pixel location.
  - The resulting color is that of the pixel with the lower Z value.
- Alpha blending is more complex. Why?
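
The depth comparison described above can be sketched in a few lines of Python. This is only an illustration, not the presenters' implementation: the function name `composite`, the flat per-pixel lists, and the use of infinity as the depth of an uncovered pixel are all assumptions made for the example.

```python
FAR = float("inf")  # assumed depth for pixels a node did not draw

def composite(color_a, depth_a, color_b, depth_b):
    """Merge two partial renderings pixel by pixel, keeping the nearer fragment."""
    out_color, out_depth = [], []
    for ca, za, cb, zb in zip(color_a, depth_a, color_b, depth_b):
        if za <= zb:          # lower Z value wins
            out_color.append(ca)
            out_depth.append(za)
        else:
            out_color.append(cb)
            out_depth.append(zb)
    return out_color, out_depth

# Two 4-pixel "images" produced by two nodes from disjoint polygon sets.
color_a = ["red", "red", None, None]
depth_a = [0.3, 0.7, FAR, FAR]
color_b = [None, "blue", "blue", None]
depth_b = [FAR, 0.5, 0.9, FAR]

final_color, final_depth = composite(color_a, depth_a, color_b, depth_b)
print(final_color)  # ['red', 'blue', 'blue', None]
```

The merged depth buffer is returned alongside the color buffer so that the result can itself be composited with a third image.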

Sort Last – Image Composition
- Merging the images one after another takes O(n) time in the number of images n, which is pretty bad. Can we improve it?
- Alternate algorithms:
  - Tree composition
  - Rotating rings
  - Binary composition
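
As a rough sketch of the tree idea, the images can be merged pairwise so that each round halves the number of partial images, giving about log2(n) rounds instead of n - 1 sequential merges. The slides do not spell out the algorithm, so the details here are assumptions: `tree_composite` and the `(color, depth)` pixel tuples are hypothetical, and the pairs in a round are merged serially here whereas a cluster would merge them concurrently.

```python
FAR = float("inf")  # assumed depth for pixels a node did not draw

def composite(a, b):
    """Z-compare two lists of (color, depth) pixels and keep the nearer pixel."""
    return [pa if pa[1] <= pb[1] else pb for pa, pb in zip(a, b)]

def tree_composite(images):
    """Merge images pairwise; return the final image and the number of rounds."""
    rounds = 0
    while len(images) > 1:
        merged = [composite(images[i], images[i + 1])
                  for i in range(0, len(images) - 1, 2)]
        if len(images) % 2:            # odd image out waits for the next round
            merged.append(images[-1])
        images = merged
        rounds += 1
    return images[0], rounds

# Eight nodes, each of which drew exactly one pixel of an 8-pixel image.
images = [[(f"node{n}", 0.1 * (n + 1)) if p == n else (None, FAR)
           for p in range(8)]
          for n in range(8)]

final, rounds = tree_composite(images)
print(rounds)  # 3
```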

Sort-Last Performance
- Sort-last has a very high communication bandwidth requirement.
- Each processor needs to send and receive an entire frame.
  - 1280x1024 resolution, 24 bits for color, 16 bits for depth, 30 fps = (3.9 MB + 2.6 MB) * 30 = 196 MB/sec bidirectional!
- Need a very fast network interconnecting the CPUs in the cluster.
- In actuality we need more bandwidth, because we haven't taken into account the time it takes to render the scene!
- But: no overhead for rendering the actual scene!
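
The bandwidth figure above can be reproduced with a short calculation. The helper name `frame_bytes` is made up for this example, and 1 MB is taken as 10^6 bytes, which matches the rounded numbers on the slide.

```python
def frame_bytes(width, height, color_bits, depth_bits=0):
    """Bytes needed for one frame's color and depth buffers."""
    return width * height * (color_bits + depth_bits) // 8

color_mb = frame_bytes(1280, 1024, 24) / 1e6      # color buffer only
depth_mb = frame_bytes(1280, 1024, 0, 16) / 1e6   # depth buffer only
per_second_mb = frame_bytes(1280, 1024, 24, 16) * 30 / 1e6

print(f"{color_mb:.1f} MB + {depth_mb:.1f} MB per frame")  # 3.9 MB + 2.6 MB per frame
print(f"{per_second_mb:.1f} MB/sec at 30 fps")             # 196.6 MB/sec at 30 fps
```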

Parallel Rendering – Sort Middle
- Distribute polygons in a round-robin fashion.
- Trap polygons between the geometry and rasterization phases.
- Each CPU in the cluster is responsible for a specific region in screen coordinates.
- Calculate the screen-space bounding boxes of the trapped polygons and redistribute each polygon to the CPU responsible for its region.
- Collate the images.

Parallel Rendering – Sort Middle
- How do you divide the screen into regions?
  - Strips (either horizontal or vertical)
  - Squares
- What is the mapping ratio between CPUs and regions?
  - One-to-one: each CPU manages one region.
  - One-to-many: each CPU manages many regions.
- What about polygons that cross region boundaries?
  - Multiple CPUs render the same polygon.
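
A minimal sketch of the region lookup, assuming square tiles and a one-to-many mapping that deals tiles to CPUs round-robin. The function names, the tile grid, and the round-robin ownership rule are illustrative assumptions; the slides do not say which scheme was used.

```python
def regions_for_polygon(vertices, screen_w, screen_h, tiles_x, tiles_y):
    """Return the (col, row) tiles overlapped by the polygon's screen-space bounding box."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    tile_w = screen_w / tiles_x
    tile_h = screen_h / tiles_y
    # Clamp so that vertices on the far screen edge fall in the last tile.
    col_lo = max(0, int(min(xs) // tile_w))
    col_hi = min(tiles_x - 1, int(max(xs) // tile_w))
    row_lo = max(0, int(min(ys) // tile_h))
    row_hi = min(tiles_y - 1, int(max(ys) // tile_h))
    return [(c, r) for r in range(row_lo, row_hi + 1)
                   for c in range(col_lo, col_hi + 1)]

def owner(tile, tiles_x, num_cpus):
    """One-to-many mapping: tiles are dealt to CPUs in round-robin order."""
    col, row = tile
    return (row * tiles_x + col) % num_cpus

# 1024x1024 screen split into a 4x4 grid of 256-pixel tiles, 4 CPUs.
inside = [(10, 10), (100, 20), (50, 90)]           # fits in a single tile
crossing = [(200, 200), (300, 210), (250, 300)]    # straddles four tiles

print(regions_for_polygon(inside, 1024, 1024, 4, 4))    # [(0, 0)]
tiles = regions_for_polygon(crossing, 1024, 1024, 4, 4)
print(tiles)                                            # [(0, 0), (1, 0), (0, 1), (1, 1)]
print(sorted({owner(t, 4, 4) for t in tiles}))          # [0, 1]
```

The second triangle is sent to every CPU that owns one of its tiles, which is the duplicated work the last bullet refers to.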

Sort-Middle Performance
- Load balancing can be poor.
  - The slowest CPU blocks the system from rendering the next frame.
  - Load balancing is highly scene and view dependent.
  - Need adaptive load-balancing schemes.
- In high polygon count scenes, each polygon can be very small (~1–2 pixels). In this case, sort-middle requires more bandwidth than sort-last.
- The communication bandwidth required depends on the scene complexity. (Bad)

Parallel Rendering – Sort First
- Distribute polygons round-robin to all CPUs.
- Calculate bounding volumes for each polygon.
  - Remember, we are still in the world coordinate system.
  - Each CPU is responsible for one volume.
- Redistribute polygons based on bounding volumes.
- Pass through the complete rendering pipeline.
- In the end we have sub-images at each processor.
- Designate a coordinator node, which receives sub-images from all other processors.
- The coordinator collates the sub-images into the final image.

Sort First - Performance
- The communication bandwidth required is based only on screen-space resolution.
- Example: 4 CPUs, 1024*1024 scene, 24 bits/pixel of color.
  - The coordinator node receives 1024*1024*24 bits/frame, ~3 MB.
  - Bandwidth: 90 MB/sec for 30 fps.
- Problem: as with sort-middle, load balancing is scene dependent.
- Bigger issue: can't use a one-to-many CPU-to-region mapping. Or can you?

Parallel Rendering Issues
- Cannot break the rendering pipeline.
  - The pipeline is implemented in hardware.
  - Breaking it is therefore very expensive and could lead to excessive stalls, cache misses, etc.
- Modern graphics cards have large amounts of on-board memory with much faster access times.
  - 8 GB/sec vs. 1 GB/sec for AGP 4x
- Graphics driver source code is unavailable.
- Additional cost/overhead due to framebuffer accesses.

Our Approach
- High-performance real-time rendering.
  - High scene complexity and/or multiple displays, as in a VE.
  - Target: million triangles/sec.
  - In comparison, the best SGI platform, Reality Monster, is capable of 80 million polygons/sec.
- Approach: distributed sort-first.
- Two-level sorting.
  - Organize your model in a spatial tree data structure.
  - At run time, compare bounding volumes for interior nodes of the tree. The bounding volume of an interior node is a superset of its children's volumes, so rejecting a node rejects its whole subtree. This minimizes comparisons.
  - Fine pruning based on the viewing frustum.
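
The coarse, tree-level pruning can be sketched as follows. This is a simplified illustration under stated assumptions, not the presenters' engine: the `Node` class and `collect_visible` are hypothetical names, bounding volumes are axis-aligned boxes, and the query volume is also an axis-aligned box standing in for a processor's volume or the viewing frustum (a real frustum test would use its six planes).

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: tuple                                       # min corner (x, y, z) of the bounding box
    hi: tuple                                       # max corner (x, y, z) of the bounding box
    triangles: list = field(default_factory=list)   # only leaves hold geometry
    children: list = field(default_factory=list)

def overlaps(lo_a, hi_a, lo_b, hi_b):
    """Two axis-aligned boxes overlap if their extents overlap on every axis."""
    return all(la <= hb and lb <= ha
               for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b))

def collect_visible(node, view_lo, view_hi, stats):
    """Gather triangles under `node` whose boxes touch the query volume."""
    stats["tests"] += 1
    if not overlaps(node.lo, node.hi, view_lo, view_hi):
        return []                      # the whole subtree is rejected with one test
    if not node.children:
        return list(node.triangles)
    visible = []
    for child in node.children:
        visible.extend(collect_visible(child, view_lo, view_hi, stats))
    return visible

leaf_a = Node((0, 0, 0), (2, 8, 8), triangles=["a1", "a2"])
leaf_b = Node((2, 0, 0), (4, 8, 8), triangles=["b1"])
leaf_c = Node((4, 0, 0), (6, 8, 8), triangles=["c1"])
leaf_d = Node((6, 0, 0), (8, 8, 8), triangles=["d1"])
left = Node((0, 0, 0), (4, 8, 8), children=[leaf_a, leaf_b])
right = Node((4, 0, 0), (8, 8, 8), children=[leaf_c, leaf_d])
root = Node((0, 0, 0), (8, 8, 8), children=[left, right])

stats = {"tests": 0}
visible = collect_visible(root, (0, 0, 0), (1.5, 8, 8), stats)
print(visible)         # ['a1', 'a2']
print(stats["tests"])  # 5
```

Only five of the seven nodes are tested here, because rejecting the right interior node skips both of its leaves.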

Hardware
- 32-processor Intel Xeon cluster (1.5 GHz processors)
- 256 MB RDRAM per node (3.2 GB/sec memory bandwidth)
- Myrinet (4 Gbps) and Fast Ethernet (200 Mbps full-duplex) communication fabrics
- 64-bit/66 MHz PCI bus (4 Gbps throughput)
- 4x AGP (1 GB/sec throughput)

Software
- Extensible parallel 3D rendering engine
  - Supports large geometric databases, including standard formats such as 3D Studio.
  - Provides an extensible API. The underlying system is based on OpenGL.
  - Based on a dynamic shared object model.
- Dynamic load balancing
  - Adaptively resizes the volumes assigned to each processor for single-display systems.
  - Adaptively changes the number of processors and rendering volumes for multi-display systems.
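
The slides do not describe the balancing policy itself, so the following is only one plausible sketch of adaptive resizing. It assumes each node owns a vertical strip of the screen and that a node's cost scales with its strip width, which is a simplification since the real cost is scene dependent. The function name `rebalance` and the proportional rule are assumptions for illustration.

```python
def rebalance(widths, frame_times):
    """Resize each node's strip in proportion to its measured speed (pixels per ms)."""
    speeds = [w / t for w, t in zip(widths, frame_times)]
    total_width = sum(widths)
    total_speed = sum(speeds)
    new_widths = [int(total_width * s / total_speed) for s in speeds]
    new_widths[-1] += total_width - sum(new_widths)  # give rounding leftovers to the last strip
    return new_widths

# Four nodes with equal 256-pixel strips; the last node took three times as long.
widths = [256, 256, 256, 256]
frame_times_ms = [10.0, 10.0, 10.0, 30.0]

new_widths = rebalance(widths, frame_times_ms)
print(new_widths)  # [307, 307, 307, 103]
```

Running this once per frame, or over a moving average of frame times, would shrink the region of whichever node is holding up the frame.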

Software Architecture
- Master-slave arrangement
- Multi-threaded
- Two-stage parallel rendering pipeline

Results – Rendering Rate
Figure 1: Scalability of our implementation. "Actual" depicts the performance taking into account triangle overlap among nodes; "effective" depicts what the system is capable of delivering. The left image uses a real-world dataset (LIDAR data). The right image uses a generated dataset to fully exploit the overlap issue.

Results – Load Balancing
Figure 2: The effects of load balancing on 4 nodes (left) and 16 nodes (right). The graph depicts the individual frame times for the first 100 frames.

Questions?