DDDDRRaw: A Prototype Toolkit for Distributed Real-Time Rendering on Commodity Clusters Thu D. Nguyen and Christopher Peery Department of Computer Science.

Presentation transcript:

DDDDRRaW: A Prototype Toolkit for Distributed Real-Time Rendering on Commodity Clusters
Thu D. Nguyen and Christopher Peery, Department of Computer Science, Rutgers University
John Zahorjan, Department of Computer Science & Engineering, University of Washington

IPDPS 2001
Overview
- Improve real-time rendering performance using distributed rendering on commodity clusters
  - Real-time rendering -> interactive rendering applications
  - Improve performance -> render more complex scenes at interactive rates
- Why real-time rendering?
  - A critical component of an increasing number of continuous media applications
    - Virtual reality, data visualization, CAD, flight simulators, etc.
  - Rendering performance will continue to be a bottleneck
    - Model complexity is increasing as fast as (or faster than) hardware performance
    - Part of the challenge is to leverage increasingly powerful hardware accelerators

Challenges
- How to structure the distributed renderer to leverage hardware-assisted rendering
  - Information that is useful for work partitioning and assignment may be hidden in the hardware rendering pipeline
- How to minimize non-parallelizable overheads (to limit the impact of Amdahl's Law)
- How to decouple the bandwidth requirement from the complexity of the scene and the cluster size
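To make the Amdahl's Law concern concrete, the sketch below computes the textbook upper bound on speed-up when a fixed fraction of per-frame work cannot be parallelized. The function name and the example serial fractions are illustrative assumptions; the slides do not give a serial fraction for DDDDRRaW.

```python
def amdahl_speedup(serial_fraction: float, nodes: int) -> float:
    """Upper bound on speed-up when `serial_fraction` of the work is serial.

    serial_fraction is a hypothetical input, not a value from the slides.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # Serial part takes the same time on any cluster size;
    # the parallel part is divided evenly across the nodes.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)
```

For example, `amdahl_speedup(0.10, 12)` stays well below 12, which is why per-frame overheads on the display node matter as the cluster grows.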

Image Layer Decomposition (ILD)
- Per-frame rendering load is partitioned using ILD, presented at IPDPS 2000
- Briefly review ILD because it affects DDDDRRaW's architecture and performance
- Basic idea: assign scene objects such that sets of objects assigned to different nodes are not mutually occlusive
- Advantages of using ILD
  - Do not need the position of polygons in 2D
    - This information may be hidden inside the graphics pipeline
  - Do not need Z-buffer information
    - This reduces the required bandwidth by at least 50%
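The bandwidth claim can be checked with simple per-pixel arithmetic. The sketch below assumes hypothetical pixel formats (bytes of color and bytes of depth per pixel); the slides state only the "at least 50%" figure, not the formats used.

```python
def bytes_per_frame(width: int, height: int,
                    color_bytes: int, depth_bytes: int = 0) -> int:
    """Uncompressed size of one partial image, with optional depth values."""
    return width * height * (color_bytes + depth_bytes)


def saving_without_depth(color_bytes: int, depth_bytes: int) -> float:
    """Fraction of per-pixel traffic avoided by not shipping Z-buffer data."""
    return depth_bytes / (color_bytes + depth_bytes)
```

Under these assumptions the saving is exactly 50% when depth and color use the same number of bytes per pixel, and larger whenever the depth value is wider than the color value.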

Image Layer Decomposition (ILD)
[Figure: spatial partitioning]

IPDPS 2001  Non-mutually occlusive assignment -> legal for back-to-front compositing  Use heuristic-based algorithm to Balance load across cluster Minimize the screen real-estate covered by each assignment ILD: Work Assignment Legal

Implementation: Architecture
[Figure: architecture diagram. Nodes: a display node (App., DDDDRRaW Library, Display) and several rendering nodes (each with a DDDDRRaW Library). Operations shown: Partitioning, Assignment, Decompress, Compositing, Rendering, Compress. Data exchanged: VRML Scene, Display Window, Viewpoint, Work Assignment, Partial Image.]

Implementation Details
- Implemented an optimization to ILD: dynamic selection of octants to be rendered
  - Minimizes the overhead of geometric transformation due to polygon splitting (in scene decomposition)
- Compression of image layers before communication
  - Reduces the bandwidth requirement to accommodate slower networks (e.g., 100 Mb/s LANs)
- Use dynamic clipping to enforce octant boundaries for scenes with smooth shading and/or texturing
  - Simplification to ease implementation of the prototype; this clipping could/should be done statically
  - percent overhead for 5 of our 6 test scenes that would not be present in a production system
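The effect of compressing image layers before sending them can be sketched with a general-purpose compressor. The slides do not say which compression scheme DDDDRRaW uses, so `zlib` here is a stand-in, and the function names are hypothetical.

```python
import zlib


def compress_layer(pixels: bytes, level: int = 1) -> bytes:
    """Compress a raw image layer; zlib is a stand-in, not the slides' codec."""
    return zlib.compress(pixels, level)


def decompress_layer(payload: bytes) -> bytes:
    """Recover the raw image layer on the display node."""
    return zlib.decompress(payload)


def transfer_ms(num_bytes: int, link_mbps: float) -> float:
    """Ideal wire time in milliseconds, ignoring protocol overhead and latency."""
    return num_bytes * 8 / (link_mbps * 1e6) * 1e3
```

An image layer that covers only part of the screen is mostly background, so it tends to compress well; comparing `transfer_ms(len(raw), 100)` with `transfer_ms(len(compress_layer(raw)), 100)` gives a rough feel for the benefit on a 100 Mb/s LAN.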

Performance Measurement
- Application: VRML viewer VRweb –
- Collected 6 VRML scenes from the web
  - Use fixed paths through the scenes to measure performance in terms of average frame rate (frames/sec)
- Two clusters representing different points in the technology spectrum
  - Cluster of 5 SGI O2s
    - 180 MHz MIPS R5000, 256 MB memory, SGI Graphics Accelerator, 100 Mb/s switched Ethernet LAN
    - IRIX
  - Cluster of 13 PCs
    - Pentium III 800 MHz, 512 MB memory, Giganet 1 Gb/s cLAN
    - Red Hat Linux (kernel ), Mesa 3D library version 3.2
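One plausible way to turn per-frame timings from a fixed path into the reported metrics is shown below. The slides do not state how the average was computed; this sketch assumes average frame rate means total frames divided by total elapsed time, and speed-up means the ratio to the single-node frame rate.

```python
from typing import Sequence


def average_frame_rate(frame_times_s: Sequence[float]) -> float:
    """Frames per second over a whole path: frame count / total seconds."""
    total = sum(frame_times_s)
    if not frame_times_s or total <= 0:
        raise ValueError("need at least one frame with positive duration")
    return len(frame_times_s) / total


def speedup(fps_parallel: float, fps_single: float) -> float:
    """Speed-up of average frame rate relative to a single rendering node."""
    return fps_parallel / fps_single
```

Dividing total frames by total time weights slow frames by their duration, so the result differs from the mean of per-frame rates whenever frame times vary along the path.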

Two Test Scenes
[Figure: two of the test scenes]

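Overheads on SGI O2s
[Table: time (ms) per operation, reported for the display node and the rendering nodes at P=1, P=2, and P=4. Operations: ILD, Clear Image Buffer, Decompress, Display Frame, Compress. Surviving values: 3.50 (following Clear Image Buffer) and 0.18 (following Display Frame).]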

Overheads on PCs
[Table: time (ms) per operation, reported for the display node and the rendering nodes at P=1, P=4, P=8, and P=12. Operations: ILD, Clear Image Buffer, Decompress, Display Frame, Compress.]

Speed-up of Average Frame Rate on O2s
[Figure]

Speed-up of Average Frame Rate on PCs
[Figure]

Speed-up of Rendering Component on PCs
[Figure]

Conclusions
- Can build an ILD-based distributed renderer to significantly improve real-time rendering performance on commodity hardware
- DDDDRRaW currently scales to modestly sized clusters
  - This limitation is due to non-optimal hardware configurations
  - This is NOT because more suitable hardware is not available!
  - Expect good scalability to clusters of nodes
- Overlapping communication with computation increases average frame rate, but ONLY at the expense of increasing frame latency
  - Problem is CPU contention for rendering & communication
  - Either need dedicated hardware or can only optimize after reaching fps, the nominal interactive frame rate
- Project URL:

Overlapping Communication & Computation
- Communication and compression are significant sources of overhead
- Apply a standard parallel optimization technique: overlap communication of rendered image layers for one frame with rendering of the next
- Requires pipelining of DDDDRRaW
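The trade-off behind overlapping can be illustrated with a toy timing model. Everything here is an assumption for illustration: the two-phase split into render and communication time, and the `contention` factor that inflates both phases when they share a CPU. None of the numbers come from the slides.

```python
from typing import Tuple


def sequential_frame(render_ms: float, comm_ms: float) -> Tuple[float, float]:
    """Return (frames/sec, latency in ms) when phases run one after another."""
    period = render_ms + comm_ms
    return 1000.0 / period, period


def overlapped_frame(render_ms: float, comm_ms: float,
                     contention: float = 1.0) -> Tuple[float, float]:
    """Return (frames/sec, latency in ms) when frame i's send overlaps
    frame i+1's render. contention >= 1 is a hypothetical slowdown factor
    for both phases competing for the same CPU."""
    render = render_ms * contention
    comm = comm_ms * contention
    period = max(render, comm)      # throughput set by the slower phase
    latency = render + comm         # each frame still passes through both
    return 1000.0 / period, latency
```

In this model, any contention factor above 1 yields a higher frame rate than the sequential case (as long as the slower phase stays shorter than the sequential period) together with a longer per-frame latency, which matches the qualitative behavior the slides describe.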

The DDDDRRaW Pipeline
[Figure: pipeline diagram. Stage 1 (display node): ILD, Send. Stage 2 (rendering nodes): Receive, Render, Compress, Send. Stage 3 (display node): Receive, Decompress, Composite & Display.]
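The structure of a multi-stage pipeline like this one can be sketched with one worker thread per stage connected by FIFO queues. This is only a structural illustration running on a single machine: the stage functions passed in are hypothetical placeholders, and no networking, rendering, or compression is performed.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List


def stage(work: Callable[[Any], Any],
          inbox: "queue.Queue", outbox: "queue.Queue") -> None:
    """Run one pipeline stage until the None sentinel arrives.

    Stage functions must not return None, which is reserved as the sentinel.
    """
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)        # propagate shutdown downstream
            return
        outbox.put(work(item))


def run_pipeline(frames: Iterable[Any],
                 stages: List[Callable[[Any], Any]]) -> List[Any]:
    """Push frames through the stages; different frames may occupy
    different stages at the same time."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(None)
    results = []
    while True:
        item = queues[-1].get()
        if item is None:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Since each stage is a single thread reading from a FIFO queue, frames leave the pipeline in the order they entered, while a later frame can already be in an earlier stage.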

Average Frame Rates
[Figure]

Average Frame Latency
[Figure]