Scalability of Intervisibility Testing using Clusters of GPUs

Scalability of Intervisibility Testing using Clusters of GPUs
Dr. Guy Schiavone, Judd Tracy, Eric Woodruff, and Mathew Gerber
IST/UCF, University of Central Florida, 3280 Progress Drive, Orlando, FL 32826
Troy Dere, Julio de la Cruz
RDECOM-STTC, Orlando, FL 32826

Commoditization of Computing
Mass-market economics drives “Moore’s Law”: an exponential increase in the performance/cost ratio. Combining commodity hardware with free-source software can provide low-cost “supercomputing” in the form of Beowulf clusters. Graphics Processing Units (GPUs) are progressing even faster (“Super Moore’s Law”).

Intervisibility Problem in CGF
Dynamic entity interactions are a major constraint on performance in Computer-Generated Forces (CGF) systems. Hypothesis: reducing the time spent on line-of-sight (LOS) calls can significantly increase the number of entities a CGF system can support. Idea: combine cluster computing with GPU co-processing and test the scalability of the result.

Background
1994: Becker and Sterling, Beowulf clusters. Highly successful for parallel-processing problems with low communication overhead.
Late 1990s: GPUs used to solve alternative (non-rendering) problems.
1998-2000: accelerated point visibility queries (Z-buffer queries).
UNC (Dr. Manocha): volume rendering, collision detection, and related work (optimizing data structures, coordinating CPU/GPU processing).

Our Task
Compare performance using “generic” CTDB and OpenFlight formats. High-level API: OpenSceneGraph (OSG). OSG is free source (extensible, suited to rapid prototyping), has an active community (well supported, efficient implementation), and forces the use of an Update/Cull/Render paradigm.

Our Algorithm
Uses the OpenGL extension NV_occlusion_query (supported by NVIDIA, ATI, and Mesa 6.0). The extension lets the application ask the graphics hardware how many pixels were rendered between a matching pair of begin/end occlusion-query calls. It was originally created to determine whether an object should be rendered at all; our algorithm instead uses it to measure what percentage of an entity is actually rendered.

Update Stage
The Update stage of the scene graph is where all data modifications that affect the location and properties of objects in the scene graph are made. Entity positions and orientations are updated, along with all sensor orientations. The scene graph is then traversed and the distance between each sensor and every entity is calculated. The algorithm makes one call to the Update stage per time step.
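A minimal sketch of this Update stage follows, assuming point-like sensors and entities that move with constant velocity (orientation updates are left out). `Body` and `update_step` are hypothetical names chosen for illustration, not OpenSceneGraph API; in OSG this work would more likely live in an update callback attached to the scene graph.

```python
import math
from dataclasses import dataclass


@dataclass
class Body:
    """Hypothetical moving object (sensor or entity); orientation is omitted."""
    name: str
    pos: tuple[float, float, float]
    vel: tuple[float, float, float] = (0.0, 0.0, 0.0)


def update_step(sensors: list[Body], entities: list[Body], dt: float) -> dict[tuple[str, str], float]:
    """One Update-stage call per time step.

    Advances every sensor and entity by its velocity, then returns the
    distance from each sensor to each entity, keyed by (sensor, entity).
    Sensors and entities must be distinct objects, or they would move twice.
    """
    for body in sensors + entities:
        body.pos = tuple(p + v * dt for p, v in zip(body.pos, body.vel))
    return {
        (s.name, e.name): math.dist(s.pos, e.pos)
        for s in sensors
        for e in entities
    }


sensor = Body("sensor-1", (0.0, 0.0, 0.0))
tank = Body("tank-1", (3.0, 0.0, 0.0), vel=(0.0, 4.0, 0.0))
distances = update_step([sensor], [tank], dt=1.0)
print(distances)  # {('sensor-1', 'tank-1'): 5.0}
```

The full sensor-by-entity table is O(sensors × entities) per step, which is why the distances are computed once here and reused by the later stages.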

Cull Stage
All geometry is checked against the view frustum to determine whether it should be rendered. We apply an Area-of-Interest (AOI) test to further cull entities. For this algorithm the render order is critical: all terrain and static objects are rendered first, because they are always potential occluders. Next, all entities and dynamic objects are rendered in front-to-back order, so that when an entity is tested, every closer object is already in the depth buffer and can occlude it.
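The culling and ordering rules above can be sketched for a single sensor as follows. This is an illustration under stated assumptions, not the authors' code: the view frustum is approximated by a 2-D horizontal field of view, the AOI by a circle, and `cull_and_order` with its parameters is a hypothetical helper.

```python
import math


def cull_and_order(
    sensor_xy: tuple[float, float],
    heading: float,
    half_fov: float,
    aoi_radius: float,
    entities: dict[str, tuple[float, float]],
) -> list[str]:
    """Return the names of entities that survive culling, nearest first.

    The view frustum is approximated by a 2-D horizontal field of view
    (heading +/- half_fov, in radians); the AOI is a circle of aoi_radius.
    """
    kept = []
    for name, (x, y) in entities.items():
        dx, dy = x - sensor_xy[0], y - sensor_xy[1]
        dist = math.hypot(dx, dy)
        if dist > aoi_radius:
            continue  # outside the Area of Interest
        # Angle between the sensor heading and the entity, wrapped to [-pi, pi).
        off = (math.atan2(dy, dx) - heading + math.pi) % (2 * math.pi) - math.pi
        if abs(off) > half_fov:
            continue  # outside the approximated view frustum
        kept.append((dist, name))
    kept.sort()  # front-to-back: smallest distance first
    return [name for _, name in kept]


order = cull_and_order(
    (0.0, 0.0), heading=0.0, half_fov=math.pi / 4, aoi_radius=100.0,
    entities={"a": (10.0, 0.0), "b": (5.0, 1.0), "c": (0.0, 10.0), "d": (200.0, 0.0)},
)
print(order)  # ['b', 'a']
```

Entity "c" is culled by the field-of-view test and "d" by the AOI radius; the survivors come back in the front-to-back order the Render stage needs.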

Render Stage
All terrain and static objects are rendered first. Each entity is then rendered twice, in front-to-back order, with each pass wrapped in NV_occlusion_query begin/end calls. The first time an entity is rendered, the depth and color buffers are disabled to obtain the number of pixels the entity covers without occlusion. The entity is then rendered again with the depth and color buffers enabled to obtain the number of pixels actually visible.
Intervisibility = visible pixels / total pixels, per entity.
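The two-pass test can be illustrated without a GPU. The following is a hypothetical software model, assuming each object has already been reduced to a map from screen pixel to depth; `Obj`, `DepthBuffer`, and `intervisibility` are illustrative names, and the `draw` return value stands in for the pixel count an occlusion query would report.

```python
from dataclasses import dataclass

Pixel = tuple[int, int]


@dataclass
class Obj:
    """Hypothetical rasterized object: covered pixel -> depth (smaller = closer)."""
    name: str
    fragments: dict[Pixel, float]


class DepthBuffer:
    """Tiny software depth buffer standing in for the GPU's Z-buffer."""

    def __init__(self) -> None:
        self.depth: dict[Pixel, float] = {}

    def draw(self, obj: Obj, buffers_enabled: bool) -> int:
        """Draw obj and return how many fragments were rendered.

        buffers_enabled=False: no depth test and no depth write (pass 1).
        buffers_enabled=True: less-than depth test plus depth write (pass 2).
        """
        passed = 0
        for px, z in obj.fragments.items():
            if buffers_enabled:
                if z >= self.depth.get(px, float("inf")):
                    continue  # hidden behind something already drawn
                self.depth[px] = z
            passed += 1
        return passed


def intervisibility(static: list[Obj], entities: list[Obj]) -> dict[str, float]:
    """Visible/total pixel ratio per entity, for one sensor view."""
    buf = DepthBuffer()
    for obj in static:  # terrain and static objects first
        buf.draw(obj, buffers_enabled=True)
    result: dict[str, float] = {}
    # Front-to-back order, approximated here by each entity's nearest fragment.
    nearest = lambda e: min(e.fragments.values(), default=float("inf"))
    for ent in sorted(entities, key=nearest):
        total = buf.draw(ent, buffers_enabled=False)   # unoccluded pixel count
        visible = buf.draw(ent, buffers_enabled=True)  # pixels that survive
        result[ent.name] = visible / total if total else 0.0
    return result


terrain = Obj("terrain", {(0, 0): 5.0, (1, 0): 5.0})
tank = Obj("tank", {(x, 0): 3.0 for x in range(4)})
truck = Obj("truck", {(x, 0): 4.0 for x in range(2, 6)})
jeep = Obj("jeep", {(0, 0): 6.0})
print(intervisibility([terrain], [jeep, truck, tank]))
# {'tank': 1.0, 'truck': 0.5, 'jeep': 0.0}
```

On real hardware each `draw` would be bracketed by `glBeginOcclusionQueryNV`/`glEndOcclusionQueryNV` and its count read back with `glGetOcclusionQueryuivNV(..., GL_PIXEL_COUNT_NV, ...)`; the first pass would typically disable the buffers with `glDisable(GL_DEPTH_TEST)` and `glColorMask`. The model also shows why order matters: the truck loses half its pixels only because the closer tank was drawn before it.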

Hardware Specs
Compute node: dual AMD Athlon 1.33 GHz, 512 MB RAM, Fast Ethernet network.
GPU: NVIDIA GeForce FX 5900 chipset, 256 MB DDR SDRAM, 400 MHz engine clock, 850 MHz memory clock, 400 MHz internal RAMDAC, 300 million vertices/sec, 3.6 billion texels/sec fill rate, 27.2 GB/sec memory bandwidth, 8-pixels-per-clock rendering engine.
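Some of the GPU figures above follow from one another. The arithmetic below is a sketch that assumes a 256-bit memory bus and treats the 850 MHz memory clock as the effective (double-data-rate) transfer rate; neither assumption is stated on the slide. The texel fill rate depends on texturing details the slide does not give, so it is not derived here.

```python
# Peak-rate arithmetic for the GeForce FX 5900 figures listed above.
# Assumptions not stated on the slide: a 256-bit memory bus, and the
# 850 MHz memory clock is the effective (double-data-rate) transfer rate.
CORE_CLOCK_HZ = 400e6
PIXELS_PER_CLOCK = 8
MEM_CLOCK_HZ = 850e6
BUS_WIDTH_BITS = 256

pixel_rate = CORE_CLOCK_HZ * PIXELS_PER_CLOCK    # pixels per second
bandwidth = MEM_CLOCK_HZ * BUS_WIDTH_BITS / 8    # bytes per second

print(f"{pixel_rate / 1e9:.1f} Gpixels/s")  # 3.2 Gpixels/s
print(f"{bandwidth / 1e9:.1f} GB/s")        # 27.2 GB/s
```

Under these assumptions the computed bandwidth matches the slide's 27.2 GB/sec.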

Distributed Calculations
The front end distributes entities to the nodes at start-up in random order. The preliminary algorithm performs no load balancing; load imbalance ranges from 4% to 30%. The current approach is “embarrassingly parallel”: each node holds the full database. Any load-balancing optimization must have minimal (global) communication overhead.
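The random start-up distribution and an imbalance measure can be sketched as follows. The slides do not say how entities are dealt out or how imbalance is computed, so both are assumptions here: entities are shuffled and dealt round-robin, and imbalance is taken as the percentage by which the slowest node exceeds the mean node time. `distribute` and `load_imbalance` are hypothetical helpers.

```python
import random
from typing import Optional


def distribute(entities: list[str], n_nodes: int, seed: Optional[int] = None) -> list[list[str]]:
    """Shuffle entities, then deal them round-robin so node counts differ by at most one."""
    order = list(entities)
    random.Random(seed).shuffle(order)
    return [order[i::n_nodes] for i in range(n_nodes)]


def load_imbalance(node_times: list[float]) -> float:
    """Percent by which the slowest node exceeds the mean node time."""
    mean = sum(node_times) / len(node_times)
    return 100.0 * (max(node_times) - mean) / mean


nodes = distribute([f"entity-{i}" for i in range(10)], n_nodes=4, seed=1)
print([len(n) for n in nodes])                              # [3, 3, 2, 2]
print(round(load_imbalance([10.0, 10.0, 10.0, 14.0]), 1))  # 27.3
```

Equal entity counts do not imply equal work, since the cost of a line-of-sight test can vary from entity to entity; a count-based split with no communication can therefore still leave some nodes noticeably slower than others.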

Load Imbalance Example: 1 sensor per screen, 1-4 nodes

Conclusions
Using multiple GPUs is a scalable approach, with potential performance on the order of OTB. Parallelization per GPU is effective; parallelization per screen (multiple sensors per screen) requires geometry level-of-detail (LOD) adjustment. The approach could be employed as an “Intervisibility Server”.

Future Work
Implement load balancing. Optimize multiple-sensor-per-screen cases through level-of-detail adjustments. Extend the GPU cluster results to 16 nodes. Design and implement data-structure optimizations. Make greater use of the CPU.