Electronic Visualization Laboratory
University of Illinois at Chicago

“Sort-First, Distributed Memory Parallel Visualization and Rendering”
by E. Wes Bethel, Greg Humphreys, Brian Paul, J. Dean Brederson

Presented by: Allan Spale, CAVERN Viz Workshop, May 2004

Overview
Permits the use of commodity computing and graphics hardware
General-purpose viz system using sort-first
Chromium + OpenRM = parallel viz
  – OpenRM: Scene graph for high-performance visualization
  – Chromium: Rendering server that processes and routes OpenGL commands

Background and Related Work
Sort-first
  – Transmits geometry without consuming a lot of bandwidth
  – Unpredictable transmission patterns and loads
  – Eliminates ordering constraints
Sort-last
  – Maintains predictable loads and communication patterns
  – Transmits pixel data
    Each processor renders an image of its subset of the data
  – Requires ordering constraints
Hybrid
  – Distribute models in a scene graph, pre-fetch geometry

Architecture
Processor tasks
  1. Read, process, render part of data
  2. Create geometry from data and store in local scene graph
  3. Traverse scene graph and generate OpenGL commands
  4. Chromium intercepts commands and routes to the appropriate render server
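
The routing in step 4 can be pictured with a small sketch. It assumes each render server owns one rectangular screen tile and that geometry is routed by its screen-space bounding box; the names `Tile`, `make_tiles`, and `servers_for_bbox` are hypothetical and are not part of Chromium or OpenRM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    server: int  # hypothetical render-server id
    x0: int
    y0: int
    x1: int      # exclusive upper bound
    y1: int      # exclusive upper bound


def make_tiles(width, height, cols, rows):
    """Split a width x height display into cols x rows tiles, one per server."""
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append(Tile(
                server=r * cols + c,
                x0=c * width // cols,
                y0=r * height // rows,
                x1=(c + 1) * width // cols,
                y1=(r + 1) * height // rows,
            ))
    return tiles


def servers_for_bbox(tiles, bbox):
    """Return ids of servers whose tile overlaps bbox = (x0, y0, x1, y1)."""
    bx0, by0, bx1, by1 = bbox
    return [t.server for t in tiles
            if bx0 < t.x1 and bx1 > t.x0 and by0 < t.y1 and by1 > t.y0]


# Assumed layout: a 2048 x 1024 display split into two side-by-side tiles.
tiles = make_tiles(2048, 1024, cols=2, rows=1)
inside_left = servers_for_bbox(tiles, (100, 100, 300, 300))
straddling = servers_for_bbox(tiles, (900, 100, 1200, 300))
```

In this sketch, geometry that falls inside one tile goes to a single server, while geometry that straddles a tile boundary is sent to every overlapping server.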

Distrib. Memory Parallel Scene Graph: OpenRM
Features
  – Runs on UNIX/Linux, Windows
  – Together with Chromium, supports distributed memory parallel applications
  – Compatible with CAVElib
  – Threadsafe, pipelined-parallel rendering
Usage
  – Application responsible for creating synchronization scene object
  – Each processor, in parallel, starts renderer of scene graph and Chromium collects and routes OpenGL commands created from the graph traversal
    Chromium synchronization constructs allow for synchronization of many streams
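
A barrier is one way to picture how many command streams can be held in step. The sketch below uses Python's `threading.Barrier` as a stand-in; it does not call Chromium, and the stream count, frame count, and `render_stream` function are assumptions made for illustration.

```python
import threading

NUM_STREAMS = 4  # assumed number of parallel command streams
NUM_FRAMES = 3   # assumed number of frames to simulate

barrier = threading.Barrier(NUM_STREAMS)
log = []
log_lock = threading.Lock()


def render_stream(rank):
    """Simulate one processor's stream: draw, then take part in a global swap."""
    for frame in range(NUM_FRAMES):
        with log_lock:
            log.append((frame, "draw", rank))
        barrier.wait()  # no stream continues until all have drawn this frame
        if rank == 0:
            with log_lock:
                log.append((frame, "swap", rank))  # one global operation per frame
        barrier.wait()  # hold every stream until the swap has been recorded


threads = [threading.Thread(target=render_stream, args=(r,))
           for r in range(NUM_STREAMS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the run, `log` holds every draw for a frame before that frame's swap, and the swap before any draw of the next frame, regardless of how the threads were scheduled.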

Distrib. Memory Parallel Scene Graph: Synchronization
Depth-first graph traversal
Chromium barrier used by OpenRM to synchronize global operations
  – Framebuffer clear
  – Buffer swap
Synchronization needed to enforce drawing-order constraints
  – Back-to-front ordering
    Octmesh usage preserves ordering in a single primitive; solved using render-order callbacks
Each processor has metadata for all grid blocks but not their data
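
Because every processor holds metadata for all grid blocks, a back-to-front order could be computed locally without exchanging block data. The sketch below sorts blocks by the distance from an assumed eye point to each block center, which is a simple approximation and not OpenRM's render-order callback mechanism; `block_center`, `back_to_front`, and the eye position are hypothetical.

```python
import math


def block_center(index, block_size=64):
    """Center of the block at grid index (i, j, k), in voxel coordinates."""
    i, j, k = index
    return ((i + 0.5) * block_size,
            (j + 0.5) * block_size,
            (k + 0.5) * block_size)


def back_to_front(block_indices, eye, block_size=64):
    """Order blocks farthest-first, using only per-block metadata."""
    return sorted(
        block_indices,
        key=lambda idx: math.dist(block_center(idx, block_size), eye),
        reverse=True,
    )


# Three blocks along the x axis, viewed from an assumed eye point at low x.
blocks = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
eye = (-100.0, 32.0, 32.0)
order = back_to_front(blocks, eye)
```

With the eye on the low-x side, the block with the largest x index is farthest away and comes first in `order`.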

Dataset Used In Rendering App
Turbulence flow simulation data
  – Floating-point values in 640 x 256 x 256
    Decomposed into 64 x 64 x 64 blocks
    Arranged into 10 x 4 x 4 block grid
  – Blocks given to each processor in round-robin style
    This block distribution helps with load balancing
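
The decomposition and round-robin assignment can be checked with a few lines of arithmetic. In this sketch the processor count of 8 and the block enumeration order are assumptions, since the slides state neither; `block_grid` and `round_robin` are hypothetical helper names.

```python
from itertools import product


def block_grid(volume_dims, block_dims):
    """Number of blocks along each axis."""
    return tuple(v // b for v, b in zip(volume_dims, block_dims))


def round_robin(grid, num_procs):
    """Deal block indices out to processors one at a time, in turn."""
    owners = {rank: [] for rank in range(num_procs)}
    # Assumed enumeration order: last axis varies fastest.
    for n, index in enumerate(product(*(range(g) for g in grid))):
        owners[n % num_procs].append(index)
    return owners


grid = block_grid((640, 256, 256), (64, 64, 64))
owners = round_robin(grid, num_procs=8)  # 8 processors is an assumption
total_blocks = sum(len(blocks) for blocks in owners.values())
```

The grid works out to 10 x 4 x 4, or 160 blocks, so under the assumed 8 processors each one owns 20 blocks spread across the volume rather than one contiguous region.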

Results Using Sort-First Rendering, OpenRM, and Chromium
3D texture traffic changes with respect to the number of displays
(Figure 7) More rendering servers (RS) increase total traffic
(Figure 8) For a switched network, bandwidth is the maximum over the inbound data streams of the six RS
(Figure 9) In a switched network, bandwidth drops as the number of RS increases, despite the increase in aggregate amount of data transferred
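
The difference between aggregate traffic and switched-network bandwidth can be shown with a small calculation. The traffic figures below are made up for illustration and are not taken from the paper's measurements; `aggregate_traffic` and `switched_peak` are hypothetical helper names.

```python
def aggregate_traffic(inbound):
    """Total data sent to all render servers."""
    return sum(inbound)


def switched_peak(inbound):
    """On a switched network each server has its own link,
    so the busiest inbound stream sets the bandwidth requirement."""
    return max(inbound)


# Illustrative, made-up inbound traffic per render server (arbitrary units).
one_server = [600]
six_servers = [150, 140, 160, 130, 155, 145]

aggregate_grew = aggregate_traffic(six_servers) > aggregate_traffic(one_server)
peak_dropped = switched_peak(six_servers) < switched_peak(one_server)
```

With these assumed numbers the aggregate rises from 600 to 880 units while the busiest single link falls from 600 to 160, which is the pattern the slide describes.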

Results Using Sort-First Rendering, OpenRM, and Chromium
(Figure 13) LOD generally consumes less bandwidth than full resolution as long as only a few views are needed
(Figure 14) LOD generally sends less 3D texture data, although duplicated data is sometimes sent
(Figure 15) Because of sort-first overhead, increasing displays and larger, narrower blocks lead to more data duplication

Summary
Highlights
  – Sort-first distributed, parallel viz system using Chromium and OpenRM
    Distributed scene graph with synchronized render ops using Chromium
Pros
  – Scalable performance characteristics
  – Supports LOD
  – Sort-first uses less bandwidth than sort-last
Cons
  – Hurt by jitter between rendering and computation servers
  – Poor blocking results in duplication of data
  – Frequent view changes result in increased bandwidth needs