A Sorting Classification of Parallel Rendering Molnar et al., 1994.

1 A Sorting Classification of Parallel Rendering Molnar et al., 1994

2 What’s this about?
The paper describes a classification scheme for comparing parallel rendering systems.
The scheme is based on where the sort from object coordinates to screen coordinates occurs.
It supports analysis of computational and communication costs, and it encompasses current and proposed highly parallel renderers (both hardware and software).

3 Parallel rendering as a sorting problem
The classification scheme is for “fully parallel” systems, i.e. systems whose rendering rate is high enough that both geometry processing and rasterization must be done in parallel.
Geometry processing: transformation, clipping, lighting.
Rasterization: scan conversion, shading, visibility determination.
Geometry processing is usually parallelized by assigning each processor a subset of the primitives (objects) in the scene; rasterization is parallelized by assigning each processor a portion of the pixel calculations.
Rendering essentially computes the effect of each primitive on each pixel. Since a primitive may fall anywhere on or off the screen, rendering can be viewed as a problem of sorting primitives to the screen [Sutherland].
For fully parallel renderers, the sort involves redistributing data between processors, because responsibility for primitives and pixels is distributed.

4 Parallel rendering (cont)
The location of the sort determines the structure of the resulting parallel rendering system. In general, the sort can take place anywhere in the rendering pipeline.
Sort-first: sort during geometry processing, redistributing “raw” primitives before their screen-space parameters are known.
Sort-middle: sort between geometry processing and rasterization, redistributing screen-space primitives.
Sort-last: sort during rasterization, redistributing pixels, samples, or pixel fragments.

5 Sort-first
No known implementations?
The goal is to distribute primitives early in the rendering pipeline to processors that can do the remaining rendering calculations.
This is done by dividing the screen into disjoint regions and making each processor (renderer) responsible for all rendering calculations that affect its screen region.
Initially, primitives are assigned to renderers in an arbitrary fashion. At the start of rendering, each renderer does just enough transformation to determine which region(s) each primitive falls into. This step is called pre-transformation.
If a renderer holds a primitive that does not belong to its screen region, the primitive is redistributed over an interconnect network to the appropriate renderer(s), which perform the remaining geometry processing and rasterization for it.
Redistribution at the beginning of rendering is the distinguishing feature of sort-first.
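The pre-transformation and redistribution steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from the paper: the screen is split into a regular grid, renderer i is responsible for region i, each primitive is reduced to a screen-space bounding box, and the names `regions_for_bbox` and `redistribute` are hypothetical.

```python
from collections import defaultdict


def regions_for_bbox(bbox, screen_w, screen_h, cols, rows):
    """Return ids of the grid regions overlapped by a screen-space bounding box.

    bbox is (x0, y0, x1, y1) in pixels. Region ids are row-major.
    Assumption: the screen is split into a regular cols x rows grid.
    """
    x0, y0, x1, y1 = bbox
    # A box that lies entirely off screen touches no region.
    if x1 < 0 or y1 < 0 or x0 >= screen_w or y0 >= screen_h:
        return []
    tile_w, tile_h = screen_w / cols, screen_h / rows
    c0 = max(0, int(x0 // tile_w))
    c1 = min(cols - 1, int(x1 // tile_w))
    r0 = max(0, int(y0 // tile_h))
    r1 = min(rows - 1, int(y1 // tile_h))
    return [r * cols + c for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def redistribute(primitives, owner, screen_w, screen_h, cols, rows):
    """Decide which renderer finishes each primitive and what must be sent.

    primitives: {prim_id: bbox}, the result of the pre-transformation.
    owner: {prim_id: renderer id that holds the primitive initially}.
    Assumption: renderer i is responsible for screen region i.
    Returns (work, transfers): work maps region id to primitive ids,
    transfers lists (prim_id, source_renderer, destination_renderer).
    """
    work = defaultdict(list)
    transfers = []
    for pid, bbox in primitives.items():
        for region in regions_for_bbox(bbox, screen_w, screen_h, cols, rows):
            work[region].append(pid)
            if region != owner[pid]:
                transfers.append((pid, owner[pid], region))
    return dict(work), transfers


# A 640x480 screen split into a 2x2 grid of regions.
primitives = {
    "a": (10, 10, 50, 50),          # inside region 0
    "b": (300, 200, 340, 260),      # straddles all four regions
    "c": (-100, -100, -10, -10),    # entirely off screen
}
owner = {"a": 0, "b": 1, "c": 2}
work, transfers = redistribute(primitives, owner, 640, 480, 2, 2)
```

A primitive that straddles region boundaries is sent to every renderer it overlaps, and the per-region lengths in `work` give a rough view of how evenly the primitives fall across the screen.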

6 Analysis
Advantages:
Communication requirements are low when the tessellation ratio and the degree of oversampling are high, or when frame-to-frame coherence can be exploited.
Processing nodes implement the entire rendering pipeline for a portion of the screen.
Disadvantages:
Susceptibility to load imbalance: primitives may clump into regions, concentrating the work on a few renderers.
Taking advantage of frame-to-frame coherence requires retained mode and complex data-handling code.

8 Sort-middle
The most common approach.
Primitives are redistributed in the middle of the rendering pipeline, between geometry processing and rasterization. At the point of redistribution, primitives have been transformed into screen coordinates and are ready for rasterization.
Many systems use separate processors for geometry processing and rasterization, so this is a natural point to divide the pipeline.
In sort-middle, geometry processors are assigned arbitrary subsets of the primitives to be displayed, and each rasterizer is assigned a portion of the display screen.
During each frame, the geometry processors transform, light, etc. their portion of the primitives and classify them with respect to screen region boundaries. They then transmit these screen-space primitives to the appropriate rasterizer(s).
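A rough Python sketch of the two stages follows. Everything specific here is an assumption made for illustration: primitives are 2D triangles in normalized device coordinates, the "arbitrary" assignment is round-robin, each rasterizer owns one horizontal strip of the screen, and `to_screen` and `sort_middle` are hypothetical names. Lighting and clipping are omitted.

```python
def to_screen(vertex, width, height):
    """Map a vertex from normalized device coordinates [-1, 1] to pixels."""
    x, y = vertex
    return ((x + 1.0) * 0.5 * width, (1.0 - y) * 0.5 * height)


def sort_middle(triangles, n_geometry, n_rasterizers, width, height):
    """Route screen-space triangles from geometry processors to rasterizers.

    Assumptions: round-robin assignment to geometry processors, and
    rasterizer r owns horizontal strip r of the screen.
    Returns one inbox (list of screen-space triangles) per rasterizer.
    """
    # Geometry stage: each processor gets an arbitrary subset of primitives.
    batches = [triangles[i::n_geometry] for i in range(n_geometry)]
    strip_h = height / n_rasterizers
    inboxes = [[] for _ in range(n_rasterizers)]
    for batch in batches:
        for tri in batch:
            screen_tri = [to_screen(v, width, height) for v in tri]
            xs = [x for x, _ in screen_tri]
            ys = [y for _, y in screen_tri]
            # Discard triangles whose bounding box is entirely off screen.
            if max(xs) < 0 or min(xs) >= width or max(ys) < 0 or min(ys) >= height:
                continue
            # Classify against strip boundaries, then "transmit" the
            # screen-space triangle to every rasterizer it overlaps.
            first = max(0, int(min(ys) // strip_h))
            last = min(n_rasterizers - 1, int(max(ys) // strip_h))
            for r in range(first, last + 1):
                inboxes[r].append(screen_tri)
    return inboxes


tri_top = [(-0.5, 0.5), (0.5, 0.5), (0.0, 1.0)]     # upper half only
tri_tall = [(-0.5, 0.5), (0.5, 0.5), (0.0, -0.5)]   # crosses the strip boundary
inboxes = sort_middle([tri_top, tri_tall], 2, 2, 100, 100)
```

Unlike the sort-first case, what crosses the network here is the fully transformed screen-space primitive, so every displayed primitive is transmitted at least once per frame.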

9 Analysis
Advantages:
General and straightforward.
Redistribution occurs at a natural place in the pipeline.
Disadvantages:
High communication costs if the tessellation ratio is high.
Susceptibility to load imbalance between rasterizers when primitives are distributed unevenly over the screen.

10 Sort-last
Sorting is deferred until the end of the rendering pipeline, after primitives have been rasterized.
Each renderer is assigned an arbitrary subset of the primitives and computes pixel values for its primitives.
Pixel values are then transmitted over an interconnect network to compositing processors, which resolve the visibility of the pixels from each renderer.
Two approaches:
SL-sparse: minimizes communication by distributing only those pixels actually produced by rasterization.
SL-full: stores and transfers a full image from each renderer.
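The difference between the two approaches can be illustrated with a small Python sketch. It assumes visibility is resolved by a simple nearest-depth comparison per pixel and that each renderer's output is a list of `(depth, color)` values; the function names are hypothetical, and a real compositor would also handle samples and transparency.

```python
INF = float("inf")


def composite_full(images):
    """SL-full: every renderer sends a complete image.

    images: equally sized 2D lists of (depth, color); pixels a renderer
    did not touch carry depth INF. The nearest depth wins at each pixel.
    """
    h, w = len(images[0]), len(images[0][0])
    out = [[(INF, None)] * w for _ in range(h)]
    for img in images:
        for y in range(h):
            for x in range(w):
                if img[y][x][0] < out[y][x][0]:
                    out[y][x] = img[y][x]
    return out


def composite_sparse(fragment_lists, w, h):
    """SL-sparse: every renderer sends only the pixels it actually produced.

    fragment_lists: one list of (x, y, depth, color) tuples per renderer.
    """
    out = [[(INF, None)] * w for _ in range(h)]
    for fragments in fragment_lists:
        for x, y, depth, color in fragments:
            if depth < out[y][x][0]:
                out[y][x] = (depth, color)
    return out


# Two renderers, a 2x1 image. Renderer A covers one pixel, B covers both.
full_a = [[(1.0, "red"), (INF, None)]]
full_b = [[(0.5, "blue"), (2.0, "green")]]
sparse_a = [(0, 0, 1.0, "red")]
sparse_b = [(0, 0, 0.5, "blue"), (1, 0, 2.0, "green")]

result_full = composite_full([full_a, full_b])
result_sparse = composite_sparse([sparse_a, sparse_b], 2, 1)
pixels_sent_full = sum(len(row) for img in (full_a, full_b) for row in img)
pixels_sent_sparse = len(sparse_a) + len(sparse_b)
```

Both paths produce the same image; they differ only in how many pixels cross the network, which in SL-full is fixed by the image size and in SL-sparse depends on how much of the screen each renderer covers.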

11 Analysis
Advantages:
Processing nodes implement the entire rendering pipeline for a portion of the primitives.
Less prone to load imbalance.
SL-full merging can be embedded in a linear network, making it linearly scalable.
Disadvantages:
Pixel traffic can be extremely high.

13 Talisman: Commodity Realtime 3D Graphics for the PC
Jay Torborg and Jim Kajiya, Microsoft Corporation, 1996

14 What is it?
A new architecture for 3D graphics.
Cost: $200-$300.
Requirements: smooth motion, synchronized with sound and video, and low-latency interaction (real time at 72-85 Hz).

15 Limitations of Traditional Architectures
High memory bandwidth requirement.
System latency.
Cost / memory cost.

18 Composited Image Layers
An independent image layer is used for each non-interpenetrating object in the scene.
Each object can be updated independently (updates are optimized based on priority).
Layers can be of arbitrary size and shape.
Image layers are composited at video rates.
Image layer transformations (scaling, rotation) are supported at video rates.
Typically, the same rendered image can be used for 4 frames.

19 Image Compression
Used for textures and image layers.
Lossless and lossy compression are supported.
Significantly reduces bandwidth and capacity requirements.
16:1 texture compression; 5:1 image layer compression.

20 Chunking
Each image layer is rendered in 32x32 chunks.
All polygons for a 32x32 chunk are rendered before proceeding to the next chunk.
This allows the 32x32 depth buffer to be on-chip.
Anti-aliasing is supported together with depth buffering and translucency, using an on-chip fragment buffer.
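The chunk-at-a-time order can be sketched as follows. This is a simplified Python illustration, not the hardware algorithm: polygons are replaced by axis-aligned rectangles of constant depth, fragments and translucency are left out, and the function names are hypothetical. The point is that only one 32x32 depth buffer is alive at any time.

```python
CHUNK = 32  # chunk edge length in pixels


def bin_into_chunks(rects):
    """Group primitives by the 32x32 chunks they overlap.

    rects: list of (x0, y0, x1, y1, depth, color) with x1/y1 exclusive and
    non-negative integer coordinates. Rectangles stand in for polygons.
    """
    bins = {}
    for rect in rects:
        x0, y0, x1, y1 = rect[:4]
        for cy in range(y0 // CHUNK, (y1 - 1) // CHUNK + 1):
            for cx in range(x0 // CHUNK, (x1 - 1) // CHUNK + 1):
                bins.setdefault((cx, cy), []).append(rect)
    return bins


def render_chunk(cx, cy, rects):
    """Render every primitive of one chunk using a chunk-sized depth buffer."""
    # The 32x32 buffers play the role of the on-chip memory.
    depth = [[float("inf")] * CHUNK for _ in range(CHUNK)]
    color = [[None] * CHUNK for _ in range(CHUNK)]
    ox, oy = cx * CHUNK, cy * CHUNK
    for x0, y0, x1, y1, z, c in rects:
        for y in range(max(y0, oy), min(y1, oy + CHUNK)):
            for x in range(max(x0, ox), min(x1, ox + CHUNK)):
                if z < depth[y - oy][x - ox]:
                    depth[y - oy][x - ox] = z
                    color[y - oy][x - ox] = c
    return color


def render_layer(rects):
    """Finish all primitives of a chunk before moving on to the next chunk."""
    bins = bin_into_chunks(rects)
    return {key: render_chunk(key[0], key[1], bins[key]) for key in sorted(bins)}


layer = render_layer([
    (0, 0, 40, 40, 5.0, "far"),     # covers parts of four chunks
    (30, 30, 34, 34, 1.0, "near"),  # small rectangle in front of it
])
```

Because primitives are binned first, a primitive that spans several chunks is visited once per chunk it overlaps.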

21 High Quality Rendering
Anisotropic filtering of textures.
Multipass rendering.
Shadows, spot lights, fog.
Antialiasing.

23 Reference Hardware
Targets the high-end consumer PC market.
Uses the PCI expansion bus.
Replaces: Windows accelerator board, 3D accelerator board, MPEG playback board, video conferencing board, sound board, modem.

25 Polygon Object Processor
Polygons are processed in 32x32 chunks.
Initial Evaluation: computes the intersection of a chunk with a triangle, and computes the values for color, transparency, depth, and texture coordinates at the starting point of the triangle within the chunk.
Pixel Engine: performs pixel-level calculations, including compositing, depth buffering, and fragment generation for pixels that are only partially covered.
Fragment Resolve: performs final anti-aliasing by resolving depth-sorted pixel fragments with partial coverage or transparency.

26 Image Layer Compositor
Responsible for generating the graphics output from a collection of depth-sorted image layers.
Locked to the video refresh.
A data structure maintains a z-sorted list of the image layers visible in each 32-scanline region.
Performs affine transforms (scaling, rotation).
Passes pixel and alpha data to the compositing buffer at four pixels per clock cycle.
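The per-region z-sorted list can be sketched in Python as below. The slide does not say how pixel and alpha data are blended, so the standard "over" operator with back-to-front ordering is used here purely as an assumption; the layer fields (`name`, `z`, `y0`, `y1`) and function names are likewise hypothetical, and affine transforms are omitted.

```python
BAND = 32  # scanlines per region


def layers_per_band(layers, screen_h):
    """Build, for each 32-scanline region, a z-sorted list of visible layers.

    layers: dicts with 'name', 'z' (larger means farther away),
    'y0' and 'y1' (vertical extent in scanlines, y1 exclusive).
    Lists are ordered back to front.
    """
    n_bands = (screen_h + BAND - 1) // BAND
    bands = [[] for _ in range(n_bands)]
    for layer in layers:
        first = max(0, layer["y0"] // BAND)
        last = min(n_bands - 1, (layer["y1"] - 1) // BAND)
        for b in range(first, last + 1):
            bands[b].append(layer)
    for band in bands:
        band.sort(key=lambda layer: layer["z"], reverse=True)
    return bands


def over(src_rgb, src_alpha, dst_rgb):
    """Blend a source pixel over a destination pixel (assumed operator)."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))


layers = [
    {"name": "sprite", "z": 1, "y0": 40, "y1": 70},
    {"name": "background", "z": 10, "y0": 0, "y1": 96},
]
bands = layers_per_band(layers, 96)
band_names = [[layer["name"] for layer in band] for band in bands]
blended = over((1.0, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0))
```

Keeping one list per 32-scanline region means the compositor only has to consider the layers that can contribute to the scanlines it is currently producing.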

27 Compositing Buffer
Simple specialty memory.
Contains two 32-scanline buffers for double buffering of scanline regions: one scanline buffer is used for compositing while the other is used for display.

28 Software
DirectDraw, Direct3D, DirectSound.
Media DSP: real-time kernel on the DSP for scheduling and load balancing.

29 Performance
High-resolution display: 1344x1024 @ 75 Hz.
24-bit color at all resolutions.
20-30k polygon scene complexity.
40 MPix/sec with anisotropic texturing and anti-aliasing.
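A short back-of-the-envelope calculation puts these figures side by side. The numbers come from the slides (1344x1024 @ 75 Hz, 24-bit color, 40 MPix/sec, reuse of a rendered image for 4 frames); the comparison itself is only an illustration and ignores depth complexity, compression, and layer overlap.

```python
WIDTH, HEIGHT = 1344, 1024
REFRESH_HZ = 75
BITS_PER_PIXEL = 24
RENDER_RATE = 40_000_000   # rendered pixels per second (40 MPix/sec)
REUSE_FRAMES = 4           # frames for which one rendered layer is typically reused

screen_pixels = WIDTH * HEIGHT                                  # 1,376,256
display_pixel_rate = screen_pixels * REFRESH_HZ                 # 103,219,200 pixels/s
display_bytes_per_s = display_pixel_rate * BITS_PER_PIXEL // 8  # 309,657,600 bytes/s

# Pixels the renderer can produce during one refresh, and during the
# window in which a rendered layer is reused.
rendered_per_refresh = RENDER_RATE / REFRESH_HZ
rendered_per_reuse_window = rendered_per_refresh * REUSE_FRAMES

print(f"display scan-out:       {display_pixel_rate:,} pixels/s")
print(f"rendered per refresh:   {rendered_per_refresh:,.0f} pixels")
print(f"rendered per {REUSE_FRAMES} frames:  {rendered_per_reuse_window:,.0f} pixels")
print(f"pixels on screen:       {screen_pixels:,}")
```

Under these simplifications, the rendering rate alone covers less than one full screen per refresh but more than one full screen over four refreshes, which is consistent with the design relying on compositing reused image layers at video rates.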

