Polygon Rendering on a Stream Architecture
John D. Owens, William J. Dally, Ujval J. Kapasi, Scott Rixner, Peter Mattson, Ben Mowery
Concurrent VLSI Architecture Group, Computer Systems Laboratory, Stanford University

Today’s Best Hardware
- Commercial hardware:
  - Fast
  - Cheap
  - Ubiquitous
- Flexibility is limited
- OpenGL scenes: programmable streams deliver comparable performance.
(Frame from Quake 3 Arena, © id Software)

Today’s Best Software
- Today’s software solutions:
  - Powerful and flexible
  - Slow
- OpenGL scenes: streams deliver 20x the performance.
(Frame from A Bug’s Life, © Pixar Animation Studios, 1998)

The Vision
Performance of a special-purpose processor
+ Programmability of a general-purpose processor
“Real-Time RenderMan”

Outline
- What is stream processing?
- The Imagine architecture
- Polygon rendering on a stream architecture
- Results
- Conclusions

Kernels and Streams
- A stream is a set of elements of an arbitrary datatype.
- A computational kernel operates on streams.
[Figure: a kernel (Transform) consuming and producing streams]
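As a minimal sketch of the kernel/stream idea, assume a stream is just a Python iterable of records of one type, and a kernel is a function that applies the same per-element body to every element. The names `make_kernel`, `Vertex`, and `translate` are hypothetical and used only for illustration. The sketch models the dataflow only, not how Imagine actually executes kernels.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def make_kernel(body: Callable[[T], U]) -> Callable[[Iterable[T]], Iterator[U]]:
    """Wrap a per-element body into a kernel that runs over a whole stream."""
    def kernel(stream: Iterable[T]) -> Iterator[U]:
        # Every element receives the same computation (homogeneous work).
        for element in stream:
            yield body(element)
    return kernel


@dataclass(frozen=True)
class Vertex:  # hypothetical stream element type
    x: float
    y: float
    z: float


# Hypothetical kernel: shift every vertex by +1 in x.
translate = make_kernel(lambda v: Vertex(v.x + 1.0, v.y, v.z))

out = list(translate([Vertex(0.0, 0.0, 0.0), Vertex(2.0, 1.0, 0.5)]))
```

Because the body sees one element at a time and shares no state across elements, all elements could in principle be processed in parallel.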

Stream Processing
- All data is streams!
- Two levels of programming:
  - Stream-level code
  - Kernel-level code
[Figure: example pipeline of Transform, Shader, and Zcompare kernels, with offset, z, and color streams exchanged with the Z Buffer and Color Buffer]
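The split between the two programming levels can be sketched as follows. Kernel-level code does the per-element arithmetic. Stream-level code only decides which kernels run, in what order, and which stream feeds which kernel. The kernels `transform` and `shade` here are hypothetical stand-ins (a scale and a clamp), not the actual OpenGL kernels.

```python
from typing import Callable, List, Sequence

Stream = List[float]
Kernel = Callable[[Stream], Stream]


# Kernel-level code: data-parallel work on every element of a stream.
def transform(xs: Stream) -> Stream:
    # Hypothetical stand-in for a transform: scale each element by 2.
    return [2.0 * x for x in xs]


def shade(xs: Stream) -> Stream:
    # Hypothetical stand-in for a shader: clamp each value into [0, 1].
    return [min(max(x, 0.0), 1.0) for x in xs]


# Stream-level code: sequences kernels and routes streams between them.
# It performs no per-element computation itself.
def run_pipeline(stream: Stream, kernels: Sequence[Kernel]) -> Stream:
    for kernel in kernels:
        stream = kernel(stream)
    return stream


result = run_pipeline([-0.5, 0.25, 0.75], [transform, shade])
```

Here `result` is `[0.0, 0.5, 1.0]`: the transform produces `[-1.0, 0.5, 1.5]` and the shader clamps it.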

Media Apps and Streams
- Producer-consumer locality
- High arithmetic requirements
- Homogeneous computation
  - Efficient control
  - Data parallelism
- … a poor match for microprocessors
[Figure: Transform, Shader, and Zcompare pipeline with the Z Buffer and Color Buffer]

The Imagine Architecture

Bandwidth Hierarchy
[Figure: SDRAM → Stream Register File → ALU clusters under SIMD/VLIW control. Peak BW: 4 GB/s between SDRAM and the SRF, 32 GB/s between the SRF and the clusters, 544 GB/s within the clusters]
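The step-by-step ratios of the hierarchy can be worked out directly from the three peak figures. The assignment of each figure to a specific link is a reading of the flattened diagram and should be treated as an assumption.

```python
# Peak bandwidths (GB/s) from the slide, ordered from memory toward the ALUs.
# Which link each number belongs to is an assumption based on the figure layout.
PEAK_GBPS = {
    "SDRAM <-> SRF": 4.0,
    "SRF <-> clusters": 32.0,
    "within clusters": 544.0,
}


def step_ratios(levels: dict) -> list:
    """Ratio of each level's bandwidth to the level just below it."""
    values = list(levels.values())
    return [upper / lower for lower, upper in zip(values, values[1:])]


ratios = step_ratios(PEAK_GBPS)
overall = PEAK_GBPS["within clusters"] / PEAK_GBPS["SDRAM <-> SRF"]
```

The ratios come out to 8x and 17x per step, or 136x overall. The clusters can consume far more operand bandwidth than memory can supply, which is why keeping intermediate data local (producer-consumer locality) matters.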

Cluster Organization

Imagine Stats & Status
- 0.59 cm² CMOS chip
  - 500 MHz
- Circuits/logic: expected completion 9/15/00
- Tapeout: expected Q4/2000
  - Fab: TI GS30KA process (0.15 µm drawn)

Polygon Rendering Outline
- Overview of the OpenGL pipeline
- How we map OpenGL into streams & kernels
- How stream operations are sequenced
- How kernels are mapped onto Imagine
  - Use of stream recirculation
  - Detail of 3 steps in the pipeline:
    - Matrix transformation
    - Scan conversion
    - Enforcing ordering in the composite stage

OpenGL Pipeline Overview
[Figure: Application → Geometry → Rasterization → Image Composite]
- OpenGL:
  - Has state
  - Requires immediate mode
  - Respects ordering

Pipeline Detail
[Figure: Input Data → Geometry (Transform, GLShader, Primitive Assembly, Cull, Project) → Rasterization (Spanprep, Spangen, Spanrast, Texture Lookup) → Composite (Hash, Z Lookup, Zcompare, Compact, Color/Z Write) → Image; a Sort / Merge step is also shown]

Pipeline Stream Datatypes
[Figure: the Geometry, Rasterization, and Composite kernels annotated with the stream types passed between them: vertices, triangles, spans, fragments, offsets, depths]
- Most data is floating point.
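One way to picture the stream datatypes is as plain records. The slide only names the types (vertices, triangles, spans, fragments, offsets, depths) and notes that most fields are floating point. Every field name below, and the row-major `offset_of` helper, is a hypothetical layout chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

RGBA = Tuple[float, float, float, float]


@dataclass
class Vertex:      # consumed by Transform / GLShader (hypothetical layout)
    position: Tuple[float, float, float, float]  # homogeneous x, y, z, w
    color: RGBA


@dataclass
class Triangle:    # produced by Primitive Assembly
    v0: Vertex
    v1: Vertex
    v2: Vertex


@dataclass
class Span:        # produced by Spangen: one horizontal run of pixels
    y: int
    x_start: int
    x_end: int     # exclusive
    z_start: float
    dz_dx: float


@dataclass
class Fragment:    # produced by Spanrast: one candidate pixel
    x: int
    y: int
    z: float       # the "depths" stream carries values like this
    color: RGBA


def offset_of(frag: Fragment, width: int) -> int:
    """Hypothetical row-major framebuffer offset ("offsets" stream)."""
    return frag.y * width + frag.x
```

Under this layout, a fragment at pixel (3, 2) in a 640-pixel-wide buffer has offset 2 * 640 + 3 = 1283.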

Stream Recirculation
[Figure: Transform, Shader, and Zcompare kernels on the clusters, with streams recirculating through the SRF; memory holds the Z Buffer and Color Buffer]
- Strip-mining
- Memory accesses:
  - Initial load of vertices
  - Lookup of color/z/texture
  - Writeback of color/z
- All other data accesses are local to the SRF.
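Strip-mining can be sketched as splitting the input into batches small enough that a batch and all its intermediate streams fit in the SRF. Each batch makes a single trip from memory, passes through every kernel, and is written back. The batch size is left as a parameter because the SRF capacity per batch is not given here. `strip_mine` and `run_strip_mined` are hypothetical names.

```python
from typing import Callable, Iterator, List, Sequence

Stream = List[float]
Kernel = Callable[[Stream], Stream]


def strip_mine(data: Stream, batch_size: int) -> Iterator[Stream]:
    """Split a long input stream into batches (assumed to fit in the SRF)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def run_strip_mined(data: Stream, kernels: Sequence[Kernel],
                    batch_size: int) -> Stream:
    results: Stream = []
    for batch in strip_mine(data, batch_size):  # memory -> SRF: one load per batch
        stream = batch
        for kernel in kernels:                  # intermediates recirculate through
            stream = kernel(stream)             # the SRF and never touch memory
        results.extend(stream)                  # SRF -> memory: writeback
    return results
```

For example, `run_strip_mined([float(i) for i in range(10)], [lambda s: [2 * x for x in s]], 4)` processes three batches of sizes 4, 4, and 2.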

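Stream and Kernel Flow
[Figure: excerpt from an ADVS-1 run showing activity over time on CLUSTERS, MEM STR 0, and MEM STR 1. Kernels: xform, project, assemble, rasterize, zcompare, then xform for the next batch. Memory streams: Z load, Z store, Color store, Texture load, Vertex load for next batch]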

Mapping Xform to Imagine
[Figure: the Transform kernel mapped onto memory (RAM), the SRF, and an ALU cluster]
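A plausible way to picture the Transform mapping is SIMD over vertices. The same 4x4 matrix is applied by every cluster, and each cluster handles a different vertex in each step. The choice of eight clusters (`NUM_CLUSTERS`) and the round-robin grouping are assumptions. The sketch runs the groups sequentially and does not model VLIW scheduling within a cluster.

```python
from typing import List, Sequence

Vec4 = List[float]
Mat4 = Sequence[Sequence[float]]

NUM_CLUSTERS = 8  # assumption: one vertex per cluster per SIMD step


def mat_vec(m: Mat4, v: Sequence[float]) -> Vec4:
    """Multiply a row-major 4x4 matrix by a homogeneous vertex."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def xform_kernel(matrix: Mat4, vertices: Sequence[Sequence[float]]) -> List[Vec4]:
    """Process vertices in groups of NUM_CLUSTERS. On the real machine each
    group would run in lockstep across clusters; here it runs in order."""
    out: List[Vec4] = []
    for base in range(0, len(vertices), NUM_CLUSTERS):
        group = vertices[base:base + NUM_CLUSTERS]
        out.extend(mat_vec(matrix, v) for v in group)
    return out
```

With a translation matrix that adds (2, 3, 4), the vertex (1, 1, 1, 1) maps to (3, 4, 5, 1). Output order matches input order, which later ordering-sensitive stages rely on.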

Mapping Spanrast to Imagine
[Figure: the Spanrast kernel mapped onto memory, the SRF, and an ALU cluster]
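The core work of span rasterization (scan conversion of one horizontal span) can be sketched as emitting one fragment per pixel while incrementally interpolating depth. This is a scalar sketch of the per-span work under an assumed span layout (y, x range, starting z, z slope). It does not model how variable-length spans are balanced across SIMD clusters, and the `spanrast` signature is hypothetical.

```python
from typing import List, Tuple

Fragment = Tuple[int, int, float]  # (x, y, z)


def spanrast(y: int, x_start: int, x_end: int,
             z_start: float, dz_dx: float) -> List[Fragment]:
    """Emit one fragment per pixel in [x_start, x_end), stepping z by dz_dx."""
    fragments: List[Fragment] = []
    z = z_start
    for x in range(x_start, x_end):
        fragments.append((x, y, z))
        z += dz_dx  # incremental interpolation: one add per pixel
    return fragments


frags = spanrast(5, 2, 5, 0.5, 0.25)
```

Here `frags` is `[(2, 5, 0.5), (3, 5, 0.75), (4, 5, 1.0)]`. Color or texture coordinates would be interpolated the same way, one increment per attribute per pixel.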

Enforcing ordering
- General sort possible
  - But too expensive
- Hash much cheaper!
  - Hash function: 12 bits
    - Low 6 bits of x, low 6 bits of y
  - Hash table: 2^12 entries
    - 2 bits/entry
    - 16 words/scratchpad/cluster
- Compact: enforces ordering constraint
[Figure: Sort, Hash, Compact, Merge, and Zcompare kernels]
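The 12-bit hash on the slide can be written out directly. How the compact stage uses the table is not spelled out here, so the conflict-deferral loop below is a hypothetical scheme, not the authors' kernel. Within one pass, only the first fragment for each hash entry proceeds. Later fragments that map to the same entry are deferred, in their original order, to a later pass. So two fragments for the same pixel never reach Zcompare in the same pass, and the first one always goes first. Distinct pixels that share a hash entry are also deferred (a conservative false conflict). For simplicity, the sketch stores one boolean per entry instead of the slide's 2 bits.

```python
from typing import List, Tuple

Fragment = Tuple[int, int, float]  # (x, y, z)
TABLE_SIZE = 1 << 12               # 2^12 entries, as on the slide


def hash_xy(x: int, y: int) -> int:
    """12-bit hash: low 6 bits of y, then low 6 bits of x."""
    return ((y & 0x3F) << 6) | (x & 0x3F)


def split_conflicts(frags: List[Fragment]) -> Tuple[List[Fragment], List[Fragment]]:
    """Hypothetical compact step: pass the first fragment per hash entry,
    defer the rest in submission order."""
    seen = [False] * TABLE_SIZE
    ready: List[Fragment] = []
    deferred: List[Fragment] = []
    for frag in frags:
        h = hash_xy(frag[0], frag[1])
        if seen[h]:
            deferred.append(frag)
        else:
            seen[h] = True
            ready.append(frag)
    return ready, deferred


def ordered_passes(frags: List[Fragment]) -> List[List[Fragment]]:
    """Repeat until nothing is deferred. This terminates because the first
    remaining fragment is always ready."""
    passes: List[List[Fragment]] = []
    while frags:
        ready, frags = split_conflicts(frags)
        passes.append(ready)
    return passes
```

For example, fragments at (1, 1), (65, 1), (1, 1), and (2, 2) take three passes. (65, 1) collides with (1, 1) because only the low 6 bits of x are hashed.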

Image Composition
[Figure: Zcompare on a cluster, exchanging offset, z, and color streams through the SRF with the Z Buffer and Color Buffer in memory]
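Image composition can be sketched as three steps: a gather of stored depths by offset (Z lookup), a per-fragment depth test (Zcompare), and a scatter of surviving depths and colors (Color/Z write). The less-than depth test and the function names are assumptions. The gather-then-scatter approach is only correct if offsets within a batch are distinct, which is the property the ordering step is there to provide.

```python
from typing import List, Tuple


def zcompare(offsets: List[int], new_z: List[float], new_color: List[int],
             old_z: List[float]) -> Tuple[List[int], List[float], List[int]]:
    """Keep fragments nearer than the stored depth (assumed less-than test)
    and return the surviving offset, z, and color streams."""
    keep = [nz < oz for nz, oz in zip(new_z, old_z)]
    out_off = [o for o, k in zip(offsets, keep) if k]
    out_z = [z for z, k in zip(new_z, keep) if k]
    out_col = [c for c, k in zip(new_color, keep) if k]
    return out_off, out_z, out_col


def composite(zbuf: List[float], colorbuf: List[int], offsets: List[int],
              new_z: List[float], new_color: List[int]) -> None:
    """Gather, test, scatter. Assumes offsets in this batch are distinct."""
    old_z = [zbuf[o] for o in offsets]                  # Z lookup (gather)
    w_off, w_z, w_col = zcompare(offsets, new_z, new_color, old_z)
    for o, z, c in zip(w_off, w_z, w_col):              # Color/Z write (scatter)
        zbuf[o] = z
        colorbuf[o] = c


zbuf = [1.0] * 16
colorbuf = [0] * 16
composite(zbuf, colorbuf, [3, 5, 7], [0.5, 1.5, 0.2], [0xFF0000, 0x00FF00, 0x0000FF])
```

After this call, pixels 3 and 7 are updated. Pixel 5 keeps its old depth and color because 1.5 is not nearer than the stored 1.0.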

Benchmarks
- ADVS-1: 62k vertices as point-sampled polygons (SPECviewperf 6.1.1 Advanced Visualizer)
- ADVS-8: mipmapped version of ADVS-1
- Sphere: 82k lit, Gouraud-shaded triangles; 3 positional lights
- Fill: 20k mipmapped 25-pixel triangles
[Images: ADVS, Sphere]

Experimental setup
- Comparison systems:
  - Microsoft opengl32.dll (sustained)
  - NVIDIA Quadro (sustained)
  - NVIDIA Quadro (peak)
- Test system: 450 MHz Pentium III Xeon, Windows NT 4.0
- For comparison:
  - Low-overhead trace player (no application overhead)
  - Average over 100s of frames (no startup costs)
  - Vsync disabled

Results Summary

Stream-level Performance
- Computation-bound, not memory-bound
  - Highest memory system occupancy: 58.7%
- Cluster occupancy: 94.3%–98.8%
  - Reuse
- 5.6 GOPS on Sphere
[Figure: activity over time on CLUSTERS, MEM STR 0, and MEM STR 1]

Imagine Kernel Breakdown
- The majority of time is spent in rasterization
  - ADVS-8 requires 2.5x as many ops/frame as ADVS-1
[Chart: ADVS-8]

Future Directions
- Extend generality of the OpenGL pipeline
  - Add more complex scenes
- Programmable shading and lighting
  - Straightforward to add per-vertex/per-fragment ops
  - Eliminate multipass
  - Goal: “toolbox” of flexible elements
- Non-polygon rendering: raytracing, IBR, …
- Scalability: multi-Imagine implementations

Conclusions
- Streams: a powerful primitive
- Stream architectures enable high performance:
  - Flexibility of a general-purpose processor
    - 20x better frame rates than commercial software
  - Performance of a special-purpose processor
    - Frame rates comparable to commercial hardware

Acknowledgements
- DARPA
- Industrial sponsors:
  - Texas Instruments
  - Intel Corporation
- Matthew Eldridge and Kekoa Proudfoot
- Brian Towles and Brucek Khailany
- Anonymous reviewers for helpful comments
- The US Passport Office
  - Same-day turnaround!
