Polygon Rendering on a Stream Architecture John D. Owens, William J. Dally, Ujval J. Kapasi, Scott Rixner, Peter Mattson, Ben Mowery Concurrent VLSI Architecture.

1 Polygon Rendering on a Stream Architecture John D. Owens, William J. Dally, Ujval J. Kapasi, Scott Rixner, Peter Mattson, Ben Mowery Concurrent VLSI Architecture Group Computer Systems Laboratory Stanford University

2 Today’s Best Hardware Commercial hardware: fast, cheap, ubiquitous; flexibility limited. OpenGL scenes: programmable streams deliver comparable performance. Frame from Quake 3 Arena, © id Software

3 Today’s Best Software Today’s software solutions: powerful and flexible, but slow. OpenGL scenes: streams deliver 20x the performance of software rendering. Frame from A Bug’s Life, © Pixar Animation Studios, 1998

4 The Vision Performance of a special-purpose processor + programmability of a general-purpose processor: “Real-Time Renderman”

5 Outline What is stream processing? The Imagine architecture Polygon rendering on a stream architecture Results Conclusions

6 Kernels and Streams A stream is a set of elements of an arbitrary datatype. A computational kernel operates on streams. (Figure labels: Kernel, Streams, Transform)

7 Stream Processing All data is streams! Two levels of programming: stream-level code and kernel-level code. (Figure labels: Transform, Shader, Zcompare, Z Buffer, Color Buffer; streams: z, color; z; z, color; offset)
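
The two programming levels can be illustrated with a minimal Python sketch. It is not Imagine's actual stream or kernel language; the names `transform_kernel`, `shade_kernel`, and `run_pipeline`, the scale factor, and the placeholder shading rule are all assumptions made for illustration. Kernels apply the same operation to every stream element, and the stream-level code only wires kernels together.

```python
from typing import Iterable, Iterator, List, Tuple

Vertex = Tuple[float, float, float]


def transform_kernel(vertices: Iterable[Vertex], scale: float) -> Iterator[Vertex]:
    """Kernel-level code: the same arithmetic is applied to every stream element."""
    for x, y, z in vertices:
        yield (x * scale, y * scale, z * scale)


def shade_kernel(vertices: Iterable[Vertex]) -> Iterator[Tuple[float, int]]:
    """Kernel-level code: emit a (z, color) element per vertex.

    The color rule (depth clamped to [0, 1], mapped to 0..255) is a placeholder.
    """
    for _x, _y, z in vertices:
        clamped = max(0.0, min(1.0, z))
        yield (z, int(255 * clamped))


def run_pipeline(vertices: Iterable[Vertex]) -> List[Tuple[float, int]]:
    """Stream-level code: connect kernels by passing streams between them."""
    transformed = transform_kernel(vertices, scale=2.0)  # hypothetical scale
    return list(shade_kernel(transformed))
```

Because each kernel consumes the stream produced by the previous one, intermediate data never needs to be named or stored by the stream-level code.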

8 Media Apps and Streams Producer-consumer locality; high arithmetic requirements; homogeneous computation: efficient control, data parallelism … poor match for microprocessors. (Figure labels: Transform, Shader, Zcompare, Z Buffer, Color Buffer; streams: z, color; z; z, color; offset)

9 The Imagine Architecture

10 Bandwidth Hierarchy Peak BW: SDRAM 4 GB/s; Stream Register File 32 GB/s; ALU Clusters 544 GB/s. (Figure labels: SDRAM, Stream Register File, ALU Clusters, SIMD/VLIW Control)
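
As a rough worked example, the ratios between the levels of the hierarchy can be computed directly from the peak figures on this slide. The assignment of 4, 32, and 544 GB/s to SDRAM, the stream register file, and the ALU clusters respectively is a reading of the slide layout, and the names below are illustrative.

```python
# Peak bandwidths in GB/s, as read from the slide (level assignment is an assumption).
PEAK_BW_GBPS = {
    "SDRAM": 4,
    "Stream Register File": 32,
    "ALU clusters": 544,
}


def bandwidth_ratios(peak_bw: dict) -> dict:
    """Express each level's peak bandwidth relative to the slowest level."""
    base = min(peak_bw.values())
    return {level: bw / base for level, bw in peak_bw.items()}


RATIOS = bandwidth_ratios(PEAK_BW_GBPS)
```

Under that reading, the stream register file offers 8 times the bandwidth of SDRAM and the clusters 136 times, which is why keeping intermediate streams out of memory matters.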

11 Cluster Organization

12 Imagine Stats & Status 0.59 cm² CMOS chip, 500 MHz. Circuits/Logic: expected completion 9/15/00. Tapeout: expected Q4/2000. Fab: TI GS30KA process (0.15 µm drawn)

13 Polygon Rendering Outline Overview of OpenGL pipeline. How we map OpenGL into streams & kernels. How stream operations are sequenced. How kernels are mapped onto Imagine: use of stream recirculation; detail of 3 steps in the pipeline (matrix transformation, scan conversion, enforcing ordering in composite stage)

14 OpenGL Pipeline Overview OpenGL: has state, requires immediate mode, respects ordering. (Figure labels: Application, Geometry, Rasterization, Image, Composite)

15 Pipeline Detail Input Data. Geometry: Transform; GLShader; Primitive Assembly; Cull; Project. Rasterization: Spanprep; Spangen; Spanrast; Texture Lookup. Composite: Hash; Z Lookup; Zcompare; Compact; Color, Z Write; Sort / Merge. Image

16 Pipeline Stream Datatypes Geometry: Transform; GLShader; Primitive Assembly; Cull; Project. Rasterization: Spanprep; Spangen; Spanrast; Texture Lookup. Composite: Hash; Z Lookup; Zcompare; Compact; Color, Z Write; Sort / Merge. Image. Stream datatypes: vertices, triangles, spans, fragments, offsets, depths. Most data is floating point.

17 Stream Recirculation Strip-mining. Memory accesses: initial load of vertices; lookup of color/z/texture; writeback of color/z. All other data accesses are local to the SRF. (Figure labels: Memory, SRF, Clusters; Transform, Shader, Zcompare, Z Buffer, Color Buffer; streams: z, color; z; z, color; offset)
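
Strip-mining can be sketched as follows: the input stream is cut into batches small enough to stay on chip, each batch passes through every kernel before the next batch is loaded, and only the first load and the final writeback touch memory. This is a simplified illustration; `strip_mine`, `render_batched`, and the batch size are hypothetical names and parameters, and Python lists stand in for the SRF.

```python
from typing import Callable, Iterator, List, Sequence


def strip_mine(stream: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Split a long stream into consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(stream), batch_size):
        yield stream[start:start + batch_size]


def render_batched(
    vertices: Sequence,
    kernels: List[Callable[[list], list]],
    batch_size: int,
) -> list:
    """Run every kernel over one batch before loading the next batch."""
    results = []
    for batch in strip_mine(vertices, batch_size):
        data = list(batch)            # stands in for the initial load from memory
        for kernel in kernels:
            data = kernel(data)       # intermediate streams stay "on chip"
        results.extend(data)          # stands in for the final writeback
    return results
```

For example, `render_batched([1, 2, 3, 4, 5], [double, increment], 2)` would process the batches `[1, 2]`, `[3, 4]`, and `[5]` in turn, where `double` and `increment` are any list-to-list functions.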

18 Stream and Kernel Flow Excerpt from ADVS-1 run. (Timeline labels: CLUSTERS: xform, project, assemble, rasterize, zcompare, xform; MEM STR 0 and MEM STR 1: Z load, Z store, Color store, Texture load, Vertex load for next batch)

19 Mapping Xform to Imagine (Figure labels: RAM, SRF, Cluster; Transform; Memory, SRF, Clusters)
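
The slide itself is a diagram, so the following is only a generic sketch of what a matrix-transformation kernel computes: each vertex in the input stream is multiplied by the same 4x4 matrix in homogeneous coordinates. The function names and the row-major matrix layout are assumptions, and nothing here reflects how the kernel is actually scheduled across Imagine's clusters.

```python
from typing import List, Sequence, Tuple

Vec4 = Tuple[float, float, float, float]
Mat4 = Sequence[Sequence[float]]  # row-major 4x4 matrix (assumed layout)


def transform_vertex(matrix: Mat4, vertex: Vec4) -> Vec4:
    """Multiply one homogeneous vertex by a 4x4 matrix."""
    return tuple(
        sum(matrix[row][col] * vertex[col] for col in range(4))
        for row in range(4)
    )


def xform_kernel(matrix: Mat4, vertices: Sequence[Vec4]) -> List[Vec4]:
    """Apply the same matrix to every vertex; elements are independent of each other."""
    return [transform_vertex(matrix, v) for v in vertices]


# Example matrix: translation by (1, 2, 3).
TRANSLATE = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
```

Since no vertex depends on any other, the work is data parallel, which is the property the earlier slides identify as a good fit for SIMD clusters.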

20 Mapping Spanrast to Imagine (Figure labels: SRF, Cluster; Spanrast; Memory, SRF, Clusters)

21 Enforcing ordering General sort possible, but too expensive. Hash much cheaper! Hash function: 12 bits (low 6 bits of x, low 6 bits of y). Hash table: 2^12 entries, 2 bits/entry, 16 words/scratchpad/cluster. Compact: enforces ordering constraint. (Figure labels: Compact, Sort, Hash, Merge, Zcompare)
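
The 12-bit hash can be sketched as below. The slide only states that the hash uses the low 6 bits of x and the low 6 bits of y; placing y in the high half is an assumption, as are the function names. The conflict split uses a Python set in place of the 2-bit table entries, whose exact encoding the slide does not describe.

```python
from typing import List, Tuple

AXIS_BITS = 6
AXIS_MASK = (1 << AXIS_BITS) - 1      # 0x3F
TABLE_ENTRIES = 1 << (2 * AXIS_BITS)  # 2^12 = 4096


def fragment_hash(x: int, y: int) -> int:
    """Combine the low 6 bits of x and y into a 12-bit value (bit order assumed)."""
    return ((y & AXIS_MASK) << AXIS_BITS) | (x & AXIS_MASK)


def split_conflicts(
    fragments: List[Tuple[int, int]],
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Separate fragments whose hash is new from those that collide with an earlier one.

    Colliding fragments are deferred, in their original order, so that fragments
    that may hit the same pixel are not composited in the same pass.
    """
    seen = set()
    first_pass, deferred = [], []
    for x, y in fragments:
        h = fragment_hash(x, y)
        if h in seen:
            deferred.append((x, y))
        else:
            seen.add(h)
            first_pass.append((x, y))
    return first_pass, deferred
```

Because only 6 bits per axis are kept, pixels 64 apart in x or y share a hash value, so this scheme can defer fragments that do not truly overlap; that is conservative but cheaper than a general sort.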

22 Image Composition (Figure labels: RAM, SRF, Cluster; Memory, SRF, Clusters; Z Buffer, Zcompare, Color Buffer; streams: offset, z, color; z; z, color; offset)
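
The composite step shown in the diagram can be sketched as a depth test over a stream of (offset, z, color) fragments. The buffers here are plain Python lists, the function name is hypothetical, and the less-than comparison (smaller z is nearer) is an assumed depth function, not something the slide specifies.

```python
from typing import List, Sequence, Tuple

Fragment = Tuple[int, float, int]  # (framebuffer offset, depth, color)


def zcompare_composite(
    z_buffer: List[float],
    color_buffer: List[int],
    fragments: Sequence[Fragment],
) -> int:
    """Depth-test each fragment against the z buffer and write the winners.

    Returns the number of fragments that passed the test.
    """
    written = 0
    for offset, z, color in fragments:
        old_z = z_buffer[offset]          # Z lookup
        if z < old_z:                     # Zcompare (assumed "less than" test)
            z_buffer[offset] = z          # Z write
            color_buffer[offset] = color  # color write
            written += 1
    return written
```

Processing fragments strictly in sequence, as here, trivially respects ordering; a parallel implementation needs a mechanism such as the hash step to keep two fragments for the same offset out of the same batch.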

23 Benchmarks ADVS-1: 62k vertices as point-sampled polygons (SPECviewperf 6.1.1 Advanced Visualizer). ADVS-8: mipmapped version of ADVS-1. Sphere: 82k lit, Gouraud-shaded triangles; 3 positional lights. Fill: 20k mipmapped 25-pixel triangles. (Figure labels: ADVS, Sphere)

24 Experimental setup Comparison systems: Microsoft opengl32.dll (sustained); NVIDIA Quadro (sustained); NVIDIA Quadro (peak). Test system: 450 MHz PIII Xeon, NT 4.0. For comparison: low-overhead trace player (no application overhead); average over 100s of frames (no startup costs); disabled vsync

25 Results Summary

26 Stream-level Performance Computation, not memory, bound: highest memory system occupancy 58.7%. Cluster occupancy: 94.3% - 98.8%. Reuse 5.6 GOPS on Sphere (Timeline labels: CLUSTERS, MEM STR 0, MEM STR 1)

27 Imagine Kernel Breakdown Majority of time is in rasterization. ADVS-8 has 2.5x the ops/frame of ADVS-1. (Figure label: ADVS-8)

28 Future Directions Extend generality of OpenGL pipeline: add more complex scenes. Programmable shading and lighting: straightforward to add per-vertex/per-fragment ops; eliminate multipass; goal: “toolbox” of flexible elements. Non-polygon rendering: raytracing, IBR, … Scalability: multi-Imagine implementations

29 Conclusions Streams: powerful primitive. Stream architectures: enable high performance. Flexibility of general-purpose processor: 20x better frame rates than commercial software. Performance of special-purpose processor: comparable frame rates to commercial hardware

30 Acknowledgements DARPA. Industrial sponsors: Texas Instruments, Intel Corporation. Matthew Eldridge and Kekoa Proudfoot. Brian Towles and Brucek Khailany. Anonymous reviewers for helpful comments. The US Passport Office: same-day turnaround!

