1 Copyright © 2001 Intel Corporation. * Other names and brands may be claimed as the property of others. Meltdown 2001 Optimizing DirectX* Graphic Applications.

Presentation transcript:

1 Copyright © 2001 Intel Corporation. * Other names and brands may be claimed as the property of others. Meltdown 2001 Optimizing DirectX* Graphic Applications using Software Vertex Processing Ronen Zohar/Kim Pallister Intel Corporation

2 Agenda
Do I need SW vertex processing?
The PSGP
Using SW vertex processing for maximum performance: memory, batching and render-states
SW vertex processing and the new features in DirectX* 8.0

3 Do I need SW vertex processing?
Your publisher wants:
– Eye-candy graphics, using all the latest 3D features
– Lower “minimum system requirements”
– and many more
Problem: older systems do not support all the eye-candy features
Solution 1: Disable features for low-end systems
Solution 2: Use SW vertex processing (at least for the features where you can) and keep some features

4 Inside DirectX* Graphics
[Diagram labels: Application; DirectX run-time; API front-end; SW vertex processing (PSGP); Communication to the driver (DDI); HW vertex processing path; Driver]

5 PSGP – Processor Specific Geometry Pipeline
The part of DirectX graphics responsible for the SW vertex processing algorithms, optimized for the client’s processor
The DirectX 8.0 PSGP is optimized for:
– Intel® Pentium® III processor
– Intel Pentium 4 processor

6 The PSGP
[Diagram labels: VB; Vertex shader path: map stream to registers, execute vertex shader code; Fixed function path: transformation, lighting, tex gen; Format data to output FVF; Internal temporary VB’s; Clipper; IB; To driver]

7 PSGP Principles
Use SIMD to process multiple vertices in each iteration
– Vertical processing
– Data is swizzled on the fly
Prefetch input streams to hide memory latency
Write output to temporary VB’s based on the XYZRHW FVF code
– In system memory if transformed vertices need to be read back
– In driver memory if no read-back is required
– More on this later…
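The "vertical processing" and on-the-fly swizzle mentioned on this slide can be illustrated with a small sketch. It is not the PSGP's actual code: the `Vertex`, `VertexBlock4` and `swizzle4` names are hypothetical, and plain arrays stand in for 4-wide SIMD registers. It only shows how four array-of-structures vertices might be regrouped so that each register holds one component of four vertices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One input vertex as an application would typically store it
// (array-of-structures). Hypothetical layout, for illustration only.
struct Vertex { float x, y, z; };

// Four vertices regrouped by component (structure-of-arrays), so that one
// 4-wide SIMD register could hold the same component of four vertices.
struct VertexBlock4 {
    std::array<float, 4> x, y, z;
};

// Swizzle vertices [first, first + 4) from AoS into one SoA block.
VertexBlock4 swizzle4(const std::vector<Vertex>& vb, std::size_t first) {
    assert(first + 4 <= vb.size());
    VertexBlock4 out{};
    for (std::size_t lane = 0; lane < 4; ++lane) {
        out.x[lane] = vb[first + lane].x;
        out.y[lane] = vb[first + lane].y;
        out.z[lane] = vb[first + lane].z;
    }
    return out;
}
```

After the swizzle, one multiply on `out.x` touches the x component of four vertices at once, which is the property the slide calls vertical processing.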

8 Input Stream Memory Allocation
Create SW processed primitives in system memory (using the D3DUSAGE_SOFTWAREPROCESSING usage flag at creation).
If the same VB is processed both in SW and HW
– Try to avoid it
– If you must, create multiple copies: one in system memory and one in driver memory
If the primitive is never clipped, use the D3DUSAGE_DONOTCLIP usage flag

9 Primitive Batching
Batch all the SW processed primitives together
SW processes the entire VB range that you submit; if multiple primitives use the same VB, squeeze the vertex range
As with HW, bigger primitives are always better (the PSGP has a long setup)
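One way to "squeeze the vertex range" is to derive the smallest range of vertices that an indexed primitive actually references, and submit only that. The sketch below assumes 16-bit indices; `VertexRange` and `tightVertexRange` are hypothetical helper names, not part of DirectX. The two results correspond in spirit to the minimum-index and vertex-count arguments an indexed draw call takes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Smallest contiguous vertex range referenced by a primitive.
struct VertexRange {
    std::uint32_t minIndex;     // first vertex the primitive touches
    std::uint32_t numVertices;  // vertices from minIndex through the last one used
};

// Scan the index list once and return the tight range, so that software
// vertex processing is not asked to transform unused vertices.
VertexRange tightVertexRange(const std::vector<std::uint16_t>& indices) {
    assert(!indices.empty());
    auto mm = std::minmax_element(indices.begin(), indices.end());
    const std::uint32_t lo = *mm.first;
    const std::uint32_t hi = *mm.second;
    return VertexRange{lo, hi - lo + 1};
}
```

For a primitive whose indices span 10 to 13 in a 10,000-vertex VB, this yields a range of 4 vertices instead of the whole buffer.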

10 Primitive Batching (Cont)
The PSGP batches the processed vertices before sending them to HW (to reduce the HW’s VB changes)
Primitives are batched as long as their output FVF is equal:
– XYZ | NORMAL | TEX1 and XYZ | DIFFUSE | TEX1 have the same output FVF (XYZRHW | DIFFUSE | TEX1)
– In SW mode, changing the VB FVF does not mean a slowdown (unlike HW)

11 Clipping Render-state
When clipping is enabled, the PSGP
– Stores its output to a system memory buffer
– As it needs to read vertices in order to clip
– The driver needs to copy it across the AGP
– When clipping is disabled, it writes to a driver allocated buffer (no copy here!)
– Calculates clip flags (out-codes) for each vertex: more execution cycles per vertex
– Clips
Minimize the amount of clipping
– Use bounding boxes/spheres on your objects
– Don’t forget to take the guard-band into account

12 Clipping Render-state (Cont)
Pseudo-code to minimize clipping
– If (BB is outside screen) Don’t render primitive
– Elseif (BB is inside guard-band) Render with clipping off
– Else Render with clipping on
A typical game scene should have <10% of primitives clipped
– Biggest problem is front plane clipping
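The pseudo-code above can be sketched in C++ as follows. This is a simplified 2D illustration under stated assumptions: the bounding box has already been projected to a screen-space rectangle, the object lies entirely in front of the near plane (front plane clipping, which the slide calls the biggest problem, is not handled), and the `Rect`, `ClipDecision` and `classify` names are hypothetical.

```cpp
#include <cassert>

// Axis-aligned rectangle in screen space (left <= right, top <= bottom).
struct Rect { float left, top, right, bottom; };

enum class ClipDecision { Skip, RenderNoClip, RenderWithClip };

// True when the two rectangles do not overlap at all.
bool disjoint(const Rect& a, const Rect& b) {
    return a.right < b.left || a.left > b.right ||
           a.bottom < b.top || a.top > b.bottom;
}

// True when 'inner' lies completely within 'outer'.
bool contains(const Rect& outer, const Rect& inner) {
    return inner.left >= outer.left && inner.right <= outer.right &&
           inner.top >= outer.top && inner.bottom <= outer.bottom;
}

// Follows the slide's three cases: outside the screen, inside the
// guard-band, or everything else.
ClipDecision classify(const Rect& bb, const Rect& screen, const Rect& guardBand) {
    assert(contains(guardBand, screen));  // the guard-band surrounds the screen
    if (disjoint(bb, screen)) return ClipDecision::Skip;
    if (contains(guardBand, bb)) return ClipDecision::RenderNoClip;
    return ClipDecision::RenderWithClip;
}
```

An application would then set the clipping render-state per object from the returned decision, so that only the few objects crossing the guard-band pay for clipping.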

13 Performance Render-states
Specular – very expensive
LocalViewer – smaller performance impact than in HW, but it still costs more
NormalizeNormals – extra work for the PSGP, use only when needed
Fog – written as “specular alpha”, can change the PSGP’s output FVF

14 DirectX* 8.0 Graphics New Features
Point sprites
Tweening
Indexed vertex blending / indexed palette skinning
Vertex shaders

15 Point Sprites
PSGP writes in native FVF format
If HW does not support point sprites
– Each point is expanded to a quad, using the calculated point size
– The quad list is submitted to the driver
Very slow if there is no HW support for point sprites; try to avoid it
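The fallback expansion described on this slide can be pictured with a short sketch. It assumes a screen-space point and a square sprite; the corner order and the `Point2` and `expandPointToQuad` names are illustrative choices, not taken from the PSGP.

```cpp
#include <array>
#include <cassert>

struct Point2 { float x, y; };

// Expand one screen-space point into the four corners of a square quad
// centred on it. Corner order: top-left, top-right, bottom-left, bottom-right.
std::array<Point2, 4> expandPointToQuad(Point2 centre, float size) {
    assert(size >= 0.0f);
    const float h = size * 0.5f;  // half the point size
    std::array<Point2, 4> quad = {{
        {centre.x - h, centre.y - h},
        {centre.x + h, centre.y - h},
        {centre.x - h, centre.y + h},
        {centre.x + h, centre.y + h},
    }};
    return quad;
}
```

Each point turns into four output vertices, which gives a feel for why the slide advises avoiding this path when the HW lacks point sprite support.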

16 Tweening
Tween the position and normal before transformation (in SIMD)
After tweening, the “standard” PSGP flow continues
Costs very few cycles
– But for tweening and transformation only, a vertex shader would run faster
– Try to compare your exact scenario to a vertex shader

17 Indexed Skinning
Transforms all vertices to matrix0 space
– Using scalar code, with a lookup for the needed matrix
Then continues the normal PSGP flow
DirectX* 7 style skinning is supported by some HW and may run faster, but requires multiple models and DrawPrimitive calls

18 Vertex shaders
At vertex shader creation
– The shader code is compiled to equivalent IA32 code
– Using all possible assembly optimizations and instructions available on the client’s CPU to achieve the fastest code
At vertex shader execution
– The generated code is called
SW vertex shaders have excellent performance

19 SW Vs. HW Vertex Shaders
Calculates more than one vertex in a single iteration
– Based on the processor’s SIMD width
Not every shader instruction takes 1 clock
– But the CPU runs at a much higher frequency than today’s 3D graphics chips

20 SW Vs. HW Vertex Shaders (Cont)
Simple compilation sample:
– Mul r0.xyz,v0,c0
    Movaps xmm0,[v0.x]
    Mulps xmm0,[c0.x]
    Movaps xmm1,[v0.y]
    Mulps xmm1,[c0.y]
    Movaps xmm2,[v0.z]
    Mulps xmm2,[c0.z]
    Movaps [r0.x],xmm0
    Movaps [r0.y],xmm1
    Movaps [r0.z],xmm2
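The compilation sample can be mirrored in portable C++ to show what the generated code computes. This is a sketch under assumptions, not the compiler's output: `Lane4` stands in for one xmm register, `Reg3x4` is a hypothetical layout holding the x, y and z components of four vertices, and the constant register is assumed to be stored with each component replicated four times so it can be used directly as a SIMD operand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Lane4 = std::array<float, 4>;  // stands in for one 4-wide SIMD register

// One shader register for four vertices at once, stored per component.
struct Reg3x4 { Lane4 x, y, z; };

// Element-wise multiply, the operation mulps performs on two registers.
Lane4 mulps(const Lane4& a, const Lane4& b) {
    Lane4 r{};
    for (std::size_t i = 0; i < 4; ++i) r[i] = a[i] * b[i];
    return r;
}

// True when all four lanes hold the same value.
bool isBroadcast(const Lane4& v) {
    return v[1] == v[0] && v[2] == v[0] && v[3] == v[0];
}

// Replicate a constant's components across all four lanes.
Reg3x4 broadcast(float x, float y, float z) {
    return Reg3x4{Lane4{{x, x, x, x}}, Lane4{{y, y, y, y}}, Lane4{{z, z, z, z}}};
}

// mul r0.xyz, v0, c0 evaluated for four vertices in one pass.
Reg3x4 mulXYZ(const Reg3x4& v0, const Reg3x4& c0) {
    // A constant is shared by all vertices, so each lane group is uniform.
    assert(isBroadcast(c0.x) && isBroadcast(c0.y) && isBroadcast(c0.z));
    return Reg3x4{mulps(v0.x, c0.x), mulps(v0.y, c0.y), mulps(v0.z, c0.z)};
}
```

The w component is never touched, which matches the `.xyz` write mask in the shader instruction: masked-out components cost nothing.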

21 SW Vs. HW Vertex Shaders (Cont)
Data that you write is data that the CPU has to calculate
– Write only needed data (using the vertex shader write mask)
– Use the swizzle modifiers, and don’t duplicate written data
Vertex shader instructions are blended to achieve maximum performance
– But keeping dependency chains tight will help the compiler with physical register assignment

22 Performance Tips for SW Vertex Shaders
m?x? macros have better performance than their hand-expanded equivalents (separate dp3/dp4 instructions)
Try to minimize the use of the address register
– Due to the parallelism of the SW vertex shader
– Sort the VB by the values used in the address register

23 Performance Tips for SW Vertex Shaders
lit, expp and logp are big cycle consumers
– Use the lower accuracy (e.g. expp.x) when possible
– Use either .x or .z (but not both)
– exp and log are worse than expp, logp
Don’t explicitly saturate color values
– It is done automatically

24 Optimized Vertex Shader
Original:
dp4 oPos.x, v0, c2
dp4 oPos.y, v0, c3
dp4 oPos.z, v0, c4
dp4 oPos.w, v0, c5
add r1, c6, -v0
dp3 r2, r1, r1
rsq r2, r2
mov oT0, v2
mul r1, r1, r2
dp3 r3, v1, r1
max r3, r3, c8
add r3, r3, c7
min oD0, r3, c9
Optimized:
m4x4 oPos, v0, c[2]
add r1.xyz, c6, -v0
dp3 r2.w, r1, r1
rsq r2.w, r2.w
mul r1.xyz, r1, r2.w
dp3 r2.w, v1, r1
max r2.w, r2.w, c8
add oD0.xyz, r2.w, c7
mov oT0, v2

25 Questions?
Intel, Pentium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Copyright © 2001 Intel Corp.

26 Backup

27 Tweening + transformation vertex shader
Mul r0.xyz,v0,c0.x // c0.x – α
Mad r0.xyz,v1,c0.y,r0 // c0.y – (1-α)
M4x4 oPos,r0,c1
Mov oD[0].xyz,v2
Mov oT[0].xy,v3
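As a cross-check of the first two shader instructions, a scalar C++ reference of the tween step might look like the sketch below. The `Vec3` and `tween` names are hypothetical; the weights follow the shader comments, with α applied to the first stream (v0) and 1-α to the second (v1). The matrix transform and the color and texture moves are left out.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Blend two key-frame positions the way the shader's mul/mad pair does.
Vec3 tween(const Vec3& v0, const Vec3& v1, float alpha) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    const float c0x = alpha;         // c0.x in the shader
    const float c0y = 1.0f - alpha;  // c0.y in the shader

    // mul r0.xyz, v0, c0.x
    Vec3 r0 = {v0.x * c0x, v0.y * c0x, v0.z * c0x};

    // mad r0.xyz, v1, c0.y, r0
    r0 = Vec3{v1.x * c0y + r0.x, v1.y * c0y + r0.y, v1.z * c0y + r0.z};
    return r0;
}
```

With alpha equal to 1 the result is exactly the first key frame, and with alpha equal to 0 it is the second, which is a quick sanity test for the constant setup.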

28 Not Equal Address Value
[Diagram: Const register file (x4): 1.0f, 2.0f, 3.0f. Address register (x4): 1, 2, 1, 2. Instruction argument: 1.0f, 2.0f, 1.0f, 2.0f]
Need to re-arrange a combination register for the SIMD instruction to use (costs ~20 cycles)

29 Equal Address Value
[Diagram: Const register file (x4): 1.0f, 2.0f, 3.0f. Address register (x4): 2, 2, 2, 2. Instruction argument]
Accessing the x4 constant register file directly. No penalty for “re-arranging” vertices
Address accessing mode is selected when storing the address value
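The difference between the equal and not-equal cases suggests a simple offline check: count how many groups of four consecutive vertices carry differing address-register values, before and after sorting. The sketch below assumes a SIMD width of four and one integer address value per vertex; `countMixedGroups` is a hypothetical helper, and a real tool would sort the vertices themselves (and remap indices) rather than just the keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Count the 4-vertex groups whose address values are not all equal. Each
// such group would need the re-arranged combination register described in
// the "Not Equal Address Value" case.
std::size_t countMixedGroups(const std::vector<int>& addr) {
    assert(addr.size() % 4 == 0);
    std::size_t mixed = 0;
    for (std::size_t i = 0; i < addr.size(); i += 4) {
        const bool same = addr[i + 1] == addr[i] &&
                          addr[i + 2] == addr[i] &&
                          addr[i + 3] == addr[i];
        if (!same) ++mixed;
    }
    return mixed;
}

// Sorting by address value makes equal values adjacent, so more groups
// can take the direct constant-file path.
std::vector<int> sortedByAddress(std::vector<int> addr) {
    std::sort(addr.begin(), addr.end());
    return addr;
}
```

For the address pattern 1, 2, 1, 2, 1, 2, 1, 2 both groups are mixed; after sorting, neither is.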