GDC March 1999
Scalability: Advanced D3D Programming
Richard Huddy

Basic Objectives
- To produce the best experience on every user’s machine
- To exploit all of the resources available
- To cope with a broad spread of hardware
- To avoid ‘bottoming out’ during the shelf life of the game / engine

What is a high-end PC?
- A 125+ mega-texel device
- A 125+ mega-pixel device
- A fast CPU (>= 350MHz)
- AGP 2X/4X bus
- Lots of system RAM (>= 64MB)
- Huge frame buffers (16 to 32 MB)
- Multi-texture at low cost

Power Trends
[Chart: CPU speed and fill rate]
Appreciate the absolute values and the ratios.

So what’s the problem?
[Diagram: two timelines, one for second generation hardware and one for third generation hardware, each showing CPU work and graphics work against time between BeginScene() and EndScene(). Caption: “Wow, 10% faster!”]

What can you do to help?
Scalability is the key:
- Run at higher screen resolutions
- Run at higher color depths
- Use more complex rendering techniques on good hardware
- Ship multiple geometry models
- Protect your CPU
- Unlock the frame rate

Higher Screen Resolutions
1) Include direct support for higher resolution modes (uses lots of disk space).
2) Store high resolution art and filter down to produce lower resolution art.
3) Store low resolution art and pixel double:
   - If you have art at 512x384, use it for 1024x768
   - If you have art at 640x480, use it on 1280x1024 (but only use a 1280x960 viewport)
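The pixel-doubling arithmetic in option 3 can be sketched in C. The `Viewport` struct and `pixel_double_viewport` function are hypothetical names, and the sketch assumes whole-number replication with the scaled image centred in the display mode, which the slide does not specify.

```c
/* Hypothetical helper: fit low-resolution art into a larger display mode
   by whole-number pixel replication, centring the result. */
typedef struct {
    int scale;          /* integer replication factor (0 if the art does not fit) */
    int x, y;           /* top-left corner of the viewport in the display mode */
    int width, height;  /* viewport size in display pixels */
} Viewport;

Viewport pixel_double_viewport(int art_w, int art_h, int mode_w, int mode_h)
{
    Viewport vp;
    int sx = mode_w / art_w;   /* largest whole factor that fits horizontally */
    int sy = mode_h / art_h;   /* largest whole factor that fits vertically */

    vp.scale  = sx < sy ? sx : sy;
    vp.width  = art_w * vp.scale;
    vp.height = art_h * vp.scale;
    vp.x = (mode_w - vp.width) / 2;    /* centring is an assumption */
    vp.y = (mode_h - vp.height) / 2;
    return vp;
}
```

For 640x480 art in a 1280x1024 mode this yields a factor of 2 and a 1280x960 viewport, leaving 32 unused rows above and below.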

Higher Color Depths
- Runs at much the same speed but gives the user a much richer experience
- Uses frame buffer memory constructively
- You can re-use the previous 16 bit assets
- The main performance loss in true color is often due to texture management
- But beware the Frame Buffer + Z Buffer depth constraint on Riva TNT
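If 16 bit assets ever need to be expanded on the CPU for a true color mode, the conversion is short. This is only a sketch: it assumes the assets are stored as RGB565, which the slide does not state, and `rgb565_to_argb8888` is a hypothetical name.

```c
#include <stdint.h>

/* Expand one RGB565 pixel (assumed asset format) to opaque ARGB8888.
   The top bits of each channel are replicated into the low bits so that
   full intensity maps to 255 rather than 248 or 252. */
uint32_t rgb565_to_argb8888(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1F;
    uint32_t g = (p >> 5)  & 0x3F;
    uint32_t b =  p        & 0x1F;

    r = (r << 3) | (r >> 2);   /* 5 bits to 8 bits */
    g = (g << 2) | (g >> 4);   /* 6 bits to 8 bits */
    b = (b << 3) | (b >> 2);   /* 5 bits to 8 bits */

    return 0xFF000000u | (r << 16) | (g << 8) | b;
}
```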

Complex Rendering Techniques - I
- Environment Mapping
  – Beware of spending too much CPU on this.
- Dual Texture Lighting
- Bump Mapping
- Use more alpha transparency
  – But see also “Alpha sort issues” later on…
Please try to use the extra fill rate!

Complex Rendering Techniques - II
- Trilinear mipmapping for almost everything
- Use detail textures
- Large textures for extra realism
- 32 bit textures, where it’s a quality win
- Compressed textures, as long as quality is not compromised

Protect your CPU
The big ones:
- __ftol and other ‘type conversion’ nightmares
- sqrt()
  – that’ll be seventy cycles please...
- Reciprocal square root
  – One hundred and nine cycles through the FPU…
- Transform and lighting (more on that later)

Removing __ftol
- Remember that the compiler doesn’t have a choice, but you can check the output
- Write your own inline assembler conversion routine if…
  – You can accept differing rounding rules
- This doesn’t break the optimiser!
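One well-known way to avoid the truncating conversion, shown here in portable C rather than inline assembler, is to add a large constant so that the integer lands in the low bits of the double. This is a sketch and not necessarily the routine the slide has in mind. It assumes IEEE 754 doubles, the default round-to-nearest mode, and inputs well inside the 32-bit integer range; `fast_round_to_int` is a hypothetical name. Note the "differing rounding rules": it rounds to nearest (ties to even) where a C cast truncates.

```c
#include <stdint.h>
#include <string.h>

/* Convert to int by adding 1.5 * 2^52: at that magnitude a double has a
   spacing of exactly 1, so the FPU rounds x to an integer and stores it
   in the low mantissa bits. Assumes IEEE 754 and round-to-nearest. */
int32_t fast_round_to_int(double x)
{
    double   shifted = x + 6755399441055744.0;   /* 2^52 + 2^51 */
    uint64_t bits;
    uint32_t low;
    int32_t  result;

    memcpy(&bits, &shifted, sizeof bits);        /* read the bit pattern safely */
    low = (uint32_t)bits;                        /* low 32 bits hold the integer */
    memcpy(&result, &low, sizeof result);        /* reinterpret as two's complement */
    return result;
}
```

A cast such as `(int)2.6` gives 2, whereas this routine gives 3, so it is only a drop-in replacement where that difference is acceptable.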

Replacement for sqrt()
- Sqrt seems ‘natural’ if you are normalising vectors, calculating environment map coordinates or calculating distances, but it’s sloooow
- Sample code is available from the developer web site or from me directly and will be in future versions of the SDK.
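The sample code the slide mentions is not reproduced in this transcript. As an illustration of the general idea only, the sketch below uses a widely circulated approximation: an integer bit trick for the initial guess followed by one Newton-Raphson step. The function names are hypothetical, the input must be positive and finite, and the result is accurate to roughly 0.2%, which is often enough for normalising vectors.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite x (IEEE 754 floats assumed). */
float approx_rsqrt(float x)
{
    uint32_t i;
    float    y;

    memcpy(&i, &x, sizeof i);
    i = 0x5f3759dfu - (i >> 1);            /* initial guess from the bit pattern */
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y);     /* one Newton-Raphson refinement */
    return y;
}

/* Normalise a 3-component vector in place without calling sqrt(). */
void approx_normalise3(float v[3])
{
    float inv_len = approx_rsqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    v[0] *= inv_len;
    v[1] *= inv_len;
    v[2] *= inv_len;
}
```

Whether this beats the hardware on a given CPU is something to measure rather than assume.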

Saturation Arithmetic (C)
Limiting a floating point number to lie in the range 0.0 to 1.0 inclusive (traditional method):

    if (f < 0.0)
        f = 0.0;
    else if (f > 1.0)
        f = 1.0;

Saturation Arithmetic (Pentium)

    if (*(long *)&f < 0)
        *(long *)&f = 0;
    else if (*(long *)&f > 0x3f800000)
        *(long *)&f = 0x3f800000;

This is faster on a Pentium class processor since the FPU is “non-optimal” (i.e. slow) and the integer unit is much faster.
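The pointer casts on this slide assume that `long` is 32 bits wide and that the compiler tolerates reading a `float` through a `long *`. A sketch of the same integer comparison using `memcpy` and a fixed-width type, which avoids both assumptions, might look like this (`saturate_bits` is a hypothetical name; IEEE 754 single precision is assumed):

```c
#include <stdint.h>
#include <string.h>

/* Clamp f to [0.0, 1.0] by comparing its bit pattern as a signed integer.
   0x3f800000 is the bit pattern of 1.0f; any pattern with the sign bit
   set (a negative number or -0.0) compares below zero. */
float saturate_bits(float f)
{
    int32_t bits;

    memcpy(&bits, &f, sizeof bits);
    if (bits < 0)
        bits = 0;                  /* negative becomes +0.0 */
    else if (bits > 0x3f800000)
        bits = 0x3f800000;         /* above 1.0 becomes 1.0 */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because the comparison is on raw bits, a NaN is clamped as well: to 0.0 if its sign bit is set, otherwise to 1.0.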

Saturation Arithmetic (Pentium II)
Use the “cmov” instructions:

    cmp   [f], 0
    cmovb [f], 0
    cmp   [f], 3f
    cmova [f], 3f

Faster since unpredictable branches are the bottleneck here. Unavailable on a Pentium.
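The listing on this slide is schematic: real CMOVcc instructions take a register destination and no immediate operand, so the value has to be loaded into a register first. In C, the same branch-free intent can be expressed with conditional expressions, which compilers targeting CMOV-capable processors commonly (but not necessarily) compile to conditional moves. A sketch, with `saturate_select` as a hypothetical name and IEEE 754 single precision assumed:

```c
#include <stdint.h>
#include <string.h>

/* Clamp f to [0.0, 1.0] using two selects on the bit pattern.
   Whether these become CMOV instructions is up to the compiler;
   check the generated assembly if it matters. */
float saturate_select(float f)
{
    const int32_t one_bits = 0x3f800000;   /* bit pattern of 1.0f */
    int32_t bits;

    memcpy(&bits, &f, sizeof bits);
    bits = (bits < 0)        ? 0        : bits;   /* sign bit set: clamp to 0.0 */
    bits = (bits > one_bits) ? one_bits : bits;   /* above 1.0: clamp to 1.0 */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```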

Unlock the Frame Rate
- It’s essential that your physics model can run at high refresh rates.
  – At least 100fps
- 30 or 60 fps limits are not acceptable and lead to flat performance on high end hardware
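The slide does not say how to structure the loop. One common approach, sketched here under that caveat, is to run physics in fixed steps driven by an accumulator so that rendering can go as fast as the hardware allows. All names are hypothetical, and the 10 ms step simply mirrors the "at least 100fps" figure.

```c
#include <stdint.h>

#define PHYSICS_STEP_US 10000   /* 10 ms per step, i.e. 100 physics updates per second */

typedef struct {
    int64_t accumulator_us;   /* elapsed time not yet consumed by physics */
    int64_t steps_run;        /* total physics steps executed so far */
} Simulation;

/* Placeholder for the game's real physics update. */
static void physics_step(Simulation *sim)
{
    sim->steps_run++;
}

/* Call once per rendered frame with the measured frame time.
   Returns how many physics steps were run for this frame. */
int advance_simulation(Simulation *sim, int64_t frame_time_us)
{
    int steps = 0;

    sim->accumulator_us += frame_time_us;
    while (sim->accumulator_us >= PHYSICS_STEP_US) {
        physics_step(sim);
        sim->accumulator_us -= PHYSICS_STEP_US;
        steps++;
    }
    return steps;
}
```

A 35 ms frame runs three steps and carries 5 ms over to the next frame, so the frame rate is never capped by the physics rate.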

The Value of Batching
Case specifics:
- The average number of ‘Polys Per Call’ (PPC) to DrawPrimitive was 2.6, producing 40fps
- Removing state changes to raise the average PPC to ~50 produced 58fps
  – Most of the removed state changes were “reasonable”, i.e. not logically redundant
  – The changes did not reduce visual quality at all
  – PPC of 200 is optimal
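The slide does not describe how the state changes were removed in this case. As one illustration of how PPC can be raised, the sketch below sorts draw items by texture so that items sharing a texture can go out in a single call. The `DrawItem` layout and function names are hypothetical, and texture is used as a stand-in for render state in general.

```c
#include <stdlib.h>

typedef struct {
    int texture_id;   /* stand-in for whatever state forces a new call */
    int poly_count;
} DrawItem;

static int compare_by_texture(const void *a, const void *b)
{
    const DrawItem *x = (const DrawItem *)a;
    const DrawItem *y = (const DrawItem *)b;
    return (x->texture_id > y->texture_id) - (x->texture_id < y->texture_id);
}

/* Group items that share a texture so each group can be one draw call. */
void sort_by_texture(DrawItem *items, int n)
{
    qsort(items, (size_t)n, sizeof items[0], compare_by_texture);
}

/* Number of draw calls needed if every change of texture starts a new call. */
int count_draw_calls(const DrawItem *items, int n)
{
    int calls = 0;
    for (int i = 0; i < n; i++)
        if (i == 0 || items[i].texture_id != items[i - 1].texture_id)
            calls++;
    return calls;
}
```

With six 10-poly items alternating between two textures, sorting cuts the call count from six to two and raises the average PPC from 10 to 30.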

Alpha Sort Issues
The “standard” solution is…
1) Draw all non-alpha polys (sort by texture)
2) Draw all alpha polys in back to front order with Z compare enabled and Z update disabled.
This copes with overlapping alpha polys but you can’t sort by texture. (Intersection requires decimation).
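Step 2 of the standard solution needs the alpha polys ordered by distance from the viewer. A minimal sketch of that ordering in C, assuming each poly carries a single representative view-space depth (larger means farther away); the `AlphaPoly` struct and function names are hypothetical:

```c
#include <stdlib.h>

typedef struct {
    float view_depth;   /* representative distance from the camera (assumption) */
    int   texture_id;
} AlphaPoly;

/* Orders farther polys before nearer ones. */
static int farther_first(const void *a, const void *b)
{
    float da = ((const AlphaPoly *)a)->view_depth;
    float db = ((const AlphaPoly *)b)->view_depth;
    return (da < db) - (da > db);
}

/* Sort alpha polys back to front, ready to be drawn with
   Z compare enabled and Z update disabled. */
void sort_back_to_front(AlphaPoly *polys, int n)
{
    qsort(polys, (size_t)n, sizeof polys[0], farther_first);
}
```

Because the order is dictated by depth, polys sharing a texture generally end up apart, which is why this step cannot also be sorted by texture.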

Alpha Sort with Bounding Boxes
When you are ready to draw your alpha polys, draw non-overlapping sets using the sort-by-texture technique as before.
[Diagram: bounding boxes A, B and C inside the viewport]
Here, you can safely draw all of A before any of B or C, but B and C need sorting.
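The test behind this idea is whether the screen-space bounding box of one set of alpha polys overlaps any other. A sketch in C follows; the `ScreenRect` struct and function names are hypothetical, and the layout in the usage note (A isolated, B and C overlapping) is an assumption about the lost diagram.

```c
typedef struct {
    int left, top, right, bottom;   /* right and bottom are exclusive */
} ScreenRect;

/* Returns 1 if the two screen-space rectangles share any pixel. */
int rects_overlap(const ScreenRect *a, const ScreenRect *b)
{
    return a->left < b->right && b->left < a->right &&
           a->top < b->bottom && b->top < a->bottom;
}

/* Returns 1 if set `index` overlaps no other set, so its polys can be
   drawn in any order relative to the others (for example, by texture). */
int set_is_isolated(const ScreenRect *sets, int n, int index)
{
    for (int i = 0; i < n; i++)
        if (i != index && rects_overlap(&sets[index], &sets[i]))
            return 0;
    return 1;
}
```

With A at (0,0)-(10,10), B at (20,20)-(40,40) and C at (30,30)-(50,50), only A is isolated; B and C still have to be depth sorted against each other.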

Geometry - Part 1
- Use the DX6 Transform and Clip engine; it’ll be nearly as fast as your best efforts
- It takes advantage of CPU specific optimisations done by Intel, AMD etc.
- It uses the guard band clipping region to enhance performance
- Use the DX7 interface ASAP

Geometry - Part 2
- This gets you ready for hardware which can do the job much faster than the CPU
- Tell the chip designers if you need anything non-standard
- If you think DX is too slow then use a run-time benchmark to select between DX and your own code
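A run-time benchmark of this kind can be as simple as timing both paths on the user's machine and keeping the faster one. The sketch below uses the standard C `clock()` function; the two stub functions are placeholders for the DX transform path and a custom one, and every name in it is hypothetical.

```c
#include <time.h>

typedef void (*TransformPath)(void);

static volatile float sink;   /* keeps the stub loops from being optimised away */

/* Placeholders standing in for the real DX and custom transform paths. */
static void dx_path_stub(void)     { for (int i = 0; i < 1000; i++) sink += 1.0f; }
static void custom_path_stub(void) { for (int i = 0; i < 1000; i++) sink += 2.0f; }

/* CPU time in seconds taken to run `path` the given number of times. */
double time_path(TransformPath path, int iterations)
{
    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        path();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Prefer the DX path unless the custom path measures strictly faster. */
TransformPath pick_faster(TransformPath dx, TransformPath custom, int iterations)
{
    double dx_time     = time_path(dx, iterations);
    double custom_time = time_path(custom, iterations);
    return custom_time < dx_time ? custom : dx;
}
```

Typical use would be a single call such as `pick_faster(dx_path_stub, custom_path_stub, 100)` at start-up, with representative geometry in place of the stubs. Defaulting to DX on a tie keeps the door open for hardware that later accelerates the DX path.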

Geometry - Part 3
DIPVB()
- Use the DX pipeline for geometry which may be rendered
- Use your own transform for bounding boxes, collisions, portals etc
- Treat hardware T&L as
  – Write only
  – Not necessarily pixel identical to CPU T&L
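Since results cannot be read back from hardware T&L, anything the game logic needs (bounding boxes, collisions, portals) is transformed separately on the CPU. A minimal sketch of such a transform in C, using the Direct3D row-vector convention where the translation sits in the fourth row; the `Vec3` and `Mat4` types and the function name are hypothetical:

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { float m[4][4]; } Mat4;   /* row-major, row-vector convention */

/* Computes [x y z 1] * M and divides by w when the matrix is projective. */
Vec3 transform_point(Vec3 p, const Mat4 *mat)
{
    const float (*m)[4] = mat->m;
    Vec3  out;
    float w;

    out.x = p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + m[3][0];
    out.y = p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + m[3][1];
    out.z = p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + m[3][2];
    w     = p.x * m[0][3] + p.y * m[1][3] + p.z * m[2][3] + m[3][3];

    if (w != 0.0f && w != 1.0f) {   /* skip the divide for affine matrices */
        out.x /= w;
        out.y /= w;
        out.z /= w;
    }
    return out;
}
```

Transforming only the eight corners of a bounding box this way costs far less than transforming the mesh it encloses.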

Geometry - Part 4
- Consider choosing between models at game start-up time
- More complex geometry should be several times more complex
- Introduce some LOD management
- Your artists are probably generating more complex models and then throwing them away
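The slide leaves the form of LOD management open. One simple scheme, sketched here as an assumption rather than the author's method, picks a model by comparing the object's distance against a table of thresholds; `select_lod` and its parameters are hypothetical.

```c
/* Returns the index of the model to draw: 0 is the most detailed.
   max_distance holds lod_count - 1 ascending thresholds; anything beyond
   the last threshold uses the coarsest model. */
int select_lod(float distance, const float *max_distance, int lod_count)
{
    for (int i = 0; i < lod_count - 1; i++)
        if (distance <= max_distance[i])
            return i;
    return lod_count - 1;
}
```

The same table can be scaled once at start-up, for example pushing the thresholds outwards on a fast machine, which ties in with choosing between models when the game starts.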

Lighting - Part 1
- If the DX Lighting model is good enough then there are people who want to help you
- Multi-texture shadow maps and light maps can be very fast now
  – remember that (multi-pass != multi-texture)
- Tell the chip companies what you need

Lighting - Part 2
- Support more lights
- Use a richer set of light types
- Scale with available power
- If you have more complex geometry you get better lighting quality

Summary
- Use the D3D pipeline as much as possible
- ‘Use’ the CPU carefully, ‘abuse’ the fill rate
- Get on board with DX7
- Offer the richest experience possible
- You may have to treat the PC as two distinct platforms, ‘High-end’ and ‘Low-end’

Questions?
Richard Huddy