Kenneth Hurley Sr. Software Engineer

Kenneth Hurley Sr. Software Engineer khurley@nvidia.com

NVIDIA Corporation What are the problems we are seeing when 3D engines are written? Misuse of vertex buffers; concurrency limitations; frame rate limiters; non-optimized surface usage; cache misses; data ordering.

NVIDIA Corporation Misuse of Vertex Buffers: Bad things can happen unless you know the “right” way to use a vertex buffer. Dynamic vertex buffers vs. static vertex buffers. When creating the vertex buffer, use D3DVBCAPS_WRITEONLY. Use D3DLOCK_DISCARDCONTENTS. Use D3DLOCK_NOOVERWRITE. Vertex buffer ordering: use ordered vertex buffers because of cache coherency.

NVIDIA Corporation Using Vertex Buffers Correctly

NVIDIA Corporation Example vertex buffer flow:
    CreateVB(WRITEONLY, 1000-12000)
    A: I = 0
    B: Space in VB for M vertices?
         Yes: Lock(NOOVERWRITE)
         No: GOTO C
       Fill in M vertices at index I
       Unlock(); DIPVB(I); I += M; GOTO B
    C: Lock(DISCARDCONTENTS); GOTO A
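The bookkeeping behind this flow can be sketched without any graphics API. The following is a minimal sketch, not the author's code: `LockMode`, `LockRequest`, and `DynamicVbCursor` are hypothetical names, and the enum values are stand-ins for the lock flags named on the slide rather than real Direct3D constants. It only decides which lock mode to request and where to write; the actual lock, fill, unlock, and draw calls are left to the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the two lock modes named on the slide.
// These are NOT the real Direct3D constants.
enum class LockMode { NoOverwrite, DiscardContents };

struct LockRequest {
    LockMode mode;       // which lock mode the caller should request
    std::size_t offset;  // index I of the first vertex to fill
};

// Tracks the write position I inside a vertex buffer of fixed capacity.
class DynamicVbCursor {
public:
    explicit DynamicVbCursor(std::size_t capacity) : capacity_(capacity) {}

    // Reserve room for `count` vertices (the M of the flow above).
    LockRequest reserve(std::size_t count) {
        assert(count <= capacity_);
        LockMode mode = LockMode::NoOverwrite;
        if (next_ + count > capacity_) {
            // No space left: ask for a fresh buffer and restart at index 0.
            mode = LockMode::DiscardContents;
            next_ = 0;
        }
        LockRequest request{mode, next_};
        next_ += count;  // I += M
        return request;
    }

private:
    std::size_t capacity_;
    std::size_t next_ = 0;
};
```

A caller would invoke `reserve(M)` before each batch, lock with the returned mode, write M vertices starting at `offset`, unlock, and draw from that offset. Unlike the flow above, this sketch folds the discard and the restart at index 0 into one step, so each batch takes exactly one lock.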

NVIDIA Corporation Concurrency: Why do I need it? Concurrency helps parallelism between the CPU and the GPU. OK, how do I achieve it? Use NVPAT to see if a “spin lock” is happening. A “spin lock” occurs when the driver has to stall waiting for the hardware to finish with an object. These objects can be vertex buffers or texture surfaces.

NVIDIA Corporation Concurrency (cont.) Use the vertex buffer and texture surface flags so the driver can give you another buffer while the hardware is using the other one.

NVIDIA Corporation Frame Rate Limiters: Can cause concurrency issues. There are better ways to achieve constant frame rates. Makes the effective triangle rate much lower, because the driver has to do some work with vertex data.

NVIDIA Corporation Frame Rate Limiter Problem: Serialization of code loop. Rescheduled for concurrency.
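The difference between a serialized loop and one rescheduled for concurrency can be illustrated with an idealized timing model. This is a sketch under stated assumptions, not a measurement: it assumes the CPU and GPU costs per frame are fixed and that a concurrent loop overlaps them perfectly. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Serialized loop: the CPU waits for the GPU every frame, so the costs add up.
double serializedFrameMs(double cpuMs, double gpuMs) {
    return cpuMs + gpuMs;
}

// Overlapped loop: CPU and GPU work in parallel, so the slower side sets the pace.
double overlappedFrameMs(double cpuMs, double gpuMs) {
    return std::max(cpuMs, gpuMs);
}

// Convert a frame time in milliseconds to frames per second.
double framesPerSecond(double frameMs) {
    assert(frameMs > 0.0);
    return 1000.0 / frameMs;
}
```

With an illustrative 10 ms of CPU work and 10 ms of GPU work per frame, the serialized model gives 20 ms per frame and the overlapped model gives 10 ms per frame.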

NVIDIA Corporation Non-Optimized Surface Usage: Locking a texture before the GPU is finished with it causes concurrency problems by stalling the CPU inside the driver. Typical examples include locking the back buffer to do 2D operations on it. The best solution for this is to use two screen-aligned triangles (a quad) instead and put them directly in the 3D pipeline.
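Building the two screen-aligned triangles is mostly a matter of laying out six vertices. The following is a minimal sketch: `ScreenVertex` and `makeScreenQuad` are hypothetical names, and the vertex layout (screen-space position, reciprocal w, one texture coordinate pair) is an assumption about what a pre-transformed vertex would carry, not a specific Direct3D vertex format.

```cpp
#include <array>
#include <cassert>

// Hypothetical pre-transformed vertex: screen-space position plus texture coordinates.
struct ScreenVertex {
    float x, y, z, rhw;  // position in pixels, depth, reciprocal homogeneous w
    float u, v;          // texture coordinates
};

// Build two triangles (six vertices) covering the pixel rectangle
// [left, right] x [top, bottom], with the texture mapped once across it.
std::array<ScreenVertex, 6> makeScreenQuad(float left, float top,
                                           float right, float bottom) {
    assert(right > left && bottom > top);
    const ScreenVertex topLeft{left, top, 0.0f, 1.0f, 0.0f, 0.0f};
    const ScreenVertex topRight{right, top, 0.0f, 1.0f, 1.0f, 0.0f};
    const ScreenVertex bottomLeft{left, bottom, 0.0f, 1.0f, 0.0f, 1.0f};
    const ScreenVertex bottomRight{right, bottom, 0.0f, 1.0f, 1.0f, 1.0f};
    // Both triangles use the same winding (clockwise as seen on screen).
    return {{topLeft, topRight, bottomLeft,
             bottomLeft, topRight, bottomRight}};
}
```

The 2D image would then be bound as a texture and the six vertices drawn as an ordinary triangle list, so the back buffer is never locked.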

NVIDIA Corporation Cache Misses: Big slowdowns can occur here. CPU cache misses can occur because of the ordering of vertex data. Check these carefully with VTune. The GPU has a vertex cache also. GeForce has a 16-entry cache, but optimal cache use is 10, because 6 triangles can be “in flight” at any given time. GPU vertex cache statistics will be added to NVPAT.
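Until a tool reports GPU vertex cache statistics, the effect of index ordering can be estimated offline. The sketch below counts misses for an index list under the assumption that the vertex cache behaves as a simple FIFO; the slides give only the cache sizes, not the replacement policy, so the FIFO model and the name `countCacheMisses` are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Count vertex cache misses for an index list, modelling the cache as a
// FIFO holding `cacheSize` vertex indices (an assumed replacement policy).
std::size_t countCacheMisses(const std::vector<unsigned>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::deque<unsigned> fifo;
    std::size_t misses = 0;
    for (unsigned index : indices) {
        if (std::find(fifo.begin(), fifo.end(), index) != fifo.end()) {
            continue;  // hit: the vertex is already in the cache
        }
        ++misses;
        fifo.push_back(index);
        if (fifo.size() > cacheSize) {
            fifo.pop_front();  // evict the oldest entry
        }
    }
    return misses;
}
```

Calling it with a cache size of 10, the effective size quoted above, gives a rough miss count that can be compared across different orderings of the same mesh.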

NVIDIA Corporation Vertex Ordering: For best performance, also order vertex data and vertex indices sequentially. This helps both the CPU and the GPU. Out-of-order vertices make the CPU miss its cache more often. They do the same thing to the GPU.
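One simple way to get sequential ordering is to renumber vertices in the order the index list first uses them. This is a sketch of that one technique, not necessarily the method the author had in mind; `orderByFirstUse` and `kUnused` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker for vertices that no index refers to.
const unsigned kUnused = 0xFFFFFFFFu;

// Renumber vertices in order of first use. Rewrites `indices` in place and
// returns a table where remap[oldIndex] is the new index (or kUnused).
std::vector<unsigned> orderByFirstUse(std::vector<unsigned>& indices,
                                      std::size_t vertexCount) {
    std::vector<unsigned> remap(vertexCount, kUnused);
    unsigned next = 0;
    for (unsigned& index : indices) {
        assert(index < vertexCount);
        if (remap[index] == kUnused) {
            remap[index] = next++;  // first time this vertex is referenced
        }
        index = remap[index];
    }
    return remap;
}
```

The caller then copies each referenced vertex from its old slot to `remap[old]` in a new array, so that walking the index list touches vertex memory in mostly ascending order.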

NVIDIA Corporation How do we solve these problems? VTune, GPT, NVPAT.

NVIDIA Corporation VTune 4.5: Will help you optimize your application for the CPU. Works well in conjunction with NVPAT. I personally use the Time-Based Sampling Wizard. VTune is excellent for application-specific analysis. It doesn’t show where in the driver time is spent unless you have symbols for the driver. You almost certainly don’t have driver symbols.

NVIDIA Corporation VTune 4.5 Flare Application

NVIDIA Corporation GPT 3.5: Excellent tool to help you achieve maximum performance. Works on both D3D and OpenGL. Helps with application-to-API slowdowns. Works well in conjunction with VTune and NVPAT. GPT is excellent for application-to-Direct3D/OpenGL analysis. It still can’t tell you what is occurring inside the driver that may be slowing your application down.

NVIDIA Corporation GPT 3.5 (cont.): View of alien world in Half-Life*. Quad view for visual analysis modes.

NVIDIA Corporation NVPAT 1.07: Analyzes interaction with the driver. Works on NVIDIA hardware only. Windows 98/Windows 2000 capable. Hotkey capable. Online help via the F1 function key. Logging. Frame rate display. Natural extension to VTune and GPT.

NVIDIA Corporation NVPAT 1.07: Demo: Flare vs. NewFlare. NVPAT is available free at http://www.nvidia.com/Marketing/Developer/SwDevStaticPages.nsf/pages/StatsDriver (you must be a registered NVIDIA developer).

NVIDIA Corporation VTune DLL SDK: Soon, all these performance tools should be integrated into VTune using the DLL SDK. NVPAT will be integrated into the VTune DLL SDK. The VTune DLL SDK is available from Intel at http://developer.intel.com/vtune/analyzer/vtperfdll and gives you the ability to integrate performance tools into VTune. A common user interface/API means less to learn for developers.

NVIDIA Corporation Action Items: Profile often and early in the process. Use the tools available to you; some are free, the rest are reasonably priced. Architect your engine with concurrency in mind. Ask your tool vendor for enhancements.

NVIDIA Corporation Questions? Comments/Suggestions? Enhancement requests for NVPAT can be sent to statdriver@nvidia.com
