Presentation is loading. Please wait.

Presentation is loading. Please wait.

Performance Tools Jeff Kiel Manager, Developer Performance Tools.

Similar presentations


Presentation on theme: "Performance Tools Jeff Kiel Manager, Developer Performance Tools."— Presentation transcript:

1 Performance Tools Jeff Kiel Manager, Developer Performance Tools

2 © NVIDIA Corporation 2007 Performance Tools Agenda Overview of GPU pipeline and Unified Shader NVIDIA PerfKit 5.0: Driver & GPU Performance Data Instrumented Driver: GPU & driver performance information, GLExpert runtime debugging PerfSDK: Performance data integrated into your application PerfHUD: The Direct3D GPU Performance Accelerator gDEBugger: OpenGL performance analysis and debugging ShaderPerf: Shader program performance

3 © NVIDIA Corporation 2007 GPU Pipelined Architecture (Logical View) Framebuffer Vertex Assembly CPU GPU Blending Rasterizer Vertex Shader GeometryShader Texture Pixel Shader

4 © NVIDIA Corporation 2007 GPU Pipelined Architecture (Logical View) Framebuffer Vertex Assembly CPU GPU Blending Rasterizer Vertex Shader GeometryShader Pixel Shader Texture

5 © NVIDIA Corporation 2007 Common Graphics/GPU Problems New, increasingly complex GPU hardware GPU is a black box Unified shaders changes everything Increasing engine and scene complexity Artists don’t always understand how rendering engines work CPU tuning insufficient (multiple processors, multi-cores) Turn around time for debugging and tuning shaders too long Hard to debug API/pipeline setup issues

6 © NVIDIA Corporation 2007 Unified Shader Tuning No longer “pixel shader bound” GPU balances workload Now just “shader unit bound” Check workload distribution for optimization opportunities Typical optimizations may not work Classic: move calculations from pixels to vertices If #vertices ~= #pixels, no improvement

7 © NVIDIA Corporation 2007 NVIDIA PerfKit 5: The Solution! Instrumented Driver GLExpert PerfHUD PerfSDK PerfAPI Sample Code Helper Classes Documentation Tools NVIDIA Plug-In for Microsoft PIX for Windows gDEBugger 3.1 DevCPL Platforms (x32 & x64) Windows XP & Vista Linux Update Soon!

8 © NVIDIA Corporation 2007 PerfKit Instrumented Driver GLExpert functionality GPU and Driver Performance Counters OpenGL and Direct3D Data exported via NVIDIA API and PDH Simplified Experiments (SimExp) Collect GPU and driver data, retain performance Track intra-frame events & statistics Gather and collate at end of frame Performance overhead 1-2%

9 © NVIDIA Corporation 2007 GLExpert: What is it? Helps eliminate driver/CPU performance issues OpenGL portion of the Instrumented Driver Output to stdout or debugger Different groups/levels of information detail Controlled using environment variables in Linux, DevCPL tab in Windows

10 © NVIDIA Corporation 2007 GLExpert: What is it? Information provided GL Errors: print when raised Software Fallbacks: indicate when the driver is in fall back GPU Programs: errors during compile or link VBOs: show where they reside, mapping details FBOs: print reasons for unsupported configuration Future Enhancements Extensive SLI support Quadro 5600/GeForce 8 Series extensions More detailed pipeline setup messages, buffer object support, fallback information, and more

11 © NVIDIA Corporation 2007 PerfKit: Performance Counter Types SW/Driver Counters: PerfAPI, PDH Raw GPU Counters: PerfAPI, PDH Simplified Experiments: PerfAPI Instrumented GPUs Quadro FX 5600 & 4500 GeForce 8800 GTX, 8600 GT GeForce 7950/7900 GTX & GT GeForce 7800 GTX GeForce 6800 Ultra & GT GeForce 6600

12 © NVIDIA Corporation 2007 OpenGL/Direct3D Driver Counters General FPS ms per frame Driver Driver frame time (total time spent in driver) Driver sleep time (waiting for GPU) Detailed wait timers (kernel, locks, rendering, etc.) Counts Batches, vertices, primitives Triangles and instanced triangles Memory Total used Render targets, textures, buffers

13 © NVIDIA Corporation 2007 GPU Counters vertex_attribute_count culled_primitive_count primitive_count triangle_count vertex_count shaded_pixel_count rop_busy Texture Unit (Filtering) Vertex Assembly Raster / ZCull Raster Operations Frame Buffer (RAM Memory) gpu_idle Vertex Shader GeometryShader Pixel Shader shader_busy vertex, geometry, pixel ratios

14 © NVIDIA Corporation 2007 How do I use PerfKit counters? PerfAPI: Easy integration of PerfKit Real time performance monitoring using GPU and driver counters, round robin sampling Simplified Experiments for single frame analysis PDH: Performance Data Helper for Windows Driver data, GPU counters, and OS information Exposed via Perfmon Good for rapid prototyping PerfSDK: Sample code and helper classes

15 © NVIDIA Corporation 2007 PerfAPI: Real Time // Somewhere in setup NVPMAddCounterByName(“vertex_shader_busy”); NVPMAddCounterByName (“pixel_shader_busy”); NVPMAddCounterByName (“shader_waits_for_texture”); NVPMAddCounterByName (“gpu_idle”); // In your rendering loop, sample using names NVPMSample(NULL, &nNumSamples); NVPMGetCounterValueByName(“vertex_shader_busy”, 0, &nVSEvents, &nVSCycles); NVPMGetCounterValueByName(“pixel_shader_busy”, 0, &nPSEvents, &nPSCycles); NVPMGetCounterValueByName(“shader_waits_for_texture”, 0, &nTexEvents, &nTexCycles); NVPMGetCounterValueByName(“gpu_idle”, 0, &nIdleEvents, &nIdleCycles);

16 © NVIDIA Corporation 2007 PerfAPI: SimExp NVPMAddCounter(“GPU Bottleneck”); NVPMAllocObjects(50); // Set up the experiment, get pass count NVPMBeginExperiment(&nNumPasses); for(int ii = 0; ii < nNumPasses; ++ii) { // Scene setup/clear backbuffer NVPMBeginPass(ii); NVPMBeginObject(0); // Draw calls associated with object 0 and flush NVPMEndObject(0);... NVPMEndPass(ii); // End scene/present/swap buffers } // End experiment and retrieve bottleneck NVPMEndExperiment(); NVPMGetCounterValueByName(“GPU Bottleneck”, 0, &nGPUBneck, &nGPUCycles);

17 © NVIDIA Corporation 2007 PerfHUD: Direct3D debugging and tuning One click bottleneck determination Graphs and debugging tools overlaid on your application 4 screens for targeted analysis Performance Dashboard Debug Console Frame Debugger Frame Profiler Drag and drop application on PerfHUD icon

18 © NVIDIA Corporation 2007 New! PerfHUD 5.0! Interactive model Shader Edit and Continue Render state Modification Configurable Graphs Many more features and usability improvements New technologies Windows Vista & DirectX 10 Quadro 5600 and GeForce 8800 with Unified Shader Architecture

19 © NVIDIA Corporation 2007 Demo: PerfHUD Company of Heroes used with permission from THQ and Relic Entertainment

20 © NVIDIA Corporation 2007 Demo: Performance Dashboard Company of Heroes used with permission from THQ and Relic Entertainment

21 © NVIDIA Corporation 2007 Demo: Performance Dashboard Company of Heroes used with permission from THQ and Relic Entertainment

22 © NVIDIA Corporation 2007 Demo: Performance Dashboard Company of Heroes used with permission from THQ and Relic Entertainment

23 © NVIDIA Corporation 2007 Demo: Frame Debugger Company of Heroes used with permission from THQ and Relic Entertainment

24 © NVIDIA Corporation 2007 Demo: Advanced Frame Debugger

25 © NVIDIA Corporation 2007 Demo: Frame Profiler Company of Heroes used with permission from THQ and Relic Entertainment

26 © NVIDIA Corporation 2007 Frame Profiler One Touch Performance Analysis PerfHUD uses PerfSDK Multiple passes on the scene, sample over 40 performance counters Need to render THE SAME FRAME until all the counters are read Must use time-based animation Do use QueryPerformanceCounter() or timeGetTime() Don’t use RDTSC or throttle frame rate

27 © NVIDIA Corporation 2007 Associated Tools: NVIDIA Plug-In for Microsoft PIX for Windows

28 © NVIDIA Corporation 2007 Graphic Remedy’s gDEBugger OpenGL and OpenGL ES Debugger and Profiler Shorten development time Improve application quality Optimize performance NVIDIA PerfKit and GLExpert integrated Now supports Linux! Windows XP & Vista, x32 & x64 Discounted academic licenses available NVIDIA booth Thursday morning http://www.gremedy.com

29 © NVIDIA Corporation 2007 PerfGraph Open source tool for graphing performance counters Supports PerfKit GPU/Driver signals System performance information (CPU utilization, memory, etc.) Cross platform Windows & Linux

30 © NVIDIA Corporation 2007 Project Status PerfKit 5.0 available now: http://developer.nvidia.com/perfkithttp://developer.nvidia.com/perfkit PerfHUD 5.0 ForceWare Release 100 Driver GeForce 8800 support Windows XP & Vista, 32 and 64 bit Linux 32 and 64 bit PerfSDK/GLExpert in development PerfGraph: www.sourceforge.org\perfgraph Instrumented GPUs Feedback and Support: http://developer.nvidia.com/forumshttp://developer.nvidia.com/forums Quadro FX 5600 & 4500 GeForce 8800 Series GeForce 7950/7900 GTX & GT GeForce 7800 GTX GeForce 6800 Ultra & GT GeForce 6600

31 © NVIDIA Corporation 2007 v2f BumpReflectVS(a2v IN, uniform float4x4 WorldViewProj, uniform float4x4 World, uniform float4x4 ViewIT) { v2f OUT; // Position in screen space. OUT.Position = mul(IN.Position, WorldViewProj); // pass texture coordinates for fetching the normal map OUT.TexCoord.xyz = IN.TexCoord; OUT.TexCoord.w = 1.0; // compute the 4x4 tranform from tangent space to object space float3x3 TangentToObjSpace; // first rows are the tangent and binormal scaled by the bump scale TangentToObjSpace[0] = float3(IN.Tangent.x, IN.Binormal.x, IN.Normal.x); TangentToObjSpace[1] = float3(IN.Tangent.y, IN.Binormal.y, IN.Normal.y); TangentToObjSpace[2] = float3(IN.Tangent.z, IN.Binormal.z, IN.Normal.z); OUT.TexCoord1.x = dot(World[0].xyz, TangentToObjSpace[0]); OUT.TexCoord1.y = dot(World[1].xyz, TangentToObjSpace[0]); OUT.TexCoord1.z = dot(World[2].xyz, TangentToObjSpace[0]); OUT.TexCoord2.x = dot(World[0].xyz, TangentToObjSpace[1]); OUT.TexCoord2.y = dot(World[1].xyz, TangentToObjSpace[1]); OUT.TexCoord2.z = dot(World[2].xyz, TangentToObjSpace[1]); OUT.TexCoord3.x = dot(World[0].xyz, TangentToObjSpace[2]); OUT.TexCoord3.y = dot(World[1].xyz, TangentToObjSpace[2]); OUT.TexCoord3.z = dot(World[2].xyz, TangentToObjSpace[2]); float4 worldPos = mul(IN.Position, World); // compute the eye vector (going from shaded point to eye) in cube space float4 eyeVector = worldPos - ViewIT[3]; // view inv. transpose contains eye position in world space in last row. OUT.TexCoord1.w = eyeVector.x; OUT.TexCoord2.w = eyeVector.y; OUT.TexCoord3.w = eyeVector.z; return OUT; } ///////////////// pixel shader ////////////////// float4 BumpReflectPS(v2f IN, uniform sampler2D NormalMap, uniform samplerCUBE EnvironmentMap, uniform float BumpScale) : COLOR { // fetch the bump normal from the normal map float3 normal = tex2D(NormalMap, IN.TexCoord.xy).xyz * 2.0 - 1.0; normal = normalize(float3(normal.x * BumpScale, normal.y * BumpScale, normal.z)); // transform the bump normal into cube space // then use the transformed normal and eye vector to compute a reflection vector // used to fetch the cube map // (we multiply by 2 only to increase brightness) float3 eyevec = float3(IN.TexCoord1.w, IN.TexCoord2.w, IN.TexCoord3.w); float3 worldNorm; worldNorm.x = dot(IN.TexCoord1.xyz,normal); worldNorm.y = dot(IN.TexCoord2.xyz,normal); worldNorm.z = dot(IN.TexCoord3.xyz,normal); float3 lookup = reflect(eyevec, worldNorm); return texCUBE(EnvironmentMap, lookup); } ShaderPerf 2.0 Inputs: GLSL, Cg, HLSL PS1.x,PS2.x,PS3.x VS1.x,VS2.x, VS3.x !!FP1.0 !!ARBfp1.0 ShaderPerf GPU Arch: Quadro FX series GeForce 8X00, 7X00 GeForce 6X00 & FX Outputs: Resulting assembly codeResulting assembly code # of cycles# of cycles # of temporary registers# of temporary registers Pixel/vertex throughputPixel/vertex throughput Test all fp16 and all fp32Test all fp16 and all fp32

32 © NVIDIA Corporation 2007 ShaderPerf: In your pipeline Test current performance Compare with shader cycle budgets Test optimization opportunities Not just Tex/ALU balance: cycles & throughput Automated regression analysis Integrated in FX Composer 2.0 Artists/TDs code expensive shaders Achieve optimum performance

33 © NVIDIA Corporation 2007 ShaderPerf 2.0 Alpha Supports Direct3D/HLSL GeForce 7, 6, and FX series GPUs ForceWare Release 162 Unified Compiler Improved vertex performance simulation and throughput calculation with branching Multiple drivers from one ShaderPerf Smaller footprint New programmatic interface

34 © NVIDIA Corporation 2007 ShaderPerf 2.0: Beta Full support for Cg and GLSL, vertex and fragment programs Support for Quadro 5600 & GeForce 8 series GPUs Geometry shaders and geometry throughput Fragment program differencing

35 © NVIDIA Corporation 2007 Questions? Stop by our booth for a hands on demo! Online: http://developer.nvidia.com/PerfKit http://developer.nvidia.com/PerfHUD http://developer.nvidia.com/ShaderPerf Feedback and Support: http://developer.nvidia.com/forumshttp://developer.nvidia.com/forums

36 © NVIDIA Corporation 2007 PerfKit 5 The NVIDIA Developer Toolkit GPU Programming Guide ShaderPerf 2 PerfHUD 5 Conference Presentations PerfSDK GLExpert gDEBugger NV PIX Plug-in SDK 10FX Composer 2 Melody Texture Tools 2 Cg Toolkit NVSG Content Creation Software Development PerformanceDocumentation Videos mental mill Artist Edition Books

37 © NVIDIA Corporation 2007 GPU Gems 3 Available Now! SIGGRAPH Bookstore Major Book Retailers Includes chapters from Adobe Systems Apple Crytek Cornell University Electronic Arts Havok Juniper Networks Microsoft SEGA …and many more


Download ppt "Performance Tools Jeff Kiel Manager, Developer Performance Tools."

Similar presentations


Ads by Google