Published by Penelope Claydon. Modified over 10 years ago.
1
Vertex Shader Tricks: New Ways to Use the Vertex Shader to Improve Performance
Bill Bilodeau, Developer Technology Engineer, AMD
2
Topics Covered
- Overview of the DX11 front-end pipeline
- Common bottlenecks
- Advanced vertex shader features
- Vertex shader techniques
- Samples and results
3
DX11 Front-End Pipeline
[Diagram: Input Assembler → Vertex Shader → Hull Shader → Tessellator → Domain Shader → Geometry Shader → Stream Out, all on the graphics hardware; the shader stages use CB, SRV, or UAV resources.]
- VS: vertex data
- HS: control points
- DS: generated vertices
- GS: primitives
- Starting with DX11.1, every stage can write to a UAV
4
Bottlenecks - VS
VS attributes
- Limit outputs to 4 attributes (AMD)
- This applies to all shader stages (except PS)
VS texture fetches
- Too many texture fetches can add latency, especially dependent texture fetches
- Group fetches together for better performance
- Hide latency with ALU instructions
5
Bottlenecks - VS
Use the caches wisely
- Pre-VS cache (hides latency): avoid large vertex formats that waste pre-VS cache space
- Post-VS cache (vertex reuse): DrawIndexed() allows reuse of processed vertices saved in the post-VS cache; vertices with the same index only need to be processed once
[Diagram: Input Assembler → Pre-VS Cache → Vertex Shader → Post-VS Cache]
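The benefit of post-VS reuse can be illustrated with a toy C++ model that counts how often the vertex shader would run. The model assumes a simple FIFO post-VS cache of a chosen size; actual reuse policies are vendor- and generation-specific, so the counts are illustrative rather than a prediction for any GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Toy model (assumption): the post-VS cache is a FIFO of recently shaded
// vertex indices. Real hardware uses vendor-specific reuse schemes.
std::size_t CountVSInvocationsIndexed(const std::vector<std::uint32_t>& indices,
                                      std::size_t cacheSize)
{
    std::deque<std::uint32_t> cache;   // shaded vertex indices, oldest first
    std::size_t invocations = 0;
    for (std::uint32_t index : indices)
    {
        if (std::find(cache.begin(), cache.end(), index) != cache.end())
            continue;                  // cache hit: reuse the shaded vertex
        ++invocations;                 // cache miss: the VS runs for this vertex
        cache.push_back(index);
        if (cache.size() > cacheSize)
            cache.pop_front();         // evict the oldest entry
    }
    return invocations;
}

// Non-indexed Draw() has no index to match on, so every vertex is shaded.
std::size_t CountVSInvocationsNonIndexed(std::size_t vertexCount)
{
    return vertexCount;
}
```

For a quad drawn as two triangles with indices 0, 1, 2, 2, 1, 3, the model shades 4 vertices where non-indexed Draw() would shade 6.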
6
Bottlenecks - GS
- The GS can add or remove primitives
- Adding new primitives requires storing new vertices
- Going off chip to store data can be a bandwidth issue
- Using the GS means another shader stage, which means more competition for shader resources
- Better if you can do everything in the VS
7
Advanced Vertex Shader Features
- SV_VertexID, SV_InstanceID
- UAV output (DX11.1)
- NULL vertex buffer: the VS can create its own vertex data
8
SV_VertexID
Can use the vertex ID to decide what vertex data to fetch: fetch from an SRV, or procedurally create a vertex.

    StructuredBuffer<float3> g_VertexBuffer;   // assumed declaration: vertex data bound as an SRV

    // "VertexShader" is a reserved HLSL keyword, so the entry point needs another name
    VSOut VSMain(uint id : SV_VertexID)
    {
        float3 vertex = g_VertexBuffer[id];
        …
    }

The value of SV_VertexID depends on the draw call. For a non-indexed Draw(), the vertex ID starts at 0 and increments by 1 for every vertex processed by the shader. For DrawIndexed(), the vertex ID is the value of the index in the index buffer for that vertex.
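The two SV_VertexID rules above can be emulated on the CPU. This sketch assumes a start vertex and base vertex of 0 (Draw(n, 0) and DrawIndexed(n, 0, 0)), and stands in for the SRV with a plain array; Float3 and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

// Draw(vertexCount, 0): IDs are 0, 1, 2, ... in submission order.
std::vector<std::uint32_t> VertexIDsForDraw(std::uint32_t vertexCount)
{
    std::vector<std::uint32_t> ids(vertexCount);
    for (std::uint32_t i = 0; i < vertexCount; ++i)
        ids[i] = i;
    return ids;
}

// DrawIndexed(indexCount, 0, 0): each vertex's ID is its value in the index buffer.
std::vector<std::uint32_t> VertexIDsForDrawIndexed(const std::vector<std::uint32_t>& indexBuffer)
{
    return indexBuffer;
}

// Emulates `float3 vertex = g_VertexBuffer[id];` with the SRV as a plain array.
Float3 FetchVertex(const std::vector<Float3>& vertexBuffer, std::uint32_t id)
{
    return vertexBuffer.at(id);
}
```

Because an indexed draw can repeat an ID, the same vertex data is fetched for every occurrence of that index.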
9
UAV Buffers
Write to UAVs from a vertex shader
- New feature in DX11.1 (UAVs at any stage)
- Can be used instead of stream-out for writing vertex data
- Triangle output is not limited to strips
- You can use whatever format you want
- Can output anything useful to a UAV
10
NULL Vertex Buffer
DX11/DX10 allows this:
- Just set the number of vertices in Draw()
- The VS will execute without a vertex buffer bound
Can be used for instancing:
- Call Draw() with the total number of vertices
- Bind mesh and instance data as SRVs
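A minimal sketch of the instancing scheme above, with the vertex shader's work mirrored in C++. It assumes a Draw(meshVertexCount * instanceCount, 0) call, a non-indexed mesh, and a per-instance translation; meshVertices and instanceOffsets are hypothetical stand-ins for the two SRVs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

// What the VS would compute for one SV_VertexID with no vertex buffer bound.
Float3 InstancedVertex(std::uint32_t vertexID,
                       std::uint32_t meshVertexCount,
                       const std::vector<Float3>& meshVertices,     // mesh SRV
                       const std::vector<Float3>& instanceOffsets)  // instance SRV
{
    std::uint32_t instance = vertexID / meshVertexCount;  // which copy of the mesh
    std::uint32_t local    = vertexID % meshVertexCount;  // which vertex within the mesh
    Float3 p = meshVertices[local];
    Float3 o = instanceOffsets[instance];
    return { p.x + o.x, p.y + o.y, p.z + o.z };
}
```

With a 3-vertex mesh, vertex ID 4 resolves to vertex 1 of instance 1.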
11
Vertex Shader Techniques
- Full screen triangle
- Vertex shader instancing
- Merged instancing
- Vertex shader UAVs
12
Full Screen Triangle
For post-processing effects:
- A triangle has better performance than a quad (a quad's two triangles share a diagonal edge, and 2x2 pixel blocks along that edge are shaded by both triangles)
- Fast and easy with VS-generated coordinates
- No IB or VB is necessary
- Something you should be using for full screen effects
Clip space coordinates: (-1, -1, 0), (-1, 3, 0), (3, -1, 0)
13
Full Screen Triangle: C++ code
    // NULL VB and IB: the vertex shader generates the vertices itself
    pd3dImmediateContext->IASetVertexBuffers( 0, 0, NULL, NULL, NULL );
    pd3dImmediateContext->IASetIndexBuffer( NULL, (DXGI_FORMAT)0, 0 );
    pd3dImmediateContext->IASetInputLayout( NULL );

    // Set shaders
    pd3dImmediateContext->VSSetShader( g_pFullScreenVS, NULL, 0 );
    pd3dImmediateContext->PSSetShader( … );
    pd3dImmediateContext->PSSetShaderResources( … );
    pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );

    // Render 3 vertices for the triangle
    pd3dImmediateContext->Draw( 3, 0 );
14
Full Screen Triangle: HLSL Code
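    struct VSOutput { float4 pos : SV_Position; float2 tex : TEXCOORD0; float4 color : COLOR0; };  // assumed layout

    VSOutput VSFullScreenTest(uint id : SV_VertexID)
    {
        VSOutput output;

        // generate clip space position: id 0, 1, 2 -> (-1,-1), (-1,3), (3,-1)
        output.pos.x = (float)(id / 2) * 4.0 - 1.0;
        output.pos.y = (float)(id % 2) * 4.0 - 1.0;
        output.pos.z = 0.0;
        output.pos.w = 1.0;

        // texture coordinates: the visible [-1,1] square maps to [0,1];
        // v is flipped because D3D texture space starts at the top-left
        output.tex.x = (float)(id / 2) * 2.0;
        output.tex.y = 1.0 - (float)(id % 2) * 2.0;

        // color
        output.color = float4(1, 1, 1, 1);
        return output;
    }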
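The vertex-ID-to-position mapping can be checked on the CPU. This sketch assumes positions are generated as (id / 2) * 4 - 1 and (id % 2) * 4 - 1, which yields the three clip-space corners listed for the full screen triangle; ClipPos and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct ClipPos { float x, y; };

// Vertex IDs 0, 1, 2 become (-1,-1), (-1,3), (3,-1) in clip space.
ClipPos FullScreenTrianglePos(std::uint32_t id)
{
    return { static_cast<float>(id / 2) * 4.0f - 1.0f,
             static_cast<float>(id % 2) * 4.0f - 1.0f };
}

// The triangle is bounded by x >= -1, y >= -1 and x + y <= 2. For any point of
// the visible [-1,1] x [-1,1] square, x + y <= 2 holds, so the whole screen is
// covered; the corner (1,1) lies exactly on the hypotenuse.
bool InsideFullScreenTriangle(float x, float y)
{
    return x >= -1.0f && y >= -1.0f && x + y <= 2.0f;
}
```

Everything outside the visible square is clipped away, so the oversized triangle costs nothing extra in pixel work.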
15
VS Instancing: Point Sprites
- Often done in the GS, but can be faster in the VS
- Create an SRV point buffer and bind it to the VS
- Call Draw() or DrawIndexed() to render the full triangle list
- Read the location from the point buffer and expand it to the vertex location in the quad
- Can be used for particles or Bokeh DOF sprites
- Don't use DrawInstanced() for a small mesh
16
Point Sprites: C++ Code
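    // Bind the particle index buffer; no vertex buffer is needed
    pd3dImmediateContext->IASetIndexBuffer( g_pParticleIndexBuffer, DXGI_FORMAT_R32_UINT, 0 );
    pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );

    // 6 indices (two triangles) per particle sprite
    pd3dImmediateContext->DrawIndexed( g_particleCount * 6, 0, 0 );

For indexed draw calls, create an index buffer in which each index is (instance index * vertices per mesh + vertex number within the mesh). Because SV_VertexID is the index value, (vertexID / vertsPerMesh) gives the instance index, and (vertexID % vertsPerMesh) gives the vertex within the mesh, which you can use to look up the vertex. For the particle sprites, vertsPerMesh is 4.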
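A sketch of how such a particle index buffer could be built on the CPU. The corner order (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right) matches the corner expansion in the point sprite HLSL; the 0, 1, 2 / 2, 1, 3 triangle split is an assumption chosen to give clockwise (front-facing by default in D3D) triangles. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each index encodes (particle index * 4 + corner), so SV_VertexID / 4 is the
// particle and SV_VertexID % 4 is the corner of its quad.
std::vector<std::uint32_t> BuildParticleIndexBuffer(std::uint32_t particleCount)
{
    static const std::uint32_t kQuadCorners[6] = { 0, 1, 2, 2, 1, 3 };
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(particleCount) * 6);
    for (std::uint32_t p = 0; p < particleCount; ++p)
        for (std::uint32_t corner : kQuadCorners)
            indices.push_back(p * 4 + corner);
    return indices;
}

// The decoding the vertex shader performs on SV_VertexID.
std::uint32_t ParticleFromVertexID(std::uint32_t id) { return id / 4; }
std::uint32_t CornerFromVertexID(std::uint32_t id)   { return id % 4; }
```

Only 4 unique IDs appear per particle, so the two vertices shared by each quad's triangles can come from the post-VS cache.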
17
Point Sprites: HLSL Code
    VSInstancedParticleDrawOut VSIndexBuffer(uint id : SV_VertexID)
    {
        VSInstancedParticleDrawOut output;

        uint particleIndex = id / 4;   // which particle (4 unique vertices per sprite)
        uint vertexInQuad  = id % 4;   // which corner of the sprite quad

        // calculate the position of the vertex
        float3 position;
        position.x = (vertexInQuad % 2) ? 1.0 : -1.0;
        position.y = (vertexInQuad & 2) ? -1.0 : 1.0;
        position.z = 0.0;
        position.xy *= PARTICLE_RADIUS;

        // transform the view-space corner offset by the inverse view matrix so the
        // sprite faces the camera, then move it to the particle position
        position = mul( position, (float3x3)g_mInvView ) + g_bufPosColor[particleIndex].pos.xyz;
        output.pos = mul( float4(position, 1.0), g_mWorldViewProj );
        output.color = g_bufPosColor[particleIndex].color;

        // texture coordinate
        output.tex.x = (vertexInQuad % 2) ? 1.0 : 0.0;
        output.tex.y = (vertexInQuad & 2) ? 1.0 : 0.0;

        return output;
    }
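The corner expansion in VSIndexBuffer is easy to tabulate on the CPU. This sketch mirrors only the offset and UV selection (before PARTICLE_RADIUS scaling and billboarding); SpriteCorner and ExpandCorner are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct SpriteCorner { float x, y, u, v; };

// Corner 0..3 -> offset in [-1,1] and texture coordinate in [0,1].
SpriteCorner ExpandCorner(std::uint32_t vertexInQuad)
{
    SpriteCorner c;
    c.x = (vertexInQuad % 2) ? 1.0f : -1.0f;   // odd corners on the right
    c.y = (vertexInQuad & 2) ? -1.0f : 1.0f;   // corners 2 and 3 on the bottom
    c.u = (vertexInQuad % 2) ? 1.0f : 0.0f;
    c.v = (vertexInQuad & 2) ? 1.0f : 0.0f;    // v grows downward, as in D3D texture space
    return c;
}
```

For every corner, u = (x + 1) / 2 and v = (1 - y) / 2, so the texture appears upright on the sprite.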
18
Point Sprite Performance
[Chart: point sprite rendering performance on AMD Radeon R9 290X and Nvidia Titan]
19
Point Sprite Performance
- DrawIndexed() is the fastest method
- Draw() is slower but doesn't need an IB
- Don't use DrawInstanced() for creating sprites on either AMD or Nvidia hardware; it is not recommended for meshes with a small number of vertices
20
Merge Instancing
- Combine multiple meshes that can be instanced many times
- Better than normal instancing, which renders only one mesh
- Instance nearby meshes for a smaller bounding box
- Each mesh is a page in the vertex data
- Fixed vertex count for each mesh
- Meshes smaller than the page size use degenerate triangles
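A minimal sketch of packing meshes into fixed-length pages, assuming each mesh is stored as a non-indexed triangle list and the page size is a multiple of 3. Padding with copies of the mesh's last vertex is one way (an assumption, not necessarily the original implementation) to produce the degenerate triangles: three identical vertices have zero area and are discarded by the rasterizer. Vertex and BuildMergedPages are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

// Concatenates meshes into pages of exactly pageSize vertices each.
std::vector<Vertex> BuildMergedPages(const std::vector<std::vector<Vertex>>& meshes,
                                     std::size_t pageSize)
{
    assert(pageSize % 3 == 0);   // whole triangles per page
    std::vector<Vertex> pages;
    pages.reserve(meshes.size() * pageSize);
    for (const auto& mesh : meshes)
    {
        assert(!mesh.empty() && mesh.size() <= pageSize && mesh.size() % 3 == 0);
        pages.insert(pages.end(), mesh.begin(), mesh.end());
        // Pad the rest of the page with the last vertex: degenerate triangles.
        pages.insert(pages.end(), pageSize - mesh.size(), mesh.back());
    }
    return pages;
}
```

Mesh i then starts at vertex i * pageSize, which is what lets the vertex shader locate it with simple arithmetic.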
21
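Merge Instancing
[Figure: mesh instance data (Instance 0 → Mesh Index 2, Instance 1 → Mesh Index 0, …) indexes into mesh vertex data stored as fixed-length pages (Mesh Data 0, Mesh Data 1, Mesh Data 2, …), each holding Vertex 0, Vertex 1, Vertex 2, Vertex 3, …; unused space at the end of a page is filled with degenerate triangles.]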
22
Merged Instancing Using the VS
- Use the vertex ID to look up the mesh to instance
- All meshes are the same size, so (id / SIZE) selects the instance (and, through its instance data, the mesh page), and (id % SIZE) selects the vertex within that page
- Faster than using DrawInstanced()
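The lookup can be sketched in C++ as the vertex shader would perform it after a Draw(instanceCount * pageSize, 0) call with no vertex buffer bound. The InstanceData layout (a mesh index plus a translation) is a hypothetical example; real instance data would typically carry a full transform.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };
struct InstanceData { std::uint32_t meshIndex; float offsetX, offsetY, offsetZ; };

// Mirrors the VS: the vertex ID selects the instance and the vertex in its mesh page.
Vertex MergeInstancedVertex(std::uint32_t vertexID, std::uint32_t pageSize,
                            const std::vector<InstanceData>& instances,  // instance SRV
                            const std::vector<Vertex>& pages)            // mesh vertex SRV
{
    const InstanceData& inst = instances[vertexID / pageSize];   // instance record
    std::uint32_t local = vertexID % pageSize;                    // vertex within the page
    const Vertex& v = pages[inst.meshIndex * pageSize + local];  // offset to the mesh page
    return { v.x + inst.offsetX, v.y + inst.offsetY, v.z + inst.offsetZ };
}
```

Different instances in one draw can point at different mesh pages, which is what distinguishes merge instancing from DrawInstanced() of a single mesh.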
23
Merge Instancing Performance
- Instancing performance test by Cloud Imperium Games for Star Citizen
- Renders 13.5M triangles (~40M vertices)
- The DrawInstanced version calls DrawInstanced() and uses instance data in a vertex buffer
- The soft instancing version uses vertex instancing with Draw() calls and fetches instance data from an SRV
[Chart: frame time (ms) on AMD Radeon R9 290X and Nvidia GTX 780]
24
Vertex Shader UAVs
Random-access read/write in a VS
- Can be used to store transformed vertex data for use in multi-pass algorithms
- Can be used for passing constant attributes between any shader stages (not just from the VS)
25
Skinning to UAV
Skin vertex data, then output to a UAV
- Instance the skinned UAV data multiple times
- Can also be used for non-instanced data
- Multiple passes can reuse the transformed vertex data (e.g., shadow map rendering)
- Performance is about the same as stream-out, but you can do more …
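To make the "skin once, reuse many times" idea concrete, here is a CPU sketch in which a std::vector stands in for the UAV. The skinning method is an assumption (the slide does not specify it): standard linear blend skinning with two bone influences per vertex and bones stored as 3x4 affine matrices. All type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };
struct Bone { float m[3][4]; };   // 3x4 affine matrix: rotation/scale + translation
struct SkinnedVertex { Float3 pos; std::array<std::size_t, 2> bones; std::array<float, 2> weights; };

Float3 Transform(const Bone& b, const Float3& p)
{
    return { b.m[0][0] * p.x + b.m[0][1] * p.y + b.m[0][2] * p.z + b.m[0][3],
             b.m[1][0] * p.x + b.m[1][1] * p.y + b.m[1][2] * p.z + b.m[1][3],
             b.m[2][0] * p.x + b.m[2][1] * p.y + b.m[2][2] * p.z + b.m[2][3] };
}

// Skins every vertex once and writes the result to `out` (the stand-in UAV).
// Later passes, such as shadow map rendering or instanced draws, read `out`
// instead of re-skinning.
void SkinToBuffer(const std::vector<SkinnedVertex>& in, const std::vector<Bone>& bones,
                  std::vector<Float3>& out)
{
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        Float3 acc{0.0f, 0.0f, 0.0f};
        for (std::size_t k = 0; k < 2; ++k)   // weighted sum of the bone transforms
        {
            Float3 t = Transform(bones[in[i].bones[k]], in[i].pos);
            acc.x += in[i].weights[k] * t.x;
            acc.y += in[i].weights[k] * t.y;
            acc.z += in[i].weights[k] * t.z;
        }
        out[i] = acc;
    }
}
```

The skinning cost is paid once per frame, however many passes or instances consume the result.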
26
Bounding Box to UAV
Can calculate and store the bounding box in the VS
- Use a UAV to store the min/max values (6 values)
- InterlockedMin/InterlockedMax determine the min and max of the bounding box
- Atomics require integer values
- Use the stored bounding box in later passes: GPU physics (collision detection), tile-based processing
- If the mesh is being reused many times, calculating the bounding box has little overhead
27
Bounding Box: HLSL Code
    void UAVBBoxSkinVS(VSSkinnedIn input, uint id : SV_VertexID)
    {
        // skin the vertex
        . . .

        // output the max and min for the bounding box;
        // the atomics need integers, so convert to fixed point
        int x = (int)(vSkinned.Pos.x * FLOAT_SCALE);
        int y = (int)(vSkinned.Pos.y * FLOAT_SCALE);
        int z = (int)(vSkinned.Pos.z * FLOAT_SCALE);

        InterlockedMin(g_BBoxUAV[0], x);
        InterlockedMin(g_BBoxUAV[1], y);
        InterlockedMin(g_BBoxUAV[2], z);
        InterlockedMax(g_BBoxUAV[3], x);
        InterlockedMax(g_BBoxUAV[4], y);
        InterlockedMax(g_BBoxUAV[5], z);
    }
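The same atomic min/max scheme can be sketched on the CPU with std::atomic<int>. Two details are assumptions here rather than something the slides show: the FLOAT_SCALE value of 1000, and the reset step, since the min slots must start at INT_MAX and the max slots at INT_MIN before each pass. std::atomic<int> has no fetch_min/fetch_max before C++26, so compare-exchange loops stand in for InterlockedMin/InterlockedMax.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

constexpr float FLOAT_SCALE = 1000.0f;   // hypothetical fixed-point scale

void AtomicMin(std::atomic<int>& target, int value)
{
    int current = target.load();
    // On failure, compare_exchange_weak reloads `current`, and the loop retries.
    while (value < current && !target.compare_exchange_weak(current, value)) {}
}

void AtomicMax(std::atomic<int>& target, int value)
{
    int current = target.load();
    while (value > current && !target.compare_exchange_weak(current, value)) {}
}

// bbox[0..2] = min x, y, z; bbox[3..5] = max x, y, z.
void ResetBBox(std::atomic<int> (&bbox)[6])
{
    for (int i = 0; i < 3; ++i) { bbox[i].store(INT_MAX); bbox[i + 3].store(INT_MIN); }
}

// Called once per skinned vertex, possibly from many threads at once.
void AccumulateBBox(std::atomic<int> (&bbox)[6], float px, float py, float pz)
{
    const int p[3] = { static_cast<int>(px * FLOAT_SCALE),
                       static_cast<int>(py * FLOAT_SCALE),
                       static_cast<int>(pz * FLOAT_SCALE) };
    for (int i = 0; i < 3; ++i) { AtomicMin(bbox[i], p[i]); AtomicMax(bbox[i + 3], p[i]); }
}
```

Dividing the stored integers by FLOAT_SCALE recovers the box in the original units, with precision limited to 1 / FLOAT_SCALE.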
28
Particle System UAV
Single-pass, GPU-only particle system. In the VS:
- Generate sprites for rendering
- Do Euler integration and write the updated particle system state to a UAV
29
Particle System: HLSL Code
    uint particleIndex = id / 4;
    uint vertexInQuad  = id % 4;

    // read the current particle state
    float3 oldPosition = g_bufPosColor[particleIndex].pos.xyz;
    float3 oldVelocity = g_bufPosColor[particleIndex].velocity.xyz;

    // Euler integration to find the new position and velocity
    float3 acceleration = normalize(oldVelocity) * ACCELLERATION;
    float3 newVelocity  = acceleration * g_deltaT + oldVelocity;
    float3 newPosition  = newVelocity * g_deltaT + oldPosition;

    // write the updated particle state to the UAV
    g_particleUAV[particleIndex].pos      = float4(newPosition, 1.0);
    g_particleUAV[particleIndex].velocity = float4(newVelocity, 0.0);

    // Generate sprite vertices
    . . .

Could read and write from the UAV instead of binding an input SRV. Note that all four vertices of a quad update the same particle, so an in-place update must avoid having one invocation read a particle that another has already advanced.
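The integration step can be mirrored on the CPU. As in the shader, the position is advanced with the updated velocity (the semi-implicit, or symplectic, form of Euler). The sketch keeps input and output in separate arrays, like the SRV-in/UAV-out setup, which is what makes the four identical writes per quad harmless. The zero-velocity guard is an addition: HLSL normalize() of a zero vector would divide by zero. Names other than the concepts from the shader are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };
struct Particle { Float3 pos; Float3 velocity; };

// One Euler step; `acceleration` plays the role of ACCELLERATION, `dt` of g_deltaT.
void EulerStep(const std::vector<Particle>& in, std::vector<Particle>& out,
               float acceleration, float dt)
{
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const Float3 v = in[i].velocity;
        const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
        // Accelerate along the direction of travel; no acceleration when at rest.
        const float s = len > 0.0f ? acceleration / len : 0.0f;
        const Float3 nv = { v.x + s * v.x * dt, v.y + s * v.y * dt, v.z + s * v.z * dt };
        out[i].velocity = nv;
        out[i].pos = { in[i].pos.x + nv.x * dt, in[i].pos.y + nv.y * dt, in[i].pos.z + nv.z * dt };
    }
}
```

Swapping `in` and `out` each frame gives the ping-pong pattern that avoids the read-after-write hazard of updating one buffer in place.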
30
Conclusion
- Vertex shader "tricks" can be more efficient than more commonly used methods
- Use SV_VertexID for smarter instancing: sprites, merge instancing
- UAVs add lots of freedom to vertex shaders: bounding box calculation, single-pass VS particle system
31
Demos
- Particle system UAV
- Skinning bbox
32
Acknowledgements
Merge instancing: thanks to
- Emil Persson, "Graphics Gems for Games", SIGGRAPH 2011
- Brendan Jackson, Cloud Imperium
Thanks to
- Nick Thibieroz, AMD
- Raul Aguaviva (particle system UAV), AMD
- Alex Kharlamov, AMD
33
Questions