UE4 Vulkan Updates & Tips
Rolando Caloca O., Sr. Engine Programmer
INTRO
Descriptor Pool Management
Command Buffer Timings
Depth Read / Stencil Write
Dynamic Loader
Uniform Buffers
Future work
DESCRIPTOR POOL MANAGEMENT
Original scheme: One big happy pool!
  Simple and quick to implement!
  ‘Reasonable’ limits per type
  If out of entries, just alloc a new pool!
  Issues:
    Hardcoded limits per type
    Wasted memory (how many entries per type?)
    Driver hitches!
    Did I mention hardcoded?
New scheme: Set of Pools per Pipeline!
  Each Pipeline knows its layout (# of types / pool)
  Recycle after cmd buffer fence passes
  But:
    When to free?
    Slow to traverse pool list per PSO
    Too many pools!
Newer scheme! Set of Pools per Pipeline layout
  A lot of pipelines have the same layout (hash)
  Allocate from Device/Context
  Acquire them per Command Buffer
  As soon as Cmd Buf fence passes, all can be recycled!
  Free after N unused frames
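The pool-per-layout scheme can be sketched without any Vulkan calls. This is a minimal illustration, not UE4's implementation: `Pool`, `PoolManager`, and all method names are hypothetical, and the comments mark where a real renderer would presumably call `vkCreateDescriptorPool` and `vkDestroyDescriptorPool`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a VkDescriptorPool sized for one layout.
struct Pool {
    uint64_t layoutHash = 0;
    uint32_t lastUsedFrame = 0;
};

// Hypothetical manager: free pools are keyed by layout hash, so pipelines
// that share a layout also share pools.
class PoolManager {
public:
    // Hand a pool to a command buffer, reusing a recycled one when possible.
    std::unique_ptr<Pool> Acquire(uint64_t layoutHash, uint32_t frame) {
        auto& freeList = freePools_[layoutHash];
        std::unique_ptr<Pool> pool;
        if (!freeList.empty()) {
            pool = std::move(freeList.back());
            freeList.pop_back();
        } else {
            pool = std::make_unique<Pool>();  // real code: vkCreateDescriptorPool
            pool->layoutHash = layoutHash;
            ++created_;
        }
        pool->lastUsedFrame = frame;
        return pool;
    }

    // Call once the owning command buffer's fence has signaled.
    void Recycle(std::unique_ptr<Pool> pool) {
        assert(pool != nullptr);
        const uint64_t hash = pool->layoutHash;
        freePools_[hash].push_back(std::move(pool));
    }

    // Destroy free pools that have been unused for more than maxUnusedFrames.
    std::size_t Trim(uint32_t currentFrame, uint32_t maxUnusedFrames) {
        std::size_t freed = 0;
        for (auto& entry : freePools_) {
            auto& list = entry.second;
            for (auto it = list.begin(); it != list.end();) {
                if (currentFrame - (*it)->lastUsedFrame > maxUnusedFrames) {
                    it = list.erase(it);  // real code: vkDestroyDescriptorPool
                    ++freed;
                } else {
                    ++it;
                }
            }
        }
        return freed;
    }

    std::size_t CreatedCount() const { return created_; }

private:
    std::unordered_map<uint64_t, std::vector<std::unique_ptr<Pool>>> freePools_;
    std::size_t created_ = 0;
};
```

Keying on the layout hash rather than the pipeline is what keeps the pool count down: two pipelines with identical layouts draw from the same free list.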
COMMAND BUFFER TIMING
Original scheme:
  Write GPU timestamp at Begin Frame
  And one at End Frame
  This frame: 8.5ms
  [Diagram: Cmd Buffer0, Cmd Buffer1, Cmd Buffer2, Cmd Buffer3; one Begin Timestamp, one End Timestamp]
  Issue: GPU idle between command buffers!
New scheme! Time each command buffer!
  [Diagram: Cmd Buffer0, Cmd Buffer1, Cmd Buffer2, Cmd Buffer3; a Begin and an End timestamp for each]
  GPU idle time: 2.7ms
  Real GPU work time: 5.8ms
  Can add features like Dynamic Resolution TAA!
  Won’t time async/overlapped work correctly
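The arithmetic behind per-command-buffer timing can be sketched as follows. The struct and function names are hypothetical, and the timestamps are assumed to be already converted to nanoseconds (in Vulkan they would typically come from `vkCmdWriteTimestamp` queries scaled by the device's `timestampPeriod`). The sketch assumes command buffers do not overlap, which is the same limitation the slide notes for async work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Begin/end GPU timestamps for one command buffer, in nanoseconds.
struct CmdBufferTiming {
    uint64_t beginNs;
    uint64_t endNs;
};

struct FrameTiming {
    uint64_t spanNs;  // first begin to last end (what the original scheme measured)
    uint64_t busyNs;  // sum of per-command-buffer durations
    uint64_t idleNs;  // gaps between command buffers
};

FrameTiming SummarizeFrame(const std::vector<CmdBufferTiming>& timings) {
    FrameTiming out{0, 0, 0};
    if (timings.empty()) {
        return out;
    }
    uint64_t first = timings.front().beginNs;
    uint64_t last = timings.front().endNs;
    for (const CmdBufferTiming& t : timings) {
        assert(t.endNs >= t.beginNs);
        out.busyNs += t.endNs - t.beginNs;
        first = std::min(first, t.beginNs);
        last = std::max(last, t.endNs);
    }
    out.spanNs = last - first;
    // Overlapped work would make busy exceed span; clamp instead of wrapping.
    out.idleNs = out.spanNs > out.busyNs ? out.spanNs - out.busyNs : 0;
    return out;
}
```

Feeding the busy time rather than the span into a dynamic resolution heuristic avoids lowering resolution because of CPU-side submission gaps.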
DEPTH READ STENCIL WRITE
Deferred shading technique(s)
  Set read-only depth/stencil render target
  Stencil write while reading in the shader from depth texture
Issue: Can’t leave D/S in VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
Solution: Change to VK_IMAGE_LAYOUT_GENERAL
Better: VK_KHR_maintenance2 extension!
  New layout: VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_STENCIL_ATTACHMENT_OPTIMAL_KHR
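The layout decision described in this section can be sketched as a small selection function. The enum and function below are hypothetical stand-ins that mirror the `VkImageLayout` values named on the slides; a real renderer would return the Vulkan constants and check for `VK_KHR_maintenance2` when creating the device.

```cpp
#include <cassert>

// Hypothetical mirror of the three VkImageLayout values discussed above.
enum class Layout {
    DepthStencilAttachmentOptimal,            // VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
    General,                                  // VK_IMAGE_LAYOUT_GENERAL
    DepthReadOnlyStencilAttachmentOptimalKHR  // VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_STENCIL_ATTACHMENT_OPTIMAL_KHR
};

// Pick a layout for the depth/stencil target of a pass that writes stencil.
Layout ChooseDepthStencilLayout(bool depthWrite, bool shaderReadsDepth,
                                bool hasMaintenance2) {
    // This sketch only covers read-only depth when the shader samples it.
    assert(!(depthWrite && shaderReadsDepth));
    if (!shaderReadsDepth) {
        return Layout::DepthStencilAttachmentOptimal;
    }
    // Fall back to the general layout when the extension is unavailable.
    return hasMaintenance2 ? Layout::DepthReadOnlyStencilAttachmentOptimalKHR
                           : Layout::General;
}
```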
DYNAMIC LOADER
Initially statically linked with vulkan-1.lib
  (Though Android and Linux required dynamic loading)
After an inspiring tweet…
Switched all platforms to do dynamic loading!
  One loader/header for all platforms
  Each platform subscribes to special extensions
In the end we couldn’t measure any gains… (Yet?)
  But still good clean up!
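One way to picture "one loader for all platforms, with per-platform extensions" is a shared routine that resolves a common list of entry points plus a platform-supplied list. The sketch below is an assumption about the shape of such a loader, not UE4's code: `Resolver`, `LoadResult`, and `LoadEntryPoints` are hypothetical names, and the resolver stands in for whatever wraps `vkGetInstanceProcAddr` after the platform has opened the Vulkan library.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

using VoidFn = void (*)();

// Hypothetical resolver: maps an entry point name to a pointer, or nullptr.
using Resolver = std::function<VoidFn(const char*)>;

struct LoadResult {
    std::map<std::string, VoidFn> functions;  // resolved entry points
    std::vector<std::string> missing;         // names the resolver did not find
};

// Resolve the shared entry points, then the ones this platform subscribes to
// (for example vkCreateWin32SurfaceKHR or vkCreateAndroidSurfaceKHR).
LoadResult LoadEntryPoints(const Resolver& resolve,
                           const std::vector<std::string>& common,
                           const std::vector<std::string>& platformExtras) {
    assert(static_cast<bool>(resolve));
    LoadResult result;
    auto loadAll = [&](const std::vector<std::string>& names) {
        for (const std::string& name : names) {
            VoidFn fn = resolve(name.c_str());
            if (fn != nullptr) {
                result.functions[name] = fn;
            } else {
                result.missing.push_back(name);
            }
        }
    };
    loadAll(common);
    loadAll(platformExtras);
    return result;
}
```

Reporting missing names instead of failing outright lets each platform decide which entry points are mandatory.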
RADEON GPU PROFILER BLOOPER
TFW you submit too many Command Buffers...
UNIFORM BUFFERS
UE4 uniform buffers end up as HLSL constant buffers
Global shader parameters get packed into arrays
  vu_h: vertex uniforms high-precision (float32)
  pu_m: pixel uniforms medium-precision (float16)
  e.g. ReflectionPlane is on buffer offset 60, 4 bytes long
Update each using vkUpdateDescriptorSets()
  Requires updating the full descriptor set: Slow!
But: UBs don’t change that much from draw to draw!
  Check for redundant writes
  And make the global UBs Dynamic!
    One big 16MB ring-buffer
    Always at the same offset in the descriptor set (So never dirty)
    Only the bind offset changes!
    vkCmdBindDescriptorSets(..., PackedGlobalOffsets[Stage])
Helped make Vulkan single threaded perf beat D3D11 on AMD (render thread):
  Infiltrator Editor
    D3D11: 46.83ms / 21.28fps
    Vulkan: 36.75ms / 28.15fps
  Infiltrator Demo, wide city view
    D3D11: 30.36ms / 29.79fps
    Vulkan: 13.62ms / 38.08fps
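The ring-buffer idea with a redundant-write check can be sketched on the CPU side alone. `UniformRing` and its methods are hypothetical; the returned value plays the role of the dynamic offset that would be passed to `vkCmdBindDescriptorSets` for a `VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC` binding, and the alignment is assumed to come from the device's `minUniformBufferOffsetAlignment` limit. The sketch tracks a single stream of writes, whereas the slides keep one offset per shader stage.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU-side model of one big uniform ring buffer.
class UniformRing {
public:
    UniformRing(uint32_t sizeBytes, uint32_t alignment)
        : storage_(sizeBytes), alignment_(alignment) {
        assert(alignment > 0 && sizeBytes >= alignment);
    }

    // Copy data into the ring and return the dynamic offset to bind.
    // If the bytes match the previous write, reuse its offset and skip the copy.
    uint32_t Write(const void* data, uint32_t size) {
        assert(data != nullptr && size > 0 && size <= storage_.size());
        if (hasLast_ && size == lastSize_ &&
            std::memcmp(&storage_[lastOffset_], data, size) == 0) {
            return lastOffset_;  // redundant write
        }
        uint32_t offset = (head_ + alignment_ - 1) / alignment_ * alignment_;
        if (offset + size > storage_.size()) {
            offset = 0;  // wrap; real code must ensure the GPU is done with this region
        }
        std::memcpy(&storage_[offset], data, size);
        head_ = offset + size;
        lastOffset_ = offset;
        lastSize_ = size;
        hasLast_ = true;
        ++copies_;
        return offset;
    }

    uint32_t CopyCount() const { return copies_; }

private:
    std::vector<uint8_t> storage_;
    uint32_t alignment_;
    uint32_t head_ = 0;
    uint32_t lastOffset_ = 0;
    uint32_t lastSize_ = 0;
    uint32_t copies_ = 0;
    bool hasLast_ = false;
};
```

Because the buffer itself never changes in the descriptor set, only the offset returned by `Write` differs between draws, so the set does not need to be rewritten.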
FUTURE & THANKS
Future plans:
  Vulkan 1.1/mGPU support
  Future-proof the Renderer:
    Frame/Render Graph
    More dynamic/configurable
  High-level changes to the RHI API
    Better suited to Vulkan/D3D12/Metal
Thanks!
  AMD Immersive Technology team
  AMD Driver engineers
  Special thanks to Ilya Terentiev
BONUS SLIDE
Automate Dynamic Loader
  Codegen for fast Vulkan
  volk - Meta loader for Vulkan API
  Simple Dynamic Vulkan Extension Loader