UE4 Vulkan Updates & Tips

1 UE4 Vulkan Updates & Tips
Rolando Caloca O. Sr. Engine Programmer

2 INTRO
Descriptor Pool Management
Command Buffer Timings
Depth Read / Stencil Write
Dynamic Loader
Uniform Buffers
Future work

4 DESCRIPTOR POOL MANAGEMENT
Original scheme: One big happy pool!
  Simple and quick to implement!
  ‘Reasonable’ limits per type
  If out of entries, just allocate a new pool!

5 DESCRIPTOR POOL MANAGEMENT
Original scheme: One big happy pool!
  Simple and quick to implement!
Issues:
  Hardcoded limits per type
  Wasted memory (how many entries per type?)
  Driver hitches!
  Did I mention hardcoded?

8 DESCRIPTOR POOL MANAGEMENT
Original scheme: One big happy pool!
New scheme: Set of Pools per Pipeline!
  Each Pipeline knows its layout (# of types / pool)
  Recycle after cmd buffer fence passes
But:
  When to free?
  Slow to traverse pool list per PSO
  Too many pools!

10 DESCRIPTOR POOL MANAGEMENT
Original scheme: One big happy pool!
New scheme: Set of Pools per Pipeline!
Newer scheme! Set of Pools per Pipeline layout
  A lot of pipelines have the same layout (hash)
  Allocate from Device/Context
  Acquire them per Command Buffer
  As soon as Cmd Buf fence passes, all can be recycled!
  Free after N unused frames
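
The newer scheme can be sketched roughly as follows. This is a minimal illustration, not the UE4 implementation: `Pool`, `PoolCache`, and all member names are hypothetical, and the pool is a plain struct standing in for a `VkDescriptorPool` so the example runs without a Vulkan device. It shows pools being keyed by layout hash, reused once their command buffer's fence has passed, and freed after a number of unused frames.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a VkDescriptorPool sized for one pipeline layout.
struct Pool {
    std::uint64_t layoutHash = 0;
    std::uint64_t lastUsedFrame = 0;
};

// Hypothetical cache owned by the device/context: idle pools grouped by layout hash.
class PoolCache {
public:
    // Called by a command buffer that needs descriptor sets for a given layout.
    Pool acquire(std::uint64_t layoutHash, std::uint64_t frame) {
        std::vector<Pool>& idle = idle_[layoutHash];
        Pool pool;
        if (!idle.empty()) {
            pool = idle.back();          // reuse a recycled pool with the same layout
            idle.pop_back();
        } else {
            pool.layoutHash = layoutHash;  // real code would create a VkDescriptorPool here
            ++created_;
        }
        pool.lastUsedFrame = frame;
        return pool;
    }

    // Called once the owning command buffer's fence has signaled.
    void recycle(const Pool& pool) { idle_[pool.layoutHash].push_back(pool); }

    // Frees idle pools unused for more than maxIdleFrames; returns how many were freed.
    std::size_t trim(std::uint64_t frame, std::uint64_t maxIdleFrames) {
        std::size_t freed = 0;
        for (auto& entry : idle_) {
            std::vector<Pool>& pools = entry.second;
            for (std::size_t i = pools.size(); i-- > 0;) {
                assert(frame >= pools[i].lastUsedFrame);
                if (frame - pools[i].lastUsedFrame > maxIdleFrames) {
                    // real code would destroy the VkDescriptorPool here
                    pools.erase(pools.begin() + static_cast<std::ptrdiff_t>(i));
                    ++freed;
                }
            }
        }
        return freed;
    }

    std::size_t createdCount() const { return created_; }

private:
    std::unordered_map<std::uint64_t, std::vector<Pool>> idle_;
    std::size_t created_ = 0;
};
```

Because the key is the layout hash rather than the pipeline, two pipelines that share a layout draw from the same list of idle pools.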

12 COMMAND BUFFER TIMING
Original scheme:
  Write GPU timestamp at Begin Frame
  And one at End Frame
  This frame: 8.5ms
[Diagram: Cmd Buffer0, Cmd Buffer1, Cmd Buffer2, Cmd Buffer3, with one Begin Timestamp and one End Timestamp]

13 COMMAND BUFFER TIMING
Original scheme:
  Write GPU timestamp at Begin Frame
  And one at End Frame
  This frame: 8.5ms
Issue: GPU idle between command buffers!

17 COMMAND BUFFER TIMING
Original scheme:
  Write GPU timestamp at Begin Frame
  And one at End Frame
  This frame: 8.5ms
Issue: GPU idle between command buffers!
New scheme! Time each command buffer!
[Diagram: Cmd Buffer0, Cmd Buffer1, Cmd Buffer2, Cmd Buffer3, each with its own Begin and End timestamp]

18 COMMAND BUFFER TIMING
Original scheme:
  Write GPU timestamp at Begin Frame
  And one at End Frame
  This frame: 8.5ms
Issue: GPU idle between command buffers!
New scheme! Time each command buffer!
  GPU idle time: 2.7ms
  Real GPU work time: 5.8ms
  Can add features like Dynamic Resolution TAA!
  Won’t time async/overlapped work correctly
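
A small sketch of the arithmetic behind the per-command-buffer scheme, assuming the begin/end timestamps have already been read back from the GPU. The struct and function names are hypothetical; `nsPerTick` plays the role of `VkPhysicalDeviceLimits::timestampPeriod`. As the slide notes, this only holds when command buffers run back to back without overlap.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Begin/end GPU timestamps (in ticks) written around one command buffer.
struct CmdBufferTiming {
    std::uint64_t beginTicks;
    std::uint64_t endTicks;
};

struct FrameTiming {
    double busyMs;  // sum of the per-command-buffer durations
    double idleMs;  // whole-frame span minus the busy time
};

// Assumes the buffers are listed in submission order and do not overlap.
FrameTiming summarize(const std::vector<CmdBufferTiming>& buffers, double nsPerTick) {
    assert(!buffers.empty());
    std::uint64_t busyTicks = 0;
    for (const CmdBufferTiming& t : buffers) {
        assert(t.endTicks >= t.beginTicks);
        busyTicks += t.endTicks - t.beginTicks;
    }
    // The original scheme measured only this span: first begin to last end.
    const std::uint64_t spanTicks = buffers.back().endTicks - buffers.front().beginTicks;
    assert(spanTicks >= busyTicks);

    const double msPerTick = nsPerTick * 1e-6;
    FrameTiming out;
    out.busyMs = static_cast<double>(busyTicks) * msPerTick;
    out.idleMs = static_cast<double>(spanTicks - busyTicks) * msPerTick;
    return out;
}
```

With made-up timestamps for four command buffers spanning 8.5ms, of which 5.8ms is covered by the buffers themselves, the function reports the remaining 2.7ms as idle time, matching the split shown on the slide.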

19 DEPTH READ STENCIL WRITE
Deferred shading technique(s)
  Set read-only depth/stencil render target
  Stencil write while reading in the shader from depth texture

20 DEPTH READ STENCIL WRITE

23 DEPTH READ STENCIL WRITE
Deferred shading technique(s)
  Set read-only depth/stencil render target
  Stencil write while reading in the shader from depth texture
Issue: Can’t leave D/S in VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
Solution: Change to VK_IMAGE_LAYOUT_GENERAL
Better: VK_KHR_maintenance2 extension!
  New layout: VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_STENCIL_ATTACHMENT_OPTIMAL_KHR
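
The fallback order described above could be expressed as a small selection function. This is only a sketch: the enum is a hypothetical mirror of the three layouts named on the slide (a real renderer would use the `VkImageLayout` values from `vulkan.h`), and `hasMaintenance2` is assumed to reflect whether `VK_KHR_maintenance2` was enabled at device creation.

```cpp
// Hypothetical mirror of the layouts named on the slide. A real renderer would
// use the corresponding VkImageLayout values from vulkan.h instead.
enum class DepthStencilLayout {
    AttachmentOptimal,              // VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
    General,                        // VK_IMAGE_LAYOUT_GENERAL
    DepthReadOnlyStencilAttachment  // VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_STENCIL_ATTACHMENT_OPTIMAL_KHR
};

// Picks the layout for a pass that writes stencil and may also sample depth.
DepthStencilLayout chooseLayout(bool shaderReadsDepth, bool hasMaintenance2) {
    if (!shaderReadsDepth) {
        // Plain depth/stencil rendering: the usual attachment layout is fine.
        return DepthStencilLayout::AttachmentOptimal;
    }
    // Depth is sampled while stencil is written: prefer the dedicated layout,
    // and fall back to the general layout when the extension is missing.
    return hasMaintenance2 ? DepthStencilLayout::DepthReadOnlyStencilAttachment
                           : DepthStencilLayout::General;
}
```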

24 DEPTH READ STENCIL WRITE

30 DYNAMIC LOADER
Initially statically linked with vulkan-1.lib
  (Though Android and Linux required dynamic loading)
After an inspiring tweet…
  Switched all platforms to do dynamic loading!
  One loader/header for all platforms
  Each platform subscribes to special extensions
In the end we couldn’t measure any gains… (Yet?)
But still good clean up!
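
One common way to get "one loader/header for all platforms" is a single macro list of entry points that is expanded once to declare the function pointers and once to load them. The sketch below assumes that approach; it is not taken from the UE4 source. The macro names, `VulkanTable`, `Resolver`, and `loadTable` are hypothetical, every signature is simplified to a generic function pointer, and the resolver stands in for whatever the platform provides (for example `dlsym`, `GetProcAddress`, or `vkGetInstanceProcAddr`).

```cpp
#include <cassert>
#include <functional>

// Generic function-pointer type, playing the role of PFN_vkVoidFunction.
using VoidFn = void (*)();

// Hypothetical resolver: maps an entry-point name to its address, or nullptr.
using Resolver = std::function<VoidFn(const char*)>;

// Entry points shared by all platforms. The names are real Vulkan commands,
// but their signatures are simplified to VoidFn for this sketch.
#define SKETCH_VK_ENTRYPOINTS(X) \
    X(vkCreateInstance)          \
    X(vkDestroyInstance)         \
    X(vkEnumeratePhysicalDevices)

// Extra entry points one platform might subscribe to (Android-style example).
#define SKETCH_VK_PLATFORM_ENTRYPOINTS(X) \
    X(vkCreateAndroidSurfaceKHR)

// One function pointer per listed entry point.
struct VulkanTable {
#define DECLARE_FN(name) VoidFn name = nullptr;
    SKETCH_VK_ENTRYPOINTS(DECLARE_FN)
    SKETCH_VK_PLATFORM_ENTRYPOINTS(DECLARE_FN)
#undef DECLARE_FN
};

// Fills the table; returns false if any entry point could not be resolved.
bool loadTable(VulkanTable& table, const Resolver& resolve) {
    assert(static_cast<bool>(resolve));
    bool ok = true;
#define LOAD_FN(name)            \
    table.name = resolve(#name); \
    ok = ok && (table.name != nullptr);
    SKETCH_VK_ENTRYPOINTS(LOAD_FN)
    SKETCH_VK_PLATFORM_ENTRYPOINTS(LOAD_FN)
#undef LOAD_FN
    return ok;
}
```

Adding an entry point then means touching only the list, and each platform can supply its own extension list without changing the loader code.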

31 RADEON GPU PROFILER BLOOPER
TFW you submit too many Command Buffers...

35 UNIFORM BUFFERS
UE4 uniform buffers end up as HLSL constant buffers
Global shader parameters get packed into arrays
  vu_h: vertex uniforms high-precision (float32)
  pu_m: pixel uniforms medium-precision (float16)
  e.g. ReflectionPlane is at buffer offset 60, 4 bytes long

37 UNIFORM BUFFERS
UE4 uniform buffers end up as HLSL constant buffers
Global shader parameters get packed into arrays
Update each using vkUpdateDescriptorSets()
  Requires updating the full descriptor set:

38 UNIFORM BUFFERS

39 UNIFORM BUFFERS

40 UNIFORM BUFFERS

41 UNIFORM BUFFERS
UE4 uniform buffers end up as HLSL constant buffers
Global shader parameters get packed into arrays
Update each using vkUpdateDescriptorSets()
  Requires updating the full descriptor set: Slow!

44 UNIFORM BUFFERS
UE4 uniform buffers end up as HLSL constant buffers
Global shader parameters get packed into arrays
Update each using vkUpdateDescriptorSets()
But: UBs don’t change that much from draw to draw!
  Check for redundant writes
  And make the global UBs Dynamic!
    One big 16MB ring-buffer
    Always at the same offset in the descriptor set (So never dirty)
    Only the bind offset changes!
    vkCmdBindDescriptorSets(..., PackedGlobalOffsets[Stage])
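
The ring-buffer idea can be illustrated with the CPU-side bookkeeping alone. The sketch below is an assumption-laden illustration rather than the engine's allocator: `UniformRing` and its members are hypothetical, the alignment would in practice come from `VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment`, and wrap-around simply assumes the GPU has finished with the oldest data (a real allocator would track fences). The returned offset is what would be passed through `pDynamicOffsets` to `vkCmdBindDescriptorSets` for a `VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC` binding, so the descriptor set itself never needs rewriting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU-side bookkeeping for one large uniform ring buffer.
class UniformRing {
public:
    UniformRing(std::uint32_t capacity, std::uint32_t alignment)
        : capacity_(capacity), alignment_(alignment) {
        // The alignment must be a non-zero power of two that divides the capacity.
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        assert(capacity % alignment == 0);
    }

    // Reserves `size` bytes and returns the byte offset to use as the dynamic offset.
    std::uint32_t allocate(std::uint32_t size) {
        const std::uint32_t aligned = (size + alignment_ - 1) & ~(alignment_ - 1);
        assert(aligned != 0 && aligned <= capacity_);
        if (head_ + aligned > capacity_) {
            head_ = 0;  // wrap around; assumes the GPU no longer reads the oldest data
        }
        const std::uint32_t offset = head_;
        head_ += aligned;
        return offset;
    }

private:
    std::uint32_t capacity_;
    std::uint32_t alignment_;
    std::uint32_t head_ = 0;
};
```

For the slide's configuration the ring would be constructed with a capacity of 16MB (`16u * 1024u * 1024u`), and each draw would copy its packed globals to the returned offset before binding.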

45 UNIFORM BUFFERS
UE4 uniform buffers end up as HLSL constant buffers
Global shader parameters get packed into arrays
Update each using vkUpdateDescriptorSets()
But: UBs don’t change that much from draw to draw!
  Check for redundant writes
  And make the global UBs Dynamic!
Helped make Vulkan single-threaded perf beat D3D11 on AMD (render thread):
  Infiltrator Editor
    D3D11: 46.83ms / 21.28fps
    Vulkan: 36.75ms / 28.15fps

46 UNIFORM BUFFERS

47 UNIFORM BUFFERS
UE4 uniform buffers end up as HLSL constant buffers
Global shader parameters get packed into arrays
Update each using vkUpdateDescriptorSets()
But: UBs don’t change that much from draw to draw!
  Check for redundant writes
  And make the global UBs Dynamic!
Helped make Vulkan single-threaded perf beat D3D11 on AMD (render thread):
  Infiltrator Editor
    D3D11: 46.83ms / 21.28fps
    Vulkan: 36.75ms / 28.15fps
  Infiltrator Demo, wide city view
    D3D11: 30.36ms / 29.79fps
    Vulkan: 13.62ms / 38.08fps

48 FUTURE & THANKS
Future plans:
  Vulkan 1.1/mGPU support
  Future-proof the Renderer:
    Frame/Render Graph
    More dynamic/configurable
  High-level changes to the RHI API
    Better suited to Vulkan/D3D12/Metal
Thanks!
  AMD Immersive Technology team
  AMD Driver engineers
  Special thanks to Ilya Terentiev

49 BONUS SLIDE
Automate Dynamic Loader
  Codegen for fast Vulkan
  volk - Meta loader for Vulkan API
  Simple Dynamic Vulkan Extension Loader

