Presentation is loading. Please wait.

Presentation is loading. Please wait.

SPONSORED BY SA2014.SIGGRAPH.ORG Efficient Real-Time Shading with Many Lights 2014 Many Light Rendering on Mobile Hardware Part 4/4 Presenter: Markus Billeter.

Similar presentations


Presentation on theme: "SPONSORED BY SA2014.SIGGRAPH.ORG Efficient Real-Time Shading with Many Lights 2014 Many Light Rendering on Mobile Hardware Part 4/4 Presenter: Markus Billeter."— Presentation transcript:

1 SPONSORED BY SA2014.SIGGRAPH.ORG Efficient Real-Time Shading with Many Lights 2014 Many Light Rendering on Mobile Hardware Part 4/4 Presenter: Markus Billeter - Visualization and Multimedia Lab, University of Zürich - Chalmers University of Technology

2 SPONSORED BY SA2014.SIGGRAPH.ORG Efficient Real-Time Shading with Many Lights 2014 Overview Introduction: Mobile Hardware What’s different? Different methods Review/Repetition – but with focus on mobile arch. Practical Clustered Forward implementation

3 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Introduction: Mobile Hardware

4 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Mobile vs. Desktop So far considered modern high-end systems How is mobile hardware different? What are the challenges?

5 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Mobile vs. Desktop, II Differences & Challenges: Lower memory bandwidth Fewer features (still) Energy consumption (more) important

6 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Mobile vs. Desktop, II Some of this is improving E.g., features are being “ported” But still noticeable if you target current devices I.e., what’s out there right now.

7 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Mobile vs. Desktop, III Desktop GPUs are mainly Immediate Mode Renderers (IMR) Mobile GPU architecture includes both Immediate Mode Renderers (IMR) Tile Based Renderers (TBR) TBR probably more common than IMR For now?

8 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Immediate Mode Renderer (IMR) “Normal” rendering Geometry transformed and then rasterized immediately Framebuffer resides mainly in VRAM Multiple writes to VRAM

9 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tile Based Renderer (TBR) Framebuffer divided into tiles Geometry binned into tiles Tiles processed independently later

10 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tile Based Renderer (TBR) Framebuffer divided into tiles Geometry binned into tiles Tiles processed independently later The tile’s portion of the framebuffer is kept in fast on-chip memory!

11 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tile Based Renderer (TBR), II Tile’s FB-contents stored to RAM when needed

12 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tile Based Renderer (TBR), II Tile’s FB-contents stored to RAM when needed Best case: Once, when all rendering to that tile has finished

13 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Some Examples By the presented classification: TBR: Mali, PowerVR, Adreno IMR: Tegra (and most desktop GPUs) See “Performance Tuning for Tile-Based Applications”; Bruce Merry, OpenGL Insights, 2012 “Tiled Deferred Blending”; Ramses Ladlani, GPU Pro 5, 2014

14 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Variants Two main variants Tile based immediate mode rendering (TBIMR) Tile based deferred rendering (TBDR)

15 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBIMR Tile based immediate mode rendering Per-tile geometry (after binning) processed in IMR fashion

16 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBIMR Tile based immediate mode rendering Per-tile geometry (after binning) processed in IMR fashion Uses depth buffer for hidden surface removal Possibly with EarlyZ Submit geometry front-to-back Overdraw can (and will) occur

17 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBDR Tile based deferred rendering Hidden surface removal before shading Geometry order shouldn’t matter No overdraw/overshading

18 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Summary Majority of mobile GPUs are TBR At least, for now So, want to pick methods that map well to this

19 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Important Aspects All TBR keep the tile’s portion of the FB in fast on-chip memory.

20 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Important Aspects All TBR keep the tile’s portion of the FB in fast on-chip memory. Make it stay there! Loads/stores to/from RAM use precious memory bandwidth Implications to performance and energy consumption

21 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Goal Goal: keep data in the on-chip buffer when possible True for both TBR variants (TBIMR & TBDR)

22 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Goal Goal: keep data in the on-chip buffer when possible True for both TBR variants (TBIMR & TBDR) Preferably without affecting IMR performance.

23 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Important! Tiled based rendering (TBR) != Tiled Shading

24 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Important! Tiled based rendering (TBR) != Tiled Shading TBR Hardware property (it’s out of your hands) Tiled Shading Software algorithm (your choice) Src: Wikipedia/Exynos, by User:Köf3, CC-SA3

25 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Important! It’s perfectly valid to use Tiled Shading on a TBR platform More on this later

26 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs OpenGL|ES on mobile platforms Either 2.0 or 3.0+

27 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs OpenGL|ES on mobile platforms Either 2.0 or 3.0+ 2.0 still somewhat common Has shader support & framebuffer objects But lacks multiple render targets (MRTs)

28 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs OpenGL|ES on mobile platforms Either 2.0 or 3.0+ 2.0 still somewhat common Has shader support & framebuffer objects But lacks multiple render targets (MRTs) 3.0 getting adopted Has MRTs... … but no renderable floating point color formats

29 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs What about Compute Shaders? Tricky…

30 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs What about Compute Shaders? Tricky… OpenCL not officially included in Android Some devices support it anyway But might not have GL interop even then?

31 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs What about Compute Shaders? Tricky… OpenCL not officially included in Android Some devices support it anyway But might not have GL interop even then? OpenGL|ES 3.1 includes compute shaders Limited availability at the moment?

32 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Introduction - Summary On TBR architectures: keep stuff on-chip In the fast per-tile memory Going off-chip costs in terms of memory bandwidth and power consumption Keep this in mind in the next section discussing different methods

33 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Introduction - Summary Limited feature set Especially if targeting OpenGL|ES 2.0 Can’t rely on compute shaders being available For now.

34 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Many Light Methods Revisited

35 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Many-Light Methods Revisited Ola listed a number of methods in his introduction earlier in the course

36 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Many-Light Methods Revisited Ola listed a number of methods in his introduction earlier in the course Quickly revisit them Look at aspects related to mobile arch I.e., suitability on TBR and w.r.t. API limitations

37 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Methods “Plain” Forward (baseline) Traditional Deferred Tiled/Clustered Deferred Tiled/Clustered Forward “Practical” Clustered (by Emil) On-chip Deferred (new) Forward with on-chip Light Stack (new)

38 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled vs. Clustered I’ll lump tiled shading and clustered shading together for now

39 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled vs. Clustered I’ll lump tiled shading and clustered shading together for now The basic idea is similar enough Pick the one that suits your use case better E.g., tiled may be sufficient for 2D-ish top-down views … as described earlier vs. Just Cause 2 (Avalanche Studios 2010) Star Craft 2 (Blizzard 2010)

40 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Plain” Forward Assign lights per geometry batch Loop over lights in fragment shader

41 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Plain” Forward Assign lights per geometry batch Loop over lights in fragment shader Default method with no extra frills Possible in vanilla OpenGL|ES 2.0 (of course)

42 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Plain” Forward Assign lights per geometry batch Loop over lights in fragment shader Default method with no extra frills Possible in vanilla OpenGL|ES 2.0 (of course) Scales badly with number of lights See Ola’s introduction

43 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Traditional Deferred Render scene to G-Buffers Color Normal Position(Z)

44 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Traditional Deferred Render scene to G-Buffers Render lights using proxy geometry In shader: read G-Buffer, compute lighting, blend results Color Position(Z) Normal

45 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Traditional Deferred Render scene to G-Buffers Render lights using proxy geometry In shader: read G-Buffer, compute lighting, blend results Many reads from G-Buffer, many writes to Framebuffer (one per light) Color Position(Z) Normal

46 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Traditional Deferred Render scene to G-Buffers Render lights using proxy geometry In shader: read G-Buffer, compute lighting, blend results Many reads from G-Buffer, many writes to Framebuffer (one per light) Can get very good light assignment, though Color Position(Z) Normal

47 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Traditional Deferred, II Requires multiple render targets I.e., not supported by vanilla OpenGL|ES 2.0 G-Buffers are expensive w.r.t. memory bandwidth Copied off-chip, so they can be accessed as textures

48 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred Designed to avoid some of the above issues

49 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred Designed to avoid some of the above issues Render scene to G-Buffers Light assignment with depth from G-Buffers 0063064 … 0 1 1 3 4 3 7 1 66 1 67 2 69 2 … … L0L0 L1L1 L2L2 L3L3 L4L4 L5L5 … L6L6 4 … …

50 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred Designed to avoid some of the above issues Render scene to G-Buffers Light assignment with depth from G-Buffers In full-screen pass compute lighting Read G-Buffers once per pixel Write result to framebuffer once per pixel 0063064 … 0 1 1 3 4 3 7 1 66 1 67 2 69 2 … … L0L0 L1L1 L2L2 L3L3 L4L4 L5L5 … L6L6 4 … … + =

51 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred Designed to avoid some of the above issues Render scene to G-Buffers Light assignment with depth from G-Buffers In full-screen pass compute lighting Read G-Buffers once per pixel Write result to framebuffer once per pixel Still requires MRT and off-chip G-Buffers 0063064 … 0 1 1 3 4 3 7 1 66 1 67 2 69 2 … … L0L0 L1L1 L2L2 L3L3 L4L4 L5L5 … L6L6 4 … … + =

52 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred, II Original method relies on compute shaders To identify valid clusters And for light assignment

53 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred, II Original method relies on compute shaders To identify valid clusters And for light assignment For tiled shading Compute per-tile bounds with a fragment shader Perform light assignment on CPU No compute shaders needed (but read-back).

54 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Depth Tiled/Clustered Forward Use pre-pass to determine clusters/tile-Z-bounds Light assignment: per-cluster/per-tile light lists 0063064 … 0 1 1 3 4 3 7 1 66 1 67 2 69 2 … … L0L0 L1L1 L2L2 L3L3 L4L4 L5L5 … L6L6 4 … …

55 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Depth Tiled/Clustered Forward Use pre-pass to determine clusters/tile-Z-bounds Light assignment: per-cluster/per-tile light lists Render scene normally In fragment shader Determine pixel’s cluster/tile Read cluster’s/tile’s light list Tiled Forward a.k.a. Forward+ 0063064 … 0 1 1 3 4 3 7 1 66 1 67 2 69 2 … … L0L0 L1L1 L2L2 L3L3 L4L4 L5L5 … L6L6 4 … …

56 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Forward, II No MRT / G-Buffers!

57 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Forward, II No MRT / G-Buffers! Tiled Forward possible in OpenGL|ES 2.0 A bit messy, though (e.g., read-back) Handy extensions (not strictly necessary): Render-to-depth texture Dynamic looping in fragment shader

58 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Practical Clustered” Clustered variant presented by Emil Up-front light assignment to dense cluster structure on CPU

59 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Practical Clustered” Clustered variant presented by Emil Up-front light assignment to dense cluster structure on CPU Applicable to both deferred and forward Or a mix of them. I will be referring mostly to the forward variant, though.

60 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Practical Clustered”, II Single geometry pass

61 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Practical Clustered”, II Single geometry pass Has potentially issues with overdraw when used with forward only Front-to-back geometry Occlusion culling? … or depth pre-pass (two geometry passes, then).

62 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Practical Clustered”, II Single geometry pass Has potentially issues with overdraw when used with forward only Front-to-back geometry Occlusion culling? … or depth pre-pass (two geometry passes, then). Stays on-chip with TBR!

63 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Deferred with Tile Storage Presented by Sam Martin at SIGGRAPH 2013 Very much like Traditional Deferred But! Uses on-chip storage for the G-Buffer G-Buffer only stored while each tile is being processed

64 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Deferred with Tile Storage, II Relies on OpenGL Extensions EXT_shader_pixel_local_storage ARM_shader_framebuffer_fetch ARM_shader_framebuffer_fetch_depth_stencil

65 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Deferred with Tile Storage, II Relies on OpenGL Extensions EXT_shader_pixel_local_storage ARM_shader_framebuffer_fetch ARM_shader_framebuffer_fetch_depth_stencil Supported by e.g., ARM Mali T6xx But extensions unavailable on e.g. the Nexus 10 or Galaxy Alpha

66 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Deferred with Tile Storage, III EXT_shader_pixel_local_storage gives a small amount of on-chip per-pixel storage Preserved across fragment shader instances But not backed by external RAM A bit finicky E.g., writes to ordinary color output destroys the storage Incompatible with MSAA Read the spec!

67 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Forward with Light Stack Also presented by Sam Martin at SIGGRAPH 13 Perform depth-only pre-pass Splat lights to build per-pixel light lists Stored in per-pixel on-chip storage (EXT_shader_pixel_local_storage) Limited size light lists…

68 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Forward with Light Stack Also presented by Sam Martin at SIGGRAPH 13 Perform depth-only pre-pass Splat lights to build per-pixel light lists Stored in per-pixel on-chip storage (EXT_shader_pixel_local_storage) Limited size light lists… Forward render Loop over lights in the list

69 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Forward with Light Stack Also presented by Sam Martin at SIGGRAPH 13 Perform depth-only pre-pass Splat lights to build per-pixel light lists Stored in per-pixel on-chip storage (EXT_shader_pixel_local_storage) Limited size light lists… Forward render Loop over lights in the list C.f. “Light-Indexed Deferred Rendering”

70 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Forward with Light Stack, II Can perform blending Requires additional space in the per-pixel storage But not MSAA May change in the future With new extensions.

71 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Summary - Comparison (1) Out of box; (2) Blending must be done by hand into per-pixel storage (3) Correctly accounting for MSAA when deriving per-tile bounds can be tricky Plain Fwd Trad. Deferred Clust. Deferred Clust. Forward Practical Forward Deferred Tile Store Forward LightStack Off-chip G-Buffer NoYes No #geometry passes 1112112 Overshading IMR/TBIMR YesNo No: PreZ YesNoNo: PreZ Supports MSAA (1) YesNo Yes(3)YesNo Supports Blending (1) YesNo Yes NoYes(2) GL|ES Any?3.0/MRT 2.0 3.0 + Exts.

72 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Next up: a Practical Clustered Forward implementation Why practical clustered forward?

73 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Initially, Nexus 10 device only supported ES2.0 So, we ended up being limited by that

74 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Summary - Comparison Without OpenGL|ES 3.0 and MRT. Plain Fwd Trad. Deferred Clust. Deferred Clust. Forward Practical Forward Deferred Tile Store Forward LightStack Off-chip G-Buffer NoYes No #geometry passes 1112112 Overshading IMR/TBIMR YesNo No: PreZ YesNoNo: PreZ Supports MSAA (1) YesNo Yes(3)YesNo Supports Blending (1) YesNo Yes NoYes(2) GL|ES Any?3.0/MRT 2.0 3.0 + Exts.

75 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Started off with Tiled Forward Works, but had a lot of issues with depth discontinuities Read-back from GPU to CPU and back was a bottleneck too.

76 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Started off with Tiled Forward Works, but had a lot of issues with depth discontinuities Read-back from GPU to CPU and back was a bottleneck too. Practical Clustered Forward avoids these issues And supports blending and MSAA

77 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Started off with Tiled Forward Works, but had a lot of issues with depth discontinuities Read-back from GPU to CPU and back was a bottleneck too. Practical Clustered Forward avoids these issues And supports blending and MSAA Works on desktop GPU as-is Development & Debugging more fun :-)

78 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward I think it’s a good match for current devices

79 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward I think it’s a good match for current devices Might be worth trying out deferred Especially if you have overshading troubles and a PreZ pass is not viable (high geometric complexity)

80 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward I think it’s a good match for current devices Might be worth trying out deferred Especially if you have overshading troubles and a PreZ pass is not viable (high geometric complexity) Deferred by Martin et al. seems very interesting If you can rely on the extensions being available

81 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 A hypothetical method… Totally untested idea

82 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 A hypothetical method… Totally untested idea Deferred with Tile Storage + Practical Clustered

83 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 A hypothetical method…, II Method overview: GPU: render tile G-buffers CPU: perform up-front light assignment Render full-screen quad to perform shading Optional: render transparent geometry (forward)

84 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 A hypothetical method…, II Method overview: GPU: render tile G-buffers CPU: perform up-front light assignment Render full-screen quad to perform shading Optional: render transparent geometry (forward) Possibly in parallel

85 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 A hypothetical method…, III Pro: No off-chip G-Buffer Single geometry pass No overshading Support for blending No read-back Cons: Rely on (possibly) unavailable extension MSAA trouble (Deferred) … totally untested ;-)

86 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Mobile Implementation

87 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Clustering, Redux Shown a couple of different ways of clustering

88 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Clustering, Redux Shown a couple of different ways of clustering E.g., The original sparse exponential clustering Emil’s up-front dense clustering (Tiling, which is a special case)

89 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Clustering, Redux Shown a couple of different ways of clustering E.g., The original sparse exponential clustering Emil’s up-front dense clustering (Tiling, which is a special case) I’m going to present one more variation.

90 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Up-front clustering assigns lights into dense structure

91 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Up-front clustering assigns lights into dense structure Possibly a ton of clusters Original sparse methods avoids this by only considering occupied clusters

92 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Reduce #clusters.. How? Lower resolution?

93 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Reduce #clusters.. How? Lower resolution? Observation: problematic close to the camera Lot’s of tiny clusters

94 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Reduce #clusters.. How? Lower resolution? Observation: problematic close to the camera Lot’s of tiny clusters A single light source may end up in all of them!

95 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Lot’s of extra work during light assignment With little to no gain Especially problematic if camera moves through light volumes

96 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Lot’s of extra work during light assignment With little to no gain Especially problematic if camera moves through light volumes Discussed previously E.g., move first subdivision in Z back (shown by Emil) Still have a lot of slices in the XY plane, though

97 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Different solution Cascaded clustering

98 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Different solution Cascaded clustering Subdivide frustum into “cascades” and select individual resolution for each

99 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Different solution Cascaded clustering Subdivide frustum into “cascades” and select individual resolution for each

100 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Cascade resolution selected to give Approx. cubical clusters with NxN pixel footprint But enforcing a minimal size of each cluster

101 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Cascade resolution selected to give Approx. cubical clusters with NxN pixel footprint But enforcing a minimal size of each cluster I currently use 12 cascades With 48x48 pixel footprints Minimum size: 0.5 Frustum: near =.1, far = 100 Could use some further tweaking…

102 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Compute cluster ID in shader: 2 steps..

103 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Compute cluster ID in shader: 2 steps.. Compute cascade index

104 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Compute cluster ID in shader: 2 steps.. Compute cascade index Compute cluster ID with cascade data

105 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results

106 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results 65 light sources Approx. 6000 non-empty clusters with on average 4 light sources Worst cluster has 14 lights. Clustering takes approx. 0.75 ms by itself Nexus 10 tablet Without cascades: ~5ms and 40k non-empty clusters

107 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results 65 light sources Approx. 6000 non-empty clusters with on average 4 light sources Worst cluster has 14 lights. Clustering takes approx. 0.75 ms by itself Nexus 10 tablet Without cascades: ~5ms and 40k non-empty clusters 0.4 ms On Samsung Galaxy Alpha

108 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results Average around 40 ms per frame total time 1280 x 720 Worst case: 60 ms per frame E.g., view to the left Use PreZ due to unsorted geometry. I.e. two geometry passes.

109 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results Average around 40 ms per frame total time 1280 x 720 Worst case: 60 ms per frame E.g., view to the left Use PreZ due to unsorted geometry. I.e. two geometry passes. 20 ms 30 ms On Samsung Galaxy Alpha

110 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results

111 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results 192 light sources < 3500 non-empty clusters with on average 3.5 light sources Worst cluster has 20 lights. Rather view-dependent (large-ish scene) Some views have ~500 active clusters only Clustering takes <0.35 ms by itself Samsung Galaxy Alpha

112 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results Average around 30 ms per frame total time 1280 x 720 Worst case: ~35 ms per frame Limited overlap between lights Again, with PreZ

113 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 To conclude…

114 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Conclusion There exist many light rendering methods that are viable on mobile hardware. The “Practical Clustered Forward” seems like a good choice at this point I may be a bit biased, though Other methods with very good potential out there (e.g. On-Chip Deferred) Especially if they become available on “normal” devices ;-) … and don’t need blending/transparency

115 SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 References Tiled Deferred Blending, Ladlani, 2014 The Revolution in Mobile Game Graphics, Martin and Bjørge 2014 Challenges with High Quality Mobile Graphics, Martin et al. 2013 An Evolution of Mobile Graphics, Shebanow 2013 Performance Tuning for Tile-Based Architectures, Merry 2012 Forward+: Bringing deferred lighting to the next level, Harada et al. 2012 Light Indexed Deferred Lighting, Treblico 2007 Our work on Tiled / Clustered Shading: Clustered Deferred and Forward Shading, Olsson et al. 2012 Tiled and Clustered Forward Shading, Olsson et al. 2012 Tiled Forward Shading, Billeter et al. 2013 (Slides –including references- available online later!)


Download ppt "SPONSORED BY SA2014.SIGGRAPH.ORG Efficient Real-Time Shading with Many Lights 2014 Many Light Rendering on Mobile Hardware Part 4/4 Presenter: Markus Billeter."

Similar presentations


Ads by Google