Download presentation
Presentation is loading. Please wait.
Published byRoger Lloyd Modified over 8 years ago
1
SPONSORED BY SA2014.SIGGRAPH.ORG Efficient Real-Time Shading with Many Lights 2014 Many Light Rendering on Mobile Hardware Part 4/4 Presenter: Markus Billeter - Visualization and Multimedia Lab, University of Zürich - Chalmers University of Technology
2
SPONSORED BY SA2014.SIGGRAPH.ORG Efficient Real-Time Shading with Many Lights 2014 Overview Introduction: Mobile Hardware What’s different? Different methods Review/Repetition – but with focus on mobile arch. Practical Clustered Forward implementation
3
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Introduction: Mobile Hardware
4
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Mobile vs. Desktop So far considered modern high-end systems How is mobile hardware different? What are the challenges?
5
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Mobile vs. Desktop, II Differences & Challenges: Lower memory bandwidth Fewer features (still) Energy consumption (more) important
6
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Mobile vs. Desktop, II Some of this is improving E.g., features are being “ported” But still noticeable if you target current devices I.e., what’s out there right now.
7
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Mobile vs. Desktop, III Desktop GPUs are mainly Immediate Mode Renderers (IMR) Mobile GPU architecture includes both Immediate Mode Renderers (IMR) Tile Based Renderers (TBR) TBR probably more common than IMR For now?
8
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Immediate Mode Renderer (IMR) “Normal” rendering Geometry transformed and then rasterized immediately Framebuffer resides mainly in VRAM Multiple writes to VRAM
9
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tile Based Renderer (TBR) Framebuffer divided into tiles Geometry binned into tiles Tiles processed independently later
10
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tile Based Renderer (TBR) Framebuffer divided into tiles Geometry binned into tiles Tiles processed independently later The tile’s portion of the framebuffer is kept in fast on-chip memory!
11
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tile Based Renderer (TBR), II Tile’s FB-contents stored to RAM when needed
12
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tile Based Renderer (TBR), II Tile’s FB-contents stored to RAM when needed Best case: Once, when all rendering to that tile has finished
13
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Some Examples By the presented classification: TBR: Mali, PowerVR, Adreno IMR: Tegra (and most desktop GPUs) See “Performance Tuning for Tile-Based Applications”; Bruce Merry, OpenGL Insights, 2012 “Tiled Deferred Blending”; Ramses Ladlani, GPU Pro 5, 2014
14
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Variants Two main variants Tile based immediate mode rendering (TBIMR) Tile based deferred rendering (TBDR)
15
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBIMR Tile based immediate mode rendering Per-tile geometry (after binning) processed in IMR fashion
16
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBIMR Tile based immediate mode rendering Per-tile geometry (after binning) processed in IMR fashion Uses depth buffer for hidden surface removal Possibly with EarlyZ Submit geometry front-to-back Overdraw can (and will) occur
17
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBDR Tile based deferred rendering Hidden surface removal before shading Geometry order shouldn’t matter No overdraw/overshading
18
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Summary Majority of mobile GPUs are TBR At least, for now So, want to pick methods that map well to this
19
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Important Aspects All TBR keep the tile’s portion of the FB in fast on-chip memory.
20
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Important Aspects All TBR keep the tile’s portion of the FB in fast on-chip memory. Make it stay there! Loads/stores to/from RAM use precious memory bandwidth Implications to performance and energy consumption
21
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Goal Goal: keep data in the on-chip buffer when possible True for both TBR variants (TBIMR & TBDR)
22
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 TBR – Goal Goal: keep data in the on-chip buffer when possible True for both TBR variants (TBIMR & TBDR) Preferably without affecting IMR performance.
23
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Important! Tiled based rendering (TBR) != Tiled Shading
24
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Important! Tiled based rendering (TBR) != Tiled Shading TBR Hardware property (it’s out of your hands) Tiled Shading Software algorithm (your choice) Src: Wikipedia/Exynos, by User:Köf3, CC-SA3
25
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Important! It’s perfectly valid to use Tiled Shading on a TBR platform More on this later
26
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs OpenGL|ES on mobile platforms Either 2.0 or 3.0+
27
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs OpenGL|ES on mobile platforms Either 2.0 or 3.0+ 2.0 still somewhat common Has shader support & framebuffer objects But lacks multiple render targets (MRTs)
28
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs OpenGL|ES on mobile platforms Either 2.0 or 3.0+ 2.0 still somewhat common Has shader support & framebuffer objects But lacks multiple render targets (MRTs) 3.0 getting adopted Has MRTs... … but no renderable floating point color formats
29
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs What about Compute Shaders? Tricky…
30
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs What about Compute Shaders? Tricky… OpenCL not officially included in Android Some devices support it anyway But might not have GL interop even then?
31
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Features & APIs What about Compute Shaders? Tricky… OpenCL not officially included in Android Some devices support it anyway But might not have GL interop even then? OpenGL|ES 3.1 includes compute shaders Limited availability at the moment?
32
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Introduction - Summary On TBR architectures: keep stuff on-chip In the fast per-tile memory Going off-chip costs in terms of memory bandwidth and power consumption Keep this in mind in the next section discussing different methods
33
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Introduction - Summary Limited feature set Especially if targeting OpenGL|ES 2.0 Can’t rely on compute shaders being available For now.
34
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Many Light Methods Revisited
35
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Many-Light Methods Revisited Ola listed a number of methods in his introduction earlier in the course
36
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Many-Light Methods Revisited Ola listed a number of methods in his introduction earlier in the course Quickly revisit them Look at aspects related to mobile arch I.e., suitability on TBR and w.r.t. API limitations
37
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Methods “Plain” Forward (baseline) Traditional Deferred Tiled/Clustered Deferred Tiled/Clustered Forward “Practical” Clustered (by Emil) On-chip Deferred (new) Forward with on-chip Light Stack (new)
38
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled vs. Clustered I’ll lump tiled shading and clustered shading together for now
39
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled vs. Clustered I’ll lump tiled shading and clustered shading together for now The basic idea is similar enough Pick the one that suits your use case better E.g., tiled may be sufficient for 2D-ish top-down views … as described earlier vs. Just Cause 2 (Avalanche Studios 2010) Star Craft 2 (Blizzard 2010)
40
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Plain” Forward Assign lights per geometry batch Loop over lights in fragment shader
41
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Plain” Forward Assign lights per geometry batch Loop over lights in fragment shader Default method with no extra frills Possible in vanilla OpenGL|ES 2.0 (of course)
42
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Plain” Forward Assign lights per geometry batch Loop over lights in fragment shader Default method with no extra frills Possible in vanilla OpenGL|ES 2.0 (of course) Scales badly with number of lights See Ola’s introduction
43
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Traditional Deferred Render scene to G-Buffers Color Normal Position(Z)
44
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Traditional Deferred Render scene to G-Buffers Render lights using proxy geometry In shader: read G-Buffer, compute lighting, blend results Color Position(Z) Normal
45
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Traditional Deferred Render scene to G-Buffers Render lights using proxy geometry In shader: read G-Buffer, compute lighting, blend results Many reads from G-Buffer, many writes to Framebuffer (one per light) Color Position(Z) Normal
46
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Traditional Deferred Render scene to G-Buffers Render lights using proxy geometry In shader: read G-Buffer, compute lighting, blend results Many reads from G-Buffer, many writes to Framebuffer (one per light) Can get very good light assignment, though Color Position(Z) Normal
47
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Traditional Deferred, II Requires multiple render targets I.e., not supported by vanilla OpenGL|ES 2.0 G-Buffers are expensive w.r.t. memory bandwidth Copied off-chip, so they can be accessed as textures
48
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred Designed to avoid some of the above issues
49
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred Designed to avoid some of the above issues Render scene to G-Buffers Light assignment with depth from G-Buffers 0063064 … 0 1 1 3 4 3 7 1 66 1 67 2 69 2 … … L0L0 L1L1 L2L2 L3L3 L4L4 L5L5 … L6L6 4 … …
50
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred Designed to avoid some of the above issues Render scene to G-Buffers Light assignment with depth from G-Buffers In full-screen pass compute lighting Read G-Buffers once per pixel Write result to framebuffer once per pixel 0063064 … 0 1 1 3 4 3 7 1 66 1 67 2 69 2 … … L0L0 L1L1 L2L2 L3L3 L4L4 L5L5 … L6L6 4 … … + =
51
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred Designed to avoid some of the above issues Render scene to G-Buffers Light assignment with depth from G-Buffers In full-screen pass compute lighting Read G-Buffers once per pixel Write result to framebuffer once per pixel Still requires MRT and off-chip G-Buffers 0063064 … 0 1 1 3 4 3 7 1 66 1 67 2 69 2 … … L0L0 L1L1 L2L2 L3L3 L4L4 L5L5 … L6L6 4 … … + =
52
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred, II Original method relies on compute shaders To identify valid clusters And for light assignment
53
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Deferred, II Original method relies on compute shaders To identify valid clusters And for light assignment For tiled shading Compute per-tile bounds with a fragment shader Perform light assignment on CPU No compute shaders needed (but read-back).
54
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Depth Tiled/Clustered Forward Use pre-pass to determine clusters/tile-Z-bounds Light assignment: per-cluster/per-tile light lists 0063064 … 0 1 1 3 4 3 7 1 66 1 67 2 69 2 … … L0L0 L1L1 L2L2 L3L3 L4L4 L5L5 … L6L6 4 … …
55
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Depth Tiled/Clustered Forward Use pre-pass to determine clusters/tile-Z-bounds Light assignment: per-cluster/per-tile light lists Render scene normally In fragment shader Determine pixel’s cluster/tile Read cluster’s/tile’s light list Tiled Forward a.k.a. Forward+ 0063064 … 0 1 1 3 4 3 7 1 66 1 67 2 69 2 … … L0L0 L1L1 L2L2 L3L3 L4L4 L5L5 … L6L6 4 … …
56
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Forward, II No MRT / G-Buffers!
57
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Tiled/Clustered Forward, II No MRT / G-Buffers! Tiled Forward possible in OpenGL|ES 2.0 A bit messy, though (e.g., read-back) Handy extensions (not strictly necessary): Render-to-depth texture Dynamic looping in fragment shader
58
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Practical Clustered” Clustered variant presented by Emil Up-front light assignment to dense cluster structure on CPU
59
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Practical Clustered” Clustered variant presented by Emil Up-front light assignment to dense cluster structure on CPU Applicable to both deferred and forward Or a mix of them. I will be referring mostly to the forward variant, though.
60
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Practical Clustered”, II Single geometry pass
61
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Practical Clustered”, II Single geometry pass Has potentially issues with overdraw when used with forward only Front-to-back geometry Occlusion culling? … or depth pre-pass (two geometry passes, then).
62
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 “Practical Clustered”, II Single geometry pass Has potentially issues with overdraw when used with forward only Front-to-back geometry Occlusion culling? … or depth pre-pass (two geometry passes, then). Stays on-chip with TBR!
63
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Deferred with Tile Storage Presented by Sam Martin at SIGGRAPH 2013 Very much like Traditional Deferred But! Uses on-chip storage for the G-Buffer G-Buffer only stored while each tile is being processed
64
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Deferred with Tile Storage, II Relies on OpenGL Extensions EXT_shader_pixel_local_storage ARM_shader_framebuffer_fetch ARM_shader_framebuffer_fetch_depth_stencil
65
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Deferred with Tile Storage, II Relies on OpenGL Extensions EXT_shader_pixel_local_storage ARM_shader_framebuffer_fetch ARM_shader_framebuffer_fetch_depth_stencil Supported by e.g., ARM Mali T6xx But extensions unavailable on e.g. the Nexus 10 or Galaxy Alpha
66
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Deferred with Tile Storage, III EXT_shader_pixel_local_storage gives a small amount of on-chip per-pixel storage Preserved across fragment shader instances But not backed by external RAM A bit finicky E.g., writes to ordinary color output destroys the storage Incompatible with MSAA Read the spec!
67
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Forward with Light Stack Also presented by Sam Martin at SIGGRAPH 13 Perform depth-only pre-pass Splat lights to build per-pixel light lists Stored in per-pixel on-chip storage (EXT_shader_pixel_local_storage) Limited size light lists…
68
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Forward with Light Stack Also presented by Sam Martin at SIGGRAPH 13 Perform depth-only pre-pass Splat lights to build per-pixel light lists Stored in per-pixel on-chip storage (EXT_shader_pixel_local_storage) Limited size light lists… Forward render Loop over lights in the list
69
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Forward with Light Stack Also presented by Sam Martin at SIGGRAPH 13 Perform depth-only pre-pass Splat lights to build per-pixel light lists Stored in per-pixel on-chip storage (EXT_shader_pixel_local_storage) Limited size light lists… Forward render Loop over lights in the list C.f. “Light-Indexed Deferred Rendering”
70
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Forward with Light Stack, II Can perform blending Requires additional space in the per-pixel storage But not MSAA May change in the future With new extensions.
71
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Summary - Comparison (1) Out of box; (2) Blending must be done by hand into per-pixel storage (3) Correctly accounting for MSAA when deriving per-tile bounds can be tricky Plain Fwd Trad. Deferred Clust. Deferred Clust. Forward Practical Forward Deferred Tile Store Forward LightStack Off-chip G-Buffer NoYes No #geometry passes 1112112 Overshading IMR/TBIMR YesNo No: PreZ YesNoNo: PreZ Supports MSAA (1) YesNo Yes(3)YesNo Supports Blending (1) YesNo Yes NoYes(2) GL|ES Any?3.0/MRT 2.0 3.0 + Exts.
72
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Next up: a Practical Clustered Forward implementation Why practical clustered forward?
73
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Initially, Nexus 10 device only supported ES2.0 So, we ended up being limited by that
74
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Summary - Comparison Without OpenGL|ES 3.0 and MRT. Plain Fwd Trad. Deferred Clust. Deferred Clust. Forward Practical Forward Deferred Tile Store Forward LightStack Off-chip G-Buffer NoYes No #geometry passes 1112112 Overshading IMR/TBIMR YesNo No: PreZ YesNoNo: PreZ Supports MSAA (1) YesNo Yes(3)YesNo Supports Blending (1) YesNo Yes NoYes(2) GL|ES Any?3.0/MRT 2.0 3.0 + Exts.
75
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Started off with Tiled Forward Works, but had a lot of issues with depth discontinuities Read-back from GPU to CPU and back was a bottleneck too.
76
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Started off with Tiled Forward Works, but had a lot of issues with depth discontinuities Read-back from GPU to CPU and back was a bottleneck too. Practical Clustered Forward avoids these issues And supports blending and MSAA
77
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Started off with Tiled Forward Works, but had a lot of issues with depth discontinuities Read-back from GPU to CPU and back was a bottleneck too. Practical Clustered Forward avoids these issues And supports blending and MSAA Works on desktop GPU as-is Development & Debugging more fun :-)
78
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward I think it’s a good match for current devices
79
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward I think it’s a good match for current devices Might be worth trying out deferred Especially if you have overshading troubles and a PreZ pass is not viable (high geometric complexity)
80
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward I think it’s a good match for current devices Might be worth trying out deferred Especially if you have overshading troubles and a PreZ pass is not viable (high geometric complexity) Deferred by Martin et al. seems very interesting If you can rely on the extensions being available
81
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 A hypothetical method… Totally untested idea
82
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 A hypothetical method… Totally untested idea Deferred with Tile Storage + Practical Clustered
83
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 A hypothetical method…, II Method overview: GPU: render tile G-buffers CPU: perform up-front light assignment Render full-screen quad to perform shading Optional: render transparent geometry (forward)
84
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 A hypothetical method…, II Method overview: GPU: render tile G-buffers CPU: perform up-front light assignment Render full-screen quad to perform shading Optional: render transparent geometry (forward) Possibly in parallel
85
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 A hypothetical method…, III Pro: No off-chip G-Buffer Single geometry pass No overshading Support for blending No read-back Cons: Rely on (possibly) unavailable extension MSAA trouble (Deferred) … totally untested ;-)
86
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Practical Clustered Forward Mobile Implementation
87
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Clustering, Redux Shown a couple of different ways of clustering
88
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Clustering, Redux Shown a couple of different ways of clustering E.g., The original sparse exponential clustering Emil’s up-front dense clustering (Tiling, which is a special case)
89
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Clustering, Redux Shown a couple of different ways of clustering E.g., The original sparse exponential clustering Emil’s up-front dense clustering (Tiling, which is a special case) I’m going to present one more variation.
90
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Up-front clustering assigns lights into dense structure
91
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Up-front clustering assigns lights into dense structure Possibly a ton of clusters Original sparse methods avoids this by only considering occupied clusters
92
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Reduce #clusters.. How? Lower resolution?
93
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Reduce #clusters.. How? Lower resolution? Observation: problematic close to the camera Lot’s of tiny clusters
94
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Reduce #clusters.. How? Lower resolution? Observation: problematic close to the camera Lot’s of tiny clusters A single light source may end up in all of them!
95
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Lot’s of extra work during light assignment With little to no gain Especially problematic if camera moves through light volumes
96
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 The Problem Lot’s of extra work during light assignment With little to no gain Especially problematic if camera moves through light volumes Discussed previously E.g., move first subdivision in Z back (shown by Emil) Still have a lot of slices in the XY plane, though
97
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Different solution Cascaded clustering
98
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Different solution Cascaded clustering Subdivide frustum into “cascades” and select individual resolution for each
99
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Different solution Cascaded clustering Subdivide frustum into “cascades” and select individual resolution for each
100
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Cascade resolution selected to give Approx. cubical clusters with NxN pixel footprint But enforcing a minimal size of each cluster
101
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Cascade resolution selected to give Approx. cubical clusters with NxN pixel footprint But enforcing a minimal size of each cluster I currently use 12 cascades With 48x48 pixel footprints Minimum size: 0.5 Frustum: near =.1, far = 100 Could use some further tweaking…
102
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Compute cluster ID in shader: 2 steps..
103
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Compute cluster ID in shader: 2 steps.. Compute cascade index
104
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Cascaded Clustering Compute cluster ID in shader: 2 steps.. Compute cascade index Compute cluster ID with cascade data
105
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results
106
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results 65 light sources Approx. 6000 non-empty clusters with on average 4 light sources Worst cluster has 14 lights. Clustering takes approx. 0.75 ms by itself Nexus 10 tablet Without cascades: ~5ms and 40k non-empty clusters
107
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results 65 light sources Approx. 6000 non-empty clusters with on average 4 light sources Worst cluster has 14 lights. Clustering takes approx. 0.75 ms by itself Nexus 10 tablet Without cascades: ~5ms and 40k non-empty clusters 0.4 ms On Samsung Galaxy Alpha
108
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results Average around 40 ms per frame total time 1280 x 720 Worst case: 60 ms per frame E.g., view to the left Use PreZ due to unsorted geometry. I.e. two geometry passes.
109
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results Average around 40 ms per frame total time 1280 x 720 Worst case: 60 ms per frame E.g., view to the left Use PreZ due to unsorted geometry. I.e. two geometry passes. 20 ms 30 ms On Samsung Galaxy Alpha
110
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results
111
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results 192 light sources < 3500 non-empty clusters with on average 3.5 light sources Worst cluster has 20 lights. Rather view-dependent (large-ish scene) Some views have ~500 active clusters only Clustering takes <0.35 ms by itself Samsung Galaxy Alpha
112
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Results Average around 30 ms per frame total time 1280 x 720 Worst case: ~35 ms per frame Limited overlap between lights Again, with PreZ
113
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 To conclude…
114
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 Conclusion There exist many light rendering methods that are viable on mobile hardware. The “Practical Clustered Forward” seems like a good choice at this point I may be a bit biased, though Other methods with very good potential out there (e.g. On-Chip Deferred) Especially if they become available on “normal” devices ;-) … and don’t need blending/transparency
115
SA2014.SIGGRAPH.ORG SPONSORED BYEfficient Real-Time Shading with Many Lights 2014 References Tiled Deferred Blending, Ladlani, 2014 The Revolution in Mobile Game Graphics, Martin and Bjørge 2014 Challenges with High Quality Mobile Graphics, Martin et al. 2013 An Evolution of Mobile Graphics, Shebanow 2013 Performance Tuning for Tile-Based Architectures, Merry 2012 Forward+: Bringing deferred lighting to the next level, Harada et al. 2012 Light Indexed Deferred Lighting, Treblico 2007 Our work on Tiled / Clustered Shading: Clustered Deferred and Forward Shading, Olsson et al. 2012 Tiled and Clustered Forward Shading, Olsson et al. 2012 Tiled Forward Shading, Billeter et al. 2013 (Slides –including references- available online later!)
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.