1
Fragment-Parallel Composite and Filter
Anjul Patney, Stanley Tzeng, and John D. Owens
University of California, Davis
2
Parallelism in Interactive Graphics
Well-expressed in hardware as well as APIs
Consistently growing in degree and expression
– More and more cores on upcoming GPUs
– From programmable shaders to pipelines
We should rethink algorithms to exploit this
This paper provides one example
– Parallelization of the composite and filter stages
3
A Feed-Forward Rendering Pipeline
Primitives → Geometry Processing → Rasterization → Composite → Filter → Pixels
4
Composite & Filter
Input
– Unordered list of fragments
Output
– Pixel colors
Assumption
– No fragments are discarded
[Figure: pixel sample locations]
5
Basic Idea
[Figure: composite workload, annotated "insufficient parallelism" and "irregularity"]
6
Pixel-Parallel Processors
7
Basic Idea
[Figure: fragment-parallel processors; annotations "insufficient parallelism" and "irregularity"]
8
Motivation
Most applications have low depth complexity
– Pixel-level parallelism is sufficient
We are interested in applications with
– Very high depth complexity
– High variation in depth complexity
Further
– Future platforms will demand more parallelism
– High depth complexity can limit pixel-parallelism
9
Motivation
10
Related Work – Order-Independent Transparency (OIT)
Depth-Peeling [Everitt 01]
– One pass per transparent layer
Stencil-Routed A-buffer [Myers & Bavoil 07]
– One pass per 8 depth layers [1]
Bucket Depth-Peeling [Liu et al. 09]
– One pass per up to 32 layers [2]
[1] Maximum MSAA samples per pixel
[2] Maximum render targets
11
Related Work – Order-Independent Transparency (OIT)
OIT using Direct3D 11 [Gruen et al. 10]
– Uses per-pixel fragment linked lists
– Per-pixel sort and composite
Hair Self-Shadowing [Sintorn et al. 09]
– Each fragment computes its contribution
– Assumes constant opacity
12
Related Work – Programmable Rendering Pipelines
RenderAnts [Zhou et al. 09]
– Sort fragments globally
– Per-pixel composite/filter
FreePipe [Liu et al. 10]
– Sort fragments globally
– Per-pixel composite/filter
13
Pixel-Parallel Formulation
[Figure: one thread per subsample; thread IDs j … (j+6) map to subsamples S_j … S_(j+6) of pixels P_i, P_(i+1), P_(i+2); P: pixel, S: subsample]
14
Pixel-Parallel Formulation
Workload size
– Depends on the number of fragments
– Limits the size of the rendering
Degree of parallelism
– Depends on the number of pixels/subpixels
These two may not always correspond
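For contrast, a minimal CPU sketch of what the pixel-parallel formulation computes (illustrative types and names, not the paper's code): one independent composite loop per subsample over that subsample's depth-sorted fragments, so the available parallelism equals the number of subsamples while the work per thread grows with depth complexity.

```cpp
// Minimal CPU sketch of the pixel-parallel formulation (illustrative names, not the
// paper's code): each subsample composites its own front-to-back sorted fragment list.
#include <vector>

struct Fragment { float color[3]; float alpha; };

// Composite one subsample's depth-sorted fragments over a background color.
// Parallelism = number of subsamples; work per call = that subsample's depth complexity.
void compositeSubsample(const std::vector<Fragment>& frags,
                        const float background[3], float out[3])
{
    float transmittance = 1.0f;                    // running product of (1 - alpha)
    out[0] = out[1] = out[2] = 0.0f;
    for (const Fragment& f : frags) {              // front to back
        for (int i = 0; i < 3; ++i)
            out[i] += transmittance * f.alpha * f.color[i];
        transmittance *= (1.0f - f.alpha);
    }
    for (int i = 0; i < 3; ++i)                    // light that reaches the background
        out[i] += transmittance * background[i];
}
```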
15
Fragment-Parallel Formulation
[Figure: one thread per fragment; thread IDs j … (j+23) map to the individual fragments of subsamples S_j … S_(j+6) across pixels P_i, P_(i+1), P_(i+2); P: pixel, S: subsample]
16
Fragment-Parallel Formulation
How can this behavior be achieved? Revisit the composite equation:

C_s = α_1 C_1 + (1 − α_1){ α_2 C_2 + (1 − α_2)( … ( α_N C_N + (1 − α_N) C_B ) … ) }
      (fragment 1)        (fragment 2)                (fragment N)   (background)

Expanding the recursion:

C_s =   α_1 C_1
      + (1 − α_1) α_2 C_2
      + (1 − α_1)(1 − α_2) α_3 C_3
      + …
      + (1 − α_1)(1 − α_2) … (1 − α_(k−1)) α_k C_k
      + …
      + (1 − α_1)(1 − α_2) … (1 − α_N) C_B

In term k, the prefix product (1 − α_1)(1 − α_2) … (1 − α_(k−1)) is the global contribution G_k, and α_k C_k is the local contribution L_k.
17
Fragment-Parallel Formulation
L_k is trivially parallel (local computation)
G_k is the result of a scan operation (product)
For the list of input fragments
– Compute G[ ] and L[ ], multiply
– Perform a reduction to add subpixel contributions

C_s = G_1 L_1 + G_2 L_2 + G_3 L_3 + … + G_N L_N
G_k = (1 − α_1)(1 − α_2) … (1 − α_(k−1))
L_k = α_k C_k
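A minimal single-subsample sketch of the identity above (illustrative, not the paper's implementation): G is an exclusive scan of the (1 − α) values under multiplication, L_k = α_k C_k is computed locally, and C_s is their dot product plus the background term; the sequential front-to-back loop is included as a cross-check. In the full algorithm the scan becomes a segmented scan so that the running product restarts at every subsample boundary.

```cpp
// Single-subsample sketch (illustrative): compute G via an exclusive product scan,
// L locally per fragment, and C_s as sum(G_k * L_k) plus the background term.
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

int main()
{
    // Hypothetical fragments, front to back: grayscale color and alpha.
    std::vector<float> color = {0.9f, 0.4f, 0.7f};
    std::vector<float> alpha = {0.5f, 0.25f, 0.8f};
    const float Cb = 0.2f;                              // background color
    const size_t N = color.size();

    // G_k = product of (1 - alpha_j) for j < k  -->  exclusive scan with multiplication.
    std::vector<float> oneMinusA(N), G(N), L(N);
    for (size_t k = 0; k < N; ++k) { oneMinusA[k] = 1.0f - alpha[k]; L[k] = alpha[k] * color[k]; }
    std::exclusive_scan(oneMinusA.begin(), oneMinusA.end(), G.begin(), 1.0f,
                        [](float a, float b) { return a * b; });

    // C_s = sum_k G_k * L_k + (product of all (1 - alpha)) * C_B
    float Cs = std::inner_product(G.begin(), G.end(), L.begin(), 0.0f);
    Cs += G.back() * oneMinusA.back() * Cb;

    // Sequential front-to-back composite for comparison.
    float ref = 0.0f, T = 1.0f;
    for (size_t k = 0; k < N; ++k) { ref += T * L[k]; T *= oneMinusA[k]; }
    ref += T * Cb;

    assert(std::fabs(Cs - ref) < 1e-6f);                // both are ~0.725 for these inputs
    return 0;
}
```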
18
Fragment-Parallel Formulation
Filter, for every pixel:

C_p = C_s1 κ_1 + C_s2 κ_2 + … + C_sM κ_M

This can be expressed as another reduction
– After multiplying with subpixel weights κ_m
– Can be merged with the previous reduction
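A short expansion of why the filter reduction can be merged with the composite reduction (this follows directly from the two sums above; m(k) is just notation introduced here for the subsample covered by fragment k):

```latex
C_p \;=\; \sum_{m=1}^{M} \kappa_m\, C_{s_m}
    \;=\; \sum_{m=1}^{M} \kappa_m \sum_{k \in \text{subsample}\, m} G_k L_k
    \;=\; \sum_{k \in \text{pixel}\, p} \kappa_{m(k)}\, G_k L_k
```

Scaling each fragment's term by its subsample's filter weight therefore turns the two nested reductions into a single segmented reduction over the fragments of each pixel.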
19
Fragment-Parallel Composite & Filter
Final Algorithm
1. Two-key sort (subpixel ID, depth)
2. Segmented scan (obtain G_k)
3. Premultiply with weights (L_k, κ_m)
4. Segmented reduction
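A compact CPU reference of the four steps (a sketch under illustrative assumptions: grayscale colors, globally unique subsample IDs ordered by pixel, background term omitted; not the paper's CUDA code):

```cpp
// CPU reference of the final algorithm (illustrative data layout, not the paper's code).
// The background term is omitted for brevity.
#include <algorithm>
#include <map>
#include <vector>

struct Frag {
    int   pixel;        // pixel this fragment lands in
    int   subsample;    // globally unique subsample ID
    float depth;
    float color;        // grayscale for brevity
    float alpha;
};

// kappa[s]: filter weight of subsample s within its pixel (assumed given).
std::map<int, float> compositeAndFilter(std::vector<Frag> frags,
                                        const std::vector<float>& kappa)
{
    // 1. Two-key sort: by subsample ID, then by depth (front to back).
    std::sort(frags.begin(), frags.end(), [](const Frag& a, const Frag& b) {
        return a.subsample != b.subsample ? a.subsample < b.subsample : a.depth < b.depth;
    });

    // 2. Segmented scan: G_k = exclusive product of (1 - alpha) within each subsample.
    //    (Sequential loop standing in for a parallel segmented scan.)
    std::vector<float> G(frags.size());
    float run = 1.0f;
    for (size_t k = 0; k < frags.size(); ++k) {
        if (k == 0 || frags[k].subsample != frags[k - 1].subsample) run = 1.0f;
        G[k] = run;
        run *= (1.0f - frags[k].alpha);
    }

    // 3. Premultiply: term_k = kappa_{m(k)} * G_k * L_k, with L_k = alpha_k * color_k.
    // 4. Segmented reduction: sum the terms of each pixel.
    std::map<int, float> pixelColor;
    for (size_t k = 0; k < frags.size(); ++k)
        pixelColor[frags[k].pixel] += kappa[frags[k].subsample] * G[k]
                                      * frags[k].alpha * frags[k].color;
    return pixelColor;
}
```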
20
Fragment-Parallel Formulation
[Figure: a segmented scan (product) over each subsample's fragments followed by a segmented reduction (sum) per pixel, shown for pixels P_i, P_(i+1), P_(i+2); P: pixel, S: subsample]
21
Implementation
Hardware used: NVIDIA GeForce GTX 280
We require fast segmented scan and reduce
– The CUDPP library provides that
– Restricts the implementation to NVIDIA CUDA
No direct access to the hardware rasterizer
– We wrote our own
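The paper relies on CUDPP's segmented scan and reduce; as a stand-in, the sketch below maps the same four steps onto Thrust primitives (Thrust is not what the paper uses, and the array layout and function names are illustrative). Segments are encoded as key arrays: subsample IDs for the scan, pixel IDs for the reduction; subsample IDs are assumed to be ordered by pixel so that a pixel's fragments stay contiguous after the sort.

```cpp
// Sketch of the four steps using Thrust primitives as a stand-in for CUDPP
// (illustrative only; input arrays are sorted in place, grayscale color for brevity).
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/gather.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <thrust/sort.h>
#include <thrust/transform.h>

using thrust::placeholders::_1;
using thrust::placeholders::_2;

// Inputs: one entry per fragment; kappa: one filter weight per subsample.
// Outputs: outPixel/outColor must be pre-sized to the number of covered pixels.
void compositeAndFilterGPU(thrust::device_vector<int>&   subsample,
                           thrust::device_vector<int>&   pixel,
                           thrust::device_vector<float>& depth,
                           thrust::device_vector<float>& alpha,
                           thrust::device_vector<float>& color,
                           const thrust::device_vector<float>& kappa,
                           thrust::device_vector<int>&   outPixel,
                           thrust::device_vector<float>& outColor)
{
    const size_t n = alpha.size();

    // 1. Two-key sort (subsample ID, depth): stable-sort by the minor key (depth),
    //    then stable-sort by the major key (subsample ID).
    auto payload1 = thrust::make_zip_iterator(thrust::make_tuple(
        subsample.begin(), pixel.begin(), alpha.begin(), color.begin()));
    thrust::stable_sort_by_key(depth.begin(), depth.end(), payload1);
    auto payload2 = thrust::make_zip_iterator(thrust::make_tuple(
        depth.begin(), pixel.begin(), alpha.begin(), color.begin()));
    thrust::stable_sort_by_key(subsample.begin(), subsample.end(), payload2);

    // 2. Segmented scan: G_k = exclusive product of (1 - alpha) within each subsample.
    thrust::device_vector<float> oneMinusA(n), G(n);
    thrust::transform(alpha.begin(), alpha.end(), oneMinusA.begin(), 1.0f - _1);
    thrust::exclusive_scan_by_key(subsample.begin(), subsample.end(), oneMinusA.begin(),
                                  G.begin(), 1.0f, thrust::equal_to<int>(),
                                  thrust::multiplies<float>());

    // 3. Premultiply: term_k = kappa[subsample_k] * G_k * alpha_k * color_k.
    thrust::device_vector<float> term(n), kappaFrag(n);
    thrust::transform(G.begin(), G.end(), alpha.begin(), term.begin(), _1 * _2);
    thrust::transform(term.begin(), term.end(), color.begin(), term.begin(), _1 * _2);
    thrust::gather(subsample.begin(), subsample.end(), kappa.begin(), kappaFrag.begin());
    thrust::transform(term.begin(), term.end(), kappaFrag.begin(), term.begin(), _1 * _2);

    // 4. Segmented reduction: sum the terms of each pixel.
    thrust::reduce_by_key(pixel.begin(), pixel.end(), term.begin(),
                          outPixel.begin(), outColor.begin());
}
```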
22
Example System – Polygons
Applications
– Games
Depth complexity
– 1 to a few tens of layers
– Suited to pixel-parallel
Fragment-parallel software rasterizer
23
Example System – Particles
Applications
– Simulations, games
Depth complexity
– Hundreds of layers
– High depth variance
Particle-parallel sprite rasterizer
24
Example System – Volumes
Applications
– Scientific visualization
Depth complexity
– Tens to hundreds of layers
– Low depth variance
Major-axis-slice rasterizer
25
Example System – Reyes
Applications
– Offline rendering
Depth complexity
– Tens of layers
– Moderate depth variance
Data-parallel micropolygon rasterizer
26
Performance Results
27
Performance Variation
28
Limitations
Increased memory traffic
– Several passes through CUDPP primitives
Unclear how to optimize for special cases
– Threshold opacity
– Threshold depth complexity
29
Summary and Conclusion
Parallel formulation of the composite equation
– Maps well to known primitives
– Can be integrated with the filter
– Consistent performance across varying workloads
Fragment-parallel composite (FPC) is applicable to future rendering pipelines
– Exploits a higher degree of parallelism
– Better matched to the size of the rendering workload
A tool for building programmable pipelines
30
Future Work
Performance
– Reduction in memory traffic
– Extension to special-case scenes
– Hybrid pixel-parallel/fragment-parallel (PPC/FPC) formulations
Applications
– Integration with the hardware rasterizer
– Cinematic rendering, Photoshop
31
Acknowledgments
NSF Award 0541448
SciDAC Institute for Ultrascale Visualization
NVIDIA Research Fellowship
Equipment donated by NVIDIA
Discussions and feedback
– Shubho Sengupta (UC Davis), Matt Pharr (Intel), Aaron Lefohn (Intel), Mike Houston (AMD)
– Anonymous reviewers
Implementation assistance
– Jeff Stuart, Shubho Sengupta
32
Thanks!