Many-Core Programming with GRAMPS & “Real Time REYES”
Jeremy Sugerman, Kayvon Fatahalian
Stanford University
June 12, 2008

Background, Outline
Stanford Graphics / Architecture Research
CPU, GPU trends. And collision?
Two research areas:
–HW/SW Interface, Programming Model
–Future Graphics API

Problem Statement
Drive efficient development and execution in many-/multi-core systems.
Support homogeneous, heterogeneous cores.
Inform future hardware.
Status Quo:
–GPU Pipeline (Good for GL, otherwise hard)
–CPU (No guidance, fast is hard)

GRAMPS
Software defined graphs
Producer-consumer, data-parallelism
Initial focus on rendering
[Figure: two example GRAMPS graphs. Rasterization Pipeline: stages Rasterize, Shade, FB Blend; queues Input Fragment Queue, Output Fragment Queue; output Frame Buffer. Ray Tracing Pipeline: stages Camera, Intersect, Shade, FB Blend; queues Ray Queue, Ray Hit Queue, Fragment Queue; output Frame Buffer. Legend: Thread Stage, Shader Stage, Fixed-func Stage, Queue, Stage Output.]

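To make the "software defined graph" idea concrete, here is a minimal Python sketch of stages connected by queues, loosely modeled on the ray tracing pipeline named on this slide. It is not the GRAMPS API: the `Stage` class, `run_graph`, the `pixel_q` seed queue, and the toy stage bodies are all hypothetical, and the wiring order (Camera, Intersect, Shade, FB Blend) is an assumption inferred from the stage and queue names.

```python
from collections import deque

class Stage:
    """One graph node: consumes from an input queue, produces to an output queue."""
    def __init__(self, name, fn, in_q, out_q=None):
        self.name, self.fn, self.in_q, self.out_q = name, fn, in_q, out_q

    def runnable(self):
        return len(self.in_q) > 0

    def step(self):
        item = self.in_q.popleft()
        for out in self.fn(item):  # a stage may emit zero or more outputs per input
            self.out_q.append(out)

def run_graph(stages):
    """Toy scheduler: run any stage with pending input until every queue drains."""
    progress = True
    while progress:
        progress = False
        for stage in stages:
            if stage.runnable():
                stage.step()
                progress = True

# Hypothetical seed queue: one entry per pixel of a 2x2 image.
pixel_q = deque((x, y) for y in range(2) for x in range(2))
ray_q, hit_q, frag_q = deque(), deque(), deque()
frame_buffer = {}

def camera(pixel):
    return [{"pixel": pixel, "dir": (0.0, 0.0, 1.0)}]

def intersect(ray):
    # Toy rule: only rays through even-x pixels hit anything (data-dependent output).
    return [{"pixel": ray["pixel"], "t": 1.0}] if ray["pixel"][0] % 2 == 0 else []

def shade(hit):
    return [(hit["pixel"], 1.0 / hit["t"])]

def fb_blend(fragment):
    pixel, value = fragment
    frame_buffer[pixel] = frame_buffer.get(pixel, 0.0) + value
    return []  # writes the frame buffer instead of an output queue

stages = [
    Stage("Camera", camera, pixel_q, ray_q),
    Stage("Intersect", intersect, ray_q, hit_q),
    Stage("Shade", shade, hit_q, frag_q),
    Stage("FB Blend", fb_blend, frag_q),
]
run_graph(stages)
```

The point of the sketch is only the shape: the application declares stages and queues, and a separate scheduler decides what runs next based on queue contents.
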
As a GPU Evolution
Not (too) radical for ‘graphics’
Like fixed → programmable shading
–Pipeline undergoing massive shake up
–Diversity of new parameters and use cases
Bigger picture than ‘graphics’
–Rendering is more than GL/D3D
–Compute is more than rendering
–Larrabee has no innate pipeline

As a Compute Evolution
Sounds like streaming: Execution graphs, kernels, data-parallelism
Streaming: “squeeze out every FLOP”
–Goals: bulk transfer, arithmetic intensity
–Intensive static analysis, custom chips (mostly)
–Bounded space, data access, execution time
GRAMPS: “interesting apps are irregular”
–Goals: Dynamic, data-dependent code
–Aggregate work at run-time
–Heterogeneous commodity platforms
–Naturally supports streaming when applicable

GRAMPS’ Role
A ‘graphics pipeline’ is now an app!
GRAMPS models parallel state machines.
Compared to status quo:
–More flexible than a GPU pipeline
–More guidance than bare metal
–Portability in between
–Not domain specific

GRAMPS Interfaces
Host/Setup: Create execution graph
Thread: Stateful, singleton
Shader: Data-parallel, auto-instanced

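The three interface roles can be contrasted with a small Python sketch. This is an illustration under assumptions, not the GRAMPS interface: `shade_kernel`, `BlendThread`, and `run_shader_stage` are hypothetical names, and a standard-library thread pool stands in for whatever mechanism the runtime uses to instance shader work.

```python
from concurrent.futures import ThreadPoolExecutor

def shade_kernel(fragment):
    """Shader-style stage: stateless, one logical instance per input element."""
    x, y, value = fragment
    return (x, y, value * 0.5)

class BlendThread:
    """Thread-style stage: a single instance that keeps state across inputs."""
    def __init__(self):
        self.frame_buffer = {}

    def run(self, fragments):
        for x, y, value in fragments:
            self.frame_buffer[(x, y)] = self.frame_buffer.get((x, y), 0.0) + value

def run_shader_stage(kernel, inputs, workers=4):
    # The runtime, not the stage author, decides how many instances to launch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, inputs))

# Host/setup role: create the inputs, wire the two stages, and run them.
fragments = [(0, 0, 1.0), (0, 0, 1.0), (1, 0, 2.0)]
shaded = run_shader_stage(shade_kernel, fragments)
blend = BlendThread()
blend.run(shaded)
```

Because the kernel holds no state, it can be instanced freely over its inputs; the blend stage owns mutable state, so the sketch keeps it a singleton.
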
What We’ve Built (System)

GRAMPS Scheduler
Tiered Scheduler
‘Fat’ cores: per-thread, per-core
‘Micro’ cores: shared hw scheduler
Top level: tier N

What We’ve Built (Apps)
Direct3D Pipeline (with Ray-tracing Extension)
Ray-tracing Pipeline
[Figure: the two application graphs. Direct3D Pipeline labels: IA 1 … IA N, VS 1 … VS N, RO, Rast, PS, OM, Trace, PS2 (Ray-tracing Extension); Vertex Buffers, Input Vertex Queue 1 … N, Primitive Queue 1 … N, Primitive Queue, Sample Queue Set, Fragment Queue, Ray Queue, Ray Hit Queue, Frame Buffer. Ray-tracing Pipeline labels: Camera, Sampler, Tiler, Intersect, Shade, FB Blend; Tile Queue, Sample Queue, Ray Queue, Ray Hit Queue, Fragment Queue, Frame Buffer. Legend: Thread Stage, Shader Stage, Fixed-func, Queue, Stage Output, Push Output.]

Initial Results
Queues are small, utilization is good

GRAMPS Visualization
[Two figure-only slides; images not preserved.]

GRAMPS Portability
Portability really means performance.
Less portable than GL/D3D
–GRAMPS graph is hardware sensitive
More portable than bare metal
–Enforces modularity
–Best case, just works
–Worst case, saves boilerplate

High-level Challenges
Is GRAMPS a suitable GPU evolution?
–Enable pipeline competitive with bare metal?
–Enable innovation: advanced / alternative methods?
Is GRAMPS a good parallel compute model?
–Map well to hardware, hardware trends?
–Support important apps?
–Concepts influence developers?

What’s Next for GRAMPS?
Implementation: scheduling, simulation details
Model:
–Graph modification (state change)
–Blocking calls (join)
–Intra/inter-stage synchronization primitives
–Data sharing / ref-counting
Workloads: REYES, physics, others?
Develop new graphics pipelines…

“Real-Time REYES”

Just Build It
Build a real-time REYES pipeline...
… that is tightly integrated with ray tracing for global effects.

What does real-time REYES mean? (to us)
Smooth surfaces via adaptive tessellation
–Everything is a displaced subdivision surface
Shade on surface, prior to rasterization
Stochastic rasterization for motion blur and DOF
Order-independent transparency

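As an illustration of the last item, order-independent transparency can be sketched as: collect all fragments for a pixel in whatever order they arrive, then sort by depth and composite at resolve time. The Python below is a minimal sketch of that one common approach, not the pipeline described in this talk; `resolve_pixel`, the `(depth, rgb, alpha)` tuple layout, and the convention that larger depth means farther away are all assumptions.

```python
def resolve_pixel(fragments, background=(0.0, 0.0, 0.0)):
    """Composite (depth, rgb, alpha) fragments back to front with the 'over' operator."""
    color = background
    # Sort farthest first so nearer fragments are blended over farther ones.
    for depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = tuple(alpha * c + (1.0 - alpha) * d for c, d in zip(rgb, color))
    return color

# Two half-transparent fragments, submitted in either order.
red_near = (1.0, (1.0, 0.0, 0.0), 0.5)
blue_far = (2.0, (0.0, 0.0, 1.0), 0.5)
a = resolve_pixel([red_near, blue_far])
b = resolve_pixel([blue_far, red_near])
```

Both calls give the same color because the sort, not the submission order, determines the blend order.
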
[Figure: stage labels for two pipelines, diagram order not preserved. REYES: Split, Dice, Displace, Shade, Early Z, Rasterize, Z Test, Blend/Resolve. OpenGL/Direct3D: Vertex Shade, Tessellate (xbox), Early Z, Rasterize, Frag Shade, Z Test, Blend/Resolve.]

REYES Tessellation
Split primitive into smaller primitives until a “GOOD” grid can be created.

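The split loop can be sketched with a work queue. This Python version is a heavy simplification under stated assumptions: an axis-aligned screen-space rectangle `(x, y, w, h)` in pixels stands in for a projected primitive, "can be diced" is reduced to "fits within `max_dim` pixels on each side", and `MAX_GRID_DIM` and `split_until_diceable` are hypothetical names.

```python
from collections import deque

MAX_GRID_DIM = 16  # hypothetical bound on grid size per side, in pixels

def split_until_diceable(width, height, max_dim=MAX_GRID_DIM):
    """Split a screen-space rectangle until every piece is small enough to dice."""
    work = deque([(0.0, 0.0, float(width), float(height))])
    ready = []
    while work:
        x, y, w, h = work.popleft()
        if w <= max_dim and h <= max_dim:
            ready.append((x, y, w, h))  # small enough: hand off to dicing
        elif w >= h:
            # Split along the longer axis and re-queue both halves.
            work.append((x, y, w / 2, h))
            work.append((x + w / 2, y, w / 2, h))
        else:
            work.append((x, y, w, h / 2))
            work.append((x, y + h / 2, w, h / 2))
    return ready

patches = split_until_diceable(64, 16)
```

The number of pieces depends on the input size, which is what makes the work irregular: the amount of output is only known at run time.
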
Grids
GOOD GRID =
- Max polygon area < 1 pixel
- All polys about the same size
- Bounded # polys per grid
Regular parametric sampling of primitive surface (like XBox360).
Compact representation for many adjacent polygons.
Grids provide SIMD efficiency and bulk processing benefits.

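The three GOOD GRID criteria translate directly into a predicate over a diced grid. The sketch below assumes a 2D screen-space `surface(u, v)` function and uses hypothetical thresholds (`max_ratio`, `max_polys`) for "about the same size" and "bounded"; the slide does not give numeric values for either.

```python
def dice(surface, nu, nv):
    """Regular parametric sampling: (nv + 1) rows of (nu + 1) screen-space points."""
    return [[surface(i / nu, j / nv) for i in range(nu + 1)] for j in range(nv + 1)]

def quad_area(p0, p1, p2, p3):
    """Area of a quad from its corners in order, using the shoelace formula."""
    pts = (p0, p1, p2, p3)
    total = 0.0
    for k in range(4):
        (x0, y0), (x1, y1) = pts[k], pts[(k + 1) % 4]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def is_good_grid(grid, max_area=1.0, max_ratio=4.0, max_polys=256):
    nv, nu = len(grid) - 1, len(grid[0]) - 1
    if nu * nv > max_polys:  # bounded number of polys per grid
        return False
    areas = [
        quad_area(grid[j][i], grid[j][i + 1], grid[j + 1][i + 1], grid[j + 1][i])
        for j in range(nv) for i in range(nu)
    ]
    # Max polygon area under one pixel, and all polys about the same size.
    return max(areas) < max_area and max(areas) <= max_ratio * min(areas)

def flat_patch(u, v):
    """Hypothetical flat 8x8-pixel patch standing in for a projected surface."""
    return (8.0 * u, 8.0 * v)

good = is_good_grid(dice(flat_patch, 16, 16))       # 256 quads of 0.25 px each
too_coarse = is_good_grid(dice(flat_patch, 4, 4))   # quads of 4 px each
too_many = is_good_grid(dice(flat_patch, 32, 32))   # 1024 quads
```

When the predicate fails for every allowed sampling rate, the primitive goes back to splitting.
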
[Figure: stage labels for two pipelines, diagram order not preserved. REYES: Split, Dice, Displace, Shade, Early Z, Rast/Crack Fix, Z Test, Blend/Resolve. OpenGL/Direct3D: Vertex Shade, Tessellate (xbox), Early Z, Rast, Frag Shade, Z Test, Blend/Resolve.]

What does real-time REYES mean? (to us)
Smooth surfaces via adaptive tessellation
–Splitting is irregular (and serial)
–Crack fixing
Shade on surface, prior to rasterization
–We feel confident about this
–But most “work” done before moving to raster space… hmm
Stochastic rasterization for motion blur and DOF
–Many tiny polygons → parallel rasterization
–SIMD tricky
Order-independent transparency
–Not unique to REYES

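For readers unfamiliar with stochastic rasterization, the sketch below estimates motion-blurred coverage of one pixel: each sample gets a random position inside the pixel and a random time in the shutter interval, and the triangle is tested at its interpolated position for that time. It is a generic illustration, not the rasterizer discussed in the talk; all function names, the linear vertex motion, and the fixed-seed default generator are assumptions.

```python
import random

def edge(a, b, p):
    """Signed area test: which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def inside(tri, p):
    w0, w1, w2 = edge(tri[0], tri[1], p), edge(tri[1], tri[2], p), edge(tri[2], tri[0], p)
    # Accept either winding order.
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)

def lerp_tri(tri_open, tri_close, t):
    """Triangle position at shutter time t in [0, 1], assuming linear motion."""
    return [((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])
            for a, b in zip(tri_open, tri_close)]

def pixel_coverage(tri_open, tri_close, px, py, n_samples=64, rng=None):
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(n_samples):
        p = (px + rng.random(), py + rng.random())  # random point in the pixel
        t = rng.random()                            # random time in the shutter
        if inside(lerp_tri(tri_open, tri_close, t), p):
            hits += 1
    return hits / n_samples

tri = [(-10.0, -10.0), (10.0, -10.0), (0.0, 10.0)]
moved = [(x + 100.0, y) for x, y in tri]
static_cov = pixel_coverage(tri, tri, 0, 0)    # triangle never moves
blurred_cov = pixel_coverage(tri, moved, 0, 0) # triangle sweeps away during the shutter
```

Each sample takes a different path through the inside test depending on its time and position, which gives some intuition for why the slide calls SIMD execution tricky here.
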
Shading in a Hybrid System
Evaluate displacement (due to REYES or on demand for ray tracing)
Shade grids
Shade ray hits
Looking forward… shade quads too?
One shading system or two or three?

This Project is Really About
Re-architecting REYES pipeline for real-time performance (for throughput architectures like LRB)
Hybrid rendering: study interoperability of advanced techniques (REYES + ray tracing + maybe Direct3D)
–Hybrid shading system
–Understand workload balance
Hybrid pipeline interface: real-time, retained mode
Pursuit of more flexible, advanced graphics pipelines

Questions?