1
Many-Core Programming with GRAMPS
Jeremy Sugerman, Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan
2
Problem Statement
- Facilitate efficient development and execution on many-/multi-core commodity systems, with homogeneous or heterogeneous cores.
- Status quo:
  - GPUs: easy to write GL/D3D and run it fast; hard to express anything else.
  - CPUs: possible (not easy) to write anything; possible (hard) to run it fast.
3
GRAMPS Background
- Resembles a GPU with a software-constructed pipeline.
- Not (too) radical, even in a pure graphics context:
  - Shading went through a similar story: fixed -> programmable.
  - Now the pipeline topology is under analogous pressures: a proliferation of stages and options.
- And graphics is more than a GL/D3D pipeline…
- And throughput / many-core is more than graphics…
4
GRAMPS Programming Model
- Software constructs the pipeline (actually a graph).
- Exposes threads, shaders, and fixed-function stages.
  - Coprocessors exposed via ISA.
- Exposes FIFOs / queues connecting stages.
- Also enables software push / re-sorting.
- Exposes buffers for memory access.
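To make "software constructs the graph" concrete, here is a minimal sketch in Python. It is not the GRAMPS API: `Graph`, `add_stage`, `add_queue`, `add_buffer`, and `dangling_queues` are hypothetical names invented for illustration. The wiring of the rasterization example is inferred from the stage and queue names on the GRAMPS Examples slide, and the stage kinds and frame buffer size are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    packets: deque = field(default_factory=deque)  # FIFO of work packets


@dataclass
class Stage:
    name: str
    kind: str            # "thread", "shader", or "fixed-function"
    inputs: tuple = ()   # names of queues this stage consumes
    outputs: tuple = ()  # names of queues this stage produces into


class Graph:
    """Hypothetical container: the application defines the topology."""

    def __init__(self):
        self.stages = {}
        self.queues = {}
        self.buffers = {}

    def add_queue(self, name):
        self.queues[name] = Queue(name)

    def add_buffer(self, name, size_bytes):
        self.buffers[name] = bytearray(size_bytes)

    def add_stage(self, name, kind, inputs=(), outputs=()):
        # Reject edges that refer to queues that were never declared.
        for q in (*inputs, *outputs):
            if q not in self.queues:
                raise KeyError(f"unknown queue: {q}")
        self.stages[name] = Stage(name, kind, tuple(inputs), tuple(outputs))

    def dangling_queues(self):
        # A queue is dangling if it lacks a producer or lacks a consumer.
        produced = {q for s in self.stages.values() for q in s.outputs}
        consumed = {q for s in self.stages.values() for q in s.inputs}
        return sorted(q for q in self.queues
                      if q not in produced or q not in consumed)


def build_raster_graph():
    # Assumed wiring: Rast -> Input Fragment Queue -> Shade
    #                 -> Output Fragment Queue -> FB Blend -> Frame Buffer.
    g = Graph()
    g.add_queue("input_fragments")
    g.add_queue("output_fragments")
    g.add_buffer("frame_buffer", 64 * 64 * 4)  # assumed 64x64 RGBA8 target
    g.add_stage("rast", "fixed-function", outputs=["input_fragments"])
    g.add_stage("shade", "shader",
                inputs=["input_fragments"], outputs=["output_fragments"])
    g.add_stage("fb_blend", "thread", inputs=["output_fragments"])
    return g
```

Calling `build_raster_graph().dangling_queues()` returns an empty list for this wiring. A real runtime would presumably perform similar setup-time checks, but that is a guess rather than something the slides state.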
5
GRAMPS’ Place
- Compared to a GPU pipeline: more things possible (and medium easy), still (mostly) runs fast, less hardware independent.
- Compared to a CPU: easier to write things, easier to run them well, some loss of expressivity and flexibility.
- Still a role for a ‘graphics pipeline’. It’s an app!
- GRAMPS is a layer, a model for state machines.
6
GRAMPS and Streaming
- From some angles, GRAMPS sounds a lot like stream processing / computing.
- The distinctions are most visible in the target traits.
- Streaming expects predictable data creation, flow, and consumption.
  - Intensive offline / compile-time optimization and pre-scheduling.
- GRAMPS expects dynamic, data-dependent execution and (thus) run-time scheduling.
- Also, GRAMPS assumes commodity, heterogeneous hardware.
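The contrast with pre-scheduled streaming can be illustrated with a toy run-time scheduler. This is a sketch under stated assumptions, not the GRAMPS scheduler: the `run` loop, the "prefer the most downstream ready stage" policy, and the `intersect` / `shade` stage functions are all hypothetical. The point is that how much work `shade` creates depends on the data (here, a per-ray bounce count), so the schedule cannot be fixed ahead of time.

```python
from collections import deque


def run(stages, queues):
    """Run until no stage has input.

    stages: list of (name, input_queue_name, fn), ordered upstream to downstream.
    fn(item) returns a list of (output_queue_name, item) pairs.
    """
    executed = []
    while True:
        ready = [s for s in stages if queues[s[1]]]
        if not ready:
            break
        # Assumed policy: pick the most downstream ready stage to drain queues.
        name, in_queue, fn = ready[-1]
        item = queues[in_queue].popleft()
        for out_queue, out in fn(item):
            queues[out_queue].append(out)
        executed.append(name)
    return executed


def intersect(ray):
    # Toy stage: pretend every ray hits something.
    return [("hits", ray)]


def shade(hit):
    depth, bounces = hit
    out = [("samples", depth)]
    if depth < bounces:
        # Data-dependent output: spawn a secondary ray back into the ray queue.
        out.append(("rays", (depth + 1, bounces)))
    return out


def demo():
    queues = {"rays": deque([(0, 0), (0, 2)]),
              "hits": deque(),
              "samples": deque()}
    stages = [("intersect", "rays", intersect), ("shade", "hits", shade)]
    executed = run(stages, queues)
    return executed, list(queues["samples"])
```

In `demo()`, two primary rays turn into four shade invocations because one of them bounces twice. A static schedule would have to know that count in advance, whereas the loop above simply reacts to whatever is in the queues.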
7
GRAMPS Examples
[Figure: two example GRAMPS graphs.
Rasterization Pipeline: stages Rast, Shade, FB Blend; queues Input Fragment Queue, Output Fragment Queue; Frame Buffer.
Ray Tracing Pipeline: stages Camera, Intersect, Shade, FB Blend; queues Ray Queue, Sample Queue, Pixel Queue; Frame Buffer.]
8
GRAMPS Overview
Concepts:
- Graphs
- Stages: thread, shader, fixed-function
- Queues: ordered, unordered, sets (exclusion)
- Buffers
Components:
- APIs: setup/driver, thread, shader
- Scheduler: fat core, shader core, top-level
9
What We’ve Built
- Three rendering pipelines: Direct3D, Packet Tracer, D3D + Push (Hybrid).
- Simulator and runtime for two machines:
  - GPU-like: many threads per core, hardware scheduling.
  - CPU-like: few threads per core, software scheduling.
10
Rendering Pipelines
[Figure: two GRAMPS graphs.
Direct3D Pipeline (with Ray-tracing Extension): stages IA 1 … IA N, VS 1 … VS N, RO, Rast, PS, OM, Trace, PS2; queues Input Vertex Queue 1 … N, Primitive Queue 1 … N, Primitive Queue, Sample Queue Set, Fragment Queue, Ray Queue, Ray Hit Queue; buffers Vertex Buffers, Frame Buffer.
Ray-tracing Pipeline: stages Tiler, Sampler, Camera, Intersect, Shade, FB Blend; queues Tile Queue, Sample Queue, Ray Queue, Ray Hit Queue, Fragment Queue; Frame Buffer.
Legend: Thread Stage, Shader Stage, Fixed-func Stage, Queue, Stage Output, Output via Push.]
11
Initial Results
- Measured thread occupancy and worst-case total queue memory.
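For readers unfamiliar with the two metrics, the sketch below shows one plausible way to compute them from a per-step simulator trace. The trace format, the function names, and every number in the usage note are made up for illustration. They are not the authors' instrumentation or results.

```python
def thread_occupancy(busy_per_step, total_threads):
    """Fraction of thread slots doing work, averaged over all steps."""
    return sum(busy_per_step) / (len(busy_per_step) * total_threads)


def peak_queue_bytes(depths_per_step, packet_bytes):
    """Worst-case total queue memory: the largest per-step sum over all queues.

    depths_per_step: list of {queue_name: packets currently queued}.
    packet_bytes:    {queue_name: size of one packet in bytes}.
    """
    return max(
        sum(depth * packet_bytes[name] for name, depth in step.items())
        for step in depths_per_step
    )
```

With a toy trace of `[2, 4, 4, 2]` busy threads out of 4, `thread_occupancy` gives 0.75. With depths `[{"rays": 1, "hits": 0}, {"rays": 2, "hits": 3}]` and packet sizes `{"rays": 32, "hits": 16}`, `peak_queue_bytes` gives 112.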
12
GRAMPS Vis
13
High-level Challenges
- Is GRAMPS a suitable GPU evolution?
  - Enable a pipeline competitive with bare metal?
  - Enable innovation: advanced / alternative methods?
  - Is there a ‘best’ graphics pipeline on top?
- Is GRAMPS a good parallel compute model?
  - Map well to hardware and hardware trends?
  - Support important apps?
  - Concepts influence developers?
14
What’s Next?
- Low-level implementation: scheduling, more accurate simulation.
- More apps: REYES, physics, likely more.
- Audit and refine the model: graph modification / state change, fork-join / blocking calls, and intra- or inter-stage locks / barriers / synchronization primitives.
- Prototype and explore next-generation graphics pipelines.