Slide 1: J++ Machine
Jeremy Sugerman, Kayvon Fatahalian
Slide 2: Background
- Multicore CPUs
- Generalized GPUs (Brook, CTM, CUDA)
- Tightly coupled traditional CPU cores (more than one) and GPU cores
  - AMD Fusion / Intel's competing designs
  - PS3 / Xbox 360
Slide 3: Beliefs
- GPU and CPU-style cores are fundamentally different (but both useful)
- Important applications exhibit workloads suited to each type of architecture
  - E.g., ray tracing and shading are fundamentally different
- The programmer understands the system contains different cores and chooses appropriately
  - Inappropriate to abstract away the differences... but this does not preclude a unified system ("CPU" includes SPU)
Slide 4: Assumptions (new rules)
- Combination of GPU and CPU cores (with a good interconnect)
- Unified system address space
- Both CPU and GPU work can create new "work"
- "Work" dispatch is HW-driven on the GPU, SW-driven on the CPU
- Optional:
  - SW-managed CPU L2
  - Inter-core sync/signaling primitives
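The "work creates work" assumption above can be illustrated as two queues in one (simulated) address space, where each kind of work may enqueue into the other side's queue. This is a minimal Python sketch, not the proposal's actual dispatch mechanism; the queue names, the even/odd spawning rule, and the drain loop are illustrative assumptions:

```python
from collections import deque

# Two work queues, one per core type, sharing a single address space.
gpu_queue, cpu_queue = deque(), deque()

def gpu_kernel(fragment):
    # GPU work may create CPU work (here: even "fragments" request a CPU task).
    if fragment % 2 == 0:
        cpu_queue.append(fragment)

def cpu_task(x):
    # CPU work may likewise create new GPU work.
    if x > 0:
        gpu_queue.append(x - 1)

def drain():
    """Run until both queues are empty; return how many items were processed."""
    processed = 0
    while gpu_queue or cpu_queue:
        if gpu_queue:
            gpu_kernel(gpu_queue.popleft())
        else:
            cpu_task(cpu_queue.popleft())
        processed += 1
    return processed
```

Seeding `gpu_queue` with a single even fragment such as 4 processes three items (GPU 4, then CPU 4, then GPU 3) before both queues drain, showing cross-type work creation terminating.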
Slide 5: Claims, Phase 1 (Architecture/HW interface)
- We can show that interactive rendering (with ray tracing) can benefit from this platform
- Work queues with associated scheduling/resource allocations are a good way to describe work
  - Discrete execution environments
  - Granularity (threads/kernels/streams) and properties of work
- We will develop a scheduling/resource management algorithm that works
  - Note: hints to the scheduler
Slide 6: Claims, Phase 2 (DirectK)
- We can present the programmer with a unified execution environment
  - Architecture independent
  - Queue-based application-level programming model
  - Shader API extensions (TBD)
Slide 7: Research Questions
- Can you write a useful app? (with a fast interconnect and the ability to create work from work)
- How to schedule/launch/run work?
  - Scheduling model: HW vs. SW implementation
  - Understand execution environments
- How to make it tractable to write an application? (to program the architecture)
  - What is the application-level queue abstraction?
Slide 8: Demonstrations
- Synthetic (micro) tests on a rough runtime
  - Execute GPU programs + CPU threads
  - Fragments create fragments/vertices
  - Fragments create CPU work
  - Assumptions: no horizontal, bounded creation, unordered
- Hybrid scheduling investigation
  - More detailed simulation environment
- Hybrid renderer design
Slide 9: Evaluation
- Feasibility (architecture)
- Utilization
  - State vs. performance tradeoff
  - Maintaining coherence (instruction/data)
- Containing state explosion
  - Number of queues
  - Length of queues (size of state)
  - Per-"work" state
  - Scheduling/spilling/handling failure
Slide 10: List of Papers
- HW scheduling with a geometry shader delta?
  - GPU-only micro-architecture evaluation
- Whole-system implementation (fully exercise CPU-GPU queues with a renderer)
  - Unified queue abstraction (HW/SW implementations)
  - Synchronization discussed?
- Deluxe hybrid renderer in DirectK
  - Configurable queues, specialized execution, synchronization
Slide 11: Other slides
Slide 12: Specialized "Work" (graphics-specific terms)
- Fragments
- Vertices
- Rays
- Geometric primitives
- CPU threads
Slide 13: Work environments (resource-specific terms)
- Fragment (no gather)
- Fragment + gather
- Fragment + scatter
- Fragment + create work (bounded)
- Fragment + sync
- CPU
- CPU + SW-managed L2
Slide 14: Describing work with queues?
- Queue per environment? Queue per kernel? (else how to specify the kernel) Queue per processor?
- Dynamic, SW-specified
- How to describe work
  - Kernel? (is this an issue?)
  - Arguments
- Granularity of creation
  - Kernel + stream
  - Element by element
- Spawn/enqueue constraints
  - Bounding / explosion (TTL? HW kills?)
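The "kernel + arguments" description and the TTL spawn bound mentioned on this slide can be sketched as a small queue abstraction. This is one illustrative Python design under my own assumptions (the `WorkItem`/`WorkQueue` names and the inherit-decremented-TTL rule are not from the deck):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class WorkItem:
    # "Kernel + arguments" description of a unit of work, plus a TTL
    # spawn bound as one way to contain creation explosion.
    kernel: object   # callable taking (queue, item)
    args: tuple = ()
    ttl: int = 4

class WorkQueue:
    """One queue per execution environment; running items may enqueue new work."""
    def __init__(self):
        self.items = deque()

    def enqueue(self, item):
        if item.ttl <= 0:
            raise RuntimeError("spawn bound exceeded (TTL expired)")
        self.items.append(item)

    def run(self):
        completed = 0
        while self.items:
            item = self.items.popleft()
            item.kernel(self, item)  # kernel receives the queue so it can spawn
            completed += 1
        return completed

def splitter(queue, item):
    # Example kernel: each item spawns two children until its TTL runs out.
    if item.ttl > 1:
        for _ in range(2):
            queue.enqueue(WorkItem(splitter, ttl=item.ttl - 1))
```

Starting a queue with `WorkItem(splitter, ttl=3)` completes 1 + 2 + 4 = 7 items and then drains, so creation stays bounded even though work creates work.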
Slide 15: Scheduling policy
- Prioritize coherence
- Minimize required state
  - Prioritize leaves when the number of threads explodes
- Issues
  - Fail gracefully
  - Ensure progress
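The policy on this slide can be sketched as a priority function: normally group pending items by kernel so batches stay instruction-coherent, but once queued state exceeds a budget, run leaf work (work that cannot spawn more) first to drain state rather than grow it. A Python sketch; the dict schema, the `state_budget` parameter, and the lexicographic tie-break are my illustrative assumptions:

```python
def schedule(pending, state_budget):
    """Order pending work items (dicts with 'kernel', 'state', 'leaf' keys).

    Coherence first: group by kernel. Under state pressure, leaves run
    before spawning work so total queued state shrinks.
    """
    exploding = sum(item["state"] for item in pending) > state_budget

    def priority(item):
        if exploding:
            # Leaves (rank 0) before spawners (rank 1), then by kernel.
            return (0 if item["leaf"] else 1, item["kernel"])
        return (0, item["kernel"])

    return sorted(pending, key=priority)
```

For example, with three items of 10 units of state each and a budget of 20, the two leaf items are scheduled ahead of the non-leaf one; with a budget of 100, items are simply grouped by kernel name.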