Slide 1: J++ Machine
Jeremy Sugerman, Kayvon Fatahalian

Slide 2: Background
- Multicore CPUs
- Generalized GPUs (Brook, CTM, CUDA)
- Tightly coupled traditional CPU (more than one) and GPU cores
  - AMD Fusion / Intel's competing designs
  - PS3 / Xbox 360

Slide 3: Beliefs
- GPUs and CPU-style cores are fundamentally different (but both useful)
- Important applications exhibit workloads suited to each type of architecture
  - E.g., ray tracing and shading are fundamentally different
- The programmer understands that the system contains different kinds of cores and chooses appropriately
  - It is inappropriate to abstract away the differences… but this does not preclude a unified system ("CPU" includes SPU)

Slide 4: Assumptions (new rules)
- Combination of GPU/CPU cores (with a good interconnect)
- Unified system address space
- Both CPU and GPU work can create new "work" (see the queue sketch below)
- "Work" dispatch is HW-driven on the GPU, SW-driven on the CPU
- Optional:
  - SW-managed CPU L2
  - Inter-core sync/signaling primitives
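To make the "work creates work in a unified address space" assumption concrete, here is a minimal C++ sketch of a shared work queue that CPU threads (and, conceptually, GPU-generated work) could push into. The type names (`WorkItem`, `SharedWorkQueue`), the mutex-protected vector, and the fixed capacity are illustrative assumptions, not the actual J++ design; a HW-dispatched GPU queue would obviously not be built on `std::mutex`.

```cpp
// Hypothetical sketch: a bounded work queue living in the unified address
// space, usable by both CPU- and GPU-side work creation. Illustrative only.
#include <cstdint>
#include <mutex>
#include <optional>
#include <vector>

enum class CoreType { CPU, GPU };   // which kind of core should run the item

struct WorkItem {
    uint32_t kernelId;   // which kernel / function to execute
    uint64_t argPtr;     // pointer into the unified address space
    CoreType target;     // CPU work or GPU work
};

class SharedWorkQueue {
public:
    bool push(const WorkItem& w) {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.size() >= capacity_) return false;  // bounded creation
        items_.push_back(w);
        return true;
    }
    std::optional<WorkItem> pop() {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return std::nullopt;
        WorkItem w = items_.back();
        items_.pop_back();
        return w;
    }
private:
    std::mutex m_;
    std::vector<WorkItem> items_;
    std::size_t capacity_ = 4096;
};
```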

Slide 5: Claims - Phase 1 (Arch/HW interface)
- We can show that interactive rendering (with ray tracing) can benefit from this platform
- Work queues with associated scheduling/resource allocations are a good way to describe work (sketched below)
  - Discrete execution environments
  - Granularity (threads/kernels/streams) / properties of work
- We will develop a scheduling/resource management algorithm that works
  - Note: hints to the scheduler
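One way to picture "work queues with associated scheduling/resource allocations" is the hypothetical queue descriptor below: each queue declares its execution environment, a hint for the scheduler, and a resource budget. All field names (`Environment`, `SchedulerHint`, `maxLiveItems`) are assumptions introduced only to illustrate the idea of attaching hints to queues.

```cpp
// Hypothetical queue descriptor: environment + scheduler hint + resource
// budget per queue. Names and fields are illustrative, not the J++ interface.
#include <cstdint>
#include <string>

enum class Environment { FragmentNoGather, FragmentGather, FragmentScatter, CpuThread };

struct SchedulerHint {
    int  priority;        // relative priority among queues
    bool preferCoherent;  // batch items to preserve instruction/data coherence
};

struct QueueDescriptor {
    std::string   name;          // e.g. "shade", "trace", "build-accel"
    Environment   env;           // discrete execution environment for its work
    SchedulerHint hint;          // hint passed to the scheduling algorithm
    uint32_t      maxLiveItems;  // resource allocation: cap on in-flight work
};
```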

Slide 6: Claims - Phase 2 (DirectK)
- We can present the programmer with a unified execution environment
  - Architecture independent
  - Queue-based application-level programming model
  - Shader API extensions - TBD

Slide 7: Research Questions
- Can you write a useful app? (with a fast interconnect and the ability to create work from work)
- How to schedule/launch/run work?
  - Scheduling model: HW vs. SW implementation
  - Understand execution environments
- How to make it tractable to write an application? (to program the architecture)
  - What is the application-level queue abstraction?

Slide 8: Demonstrations
- Synthetic (micro) tests on a rough runtime (a toy sketch follows below)
  - Execute GPU programs + CPU threads
  - Fragments create fragments/vertices
  - Fragments create CPU work
  - Assumptions: no horizontal, bounded creation, unordered
- Hybrid scheduling investigation
  - More detailed simulation environment
- Hybrid renderer design
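A toy version of the synthetic test described above might look like the following C++ sketch: fragments are drained from a queue and spawn child fragments with a bounded fan-out until a depth limit is reached, at which point they spawn CPU work. The fan-out, depth bound, and struct names are assumptions for illustration, not the actual micro-test.

```cpp
// Toy driver for the "fragments create fragments / CPU work" micro-test.
// Bounded, unordered creation: each fragment spawns up to kFanOut children
// until kMaxDepth; leaf fragments spawn CPU work instead. Illustrative only.
#include <cstdio>
#include <deque>

struct Fragment { int depth; };

int main() {
    const int kFanOut   = 2;   // assumed bounded fan-out per fragment
    const int kMaxDepth = 4;   // assumed depth bound on creation
    int cpuTasksSpawned = 0;

    std::deque<Fragment> gpuQueue = {{0}};   // seed with one fragment
    while (!gpuQueue.empty()) {
        Fragment f = gpuQueue.front();
        gpuQueue.pop_front();

        if (f.depth < kMaxDepth) {
            for (int i = 0; i < kFanOut; ++i)
                gpuQueue.push_back({f.depth + 1});   // fragments create fragments
        } else {
            ++cpuTasksSpawned;                        // leaf fragments create CPU work
        }
    }
    std::printf("CPU tasks spawned: %d\n", cpuTasksSpawned);
    return 0;
}
```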

Slide 9: Evaluation
- Feasibility (architecture)
- Utilization
  - State vs. performance tradeoff
  - Maintaining coherence (instruction/data)
- Containing state explosion
  - Number of queues
  - Length of queues (size of state)
  - Per-"work" state
  - Scheduling / spilling / handling failure

Slide 10: List of Papers
- HW scheduling with geometry shader delta?
  - GPU-only micro-architecture evaluation
- Whole-system implementation (fully exercise CPU-GPU queues with a renderer)
  - Unified queue abstraction (HW/SW implementation)
  - Synchronization discussed?
- Deluxe hybrid renderer in DirectK
  - Configurable queues, specialized execution, synchronization

Slide 11: Other slides

Slide 12: Specialized "Work"
- Fragments
- Vertices
- Rays
- Geometric primitives
- CPU threads
(Graphics-specific terms)

Slide 13: Work environments
- Fragment (no gather)
- Fragment + gather
- Fragment + scatter
- Fragment + create work (bounded)
- Fragment + sync
- CPU
- CPU + SW-managed L2
(Resource-specific terms)

Slide 14: Describing work with queues?
- Queue per environment?
- Queue per kernel? (else how to specify the kernel)
- Queue per processor?
- Dynamically SW-specified
- How to describe work (a hypothetical descriptor is sketched below)
  - Kernel? (is this an issue?)
  - Arguments
- Granularity of creation
  - Kernel + stream
  - Element by element
- Spawn/enqueue constraints
  - Bounding / explosion (TTL? HW kills?)
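To picture the "kernel + arguments, granularity, TTL" questions on this slide, consider the hypothetical descriptor below, where a time-to-live field is decremented on every spawn so that work creation stays bounded. The field names and the enqueue check are assumptions for illustration, not the actual J++ interface.

```cpp
// Hypothetical work descriptor covering the slide's design questions:
// which kernel, what arguments, at what granularity, and how spawning is
// bounded (a TTL decremented on each child enqueue).
#include <cstdint>

enum class Granularity { KernelPlusStream, ElementByElement };

struct WorkDesc {
    uint32_t    kernelId;     // "Kernel?" - which program to run
    uint64_t    argPtr;       // "Arguments" - pointer to the argument block
    uint32_t    numElements;  // stream length when KernelPlusStream
    Granularity granularity;  // kernel+stream vs. element-by-element creation
    uint8_t     ttl;          // spawn budget: bounds work explosion
};

// Child work inherits a decremented TTL; the spawn is refused once it hits zero
// (the "HW kills" alternative on the slide).
bool trySpawnChild(const WorkDesc& parent, WorkDesc& child) {
    if (parent.ttl == 0) return false;
    child = parent;
    child.ttl = static_cast<uint8_t>(parent.ttl - 1);
    return true;
}
```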

Slide 15: Scheduling policy (a naive sketch follows the list below)
- Prioritize coherence
- Minimize required state
  - Prioritize leaves when the number of threads explodes
- Issues
  - Fail gracefully
  - Ensure progress
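A naive rendering of this policy into code might look like the sketch below: normally pick the queue offering the largest coherent batch, but when total live state exceeds a limit, prefer queues closest to the leaves of the work-creation graph so they drain state rather than create more. The scoring formula, the depth field, and the thresholds are assumptions for illustration only.

```cpp
// Naive illustration of the scheduling policy: prefer big coherent batches,
// but under state pressure prefer "leaf" queues (work that creates no more
// work) to contain the state explosion. Skipping empty queues ensures progress.
#include <cstddef>
#include <vector>

struct QueueState {
    std::size_t length;  // items waiting (proxy for coherent batch size)
    int         depth;   // distance from leaves in the work-creation graph (0 = leaf)
};

int pickNextQueue(const std::vector<QueueState>& queues,
                  std::size_t totalLiveState, std::size_t stateLimit) {
    const bool exploding = totalLiveState > stateLimit;
    int best = -1;
    long bestScore = -1;
    for (int i = 0; i < static_cast<int>(queues.size()); ++i) {
        if (queues[i].length == 0) continue;            // nothing runnable here
        long score = exploding
            ? 1000L - queues[i].depth                    // under pressure: leaves first
            : static_cast<long>(queues[i].length);       // otherwise: biggest coherent batch
        if (score > bestScore) { bestScore = score; best = i; }
    }
    return best;   // -1 means no queue has runnable work
}
```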

