Slide 1: J++ Machine
Jeremy Sugerman, Kayvon Fatahalian
Slide 2: Background
- Multicore CPUs
- Generalized GPUs (Brook, CTM, CUDA)
- Tightly coupled traditional CPU cores (more than one) and GPU cores
  - AMD Fusion / Intel's competing designs
  - PS3 / Xbox 360
Slide 3: Beliefs
- GPU and CPU-style cores are fundamentally different (but both useful)
- Important applications exhibit workloads suited to each type of architecture
  - E.g., ray tracing and shading are fundamentally different
- The programmer understands the system contains different cores and chooses appropriately
  - Inappropriate to abstract away the differences... but this does not preclude a unified system ("CPU" includes SPU)
Slide 4: Assumptions (new rules)
- Combination of GPU and CPU cores (with a good interconnect)
- Unified system address space
- Both CPU and GPU work can create new "work"
- "Work" dispatch is HW-driven on the GPU, SW-driven on the CPU
- Optional:
  - SW-managed CPU L2
  - Inter-core sync/signaling primitives
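The "work creates work" assumption above can be illustrated as two queues in one (simulated) address space, where each kind of work may enqueue into the other side's queue. This is a minimal Python sketch, not the proposal's actual dispatch mechanism; the queue names, the even/odd spawning rule, and the drain loop are illustrative assumptions:

```python
from collections import deque

# Two work queues, one per core type, sharing a single address space.
gpu_queue, cpu_queue = deque(), deque()

def gpu_kernel(fragment):
    # GPU work may create CPU work (here: even "fragments" request a CPU task).
    if fragment % 2 == 0:
        cpu_queue.append(fragment)

def cpu_task(x):
    # CPU work may likewise create new GPU work.
    if x > 0:
        gpu_queue.append(x - 1)

def drain():
    """Run until both queues are empty; return how many items were processed."""
    processed = 0
    while gpu_queue or cpu_queue:
        if gpu_queue:
            gpu_kernel(gpu_queue.popleft())
        else:
            cpu_task(cpu_queue.popleft())
        processed += 1
    return processed
```

Seeding `gpu_queue` with a single even fragment such as 4 processes three items (GPU 4, then CPU 4, then GPU 3) before both queues drain, showing cross-type work creation terminating.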
Slide 5: Claims, Phase 1 (Architecture/HW interface)
- We can show that interactive rendering (with ray tracing) can benefit from this platform
- Work queues with associated scheduling/resource allocations are a good way to describe work
  - Discrete execution environments
  - Granularity (threads/kernels/streams) and properties of work
- We will develop a scheduling/resource management algorithm that works
  - Note: hints to the scheduler
Slide 6: Claims, Phase 2 (DirectK)
- We can present the programmer with a unified execution environment
  - Architecture independent
  - Queue-based application-level programming model
  - Shader API extensions (TBD)
Slide 7: Research Questions
- Can you write a useful app? (with a fast interconnect and the ability to create work from work)
- How to schedule/launch/run work?
  - Scheduling model: HW vs. SW implementation
  - Understand execution environments
- How to make it tractable to write an application? (to program the architecture)
  - What is the application-level queue abstraction?
Slide 8: Demonstrations
- Synthetic (micro) tests on a rough runtime
  - Execute GPU programs + CPU threads
  - Fragments create fragments/vertices
  - Fragments create CPU work
  - Assumptions: no horizontal, bounded creation, unordered
- Hybrid scheduling investigation
  - More detailed simulation environment
- Hybrid renderer design
Slide 9: Evaluation
- Feasibility (architecture)
- Utilization
  - State vs. performance tradeoff
  - Maintaining coherence (instruction/data)
- Containing state explosion
  - Number of queues
  - Length of queues (size of state)
  - Per-"work" state
  - Scheduling/spilling/handling failure
Slide 10: List of Papers
- HW scheduling with a geometry shader delta?
  - GPU-only micro-architecture evaluation
- Whole-system implementation (fully exercise CPU-GPU queues with a renderer)
  - Unified queue abstraction (HW/SW implementations)
  - Synchronization discussed?
- Deluxe hybrid renderer in DirectK
  - Configurable queues, specialized execution, synchronization
Slide 11: Other slides
Slide 12: Specialized "Work" (graphics-specific terms)
- Fragments
- Vertices
- Rays
- Geometric primitives
- CPU threads
Slide 13: Work environments (resource-specific terms)
- Fragment (no gather)
- Fragment + gather
- Fragment + scatter
- Fragment + create work (bounded)
- Fragment + sync
- CPU
- CPU + SW-managed L2
Slide 14: Describing work with queues?
- Queue per environment? Queue per kernel? (else how to specify the kernel) Queue per processor?
- Dynamic, SW-specified
- How to describe work
  - Kernel? (is this an issue?)
  - Arguments
- Granularity of creation
  - Kernel + stream
  - Element by element
- Spawn/enqueue constraints
  - Bounding / explosion (TTL? HW kills?)
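The "kernel + arguments" description and the TTL spawn bound mentioned on this slide can be sketched as a small queue abstraction. This is one illustrative Python design under my own assumptions (the `WorkItem`/`WorkQueue` names and the inherit-decremented-TTL rule are not from the deck):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class WorkItem:
    # "Kernel + arguments" description of a unit of work, plus a TTL
    # spawn bound as one way to contain creation explosion.
    kernel: object   # callable taking (queue, item)
    args: tuple = ()
    ttl: int = 4

class WorkQueue:
    """One queue per execution environment; running items may enqueue new work."""
    def __init__(self):
        self.items = deque()

    def enqueue(self, item):
        if item.ttl <= 0:
            raise RuntimeError("spawn bound exceeded (TTL expired)")
        self.items.append(item)

    def run(self):
        completed = 0
        while self.items:
            item = self.items.popleft()
            item.kernel(self, item)  # kernel receives the queue so it can spawn
            completed += 1
        return completed

def splitter(queue, item):
    # Example kernel: each item spawns two children until its TTL runs out.
    if item.ttl > 1:
        for _ in range(2):
            queue.enqueue(WorkItem(splitter, ttl=item.ttl - 1))
```

Starting a queue with `WorkItem(splitter, ttl=3)` completes 1 + 2 + 4 = 7 items and then drains, so creation stays bounded even though work creates work.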
Slide 15: Scheduling policy
- Prioritize coherence
- Minimize required state
  - Prioritize leaves when the number of threads explodes
- Issues
  - Fail gracefully
  - Ensure progress
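The policy on this slide can be sketched as a priority function: normally group pending items by kernel so batches stay instruction-coherent, but once queued state exceeds a budget, run leaf work (work that cannot spawn more) first to drain state rather than grow it. A Python sketch; the dict schema, the `state_budget` parameter, and the lexicographic tie-break are my illustrative assumptions:

```python
def schedule(pending, state_budget):
    """Order pending work items (dicts with 'kernel', 'state', 'leaf' keys).

    Coherence first: group by kernel. Under state pressure, leaves run
    before spawning work so total queued state shrinks.
    """
    exploding = sum(item["state"] for item in pending) > state_budget

    def priority(item):
        if exploding:
            # Leaves (rank 0) before spawners (rank 1), then by kernel.
            return (0 if item["leaf"] else 1, item["kernel"])
        return (0, item["kernel"])

    return sorted(pending, key=priority)
```

For example, with three items of 10 units of state each and a budget of 20, the two leaf items are scheduled ahead of the non-leaf one; with a budget of 100, items are simply grouped by kernel name.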