Last time: Runtime infrastructure for hybrid (GPU-based) platforms
- Task scheduling
- Extracting performance models at runtime
- Memory management: Asymmetric Distributed Shared Memory
References:
StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, Cédric Augonnet, Samuel Thibault, and Raymond Namyst. TR-7240, INRIA, March 2010. [link]
An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems, Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro, Wen-mei Hwu. ASPLOS'10 [pdf]
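The task-scheduling idea recapped above (StarPU-style: use per-device performance models to place each task) can be illustrated with a minimal sketch. This is not StarPU's actual policy, just a toy greedy scheduler; the function and device names are hypothetical.

```python
# Toy model of performance-model-driven task scheduling (StarPU-style idea):
# each task goes to the processing unit that minimizes its predicted
# completion time, given a per-device cost model learned at runtime.

def schedule(tasks, devices, cost):
    """tasks: list of task names; devices: list of device names;
    cost[device][task]: predicted execution time on that device."""
    ready_at = {d: 0.0 for d in devices}   # when each device becomes free
    placement = {}
    for t in tasks:
        # predicted finish time = time the device frees up + predicted run time
        best = min(devices, key=lambda d: ready_at[d] + cost[d][t])
        placement[t] = best
        ready_at[best] += cost[best][t]
    return placement, ready_at

# A GPU that is fast on 'gemm' but slow on 'scan', and a CPU with the
# opposite profile (numbers are made up for illustration).
cost = {
    "gpu": {"gemm": 1.0, "scan": 8.0},
    "cpu": {"gemm": 6.0, "scan": 2.0},
}
placement, _ = schedule(["gemm", "scan"], ["gpu", "cpu"], cost)
print(placement)  # {'gemm': 'gpu', 'scan': 'cpu'}
```

The point of the performance model is visible in the example: a purely static "GPU-first" policy would put both tasks on the GPU and finish at t=9, while the model-driven placement finishes at t=2.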
Today: Bridging runtime and language support, 'Virtualizing GPUs'
- Achieving a Single Compute Device Image in OpenCL for Multiple GPUs, Jungwon Kim, Honggyu Kim, Joo Hwan Lee, Jaejin Lee. PPoPP'11 [pdf]
- Supporting GPU Sharing in Cloud Environments with a Transparent Runtime Consolidation Framework, Vignesh T. Ravi et al. HPDC 2011 (best paper)
Context: clouds shift to support HPC applications
- Initially, tightly coupled applications were not suited for clouds
- Today: a Chinese cloud with 40 Gbps InfiniBand; Amazon HPC instances; GPU instances (Amazon, Nimbix)
Challenge: make GPUs shared resources in the cloud.
Challenge: make GPUs a shared resource in the cloud. Why do this?
- GPUs are costly resources
- Multiple VMs on a node can share a single GPU
- Increase utilization: at the application level, some apps might not use GPUs much; at the kernel level, some kernels can be collocated
Two streams:
1. How?
2. Evaluate: opportunities, gains, overheads
1. The 'How?' Preamble: concurrent kernels are supported by today's GPUs
- Each kernel can execute a different task
- Tasks can be mapped to different streaming multiprocessors (using the thread-block configuration)
Problem: concurrent execution is limited to the set of kernels invoked within a single process context
Past virtualization solutions: API rerouting / intercept library
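The API-rerouting idea above (an intercept library catches each guest's runtime calls and funnels them into one shared device context, so kernels from different VMs can actually run concurrently) can be sketched in miniature. This is a conceptual toy, not the paper's implementation; all class and method names are hypothetical.

```python
# Toy sketch of API rerouting: per-VM frontends intercept "launch" calls
# and forward them to a single shared backend context, so kernels from
# different guests end up in one process context on the GPU.

class SharedBackend:
    """Stands in for the single GPU process context that actually owns
    the device; all intercepted launches are queued here."""
    def __init__(self):
        self.queue = []

    def submit(self, vm_id, kernel, grid):
        self.queue.append((vm_id, kernel, grid))

class InterceptingFrontend:
    """What the guest links against instead of the real runtime library."""
    def __init__(self, vm_id, backend):
        self.vm_id = vm_id
        self.backend = backend

    def launch_kernel(self, kernel, grid):
        # Instead of creating its own device context, reroute the call
        # to the shared backend -- this is the "API rerouting" step.
        self.backend.submit(self.vm_id, kernel, grid)

backend = SharedBackend()
vm1 = InterceptingFrontend("vm1", backend)
vm2 = InterceptingFrontend("vm2", backend)
vm1.launch_kernel("matmul", grid=64)
vm2.launch_kernel("reduce", grid=16)
print(len(backend.queue))  # 2 -- both kernels now share one context
```

Because both launches land in one backend queue, a consolidation policy can now decide whether to space-share or time-share them, which is exactly what per-process contexts prevent.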
1. The ‘How?’ Architecture
2. Evaluation – The opportunity
Key assumption: under-utilization of GPUs
Sharing:
- Space-sharing: kernels occupy different streaming multiprocessors (SMs)
- Time-sharing: kernels time-share the same SM (benefiting from hardware support for context switches)
Note: resource conflicts may prevent this
Molding: change the kernel configuration (different number of thread blocks / threads per block) to improve collocation
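The space-sharing and molding decisions above reduce to a resource fit check: two kernels collocate if their combined SM demand fits the GPU, and molding shrinks one kernel's thread-block count until it does. The sketch below is an illustrative toy (the SM count, one-block-per-SM simplification, and function name are assumptions, not the paper's algorithm).

```python
# Toy collocation check for space-sharing with molding: each kernel asks
# for some number of thread blocks (one per SM here, for simplicity); if
# two kernels together exceed the GPU's SMs, "mold" the second one down.

TOTAL_SMS = 16

def collocate(blocks_a, blocks_b, min_blocks=1):
    """Return (blocks_a, blocks_b) after molding kernel B to fit,
    or None if even the minimum configuration does not fit."""
    if blocks_a + blocks_b <= TOTAL_SMS:
        return blocks_a, blocks_b       # space-sharing works as-is
    molded_b = TOTAL_SMS - blocks_a     # shrink B to the leftover SMs
    if molded_b >= min_blocks:
        return blocks_a, molded_b       # molded configuration
    return None                         # resource conflict: cannot share

print(collocate(8, 6))    # (8, 6)  -- fits without molding
print(collocate(12, 8))   # (12, 4) -- kernel B molded from 8 to 4 blocks
print(collocate(16, 8))   # None    -- no SMs left for kernel B
```

Molding trades per-kernel parallelism (kernel B now runs with fewer blocks, so it takes longer) for the ability to collocate at all, which is why it only pays off when the GPU would otherwise sit under-utilized.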
2. Evaluation – The gains
2. Evaluation – The overheads
Discussion:
- Limitations
- Hardware support
OpenCL vs. CUDA http://ft.ornl.gov/doku/shoc/level1 http://ft.ornl.gov/pubs-archive/shoc.pdf