Last time: Runtime infrastructure for hybrid (GPU-based) platforms
- Task scheduling
- Extracting performance models at runtime
- Memory management: Asymmetric Distributed Shared Memory
References:
StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, Cédric Augonnet, Samuel Thibault, and Raymond Namyst. TR-7240, INRIA, March 2010. [link]
An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems, Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro, Wen-mei Hwu. ASPLOS'10 [pdf]
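The task-scheduling idea recapped above (StarPU-style: use per-device performance models to place each task) can be illustrated with a minimal sketch. This is not StarPU's actual policy, just a toy greedy scheduler; the function and device names are hypothetical.

```python
# Toy model of performance-model-driven task scheduling (StarPU-style idea):
# each task goes to the processing unit that minimizes its predicted
# completion time, given a per-device cost model learned at runtime.

def schedule(tasks, devices, cost):
    """tasks: list of task names; devices: list of device names;
    cost[device][task]: predicted execution time on that device."""
    ready_at = {d: 0.0 for d in devices}   # when each device becomes free
    placement = {}
    for t in tasks:
        # predicted finish time = time the device frees up + predicted run time
        best = min(devices, key=lambda d: ready_at[d] + cost[d][t])
        placement[t] = best
        ready_at[best] += cost[best][t]
    return placement, ready_at

# A GPU that is fast on 'gemm' but slow on 'scan', and a CPU with the
# opposite profile (numbers are made up for illustration).
cost = {
    "gpu": {"gemm": 1.0, "scan": 8.0},
    "cpu": {"gemm": 6.0, "scan": 2.0},
}
placement, _ = schedule(["gemm", "scan"], ["gpu", "cpu"], cost)
print(placement)  # {'gemm': 'gpu', 'scan': 'cpu'}
```

The point of the performance model is visible in the example: a purely static "GPU-first" policy would put both tasks on the GPU and finish at t=9, while the model-driven placement finishes at t=2.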
Today: Bridging runtime and language support, 'Virtualizing GPUs'
- Achieving a Single Compute Device Image in OpenCL for Multiple GPUs, Jungwon Kim, Honggyu Kim, Joo Hwan Lee, Jaejin Lee. PPoPP'11 [pdf]
- Supporting GPU Sharing in Cloud Environments with a Transparent Runtime Consolidation Framework, Vignesh T. Ravi et al. HPDC 2011 (best paper)
Context: clouds shift to support HPC applications
- Initially, tightly coupled applications were not suited for clouds
- Today: a Chinese cloud with 40 Gbps InfiniBand; Amazon HPC instances; GPU instances (Amazon, Nimbix)
Challenge: make GPUs shared resources in the cloud.
Challenge: make GPUs a shared resource in the cloud. Why do this?
- GPUs are costly resources
- Multiple VMs on a node can share a single GPU
- Increase utilization: at the application level, some apps might not use GPUs much; at the kernel level, some kernels can be collocated
Two streams:
1. How?
2. Evaluate: opportunities, gains, overheads
1. The 'How?' Preamble: concurrent kernels are supported by today's GPUs
- Each kernel can execute a different task
- Tasks can be mapped to different streaming multiprocessors (using the thread-block configuration)
Problem: concurrent execution is limited to the set of kernels invoked within a single process context
Past virtualization solutions: API rerouting / intercept library
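The API-rerouting idea above (an intercept library catches each guest's runtime calls and funnels them into one shared device context, so kernels from different VMs can actually run concurrently) can be sketched in miniature. This is a conceptual toy, not the paper's implementation; all class and method names are hypothetical.

```python
# Toy sketch of API rerouting: per-VM frontends intercept "launch" calls
# and forward them to a single shared backend context, so kernels from
# different guests end up in one process context on the GPU.

class SharedBackend:
    """Stands in for the single GPU process context that actually owns
    the device; all intercepted launches are queued here."""
    def __init__(self):
        self.queue = []

    def submit(self, vm_id, kernel, grid):
        self.queue.append((vm_id, kernel, grid))

class InterceptingFrontend:
    """What the guest links against instead of the real runtime library."""
    def __init__(self, vm_id, backend):
        self.vm_id = vm_id
        self.backend = backend

    def launch_kernel(self, kernel, grid):
        # Instead of creating its own device context, reroute the call
        # to the shared backend -- this is the "API rerouting" step.
        self.backend.submit(self.vm_id, kernel, grid)

backend = SharedBackend()
vm1 = InterceptingFrontend("vm1", backend)
vm2 = InterceptingFrontend("vm2", backend)
vm1.launch_kernel("matmul", grid=64)
vm2.launch_kernel("reduce", grid=16)
print(len(backend.queue))  # 2 -- both kernels now share one context
```

Because both launches land in one backend queue, a consolidation policy can now decide whether to space-share or time-share them, which is exactly what per-process contexts prevent.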
1. The ‘How?’ Architecture
2. Evaluation – The opportunity
Key assumption: under-utilization of GPUs
Sharing:
- Space-sharing: kernels occupy different streaming multiprocessors (SMs)
- Time-sharing: kernels time-share the same SM (benefiting from hardware support for context switches)
Note: resource conflicts may prevent this
Molding: change the kernel configuration (different number of thread blocks / threads per block) to improve collocation
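The space-sharing and molding decisions above reduce to a resource fit check: two kernels collocate if their combined SM demand fits the GPU, and molding shrinks one kernel's thread-block count until it does. The sketch below is an illustrative toy (the SM count, one-block-per-SM simplification, and function name are assumptions, not the paper's algorithm).

```python
# Toy collocation check for space-sharing with molding: each kernel asks
# for some number of thread blocks (one per SM here, for simplicity); if
# two kernels together exceed the GPU's SMs, "mold" the second one down.

TOTAL_SMS = 16

def collocate(blocks_a, blocks_b, min_blocks=1):
    """Return (blocks_a, blocks_b) after molding kernel B to fit,
    or None if even the minimum configuration does not fit."""
    if blocks_a + blocks_b <= TOTAL_SMS:
        return blocks_a, blocks_b       # space-sharing works as-is
    molded_b = TOTAL_SMS - blocks_a     # shrink B to the leftover SMs
    if molded_b >= min_blocks:
        return blocks_a, molded_b       # molded configuration
    return None                         # resource conflict: cannot share

print(collocate(8, 6))    # (8, 6)  -- fits without molding
print(collocate(12, 8))   # (12, 4) -- kernel B molded from 8 to 4 blocks
print(collocate(16, 8))   # None    -- no SMs left for kernel B
```

Molding trades per-kernel parallelism (kernel B now runs with fewer blocks, so it takes longer) for the ability to collocate at all, which is why it only pays off when the GPU would otherwise sit under-utilized.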
2. Evaluation – The gains
2. Evaluation – The overheads
Discussion:
- Limitations
- Hardware support
OpenCL vs. CUDA http://ft.ornl.gov/doku/shoc/level1 http://ft.ornl.gov/pubs-archive/shoc.pdf