Presenter: Hung-Fu Li, HPDS Lab, NKUAS. vCUDA: GPU Accelerated High Performance Computing in Virtual Machines, by Lin Shi, Hao Chen and Jianhua Sun (IEEE, 2009)
2 Lecture Outline: Abstract (3), Background (4), Motivation (5), CUDA Architecture (7), vCUDA Architecture (8), Experiment Result (13), Conclusion (19)
3 Abstract: This paper describes vCUDA, a GPGPU computing solution for virtual machines. The authors claim that API interception and redirection can give applications transparent, high-performance access to the GPU. The paper also evaluates the performance overhead of the framework.
4 Background: VM (Virtual Machine); CUDA (Compute Unified Device Architecture); API (Application Programming Interface); API interception and redirection; RPC (Remote Procedure Call)
5 Motivation: Virtualization may be the simplest solution for heterogeneous computing environments. Because hardware varies by vendor, VM developers cannot realistically implement drivers for every device, and vendors do not publish their driver source code or kernel-level techniques for licensing reasons.
6 Motivation (cont.): Currently, virtualization supports only accelerated graphics APIs such as OpenGL (e.g. VMGL), which are not intended for general-purpose computation.
7 CUDA Architecture: component stack, from top to bottom: User Application > CUDA Runtime API > CUDA Driver API > CUDA Driver > CUDA-enabled device
8 vCUDA Architecture: split the stack into a hardware (hard) binding and a software (soft) binding. Hard binding: the CUDA Driver and the CUDA-enabled device, which communicate directly. Soft binding: the CUDA Driver API and the CUDA Runtime API (part of the SDK), used by the user application.
9 vCUDA Architecture (cont.): regroup the stack into a host side and a remote side. Host binding: CUDA Driver API, CUDA Driver, and CUDA-enabled device. Remote binding (guest OS): user application, [v]CUDA Runtime API (part of the SDK), [v]CUDA Driver API, and the virtual CUDA-enabled device (vGPU).
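The vGPU in the remote binding gives the guest its own view of the device. The toy Python sketch below is not vCUDA's code. It shows one way a guest-side vGPU could mirror device state, assuming a hypothetical `VirtualGPU` class that tracks simulated allocations and logs state-changing calls. It also assumes, purely for illustration, that state could be rebuilt by replaying that log, for example after a suspend and resume.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualGPU:
    """Hypothetical guest-side vGPU that mirrors device state locally."""

    allocations: dict = field(default_factory=dict)  # handle -> size in bytes
    call_log: list = field(default_factory=list)     # (method name, args) in call order
    next_handle: int = 1                              # next simulated device handle

    def malloc(self, size: int) -> int:
        # Record a simulated allocation instead of touching real hardware.
        handle = self.next_handle
        self.next_handle += 1
        self.allocations[handle] = size
        self.call_log.append(("malloc", (size,)))
        return handle

    def free(self, handle: int) -> None:
        del self.allocations[handle]
        self.call_log.append(("free", (handle,)))

    def replay(self) -> "VirtualGPU":
        # Assumption: rebuilding state by replaying the logged calls in order.
        # Handles are assigned deterministically, so they match the original run.
        fresh = VirtualGPU()
        for name, args in self.call_log:
            getattr(fresh, name)(*args)
        return fresh


gpu = VirtualGPU()
a = gpu.malloc(256)
b = gpu.malloc(512)
gpu.free(a)
restored = gpu.replay()
print(restored.allocations == gpu.allocations)  # True
```

Replaying calls keeps the sketch simple. It only works because every state change passes through the intercepted API, which is the property the remote binding is meant to provide.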
10 vCUDA Architecture (cont.): a fake API acts as an adapter between the real driver on the host and the virtual driver in the guest. API interception must preserve the parameters passed, the call order, the call semantics, and the hardware state. Communication: lazy RPC. Transmission: XML-RPC as the high-level protocol (for cross-platform requirements). Remote binding (guest OS): [v]CUDA Driver API, [v]CUDA Runtime API, [v]CUDA-enabled device (vGPU).
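The interception and XML-RPC transmission steps can be sketched with Python's standard `xmlrpc.client` module, which can encode and decode XML-RPC requests without a network connection. The `intercept` wrapper, the `host_dispatch` function, and the stand-in handler are hypothetical. `cudaMalloc` is used only as a method name. The real call takes a device pointer and a size, and this simplified sketch sends only the size.

```python
import xmlrpc.client


def intercept(name):
    """Return a hypothetical 'fake API' that encodes the call as XML-RPC
    instead of calling the real driver."""
    def fake_api(*args):
        # Parameters are sent by value, in the order the application passed them.
        return xmlrpc.client.dumps(tuple(args), methodname=name)
    return fake_api


def host_dispatch(request_xml, handlers):
    """Host side: decode the XML-RPC request and call the matching handler."""
    params, method = xmlrpc.client.loads(request_xml)
    return handlers[method](*params)


# Guest side: the application calls what looks like an ordinary API function.
cuda_malloc = intercept("cudaMalloc")
request = cuda_malloc(1024)

# Host side: a stand-in handler takes the place of the real CUDA driver.
handlers = {"cudaMalloc": lambda size: {"status": 0, "size": size}}
print(host_dispatch(request, handlers))  # {'status': 0, 'size': 1024}
```

Because the request is plain XML text, the guest and host do not need to share an OS or binary format. This is the cross-platform property the slide cites.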
11 vCUDA Architecture (cont.): [Diagram: the virtual machine OS and the host OS communicate via lazy RPC; API calls are divided into non-instant APIs and instant APIs]
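12 vCUDA Architecture (cont.): the vCUDA API works with a virtual GPU (vGPU) that tracks hardware state. Lazy RPC reduces the overhead of switching between the host OS and the guest OS. Non-instant API calls are packaged together, and the package is sent with the next instant API call through the stub/vStub pair, so the calls are invoked on the GPU in a single round trip.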
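The lazy RPC idea can be illustrated with a minimal Python sketch. It assumes a hypothetical `LazyRpcChannel` class and a caller-supplied `send_batch` function that stands in for one guest-to-host transmission. The CUDA runtime names are used only as labels. Treating only `cudaMemcpy` as instant is an assumption made for the example, not vCUDA's actual classification.

```python
from typing import Any, Callable


class LazyRpcChannel:
    """Hypothetical sketch of lazy RPC batching.

    Non-instant calls (whose results the guest does not need right away) are
    queued locally. When an instant call arrives, the queue and the instant
    call are sent together, so the guest/host switch cost is paid once per
    batch instead of once per call.
    """

    def __init__(self, send_batch: Callable[[list], list], instant_calls: set):
        self.send_batch = send_batch        # performs one guest->host transmission
        self.instant_calls = instant_calls  # names whose results are needed now
        self.pending = []                   # queued calls, in original order

    def call(self, name: str, *args: Any) -> Any:
        self.pending.append((name, args))
        if name not in self.instant_calls:
            return None                     # deferred: nothing transmitted yet
        batch, self.pending = self.pending, []
        results = self.send_batch(batch)    # one round trip for the whole batch
        return results[-1]                  # result of the instant call itself


transmissions = []


def fake_host(batch):
    # Stand-in for the host-side stub: record the batch and "execute" it.
    transmissions.append(list(batch))
    return [f"ok:{name}" for name, _ in batch]


ch = LazyRpcChannel(fake_host, instant_calls={"cudaMemcpy"})
ch.call("cudaConfigureCall", 1, 1)
ch.call("cudaSetupArgument", 0)
ch.call("cudaLaunch", "kernel")
result = ch.call("cudaMemcpy", 256)
print(len(transmissions), len(transmissions[0]), result)  # 1 4 ok:cudaMemcpy
```

Four API calls cost one transmission here. Queuing preserves call order, which matters because the host must execute the calls in the sequence the application issued them.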
13 Experiment Result: criteria are Performance; Lazy RPC and Concurrency; Suspend & Resume; Compatibility
18 Experiment Result (cont.): applications tested. MV: Matrix-Vector Multiplication algorithm; StoreGPU: Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems; MRRR: Multiple Relatively Robust Representations; GPUmg: Molecular Dynamics Simulation with GPU
19 Conclusion: The authors developed a CUDA interface for virtual machines that is compatible with the native interface. Data transmission is a significant bottleneck because of XML-RPC parsing. This presentation briefly introduced the major architecture of vCUDA and the ideas behind it. The architecture could be extended into a component or solution that lets cloud computing platforms support GPUs.
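One contributor to the XML-RPC transmission cost is visible even before parsing: XML-RPC carries binary data as base64 text, which inflates it by about 4/3. Python's `xmlrpc.client` module also adds line breaks and XML tags. The sketch below encodes a 3000-byte buffer, a hypothetical stand-in for data copied to the GPU, and compares sizes. `cudaMemcpy` is used only as a method-name label.

```python
import xmlrpc.client

payload = bytes(3000)  # hypothetical 3000-byte buffer destined for the GPU

# Encode the buffer the way an XML-RPC transport would send it.
request = xmlrpc.client.dumps((xmlrpc.client.Binary(payload),), methodname="cudaMemcpy")

# Decode on the "host" side to confirm the data survives the round trip.
params, method = xmlrpc.client.loads(request)

print(len(payload), len(request), f"{len(request) / len(payload):.2f}x")
```

For large device transfers this expansion, plus the cost of encoding and parsing the text, grows with the buffer size.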
20 End of Presentation. Thank you for listening.