Presenter: Hung-Fu Li HPDS Lab. NKUAS 2009-12-31 1 vCUDA: GPU Accelerated High Performance Computing in Virtual Machines Lin Shi, Hao Chen and Jianhua.

Presentation transcript:

Presenter: Hung-Fu Li HPDS Lab. NKUAS vCUDA: GPU Accelerated High Performance Computing in Virtual Machines Lin Shi, Hao Chen and Jianhua Sun IEEE 2009

2 Lecture Outline Abstract (3), Background (4), Motivation (5), CUDA Architecture (7), vCUDA Architecture (8), Experiment Result (13), Conclusion (19)

3 Abstract This paper describes vCUDA, a GPGPU computation solution for virtual machines. The authors claim that API interception and redirection give applications transparent, high-performance access to the GPU. The paper also evaluates the performance overhead of the framework.

4 Background VM (Virtual Machine); CUDA (Compute Unified Device Architecture); API (Application Programming Interface); API interception and redirection; RPC (Remote Procedure Call)

5 Motivation Virtualization may be the simplest solution for a heterogeneous computing environment. Hardware varies by vendor, and VM developers should not have to implement hardware drivers for each device (because of licensing, vendors do not publish their source code or core techniques).

6 Motivation (cont.) Currently, virtualization supports only accelerated graphics APIs such as OpenGL (through VMGL), which are not intended for general-purpose computation.

7 CUDA Architecture Component stack (top to bottom): User Application, CUDA Runtime API, CUDA Driver API, CUDA Driver, CUDA Enabled Device.

8 vCUDA Architecture Split the stack into hardware and software bindings. Figure labels: User Application, CUDA Runtime API, CUDA Driver API, CUDA Driver, CUDA Enabled Device. Annotations: hard binding, soft binding, direct communication, part of SDK.

9 vCUDA Architecture (cont.) Re-group the stack into a host side and a remote side. Figure labels: User Application, [v]CUDA Runtime API, [v]CUDA Driver API, [v]CUDA Enabled Device (vGPU), CUDA Driver API, CUDA Driver, CUDA Enabled Device. Groupings: host binding, remote binding (guest OS), part of SDK.

10 vCUDA Architecture (cont.) Use a fake API as an adapter between the instant driver and the virtual driver. API interception captures: parameters passed, order semantics, hardware state. Communication: use lazy RPC transmission; use XML-RPC as the high-level communication protocol (to meet the cross-platform requirement). Figure labels: [v]CUDA Runtime API, [v]CUDA Driver API, [v]CUDA Enabled Device (vGPU), remote binding (guest OS).
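
The interception idea on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not vCUDA's implementation: `FakeRuntime` and `HostBackend` are hypothetical stand-ins for the guest-side fake API library and the host-side stub, and the handlers only mimic CUDA call names. The sketch shows the fake API recording each call's parameters and order before redirecting it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class HostBackend:
    """Hypothetical host-side stub; a real one would call the CUDA library."""
    handlers: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def execute(self, name: str, args: Tuple[Any, ...]) -> Any:
        return self.handlers[name](*args)


class FakeRuntime:
    """Hypothetical guest-side fake API: same call names, no direct GPU access."""

    def __init__(self, backend: HostBackend) -> None:
        self._backend = backend
        # Each entry is (sequence number, call name, parameters).
        self.log: List[Tuple[int, str, Tuple[Any, ...]]] = []

    def __getattr__(self, name: str) -> Callable[..., Any]:
        # Only reached for names not defined on the object, i.e. API calls.
        def intercepted(*args: Any) -> Any:
            self.log.append((len(self.log), name, args))  # keep order + params
            return self._backend.execute(name, args)      # redirect to host
        return intercepted


# Usage: the handlers below are placeholders, not real CUDA behavior.
backend = HostBackend({
    "cudaMalloc": lambda size: f"dev_ptr_{size}",
    "cudaFree": lambda ptr: 0,
})
runtime = FakeRuntime(backend)
ptr = runtime.cudaMalloc(1024)
status = runtime.cudaFree(ptr)
```

The application code calls `runtime.cudaMalloc(...)` as if it were the native library; the recorded log is what a redirection layer would need to replay the calls in order on the host.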

11 vCUDA Architecture (cont.) Figure labels: Virtual Machine OS, Host OS, lazy RPC, non-instant API, instant API.

12 vCUDA Architecture (cont.) vCUDA API with virtual GPU. Lazy RPC: reduces the overhead of switching between the host OS and the guest OS. Figure labels: AP, lazy RPC, vGPU, hardware states, API invocation, GPU, instant API call, non-instant API call, non-instant package, stub, vStub.
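
A rough Python sketch of the lazy RPC idea on this slide: non-instant calls are queued on the guest side and sent as one package when an instant call forces a round trip. `LazyRPCClient`, `fake_host`, and the choice of which calls count as instant are assumptions made for illustration, not details taken from the paper.

```python
from typing import Any, Callable, List, Tuple

Call = Tuple[str, Tuple[Any, ...]]


class LazyRPCClient:
    """Hypothetical guest-side client that batches deferrable calls."""

    def __init__(self, send_batch: Callable[[List[Call]], List[Any]]) -> None:
        self._send_batch = send_batch
        self._pending: List[Call] = []
        self.round_trips = 0

    def call_non_instant(self, name: str, *args: Any) -> None:
        # Deferred: queued locally, no guest/host switch yet.
        self._pending.append((name, args))

    def call_instant(self, name: str, *args: Any) -> Any:
        # Must run now: flush the queue plus this call in a single package.
        self._pending.append((name, args))
        batch, self._pending = self._pending, []
        self.round_trips += 1
        return self._send_batch(batch)[-1]


received: List[List[Call]] = []


def fake_host(batch: List[Call]) -> List[Any]:
    """Placeholder for the host-side stub; it only acknowledges each call."""
    received.append(batch)
    return [f"{name}:ok" for name, _ in batch]


client = LazyRPCClient(fake_host)
client.call_non_instant("cudaConfigureCall", 64, 256)
client.call_non_instant("cudaSetupArgument", 0)
client.call_non_instant("cudaSetupArgument", 1)
result = client.call_instant("cudaMemcpy", "host_buf", "dev_buf")
```

Four API calls cost one round trip here instead of four, which is the overhead reduction the slide describes.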

13 Experiment Result Criteria: performance, lazy RPC and concurrency, suspend and resume, compatibility.

18 Experiment Result (cont.) Criteria: performance, lazy RPC and concurrency, suspend and resume, compatibility. Benchmarks: MV (Matrix Vector Multiplication Algorithm); StoreGPU (Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems); MRRR (Multiple Relatively Robust Representations); GPUmg (Molecular Dynamics Simulation with GPU).

19 Conclusion The authors have developed a CUDA interface for virtual machines that is compatible with the native interface. Data transmission is a significant bottleneck because of XML parsing in the RPC layer. This presentation has briefly covered the main architecture of vCUDA and the idea behind it. The architecture could be extended as a component or solution that lets cloud computing support GPUs.
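
To make the XML-RPC cost concrete, here is a small sketch using Python's standard `xmlrpc.client` module. It does not reproduce the paper's measurements; it only shows that a binary buffer must be base64-encoded and wrapped in XML before transmission, then parsed again on the other side. The RPC method name `cudaMemcpy` and the buffer contents are placeholders chosen for illustration.

```python
import xmlrpc.client

# Placeholder payload standing in for a buffer copied to the GPU (3072 bytes).
raw = bytes(range(256)) * 12

# Marshal the call the way an XML-RPC client would before sending it.
request = xmlrpc.client.dumps(
    (xmlrpc.client.Binary(raw),), methodname="cudaMemcpy"
)
encoded_size = len(request.encode("utf-8"))
overhead_ratio = encoded_size / len(raw)

# The receiver has to parse the XML and decode base64 to recover the bytes.
params, method = xmlrpc.client.loads(request)
recovered = params[0].data
```

Base64 alone inflates the payload by about one third, and the encode and parse steps run on every transfer, which is consistent with the slide's point that data transmission dominates the overhead.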

20 End of Presentation Thank you for listening.