GPU Computing in EGI environment using Cloud approach
F. Vella, R. Cefalà, A. Costantini, O. Gervasi, and C. Tanci
Dept. of Mathematics and Computer Science, University of Perugia, Italy
osvaldo@unipg.it
EGI User Forum 2011, April 11, 2011, Vilnius (Lithuania)
Introduction
GPU Computing has generated considerable interest in the scientific community.
Some EGI Virtual Organizations are reshaping their applications to exploit the new programming paradigm.
Cloud Computing implements a transparent use of computational resources.
We provide access to a GPU environment on EGI using a Cloud approach.
Introduction
GPU Computing is pushing the development of distribution models aimed at integrating HPC and HTC resources.
We provide on-demand execution environments through the joint use of gLite components and the EC2 web-service API.
The entire job flow has been defined, enabling the request of GPU resources through JDL parameters and the dynamic allocation of the resources in a Cloud-like infrastructure.
Grid and Cloud
Grid and Cloud paradigms share some essential driving ideas and overlapping areas, which lead them to:
−Encapsulate the complexity of hardware resources and make them easily accessible by means of high-level user interfaces
−Address the intrinsic scalability issues of large-scale computational challenges
−Cope with the need for resources that cannot be hosted locally
Grid and Cloud
Among the key differences between Grid and Cloud are those related to abstraction and computational models.
Cloud definition: a large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically scalable, managed computing power, storage, platforms, and services is delivered on demand to external customers over the Internet.
Grid and Cloud
Clouds enable users to choose between different computational models suited to the requirements of a particular application.
This allows overcoming some limits of the batch model of the Grid, where resources are often statically managed and partitioned, preventing any dynamic arrangement to match the application requirements.
The workload in Grid is often unpredictable, leading to an unbalanced use of resources and to a reduction of QoS.
Grid and Cloud
Cloud Computing still presents some weak aspects that require further development and that are instead well established in Grid, i.e.:
−Security
−Data management
The most reasonable approach could be an integration model that combines the features of both paradigms.
Grid and Cloud integration
Hybrid with batch-dependent, cloud-enabled LRMS
−A single local batch system is used to schedule the jobs on a pool of dynamically provisioned resources on public/private clouds.
Hybrid, batch-independent
−The local Grid site spawns resources on public/private clouds on the basis of job requests. The integration is done at the Computing Element level. It enables the creation of multiple virtual clusters, including the LRMS.
HPC in EGI
Grid can be seen as the ideal computing infrastructure for very large computational campaigns, distributing the calculations on the available Grid resources in a secure way.
HPC in EGI: MPI
EGI allows the execution of HPC applications at the Grid site level using the MPI API, which has gained an important role in many VOs, at the cost of some effort to integrate the parallel environment with the gLite middleware.
Nevertheless, the use of MPI in Grid suffers from some limitations, due mainly to the policies adopted by the scheduler on each Grid site.
This reduces the advantage of parallelization and may cause job failures.
Some MPI tests
We ran some tests using COMPCHEM VO applications, and the results confirmed that there are several problems in running MPI jobs.
HPC in EGI: Multi/Manycore
The opportunity to use heterogeneous resources (CPUs and GPUs) as compute units can address the need for HPC resources.
An application written in OpenCL or CUDA is inherently parallel, but the management of its workflow is as simple as that of a single job.
The cost of communication among threads within a single node is smaller than in an MPI application.
Such an application would also be less dependent on the peculiarities of each site.
Not all HPC scientific applications, however, can be implemented in a Many/Multicore fashion.
A simple integration model
A typical Grid site runs an application that "speaks" with the CREAM CE BLAH component and an IaaS provider.
To create a working testbed we chose and implemented a simple infrastructure for the provisioning of GPU virtual instances, using Eucalyptus on top of the Xen hypervisor.
The interfaces to the Cloud Controller are Amazon EC2 compliant, and we used the boto API to develop our testbed.
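As an illustration of the last point, a minimal Python sketch of how a boto client can be pointed at the EC2-compliant interface of a Eucalyptus Cloud Controller; the endpoint, credentials and the image listing at the end are placeholder assumptions, not the actual testbed code.

import boto
from boto.ec2.regioninfo import RegionInfo

# EC2-compliant endpoint exposed by the Eucalyptus Cloud Controller.
# Hostname and credentials are illustrative placeholders.
region = RegionInfo(name="eucalyptus", endpoint="clc.example.org")
conn = boto.connect_ec2(
    aws_access_key_id="EC2_ACCESS_KEY",
    aws_secret_access_key="EC2_SECRET_KEY",
    is_secure=False,
    region=region,
    port=8773,                    # default Eucalyptus web-service port
    path="/services/Eucalyptus",  # default Eucalyptus EC2 service path
)

# List the virtual appliances (e.g. the OpenCL/CUDA images)
# registered in the private cloud.
for image in conn.get_all_images():
    print("%s  %s" % (image.id, image.location))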
A simple integration model
The physical GPU resources have been virtualized and made available in an Infrastructure as a Service (IaaS) private Cloud.
A centralized mechanism that interfaces a generic EGI Grid site to an IaaS provider has been implemented to fulfill the job requirements.
The fully working testbed has been built by adopting the Eucalyptus software system to implement a private Cloud over the cluster.
We created virtual appliances for OpenCL and CUDA to match job requirements related to Multi/Manycore technology.
Eucalyptus architecture
(Slide contains a diagram of the Eucalyptus architecture.)
Xen and GPU virtualization
When the hardware prerequisites are met, Xen allows PCIe devices to be transparently assigned to virtual machines.
A single PCIe resource cannot be shared between virtual machines (unless the graphics driver is separated and decoupled between front-end and back-end*).
At present, only a few GPUs officially support being passed through to virtual machines.
The selection of commodity components (new chipsets, motherboards, CPUs and GPUs) requires particular care, since some components are not compatible with full hardware virtualization.
* F. Giunta, R. Montella, G. Laccetti, F. Isaila and F. J. García Blas. A GPU Accelerated High Performance Cloud Computing Infrastructure for Grid Computing Based Virtual Environmental Laboratory. Advances in Grid Computing, edited by Z. Constantinescu, pp. 121-146, InTech, ISBN 978-953-307-301-9 (2011).
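To make the passthrough mechanism concrete, a minimal fragment of a Xen domU configuration (xm toolstack, whose config files use Python syntax) is sketched below; the guest name and the PCI address of the GPU are placeholder values, and the device is assumed to have been hidden from dom0 beforehand (e.g. via the pciback driver).

# Fragment of an xm domU configuration assigning a GPU to the guest
# by PCI passthrough. The BDF address below is an illustrative placeholder.
name   = "gpu-appliance"
memory = 4096
vcpus  = 4
pci    = ['0000:02:00.0']   # GPU previously hidden from dom0 via pciback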
OpenCL performance
To assess the OpenCL performance gap between a real machine and a virtual one, a testbed has been implemented using an Asus P7P55 LX motherboard with support for both Intel Virtualization Technology (VT-x) and Intel Virtualization Technology for Directed I/O (VT-d), provided by an Intel Core i7 870 CPU (4 cores, 64-bit, x86). Graphics adapter: AMD FireStream 9270.
We used the Phoronix Test Suite benchmarks.
The virtual GPU is 3% slower.
(Chart legend: black = real GPU, gray = virtual GPU)
The implemented system
The main components are the Grid site CE and BLAH, a daemon named resource-marshal, and an IaaS provider that exposes EC2 interfaces.
For each job the CE receives, after the BLAH parsing operation we are able to gather all the information for that job and marshal its execution.
To this end we implemented a basic daemon that manages the allocation of instances according to the job information and simple policies.
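A small Python sketch of the kind of allocation decision the resource-marshal daemon takes for each parsed job; the function and field names are hypothetical and only illustrate the policy (reuse an idle instance of the right flavor, otherwise request a new one from the EC2-compliant provider via boto).

# Hypothetical sketch of the resource-marshal allocation logic.
# 'free_instances' maps an appliance flavor (e.g. "cuda", "opencl") to
# the idle virtual instances of that flavor that are already running;
# 'image_ids' maps a flavor to the id of the matching virtual appliance.
def allocate(job_info, free_instances, ec2_conn, image_ids):
    """Return a virtual instance on which the job can be executed."""
    flavor = job_info["flavor"]   # derived from the JDL CeRequirements

    # Reuse an idle instance of the requested flavor, if any.
    if free_instances.get(flavor):
        return free_instances[flavor].pop()

    # Otherwise ask the IaaS provider for a new instance via boto.
    reservation = ec2_conn.run_instances(image_ids[flavor],
                                         instance_type="m1.large")
    return reservation.instances[0]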
The implemented system
The job information is used to determine the flavor of the required virtual appliance, specified via the GlueSchema in the JDL:

Type = "job";
executable = "/bin/sleep";
[...]
CeRequirements = "other.GlueHostMainMemoryRAMSize > 2048 && (Member(\"GPU\", other.GlueHostApplicationSoftwareRuntimeEnvironment))";

If an instance is already available, the job proceeds immediately. To implement this behavior, a call to a simple client application is interposed in the BLAH XXX_job_submit scripts; the client inquires the resource-marshal daemon about the availability of resources.
The daemon keeps track of the available instances and sends requests for new ones to the IaaS provider via the boto EC2 interfaces.
We have not applied a strict termination policy for the instances; however, the possibilities are:
−Job termination events can be used to trigger the reclamation of unused instances
−The Grid user's proxy lifetime could be bound to the instance lifecycle
−Some instances could be kept permanently running
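As a complement to the termination options listed above, a hedged Python sketch of how the daemon could reclaim an idle instance on a job-termination event using the classic boto EC2 call; the bookkeeping helpers and the flavor-recovery scheme are hypothetical.

# Hypothetical sketch: reclaim a virtual instance once the job that was
# running on it has terminated and no queued job needs that flavor.
def reclaim_instance(ec2_conn, instance, pending_jobs, keep_running=frozenset()):
    """Terminate 'instance' unless it is pinned or still useful."""
    if instance.id in keep_running:
        return False    # some instances are kept permanently running
    if pending_jobs.get(instance_flavor(instance)):
        return False    # a queued job of the same flavor can reuse it
    ec2_conn.terminate_instances(instance_ids=[instance.id])
    return True

def instance_flavor(instance):
    # Hypothetical helper: the flavor could be recovered, for example,
    # from the image id the instance was launched from.
    return instance.image_id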
Conclusions
We presented a system that provides EGI users with a specific on-demand GPU environment in which jobs are transparently executed on GPU devices.
The proposed system uses a Cloud approach, based on EC2-compliant Clouds, to control the specific GPU-enabled VO environment from the EGI middleware interfaces.
The proposed solution enables users to run their applications in parallel, exploiting the Many/Multicore capabilities of the Grid nodes.
This innovative approach offers users an interesting alternative to MPI for running parallel jobs.