Virtual Machines meet the Grids

VM Deployment

The VM deployment process has three major steps:
1. The client queries the VMRepository, sending a list of criteria describing a workspace. The repository returns a list of VM descriptors that match those criteria.
2. The client contacts the VMManager, sending it the descriptor of the VM to deploy, along with an identifier and a lifetime for the VM. The VMManager authorizes the request using an access control list.
3. The VM instance is registered with the VMManager and the VM image is copied from the VMRepository. The VMManager then interfaces with the VMM on the resource to power on the VM.

How does using VMs help the Bioinformatics community?

Performance Implications

The performance of applications running on a VM depends on the third-party VMM and on the applications themselves. A purely CPU-bound program suffers almost no performance degradation, because its instructions are executed directly on the hardware. Typically, a VMM intercepts privileged instructions (such as I/O), which incurs a performance hit for those instructions, although newer methods, such as those implemented by Xen, reduce this overhead. In our implementation we experimented with VMware Workstation and Xen; in our experience the slowdown was never more than 30% and was often less than 5%. (With Xen, the slowdown was much less than 30%.)

Do VMs fulfill their promise?

Broader base of resources: Our tests show that this first promise is met. Consider a scientist who can use a testbed on the DOE Science Grid spanning several clusters: 20 Solaris nodes at LBNL, 20 Linux nodes in ANL's Jazz cluster, and 20 Linux nodes in NERSC's PDSF cluster. If only the Jazz nodes have the configuration needed to run EMBOSS, getting EMBOSS to run on the LBNL and PDSF clusters would take considerable extra work. If we instead install EMBOSS on a VM and run an instance of that VM on each node, we can use all 60 nodes instead of just 20.
Easier deployment/distribution: Using VMs makes deployment easier and faster. In our tests with a minimal 2 GB VM image we obtained the following results:
EMBOSS installation: 45 minutes
VM deployment on our testbed: 6 minutes 23 seconds
Peace of mind (not having to debug installation): priceless!

Fine-grained resource management: Depending on the implementation, a VM can provide fine-grained resource usage enforcement, which is critical in many Grid scenarios.

Enhanced security: VMs offer enhanced isolation and are therefore a more secure way to represent user environments.

Migration

Current limitations to using Grids:
- Complex applications require customized software configurations; such environments may not be widely available on Grid nodes.
- Installing scientific applications by hand can be arduous, lengthy and error-prone; the ability to amortize this process over many installations would help.
- Providing good isolation of Grid computations is a key security requirement; the currently used mechanism of Unix accounts is not sufficient.
- Providing a vehicle for fine-grained resource usage enforcement is critical for more efficient use of Grid resources, yet such technology is not widely available.
- The ability to migrate or restart applications would be of enormous value in a Grid environment, yet current Grid frameworks do not support it.

After a scientist has deployed a VM onto a resource, they may run an application in it. For this purpose, each of our VMs was configured with the Globus Toolkit. The picture shows a scientist running the TOPO program to create an image of a transmembrane protein.

A Glossary of Terms

VMM (Virtual Machine Monitor) – a third-party tool providing the interface between a virtual machine and the host machine. Examples of VMMs are VMware and Xen.
Quality of Life in the Grids: VMs meet Bioinformatics Applications

Daniel Galron [1], Tim Freeman [2], Kate Keahey [3], Stephanie Gato [4], Natalia Maltsev [5], Alex Rodriguez [6], Mike Wilde [7]
[1] The Ohio State University. [2] Argonne National Laboratory. [3] Argonne National Laboratory. [4] Indiana University. [5] Argonne National Laboratory. [6] Argonne National Laboratory. [7] Argonne National Laboratory.

Describing VM Properties

A VM constitutes a virtual workspace configured to meet the requirements of Grid computations. We use an XML Schema to describe various aspects of such a workspace: virtual hardware (RAM size, disk size, virtual CD-ROM drives, serial ports, parallel ports), installed software, including the operating system (e.g. kernel version, distribution type) and library signature, and other properties such as image name and VM owner. Based on these descriptions, VMs can be selected, duplicated, or further configured.

Integrating Virtual Machines with Grid technology allows easy migration of applications from one node to another. The steps are as follows:
1. Using Grid software, the client freezes execution of the VM.
2. The client then sends the migrate command to the VMManager, specifying the new host node as a parameter.
3. After checking for proper authorization, the VM is registered with the new host and a GridFTP call transfers the image.

In terms of performance, migration is on a par with deployment; it is mainly bound by the duration of the image transfer. In our tests, we migrated a 2 GB VM image between two identical nodes over a Fast Ethernet connection.
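The three migration steps above can be sketched in a few lines of Python. This is only an illustrative model, not the poster's implementation: the `VM` and `VMManager` classes, the `freeze` and `resume` helpers, and the user and node names are all hypothetical, and the GridFTP transfer is merely recorded in a log rather than performed.

```python
from dataclasses import dataclass, field
from enum import Enum


class VMState(Enum):
    RUNNING = "running"
    FROZEN = "frozen"


@dataclass
class VM:
    name: str
    host: str
    state: VMState = VMState.RUNNING


def freeze(vm: VM) -> None:
    """Step 1: the client freezes execution of the VM."""
    vm.state = VMState.FROZEN


def resume(vm: VM) -> None:
    """Restart the VM once it is on its new host."""
    vm.state = VMState.RUNNING


@dataclass
class VMManager:
    """Hypothetical stand-in for the VMManager Grid service."""
    authorized_users: set
    registry: dict = field(default_factory=dict)   # VM name -> current host
    transfers: list = field(default_factory=list)  # (vm, source, destination)

    def migrate(self, user: str, vm: VM, new_host: str) -> None:
        """Step 2: handle a migrate command naming the new host node."""
        if vm.state is not VMState.FROZEN:
            raise RuntimeError("VM must be frozen before migration")
        # Step 3: check authorization, register with the new host, and
        # transfer the image (only logged here; no real GridFTP call).
        if user not in self.authorized_users:
            raise PermissionError(f"{user} may not migrate {vm.name}")
        self.registry[vm.name] = new_host
        self.transfers.append((vm.name, vm.host, new_host))
        vm.host = new_host


manager = VMManager(authorized_users={"alice"})
vm = VM(name="emboss-vm", host="node-a")
freeze(vm)
manager.migrate("alice", vm, "node-b")
resume(vm)
print(vm)
```

Requiring the frozen state before the transfer mirrors the ordering of the steps: the image that is copied must not change while it is in flight.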
VMManager – a Grid service interface that allows a remote client to interact with the VMM
VMRepository – a Grid service that catalogues the VM images of a VO and stores them for retrieval and deployment
Authorization Service – a Grid service that the VMManager and VMRepository services call to check whether a user is authorized to perform the requested operation

Issues or Problems Encountered

While developing the architecture we encountered several important and interesting issues that we had to resolve.
Clocks: While a VM image is frozen or powered off, the VM's clock does not update. The clock must therefore be updated as soon as the VM is powered on or unpaused.
IP addresses: Each new VM instance (i.e. each time a VM is deployed) needs a unique IP address, so that multiple copies of the same VM can be deployed on the same subnet.
Starting a Grid container: If a VM is to be a full-fledged Grid node, a Grid container must be started automatically when the VM starts up, or at least a User Hosting Environment (UHE) must be launched.
We solved these issues by installing a daemon on the VM: upon deployment, it sets the IP address of the VM, launches a UHE and, if needed, updates the clock.

Current limitations to using Grids:

[Deployment graph] The graph to the right shows the proportion of time taken by the constituents of the deployment process, measured in seconds. The authorization time is not included, but it is comparable to the registration time. The dominant factor in overall deployment time is the image transfer, which depends on network latency and bandwidth.

[Migration graph] The graph to the right shows the proportion of time taken by the constituents of the migration process, measured in seconds. The graph does not include the time for authorization, which is comparable to the registration time. The actual migration time depends on network latency and bandwidth. The pause and resume times depend on the third-party VMM.
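Because the image transfer dominates both deployment and migration, a back-of-the-envelope lower bound on transfer time is easy to compute. The sketch below assumes the nominal Fast Ethernet rate of 100 Mbit/s and reads the 2 GB image size as 2 x 10^9 bytes; the function name and the `efficiency` parameter are illustrative, and the result is not a measurement from the poster.

```python
def ideal_transfer_seconds(image_bytes, link_bits_per_second, efficiency=1.0):
    """Best-case time to move an image over a link.

    efficiency is the assumed fraction of the nominal link rate that is
    actually achieved (1.0 means no protocol overhead or contention).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return image_bytes * 8 / (link_bits_per_second * efficiency)


FAST_ETHERNET_BPS = 100 * 10**6   # nominal Fast Ethernet rate
IMAGE_BYTES = 2 * 10**9           # 2 GB image, assuming decimal gigabytes

best_case = ideal_transfer_seconds(IMAGE_BYTES, FAST_ETHERNET_BPS)
print(f"Lower bound for a 2 GB image over Fast Ethernet: {best_case:.0f} s")
```

Under these assumptions the bound is 160 seconds; real transfers take longer because of protocol overhead, disk speed, and competing traffic, which the `efficiency` parameter can roughly approximate.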
The promise of VMs

Using VMs has many benefits for scientists running complex applications:
- Broader resource base: a virtual machine can be preconfigured with a required OS, library signature and application installation, and then deployed on many different nodes independently of each node's configuration.
- Simplified deployment/distribution: VMs can be used as distribution packages; to duplicate an installation, just copy a VM image.
- Easy migration capability: an executing VM image can be frozen, transferred to another resource, and restarted within milliseconds.
- Fine-grained resource management: most VM implementations allow resource usage to be confined.
- Enhanced security: VMs provide outstanding isolation, protecting the resource from the user and isolating users from each other.

We implemented the architecture using Globus Toolkit 3.2, an open-source Grid middleware toolkit that provides a framework for resource, data, and security management. Instead of running Grid software within VMs, we integrated VM deployment into the Grid infrastructure: mapping a client credential to a Unix account was replaced by deploying a VM and starting the client's environment within it.

In a nutshell

The low-level features of our architecture are detailed in the diagram to the right. The diagram shows four nodes, each running a (potentially different) host OS. Each node runs a VMM and a VMManager Grid service. On top of that layer run the actual VMs, which have Grid software installed, allowing them to act as Grid nodes. The VMs could also be used as independent execution environments without Grid middleware installed; they would then run applications directly.

Diagram legend: VMRepository, VMManager
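To make the roles of the VMRepository and the VMManager more concrete, the following minimal Python sketch models a deployment: a criteria query against the repository, an access-control check, and registration of the new instance. All class, field, and method names are hypothetical, the descriptor holds only a few of the properties the poster's XML Schema covers, and copying the image and powering on the VM through the VMM are reduced to a state flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMDescriptor:
    """A small, illustrative subset of the properties describing a VM."""
    image_name: str
    os_name: str
    ram_mb: int
    software: frozenset


class VMRepository:
    """Hypothetical catalogue of VM descriptors."""

    def __init__(self, descriptors):
        self._descriptors = list(descriptors)

    def query(self, os_name=None, min_ram_mb=0, software=()):
        """Return the descriptors that satisfy every given criterion."""
        required = frozenset(software)
        return [
            d for d in self._descriptors
            if (os_name is None or d.os_name == os_name)
            and d.ram_mb >= min_ram_mb
            and required <= d.software
        ]


class VMManager:
    """Hypothetical manager that authorizes and registers deployments."""

    def __init__(self, acl):
        self._acl = set(acl)   # users allowed to deploy
        self.instances = {}    # instance identifier -> record

    def deploy(self, user, descriptor, instance_id, lifetime_s):
        if user not in self._acl:
            raise PermissionError(f"{user} is not on the access control list")
        if instance_id in self.instances:
            raise ValueError(f"instance {instance_id} is already registered")
        # A real service would copy the image and ask the VMM to power it on.
        record = {
            "image": descriptor.image_name,
            "lifetime_s": lifetime_s,
            "state": "running",
        }
        self.instances[instance_id] = record
        return record


repository = VMRepository([
    VMDescriptor("emboss-minimal", "linux", 512, frozenset({"EMBOSS", "Globus"})),
    VMDescriptor("plain-linux", "linux", 256, frozenset()),
])
manager = VMManager(acl={"alice"})

matches = repository.query(os_name="linux", software=["EMBOSS"])
instance = manager.deploy("alice", matches[0], "ws-001", lifetime_s=3600)
print(matches[0].image_name, instance["state"])
```

Keeping the query in the repository and the authorization in the manager follows the split of responsibilities between the two services; the lifetime is stored with the instance so that a manager could later reclaim expired VMs.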