Journées Informatiques de l'IN2P3, May 2010, Aussois, France. P. Mato, CERN
Brief introduction to Virtualization
◦ Taxonomy
◦ Hypervisors
Usages of Virtualization
The CernVM project
◦ Application Appliance
◦ Specialized file system
CernVM as job hosting environment
◦ Clouds, Grids and Volunteer Computing
Summary
Credit for bringing virtualization into computing goes to IBM
IBM VM/370, a reimplementation of CP/CMS, was made available in 1972
◦ It added virtual memory hardware and operating systems to the System/370 series
Even in the 1970s anyone with any sense could see the advantages virtualization offered
◦ It separates applications and the OS from the hardware
◦ In spite of that, VM/370 was not a great commercial success
The idea of abstracting computer resources has continued to develop ever since
Virtualization of system computer resources, such as:
◦ Memory virtualization: aggregates RAM resources from networked systems into a virtualized memory pool
◦ Network virtualization: creation of a virtualized network addressing space within or across network subnets; multiple links can be combined to work as though they offered a single, higher-bandwidth link
◦ Virtual memory: allows uniform, contiguous addressing of physically separate and non-contiguous memory and disk areas
◦ Storage virtualization: abstracts logical storage from physical storage (RAID, disk partitioning, logical volume management)
This is what most people today identify with the term "virtualization"
◦ Also known as server virtualization
◦ Hides the physical characteristics of the computing platform from the users
◦ Host software (a hypervisor, or VMM) creates a simulated computer environment, a virtual machine, for its guest OS
◦ Enables server consolidation
Platform virtualization approaches:
◦ Operating system-level virtualization
◦ Partial virtualization
◦ Paravirtualization
◦ Full virtualization
◦ Hardware-assisted virtualization
The virtual machine simulates enough hardware to allow an unmodified "guest" OS to run
A key challenge for full virtualization is the interception and simulation of privileged operations
◦ The effects of every operation performed within a given virtual machine must be kept within that virtual machine
◦ Instructions that would "pierce the virtual machine" cannot be allowed to execute directly; they must instead be trapped and simulated
Examples (a launch sketch follows below)
◦ Parallels Workstation, Parallels Desktop for Mac, VirtualBox, Virtual Iron, Oracle VM, Virtual PC, Virtual Server, Hyper-V, VMware Workstation, VMware Server (formerly GSX Server), QEMU
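For illustration, a minimal sketch of full virtualization in practice: booting an unmodified guest disk image under QEMU, one of the examples above, from Python. The image name is a placeholder and the flags shown are just a reasonable baseline, not a recommended configuration.

```python
import subprocess

# Boot an unmodified guest OS image under the QEMU full-system emulator.
# "guest.img" is a placeholder for any guest disk image.
subprocess.call([
    "qemu-system-x86_64",
    "-m", "1024",                    # 1 GB of guest RAM
    "-hda", "guest.img",             # unmodified guest disk image
    "-net", "nic", "-net", "user",   # simple user-mode networking
])
```

The guest kernel needs no modification at all; the privileged instructions it issues are trapped and simulated by the emulator, exactly as described above.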
To create several virtual servers on one physical machine we need a hypervisor, or Virtual Machine Monitor (VMM)
◦ Its most important role is to arbitrate access to the underlying hardware, so that guest OSes can share the machine
◦ The VMM manages virtual machines (guest OS + applications) like an OS manages processes and threads
Most modern operating systems work with two modes:
◦ Kernel mode is allowed to run almost any CPU instruction, including "privileged" instructions that deal with interrupts, memory management, …
◦ User mode allows only the instructions necessary to calculate and process data; applications running in this mode can make use of the hardware only by asking the kernel to do some work (a system call)
A technique that all (software-based) virtualization solutions use is ring deprivileging:
◦ The operating system that originally runs on ring 0 is moved to another, less privileged ring, such as ring 1
◦ This allows the VMM to control the guest OS's access to resources
◦ It prevents one guest OS from kicking another out of memory, or a guest OS from controlling the hardware directly
A virtualization technique that presents a software interface to virtual machines that is similar, but not identical, to that of the underlying hardware
◦ The guest kernel source code is modified, instead of binary-translated
◦ Paravirtualization provides specially defined 'hooks' that allow the guest(s) and host to request and acknowledge tasks which would otherwise be executed in the virtual domain (where execution performance is worse)
◦ A paravirtualized platform may allow the virtual machine monitor (VMM) to be simpler (by relocating execution of critical tasks from the virtual domain to the host domain) and faster
Paravirtualization requires the guest operating system to be explicitly ported to the para-API
◦ A conventional OS distribution that is not paravirtualization-aware cannot run on top of a paravirtualized VMM
With hardware-assisted virtualization, the VMM can efficiently virtualize the entire x86 instruction set by handling the sensitive instructions using a classic trap-and-emulate model in hardware, as opposed to software
◦ System calls do not automatically result in VMM interventions: as long as they do not involve critical instructions, the guest OS can provide kernel services to the user applications
Intel and AMD came up with distinct implementations of hardware-assisted x86 virtualization: Intel VT-x and AMD-V, respectively
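As an aside, on Linux one can check whether the running CPU advertises these extensions by inspecting the flags in /proc/cpuinfo; a minimal sketch:

```python
# Check /proc/cpuinfo for the CPU flags that advertise hardware-assisted
# virtualization: "vmx" for Intel VT-x, "svm" for AMD-V.
def hw_virt_support():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = line.split(":", 1)[1].split()
                if "vmx" in flags:
                    return "Intel VT-x"
                if "svm" in flags:
                    return "AMD-V"
    return None

print(hw_virt_support() or "no hardware virtualization extensions found")
```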
Two strategies to reduce the total overhead
◦ Total overhead = frequency of "VMM to VM" events × latency of each event
◦ Reducing the number of cycles that the VT-x instructions take: VMentry latency was reduced from 634 cycles (Xeon 70xx) to 352 cycles (Xeon 51xx, Xeon 53xx, Xeon 73xx)
◦ Reducing the frequency of VMM-to-VM events: the Virtual Machine Control Block contains the state of the virtual CPU(s) for each guest OS, allowing them to run directly without interference from the VMM
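To make the overhead formula concrete, a small arithmetic sketch; only the two VMentry latencies come from the slide, the event rate is an assumed round number purely for illustration:

```python
# total overhead = frequency of VMM-to-VM events * latency of each event
events_per_second = 10000   # assumed rate of VMM-to-VM transitions (invented)
for cpu, latency in [("Xeon 70xx", 634), ("Xeon 51xx/53xx/73xx", 352)]:
    # cycles per second spent just on VMentry transitions
    print(cpu, "->", events_per_second * latency, "cycles/s of overhead")
```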
Software virtualization is very mature, but there is very little headroom left for improvement
◦ Second-generation hardware virtualization (VT-x+EPT and AMD-V+NPT) is promising
◦ But it is not guaranteed to improve performance across all applications, due to the heavy TLB-miss cost
The smartest way is to use a hybrid approach, like VMware ESX:
◦ Paravirtualized drivers for the most critical I/O components
◦ Emulation for the less important I/O
◦ Binary translation to avoid the high "trap and emulate" performance penalty
◦ Hardware virtualization for 64-bit guests
Virtual machines can cut time and money out of the software development and testing process
A great opportunity to test software on a large variety of 'platforms' (a cloning sketch follows below)
◦ Each platform can be realized by a differently configured virtual machine
◦ It is easy to duplicate the same environment in several virtual machines
◦ Testing installation procedures from a well-defined 'state'
◦ Etc.
Example: the Execution Infrastructure in ETICS (a spin-off of the EGEE project)
◦ A set of virtual machines running a variety of platforms, attached to an Execution Engine where build and test jobs are executed on behalf of the submitting users
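As a sketch of the idea under stated assumptions (the VM and snapshot names are invented, and VirtualBox is just one possible backend, not what ETICS necessarily used): clone a reference VM into several identically configured test platforms and record a clean snapshot for each, so every test job can start from the same well-defined state.

```python
import subprocess

# Clone a reference VM into identically configured test platforms
# and snapshot each one in a known clean state; names are invented.
for i in range(3):
    clone = "sl5-test-%d" % i
    subprocess.check_call(["VBoxManage", "clonevm", "sl5-reference",
                           "--name", clone, "--register"])
    subprocess.check_call(["VBoxManage", "snapshot", clone,
                           "take", "clean-state"])
```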
Installing the "complete software environment" on the physicist's desktop/laptop [or the Grid] to be able to do data analysis for any of the LHC experiments is complex and manpower-intensive
◦ In some cases it is not even possible, if the desktop/laptop OS does not match any of the supported platforms
◦ Application software versions change often
◦ Only a tiny fraction of the installed software is actually used
◦ High cost to support a large number of compiler-platform combinations
The system infrastructure cannot evolve independently from the evolution of the application
◦ The coupling between OS and application is very strong
Traditional model (hardware, OS, libraries, tools, databases)
◦ Horizontal layers
◦ Independently developed
◦ Maintained by different groups
◦ Different lifecycles
The application is deployed on top of the stack
◦ It breaks if any layer changes
◦ It needs to be re-certified every time something changes
◦ The result is a deployment and support nightmare
Application-driven approach
◦ Analyze the application requirements and dependencies
◦ Add the required tools and libraries
◦ Build a minimal OS
◦ Bundle all of this (application + libraries + tools + databases + OS) into a virtual machine image
Virtual machine images should be versioned just like the applications
◦ This assures accountability and mitigates possible negative aspects of the newly acquired application freedom
The emphasis is on the 'Application'
◦ The application dictates the platform, and not the contrary
The application (e.g. a simulation) is bundled with its libraries, services and bits of the OS
◦ Self-contained, self-describing, deployment-ready
What makes the application ready to run in any target execution environment?
◦ e.g. traditional, Grid, Cloud
Virtualization is the enabling technology
CernVM aims to provide a complete, portable and easy-to-configure user environment for developing and running LHC data analysis, locally and on the Grid, independent of the physical software and hardware platform (Linux, Windows, MacOS)
◦ Code check-out, editing, compilation, small local tests, debugging, …
◦ Grid submission, data access, …
◦ Event displays, interactive data analysis, …
◦ Suspend, resume, …
It decouples the application lifecycle from the evolution of the system infrastructure
It reduces the effort to install, maintain and keep the experiment software up to date
An R&D project in the CERN Physics Department
◦ Hosted in the SFT Group, the same group that takes care of ROOT & Geant4, looks for common projects and seeks synergy between experiments
The CernVM project started on 01/01/2007, funded for 4 years
◦ Good collaboration with ATLAS and LHCb, and starting with CMS
Starting from the experiment software… …ending with a custom Linux specialised for a given task. The build system can produce, among others:
◦ Installable CD/DVD, Update CD/DVD, Demo CD/DVD (Live CD/DVD), Appliance installable ISO
◦ Stub image, raw filesystem image, raw hard disk image, netboot image, compressed tar file
◦ VMware® virtual appliance, VMware® ESX Server virtual appliance
◦ Microsoft® VHD virtual appliance
◦ Xen Enterprise virtual appliance, Virtual Iron virtual appliance, Parallels virtual appliance
◦ Amazon Machine Image
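As an aside, several of the hypervisor-specific disk formats above can be derived from a single raw image; a hedged sketch using qemu-img (the file names are placeholders, and the real CernVM build system is not implied to work this way):

```python
import subprocess

# Repackage one raw filesystem image into a few of the hypervisor
# formats listed above; file names are placeholders.
targets = [("vmdk", "cernvm.vmdk"),    # VMware
           ("vpc", "cernvm.vhd"),      # Microsoft VHD
           ("qcow2", "cernvm.qcow2")]  # QEMU/KVM
for fmt, out in targets:
    subprocess.check_call(["qemu-img", "convert",
                           "-f", "raw", "-O", fmt,
                           "cernvm.img", out])
```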
Every build and every file installed on the system is automatically versioned and accounted for in a database
1. Log in to the Web interface
2. Create a user account
3. Select the experiment, appliance flavor and preferences
CernVM defines a common platform that can be used by all experiments/projects
◦ Minimal OS elements (just-enough-OS)
◦ The same CernVM virtual image for ALL experiments
It downloads only what is really needed from the experiment software and puts it in a cache
◦ It does not require a persistent network connection (offline mode)
◦ Minimal impact on the network
CernVM comes with a read-only file system (CVMFS) optimized for software distribution, sketched conceptually below
◦ Only a small fraction of the experiment software is actually used (~10%)
◦ Very aggressive local caching, web proxy caches (squids)
◦ Transparent file compression
◦ Integrity checks using checksums, signed file catalog
◦ Operational in off-line mode
No need to install any experiment software
◦ 'Virtually' all versions of all applications are already installed
◦ The user just needs to start using an application to trigger the download
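The sketch below illustrates the idea conceptually, under loudly stated assumptions: it is not the real CVMFS code or wire format, and the repository URL and cache path are placeholders. A file is fetched over HTTP on first access, decompressed, verified against a catalog checksum, and kept in a local cache so that later accesses work offline.

```python
import hashlib
import os
import urllib.request
import zlib

REPO = "http://cvmfs.example.org/repo"   # placeholder repository URL
CACHE = "/tmp/cvmfs-cache"               # placeholder local cache directory

def open_cached(path, expected_sha1):
    """Return a file object for `path`, fetching and verifying on first use."""
    local = os.path.join(CACHE, expected_sha1)
    if not os.path.exists(local):                      # cache miss: go to the network
        data = urllib.request.urlopen(REPO + path).read()
        data = zlib.decompress(data)                   # files travel compressed
        if hashlib.sha1(data).hexdigest() != expected_sha1:
            raise IOError("integrity check failed for " + path)
        os.makedirs(CACHE, exist_ok=True)
        with open(local, "wb") as f:
            f.write(data)
    return open(local, "rb")                           # cache hit: works offline
```

The checksum in a real deployment would come from a signed file catalog, which is what makes the integrity check trustworthy.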
[Usage plots: ~1000 different IP addresses; ~2000 different IP addresses]
[Diagram: CernVM clients behind a hierarchy of proxy servers and HTTP servers]
Proxy and slave servers can be deployed at strategic locations to reduce latency and provide redundancy; a client-side sketch follows below
Working with the ATLAS & CMS Frontier teams to reuse the already deployed squid proxy infrastructure
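From the client's point of view, fetching a repository file through a site-local squid is just an HTTP request with a proxy configured; a minimal sketch with placeholder host names and file path:

```python
import urllib.request

# Route repository requests through a site-local squid proxy.
# Host names and the file path are placeholders.
proxy = urllib.request.ProxyHandler({"http": "http://squid.example.org:3128"})
opener = urllib.request.build_opener(proxy)
data = opener.open("http://cvmfs.example.org/repo/catalog").read()
print(len(data), "bytes fetched via the proxy")
```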
[Diagram: CernVM clients, proxy servers and mirrored HTTP servers connected through a Content Distribution Network over the WAN/LAN]
Use a Content Delivery Network (such as SimpleCDN) to remove a single point of failure and fully mirror the central distribution to at least one more site
CROWD: a P2P-like mechanism for discovery of nearby CernVMs and cache sharing between them (a conceptual sketch follows below); no need to manually set up proxy servers (but they can still be used where they exist)
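Purely as a conceptual sketch of the CROWD idea, everything below is invented (the actual discovery mechanism, port and message format are not described in this talk): a CernVM could probe the local network for nearby peers that already hold a cache.

```python
import socket

# Invented discovery probe: broadcast on the LAN and collect replies
# from nearby CernVMs willing to share their cache.
PORT = 50505                              # invented port number
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
s.settimeout(2.0)
s.sendto(b"CVMFS-CACHE-DISCOVERY", ("255.255.255.255", PORT))
peers = []
try:
    while True:
        data, addr = s.recvfrom(1024)     # each reply names a nearby cache
        peers.append(addr[0])
except socket.timeout:
    pass
print("nearby CernVM caches:", peers)
```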
Cloud computing is the convergence of three major trends
◦ Virtualization: applications separated from the infrastructure
◦ Utility computing: capacity shared across the grid
◦ Software as a Service: applications available on demand
Commercial cloud offerings can be integrated for several types of work, such as simulations or compute-bound applications
◦ Pay-as-you-go model
◦ The question remains whether their data access capabilities match our requirements
◦ Good experience from pioneering experiments (e.g. STAR MC production on Amazon EC2)
◦ Ideal to absorb computing peak demands (e.g. before conferences)
Science clouds are starting to provide compute cycles in the cloud for scientific communities
CernVM as a job hosting environment on the Cloud/Grid
◦ Ideally, users would like to run their applications on the grid (or cloud) infrastructure in exactly the same conditions in which they were originally developed
CernVM already provides the development environment and can be deployed on the cloud (EC2)
◦ One image supports all four LHC experiments
◦ Easily extensible to other communities
Exactly the same environment for development (user desktop/laptop), large-scale job execution (grid) and final analysis (local cluster)
Software can be efficiently installed using CVMFS
◦ An HTTP proxy assures very fast access to the software even if the VM cache is cleared
Can accommodate multi-core jobs
Deployment on EC2 or alternative clusters, as in the sketch below
◦ Nimbus, Elastic
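A hedged sketch of such a deployment with the boto EC2 library of that era; the region, AMI id and key pair below are placeholders, not a real CernVM image.

```python
import boto.ec2

# Start a few identical CernVM worker nodes on EC2.
conn = boto.ec2.connect_to_region("us-east-1")
reservation = conn.run_instances(
    "ami-12345678",             # hypothetical CernVM AMI
    min_count=1, max_count=4,   # a handful of identical workers
    instance_type="m1.large",
    key_name="my-keypair",
)
for instance in reservation.instances:
    print(instance.id, instance.state)
```

Because every instance boots the same image and pulls software through CVMFS, the workers come up identical to the developer's own environment.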
BOINC
◦ Open-source software for volunteer computing and grid computing
◦ Ongoing development to use VirtualBox running CernVM as a job container (see the sketch below)
◦ Adds the possibility to run unmodified user applications
◦ Better security due to guest OS isolation
[Diagram: BOINC, PanDA Pilot]
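A minimal sketch of the job-container idea using the VirtualBox command line; the VM name is a placeholder, and the actual BOINC integration is not implied to use these exact commands.

```python
import subprocess

# Start a CernVM guest headless, let the job run inside the isolated
# guest OS, then power it off.
subprocess.check_call(["VBoxManage", "startvm", "CernVM-job",
                       "--type", "headless"])
# ... the unmodified application executes inside the guest ...
subprocess.check_call(["VBoxManage", "controlvm", "CernVM-job",
                       "acpipowerbutton"])
```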
Cloud computing (IaaS, Infrastructure as a Service) should enable us to 'instantiate' all sorts of virtual clusters effortlessly
◦ PROOF clusters for individuals or for small groups
◦ Dedicated batch clusters with specialized services
◦ Etc.
Turnkey, tightly-coupled clusters
◦ Shared trust/security context
◦ Shared configuration/context information
IaaS tools such as Nimbus would allow one-click deployment of virtual clusters
◦ E.g. the OSG STAR cluster: an OSG head node (gridmap files, host certificates, NFS, Torque) plus worker nodes: SL4 + STAR
Virtualization is a broad term that refers to the abstraction of computer resources
◦ An old technology making a comeback, thanks to the breakdown in frequency scaling and the appearance of multi- and many-core CPU technology
◦ Enables vertical software integration
◦ The enabling technology of cloud computing
◦ Virtualization is here to stay for the foreseeable future
CernVM
◦ A way to simplify software deployment and jump on the cloud wagon
◦ The user environment is pretty well understood; it is evolving towards a job hosting environment (grid, cloud, volunteer computing)