Journées Informatiques de l'IN2P3, 17-20 May 2010, Aussois, France. P. Mato / CERN

• Brief introduction to Virtualization
  ◦ Taxonomy
  ◦ Hypervisors
• Usages of Virtualization
• The CernVM project
  ◦ Application Appliance
  ◦ Specialized file system
• CernVM as job hosting environment
  ◦ Clouds, Grids and Volunteer Computing
• Summary

• Credit for bringing virtualization into computing goes to IBM
• IBM VM/370, a reimplementation of CP/CMS, was made available in 1972
  ◦ It added virtual memory hardware and operating systems to the System/370 series
• Even in the 1970s anyone with any sense could see the advantages virtualization offered
  ◦ It separates applications and OS from the hardware
  ◦ In spite of that, VM/370 was not a great commercial success
• The idea of abstracting computer resources continued to develop

• Virtualization of system computer resources such as:
  ◦ Memory virtualization
    Aggregates RAM resources from networked systems into a virtualized memory pool
  ◦ Network virtualization
    Creation of a virtualized network addressing space within or across network subnets; multiple links can be combined to work as though they offered a single, higher-bandwidth link
  ◦ Virtual memory
    Allows uniform, contiguous addressing of physically separate and non-contiguous memory and disk areas
  ◦ Storage virtualization
    Abstracting logical storage from physical storage (RAID, disk partitioning, logical volume management)

• This is what most people today identify with the term "virtualization"
  ◦ Also known as server virtualization
  ◦ Hides the physical characteristics of the computing platform from the users
  ◦ Host software (a hypervisor or VMM) creates a simulated computer environment, a virtual machine, for its guest OS
  ◦ Enables server consolidation
• Platform virtualization approaches:
  ◦ Operating system-level virtualization
  ◦ Partial virtualization
  ◦ Paravirtualization
  ◦ Full virtualization
  ◦ Hardware-assisted virtualization

• The virtual machine simulates enough hardware to allow an unmodified "guest" OS to run
• A key challenge for full virtualization is the interception and simulation of privileged operations
  ◦ The effects of every operation performed within a given virtual machine must be kept within that virtual machine
  ◦ Instructions that would "pierce the virtual machine" cannot be allowed to execute directly; they must instead be trapped and simulated
• Examples:
  ◦ Parallels Workstation, Parallels Desktop for Mac, VirtualBox, Virtual Iron, Oracle VM, Virtual PC, Virtual Server, Hyper-V, VMware Workstation, VMware Server (formerly GSX Server), QEMU

• To create several virtual servers on one physical machine we need a hypervisor, or Virtual Machine Monitor (VMM)
  ◦ Its most important role is to arbitrate access to the underlying hardware, so that guest OSes can share the machine
  ◦ The VMM manages virtual machines (guest OS + applications) much like an OS manages processes and threads
• Most modern operating systems work with two modes:
  ◦ Kernel mode: allowed to run almost any CPU instruction, including "privileged" instructions that deal with interrupts, memory management, etc.
  ◦ User mode: allows only the instructions necessary to calculate and process data; applications running in this mode can only make use of the hardware by asking the kernel to do some work (a system call), as the snippet below illustrates
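To make the two modes concrete: even printing a line is user-mode code asking the kernel to do the work. A trivial Python illustration (os.write is a thin wrapper over the write(2) system call):

```python
# User-mode code cannot drive the hardware directly: even writing to the
# terminal means asking the kernel via a system call. os.write is a thin
# wrapper around write(2); the kernel-mode half talks to the device.
import os

os.write(1, b"hello from user mode\n")   # fd 1 = stdout, via a syscall
```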

• A technique that all (software-based) virtualization solutions use is ring deprivileging:
  ◦ The operating system that originally runs on ring 0 is moved to another, less privileged ring such as ring 1
  ◦ This allows the VMM to control the guest OS's access to resources
  ◦ It prevents one guest OS from kicking another out of memory, or a guest OS from controlling the hardware directly (see the toy trap-and-emulate loop below)
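A minimal sketch of the trap-and-emulate idea behind ring deprivileging, as a toy Python loop. The instruction names and the VMM "API" are invented for illustration; a real VMM operates on CPU state, not tuples:

```python
# Toy illustration (hypothetical, not a real VMM): the deprivileged
# guest kernel can no longer execute privileged instructions; they trap
# to the VMM, which emulates them against per-guest virtual state.

class Trap(Exception):
    """Raised when a deprivileged guest issues a privileged instruction."""

PRIVILEGED = {"load_page_table", "mask_interrupts"}

def deprivileged_execute(instruction):
    """Stand-in for the CPU: guest code runs in a ring where privileged
    instructions fault instead of touching real hardware."""
    if instruction[0] in PRIVILEGED:
        raise Trap(instruction)
    return f"executed {instruction[0]} directly"

class ToyVMM:
    def __init__(self):
        self.virtual_state = {}          # one virtual "hardware" per guest

    def run(self, guest_id, program):
        state = self.virtual_state.setdefault(guest_id, {})
        for instr in program:
            try:
                deprivileged_execute(instr)
            except Trap:
                # Emulate the privileged instruction on this guest's
                # virtual state only -- it never reaches real hardware.
                state[instr[0]] = instr[1]

vmm = ToyVMM()
vmm.run("guest-A", [("add", 1), ("load_page_table", 0x1000)])
vmm.run("guest-B", [("load_page_table", 0x2000)])
print(vmm.virtual_state)   # each guest sees only its own page table
```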

• A virtualization technique that presents a software interface to virtual machines that is similar, but not identical, to that of the underlying hardware
  ◦ Guest kernel source code modification instead of binary translation
  ◦ Paravirtualization provides specially defined 'hooks' that allow the guest(s) and host to request and acknowledge tasks which would otherwise be executed in the virtual domain (where execution performance is worse)
  ◦ A paravirtualized platform may allow the virtual machine monitor (VMM) to be simpler (by relocating execution of critical tasks from the virtual domain to the host domain) and faster
• Paravirtualization requires the guest operating system to be explicitly ported to the para-API (sketched below)
  ◦ A conventional OS distribution which is not paravirtualization-aware cannot run on top of a paravirtualized VMM
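By contrast with the trap-based loop above, a paravirtualized guest is modified to call the hypervisor's hooks directly. A toy sketch, with an invented hypercall name, of what replacing a trapped instruction with an explicit para-API call looks like:

```python
# Toy contrast (hypothetical API): the paravirtualized guest kernel
# invokes the VMM's hooks ("hypercalls") directly, so nothing has to be
# trapped and decoded at runtime.

class ToyHypervisor:
    def __init__(self):
        self.page_tables = {}

    def hypercall_set_page_table(self, guest_id, base):
        # The guest asks explicitly; no trap, no instruction decoding.
        self.page_tables[guest_id] = base

hv = ToyHypervisor()

def paravirt_guest_kernel(guest_id):
    # Modified guest source: where native code would execute a
    # privileged instruction, it calls the para-API instead.
    hv.hypercall_set_page_table(guest_id, base=0x3000)

paravirt_guest_kernel("guest-A")
print(hv.page_tables)
```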

• With hardware-assisted virtualization, the VMM can efficiently virtualize the entire x86 instruction set by handling sensitive instructions with a classic trap-and-emulate model implemented in hardware rather than software
  ◦ System calls do not automatically result in VMM interventions: as long as a system call does not involve critical instructions, the guest OS can provide kernel services to the user applications
• Intel and AMD came up with distinct implementations of hardware-assisted x86 virtualization: Intel VT-x and AMD-V, respectively (a quick detection sketch follows)
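On Linux, a quick way to see whether the host advertises these extensions is to look for the vmx (Intel VT-x) or svm (AMD-V) CPU flags in /proc/cpuinfo. Note the flag only shows CPU support; the feature can still be disabled in the BIOS/firmware:

```python
# Linux-only check for hardware-assisted virtualization via CPU flags:
# 'vmx' advertises Intel VT-x, 'svm' advertises AMD-V.

def hw_virt_support(cpuinfo_path="/proc/cpuinfo"):
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                if "vmx" in flags:
                    return "Intel VT-x"
                if "svm" in flags:
                    return "AMD-V"
    return None

print(hw_virt_support() or "no hardware-assisted virtualization detected")
```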

• Two strategies to reduce total overhead (total overhead = frequency of "VMM to VM" events × latency of each event; a back-of-the-envelope calculation follows):
  ◦ Reducing the number of cycles that the VT-x instructions take
    VMentry latency was reduced from 634 cycles (Xeon 70xx) to 352 cycles (Xeon 51xx, 53xx, 73xx)
  ◦ Reducing the frequency of VMM-to-VM events
    The Virtual Machine Control Block contains the state of the virtual CPU(s) for each guest OS, allowing them to run directly without interference from the VMM
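Plugging the quoted VMentry latencies into the overhead formula gives a feel for the scale. The event rate and clock speed below are assumed example figures, not measurements; only the 634 and 352 cycle latencies come from the slide:

```python
# total_overhead_cycles = VMM<->VM events per second * latency in cycles

def overhead_fraction(events_per_sec, latency_cycles, cpu_hz):
    return events_per_sec * latency_cycles / cpu_hz

events = 50_000               # assumed VMM<->VM transitions per second
cpu_hz = 3_000_000_000        # assumed 3 GHz core
for name, latency in [("Xeon 70xx", 634), ("Xeon 51xx/53xx/73xx", 352)]:
    frac = overhead_fraction(events, latency, cpu_hz)
    print(f"{name}: {frac:.4%} of CPU cycles spent on VMentry")
```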

• Software virtualization is very mature, but there is very little headroom left to improve
  ◦ Second-generation hardware virtualization (VT-x + EPT and AMD-V + NPT) is promising
  ◦ But it is not guaranteed to improve performance across all applications, due to the heavy TLB-miss cost
• The smartest way is to use a hybrid approach, as VMware ESX does:
  ◦ Paravirtualized drivers for the most critical I/O components
  ◦ Emulation for the less important I/O
  ◦ Binary translation to avoid the high "trap and emulate" performance penalty
  ◦ Hardware virtualization for 64-bit guests

• Virtual machines can cut time and money out of the software development and testing process
• A great opportunity to test software on a large variety of 'platforms'
  ◦ Each platform can be realized by a differently configured virtual machine
  ◦ Easy to duplicate the same environment in several virtual machines
  ◦ Testing installation procedures from a well-defined 'state'
  ◦ Etc.
• Example: the Execution Infrastructure in ETICS (a spin-off of the EGEE project)
  ◦ A set of virtual machines running a variety of platforms, attached to an Execution Engine where build and test jobs are executed on behalf of the submitting users

• Installing the "complete software environment" on the physicist's desktop/laptop [or the Grid] to be able to do data analysis for any of the LHC experiments is complex and manpower-intensive
  ◦ In some cases not even possible, if the desktop/laptop OS does not match any of the supported platforms
  ◦ Application software versions change often
  ◦ Only a tiny fraction of the installed software is actually used
• High cost to support a large number of compiler-platform combinations
• The system infrastructure cannot evolve independently from the evolution of the application
  ◦ The coupling between OS and application is very strong

• Traditional model
  ◦ Horizontal layers
  ◦ Independently developed
  ◦ Maintained by different groups
  ◦ Different lifecycles
• The application is deployed on top of the stack
  ◦ Breaks if any layer changes
  ◦ Needs to be re-certified every time something changes
  ◦ Results in a deployment and support nightmare
[Figure: layered stack: Hardware, OS, Libraries, Tools, Databases, Application]

• Application-driven approach
  ◦ Analyzing application requirements and dependencies
  ◦ Adding required tools and libraries
  ◦ Building a minimal OS
  ◦ Bundling all this into a virtual machine image
• Virtual machine images should be versioned just like the applications
  ◦ Assuring accountability, to mitigate possible negative aspects of the newly acquired application freedom
[Figure: virtual machine bundling OS, libraries, tools, databases and the application]

• Emphasis on the 'Application'
  ◦ The application dictates the platform, and not the contrary
• The application (e.g. a simulation) is bundled with its libraries, services and bits of OS
  ◦ Self-contained, self-describing, deployment-ready
• What makes the application ready to run in any target execution environment?
  ◦ e.g. traditional, Grid, Cloud
• Virtualization is the enabling technology


• Aims to provide a complete, portable and easy-to-configure user environment for developing and running LHC data analysis, locally and on the Grid, independent of the physical software and hardware platform (Linux, Windows, MacOS)
  ◦ Code check-out, editing, compilation, small local tests, debugging, …
  ◦ Grid submission, data access, …
  ◦ Event displays, interactive data analysis, …
  ◦ Suspend, resume, …
• Decouples the application lifecycle from the evolution of the system infrastructure
• Reduces the effort to install, maintain and keep the experiment software up to date

• R&D project in the CERN Physics Department
  ◦ Hosted in the SFT group
  ◦ The same group that takes care of ROOT & Geant4, looks for common projects and seeks synergy between experiments
• The CernVM project started on 01/01/2007, funded for 4 years
  ◦ Good collaboration with ATLAS and LHCb, and starting with CMS

Starting from experiment software… …ending with a custom Linux specialised for a given task:
• Installable CD/DVD
• Stub Image
• Raw Filesystem Image
• Netboot Image
• Compressed Tar File
• Demo CD/DVD (Live CD/DVD)
• Raw Hard Disk Image
• VMware® Virtual Appliance
• VMware® ESX Server Virtual Appliance
• Microsoft® VHD Virtual Appliance
• Xen Enterprise Virtual Appliance
• Virtual Iron Virtual Appliance
• Parallels Virtual Appliance
• Amazon Machine Image
• Update CD/DVD
• Appliance Installable ISO

Every build and every file installed on the system is automatically versioned and accounted for in a database.


1. Log in to the Web interface
2. Create a user account
3. Select experiment, appliance flavor and preferences

• CernVM defines a common platform that can be used by all experiments/projects
  ◦ Minimal OS elements ("just enough OS")
  ◦ The same CernVM virtual image for ALL experiments
• It downloads only what is really needed from the experiment software and puts it in the cache
  ◦ Does not require a persistent network connection (offline mode)
  ◦ Minimal impact on the network

• CernVM comes with a read-only file system (CVMFS) optimized for software distribution (sketched below)
  ◦ Only a very small fraction of the experiment software is actually used (~10%)
  ◦ Very aggressive local caching, plus web proxy caches (squids)
  ◦ Transparent file compression
  ◦ Integrity checks using checksums, signed file catalog
  ◦ Operational in offline mode
• No need to install any experiment software
  ◦ 'Virtually' all versions of all applications are already installed
  ◦ The user just needs to start using the software to trigger the download
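A minimal sketch of the idea (this is not the real CVMFS client; the repository URL, catalog contents and hash choice are illustrative placeholders): fetch a file over plain HTTP on first access, verify it against a trusted catalog, and serve all later accesses from the local cache, so it keeps working offline:

```python
import hashlib
import os
import urllib.request

CACHE_DIR = os.path.expanduser("~/.toy-cvmfs-cache")
REPO_URL = "http://cvmfs.example.org/sw"   # hypothetical repository
# Pre-trusted catalog mapping paths to checksums (placeholder entry);
# the real system uses a signed file catalog.
CATALOG = {"lhcb/ROOT/5.26/lib/libCore.so": "9a0364b9e99bb480dd25e1f0284c8555"}

def open_file(path):
    os.makedirs(CACHE_DIR, exist_ok=True)
    cached = os.path.join(CACHE_DIR, hashlib.md5(path.encode()).hexdigest())
    if not os.path.exists(cached):                        # cache miss
        data = urllib.request.urlopen(f"{REPO_URL}/{path}").read()
        if hashlib.md5(data).hexdigest() != CATALOG[path]:
            raise IOError(f"checksum mismatch for {path}")
        with open(cached, "wb") as f:                     # fill the cache
            f.write(data)
    return open(cached, "rb")              # cache hit: no network needed
```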

[Maps of CernVM download locations: ~1000 and ~2000 different IP addresses]

• Proxy and slave servers can be deployed at strategic locations to reduce latency and provide redundancy (client-side failover is sketched below)
• Working with the ATLAS & CMS Frontier teams to reuse the already-deployed squid proxy infrastructure
[Figure: CernVM clients behind a hierarchy of proxy servers in front of HTTP servers]
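On the client side, this amounts to trying the nearby squids in order and falling back as needed. A hedged sketch with placeholder host names (roughly how a client is pointed at site proxies for latency and redundancy):

```python
import urllib.request

PROXIES = ["http://squid1.example.org:3128",   # nearest site proxy
           "http://squid2.example.org:3128",   # backup proxy
           None]                               # None = direct connection

def fetch(url):
    for proxy in PROXIES:
        handler = urllib.request.ProxyHandler(
            {"http": proxy} if proxy else {})
        opener = urllib.request.build_opener(handler)
        try:
            return opener.open(url, timeout=5).read()
        except OSError:
            continue                           # try the next proxy
    raise IOError(f"all proxies failed for {url}")
```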

• Use a Content Delivery Network (such as SimpleCDN) to remove a single point of failure and fully mirror the central distribution to at least one more site
• CROWD: a P2P-like mechanism for discovery of nearby CernVMs and cache sharing between them
  ◦ No need to manually set up proxy servers (but they can still be used where they exist)
[Figure: mirrored HTTP servers on the WAN feeding proxy servers and CernVM instances on the LAN]

• Cloud computing is the convergence of three major trends:
  ◦ Virtualization: applications separated from infrastructure
  ◦ Utility computing: capacity shared across the grid
  ◦ Software as a Service: applications available on demand
• Commercial cloud offerings can be integrated for several types of work, such as simulations or compute-bound applications
  ◦ Pay-as-you-go model
  ◦ The question remains whether their data access capabilities match our requirements
  ◦ Good experience from pioneering experiments (e.g. STAR MC production on Amazon EC2)
  ◦ Ideal to absorb computing peak demands (e.g. before conferences)
• Science clouds are starting to provide compute cycles in the cloud for scientific communities

• CernVM as job hosting environment on the Cloud/Grid
  ◦ Ideally, users would like to run their applications on the grid (or cloud) infrastructure in exactly the same conditions in which they were originally developed
• CernVM already provides the development environment and can be deployed on a cloud (EC2)
  ◦ One image supports all four LHC experiments
  ◦ Easily extensible to other communities

• Exactly the same environment for development (user desktop/laptop), large job execution (grid) and final analysis (local cluster)
• Software can be efficiently installed using CVMFS
  ◦ An HTTP proxy assures very fast access to software even if the VM cache is cleared
• Can accommodate multi-core jobs
• Deployment on EC2 (sketched below) or alternative clusters
  ◦ Nimbus, Elastic…
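As a rough sketch, a CernVM-style image could be launched on EC2 along these lines (using today's boto3 library, which post-dates this talk; the AMI id, key pair name, instance type and user-data convention are all placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # hypothetical CernVM AMI
    InstanceType="m1.large",              # placeholder instance type
    KeyName="my-keypair",                 # placeholder SSH key pair
    MinCount=1,
    MaxCount=1,
    # Contextualization: user-data can carry the experiment/proxy
    # configuration that the appliance reads at boot.
    UserData="cvmfs_http_proxy=http://squid1.example.org:3128",
)
print(resp["Instances"][0]["InstanceId"])
```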

• BOINC
  ◦ Open-source software for volunteer computing and grid computing
  ◦ Ongoing development to use VirtualBox running CernVM as a job container (see the sketch below)
  ◦ Adds the possibility to run unmodified user applications
  ◦ Better security due to guest OS isolation
[Figure: BOINC and PanDA Pilot driving CernVM instances]
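A hedged sketch of what such a wrapper has to do: drive a headless VirtualBox VM through the VBoxManage CLI (the VM name and image file are placeholders; the actual BOINC wrapper manages the VM lifecycle and job status more carefully):

```python
import subprocess

VM = "cernvm-job-42"   # placeholder VM name

def vbox(*args):
    # Thin helper around the VBoxManage command-line tool.
    subprocess.run(["VBoxManage", *args], check=True)

vbox("import", "cernvm.ova", "--vsys", "0", "--vmname", VM)  # register image
vbox("startvm", VM, "--type", "headless")                    # run without UI
# ... monitor the job, e.g. via a shared folder ...
vbox("controlvm", VM, "acpipowerbutton")                     # clean shutdown
```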

• Cloud computing (IaaS, Infrastructure as a Service) should enable us to 'instantiate' all sorts of virtual clusters effortlessly
  ◦ PROOF clusters for individuals or for small groups
  ◦ Dedicated batch clusters with specialized services
  ◦ Etc.
• Turnkey, tightly-coupled clusters
  ◦ Shared trust/security context
  ◦ Shared configuration/context information
• IaaS tools such as Nimbus would allow one-click deployment of virtual clusters (sketched below)
  ◦ E.g. the OSG STAR cluster: an OSG head node (gridmap files, host certificates, NFS, Torque) plus worker nodes (SL4 + STAR)
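A sketch of the one-click-cluster idea under the same placeholder assumptions as the EC2 example above (Nimbus does this with its own contextualization broker): launch a head node, then pass its address to a set of workers via user-data, tagging instances by role:

```python
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

def launch(role, count, user_data=""):
    # Placeholder AMI/instance type; tags let nodes be found by role.
    return ec2.create_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="m1.large",
        MinCount=count, MaxCount=count,
        UserData=user_data,
        TagSpecifications=[{"ResourceType": "instance",
                            "Tags": [{"Key": "role", "Value": role}]}])

head = launch("head", 1, user_data="services=torque,nfs")[0]
head.wait_until_running()
head.reload()                              # refresh to get the private IP
workers = launch("worker", 8,
                 user_data=f"head_node={head.private_ip_address}")
```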

• Virtualization is a broad term that refers to the abstraction of computer resources
  ◦ An old technology making a comeback, thanks to the breakdown of frequency scaling and the appearance of multi- and many-core CPU technology
  ◦ Enables vertical software integration
  ◦ The enabling technology of cloud computing
  ◦ Virtualization is here to stay for the foreseeable future
• CernVM
  ◦ A way to simplify software deployment and jump on the cloud wagon
  ◦ The user environment is pretty well understood; evolving towards a job hosting environment (grid, cloud, volunteer computing)