Xen 3.0 and the Art of Virtualization

Slides:



Advertisements
Similar presentations
Live migration of Virtual Machines Nour Stefan, SCPD.
Advertisements

Content Overview Virtual Disk Port to Intel platform
Virtualization Dr. Michael L. Collard
Virtualization Technology
Status Report Ian Pratt University of Cambridge and Founder of XenSource Inc. Computer Laboratory.
Xen and the Art of Virtualization Ian Pratt University of Cambridge and Founder of XenSource Inc. Computer Laboratory.
Xen and the Art of Virtualization Ian Pratt University of Cambridge and Founder of XenSource Inc. Computer Laboratory.
Xen 3.0 and the Art of Virtualization Ian Pratt XenSource Inc. and University of Cambridge Keir Fraser, Steve Hand, Christian Limpach and many others…
Virtual Machine Technology Dr. Gregor von Laszewski Dr. Lizhe Wang.
KAIST Computer Architecture Lab. The Effect of Multi-core on HPC Applications in Virtualized Systems Jaeung Han¹, Jeongseob Ahn¹, Changdae Kim¹, Youngjin.
Virtualisation From the Bottom Up From storage to application.
XEN AND THE ART OF VIRTUALIZATION Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, lan Pratt, Andrew Warfield.
Live Migration of Virtual Machines Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, Andrew Warfield.
Embedded System Lab. Yoon Jun Kee Xen and the Art of Virtualization.
Bart Miller. Outline Definition and goals Paravirtualization System Architecture The Virtual Machine Interface Memory Management CPU Device I/O Network,
Live Migration of Virtual Machines Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, Andrew Warfield.
Copyright © Clifford Neuman - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE USC CSci599 Trusted Computing Lecture Four –
G Robert Grimm New York University Disco.
Network Implementation for Xen and KVM Class project for E : Network System Design and Implantation 12 Apr 2010 Kangkook Jee (kj2181)
Xen and the Art of Virtualization Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield.
Virtualizzazione: Xen. Tipi di virtualizzazione Singola immagine di SO (Virtuozo,…) –Usa container di risorse –Poco isolamento Virtualizzazione piena:VirtualBox,
Virtualization for Cloud Computing
Virtual Machine Monitors CSE451 Andrew Whitaker. Hardware Virtualization Running multiple operating systems on a single physical machine Examples:  VMWare,
Xen and the Art of Virtualization Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield.
Xen and the Art of Virtualization. Introduction  Challenges to build virtual machines Performance isolation  Scheduling priority  Memory demand  Network.
Tanenbaum 8.3 See references
Zen and the Art of Virtualization Paul Barham, et al. University of Cambridge, Microsoft Research Cambridge Published by ACM SOSP’03 Presented by Tina.
Microkernels, virtualization, exokernels Tutorial 1 – CSC469.
SAIGONTECH COPPERATIVE EDUCATION NETWORKING Spring 2010 Seminar #1 VIRTUALIZATION EVERYWHERE.
Virtualization The XEN Approach. Virtualization 2 CS5204 – Operating Systems XEN: paravirtualization References and Sources Paul Barham, et.al., “Xen.
Benefits: Increased server utilization Reduced IT TCO Improved IT agility.
Xen Overview for Campus Grids Andrew Warfield University of Cambridge Computer Laboratory.
Virtualization Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content of this presentation is licensed.
Virtualization: Not Just For Servers Hollis Blanchard PowerPC kernel hacker.
The Best of Both Worlds with On-Demand Virtualization Thawan Kooburat and Michael M. Swift On-Demand Virtualization allows systems to benefit from virtualization.
Virtual Machine Monitors: Technology and Trends Jonathan Kaldor CS614 / F07.
Xen and The Art of Virtualization Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt & Andrew Warfield.
Politecnico di Torino Dipartimento di Automatica ed Informatica TORSEC Group Performance of Xen’s Secured Virtual Networks Emanuele Cesena Paolo Carlo.
CS533 Concepts of Operating Systems Jonathan Walpole.
Nathanael Thompson and John Kelm
 Virtual machine systems: simulators for multiple copies of a machine on itself.  Virtual machine (VM): the simulated machine.  Virtual machine monitor.
Outline for Today Announcements –1 st programming assignment coming soon. Objective of the lecture –OS and Virtual Machines.
Disco : Running commodity operating system on scalable multiprocessor Edouard et al. Presented by Vidhya Sivasankaran.
Introduction to virtualization
Full and Para Virtualization
Lecture 12 Virtualization Overview 1 Dec. 1, 2015 Prof. Kyu Ho Park “Understanding Full Virtualization, Paravirtualization, and Hardware Assist”, White.
Protection of Processes Security and privacy of data is challenging currently. Protecting information – Not limited to hardware. – Depends on innovation.
Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine and Mendel Rosenblum Presentation by Mark Smith.
Virtual Machines (part 2) CPS210 Spring Papers  Xen and the Art of Virtualization  Paul Barham  ReVirt: Enabling Intrusion Analysis through Virtual.
Xen 3.0 and the Art of Virtualization Ian Pratt Keir Fraser, Steven Hand, Christian Limpach, Andrew Warfield, Dan Magenheimer (HP), Jun Nakajima (Intel),
Open Source Virtualization Andrey Meganov RHCA, RHCX Consultant / VDEL
Open Source Virtualisation and Consolidation. Whoami ● Senior Linux and Open Source Consultant/ X-Tend ● „Infrastructure Architect“ ● Linux since.
Open Source Virtualisation and Consolidation. Whoami ● Linux and Open Source Consultant ● „Infrastructure Architect“ ● Linux since 0.98 ● IANAKH ● Senior.
XEN – The Art of Virtualisation. So what is Virtualisation? ● Makes use of spare capacity ● Run multiple instances of OSes simultaneously ● Multitasking.
Virtualization-optimized architectures
Virtualization for Cloud Computing
Virtualization.
Virtual Machine Monitors
Virtualization Technology
Xen and the Art of Virtualization
Presented by Yoon-Soo Lee
Virtualization Dr. Michael L. Collard
Virtualization overview
Xen: The Art of Virtualization
Group 8 Virtualization of the Cloud
OS Virtualization.
Xen 3.0 and the Art of Virtualization
Xen and the Art of Virtualization
CSE 451: Operating Systems Autumn Module 24 Virtual Machine Monitors
Xen and the Art of Virtualization
Presentation transcript:

Xen 3.0 and the Art of Virtualization Ian Pratt Keir Fraser, Steven Hand, Christian Limpach, Andrew Warfield, Dan Magenheimer (HP), Jun Nakajima (Intel), Asit Mallick (Intel) Gerd Knor of Suse, Rusty Russell, Jon Mason and Ryan Harper of IBM Computer Laboratory

Outline Virtualization Overview Xen Architecture New Features in Xen 3.0 VM Relocation Xen Roadmap

Virtualization Overview Single OS image: Virtuozo, Vservers, Zones Group user processes into resource containers Hard to get strong isolation Full virtualization: VMware, VirtualPC, QEMU Run multiple unmodified guest OSes Hard to efficiently virtualize x86 Para-virtualization: UML, Xen Run multiple guest OSes ported to special arch Arch Xen/x86 is very close to normal x86 POPF doesn’t trap if executed with insufficient privilege

Virtualization in the Enterprise Consolidate under-utilized servers to reduce CapEx and OpEx X Avoid downtime with VM Relocation Dynamically re-balance workload to guarantee application SLAs X X Enforce security policy

Xen Today : Xen 2.0.6 Secure isolation between VMs Resource control and QoS Only guest kernel needs to be ported User-level apps and libraries run unmodified Linux 2.4/2.6, NetBSD, FreeBSD, Plan9, Solaris Execution performance close to native Broad x86 hardware support Live Relocation of VMs between Xen nodes Commercial / production use Released under the GPL; Guest OSes can be any licensce

Para-Virtualization in Xen Xen extensions to x86 arch Like x86, but Xen invoked for privileged ops Avoids binary rewriting Minimize number of privilege transitions into Xen Modifications relatively simple and self-contained Modify kernel to understand virtualised env. Wall-clock time vs. virtual processor time Desire both types of alarm timer Expose real resource availability Enables OS to optimise its own behaviour X86 hard to virtualize so benefits from more modifications than is necessary for IA64/PPC popf Avoiding fault trapping : cli/sti can be handled at guest level

Xen 2.0 Architecture Xen Virtual Machine Monitor GuestOS Manager & Event Channel Virtual MMU Virtual CPU Control IF Hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE) Native Device Driver GuestOS (XenLinux) Manager & Control s/w VM0 Unmodified User Software VM1 Front-End Device Drivers VM2 (XenBSD) VM3 Safe HW IF Xen Virtual Machine Monitor Back-End

Xen 3.0 Architecture AGP ACPI PCI SMP VT-x x86_32 x86_64 IA64 VM0 VM1 VM2 VM3 Device Manager & Control s/w Unmodified User Software Unmodified User Software Unmodified User Software GuestOS (XenLinux) GuestOS (XenLinux) GuestOS (XenLinux) Unmodified GuestOS (WinXP)) AGP ACPI PCI Back-End Back-End SMP Native Device Driver Native Device Driver Front-End Device Drivers Front-End Device Drivers VT-x x86_32 x86_64 IA64 Control IF Safe HW IF Event Channel Virtual CPU Virtual MMU Xen Virtual Machine Monitor Hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE)

x86_32 Xen reserves top of VA space Segmentation protects Xen from kernel System call speed unchanged Xen 3 now supports PAE for >4GB mem 4GB Xen S Kernel S 3GB ring 0 ring 1 User U ring 3 0GB

x86_64 Large VA space makes life a lot easier, but: 264 Large VA space makes life a lot easier, but: No segment limit support Need to use page-level protection to protect hypervisor Kernel U Xen S 264-247 Reserved 247 User U

x86_64 Run user-space and kernel in ring 3 using different pagetables Two PGD’s (PML4’s): one with user entries; one with user plus kernel entries System calls require an additional syscall/ret via Xen Per-CPU trampoline to avoid needing GS in Xen r3 User U r3 Kernel U syscall/sysret r0 Xen S

Para-Virtualizing the MMU Guest OSes allocate and manage own PTs Hypercall to change PT base Xen must validate PT updates before use Allows incremental updates, avoids revalidation Validation rules applied to each PTE: 1. Guest may only map pages it owns* 2. Pagetable pages may only be mapped RO Xen traps PTE updates and emulates, or ‘unhooks’ PTE page for bulk updates Important for fork

Writeable Page Tables : 1 – Write fault guest reads Virtual → Machine first guest write Guest OS page fault RaW hazards Xen VMM Hardware MMU

Writeable Page Tables : 2 – Emulate? guest reads Virtual → Machine first guest write Guest OS yes emulate? Use update_va_mapping in preference Xen VMM Hardware MMU

Writeable Page Tables : 3 - Unhook guest reads X Virtual → Machine guest writes Guest OS typically 2 pages writeable : one in other page table Xen VMM Hardware MMU

Writeable Page Tables : 4 - First Use guest reads X Virtual → Machine guest writes Guest OS page fault RaW hazards Xen VMM Hardware MMU

Writeable Page Tables : 5 – Re-hook guest reads Virtual → Machine guest writes Guest OS validate RaW hazards Xen VMM Hardware MMU

MMU Micro-Benchmarks 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 Most benchmark results identical. In addition to these, some slight slowdown 0.1 0.0 L X V U L X V U Page fault (µs) Process fork (µs) lmbench results on Linux (L), Xen (X), VMWare Workstation (V), and UML (U)

SMP Guest Kernels Xen extended to support multiple VCPUs Virtual IPI’s sent via Xen event channels Currently up to 32 VCPUs supported Simple hotplug/unplug of VCPUs From within VM or via control tools Optimize one active VCPU case by binary patching spinlocks Xen has always supported SMP hosts Important to make virtualization ubiquitous, enterprise

SMP Guest Kernels Takes great care to get good SMP performance while remaining secure Requires extra TLB syncronization IPIs Paravirtualized approach enables several important benefits Avoids many virtual IPIs Allows ‘bad preemption’ avoidance Auto hot plug/unplug of CPUs SMP scheduling is a tricky problem Strict gang scheduling leads to wasted cycles Xen has always supported SMP hosts Important to make virtualization ubiquitous, enterprise

I/O Architecture Xen IO-Spaces delegate guest OSes protected access to specified h/w devices Virtual PCI configuration space Virtual interrupts (Need IOMMU for full DMA protection) Devices are virtualised and exported to other VMs via Device Channels Safe asynchronous shared memory transport ‘Backend’ drivers export to ‘frontend’ drivers Net: use normal bridging, routing, iptables Block: export any blk dev e.g. sda4,loop0,vg3 (Infiniband / Smart NICs for direct guest IO)

VT-x / (Pacifica) Enable Guest OSes to be run without para-virtualization modifications E.g. legacy Linux, Windows XP/2003 CPU provides traps for certain privileged instrs Shadow page tables used to provide MMU virtualization Xen provides simple platform emulation BIOS, Ethernet (ne2k), IDE emulation (Install paravirtualized drivers after booting for high-performance IO) Pacifica

Control Panel (xm/xend) Front end Virtual Drivers Domain 0 Domain N Guest VM (VMX) (32-bit) Guest VM (VMX) (64-bit) Linux xen64 Control Panel (xm/xend) Device Models Unmodified OS Unmodified OS 3P 3D Linux xen64 FE Virtual Drivers FE Virtual Drivers Front end Virtual Drivers Virtual driver Backend Guest BIOS Guest BIOS 0D Native Device Drivers Native Device Drivers 1/3P Virtual Platform Virtual Platform Callback / Hypercall VMExit VMExit Event channel 0P I/O: PIT, APIC, PIC, IOAPIC Processor Memory Control Interface Hypercalls Event Channel Scheduler Xen Hypervisor

MMU Virtualizion : Shadow-Mode guest reads Virtual → Pseudo-physical guest writes Guest OS Accessed & Updates dirty bits Virtual → Machine At least double the number of faults Memory requirements Hiding which machine pages are in use: may interfere with page colouring, large TLB entries etc. VMM Hardware MMU

VM Relocation : Motivation VM relocation enables: High-availability Machine maintenance Load balancing Statistical multiplexing gain Xen Xen

Assumptions Networked storage Good connectivity NAS: NFS, CIFS SAN: Fibre Channel iSCSI, network block dev drdb network RAID Good connectivity common L2 network L3 re-routeing Xen Xen Storage

Challenges VMs have lots of state in memory Some VMs have soft real-time requirements E.g. web servers, databases, game servers May be members of a cluster quorum Minimize down-time Performing relocation requires resources Bound and control resources used

Relocation Strategy VM active on host A Destination host selected (Block devices mirrored) Stage 0: pre-migration Initialize container on target host Stage 1: reservation Copy dirty pages in successive rounds Stage 2: iterative pre-copy Suspend VM on host A Redirect network traffic Synch remaining state Stage 3: stop-and-copy Activate on host B VM state on host A released Stage 4: commitment

Pre-Copy Migration: Round 1

Pre-Copy Migration: Round 1

Pre-Copy Migration: Round 1

Pre-Copy Migration: Round 1

Pre-Copy Migration: Round 1

Pre-Copy Migration: Round 2

Pre-Copy Migration: Round 2

Pre-Copy Migration: Round 2

Pre-Copy Migration: Round 2

Pre-Copy Migration: Round 2

Pre-Copy Migration: Final

Writable Working Set Pages that are dirtied must be re-sent Super hot pages e.g. process stacks; top of page free list Buffer cache Network receive / disk buffers Dirtying rate determines VM down-time Shorter iterations → less dirtying → …

Writable Working Set Set of pages written to by OS/application Pages that are dirtied must be re-sent Hot pages E.g. process stacks Top of free page list (works like a stack) Buffer cache Network receive / disk buffers

Page Dirtying Rate Dirtying rate determines VM down-time time into iteration #dirty Dirtying rate determines VM down-time Shorter iters → less dirtying → shorter iters Stop and copy final pages Application ‘phase changes’ create spikes

Rate Limited Relocation Dynamically adjust resources committed to performing page transfer Dirty logging costs VM ~2-3% CPU and network usage closely linked E.g. first copy iteration at 100Mb/s, then increase based on observed dirtying rate Minimize impact of relocation on server while minimizing down-time

Web Server Relocation

Iterative Progress: SPECWeb

Iterative Progress: Quake3

Quake 3 Server relocation

Extensions Cluster load balancing Evacuating nodes for maintenance Pre-migration analysis phase Optimization over coarse timescales Evacuating nodes for maintenance Move easy to migrate VMs first Storage-system support for VM clusters Decentralized, data replication, copy-on-write Wide-area relocation IPSec tunnels and CoW network mirroring

Current 3.0 Status Domain 0 Domain U SMP Guests Save/Restore/Migrate   x86_32 x86_32p x86_64 IA64 Power Domain 0 Domain U SMP Guests new! Save/Restore/Migrate ~tools >4GB memory 16GB 4TB ? VT 64-on-64 Driver Domains

3.1 Roadmap Improved full-virtualization support Pacifica / VT-x abstraction Enhanced control tools project Performance tuning and optimization Less reliance on manual configuration Infiniband / Smart NIC support (NUMA, Virtual framebuffer, etc) Framebuffer console NUMA

IO Virtualization IO virtualization in s/w incurs overhead Latency vs. overhead tradeoff More of an issue for network than storage Can burn 10-30% more CPU Solution is well understood Direct h/w access from VMs Multiplexing and protection implemented in h/w Smart NICs / HCAs Infiniband, Level-5, Aaorhi etc Will become commodity before too long

Research Roadmap Whole-system debugging Lightweight checkpointing and replay Cluster/dsitributed system debugging Software implemented h/w fault tolerance Exploit deterministic replay VM forking Lightweight service replication, isolation Secure virtualization Multi-level secure Xen

Conclusions Xen is a complete and robust GPL VMM Outstanding performance and scalability Excellent resource control and protection Vibrant development community Strong vendor support http://xen.sf.net

Thanks! The Xen project is hiring, both in Cambridge UK, Palo Alto and New York ian@xensource.com Computer Laboratory

Backup slides

Isolated Driver VMs Run device drivers in separate domains Detect failure e.g. Illegal access Timeout Kill domain, restart E.g. 275ms outage from failed Ethernet driver 350 300 250 200 150 100 Net-restart 50 5 10 15 20 25 30 35 40 time (s)

Device Channel Interface newring

Scalability Scalability principally limited by Application resource requirements several 10’s of VMs on server-class machines Balloon driver used to control domain memory usage by returning pages to Xen Normal OS paging mechanisms can deflate quiescent domains to <4MB Xen per-guest memory usage <32KB Additional multiplexing overhead negligible We have run 128 guests Tickerless patch may help for large numbers of guests

System Performance 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 L X V U L X V U L X V U L X V U SPEC INT2000 (score) Linux build time (s) OSDB-OLTP (tup/s) SPEC WEB99 (score) Benchmark suite running on Linux (L), Xen (X), VMware Workstation (V), and UML (U)

TCP results 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 disk is easy new IO isn’t as fast as not all ring 0 ; win big time 0.1 0.0 L X V U L X V U L X V U L X V U Tx, MTU 1500 (Mbps) Rx, MTU 1500 (Mbps) Tx, MTU 500 (Mbps) Rx, MTU 500 (Mbps) TCP bandwidth on Linux (L), Xen (X), VMWare Workstation (V), and UML (U)

Scalability Simultaneous SPEC WEB99 Instances on Linux (L) and Xen(X) 1000 800 600 400 200 L X L X L X L X 2 4 8 16 Simultaneous SPEC WEB99 Instances on Linux (L) and Xen(X)

Resource Differentation 2.0 1.5 Aggregate throughput relative to one instance 1.0 8(diff) : using QoS controls to split resources in 1:2:3:4:5:6:7:8 ratios. OSDB-OLTP suffers due to current poor disk scheduling – should be fixed in future Xen release 0.5 0.0 2 4 8 8(diff) 2 4 8 8(diff) OSDB-IR OSDB-OLTP Simultaneous OSDB-IR and OSDB-OLTP Instances on Xen