Xen and the Art of Virtualization
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt & Andrew Warfield
Presented by Anthony So, November 13, 2013
CS533 - Concepts of Operating Systems, Fall 2013
Presentation Overview: Introduction; Xen approach (Overview, Implementation, Evaluation); Summary
Introduction
Monolithic kernel (diagram): user apps run non-privileged; the kernel (file system, virtual memory, IPC, scheduler, device drivers) runs privileged, directly on the hardware (CPU, physical memory, storage, I/O, … etc).
Virtualization (diagram): each VM/domain (user apps plus a guest OS with file system, virtual memory, IPC, scheduler, device drivers) runs non-privileged; the VMM (virtual CPU, virtual physical memory, virtual network, virtual block device, … etc) runs privileged on the hardware (CPU, physical memory, storage, I/O, … etc).
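Non-, Full, and Para-Virtualization (diagram)
Non-virtualized: user apps (non-privileged) run on the OS (privileged), which runs on the hardware.
Full virtualization: user apps and the OS run non-privileged on a VMM (privileged), which runs on the hardware.
Paravirtualization: user apps and a modified OS run non-privileged on a VMM (privileged), which runs on the hardware.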
Xen - Overview
Xen Architecture Overview
Control Transfer
Synchronous calls from a domain to Xen are made using a hypercall.
Notifications are delivered from Xen to domains using an asynchronous event mechanism.
(Diagram: the domain issues synchronous hypercalls to the VMM; the VMM delivers asynchronous events to the domain.)
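The two directions of control transfer on this slide can be illustrated with a toy model. This is only a sketch: `ToyHypervisor`, `ToyDomain`, the `"echo"` operation, and the event names are hypothetical and are not Xen's real interface. It shows a hypercall returning its result to the caller immediately, while events are merely recorded and handled later by the domain.

```python
from collections import deque


class ToyHypervisor:
    """Illustrative stand-in for the VMM; not Xen's real interface."""

    def __init__(self):
        self.pending = {}  # domain id -> queue of undelivered event names

    def hypercall(self, dom_id, op, *args):
        # Synchronous: the caller gets the result before it continues.
        if op == "echo":  # hypothetical operation, for illustration only
            return args
        raise ValueError(f"unknown hypercall: {op}")

    def notify(self, dom_id, event):
        # Asynchronous: only record the event; the domain handles it later.
        self.pending.setdefault(dom_id, deque()).append(event)


class ToyDomain:
    def __init__(self, dom_id, vmm):
        self.dom_id = dom_id
        self.vmm = vmm
        self.handled = []

    def handle_pending_events(self):
        # Drain whatever events the VMM has recorded for this domain.
        queue = self.vmm.pending.get(self.dom_id, deque())
        while queue:
            self.handled.append(queue.popleft())


vmm = ToyHypervisor()
dom = ToyDomain(1, vmm)
result = vmm.hypercall(dom.dom_id, "echo", "hello")  # returns right away
vmm.notify(1, "packet_received")                      # queued, not yet handled
vmm.notify(1, "disk_done")
handled_before = list(dom.handled)
dom.handle_pending_events()
```

In this sketch the domain sees no events until it chooses to process them, which is the sense in which delivery is asynchronous.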
Xen – Implementation
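CPU – Privileged Instructions
How does the x86 architecture handle privileged instructions?
(Diagram comparing the three cases. Non-virtualized: user apps are non-privileged and the OS is privileged. Full virtualization: user apps and the OS are non-privileged and the VMM is privileged. Paravirtualization: user apps and the modified OS are non-privileged and the VMM is privileged.)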
Memory Management: Tagged TLB vs. untagged TLB
A tagged TLB is ideal for virtualization: each TLB entry is associated with an address-space identifier, which allows hypervisor and guest OS entries to coexist across context switches and so avoids a complete TLB flush.
x86 has no tagged TLB, so the TLB must be flushed after a context switch.
Xen exists in a 64MB section at the top of every address space, which avoids a TLB flush when entering and leaving the hypervisor.
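The difference between the two TLB designs can be sketched with two small dictionaries. The class names, page numbers, and frame numbers below are made up for illustration. The tagged version keys entries by (address-space identifier, virtual page), so entries from two address spaces coexist; the untagged version keys by virtual page only and therefore has to be emptied on every address-space switch.

```python
class TaggedTLB:
    """Entries carry an address-space identifier (ASID)."""

    def __init__(self):
        self.entries = {}  # (asid, virtual page) -> machine frame

    def insert(self, asid, vpage, frame):
        self.entries[(asid, vpage)] = frame

    def lookup(self, asid, vpage):
        return self.entries.get((asid, vpage))


class UntaggedTLB:
    """Entries have no ASID, so a switch must flush everything."""

    def __init__(self):
        self.entries = {}  # virtual page -> machine frame
        self.flushes = 0

    def insert(self, vpage, frame):
        self.entries[vpage] = frame

    def lookup(self, vpage):
        return self.entries.get(vpage)

    def switch_address_space(self):
        self.entries.clear()
        self.flushes += 1


tagged = TaggedTLB()
tagged.insert(asid=1, vpage=0x10, frame=7)
tagged.insert(asid=2, vpage=0x10, frame=9)  # same virtual page, other space

untagged = UntaggedTLB()
untagged.insert(vpage=0x10, frame=7)
untagged.switch_address_space()             # old translation is lost
```

After the switch, the untagged TLB must refill every translation from the page tables, which is the cost Xen avoids for hypervisor entry and exit by living inside every address space.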
Memory Management: S/W-managed vs. H/W-managed TLB
x86 uses a H/W-managed TLB, so TLB management and TLB miss handling are done entirely by the MMU hardware.
A S/W-managed TLB is ideal for virtualization because TLB misses are serviced by the OS.
Memory Management
Xen registers guest OS page tables directly with the MMU but restricts the guest OS to read-only access.
Page table updates are passed to Xen via a hypercall.
Requests are validated before being applied, using per-frame information:
Type: writable, page table, … etc.
Reference count: must be 0 before a frame can change type.
To minimize the number of hypercalls, the guest OS queues updates locally and then applies the entire batch with a single hypercall.
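The validation and batching steps on this slide can be sketched as follows. Everything here is a simplified assumption for illustration: `ToyXen`, `GuestUpdateQueue`, `mmu_update`, the two frame types, and the specific checks are hypothetical, and the reference count is modeled simply as the number of page-table slots pointing at a frame.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    owner: int               # id of the owning domain
    ftype: str = "writable"  # "writable" or "page_table"
    refcount: int = 0        # slots currently mapping this frame (simplified)


class ToyXen:
    def __init__(self, frames):
        self.frames = frames  # frame number -> Frame
        self.mappings = {}    # (page-table frame, slot) -> (target, writable)

    def retype(self, frame_no, new_type):
        frame = self.frames[frame_no]
        if frame.refcount != 0:
            raise PermissionError("frame is still referenced")
        frame.ftype = new_type

    def _valid(self, dom, pt_no, target_no, writable):
        pt, target = self.frames.get(pt_no), self.frames.get(target_no)
        if pt is None or target is None:
            return False
        if pt.owner != dom or target.owner != dom:
            return False  # a domain may only map frames it owns
        if pt.ftype != "page_table":
            return False
        # A page-table frame must never be mapped writable.
        return not (writable and target.ftype == "page_table")

    def mmu_update(self, dom, batch):
        """One hypercall: validate and apply a whole batch of updates."""
        applied = 0
        for pt_no, slot, target_no, writable in batch:
            if not self._valid(dom, pt_no, target_no, writable):
                continue
            old = self.mappings.get((pt_no, slot))
            if old is not None:
                self.frames[old[0]].refcount -= 1
            self.mappings[(pt_no, slot)] = (target_no, writable)
            self.frames[target_no].refcount += 1
            applied += 1
        return applied


class GuestUpdateQueue:
    """Guest-side queue that batches updates into a single hypercall."""

    def __init__(self, xen, dom):
        self.xen, self.dom = xen, dom
        self.pending, self.hypercalls = [], 0

    def queue(self, pt_no, slot, target_no, writable):
        self.pending.append((pt_no, slot, target_no, writable))

    def flush(self):
        batch, self.pending = self.pending, []
        self.hypercalls += 1
        return self.xen.mmu_update(self.dom, batch)


frames = {0: Frame(1, "page_table"), 1: Frame(1), 2: Frame(2),
          3: Frame(1), 4: Frame(1)}
xen = ToyXen(frames)
guest = GuestUpdateQueue(xen, dom=1)
guest.queue(0, 0, 1, True)   # accepted
guest.queue(0, 1, 2, False)  # rejected: frame 2 belongs to domain 2
guest.queue(0, 2, 0, True)   # rejected: writable mapping of a page table
guest.queue(0, 3, 3, False)  # accepted
applied = guest.flush()
```

Four updates cost one hypercall here, and the two invalid requests are dropped by the validation step rather than reaching the page table.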
Memory Management: Shadow Page Table (diagram)
Guest page table: maps virtual to physical addresses.
Pmap: maps physical to machine addresses.
Shadow page table: maps virtual to machine addresses.
When the guest OS wants to update its page table, the VMM looks up the real (machine) address in memory and updates the shadow page table.
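The relationship between the three tables on this slide can be sketched as a composition of two mappings. `ShadowPager` and its method name are hypothetical, and pages are plain integers. The point is that each guest page-table write is intercepted, translated through the pmap, and mirrored into the shadow table that the MMU actually uses.

```python
class ShadowPager:
    """Toy VMM that keeps a shadow page table in sync with the guest's."""

    def __init__(self, pmap):
        self.pmap = pmap     # guest-physical page -> machine frame
        self.guest_pt = {}   # virtual page -> guest-physical page
        self.shadow_pt = {}  # virtual page -> machine frame (used by the MMU)

    def guest_write_pte(self, vpage, ppage):
        # The guest's write is intercepted by the VMM, which looks up the
        # machine frame and mirrors the update into the shadow table.
        if ppage not in self.pmap:
            raise KeyError(f"guest-physical page {ppage} is not backed")
        self.guest_pt[vpage] = ppage
        self.shadow_pt[vpage] = self.pmap[ppage]


pager = ShadowPager(pmap={0: 40, 1: 41, 2: 57})
pager.guest_write_pte(vpage=0x10, ppage=2)
pager.guest_write_pte(vpage=0x11, ppage=0)
```

Every guest update costs an extra lookup and a second table write in this scheme, which is the overhead the Xen design on the next slide avoids by letting the guest's own table be the one the MMU uses.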
Memory Management: Xen Page Table (diagram)
The page table maps virtual to machine addresses.
Read: the guest OS has direct read access to the page table.
Write: when the guest OS wants to update the page table, it issues a hypercall and the VMM performs the update on behalf of the guest OS.
Memory Management
The balloon driver [3] is a mechanism to adjust a domain's memory usage.
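A rough sketch of the ballooning idea, under simplifying assumptions: `ToyBalloon` is a hypothetical name, pages are plain integers, and the hand-off to the VMM is reduced to a return value. Inflating the balloon takes free pages away from the guest so the VMM can reclaim them; deflating gives pages back.

```python
class ToyBalloon:
    """Toy balloon driver living inside a guest OS."""

    def __init__(self, guest_free_pages):
        self.guest_free = list(guest_free_pages)  # pages the guest may use
        self.balloon = []                         # pages held by the balloon

    def inflate(self, n):
        """Shrink the domain: take up to n free pages away from the guest."""
        count = min(n, len(self.guest_free))
        taken = [self.guest_free.pop() for _ in range(count)]
        self.balloon.extend(taken)
        return taken  # the VMM could now hand these pages to another domain

    def deflate(self, n):
        """Grow the domain: give up to n ballooned pages back to the guest."""
        count = min(n, len(self.balloon))
        released = [self.balloon.pop() for _ in range(count)]
        self.guest_free.extend(released)
        return released


balloon = ToyBalloon(range(8))
reclaimed = balloon.inflate(3)
free_after_inflate = len(balloon.guest_free)
returned = balloon.deflate(10)  # asks for more than the balloon holds
```

The guest's own allocator decides which pages to give up, so the VMM does not need to know which guest pages are in use.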
Exceptions / System Calls / Interrupts
Exceptions: A table describing the handler for each type of exception is registered with Xen for validation. The handlers are identical to those for real x86 hardware (except for page faults).
System calls: Xen allows each guest OS to register and install a fast handler, so that user apps can call directly into their guest OS without routing through Xen on every call.
Interrupts: Hardware interrupts are replaced with a lightweight event system.
Time and Timers
Xen provides each guest OS with the following notions of time:
Real time: time maintained continuously since machine boot.
Virtual time: time that a particular domain has spent executing. It does not advance while the domain is not executing.
Wall-clock time: current real time plus an offset.
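The three notions of time can be sketched with one small class. `DomainClock`, its nanosecond fields, and the offset value are assumptions made for illustration. Real time always advances, virtual time advances only while the domain runs, and wall-clock time is derived from real time.

```python
class DomainClock:
    """Toy model of the three clocks seen by one domain (nanoseconds)."""

    def __init__(self, wall_offset_ns):
        self.real_ns = 0                  # time since machine boot
        self.virtual_ns = 0               # time this domain has executed
        self.wall_offset_ns = wall_offset_ns

    def advance(self, ns, domain_running):
        self.real_ns += ns
        if domain_running:
            self.virtual_ns += ns

    @property
    def wall_clock_ns(self):
        return self.real_ns + self.wall_offset_ns


clock = DomainClock(wall_offset_ns=1_000_000)
clock.advance(300, domain_running=True)
clock.advance(500, domain_running=False)  # domain descheduled
clock.advance(200, domain_running=True)
```

Here the domain was descheduled for 500 ns, so its virtual time trails real time by exactly that amount.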
I/O Ring
An asynchronous I/O ring (a circular queue) is used for data transfer between Xen and the guest OS.
(Diagram: the ring is shared between the guest OS and Xen.)
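A minimal sketch of such a ring, assuming the four-pointer layout described in the Xen paper (request producer and response consumer advanced by the guest, request consumer and response producer advanced by Xen). The class and method names are hypothetical, and this toy version services requests strictly in order.

```python
class IORing:
    """Toy circular queue shared by a guest (requests) and Xen (responses)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0  # advanced by the guest when it posts a request
        self.req_cons = 0  # advanced by Xen when it takes a request
        self.rsp_prod = 0  # advanced by Xen when it posts a response
        self.rsp_cons = 0  # advanced by the guest when it takes a response

    def put_request(self, request):
        # A slot is free again only once the guest has consumed its response.
        if self.req_prod - self.rsp_cons >= self.size:
            return False  # ring full
        self.slots[self.req_prod % self.size] = request
        self.req_prod += 1
        return True

    def service_one(self, handler):
        if self.req_cons == self.req_prod:
            return False  # nothing to do
        request = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        # The response reuses the slot that held the request.
        self.slots[self.rsp_prod % self.size] = handler(request)
        self.rsp_prod += 1
        return True

    def get_response(self):
        if self.rsp_cons == self.rsp_prod:
            return None  # no response ready
        response = self.slots[self.rsp_cons % self.size]
        self.rsp_cons += 1
        return response


ring = IORing(size=2)
posted = [ring.put_request(r) for r in ("a", "b", "c")]  # third one is refused
ring.service_one(str.upper)
first = ring.get_response()
retry = ring.put_request("c")  # a slot is free again
ring.service_one(str.upper)
ring.service_one(str.upper)
rest = [ring.get_response(), ring.get_response()]
```

Because each side only advances its own pointers, the guest and Xen can work on the ring at different times without either one blocking on the other.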
Network
Xen provides the following abstractions:
Virtual firewall-router (VFR).
Virtual network interfaces (VIFs), each of which looks like a modern network interface card.
Two I/O rings: transmit and receive.
Round-robin packet scheduler.
Page flipping: the guest OS must exchange an unused page frame for each packet it receives, which avoids copying between Xen and the guest OS (but requires page-aligned receive buffers).
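Page flipping on the receive path can be sketched as an ownership swap rather than a copy. `ToyReceivePath` and its methods are hypothetical, frames are plain integers, and dropping a packet when no spare frame has been posted is an assumption of this sketch.

```python
from collections import deque


class ToyReceivePath:
    """Toy model of packet delivery by swapping page-frame ownership."""

    def __init__(self, xen_frames, guest_frames):
        self.xen_frames = set(xen_frames)
        self.guest_frames = set(guest_frames)
        self.posted = deque()  # spare guest frames offered for receives

    def post_buffer(self, frame):
        if frame not in self.guest_frames or frame in self.posted:
            raise ValueError("guest can only post a spare frame it owns")
        self.posted.append(frame)

    def deliver(self, packet_frame):
        if packet_frame not in self.xen_frames:
            raise ValueError("packet must sit in a frame owned by Xen")
        if not self.posted:
            return None  # assumption: no spare frame means the packet drops
        spare = self.posted.popleft()
        # Swap ownership of the two frames; no packet data is copied.
        self.guest_frames.remove(spare)
        self.xen_frames.add(spare)
        self.xen_frames.remove(packet_frame)
        self.guest_frames.add(packet_frame)
        return packet_frame


path = ToyReceivePath(xen_frames={100, 101}, guest_frames={1, 2, 3})
path.post_buffer(3)
delivered = path.deliver(100)  # guest gains frame 100, Xen gains frame 3
dropped = path.deliver(101)    # no spare frame was posted
```

Both sides end up with the same number of frames as before, which is why the guest must supply one unused frame per received packet.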
Disk
Domain0 has unchecked access to physical disks.
All other domains access persistent storage through virtual block devices (VBDs), which Domain0 manages.
Each VBD has associated ownership and access-control information and is accessed via the I/O ring.
Round-robin scheduler.
Requests are batched for better access performance.
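One way to picture a VBD is as a list of extents on physical disks plus an ownership and access check. The sketch below assumes that layout; `Extent`, `ToyVBD`, `translate`, and the disk names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    disk: str    # hypothetical physical disk name
    start: int   # first physical block of the extent
    length: int  # number of blocks in the extent


class ToyVBD:
    """Toy virtual block device: extents plus ownership and access control."""

    def __init__(self, owner, extents, read_only=False):
        self.owner = owner
        self.extents = extents
        self.read_only = read_only

    def translate(self, dom, vblock, write=False):
        """Map a virtual block number to (disk, physical block)."""
        if dom != self.owner:
            raise PermissionError("domain does not own this VBD")
        if write and self.read_only:
            raise PermissionError("VBD is read-only")
        if vblock < 0:
            raise IndexError("negative block number")
        offset = vblock
        for extent in self.extents:
            if offset < extent.length:
                return extent.disk, extent.start + offset
            offset -= extent.length
        raise IndexError("block is beyond the end of the VBD")


vbd = ToyVBD(owner=1,
             extents=[Extent("sda", 100, 10), Extent("sdb", 0, 5)],
             read_only=True)
first_extent = vbd.translate(1, 3)    # lands in the first extent
second_extent = vbd.translate(1, 12)  # lands in the second extent
```

The guest only ever sees virtual block numbers, so the checks and the translation to physical disks stay outside its control.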
Xen - Evaluation
Hardware
Dell 2650 dual-processor 2.4GHz Xeon server
2GB RAM
Broadcom Tigon 3 Gigabit Ethernet NIC
Hitachi DK32EJ 146GB 10k RPM SCSI disk
Linux version 2.4.21
RedHat 7.2
Virtualization Comparison
Native Linux: compiled for i686
XenoLinux: compiled for xeno-i686, for Xen
VMware Workstation
User-mode Linux (UML): compiled for um, for UML
Relative Performance
Computation intensive: processor and memory, with minimal I/O or OS interaction
Database: synchronous disk operations
Web server:
File server:
Compiling kernel: I/O, scheduler, memory management
Concurrent
The higher overhead in the single-domain case is due to the lack of support for SMP guest OSes.
Conclusion
Xen is a paravirtualizing VMM.
Xen exposes a hypercall interface to the guest OS, which the guest OS uses to ask Xen to perform privileged operations.
As a result, Xen cannot run an unmodified guest OS.
Performance is comparable to native Linux.
Learn More: The Xen Project at www.xenproject.org
Type-1 vs Type-2 Hypervisor (diagram)
Type-1: the hypervisor runs directly on the hardware, with guest OSes and their apps on top of it.
Type-2: the hypervisor runs on a host OS, alongside host OS apps, with guest OSes and their apps on top of the hypervisor.
References
Edouard Bugnion, Scott Devine, Mendel Rosenblum, Disco: running commodity operating systems on scalable multiprocessors, Proceedings of the sixteenth ACM symposium on Operating systems principles, p.143-156, October 05-08, 1997, Saint Malo, France
Carl A. Waldspurger, Introduction to Virtual Machines, VMware Labs. 2010.
Carl A. Waldspurger, Memory resource management in VMware ESX server, Proceedings of the 5th symposium on Operating systems design and implementation, December 09-11, 2002, Boston, Massachusetts
Kenneth J. Duda, David R. Cheriton, Borrowed-virtual-time (BVT) scheduling: supporting latency-sensitive threads in a general-purpose scheduler, Proceedings of the seventeenth ACM symposium on Operating systems principles, p.261-276, December 12-15, 1999, Charleston, South Carolina, United States
Fiuczynski, M. E. (2009). Virtual Machine Monitor. Retrieved from Princeton University website: http://www.cs.princeton.edu/courses/archive/fall09/cos318/lectures/VirtualMachine.pdf
Gelas, J. D. (2008, March 17). AnandTech | Hardware Virtualization: the Nuts and Bolts. Retrieved November 11, 2013, from http://www.anandtech.com/show/2480
Kurth, L. (2013). 10 Years of Xen and beyond.
Linux Foundation Collaborative Projects. (n.d.). Retrieved November 11, 2013 from http://www.xenproject.org/users/virtualization.html
References
Memory management unit - Wikipedia, the free encyclopedia. (n.d.). Retrieved November 11, 2013, from http://en.wikipedia.org/wiki/Memory_management_unit
Microkernel - Wikipedia, the free encyclopedia. (n.d.). Retrieved November 11, 2013, from http://en.wikipedia.org/wiki/Microkernel
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield, Xen and the art of virtualization, Proceedings of the nineteenth ACM symposium on Operating systems principles, October 19-22, 2003, Bolton Landing, NY, USA
Smith, J.E.; Nair, R., "The architecture of virtual machines," Computer, vol.38, no.5, pp.32,38, May 2005
Tanenbaum, A. (2008). Modern operating systems. Upper Saddle River, N.J: Pearson Prentice Hall.
Timeline of virtualization development - Wikipedia, the free encyclopedia. (n.d.). Retrieved November 11, 2013, from http://en.wikipedia.org/wiki/Timeline_of_virtualization_development
Understanding Full Virtualization, Paravirtualization, and Hardware Assist. (2007). Retrieved from VMware website: http://www.vmware.com/files/pdf/VMware_paravirtualization.pdf
x86 virtualization - Wikipedia, the free encyclopedia. (n.d.). Retrieved November 11, 2013, from http://en.wikipedia.org/wiki/X86_virtualization