Operating System Support for Virtual Machines

Operating System Support for Virtual Machines Samuel King, George Dunlap, Peter Chen

Content: Introduction; VM and VMM; Type II VMM; UMLinux and UML; Bottlenecks & Solutions; Performance; Conclusion; Evaluation

Virtual Machines
Developed in the 1960s
Multiple VMs on a single machine
Uses: test applications; program / debug an OS; simulate networks; isolate applications; monitor for intrusions; inject faults; resource sharing/hosting

Virtual Machine Monitors Layer that emulates hardware for an Operating System The simulated hardware is the Virtual Machine

Types of VMMs
Type I VMM: runs directly on the hardware; efficient, low overhead
Type II VMM: runs as a process on the host OS (HOS); simple VMM; mediates between the host OS and the guest machine

Examples
Type I: IBM VM/370, VMware ESX Server, Xen
Type II: SimOS, User-Mode Linux, UMLinux
Hybrid (physical hardware + host I/O): VMware Workstation, VirtualPC

Type II compared to Type I
Advantages for designers: host OS abstractions map onto VM hardware. OS signals ~ VM interrupts; virtual timer -> timer interrupt; disable interrupts -> disable signals, using a flag to defer signals
Advantages for users: watch and debug the VM's execution from the host
Disadvantages: performance, 10x or more slower

Sample VMM Implementations

Memory Exception Example
Existing OS abstractions and signals can be used in a VM
A guest application attempts to access data that it doesn't have access to
An invalid memory operation occurs and a SIGSEGV signal is raised
The SIGSEGV handler makes the data available
The data is brought in, transparent to the user

OS Abstraction Code

#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096   /* not defined on the slide; 4 KiB pages assumed */
#endif

static void SIGSEGV_handler(int signo, siginfo_t *info, void *context);

int main(int ac, char *av[])
{
    struct sigaction sa;
    int rc;
    char *p = calloc(16384, 4);

    /* Round p up to the next page boundary inside the allocation. */
    int *buffer = (int *)(((uintptr_t)p + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE - 1));

    /* Revoke all access so that the first touch raises SIGSEGV. */
    rc = mprotect(buffer, PAGE_SIZE, PROT_NONE);
    if (rc == -1) { perror("mprotect PROT_NONE"); exit(EXIT_FAILURE); }

    sa.sa_sigaction = SIGSEGV_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &sa, NULL) == -1) {
        printf("errno set to: %d\n", errno);
        printf("Error registering SIGSEGV sigaction.\n");
        exit(EXIT_FAILURE);
    }

    printf("\nACCESSOR: trying to access %p\n", (void *)buffer);
    *buffer = 42;   /* faults; the handler restores access and the store is retried */
    printf("ACCESSOR: wrote %d\n", *buffer);
    if (*buffer == 42)
        printf("MAIN: read %d: success!\n", *buffer);
    return EXIT_SUCCESS;
}

/* printf is not async-signal-safe; it is kept here only to trace the demo. */
static void SIGSEGV_handler(int signo, siginfo_t *info, void *context)
{
    printf("ACCESSOR: segfault at address: %p\n", info->si_addr);
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGSEGV);
    sigprocmask(SIG_UNBLOCK, &mask, 0);
    printf("FIXER: now fixing %p...\n", info->si_addr);
    /* Round the faulting address down to its page and make the page accessible. */
    char *p = (char *)((uintptr_t)info->si_addr & ~(uintptr_t)(PAGE_SIZE - 1));
    int rc = mprotect(p, PAGE_SIZE, PROT_READ | PROT_WRITE);
    if (rc == -1) { perror("mprotect PROT_READ | PROT_WRITE"); exit(EXIT_FAILURE); }
    printf("ACCESSOR: trying again...\n");
}

Shows how signals such as SIGSEGV can be used to implement a basic interrupt

OUTPUT
ACCESSOR: trying to access 0x89ab000
ACCESSOR: segfault at address: 0x89ab000
FIXER: now fixing 0x89ab000...
ACCESSOR: trying again...
ACCESSOR: wrote 42
MAIN: read 42: success!

Second Classification
VMM interface identical to hardware: IBM VM/370, VMware Server & Workstation
VMM with added OS modifications:
  Signal handlers: SimOS, UML, UMLinux
  Virtualization drivers: Disco, VAX VMM
  Microkernels & JVM
  Xen

UMLinux vs User-Mode Linux (UML)
UMLinux: single machine process for all guest apps; guest apps communicate via the guest OS; faster system calls, network transfers, and web serving
UML: separate machine process for each app; guest apps communicate via shared memory on the host; faster context switches and kernel builds; used in project 1; more popular; faster in general

UMLinux Performance - guest OS must simulate crossing the top red line. System call to a library – vertical move Switch applications – horizontal move

User-Mode Linux Email server… Notice the separate VM instances and separation of guest applications

Goal
Make Type II VMs usable in production
Reduce the overhead of Type II to that of Type I
Done through extensions to the host OS
Performance within 2x of standalone

Three Switching Bottlenecks
Three bottlenecks in Type II VMMs: the VMM has to capture each switch and manage it through host OS and guest OS calls
1. A high number of context switches to move from a guest app to the guest OS through the VMM
2. Ensuring address protection when switching between guest user and guest kernel space
3. Numerous memory-mapping operations when switching between guest applications

1. Guest App. to Kernel Switching VMM uses ptrace to catch system calls and signals from the guest-machine process. Creates context switches between the VMM and guest-machine process for the Host OS. High context-switching Ptrace gives full control of the syscall

1. System Call Control Transfer
Guest application is transferring control to the guest OS (GOS):
1. Guest application issues system call; intercepted by VMM process via ptrace
2. VMM process changes system call to no-op (getpid)
3. getpid returns; intercepted by VMM process
4. VMM process sends SIGUSR1 signal to guest SIGUSR1 handler
5. Guest SIGUSR1 handler calls mmap to allow access to guest kernel data; intercepted by VMM process
6. VMM process allows mmap to pass through to make the guest kernel data available (2nd bottleneck)
7. mmap returns to VMM process
8. VMM process returns to guest SIGUSR1 handler, which handles the guest application's system call
Ideally, steps 6 and 7 would pass through automatically.
Figure 4: Guest application system call. This picture shows the steps UMLinux takes to transfer control to the guest operating system when a guest application process issues a system call. The mmap call in the SIGUSR1 handler must reside in guest user space. For security, the rest of the SIGUSR1 handler should reside in guest kernel space. The current UMLinux implementation includes an extra section of trampoline code to issue the mmap; this trampoline code is started by manipulating the guest machine process's context and finishes by causing a breakpoint to the VMM process; the VMM process then transfers control back to the guest-machine process by sending a SIGUSR1.
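The interception cost can be observed with a small ptrace tracer. This is a hedged sketch for Linux, not UMLinux's VMM: the function name trace_child_syscalls is hypothetical, and the child only issues getppid calls. It does not rewrite the system call into a no-op as in step 2; it only counts how often the tracer is woken up.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a traced child that issues `ncalls` system calls and count how many
 * times the tracer is stopped on its behalf. Returns the number of stops,
 * or -1 if ptrace is unavailable or an error occurs. */
long trace_child_syscalls(int ncalls)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                       /* child: plays the guest-machine process */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);                   /* hand control to the tracer */
        for (int i = 0; i < ncalls; i++)
            (void)getppid();              /* a cheap "null" system call */
        _exit(0);
    }

    int status;
    long stops = 0;

    /* Wait for the child's initial SIGSTOP. */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    for (;;) {
        /* Resume the child until its next system call entry or exit. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == -1) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            return -1;
        }
        if (waitpid(pid, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        stops++;                          /* tracer was scheduled: a context switch pair */
    }
    return stops;
}
```

Each traced system call stops the child twice, once on entry and once on exit, and every stop costs a switch to the tracer and back. That is the overhead the VMM kernel module is meant to remove.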

1. Optimization
Move the VMM process functionality into a VMM loadable kernel module
Modify the host OS to give the VMM control over the guest-machine process's system calls and signals

1. Optimization Diagram
1. Guest application issues system call; intercepted by VMM kernel module (no context switching; two mode switches, because the red line is crossed twice, entering and leaving)
2. VMM kernel module calls mmap to allow access to guest kernel data
3. mmap returns to VMM kernel module (steps 2 and 3 perform the mmap call)
4. VMM kernel module sends SIGUSR1 to guest SIGUSR1 handler
Figure 5: Guest application system call with VMM kernel module. This picture shows the steps taken by UMLinux with a VMM kernel module to transfer control to the guest operating system when a guest application issues a system call.

2. Address Protection Guest-machine process switches between guest user and guest kernel mode Has to protect access to kernel addresses when switching to user mode Has to enable access to kernel addresses when switching to kernel mode This creates a large number of mmaps, reprogramming the page table to switch between R/W and inaccessible

x86 Segmentation and Paging

2. Protection using the Current Privilege Level Ring 0 – used for Host Kernel Ring 1 – … VM Ring 2 – … Ring 3 – user level Supervisor-only bit in the page table prevents code running in CPU privilege ring 3 from accessing the host operating system’s data. Linux implements protection using the concept of rings instead of segments

Standalone Address Protection
Linux incurs little overhead when trapping to the kernel
Segments allow access to all addresses (1-to-1 mapping from logical to linear address)
The supervisor bit in each page-table entry restricts ring-3 processes from accessing kernel code and data

2. Segmentation Bounds for Address Protection Optimization
UMLinux calls mmap, munmap, and mprotect to simulate the switching on the guest OS
[Figure labels: Linux, Solution 1]
Solution 1: bound guest user mode with a segment limit of 0x70000000; allow the guest kernel access to the user range
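The effect of the segment bound can be modelled as a simple limit check. This is only a model: real segment limits are enforced by the CPU, not by code. The 0xc0000000 upper bound for guest kernel mode is an assumption based on the usual 32-bit Linux layout, where the host kernel starts at that address.

```c
#include <stdbool.h>
#include <stdint.h>

enum guest_mode { GUEST_USER, GUEST_KERNEL };

#define GUEST_USER_LIMIT 0x70000000u   /* from the slide: guest apps live below this */
#define HOST_KERNEL_BASE 0xc0000000u   /* assumption: host kernel starts here */

/* Segment limit that would be loaded for the given guest mode. */
static uint32_t segment_limit(enum guest_mode mode)
{
    return mode == GUEST_USER ? GUEST_USER_LIMIT : HOST_KERNEL_BASE;
}

/* True if an access to `addr` stays inside the segment for this mode. */
bool segment_allows(enum guest_mode mode, uint32_t addr)
{
    return addr < segment_limit(mode);
}
```

Switching modes now only changes a limit, instead of remapping or reprotecting every guest kernel page.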

2. Alternate Optimization Allow guest OS to occupy range from 0x00000000 to 0xc0000000 Separate guest kernel and user modes by using page table’s supervisor only bit Stops guest kernel pages from being run in ring 3 Runs the guest kernel in ring 1
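Why ring 1 works for the guest kernel can be seen in how x86 paging checks the user/supervisor bit. The sketch below models that check; the function name is hypothetical and only the present and U/S bits of a page-table entry are considered.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u   /* bit 0: page is mapped */
#define PTE_USER    0x4u   /* bit 2 (U/S): set means accessible from ring 3 */

/* Model of the x86 paging privilege check for code running at `cpl`. */
bool page_access_allowed(unsigned cpl, uint32_t pte)
{
    if (!(pte & PTE_PRESENT))
        return false;
    if (cpl == 3)
        return (pte & PTE_USER) != 0;   /* user code needs the U/S bit */
    return true;                         /* rings 0, 1, and 2 count as supervisor */
}
```

Guest kernel pages are mapped supervisor-only, so guest applications in ring 3 fault on them while the guest kernel in ring 1 does not, and no remapping is needed on a mode switch.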

2. Optimization Comparison Linux Solution 1 Solution 2 Guest kernel can now occupy arbitrary regions instead of only a contiguous block

UML tt/skas3/skas0 Modes
[Figure: guest process layout and guest process address space on the host in tt, skas3, and skas0 modes; labels: UML kernel code and data, tracing thread, process code and data]
UML can separate kernel address spaces to keep processes away from the guest kernel
skas3 requires a host kernel patch that adds a new file, /proc/mm, which allows separation of the guest kernel's and the guest process's address spaces in the host
skas0 does the same thing without a kernel patch; it requires two pages for the SIGSEGV signal and UML code

3. Guest Application Switching Switching guest process address space requires swapping the current mapping between virtual pages and the VM’s physical memory file. munmap called for previous process’s virtual address space mmap called for each virtual page in the next process, as needed on page-faults Basically, switching between different physical files.

Costs of Switching ( --- and | )
[Figure: Process 1 and Process 2, each shown in user mode and kernel mode, above the kernel]
Context switching will cause a change in the available memory addresses and in the current privilege level.

Context Switching
[Figure: Process 1 and Process 2 in user mode and kernel mode, above the kernel]
intr_entry: saves entire CPU state; switches to kernel stack
switch_threads (in): saves caller's state
switch_threads (out): restores caller's state (kernel stack switch)
intr_exit: restores entire CPU state; switches back to user stack; iret
CS 3204 Fall 2007

Costs of Switching ( --- and | )
Horizontal switching (between applications): expensive! Invalidate the first process's mapping (unmap); validate the second process's mapping (map)
Vertical switching (to and from the OS): save the CPU state of the application; make the kernel's address space available

Process 1 Active in user mode
[Figure: address space up to FFFFFFFF with a 3 GB user region (ucode, udata, ustack of process 1) below C0000000 and a 1 GB kernel region (kcode, kdata, kbss, kheap; marker at C0400000); physical frames labeled kernel, user (1), user (2), used, or free. Access to the user region is possible in user mode.]

Process 1 Active in kernel mode
[Figure: same layout. Access to the kernel region requires kernel mode; access to the user region is possible in user mode.]

Process 2 Active in kernel mode
[Figure: same layout, with the user region now holding ucode, udata, and ustack of process 2. Unmap process 1, map process 2.]

Process 2 Active in user mode
[Figure: same layout with process 2 mapped. Access to the user region is possible in user mode.]

3. Guest App. Switching Solution
Allow a process to have 1024 different address spaces
Each address space is defined by a set of page tables
Host OS is modified to switch between address space definitions using switchguest
switchguest only has to change the pointer to the current first-level page table
The solution was essentially proposed by UML creator Jeff Dike in "Making Linux Safe for Virtual Machines"
UML uses a similar approach, the /proc/mm patch: a new mm_struct for each process; the entire set of VMAs is swapped with each struct switch; the new mm_struct's page-table base is loaded into the hardware page-table register (CR3)
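The idea behind switchguest can be modelled in user space as a table of page-directory pointers with one "current" slot. This is a toy model, not the host kernel modification: the struct and function names are hypothetical, and the `current` field merely plays the role of the hardware page-table pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_GUEST_SPACES 1024   /* from the slide: up to 1024 address spaces */
#define PGD_ENTRIES      1024   /* entries in a 32-bit x86 first-level page table */

/* A first-level page table (page directory). */
struct page_dir {
    uint32_t entry[PGD_ENTRIES];
};

/* Per guest-machine process state: all address spaces plus the active one. */
struct guest_machine {
    struct page_dir *space[MAX_GUEST_SPACES];
    struct page_dir *current;   /* models the hardware page-table pointer */
};

/* Model of switchguest: select address space `idx` by swapping one pointer.
 * Returns 0 on success, -1 if the slot is out of range or unused. */
int switchguest_model(struct guest_machine *gm, unsigned idx)
{
    if (gm == NULL || idx >= MAX_GUEST_SPACES || gm->space[idx] == NULL)
        return -1;
    gm->current = gm->space[idx];
    return 0;
}
```

The switch is a constant-time pointer update regardless of how many pages the guest process has mapped, in contrast to the per-page munmap and mmap calls it replaces.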

switchguest Example
[Figure: host operating system holding the page table pointer; guest OS issuing the switchguest syscall; guest proc a and guest proc b]
switchguest has to change the hardware's page table pointer to the next guest process's page table inside the host OS

Performance Testing
Do the three solutions bring the performance of a Type II VMM within 2x that of standalone systems?
Test benchmarks: null system calls; switching between guest applications; transferring data; CPU-intensive program; kernel building; web-server performance

Testing Setups
Standalone (host OS)
VMware Workstation (hybrid; stands in for Type I)
UMLinux
UMLinux with optimization 1 (kernel module)
UMLinux with optimizations 1 & 2 (bounded segment)
UMLinux with optimizations 1, 2, & 3 (address spaces)

Null System Call
Guest app has to switch to the guest kernel and then back
First optimization: fewer calls needed to switch to the kernel
Second optimization: switching address protections is faster
Standalone and Type I don't have to go through the host OS and VMM, which makes them faster
Result: failed, not within 2x of standalone

Switching Apps (Context Switch)
First optimization: fewer calls needed to switch to the kernel
Second optimization: switching address protections is faster
Third optimization: additional address spaces make switching apps faster
All three optimizations make context switching faster
Notice that Type II can be faster than Type I

Network Transfer Appears to hit a limit in transferring data across an Ethernet switch using TCP

CPU-Intensive Program (POV-Ray) Mainly compute-bound Little interaction with the guest kernel Little virtualization overhead

Kernel-build
Numerous guest kernel calls
Each call is trapped by the VMM and signaled to the guest kernel
Second optimization: no need to re-map and re-protect when switching to the guest kernel
Kernel compile benchmark: 22 million guest memory exceptions, 1.4 million guest system calls

Web Server (SPECweb99) Numerous guest kernel calls Few application switches The overheads for SPECweb99 and kernel-build are higher because they issue more guest kernel calls, each of which must be trapped by the VMM kernel module and reflected back to the guest kernel by sending a signal. (OPT 1)

Results
Five of the six benchmarks came within 2x of standalone performance. One benchmark (null system call) did not.

Conclusion from Paper
A Type II VMM (UMLinux) can be optimized to perform similarly to Type I (VMware)
A Type II VMM can perform within 2x of standalone systems in production

Recent Work
UMLinux was renamed FAUmachine; development on FAUmachine continued through 2004 in Germany at the Univ. Erlangen-Nürnberg
Virtually all research on UMLinux/FAUmachine was conducted by the CoVirt & ReVirt projects at Univ. Michigan (usage of VMs for security services)
The CoVirt project now uses various VMs
UMLinux has fallen behind UML in performance and popularity

Evaluation
UMLinux with optimizations, or UML, could be very useful in various commercial and educational situations.
UMLinux is slower than standalone, Type I, and other Type II VMMs, so it will not become a leading development or run-time platform in practice.
Type II VMMs may overtake Type I VMMs, because they offer similar performance and are easier to design.