Process Migration, Checkpoint/Restart
ECI, July 2005
Process Migration
Process migration benefits:
- Tool for load balancing
- Data access locality
- Improved system administration
- Mobile computing
Process Migration Issues
- Execution model: home vs. remote
- Migrating virtual memory
- Minimizing downtime
- Cost of migration
  - Run-time cost (home, remote)
  - Migration operation
- Limitations of migration
Checkpoint/Restart
Checkpoint/restart benefits:
- Like migration, plus …
- Fault resilience
- Fault recovery
- High availability
- Gang scheduling
- Debugging, testing, developing
- Security (honey-pot)
Checkpoint/Restart Goals
- Transparency
- Support parallel programs
  - Multi-process
  - Multi-node
- Security
- Minimize required state
- Minimize required storage
CKPT: Application Level
- Efficient
- Non-preemptive
- Lack of common API
- Source code changes
- Possible compiler support
- Examples?
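To make the application-level approach concrete, here is a minimal sketch assuming a hypothetical iterative job whose entire state fits in a small dict. The programmer chooses the checkpoint points (hence non-preemptive), and restart simply resumes from the last saved state. The JSON file format and the function names are illustrative choices, not any particular checkpointing API.

```python
import json
import os
from typing import Optional


def save_checkpoint(state: dict, path: str) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename: a crash mid-write never corrupts the last checkpoint


def load_checkpoint(path: str) -> Optional[dict]:
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run(n_iters: int, path: str, every: int = 10, crash_at: Optional[int] = None) -> dict:
    """Hypothetical job: sum 0..n_iters-1, saving state every `every` iterations.

    `crash_at` simulates a failure just before that iteration.
    """
    state = load_checkpoint(path) or {"i": 0, "total": 0}  # resume if a checkpoint exists
    while state["i"] < n_iters:
        if crash_at is not None and state["i"] == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += state["i"]
        state["i"] += 1
        if state["i"] % every == 0:  # programmer-chosen safe point
            save_checkpoint(state, path)
    return state
```

A run that crashes at iteration 25 loses only the work done since the checkpoint at iteration 20; rerunning `run` with the same path picks up from there.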
CKPT: Library Level
- Typically uses a signal handler (callback)
- Common API
- Restricts functionality (e.g., no IPC)
- Relatively portable
- Examples…
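The signal-handler style of library-level checkpointing can be sketched as follows, assuming a POSIX system (SIGUSR1) and a hypothetical library that saves one registered Python object. Real libraries save the whole address space instead; `ckpt_register` and `ckpt_restore` are invented names for illustration.

```python
import os
import pickle
import signal

_registered = {}  # what the hypothetical library saves, and where


def ckpt_register(state: dict, path: str) -> None:
    """Register the state to save and install the checkpoint callback."""
    _registered["state"] = state
    _registered["path"] = path
    signal.signal(signal.SIGUSR1, _on_checkpoint_signal)


def _on_checkpoint_signal(signum, frame) -> None:
    # Invoked asynchronously whenever SIGUSR1 arrives, e.g. from a scheduler.
    with open(_registered["path"], "wb") as f:
        pickle.dump(_registered["state"], f)


def ckpt_restore(path: str) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)
```

The application only calls `ckpt_register` once; after that, any external agent can trigger a checkpoint with `kill -USR1 <pid>`, which is what makes this approach a common API without further source changes.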
CKPT: Library (contd)
- Libckpt
  - Memory exclusion, incremental, forked checkpointing
  - Modify source code, link statically
- Condor
  - Supports memory mapping, shared libraries
  - Relink with a special library (needs object files)
- SCore, CoCheck
  - Parallel applications
  - Modify the communication layer
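Incremental checkpointing and memory exclusion can be illustrated with a toy page-based model. Real systems typically detect modified pages through write protection; this sketch swaps that for comparing page hashes, which is simpler to show but costs a full scan. The 4 KiB page size and the class name are assumptions.

```python
import hashlib

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class IncrementalCheckpointer:
    """Toy incremental checkpointing over a list of fixed-size pages.

    Only pages whose hash changed since the previous checkpoint are saved;
    pages listed in `excluded` are never saved ("memory exclusion").
    """

    def __init__(self, excluded=()):
        self.excluded = set(excluded)
        self.last_hashes = {}  # page index -> hash at last checkpoint
        self.checkpoints = []  # list of {page index: page bytes}, oldest first

    def checkpoint(self, pages):
        delta = {}
        for i, page in enumerate(pages):
            if i in self.excluded:
                continue
            h = hashlib.sha256(page).digest()
            if self.last_hashes.get(i) != h:  # new or modified since last checkpoint
                delta[i] = bytes(page)
                self.last_hashes[i] = h
        self.checkpoints.append(delta)
        return delta

    def restore(self, n_pages):
        """Replay deltas oldest to newest; excluded pages come back zero-filled."""
        pages = [bytes(PAGE_SIZE) for _ in range(n_pages)]
        for delta in self.checkpoints:
            for i, page in delta.items():
                pages[i] = page
        return pages
```

The trade-off is the one the slide implies: each checkpoint is small, but restart must replay every delta, so storage and recovery time grow until a full checkpoint is taken again.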
Implementation (contd)
Kernel level:
- Loadable kernel module vs. changing the kernel
- Preemptive / cooperative
- Access to the entire process state
- Complex, less portable
- Examples: Sprite, Zap
- Virtual machines (soon)
Multi-process Checkpoint
- Global state: a set of states, one from each process
- Consistent global state:
  - If the state of A reflects a message received from B, then the state of B reflects sending it
  - If the state of A reflects a message sent to B but not yet received, the message must be part of the channel state
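The two consistency conditions can be checked mechanically. This sketch assumes a simple model: every process numbers its events 0, 1, 2, …, a checkpoint is described by how many of those events it includes, and each message records the index of its send and receive events. The names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Message(NamedTuple):
    sender: str
    send_event: int  # index of the send event in the sender's history
    receiver: str
    recv_event: int  # index of the receive event in the receiver's history


def check_cut(cut: Dict[str, int], messages: List[Message]) -> Tuple[bool, List[Message]]:
    """cut[p] = number of p's events included in its checkpoint (events 0..cut[p]-1).

    Returns (consistent, channel_state). A message recorded as received but
    not as sent makes the cut inconsistent; a message recorded as sent but
    not yet received belongs to the channel state.
    """
    channel_state = []
    for m in messages:
        sent = m.send_event < cut[m.sender]
        received = m.recv_event < cut[m.receiver]
        if received and not sent:
            return False, []
        if sent and not received:
            channel_state.append(m)
    return True, channel_state
```

For example, with `m1 = Message("B", 1, "A", 2)`, the cut `{"A": 3, "B": 1}` is inconsistent because A has received `m1` while B's state does not yet reflect sending it.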
Consistent Global State
Multi-process Checkpoint
Uncoordinated checkpoint:
- Inspect data to find a recovery line
- Processes are independent, efficient
- Domino effect, much storage
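Finding a recovery line after uncoordinated checkpointing can be sketched as rollback propagation. This assumes each process numbers its events from 0, a checkpoint at position c covers events 0..c-1, and position 0 (the initial state) always exists. Starting from everyone's latest checkpoint, any receiver of an orphan message is rolled back to an earlier checkpoint, which may in turn orphan other messages: the domino effect.

```python
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    sender: str
    send_event: int
    receiver: str
    recv_event: int


def recovery_line(checkpoints: Dict[str, List[int]], messages: List[Message]) -> Dict[str, int]:
    """checkpoints[p]: sorted checkpoint positions of p, starting with 0."""
    line = {p: cps[-1] for p, cps in checkpoints.items()}  # latest checkpoints
    changed = True
    while changed:
        changed = False
        for m in messages:
            sent = m.send_event < line[m.sender]
            received = m.recv_event < line[m.receiver]
            if received and not sent:  # orphan: undo the receive
                line[m.receiver] = max(c for c in checkpoints[m.receiver] if c <= m.recv_event)
                changed = True
    return line
```

With checkpoints `{"A": [0, 2, 4], "B": [0, 1, 3]}` and messages that alternate between A and B just after each checkpoint, every rollback orphans the previous message, and the line falls all the way back to `{"A": 0, "B": 0}`. This is why every checkpoint must be kept ("much storage").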
Multi-process Checkpoint
Coordinated checkpoint (centrally managed):
- Blocking
  - All processes suspended
  - Flush communication channels
- Non-blocking
  - Delay in triggers may yield inconsistency
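The blocking variant can be shown as a single-threaded simulation, assuming messages in flight are modeled as per-channel queues and a process's local state is just the list of messages it has received. Because the channels are flushed before anyone saves state, the snapshot needs no separate channel state. All class and function names are hypothetical.

```python
from collections import defaultdict, deque


class Process:
    def __init__(self, name):
        self.name = name
        self.received = []  # local state: messages delivered so far
        self.suspended = False

    def send(self, channels, dest, payload):
        if self.suspended:
            raise RuntimeError(f"{self.name} is suspended")
        channels[(self.name, dest)].append(payload)


def coordinated_checkpoint(procs, channels):
    """Blocking coordinated checkpoint (toy simulation of the coordinator).

    1. Suspend all processes so no new messages are sent.
    2. Flush every channel by delivering in-flight messages.
    3. Record each process's local state (channels are now empty).
    4. Resume all processes.
    """
    for p in procs.values():
        p.suspended = True
    for (src, dst), queue in channels.items():
        while queue:
            procs[dst].received.append(queue.popleft())
    snapshot = {name: list(p.received) for name, p in procs.items()}
    for p in procs.values():
        p.suspended = False
    return snapshot
```

The cost is visible in step 1: every process stalls until the slowest channel is drained, which is what non-blocking protocols try to avoid.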
Multi-process Checkpoint
Communication-induced checkpoint:
- Piggyback process checkpoint status and requests on messages
- May require enforcing a global checkpoint
- Unpredictable checkpoint times
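One simple way to piggyback checkpoint status is an index-based scheme, sketched here as an illustration (not necessarily the protocol the slide has in mind). Each process numbers its checkpoints, attaches the current number to every message, and takes a forced checkpoint before delivering any message that carries a higher number. The forced checkpoints are the "unpredictable checkpoint times" from the slide.

```python
class CICProcess:
    """Toy index-based communication-induced checkpointing.

    A message sent after checkpoint i carries index i; the receiver takes
    its own checkpoint i before delivering it, so that message can never be
    an orphan with respect to the checkpoints numbered i.
    """

    def __init__(self, name):
        self.name = name
        self.index = 0
        self.checkpoints = [(0, "initial")]  # (index, kind)
        self.delivered = []

    def basic_checkpoint(self):
        # Independently scheduled local checkpoint.
        self.index += 1
        self.checkpoints.append((self.index, "basic"))

    def send(self, payload):
        return (self.index, payload)  # piggyback the checkpoint index

    def receive(self, message):
        msg_index, payload = message
        if msg_index > self.index:
            self.index = msg_index
            self.checkpoints.append((self.index, "forced"))  # before delivery
        self.delivered.append(payload)
```

Decisions stay local (each process checks only the incoming index), yet the checkpoints sharing an index line up across processes, matching the "Local/central, Several" column of the summary table.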
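Multi-process Checkpoint
Summary:
|                        | Uncoordinated | Coordinated | Communication-induced |
|------------------------|---------------|-------------|-----------------------|
| Domino effect          | Possible      | No          | No                    |
| Management overhead    | None          | More        | Less                  |
| Decision making        | Local         | Central     | Local/central         |
| Checkpoint data stored | All           | Latest only | Several               |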
Virtual Machines
“Any problem in computer science can be solved by another layer of indirection”
What Is a Virtual Machine?
- An indirection layer below the execution environment seen by applications and the OS
- Decouples the architecture and user-perceived behavior of SW and HW resources from their physical implementation
- Provides a uniform view of the underlying resources
- Multiplexes multiple virtual systems on a single (physical) resource
VM History
- 1960s – Hypervisors (mainframes)
  - Time-share expensive hardware
  - No change to legacy software
- 1980s-90s – Obsolete
  - Proliferation of cheap hardware
  - Hardware support neglected
- Late 1990s – Reincarnation
  - For complex MPPs lacking OS infrastructure
- 2000 to today – Renaissance
  - Consolidation, isolation, reliability
VM Benefits
- Performance
  - Server consolidation
  - Efficient HW utilization
  - Adaptive resource balancing
  - Checkpoint/restart and migration
- Security
  - Simple (reduced complexity)
  - Encapsulation and isolation
  - Mediation
VM Benefits (contd)
- Reliability
  - Redundancy through replication
  - Disaster recovery
  - Deployment testing
- And…
  - Quality of service
  - Transparent (for legacy SW)
  - Enhanced interoperability
  - Development & testing
Server Utilization
Cumulative usage of 28 servers:
- Memory
  - 45% of RAM not used 99.9% of the time
  - 25% of RAM never used concurrently
- CPU
  - 85% of CPU not used 99.9% of the time
  - 81% of CPU never used concurrently
- Disk
  - 68% of storage space never used
Virtualization Levels
- HOST entity: encapsulates the guest
- GUEST entity: managed by the host
[Figure: layered stack (application programs, libraries, operating system, hardware) with the API, ABI, and ISA interfaces between layers]
Process & System VM
[Figure: a process virtual machine (application on a VMM running on an OS on hardware) vs. a system virtual machine (several applications and guest OSes on a VMM running directly on hardware)]
VMs at Different Levels
- HW level: VMware, Xen, Denali, Virtual PC, UML
- OS level: Virtual Servers, BSD Jail, Zap
- Programming language level: Java, .NET
- Network: VLAN, VPN
VM Taxonomy
Process VM: a virtual platform that exists solely to support the process
- Unix
- Emulators (interpreters)
- Dynamic binary translators
  - Optimize by block translation and caching
- Java: “compile once, run everywhere”
  - Intermediate machine code
  - Optimize by native compilation on the fly
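Block translation and caching can be illustrated with a toy dynamic binary translator. The guest ISA here (`addi`, `dec`, `jnz`, `halt`) is entirely hypothetical, and "host code" is a Python closure. Each basic block is decoded and translated only the first time it executes; after that, the cached translation runs directly.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple  # e.g. ("addi", "r0", 5), ("dec", "r1"), ("jnz", "r1", 0), ("halt",)
Op = Callable[[dict], None]


def _addi(r: str, imm: int) -> Op:
    def op(regs: dict) -> None:
        regs[r] += imm
    return op


def _dec(r: str) -> Op:
    def op(regs: dict) -> None:
        regs[r] -= 1
    return op


class BlockTranslator:
    """Translate each basic block once, cache it by start address, reuse it."""

    def __init__(self, program: List[Instr]):
        self.program = program
        self.cache: Dict[int, Callable[[dict], int]] = {}
        self.translations = 0

    def _translate(self, pc: int) -> Callable[[dict], int]:
        ops: List[Op] = []
        while True:
            instr = self.program[pc]
            if instr[0] == "addi":
                ops.append(_addi(instr[1], instr[2]))
            elif instr[0] == "dec":
                ops.append(_dec(instr[1]))
            elif instr[0] in ("jnz", "halt"):  # a branch or halt ends the block
                return self._emit_block(tuple(ops), instr, pc)
            else:
                raise ValueError(f"unknown instruction {instr!r}")
            pc += 1

    @staticmethod
    def _emit_block(ops, exit_instr, pc):
        def block(regs: dict) -> int:
            for op in ops:  # straight-line body runs without re-decoding
                op(regs)
            if exit_instr[0] == "halt":
                return -1  # -1 means "stop"
            _, r, target = exit_instr
            return target if regs[r] != 0 else pc + 1
        return block

    def run(self, regs: dict) -> dict:
        pc = 0
        while pc != -1:
            if pc not in self.cache:  # translate only on first execution
                self.cache[pc] = self._translate(pc)
                self.translations += 1
            pc = self.cache[pc](regs)
        return regs
```

Running the loop `[("addi", "r0", 3), ("dec", "r1"), ("jnz", "r1", 0), ("halt",)]` with `r1 = 5` executes the loop block five times but translates it only once, so the translation cost is amortized over every later iteration.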
VM Taxonomy (contd)
System VM: a complete, persistent system environment providing access to virtual hardware
- Classic: bare HW
- Hosted VM
  - Easy install and maintenance
  - Leverages native services of the underlying OS
- Multiprocessor virtualization
Hardware Virtualization
Challenges in building virtual machines:
- Performance isolation
  - Scheduling priority
  - Memory demand
  - Network traffic
  - Disk access
- Support for various OS platforms
- Small performance overhead
Lack of Hardware Support
- Ring aliasing
- Non-faulting access to privileged state
  - Does the guest see the right state?
- Address space compression
  - Where does the VMM reside?
- Impact on transitions
  - Traps, SYSENTER, SYSEXIT
- Interrupt masking
- Hidden state
Now What?
- Hardware extensions
  - Change semantics to support VMs
  - Intel, AMD
- Software virtualization
  - Translate code to emulate the desired behavior
  - VMware
- Paravirtualization
  - Xen, Denali
Hardware Extensions for VM
- Root mode
  - Runs the VMM
  - Like ring 0 before
- Non-root mode
  - Runs the guest OS
  - Less privileged
- Mask of events to trap
VMware
- Hardware virtualization: CPU, memory, I/O
- Suspend/resume
- Live migration
- Design goals: compatibility, performance, simplicity
VMware: CPU Virtualization
- Execute the guest on bare hardware while the VMM retains control
  - Trap privileged ops and emulate their action
- Challenge: lack of HW support
  - POPF and read access to privileged state
- Solution: fast binary translation
  - Only kernel-mode code
  - Eliminates unnecessary traps
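The POPF problem and its binary-translation fix can be modeled in a few lines. On x86, a deprivileged POPF does not trap; it silently ignores the interrupt-enable flag (IF, bit 9 of EFLAGS), so trap-and-emulate never sees the guest's change. The toy "CPU" below reproduces only that behavior, and the `vmm_popf` pseudo-instruction is a hypothetical stand-in for a call into the VMM.

```python
IF_BIT = 1 << 9  # x86 EFLAGS interrupt-enable flag


def run_deprivileged(code, vcpu):
    """Execute guest kernel code directly at reduced privilege (toy model)."""
    for op, *args in code:
        if op == "popf":
            pass  # no trap: the IF update is silently dropped
        elif op == "vmm_popf":
            vcpu["IF"] = bool(args[0] & IF_BIT)  # emulated by the VMM
        elif op == "nop":
            pass  # innocuous instruction, runs natively
        else:
            raise ValueError(op)
    return vcpu


def translate(code):
    """Rewrite sensitive-but-unprivileged instructions into VMM calls;
    everything else is copied unchanged so it can run at native speed."""
    return [("vmm_popf", *args) if op == "popf" else (op, *args) for op, *args in code]
```

Run untranslated, a guest POPF that clears IF leaves the virtual CPU believing interrupts are still enabled; after translation, the VMM sees and applies the change. Only kernel-mode code needs this treatment, since user-mode code runs correctly at reduced privilege anyway.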
VMware: Memory Virtualization
- Shadow page tables
- Challenges:
  - Inefficient page replacement
  - Oversized memory footprint due to replication
- Solutions:
  - Ballooning
  - Content-based sharing
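Shadow page tables compose two mappings: the guest's page table (guest virtual page to guest "physical" page) and the VMM's own map (guest physical page to machine page). The sketch below assumes single-level dictionaries instead of real multi-level tables and fills shadow entries lazily on first access; the class and method names are hypothetical.

```python
class ShadowMMU:
    """Toy shadow paging for one guest address space.

    guest_pt: guest virtual page -> guest physical page (maintained by the guest OS)
    pmap:     guest physical page -> machine page (maintained by the VMM)
    shadow:   guest virtual page -> machine page (what the real MMU would use)
    """

    def __init__(self, pmap):
        self.guest_pt = {}
        self.pmap = pmap
        self.shadow = {}

    def guest_write_pte(self, vpn, ppn):
        # The VMM intercepts writes to guest page tables (e.g. by write-
        # protecting them) and drops the now-stale shadow entry.
        self.guest_pt[vpn] = ppn
        self.shadow.pop(vpn, None)

    def translate(self, vpn):
        if vpn not in self.shadow:  # hidden fault: fill the shadow entry lazily
            if vpn not in self.guest_pt:
                raise KeyError(f"guest page fault at vpn {vpn}")  # reflect to the guest
            self.shadow[vpn] = self.pmap[self.guest_pt[vpn]]
        return self.shadow[vpn]
```

The extra level of indirection in `pmap` is what lets the VMM reclaim or share machine pages (ballooning, content-based sharing) without the guest noticing.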
VMware: I/O Virtualization
- Challenge: wide variety of devices and interfaces
- Solution: hosted architecture
  - Trap through the VMM
  - Export special devices
Xen: Paravirtualization
- Provide some exposure to the underlying hardware
  - Better performance
  - Must modify the OS to adapt
  - No modifications to applications
Xen (contd)
- Downgrade the privilege of the guest OS
- Guest registers syscall and page-fault handlers with Xen
- Partial access to page tables
- Fast handlers for most exceptions
- Exposes a set of simple device abstractions
Xen (contd)
The cost of porting an OS to Xen:
- Privileged instructions
- Page table access
- Network driver
- Block device driver
- Total: <2% of the code base
Denali
- Lightweight protection domains
- Minimalistic method geared for performance
- Changes:
  - Idle loops: avoid busy waiting
  - Interrupt queueing: saves context switches
  - Interrupt semantics: “just”/“recent”
  - No virtual memory (!)
  - No BIOS: no legacy “crap”
  - Generic I/O devices
Virtual Machine Migration
Optimizations:
- Reduce memory state before the snapshot
  - Ballooning
- Reduce total cost by incremental updates
  - COW hierarchy
- Reduce start-up time by demand paging
- Reduce transfer time by relying on common data
  - Use hash functions to identify common blocks
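Using hash functions to identify common blocks can be sketched as follows, assuming the receiver already holds a store of blocks indexed by hash (for example, pages from a common OS image). Only blocks the receiver lacks are sent, and duplicates within the image are sent once. SHA-256 is an illustrative choice; a production system might also verify contents to rule out collisions.

```python
import hashlib


def plan_transfer(pages, receiver_hashes):
    """Sender side. Returns (to_send: {hash: bytes}, layout: [hash per page])."""
    to_send = {}
    layout = []
    for page in pages:
        h = hashlib.sha256(page).hexdigest()
        layout.append(h)
        if h not in receiver_hashes and h not in to_send:  # receiver lacks it, not yet queued
            to_send[h] = page
    return to_send, layout


def reconstruct(layout, to_send, receiver_store):
    """Receiver side: rebuild the image from transferred and local blocks."""
    return [to_send[h] if h in to_send else receiver_store[h] for h in layout]
```

Only the layout (one hash per page) and the missing blocks cross the network; in an image dominated by zero pages and shared OS code, that can be a small fraction of the full memory size.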
Virtual Machine Migration
Minimizing downtime:
- Reduce the size of the VM state
- Pre-copy static parts (or demand-copy static parts)
- Hot-copy dynamic parts
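The effect of pre-copying on downtime can be estimated with a simple model: the first round copies all pages while the VM keeps running, each later round recopies the pages dirtied during the previous round, and the VM is paused only for the final small remainder. The sketch assumes a constant dirty rate, that re-dirtied pages are distinct, and rates in pages per second; the threshold and round limit are illustrative.

```python
def precopy(total_pages, bandwidth, dirty_rate, stop_threshold=50, max_rounds=30):
    """Estimate iterative pre-copy migration under a constant dirty rate."""
    to_send = total_pages
    pages_sent = 0
    rounds = 0
    while to_send > stop_threshold and rounds < max_rounds:
        pages_sent += to_send
        # Pages dirtied while this round was sent: dirty_rate * (to_send / bandwidth).
        to_send = min(total_pages, to_send * dirty_rate // bandwidth)
        rounds += 1
    pages_sent += to_send  # final stop-and-copy with the VM paused
    return {"rounds": rounds, "pages_sent": pages_sent, "downtime_s": to_send / bandwidth}
```

With 100,000 pages, 10,000 pages/s of bandwidth, and 1,000 dirtied pages/s, four rounds shrink the final copy to 10 pages (about 1 ms of downtime) at the cost of sending roughly 11% extra data, versus 10 s of downtime for a plain stop-and-copy. If the dirty rate approaches the bandwidth, the rounds stop converging, which is why `max_rounds` caps them.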
OS Virtualization
- Confine applications in containers
- Advantages:
  - Fine granularity
  - Low overhead
  - Easier maintenance
- Challenges:
  - Transparency
  - Correctness
- Extend the OS: modify the kernel, loadable module, library
Isolation – BSD Jail
- Creates an isolated environment within the existing system via software means
- Uses chroot (private root per jail)
- Processes in a jail are isolated from files, processes, and network services in other jails
- A jail can be restricted to a single IP address
Specialized Virtualization – Linux VServer
- Hosting (consolidation)
- Experimentation
- Education (do you trust students…?)
- Personal security box
- Managing several "versions"
  - Applications
  - Virtual servers
- Per-user firewall
- Fail-over servers
- Honey-pots
Specialized Virtualization – Linux VServer
- Isolation: processes, file system, IPC, network, superuser capabilities
- Kernel patch
  - Adds a “context” tag per process/resource
  - Syscalls to handle contexts (irreversible)
- Challenges
  - Capture all holes (indirect access!)
  - Efficient storage
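The per-process context tag can be modeled as a toy kernel that filters process lookups by context and makes entering a context one-way. The rule that context 0 (the host) sees every process is an assumption of this sketch, and the class and method names are hypothetical, not the VServer interface.

```python
class ContextKernel:
    """Toy model of per-process context tags, in the spirit of Linux VServer."""

    HOST = 0  # assumption: the host context can see every process

    def __init__(self):
        self.procs = {}  # pid -> context id

    def spawn(self, pid, parent_pid=None):
        # Children inherit their parent's context tag.
        self.procs[pid] = self.procs[parent_pid] if parent_pid is not None else self.HOST

    def enter_context(self, pid, ctx):
        if self.procs[pid] != self.HOST:
            raise PermissionError("context change is irreversible")
        self.procs[pid] = ctx

    def visible(self, caller):
        ctx = self.procs[caller]
        return sorted(p for p, c in self.procs.items() if ctx == self.HOST or c == ctx)

    def kill(self, caller, target):
        # Every syscall that names a process must apply this check; a single
        # unfiltered path is one of the "holes" the slide warns about.
        if target not in self.visible(caller):
            raise ProcessLookupError(target)  # other contexts look nonexistent
        del self.procs[target]
```

Because the tag is inherited on spawn and cannot be changed once set, a process placed in a context cannot escape it, and neither can anything it starts.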
General Virtualization – Zap
- Virtualization for isolation
  - POD: PrOcess Domain
  - Private namespace
- Virtualization for migration
  - Decouple process from OS
  - Capture state and reconstruct state
Zap – Virtualization
- Process environment, file system, network
- Interpose on system calls
- File system
  - Relies on a “chroot” environment
- Network
  - Per-protocol methods
- Challenges
  - Race conditions (SMP)
  - Life span of objects
  - Fast translation
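Interposing on system calls to virtualize identifiers can be illustrated with PIDs. This sketch assumes a per-pod translation table between virtual PIDs (what processes see) and real PIDs (what the host kernel uses); `sys_getpid`, `sys_kill`, and `remap` are hypothetical wrapper names, and the real kill is passed in as a callback.

```python
import itertools


class Pod:
    """Toy PrOcess Domain: processes inside see only virtual PIDs."""

    def __init__(self):
        self.v2r = {}
        self.r2v = {}
        self._next_vpid = itertools.count(1)

    def add(self, real_pid, vpid=None):
        vpid = next(self._next_vpid) if vpid is None else vpid
        self.v2r[vpid] = real_pid
        self.r2v[real_pid] = vpid
        return vpid

    # Interposed system calls: translate PIDs on the way in and out.
    def sys_getpid(self, real_pid):
        return self.r2v[real_pid]

    def sys_kill(self, vpid, real_kill):
        if vpid not in self.v2r:
            raise ProcessLookupError(vpid)  # processes outside the pod are invisible
        return real_kill(self.v2r[vpid])

    def remap(self, new_real_pids):
        """After restart on another host, bind each vpid to its new real PID."""
        self.v2r = dict(new_real_pids)
        self.r2v = {r: v for v, r in self.v2r.items()}
        self._next_vpid = itertools.count(max(self.v2r, default=0) + 1)
```

After migration the real PIDs change, but `remap` keeps the virtual ones stable, so a process that stored its child's PID before the checkpoint can still signal it afterwards. The table lookups on every call are the "fast translation" challenge from the slide.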
Zap – Migration
- Checkpoint: outside process context
  - Capture process tree
  - Capture pod state
  - Capture per-process state
- Restart: inside process context
  - Restore process tree
  - Restore processes
- Example issues
  - Sharing
  - Deleted files
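Restoring the process tree means recreating processes so that every parent exists before its children, because on restart each child is re-forked by its restored parent. This sketch assumes the checkpoint image maps each virtual PID to its parent's virtual PID (or `None` for the pod's root) plus opaque per-process state; the function names are hypothetical.

```python
from collections import deque


def checkpoint_tree(procs):
    """procs: {vpid: {"ppid": parent vpid or None, "state": ...}} for one pod.
    Copies the tree as seen from outside the processes."""
    return {pid: dict(info) for pid, info in procs.items()}


def restore_order(image):
    """Breadth-first order from the pod's root(s): parents before children."""
    children = {}
    roots = []
    for pid, info in image.items():
        if info["ppid"] in image:
            children.setdefault(info["ppid"], []).append(pid)
        else:
            roots.append(pid)  # parent outside the pod: a root of the pod's tree
    order = []
    queue = deque(sorted(roots))
    while queue:
        pid = queue.popleft()
        order.append(pid)
        queue.extend(sorted(children.get(pid, [])))
    return order
```

The order is independent of how the image was stored, so an image listed as `4, 3, 2, 1` (with 1 as root, 2 and 3 its children, and 4 a child of 2) is still restored as `1, 2, 3, 4`.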