Cellular Disco: Resource management using virtual clusters on shared-memory multiprocessors
NUMA Multiprocessor Systems
● Hardware-based approach
  – Requires only very small operating system changes
  – Fault isolation
  – Lacks resource sharing and flexibility
  – Example: Unisys' Cellular Multiprocessing architecture
● Software-based approach
  – Backwards compatibility
  – Significant development effort
  – No fault containment
  – Examples: SGI's IRIX, Tornado and K42, Hive
Goals
● Combine the advantages of the hardware-based and software-based approaches using a hypervisor/VMM
  – Small hypervisor
  – Hardware fault containment
  – Resource management
    ● Scalability
    ● Load balancing
  – Unmodified guest OS
Design
[Figure: Cellular Disco runs as a hypervisor between the virtual machines (applications on unmodified operating systems) and the CPUs and interconnect of a 32-processor SGI Origin]
Fault Containment
● Software fault containment
  – Straightforward, thanks to the virtual machine boundaries
● Hardware fault containment
  – Semi-independent cells (per-cell state sketched below):
    ● Duplicated code
    ● Local data (pmap, memmap, local ready queue)
    ● Each cell manages all of its local memory pages
  – Hardware support needed
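A minimal C sketch of the per-cell state implied by this slide; the field names follow the slide's terms (memmap, pmap, ready queue), but all layouts, sizes, and types are illustrative assumptions rather than the actual Cellular Disco structures.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGES_PER_CELL  (1 << 18)   /* illustrative, not the real cell size */
#define MAX_LOCAL_VCPUS 64

typedef struct memmap_entry {       /* machine page -> owning VM and its physical page */
    uint32_t owner_vm;
    uint32_t vm_phys_page;
} memmap_entry_t;

typedef struct pmap_entry {         /* VM physical page -> machine page */
    uint32_t machine_page;
    uint8_t  replicated;            /* replicated locally to cut remote misses */
} pmap_entry_t;

typedef struct cell {
    int             cell_id;
    /* Hypervisor code is duplicated in every cell, so only data lives here. */
    memmap_entry_t  memmap[PAGES_PER_CELL];       /* covers local memory pages only */
    pmap_entry_t   *pmaps;                        /* per-VM pmaps for locally backed pages */
    int             ready_queue[MAX_LOCAL_VCPUS]; /* local VCPU ready queue */
    size_t          ready_head, ready_tail;
    size_t          free_pages;                   /* consulted by memory balancing */
} cell_t;
```

Keeping all of this state local to a cell is what allows the rest of the system to keep running when the hardware of one cell fails.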
Resource Management
➔ Scalability
➔ Overcommitment of resources
➔ Load balancing
● CPU management
  – VCPU migration
  – Gang scheduling
● Memory management
  – Memory borrowing
  – Avoid double-paging overhead
VCPU Migration
● Intra-node migration
  – 27 μs to update the ready queue
  – Plus the cost of refilling the caches afterwards
● Intra-cell migration
  – 520 μs to copy the second-level software TLB (L2TLB)
  – Dynamic memory migration and replication to avoid remote cache misses
● Inter-cell migration
  – 1520 μs to update data structures
  – Conflicts with fault containment
(the three cases are compared in the sketch below)
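A small C sketch comparing the three migration cases; the microsecond figures are the ones quoted on this slide, while the struct and function names are hypothetical.

```c
/* Costs quoted on the slide; which case applies depends on where the
 * destination CPU sits relative to the VCPU's current node and cell. */
#define COST_INTRA_NODE_US   27    /* update the ready queue, then refill caches */
#define COST_INTRA_CELL_US  520    /* also copy the second-level software TLB */
#define COST_INTER_CELL_US 1520    /* update cross-cell data structures */

typedef struct vcpu { int node, cell; } vcpu_t;   /* hypothetical */
typedef struct cpu  { int node, cell; } cpu_t;    /* hypothetical */

/* Estimate the direct cost of moving a VCPU to a destination CPU;
 * inter-cell moves also weaken hardware fault containment. */
static int migration_cost_us(const vcpu_t *v, const cpu_t *dst)
{
    if (v->cell != dst->cell)
        return COST_INTER_CELL_US;
    if (v->node != dst->node)
        return COST_INTRA_CELL_US;
    return COST_INTRA_NODE_US;
}
```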
CPU Load Balancing
[Figure: Cellular Disco migrates VMs between CPUs across the interconnect to balance load]
CPU Balancing Policies
● Idle balancer
  – Intra-cell balancing
  – Restricted by gang scheduling
● Periodic balancer
  – Inter-cell balancing
  – Maintains a load tree (sketched below)
  – Period is tunable
[Figure: load tree over CPUs 0–3 running VCPUs A0, A1, B0, B1, with the fault containment boundary between cells]
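A C sketch of a load tree as the periodic balancer might use it: leaves carry per-CPU run-queue lengths, inner nodes carry subtree sums, and the balancer descends toward the least loaded subtree. The tree structure follows the slide; the concrete rebalancing rule and all names are assumptions.

```c
typedef struct load_node {
    int load;                       /* sum of loads in this subtree */
    struct load_node *left, *right; /* both NULL at a CPU leaf */
    int cpu;                        /* valid only at leaves */
} load_node_t;

/* Recompute subtree sums bottom-up, once per (tunable) balancing period. */
static int refresh(load_node_t *n, const int *cpu_load)
{
    if (!n->left)                   /* leaf: read this CPU's run-queue length */
        return n->load = cpu_load[n->cpu];
    return n->load = refresh(n->left, cpu_load) + refresh(n->right, cpu_load);
}

/* Descend toward the least loaded leaf. The periodic balancer would pair
 * this with the most loaded leaf and migrate a gang-schedulable set of
 * VCPUs if the imbalance exceeds a threshold (pairing rule and threshold
 * are assumptions, not from the slide). */
static int least_loaded_cpu(const load_node_t *n)
{
    while (n->left)
        n = (n->left->load <= n->right->load) ? n->left : n->right;
    return n->cpu;
}
```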
Memory Balancing Policy
● Every VM has an allocation preference list
● Threshold: borrowing starts when local free memory drops below 16 MB
● Borrow 4 MB of memory from each cell in the list
● Cells loan memory as long as they have more than 32 MB available
● Costs about 758 μs per 4 MB
● Paging as a last resort
(the policy is sketched below)
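A C sketch of the borrowing policy as described on this slide; only the 16 MB / 4 MB / 32 MB thresholds and the roughly 758 μs cost come from the slide, everything else (names, structure, stopping rule) is an illustrative assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define MB              (1024UL * 1024UL)
#define LOW_WATERMARK   (16 * MB)   /* start borrowing below this */
#define BORROW_CHUNK    ( 4 * MB)   /* amount requested per lender */
#define LENDER_RESERVE  (32 * MB)   /* a cell lends only while above this */

typedef struct cell { size_t free_bytes; } cell_t;

/* A cell grants a loan only while it has more than LENDER_RESERVE free.
 * The slide quotes roughly 758 us to transfer a 4 MB loan. */
static bool lend(cell_t *lender, size_t bytes)
{
    if (lender->free_bytes <= LENDER_RESERVE)
        return false;
    lender->free_bytes -= bytes;
    return true;
}

/* Walk the VM's allocation preference list, borrowing BORROW_CHUNK from
 * each willing cell until the local cell is back above the watermark;
 * if borrowing is not enough, paging remains the last resort. */
static void balance_memory(cell_t *local, cell_t **prefs, size_t nprefs)
{
    for (size_t i = 0; i < nprefs && local->free_bytes < LOW_WATERMARK; i++)
        if (lend(prefs[i], BORROW_CHUNK))
            local->free_bytes += BORROW_CHUNK;
}
```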
MP Virtualization Overhead
● Worst-case uniprocessor overhead is only 9%
[Chart: per-workload overhead bars of +1%, +4%, +10%, and +20%]
Performance Results
● CPU utilization: 31% with static hardware partitioning (HW) vs. 58% with virtual clusters (VC)
Questions
● Conflicting mechanisms?
● Can we build a scalable system without global heuristics?
● How to fit this approach into a microkernel architecture?