Disco: Running Commodity Operating Systems on Scalable Multiprocessors Presented by: Pierre LaBorde, Jordan Deveroux, Imran Ali, Yazen Ghannam, Tzu-Wei Kuo Paper by: Edouard Bugnion, Scott Devine, Kinshuk Govil, Mendel Rosenblum
Introduction Pierre LaBorde
Introduction CC-NUMA o Cache-Coherent Non-Uniform Memory Access Coupling with standard distributed protocols o TCP/IP o NFS Global Buffer Cache
Introduction Hide NUMA-ness o Page placement o Dynamic page migration o Dynamic page replication
Problem Operating systems for innovative hardware o Scalable shared memory multiprocessors Significant changes required o Operating systems typically have millions of lines of code
Solution
Virtual Machine Monitors Instead of modifying existing OS o Additional layer of software between hardware and OS o Multiple copies of existing operating systems Support a variety of workloads o Virtualizes all of the resources Exports conventional hardware interface o Schedules virtual resources on the physical processors and memory
Virtual Machine Monitor Monitor and distributed protocols need to scale o Simplicity of the monitor o Fault-containment o NUMA memory management issues Global policies o Fine-grained resource sharing
Challenges Overheads o Privileged instructions o I/O Devices Resource Management o Instruction execution stream Idle loop Lock busy-waiting Communication and Sharing o Virtual disk
Disco: A Virtual Machine Monitor Jordan Deveroux
Disco's Interface Processors o Abstraction of MIPS R10000 processor o Does not support complete virtualization of kernel virtual address space o Extends architecture to support efficient access to some processor functions Physical Memory o Abstraction of main memory that resides in contiguous physical address space o Uses dynamic page migration and replication to export a nearly uniform memory architecture to the software I/O Devices o Each virtual machine has a specified set of I/O devices o Intercepts communication from all of its I/O devices for translation or emulation o Virtualizes access to the networking devices of the underlying system
Implementing Disco Multithreaded, shared memory program Disco vs. Other Systems o NUMA memory placement o Cache-aware data structures o Interprocessor communication patterns NUMA memory management o Disco's code is replicated in the memory of every node of the FLASH machine Cache-aware data structures o Partitioned so that parts accessed only by a certain processor are in memory near that processor Interprocessor communication patterns o Very few locks o Wait-free synchronization
Implementing Disco: Virtual CPUs Emulates virtual CPUs by direct execution on the real CPUs Most operations run at the same speed as on the real CPUs Each virtual CPU has a data structure like a process table entry in a traditional OS o Contains the saved state of the virtual CPU Disco itself runs in kernel mode with full access to the hardware Simple scheduler allows virtual processors to be time-shared across the physical processors
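The per-virtual-CPU record and the simple scheduler can be pictured with a small sketch. This is an illustration only, not Disco's code: the names `VirtualCPU` and `RoundRobinScheduler`, the field layout, and the round-robin policy are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VirtualCPU:
    """Saved state of one virtual CPU, analogous to a process table entry.

    Field names are hypothetical; the slide only says the structure
    holds the state of the virtual CPU.
    """
    vm_id: int
    vcpu_id: int
    pc: int = 0
    registers: list = field(default_factory=lambda: [0] * 32)
    privileged: dict = field(default_factory=dict)  # saved privileged registers


class RoundRobinScheduler:
    """Time-shares one physical CPU among virtual CPUs (assumed policy)."""

    def __init__(self):
        self.ready = deque()

    def add(self, vcpu):
        self.ready.append(vcpu)

    def next(self):
        """Pick the virtual CPU at the head of the queue and requeue it at the tail."""
        vcpu = self.ready.popleft()
        self.ready.append(vcpu)
        return vcpu
```

In a real monitor, `next()` would be followed by restoring the saved registers onto the physical CPU and jumping to the saved program counter; the sketch stops at the selection step.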
Implementing Disco: Virtual Physical Memory Adds a level of address translation and maintains physical-to-machine address mappings Translation is performed using the translation-lookaside buffer (TLB) Once a corrected entry is in the TLB, memory references are translated through it with no further overhead Each TLB entry is marked with an address space identifier to avoid flushing the TLB on context switches Each TLB miss is more expensive than on the bare hardware o emulation of trap architecture o emulation of privileged instructions o remapping of physical addresses
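The extra translation level can be sketched as follows. This is a toy model under stated assumptions, not Disco's implementation: the class name `VirtualMachineMemory`, the dictionary-based TLB and page table, and the 4 KB page size are all chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for the example


class VirtualMachineMemory:
    """Toy model of virtual -> physical -> machine address translation."""

    def __init__(self, pmap):
        # Per-VM map maintained by the monitor: physical page -> machine page.
        self.pmap = dict(pmap)
        # Simulated TLB: (address space id, virtual page) -> machine page.
        self.tlb = {}
        self.misses = 0

    def tlb_miss(self, asid, vpn, guest_page_table):
        """Combine the guest's virtual->physical entry with the physical->machine map."""
        self.misses += 1
        ppn = guest_page_table[vpn]   # mapping the guest OS wants to install
        mpn = self.pmap[ppn]          # monitor's remapping to a machine page
        self.tlb[(asid, vpn)] = mpn   # install the corrected entry
        return mpn

    def translate(self, asid, vaddr, guest_page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        mpn = self.tlb.get((asid, vpn))
        if mpn is None:
            mpn = self.tlb_miss(asid, vpn, guest_page_table)
        return mpn * PAGE_SIZE + offset
```

Because entries are keyed by address space identifier as well as virtual page, switching between address spaces in this model does not require clearing the table, which mirrors the reason the slide gives for tagging TLB entries.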
Implementing Disco: NUMA Memory Management Optimization that enhances data locality Fast translation of virtual-to-physical addresses Allocation of real memory to virtual machines Only moves pages that will have performance benefit Contains a memmap data structure with an entry for each real machine memory page
Two different virtual processors of the same virtual machine logically read-share the same physical page, but each virtual processor accesses a local copy
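A migration and replication policy of this kind can be sketched as a decision over per-page miss counts. Everything here is an assumption for illustration: the `PageStats` record, the `HOT_THRESHOLD` value, and the exact rules (migrate a page used heavily by one remote node, replicate a page read heavily by several nodes, leave write-shared pages alone).

```python
from dataclasses import dataclass, field

HOT_THRESHOLD = 64  # hypothetical miss count before a node counts as a heavy user


@dataclass
class PageStats:
    """Hypothetical per-page record: home node plus miss counts per node."""
    home_node: int
    read_misses: dict = field(default_factory=dict)
    write_misses: dict = field(default_factory=dict)


def decide(stats):
    """Return ('migrate', node), ('replicate', nodes) or ('none', None)."""
    totals = {}
    for counts in (stats.read_misses, stats.write_misses):
        for node, n in counts.items():
            totals[node] = totals.get(node, 0) + n
    hot = sorted(node for node, n in totals.items() if n >= HOT_THRESHOLD)
    if not hot:
        return ("none", None)            # not worth moving
    if len(hot) == 1:
        node = hot[0]
        if node != stats.home_node:
            return ("migrate", node)     # one heavy user on another node
        return ("none", None)            # already local to its only heavy user
    if sum(stats.write_misses.values()) == 0:
        # Read-shared by several nodes: give each remote heavy user a copy.
        return ("replicate", [n for n in hot if n != stats.home_node])
    return ("none", None)                # write-shared: moving cannot help
```

The last branch reflects the point that only pages with an expected performance benefit are moved: a page written by several nodes gains nothing from either migration or replication.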
Implementing Disco: Virtual I/O Intercepts all device accesses from the virtual machine and forwards them to the physical devices Each Disco device defines a monitor call used by the device driver to pass all command arguments Disks and network interfaces include a DMA map as part of their arguments o list of address pairs that specify the source and destination of I/O operations
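Intercepting a request that carries such a map amounts to rewriting the guest's physical addresses into machine addresses before the real device sees them. The sketch below assumes a disk read whose map is a list of (source offset on disk, destination physical address) pairs; the function name `translate_read_map` and the page size are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def translate_read_map(dma_map, pmap):
    """Rewrite destination physical addresses to machine addresses.

    dma_map: list of (disk_offset, guest_physical_address) pairs.
    pmap:    the VM's physical page -> machine page mapping.
    """
    translated = []
    for source, dest_phys in dma_map:
        ppn, offset = divmod(dest_phys, PAGE_SIZE)
        if ppn not in pmap:
            raise KeyError(f"physical page {ppn} is not backed by machine memory")
        translated.append((source, pmap[ppn] * PAGE_SIZE + offset))
    return translated
```

For example, with `pmap = {1: 10}` a destination of physical address 4096 becomes machine address 40960, while the disk offset is passed through unchanged.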
VM Sharing Imran Ali
Copy-on-Write Disks Disk data is mapped into machine memory through the virtual memory mappings Multiple virtual machines (VMs) share the same machine memory pages Copy-on-write means that a VM is unaware that its machine memory is being shared
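The sharing described on this slide can be sketched as a cache keyed by disk block. This is a simplified model, not Disco's data structures: the class `CowDiskCache`, the per-VM dictionary mapping block numbers to `(machine_page, permission)` tuples, and the bump allocator are all assumptions for illustration.

```python
class CowDiskCache:
    """Toy copy-on-write cache: disk block -> shared machine page."""

    def __init__(self):
        self.cache = {}       # disk block number -> machine page number
        self.refcount = {}    # machine page number -> number of VMs mapping it
        self.next_free = 0    # trivial allocator for the example
        self.disk_reads = 0

    def _alloc(self):
        mpn = self.next_free
        self.next_free += 1
        return mpn

    def read_block(self, vm_map, block):
        """Map the block read-only into the VM; go to disk only on first use."""
        mpn = self.cache.get(block)
        if mpn is None:
            self.disk_reads += 1
            mpn = self._alloc()
            self.cache[block] = mpn
            self.refcount[mpn] = 0
        if block not in vm_map:
            self.refcount[mpn] += 1
            vm_map[block] = (mpn, "ro")
        return vm_map[block][0]

    def write_fault(self, vm_map, block):
        """On a write to a shared page, switch the VM to a private writable copy."""
        mpn, perm = vm_map[block]
        if perm == "rw":
            return mpn                 # already private
        self.refcount[mpn] -= 1
        private = self._alloc()        # a real monitor would copy the contents here
        self.refcount[private] = 1
        vm_map[block] = (private, "rw")
        return private
```

Two VMs reading the same block end up with the same machine page and one disk read; the first write by either VM gives that VM its own page without disturbing the other.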
VM Sharing Pages
Virtual Network Interfaces Virtual machines do not communicate with each other directly Use standard protocols to communicate over a virtual subnet with Ethernet-type addressing All read-only pages can be shared among virtual machines, reducing memory overhead Pages are shared whenever possible and are replicated when needed to improve performance
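One way to picture sharing instead of copying on the virtual network: when a page-aligned buffer is sent between virtual machines, the monitor can point the receiver's physical page at the sender's machine page and mark it read-only. The sketch below is an assumption-laden illustration; `send_page` and its arguments are hypothetical names, and real transfers involve alignment checks and packet headers that are omitted here.

```python
def send_page(sender_pmap, receiver_pmap, src_ppn, dst_ppn, readonly_pages):
    """Deliver one page by remapping rather than copying.

    sender_pmap / receiver_pmap: physical page -> machine page, per VM.
    readonly_pages: set of machine pages that must fault on write.
    """
    mpn = sender_pmap[src_ppn]
    receiver_pmap[dst_ppn] = mpn   # receiver now sees the sender's machine page
    readonly_pages.add(mpn)        # any later write triggers copy-on-write
    return mpn
```

After the call both virtual machines reference a single machine page, so the transfer costs a mapping update rather than a memory copy.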
Transparent Sharing of Pages
Experimental Results Yazen Ghannam
Experimental Setup Experiments were run in simulation, not on real hardware Used four different workloads o Software Development (Pmake) OS, I/O Intensive o Hardware Development (Engineering) OS light; Large memory footprint o Scientific Computing (Raytrace, Radix) OS light; uses shared memory regions o Commercial Database I/O light; Single memory intensive
Execution Overheads
Memory Overheads
Scalability
Page Migration and Replication
Experiences and Related Work Tzu-Wei Kuo
Experiences on Real Hardware Disco was ported to run on real hardware in order to confirm the simulation results Ran on an SGI Origin200 board, which forms the basis of the FLASH machine o Single 180MHz MIPS R10000 processor o 128MB of memory
Experiences on Real Hardware Overheads of Virtualization Two workloads o Pmake: compiles Disco itself using the SGI development tools, two files at a time o Engineering: simulates the memory system of the FLASH machine
Experiences on Real Hardware This table shows a breakdown of the execution time for the two workloads and a comparison between IRIX and Disco running IRIX. The execution time is broken down into the user, system, and idle time.
Related Work System Software for Scalable Shared Memory Machines Virtual Machine Monitors Other System Software Structuring Techniques CC-NUMA Memory Management
Conclusion Developing system software for scalable shared memory multiprocessors without massive development effort Experimental results show that the overhead of virtualization is modest in both processing time and memory footprint Disco provides a simple solution for scalability and reliability Lower implementation cost