Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, and Mendel Rosenblum Summary By A. Vincent Rayappa
2 Motivation Large shared-memory multiprocessor systems are now commonly available. Porting operating systems to these platforms is expensive, difficult, and error-prone. Instead of porting OSes to these systems, partition them into virtual machines (VMs) and run essentially unmodified OSes on the VMs.
3 Back to the future: VMM VMs run on top of the Virtual Machine Monitor (VMM) software. The VMM is a really old idea. –A survey paper from 1974 lists over 70 references! [1] Disco applies the idea of a VMM to a modern shared-memory multiprocessor system.
4 Things a VMM needs to do For an OS running on top of a VM (the guest OS) to have the same semantics and behavior as the same OS running on a real machine: –The VMM must present an abstraction in which each guest OS appears to have exclusive access to the CPU, memory and I/O devices. –One guest OS must not be able to interfere with another or with the VMM. In other words, the VMM needs to virtualize the CPU, memory and I/O to provide a suitable abstraction to the guest OS.
5 Virtualizing CPU (1) Most instructions can run directly on the real CPU. –Gives good performance. –Assumes the guest OS and applications are written for the real machine's ISA. To schedule a virtual CPU (VCPU), set the CPU registers to those of the VCPU and jump to the VCPU's PC. What if an attempt is made to modify the TLB or access physical memory? –Privileged instructions need to be trapped and emulated by the VMM.
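A minimal sketch of this "direct execution plus trap-and-emulate" idea, not Disco's actual code: run_guest() stands in for loading the VCPU's registers and jumping to its PC, and all structure and function names are illustrative assumptions.

#include <stdint.h>

struct vcpu {
    uint64_t gpr[32];        /* saved guest general-purpose registers */
    uint64_t pc;             /* guest program counter */
    uint64_t shadow_status;  /* VMM-maintained copy of a privileged register */
};

enum trap_cause { TRAP_NONE, TRAP_PRIV_INSTR };

/* Stand-in for the real context switch onto the physical CPU; a real VMM
 * would return only when the guest traps back into the monitor. */
static enum trap_cause run_guest(struct vcpu *v)
{
    (void)v;
    static int traps_left = 1;
    return traps_left-- > 0 ? TRAP_PRIV_INSTR : TRAP_NONE;
}

/* Apply the trapped instruction's effect to the VCPU's shadow state
 * instead of the real hardware. */
static void emulate_priv_instr(struct vcpu *v)
{
    v->shadow_status |= 1;   /* e.g. the guest kernel enabled interrupts */
    v->pc += 4;              /* step past the trapping instruction */
}

void vmm_run(struct vcpu *v)
{
    enum trap_cause why;
    while ((why = run_guest(v)) != TRAP_NONE) {
        if (why == TRAP_PRIV_INSTR)
            emulate_priv_instr(v);
    }
}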
6 Virtualizing CPU (2) The MIPS R10K processor has three modes [2], from most to least privileged: –Kernel mode (used by the VMM; full HW access) –Supervisor mode (used by the guest OS kernel) –User mode (used by guest OS applications) Attempts by the guest OS to run privileged instructions cause a trap into kernel mode and a call into the VMM. I presume that on the MIPS R10K it is clear which instructions are privileged and which aren't. –Not the case with x86. The VMware VMM uses dynamic recompilation to get around this [3].
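A hypothetical sketch of how the VMM's trap handler might dispatch on the privilege level the guest was running at; the mode names mirror the slide, but the structures and helper functions are illustrative assumptions, not Disco's implementation.

#include <stdint.h>

enum guest_mode {
    GUEST_KERNEL,   /* guest OS kernel, running in HW supervisor mode */
    GUEST_USER      /* guest applications, running in HW user mode */
};

struct vcpu { uint64_t pc; enum guest_mode mode; };

/* Stand-ins for the real emulation and exception-delivery paths. */
static void emulate_priv_instr(struct vcpu *v) { v->pc += 4; }
static void deliver_exception_to_guest(struct vcpu *v) { (void)v; }

void on_priv_instr_trap(struct vcpu *v)
{
    if (v->mode == GUEST_KERNEL) {
        /* The guest kernel believes it may execute privileged instructions;
         * the VMM emulates them against the VCPU's shadow state. */
        emulate_priv_instr(v);
    } else {
        /* A guest application executed a privileged instruction: reflect the
         * exception into the guest OS so it can handle it as on real hardware. */
        deliver_exception_to_guest(v);
    }
}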
7 Virtualizing Memory Disco presents to each VM the abstraction of a contiguous amount of memory starting at address zero. The VM thinks in terms of virtual-to-physical mappings. The Disco VMM maps each physical address to a 40-bit machine address.
8 Virtualizing Memory (Details) An attempt by the guest OS to insert a virtual-to-physical TLB entry is trapped. Disco consults the pmap to translate the physical address into a 40-bit machine address. The machine address (with the guest's protection bits) is inserted into the TLB. TLB misses are higher because guest kernel references now go through the TLB and the TLB is flushed on every VM context switch. To mitigate this, a second-level TLB maintains virtual-to-machine mappings. [Diagram: the CPU for node 1 attempts to add V1 → P1 with protection bits; the attempt is trapped by the VMM, which uses the pmap to translate P1 → M1 and inserts V1 → M1 with the original protection bits into the TLB.]
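A minimal sketch of the trapped TLB-write path described above, assuming a per-VM pmap array indexed by guest physical page number; all names, sizes and helpers are illustrative assumptions.

#include <stdint.h>

#define PMAP_ENTRIES 4096              /* assumed number of guest physical pages per VM */

struct tlb_entry { uint64_t vpn; uint64_t mpn; uint32_t prot; };

struct vm {
    uint64_t pmap[PMAP_ENTRIES];       /* guest physical page -> machine page */
};

/* Stand-in for the privileged TLB-write instruction the VMM executes
 * on the guest's behalf. */
static void hw_tlb_insert(struct tlb_entry e) { (void)e; }

/* Guest attempted: "map virtual page vpn to physical page ppn with prot".
 * ppn is assumed to be within the pmap's range. */
void emulate_tlb_write(struct vm *vm, uint64_t vpn, uint64_t ppn, uint32_t prot)
{
    struct tlb_entry e;
    e.vpn  = vpn;
    e.mpn  = vm->pmap[ppn];            /* physical -> 40-bit machine page */
    e.prot = prot;                     /* keep the guest's protection bits */
    hw_tlb_insert(e);                  /* the hardware TLB now maps V -> M */
}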
9 Second-level TLB Disco adds a second-level TLB which caches recent virtual-to-machine mappings. On a hardware TLB miss, this second-level TLB of the VCPU is consulted first. If the entry is not in the second-level TLB, the pmap is used to determine the machine address. This effectively makes the TLB appear much larger than it really is.
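A hedged sketch of the fast path only: a direct-mapped software second-level TLB of virtual-to-machine translations per VCPU. The slow path that derives the machine address from the guest's mapping and the pmap is stubbed out, and all names and sizes are assumptions.

#include <stdint.h>
#include <stdbool.h>

#define L2TLB_ENTRIES 1024             /* assumed software-TLB size */

struct l2tlb_entry { bool valid; uint64_t vpn; uint64_t mpn; uint32_t prot; };
struct vcpu { struct l2tlb_entry l2tlb[L2TLB_ENTRIES]; };

static void hw_tlb_insert(uint64_t vpn, uint64_t mpn, uint32_t prot)
{ (void)vpn; (void)mpn; (void)prot; }

/* Slow path: determine the machine address via the guest's mapping and the
 * pmap (details not shown), then cache and insert it. */
static void slow_path_fill(struct vcpu *v, uint64_t vpn) { (void)v; (void)vpn; }

/* Hardware TLB miss: try the software second-level TLB first. */
void handle_tlb_miss(struct vcpu *v, uint64_t vpn)
{
    struct l2tlb_entry *e = &v->l2tlb[vpn % L2TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) {
        hw_tlb_insert(vpn, e->mpn, e->prot);   /* satisfied without the slow path */
        return;
    }
    slow_path_fill(v, vpn);
}

/* Called whenever a V -> M mapping is established, so the next miss hits here. */
void l2tlb_fill(struct vcpu *v, uint64_t vpn, uint64_t mpn, uint32_t prot)
{
    struct l2tlb_entry *e = &v->l2tlb[vpn % L2TLB_ENTRIES];
    e->valid = true; e->vpn = vpn; e->mpn = mpn; e->prot = prot;
}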
10 TLB, second-level TLB & pmap[4]
11 Hiding the NUMA-ness of Memory Most commodity OSes and applications running on top of the Disco VMM assume uniform memory access. But the target hardware (the Stanford FLASH multiprocessor) is a NUMA architecture. –Some memory locations have slower access times than others. –This non-uniformity causes performance degradation. Disco uses page migration and replication to improve locality and hide the NUMA nature of the underlying HW.
12 Page Migration For a page heavily accessed by one node: copy the page from the remote node to the local node, update the TLB, and invalidate TLB entries on the other CPUs. [Diagram: the CPU for node 1 initially maps V1 → L' in fast local memory and V2 → R in slow remote memory; after migration the page behind V2 is copied into local memory as L'' and the TLB maps V2 → L''.]
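A minimal sketch of the migration step above. The ordering (shoot down stale entries, copy, then remap) is the conventional way to keep the copy consistent; the page size, node IDs and helper functions are illustrative assumptions.

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Stand-ins for the real allocation, shootdown and remapping mechanisms. */
static void *alloc_page_on_node(int node) { static char page[PAGE_SIZE]; (void)node; return page; }
static void invalidate_tlb_entries_for_page(uint64_t old_mpn) { (void)old_mpn; }
static void update_l2tlb_and_hw_tlb(uint64_t vpn, void *new_page) { (void)vpn; (void)new_page; }

/* Migrate a page that is heavily accessed by a single node. */
void migrate_page(int hot_node, uint64_t vpn, void *remote_page, uint64_t old_mpn)
{
    /* 1. Shoot down stale V -> M entries so no CPU writes during the copy. */
    invalidate_tlb_entries_for_page(old_mpn);

    /* 2. Copy the data into memory close to the hot node. */
    void *local_page = alloc_page_on_node(hot_node);
    memcpy(local_page, remote_page, PAGE_SIZE);

    /* 3. Repoint the hot node's mapping at the local copy. */
    update_l2tlb_and_hw_tlb(vpn, local_page);
}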
13 Page Replication For a page accessed by multiple nodes in read-only mode: downgrade the page to read-only, copy the data into memory local to each node, and update the TLBs. [Diagram: the CPUs for node 1 and node 2 both map V1 → M in memory local to node 1; a read-only replica M' is created in memory local to node 2, and node 2's TLB is changed to map V1 → M' while node 1 keeps V1 → M.]
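A minimal sketch of the replication step, following the three actions listed above; all function names are illustrative assumptions, not Disco's interfaces.

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Stand-ins for the real allocation and mapping mechanisms. */
static void *alloc_page_on_node(int node) { static char replica[PAGE_SIZE]; (void)node; return replica; }
static void downgrade_mappings_to_readonly(void *page) { (void)page; }
static void map_readonly_on_node(int node, uint64_t vpn, void *page)
{ (void)node; (void)vpn; (void)page; }

void replicate_page(int reader_node, uint64_t vpn, void *shared_page)
{
    /* 1. Make sure no writer can change the page while replicas exist. */
    downgrade_mappings_to_readonly(shared_page);

    /* 2. Copy the contents into memory local to the reading node. */
    void *replica = alloc_page_on_node(reader_node);
    memcpy(replica, shared_page, PAGE_SIZE);

    /* 3. Point this node's virtual-to-machine mapping at the local copy. */
    map_readonly_on_node(reader_node, vpn, replica);
}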
14 memmap Data Structure The memmap makes the TLB updates done during migration and replication efficient. One entry per machine page. Each entry contains: –the ID(s) of the virtual machine(s) using this machine page, and –the virtual address(es) on those VMs that map to this page.
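A hypothetical layout for such an entry, plus how it might be walked during a shootdown. The field names, the fixed cap on sharers and the array size are assumptions for illustration only.

#include <stdint.h>

#define MAX_BACKMAPS       8        /* assumed cap on (VM, vaddr) pairs per machine page */
#define NUM_MACHINE_PAGES  4096     /* assumed machine memory size, in pages */

struct backmap {
    uint32_t vm_id;                 /* virtual machine using this machine page */
    uint64_t guest_vaddr;           /* guest virtual address that maps to it */
};

struct memmap_entry {
    uint32_t num_backmaps;          /* how many (VM, vaddr) pairs reference the page */
    struct backmap maps[MAX_BACKMAPS];
};

/* memmap[mpn] describes machine page mpn. */
static struct memmap_entry memmap[NUM_MACHINE_PAGES];

/* During migration or replication, walk the entry for a machine page to find
 * every (VM, virtual address) pair whose TLB entries must be updated. */
void for_each_mapping_of(uint64_t mpn, void (*fn)(uint32_t vm_id, uint64_t vaddr))
{
    struct memmap_entry *e = &memmap[mpn % NUM_MACHINE_PAGES];
    for (uint32_t i = 0; i < e->num_backmaps; i++)
        fn(e->maps[i].vm_id, e->maps[i].guest_vaddr);
}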
15 Virtualizing I/O Special device drivers are installed in the guest OS. –They invoke the VMM via a trap, passing all arguments. Disco interacts with the I/O devices on behalf of the VMs. Disco intercepts all DMA requests to translate the guest's physical addresses into machine addresses. If a device is used by only one VM, Disco simply ensures exclusive access to the device by that VM.
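A hedged sketch of the DMA interception step: each physical address the guest programmed is rewritten to a machine address before the request reaches the real device. The pmap layout, page size and names are assumptions.

#include <stdint.h>
#include <stddef.h>

#define PMAP_ENTRIES 4096
#define PAGE_SHIFT   12
#define PAGE_MASK    ((1u << PAGE_SHIFT) - 1)

struct vm { uint64_t pmap[PMAP_ENTRIES]; };   /* guest physical page -> machine page */

struct dma_segment { uint64_t addr; size_t len; };

/* Translate one DMA segment from guest-physical to machine addressing.
 * The physical page is assumed to lie within the pmap's range. */
static void translate_dma_segment(struct vm *vm, struct dma_segment *seg)
{
    uint64_t ppn    = seg->addr >> PAGE_SHIFT;
    uint64_t offset = seg->addr & PAGE_MASK;
    seg->addr = (vm->pmap[ppn] << PAGE_SHIFT) | offset;   /* address the device will use */
}

/* A guest DMA request is a list of segments; translate them all, then the
 * VMM (not shown) issues the request to the real device on the VM's behalf. */
void intercept_dma_request(struct vm *vm, struct dma_segment *segs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        translate_dma_segment(vm, &segs[i]);
}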
16 Copy-on-write disks (1) After reading disk blocks, Disco caches them in memory. If another VM needs the same disk blocks, the request can be satisfied simply by mapping the cached pages into that VM; no disk access is needed. If there is an attempt to write, a COW fault handler is invoked: –Write operations are not actually written to disk, but kept in memory. –Good for non-persistent disks (e.g. the /tmp area). Useful when multiple VMs share kernel and application code, which is read-only anyway. A persistent disk can only be mounted by one VM; other VMs access it via NFS. –So COW is not implemented for persistent disks?
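A minimal sketch of the two paths described above: a shared, direct-mapped block cache for reads, and a copy-on-write fault handler for writes. The cache organization and helper functions are illustrative assumptions.

#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define PAGE_SIZE  4096
#define CACHE_SIZE 256                 /* assumed number of cached disk blocks */

struct cached_block { bool valid; uint64_t block_no; char data[PAGE_SIZE]; };
static struct cached_block cache[CACHE_SIZE];

/* Stand-ins for real disk I/O and guest memory mapping. */
static void read_block_from_disk(uint64_t block_no, char *buf) { (void)block_no; memset(buf, 0, PAGE_SIZE); }
static void map_readonly_into_vm(int vm_id, char *page) { (void)vm_id; (void)page; }
static char *alloc_private_page(int vm_id) { static char page[PAGE_SIZE]; (void)vm_id; return page; }
static void map_writable_into_vm(int vm_id, char *page) { (void)vm_id; (void)page; }

/* Guest read: share the cached copy if any VM has read this block before. */
void cow_disk_read(int vm_id, uint64_t block_no)
{
    struct cached_block *c = &cache[block_no % CACHE_SIZE];
    if (!(c->valid && c->block_no == block_no)) {
        read_block_from_disk(block_no, c->data);   /* only the first reader hits the disk */
        c->valid = true;
        c->block_no = block_no;
    }
    map_readonly_into_vm(vm_id, c->data);          /* no copy: just a read-only mapping */
}

/* Guest write to a shared page: give the VM its own copy; the shared cache
 * and the (non-persistent) backing disk are left untouched. */
void cow_fault(int vm_id, char *shared_page)
{
    char *priv = alloc_private_page(vm_id);
    memcpy(priv, shared_page, PAGE_SIZE);
    map_writable_into_vm(vm_id, priv);
}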
17 Copy-on-write Disks (2)
18 Virtualizing the Network Interface Disco creates a virtual network device and subnet that allow VMs to communicate while sharing data. –It behaves like an Ethernet segment with no limit on packet size. When data sent from one VM to another spans whole memory pages, the pages are simply mapped into the destination instead of being copied.
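A hedged sketch of that zero-copy delivery path: if the payload is page-aligned and a whole number of pages, the machine pages are remapped read-only into the receiver; otherwise the bytes are copied. The helpers and the alignment test are assumptions for illustration.

#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Stand-ins for the real remapping and copy mechanisms. */
static void map_page_readonly(int dst_vm, uint64_t dst_vaddr, void *machine_page)
{ (void)dst_vm; (void)dst_vaddr; (void)machine_page; }
static void copy_into_vm(int dst_vm, uint64_t dst_vaddr, const void *src, size_t len)
{ (void)dst_vm; (void)dst_vaddr; (void)src; (void)len; }

/* Deliver one packet from the sending VM's buffer to the receiving VM. */
void vnet_deliver(int dst_vm, uint64_t dst_vaddr, void *src_buf, size_t len)
{
    if (len % PAGE_SIZE == 0 && ((uintptr_t)src_buf % PAGE_SIZE) == 0) {
        /* Page-aligned, page-multiple payload: remap instead of copying. */
        for (size_t off = 0; off < len; off += PAGE_SIZE)
            map_page_readonly(dst_vm, dst_vaddr + off, (char *)src_buf + off);
    } else {
        copy_into_vm(dst_vm, dst_vaddr, src_buf, len);   /* small or unaligned data is copied */
    }
}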
19 NFS over Virtual Network
20 Changes to IRIX To run IRIX on top of Disco, some changes had to be made: –IRIX kernel code and data are by default placed in an unmapped memory segment that bypasses the TLB. This had to be changed so that kernel code and data live in mapped memory, where the VMM can intercept all address translations. –Device drivers were rewritten. –Some synchronization routines used privileged registers, causing frequent traps into the VMM. These accesses had no other side effects, so they were rewritten to use non-privileged loads/stores. Most changes were restricted to the HAL.
21 Measurements All measurements were done using a simulator, since the target HW was not available at the time the VMM was developed. Because a SW simulator was used, only short-running workloads could be measured.
22 Disco runtime overhead [Chart: overhead due to I/O virtualization; overhead due to TLB misses and instruction emulation.]
23 Memory Benefit Due To Data Sharing [Chart, 6 different configurations: V = pmake memory that would be used with no sharing; M = pmake memory actually used with sharing.]
24 Scalability Partitioning the problem into different VMs increases scalability; kernel synchronization time becomes smaller. [Chart: the baseline configuration is marked for comparison.]
25 Benefit Of Migration/Replication [Chart: improvement due to migration/replication; cache misses satisfied locally.]
26 Conclusions Disco effectively manages a large multiprocessor system with a small (13 KLOC) VMM. How well will other commodity OSes run on top of Disco? –This paper describes the best-case scenario: IRIX, an OS designed for the MIPS processor, running on MIPS with the Disco VMM!
27 References [1] Robert P. Goldberg. Survey of Virtual Machine Research. IEEE Computer Magazine 7(6), pp. 34-45, June 1974. [2] MIPS R10000 Microprocessor User Guide, Version 2.0. Available online. [3] Presentation by Rosenblum on VMware at the 1999 Hot Chips conference. Available online. [4] E. Bugnion et al. Disco: Running Commodity Operating Systems on Scalable Multiprocessors. ACM Transactions on Computer Systems, Vol. 15, No. 4, November 1997, pp. 412-447.