Disco: Running Commodity Operating Systems on Scalable Multiprocessors


FLASH
– a cache-coherent non-uniform memory access (NUMA) multiprocessor developed at Stanford
– not yet available at the time the paper was written

Problems with other approaches
– many other experimental systems require large changes to uniprocessor OSes
– OS development lags behind delivery of the hardware
– the high cost of such changes means the system will likely introduce instabilities, which may break legacy applications
– hardware vendors are often not the software vendors (think Intel/Microsoft)

Virtual Machine Monitors
– a nanokernel: a thin software layer between the hardware and a heavyweight OS
– by running multiple copies of traditional OSes, scalability issues are confined to the much smaller VM monitor
– VMs are natural software fault boundaries, and again the small size of the monitor makes hardware fault tolerance easier to implement

Virtual Machine Monitors
– the nanokernel monitor handles all the NUMA-related issues, so UMA OSes do not need to be made aware of non-uniformity
– multiple OSes allow legacy applications to continue to run while newer versions are phased in; this could also allow more experimentation with new technologies

Disco architecture

Virtual Machine Challenges
– overhead
  – runtime: privileged instructions must be emulated inside the monitor; I/O requests must be intercepted and virtualized by the monitor
  – memory: code and data must be replicated for the different copies of the OS; each OS may have its own file system buffer cache, for instance

Virtual Machine Challenges
– resource management: the monitor does not have all the information the OS does, so it may make poor decisions; think of scheduling a virtual CPU that is merely spin-locking
– sound familiar? This is the same as the argument against kernel-level threads.

Virtual Machine Challenges
– communication and sharing: if OSes are separated by virtual machine boundaries, how do they share resources, and how does information cross those boundaries? VMs are not aware that they are actually on the same machine.
– sound familiar? This issue motivated LRPC and URPC, which specialize communication protocols for the case where server and client reside on the same physical machine.

Disco implementation
– Disco emulates the MMU and the trap architecture, allowing unmodified applications and OSes to run on the VM
– frequently used kernel operations can be optimized; for instance, the OS enables and disables interrupts by loading from and storing to special addresses instead of trapping
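To make the trap-and-emulate idea concrete, here is a minimal C sketch, not Disco's actual code: every name below (vcpu, decode, guest_gpr, phys_to_machine) is hypothetical. A privileged instruction executed by the guest kernel, which runs in supervisor rather than kernel mode, traps into the monitor, which applies its effect to the VM's virtual privileged state.

```c
/* Minimal sketch of privileged-instruction emulation; all names are
 * hypothetical, not Disco's. Privileged instructions from the guest
 * kernel trap here, and the monitor applies them to the VM's
 * *virtual* privileged registers instead of the real hardware. */

enum op { OP_MTC0_STATUS, OP_TLBWR, OP_OTHER };

struct vcpu {
    unsigned long status;    /* virtualized CP0 status register   */
    unsigned long entry_lo;  /* virtualized TLB staging registers */
    unsigned long entry_hi;
};

/* Stubs standing in for monitor internals. */
enum op decode(unsigned insn);
unsigned long guest_gpr(struct vcpu *v, unsigned insn);
unsigned long phys_to_machine(struct vcpu *v, unsigned long entry_lo);
void insert_tlb(unsigned long entry_hi, unsigned long entry_lo);
void deliver_illegal_instruction(struct vcpu *v);

void emulate_privileged(struct vcpu *v, unsigned insn)
{
    switch (decode(insn)) {
    case OP_MTC0_STATUS:     /* guest updates its status register  */
        v->status = guest_gpr(v, insn);
        break;
    case OP_TLBWR:           /* guest TLB write: rewrite the physical
                                page number to a machine page first */
        insert_tlb(v->entry_hi, phys_to_machine(v, v->entry_lo));
        break;
    default:
        deliver_illegal_instruction(v);
    }
}
```

The special-address optimization on this slide exists to skip even this trap for hot operations like interrupt disabling: a plain load or store to a monitor-provided page is far cheaper than a full trap and decode.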

Disco implementation
– all I/O devices are virtualized, including network interfaces and disks, and all access to them must pass through Disco to be translated or emulated

Disco implementation
– at only 13,000 lines of code, there is more opportunity to hand-tune the code
– the small image size, only 72 KB, also means a copy of Disco can reside in every node's local memory, so Disco's text never has to be fetched at slower remote (non-uniform) latency
– machine-wide data structures are partitioned so the parts currently being used by a processor can reside in that processor's local memory

Disco implementation
– scheduling VMs is similar to a traditional kernel scheduling processes: quantum-size considerations, saving state in data structures, processor affinity, etc.

Disco implementation: Virtual Physical Memory
– Disco maintains a physical-to-machine address mapping
– machine addresses are FLASH's 40-bit addresses

Disco implementation: Virtual Physical Memory
– when a guest OS tries to update the TLB, Disco steps in and applies the physical-to-machine translation; subsequent memory accesses can then go straight through the TLB
– each VM has an associated pmap in the monitor
– each pmap entry also has a back pointer to its virtual address, to help invalidate stale mappings in the TLB
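A sketch of this extra translation level, with hypothetical field names and sizes: the guest manages virtual-to-physical mappings as usual, while the monitor's pmap silently turns each "physical" page into a machine page before the entry reaches the hardware TLB.

```c
/* Sketch of the physical-to-machine translation level; layout and
 * names are hypothetical, not taken from the paper. */

#define VM_PHYS_PAGES (1UL << 18)  /* size of a VM's "physical" memory */

struct pmap_entry {
    unsigned long mpn;        /* machine page backing this phys page  */
    unsigned long back_vaddr; /* virtual address mapping it: the back
                                 pointer used to invalidate TLB entries
                                 when the machine page is taken away  */
};

struct vm {
    struct pmap_entry pmap[VM_PHYS_PAGES]; /* one entry per phys page */
};

/* Applied by the monitor whenever the guest inserts a TLB entry: the
 * guest supplied virtual->physical, the hardware TLB gets
 * virtual->machine, and later accesses proceed at full speed. */
unsigned long phys_to_machine(struct vm *vm, unsigned long ppn)
{
    return vm->pmap[ppn].mpn;
}
```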

Disco implementation: Virtual Physical Memory
– the MIPS TLB is tagged with an address space identifier (ASID); ASIDs are not virtualized, so the TLB must be flushed on VM context switches
– to soften the cost of these flushes, Disco caches recent translations in a second-level software TLB
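A direct-mapped sketch of such a software TLB, with an invented size and entry layout: on a hardware TLB miss the monitor probes this table first, and only a miss here is forwarded to the guest OS as a fault.

```c
/* Second-level software TLB sketch; size and layout hypothetical. */

#define L2_TLB_SIZE 4096

struct l2_tlb_entry {
    unsigned long vpn;   /* virtual page number (plus VM/ASID tag) */
    unsigned long mpn;   /* machine page number                    */
    int           valid;
};

static struct l2_tlb_entry l2_tlb[L2_TLB_SIZE];

/* A hit lets the monitor refill the hardware TLB directly, which is
 * what makes flushing on VM context switches affordable. */
int l2_tlb_lookup(unsigned long vpn, unsigned long *mpn)
{
    struct l2_tlb_entry *e = &l2_tlb[vpn % L2_TLB_SIZE];
    if (e->valid && e->vpn == vpn) {
        *mpn = e->mpn;
        return 1;   /* hit: refill hardware TLB */
    }
    return 0;       /* miss: forward fault to the guest OS */
}
```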

Disco implementation: Hiding NUMA
– cache misses are served faster from local memory than from remote memory
– pages heavily used by a single node are migrated to that node; read-shared pages are replicated to all nodes that frequently access them
– write-shared pages are not moved, since maintaining consistency requires remote access anyway
– the migration and replication policy is driven by cache-miss counting (see the sketch below)
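A sketch of what such a counter-driven policy could look like; the threshold and helper names are invented for illustration (FLASH's hardware provides per-page miss counting).

```c
#include <string.h>

#define MAX_NODES          16
#define HOT_MISS_THRESHOLD 64   /* hypothetical trigger value */

struct page_stats {
    unsigned miss_count[MAX_NODES]; /* per-node cache-miss counters */
    int      write_shared;          /* written by more than one node */
};

/* Stubs for the actual page-movement machinery. */
int  sole_frequent_user(struct page_stats *ps, int node);
void migrate_page(unsigned long mpn, int node);
void replicate_page(unsigned long mpn, int node);

/* Invoked when the miss counters identify a hot remote page. */
void maybe_move_page(struct page_stats *ps, unsigned long mpn, int node)
{
    if (++ps->miss_count[node] < HOT_MISS_THRESHOLD)
        return;
    if (ps->write_shared)
        return;  /* keeping replicas consistent would need remote
                    traffic anyway, so leave the page where it is */
    if (sole_frequent_user(ps, node))
        migrate_page(mpn, node);    /* one heavy user: move it local */
    else
        replicate_page(mpn, node);  /* read-shared: give node a copy */
    memset(ps->miss_count, 0, sizeof ps->miss_count);
}
```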

Disco implementation: Hiding NUMA
– memmap tracks, for each machine page, which virtual pages reference it; it is used during TLB shootdown when pages are migrated or replicated
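A sketch of a memmap entry and the shootdown walk it enables; the fields are hypothetical simplifications (the real structure must handle multiple VMs and replicated copies).

```c
/* Hypothetical simplified memmap entry, indexed by machine page. */
#define MAX_MAPPERS 4

struct vm;   /* VM owning the mappings below */

struct memmap_entry {
    struct vm    *vm;
    unsigned long vaddrs[MAX_MAPPERS]; /* virtual pages mapping this
                                          machine page */
    int           nvaddrs;
};

void invalidate_tlb_entry(struct vm *vm, unsigned long vaddr);

/* Before migrating or replicating a machine page, remove every TLB
 * (and software-TLB) entry that still names it. */
void tlb_shootdown(struct memmap_entry *me)
{
    for (int i = 0; i < me->nvaddrs; i++)
        invalidate_tlb_entry(me->vm, me->vaddrs[i]);
}
```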

Disco implementation: Virtualizing I/O
– all device accesses are intercepted by the monitor
– disk reads can be serviced by the monitor: if the request size is a multiple of the machine's page size, the monitor only has to remap machine pages into the VM's physical memory address space
– such pages are mapped read-only and generate a copy-on-write fault if written to
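A sketch of that remap fast path, with invented helper names: data already resident in machine memory is shared read-only rather than copied, and a guest write faults into the monitor, which then provides a private copy.

```c
#define PAGE_SIZE        4096UL
#define SECTORS_PER_PAGE (PAGE_SIZE / 512)
#define NO_PAGE          (~0UL)

struct vm;

/* Stubs for the monitor's buffer cache and mapping machinery. */
unsigned long buffer_cache_lookup(unsigned long sector);
unsigned long read_sectors_from_disk(unsigned long sector);
void map_readonly_cow(struct vm *vm, unsigned long ppn, unsigned long mpn);

/* Page-aligned, page-multiple disk read: map existing machine pages
 * into the VM's physical address space instead of copying the data. */
void virtual_disk_read(struct vm *vm, unsigned long start_sector,
                       unsigned long first_ppn, unsigned long npages)
{
    for (unsigned long i = 0; i < npages; i++) {
        unsigned long sector = start_sector + i * SECTORS_PER_PAGE;
        unsigned long mpn = buffer_cache_lookup(sector);
        if (mpn == NO_PAGE)
            mpn = read_sectors_from_disk(sector); /* really read it */
        map_readonly_cow(vm, first_ppn + i, mpn); /* share, COW-protected */
    }
}
```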

Virtualizing I/O

Virtual Networks

IRIX
– small changes were required to the IRIX kernel, due to a MIPS peculiarity: the kernel normally lives in the unmapped KSEG0 segment, which bypasses the TLB, so it had to be relinked to mapped addresses
– no new device drivers were needed
– the hardware abstraction layer (HAL) is where the trap, zeroed-page, unused-page, and VM-descheduling optimizations were implemented

SPLASHOS
– a thin specialized OS that runs directly on top of Disco
– used for parallel scientific applications

Experimental Results
– since FLASH was not yet available, experiments were run on SimOS, a machine simulator
– the simulator was too slow, compared to the actual machine, to allow long workloads to be studied

Workloads

Single VM
– ran the four workloads on plain IRIX inside the simulator and on a single VM running IRIX
– slowdown ranged from 3% to 16%

Memory Overhead
– ran pmake on 8 physical processors in six configurations: plain IRIX, 1 VM, 2 VMs, 4 VMs, 8 VMs, and 8 VMs communicating over NFS
– demonstrates that:
  – 8 VMs required less than twice the physical memory of plain IRIX
  – sharing machine pages across VMs through the physical-to-machine mapping is a useful optimization

Scalability tests
– compared the performance of pmake under the previously described configurations
– summary: while 1 VM showed a significant slowdown (36%), using 8 VMs showed a significant speedup (40%)
– also ran a radix sorting algorithm on plain IRIX and on SPLASHOS/Disco, which reduced run time by two thirds

Page Migration and Replication
– the engineering workload ran on 8 processors, raytrace on 16
– a UMA machine serves as the theoretical lower bound

Conclusion
– the nanokernel is several orders of magnitude smaller than a heavyweight OS, yet its virtual machine monitor can run virtually unmodified OSes
– the problems of execution overhead and memory footprint were addressed