Supporting Multi-Processors Bernard Wong February 17, 2003.

Presentation transcript:

Uni-processor systems: Computing began with uni-processor systems. A uni-processor OS is simple to implement because it can make many assumptions: uniform memory access (UMA), efficient locks with little impact on throughput, and straightforward cache coherency. The drawback is that a single processor is hard to make faster.

Small SMP systems: Multiple symmetric processors share one system. The OS requires some modifications, but memory access is still UMA. The system/memory bus becomes a contended resource, and locks have a larger impact on throughput; for example, a lock held by one process can block another process, running on a different processor, from making progress. Finer-grained locks must be introduced to improve scalability, and the system bus ultimately limits system size.
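
The move from one coarse lock to finer-grained locks can be sketched as follows. This is a minimal illustration, not code from the presentation: the class name `StripedCounterTable` and the bucket count are hypothetical, and because of Python's global interpreter lock the sketch shows only the locking structure, not a real speedup.

```python
import threading


class StripedCounterTable:
    """Counter table guarded by one lock per bucket (hypothetical example).

    With a single global lock, every update serializes on that lock.
    Here two updates contend only if their keys land in the same bucket.
    """

    def __init__(self, num_buckets=16):
        self._locks = [threading.Lock() for _ in range(num_buckets)]
        self._buckets = [dict() for _ in range(num_buckets)]

    def _index(self, key):
        # Pick the bucket (and therefore the lock) responsible for this key.
        return hash(key) % len(self._buckets)

    def increment(self, key):
        i = self._index(key)
        with self._locks[i]:  # only this bucket is locked
            self._buckets[i][key] = self._buckets[i].get(key, 0) + 1

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, 0)
```

The design trade-off is the usual one: more locks mean less contention but more memory and more care to avoid deadlock when an operation needs several buckets at once.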

Large shared memory multi-processors: These consist of many nodes, each of which may be a uni-processor or an SMP. Memory access is often NUMA, and some systems do not even provide cache coherency. Performance is very poor with an off-the-shelf SMP OS. Good performance requires locality of service to requests and independence between services.

DISCO: Uses a virtual machine monitor (VMM) to run multiple commodity OSes on a scalable multi-processor. The VMM is an additional layer between the OS and the hardware that virtualizes the processor, memory, and I/O. Ideally, the OS is unaware of the virtualization. The VMM exports a simple, general interface to the commodity OS.

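DISCO Architecture (figure): guest operating systems (OS, SMP-OS, OS, Thin OS) run on top of DISCO, which runs on the processing elements (PE) and interconnect of a ccNUMA multiprocessor.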

Implementation Details: Virtual CPUs. DISCO uses direct execution on the real CPU, which is fast because most instructions run at native speed. It must detect and emulate operations that cannot be safely exported to the VM, primarily privileged instructions such as TLB modification and direct access to physical memory or I/O. It must also keep a data structure that saves registers and other state while a virtual CPU is not scheduled on a real CPU. Virtual CPUs are scheduled with affinity scheduling to maintain cache locality.

Implementation Details: Virtual Physical Memory. DISCO adds a level of address translation and maintains physical-to-machine address mappings, because each VM uses physical addresses that start at 0 and continue for the size of the VM's memory. The translation is performed by emulating TLB instructions: when the OS tries to insert an entry into the TLB, DISCO intercepts it and inserts the translated version. The TLB is flushed on virtual CPU switches, and TLB misses are more expensive because of the required trap, so a second-level software TLB is added to improve performance.
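
The extra translation level can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not DISCO's code: the class `VirtualMachineMemory`, its method names, and the 4 KB page size are all hypothetical, and the "hardware TLB" is just a dictionary.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


class VirtualMachineMemory:
    """Toy model of physical-to-machine translation for one VM."""

    def __init__(self, machine_pages):
        # pmap: guest "physical" page number (starting at 0) -> machine page.
        self.pmap = dict(enumerate(machine_pages))
        self.hardware_tlb = {}  # virtual page -> machine page
        self.software_tlb = {}  # second-level cache of recent translations

    def guest_tlb_insert(self, virtual_page, physical_page):
        """The guest OS tries to insert virtual -> physical; the monitor
        intercepts and inserts virtual -> machine instead."""
        machine_page = self.pmap[physical_page]
        self.hardware_tlb[virtual_page] = machine_page
        self.software_tlb[virtual_page] = machine_page
        return machine_page

    def switch_virtual_cpu(self):
        # The hardware TLB is flushed when another virtual CPU is scheduled.
        self.hardware_tlb.clear()

    def translate(self, virtual_address):
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn not in self.hardware_tlb:
            if vpn not in self.software_tlb:
                # A real monitor would forward this miss to the guest OS.
                raise LookupError("TLB miss: no cached translation")
            # Refill from the software TLB without involving the guest OS.
            self.hardware_tlb[vpn] = self.software_tlb[vpn]
        return self.hardware_tlb[vpn] * PAGE_SIZE + offset
```

For example, a VM whose physical pages 0, 1, 2 are backed by machine pages 7, 3, 9 sees a guest mapping of virtual page 5 to physical page 1 installed as virtual page 5 to machine page 3.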

Implementation Details: Virtual I/O. DISCO intercepts all device accesses from a VM through special OS device drivers and virtualizes both disk and network I/O. It supports persistent and non-persistent disks. Persistent disks cannot be shared; non-persistent disks are implemented via copy-on-write.

Why use a VMM? DISCO is aware of the machine's NUMA-ness and hides it from the commodity OS. This requires less work than engineering a NUMA-aware OS and performs better than a NUMA-unaware OS, making it a good middle ground. How? Through dynamic page migration and page replication, which keep the memory pages that a virtual CPU's cache misses go to local to that CPU.

Memory Management: Pages heavily accessed by only one node are migrated to that node: DISCO changes the physical-to-machine address mapping, invalidates the TLB entries that point to the old location, and copies the page to the local machine memory. Pages that are heavily read-shared are replicated to the nodes most heavily accessing them: DISCO downgrades the TLB entries pointing to the page to read-only, copies the page, and updates the relevant TLB entries to the local machine version, removing the read-only downgrade.
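
The choice between migrating and replicating a page can be sketched as a small decision function. This is an illustrative policy only: the function name `placement_decision`, the per-node miss counters, and the threshold value are assumptions made for the example, not DISCO's actual heuristics.

```python
from collections import Counter

HOT_THRESHOLD = 64  # hypothetical miss count before a page is considered hot


def placement_decision(home_node, read_misses, write_misses):
    """Decide what to do with one page.

    read_misses and write_misses map node id -> cache-miss count.
    Returns (action, nodes), where action is "keep", "migrate", or
    "replicate" and nodes lists where copies of the page should live.
    """
    misses = Counter(read_misses) + Counter(write_misses)
    if sum(misses.values()) < HOT_THRESHOLD:
        return ("keep", [home_node])      # not hot enough to pay for a copy
    nodes = sorted(misses)
    if len(nodes) == 1:
        only = nodes[0]
        if only == home_node:
            return ("keep", [home_node])  # already local to its only user
        return ("migrate", [only])        # one heavy user: move the page there
    if sum(Counter(write_misses).values()) == 0:
        return ("replicate", nodes)       # read-shared: give each node a copy
    return ("keep", [home_node])          # write-shared: copies would not help
```

A page with 110 misses, all from node 2, yields `("migrate", [2])`, while a page read equally by nodes 1 and 2 and never written yields `("replicate", [1, 2])`.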

Page Replication

Aren’t VMs memory inefficient? Traditionally, VMs tend to replicate the memory used for each system image, and structures such as the disk cache are not shared. DISCO uses the notion of a global buffer cache to reduce the memory footprint.

Page sharing: DISCO keeps a data structure that maps disk sectors to memory addresses. If two VMs request the same disk sector, both are assigned the same read-only buffer page. Modifications to pages are performed via copy-on-write. This only works for non-persistent copy-on-write disks.
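
A sector-to-page map with copy-on-write can be sketched as below. This is a toy model, not DISCO's data structure: the class `GlobalBufferCache`, its methods, and the `load_from_disk` callback are hypothetical, and writes replace a whole page for brevity.

```python
class GlobalBufferCache:
    """Toy global buffer cache shared by several VMs."""

    def __init__(self):
        self.sector_to_page = {}  # (disk, sector) -> shared machine page id
        self.pages = {}           # machine page id -> contents
        self.mappings = {}        # (vm, disk, sector) -> (page id, writable)
        self._next_page = 0

    def _new_page(self, data):
        page = self._next_page
        self._next_page += 1
        self.pages[page] = data
        return page

    def read(self, vm, disk, sector, load_from_disk):
        """Map the sector into the VM, reusing a shared page if one exists."""
        mapping_key = (vm, disk, sector)
        if mapping_key in self.mappings:
            return self.mappings[mapping_key][0]
        key = (disk, sector)
        if key not in self.sector_to_page:
            # First request for this sector: perform the only real disk read.
            self.sector_to_page[key] = self._new_page(load_from_disk(disk, sector))
        page = self.sector_to_page[key]
        self.mappings[mapping_key] = (page, False)  # mapped read-only
        return page

    def write(self, vm, disk, sector, data):
        """Copy-on-write: the first write gives the VM a private page."""
        mapping_key = (vm, disk, sector)
        page, writable = self.mappings[mapping_key]
        if not writable:
            page = self._new_page(data)
            self.mappings[mapping_key] = (page, True)
        else:
            self.pages[page] = data
        return page
```

Two VMs that read the same sector end up with the same page id and trigger one disk load; after one of them writes, it holds a private page while the other still sees the original contents.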

Page sharing

Sharing is effective even via packets, when data is shared over NFS

Virtualization overhead

Data sharing

Workload scalability

Performance Benefits of Page Migration/Replication

Tornado: An OS designed to take advantage of shared memory multi-processors. It has an object-oriented structure in which every virtual and physical resource is represented by an independent object, which ensures natural locality and independence. A resource's lock and data structures are stored on the same node as the resource. Resources are managed independently and at a fine grain, so there is no global source of contention.

OO structure: Example: page fault. A separate File Cache Manager (FCM) object exists for each region of memory. COR stands for Cached Object Representative. All objects are specific to either the faulting process or the file(s) backing the process. Problem: it is hard to make global policies.

Clustered objects: Even with an OO structure, widely shared objects can be expensive due to contention. Replication, distribution, and partitioning are needed to reduce contention, and clustered objects are a systematic way to do this. A clustered object gives the illusion of a single object but is actually composed of multiple component (rep) objects, each of which handles a subset of processors. Consistency across reps must be handled.

Clustered objects

Clustered object implementation: Each processor has a translation table that contains a pointer to the local rep of each clustered object. Reps are created on demand via a combination of a global miss-handling object and a clustered-object-specific miss-handling object.
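
The per-processor translation table and on-demand rep creation can be sketched as follows. This is a minimal model under assumptions: the names `CounterRep`, `ClusteredCounter`, and `TranslationTables` are hypothetical, each rep serves exactly one processor, and a counter stands in for a real kernel object.

```python
class CounterRep:
    """One representative (rep) of the clustered object, local to one CPU."""

    def __init__(self):
        self.count = 0


class ClusteredCounter:
    """Looks like a single counter but is composed of per-processor reps."""

    def __init__(self):
        self.reps = {}  # processor id -> rep

    def handle_miss(self, cpu):
        # Object-specific miss handler: create the local rep on first use.
        rep = self.reps[cpu] = CounterRep()
        return rep

    def total(self):
        # Consistency across reps: the global value is the sum of all reps.
        return sum(rep.count for rep in self.reps.values())


class TranslationTables:
    """Per-processor tables mapping an object id to that CPU's local rep."""

    def __init__(self, num_cpus):
        self.tables = [dict() for _ in range(num_cpus)]

    def lookup(self, cpu, obj_id, obj):
        table = self.tables[cpu]
        if obj_id not in table:
            # Global miss handling: delegate to the object's own handler.
            table[obj_id] = obj.handle_miss(cpu)
        return table[obj_id]
```

Updates touch only the local rep, so processors do not contend on the common path; the cost moves to operations such as `total()` that must visit every rep.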

Memory Allocation: Tornado needs an efficient, highly concurrent allocator that maximizes locality, so it uses local pools of memory. Small block allocations still suffer from false sharing, so an additional small pool of strictly local memory is used.

Synchronization: The use of objects, and additionally of clustered objects, reduces the scope of each lock and limits lock contention to that of a rep. Existence guarantees are hard: a thread must determine whether an object is currently being de-allocated by another thread, which often requires a lock hierarchy whose root is a global lock. Tornado uses a semi-automatic garbage collector, so a thread never needs to test for existence and no locking is required.
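
One way to provide existence guarantees without locks is to defer reclamation until every operation that might still hold a reference has finished. The sketch below illustrates that general idea with generation counters; it is an assumption-laden, single-threaded model with hypothetical names (`DeferredReclaimer`, `enter`, `exit`, `retire`), not the garbage collector described in the slide.

```python
class DeferredReclaimer:
    """Frees a retired object only after all earlier operations finish."""

    def __init__(self):
        self.generation = 0
        self.active = {0: 0}  # generation -> number of in-flight operations
        self.retired = []     # (generation at retirement, cleanup callback)

    def enter(self):
        """Start an operation; returns the generation it belongs to."""
        g = self.generation
        self.active[g] += 1
        return g

    def exit(self, g):
        """Finish an operation that was started in generation g."""
        self.active[g] -= 1
        self._collect()

    def retire(self, cleanup):
        """Unlink an object; its cleanup runs once no operation started
        before this call is still in flight."""
        self.retired.append((self.generation, cleanup))
        self.generation += 1
        self.active[self.generation] = 0
        self._collect()

    def _collect(self):
        remaining = []
        for g, cleanup in self.retired:
            if any(self.active[h] for h in range(g + 1)):
                remaining.append((g, cleanup))  # an older operation is running
            else:
                cleanup()                       # safe: nobody can reach it
        self.retired = remaining
```

Operations never check whether an object still exists; they only bracket their work with `enter` and `exit`, and the reclaimer delays the actual free.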

Protected Procedure Calls: Since Tornado is a microkernel, IPC traffic is significant, so it needs a fast IPC mechanism that maintains locality. Protected Procedure Calls (PPC) maintain locality by spawning a new server thread on the same processor as the client to service the client's request, and by keeping all client-specific data in data structures stored on the client.

Protected Procedure Calls

Performance: Comparison to other large shared-memory multi-processors

Performance (n threads in 1 process)

Performance (n threads in n processes)

Conclusion: The talk illustrated two different approaches to making efficient use of shared memory multi-processors. DISCO adds an extra layer between the hardware and the OS: less engineering effort, more overhead. Tornado redesigns the OS to take advantage of locality and independence: more engineering effort and less overhead, but local and independent algorithms may work poorly with real-world loads.