User-Level Interprocess Communication for Shared Memory Multiprocessors
By Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy
Presented by Damon Tyman (all figures/tables from the paper)

The progression
- LRPC (Lightweight Remote Procedure Call; Feb 1990): optimize RPC for performance in the "common case" of cross-domain (same-machine) communication.
- URPC (User-Level Remote Procedure Call; May 1991): eliminate kernel involvement in cross-address space communication and thread management by exploiting shared memory and user-level threads.

Introduction
Efficient interprocess communication encourages system decomposition across address space boundaries. Advantages of decomposed systems:
- Failure isolation
- Extensibility
- Modularity
Unfortunately, when cross-address space communication is slow, these advantages are usually traded away for better system performance.

Kernel-level interprocess communication
Traditionally, interprocess communication has been the responsibility of the kernel. Problems with kernel-based communication:
- Architectural performance barriers: performance is limited by the cost of invoking the kernel and reallocating the processor to a different address space. In the authors' earlier work on LRPC, 70% of the overhead can be attributed to kernel mediation.
- Interaction with high-performance user-level threads: for satisfactory performance, medium- and fine-grained parallel applications need user-level thread management. Partitioning the strongly interdependent functions of communication and thread management across protection boundaries carries high costs in both performance and system complexity.

User-level cross-address space communication
On a shared memory multiprocessor, the kernel can be eliminated from cross-address space communication. With user-level managed cross-address space communication:
- Shared memory can be utilized as the data transfer channel, so messages are sent between address spaces directly, with no kernel intervention (sketched below).
- The frequency of processor reallocation can be reduced by preferring threads from the address space in which the processor is already active.
- The inherent parallelism of sending and receiving messages can be exploited for improved call performance.
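As an illustration of the data-transfer idea, here is a minimal sketch of a message queue placed in a segment mapped by both address spaces. The layout and all names are assumptions for illustration, not the URPC implementation:

    /* Minimal sketch of a shared-memory message channel between two
     * address spaces; hypothetical layout, illustrative names.
     * The channel's lock must be initialized with ATOMIC_FLAG_INIT. */
    #include <stdatomic.h>

    #define MSG_BYTES   64
    #define QUEUE_SLOTS 16

    struct msg { char data[MSG_BYTES]; };

    struct channel {              /* lives in a segment mapped by both spaces */
        atomic_flag lock;         /* test-and-set lock guarding the queue     */
        int head, tail;           /* ring-buffer indices                      */
        struct msg slots[QUEUE_SLOTS];
    };

    /* Enqueue a call message with no kernel involvement. Returns 0 if the
     * queue is full; a real sender would run other threads and retry
     * rather than wait. */
    int channel_send(struct channel *ch, const struct msg *m)
    {
        while (atomic_flag_test_and_set(&ch->lock))
            ;                     /* see the non-spinning variant on the
                                     "Data transfer" slide below */
        int next = (ch->tail + 1) % QUEUE_SLOTS;
        int ok = (next != ch->head);
        if (ok) {
            ch->slots[ch->tail] = *m;
            ch->tail = next;
        }
        atomic_flag_clear(&ch->lock);
        return ok;
    }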

User-level Remote Procedure Call (URPC)
URPC provides safe and efficient communication between address spaces on the same machine without kernel mediation. It isolates the three components of interprocess communication: processor reallocation, thread management, and data transfer.
- Control transfer between address spaces is handled by thread management and processor reallocation.
- Compared to traditional "small kernel" systems in which the kernel handles all three components, in URPC only processor reallocation involves the kernel.
- In the URPC implementation, cross-address space procedure call latency is 93 microseconds, compared to 157 microseconds for LRPC (which uses kernel-managed communication).

Messages vs. Procedure Calls
Commonly, in existing systems, communication between applications is achieved by sending messages over a narrow channel that supports just a few operations. While message passing is a powerful construct, it may require thinking in a paradigm not supported by the programming language in use. RPC better matches the paradigm of the programming language itself: message passing may still serve as the underlying transport, but the programming approach stays familiar.
In many programming languages, communication within an address space features synchronous procedure calls, data typing, and shared memory. RPC reconciles these with the untyped, asynchronous messages of cross-address space communication through an abstraction that hides those characteristics from the user.
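For instance, a client stub might look like the following sketch: the application writes an ordinary typed procedure call, and the stub turns it into a message exchange underneath. The operation names (urpc_send, urpc_receive, file_server) and their signatures are assumptions for illustration:

    #include <string.h>

    /* Assumed URPC-package entry points (illustrative signatures): */
    struct urpc_channel;
    extern struct urpc_channel *file_server;
    void urpc_send(struct urpc_channel *, const void *, size_t);
    void urpc_receive(struct urpc_channel *, void *, size_t);

    enum { OP_READ = 1 };
    struct read_args  { int op, fd, n; };
    struct read_reply { int nread; char data[64]; };

    /* The application calls file_read() like any local procedure; the
     * stub hides the untyped, asynchronous message exchange. */
    int file_read(int fd, char *buf, int n)
    {
        struct read_args  args = { OP_READ, fd, n };
        struct read_reply reply;

        urpc_send(file_server, &args, sizeof args);      /* enqueue request  */
        urpc_receive(file_server, &reply, sizeof reply); /* block this thread
                                                            until the reply  */
        memcpy(buf, reply.data, reply.nread);
        return reply.nread;
    }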

Synchronization
To the programmer, cross-address space procedure calls appear synchronous: the calling thread blocks waiting for the return. At the thread management level and below, however, the call is asynchronous. While one thread is blocked, another can run; URPC favors threads from the address space in which the processor is already running. As soon as the reply arrives, the blocked thread can be scheduled on any processor assigned to its address space. All the scheduling operations involved can be handled by a user-level thread management system, avoiding any processor reallocation to a new address space (provided the callee address space has at least one processor assigned to it).
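A sketch of how that blocking path might look at the thread management level; all primitive names (thread_self, ready_dequeue, context_switch, and so on) are assumed FastThreads-style operations, not the actual interface:

    #include <stddef.h>

    struct urpc_channel;
    typedef struct thread thread_t;
    thread_t *thread_self(void);
    thread_t *ready_dequeue(void);             /* next runnable local thread */
    void mark_blocked(thread_t *, struct urpc_channel *);
    void context_switch(thread_t *from, thread_t *to);
    int  channel_poll(struct urpc_channel *, void *, size_t);

    /* The call looks synchronous to the caller, but underneath the thread
     * package simply runs another thread from the same address space. */
    void urpc_receive(struct urpc_channel *ch, void *reply, size_t len)
    {
        while (!channel_poll(ch, reply, len)) {  /* reply not there yet    */
            thread_t *me = thread_self();
            mark_blocked(me, ch);                /* rewoken when reply lands */
            thread_t *next = ready_dequeue();    /* prefer local threads    */
            if (next)
                context_switch(me, next);        /* cheap user-level switch */
            /* with no local work at all, the processor may eventually be
               donated to the server's address space (see the reallocation
               policy slide) */
        }
    }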

User view/system components
Stub and application code use two software packages: FastThreads (user-level threads scheduled on top of middleweight kernel threads) and the URPC package.

Processor reallocation overhead
Switching a processor to a thread in the same address space (context switching) has significantly less overhead than reallocating it to a thread in a different address space (processor reallocation). Processor reallocation is expensive because it:
- Requires changing the protected mapping registers that define the virtual address space context.
- Incurs scheduling costs to choose the address space for reallocation.
- Carries substantial long-term costs from the poor cache and TLB performance caused by constant locality switches (worse for processor reallocation than for context switching).
A minimal-latency same-address space context switch takes about 15 microseconds on the C-VAX, while a cross-address space processor reallocation takes 55 microseconds, nearly four times as long even before the long-term costs are counted.

Processor Reallocation Policy
An optimistic reallocation policy is employed: threads from the same address space are scheduled whenever possible. This assumes that the client has other work to do, and that the server will soon have a processor with which to service the request. The optimistic assumptions may not always hold (in which case a processor is reallocated to the server):
- Single-threaded applications (no other work to do)
- High-latency I/O operations
- Real-time applications (which need bounded call latency)
- The server may not have a processor of its own.
The voluntary return of processors cannot be guaranteed: there is no way to enforce policies regarding their return, so a client that gives a processor up to a server may never get it back. A sketch of the policy follows.
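The following sketch shows the optimistic policy from a processor's point of view. It is a simplification under assumed names (find_underpowered_space, processor_donate), not the paper's literal scheduler:

    /* A processor keeps running threads from its current address space,
     * and is donated to an underpowered address space only when there is
     * no local work left. Illustrative names throughout. */
    typedef struct thread thread_t;
    typedef struct addr_space addr_space_t;
    thread_t     *ready_dequeue(void);
    void          run(thread_t *);
    addr_space_t *find_underpowered_space(void);  /* space with pending calls
                                                     but no processor */
    void          processor_donate(addr_space_t *); /* the one kernel call */

    void idle_loop(void)
    {
        for (;;) {
            thread_t *t = ready_dequeue();   /* same-address-space work first */
            if (t) { run(t); continue; }
            addr_space_t *needy = find_underpowered_space();
            if (needy)
                processor_donate(needy);     /* reallocation: ~55 us on C-VAX */
        }
    }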

Sample Execution
One client (an editor) and two servers (a window manager and a file manager) each have their own address space. T1 and T2 are threads in the editor, and two processors are available.

Data transfer using shared memory
Data flows between the URPC packages in different address spaces over a bidirectional shared-memory queue with non-spinning test-and-set locks on either end; the receiver should never have to delay (spin-wait) while the sender holds a lock (see the lock sketch below).
- The "abusability factor" is not increased by shared-memory message channels: denial of service, bogus results, and communication protocol violations are still possible, and it is up to higher-level protocols to filter abuses before they reach the application layer.
- Communication safety is the responsibility of the stubs: arguments are passed in shared-memory buffers established during the binding phase, so no kernel copying is necessary.
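A minimal sketch of the non-spinning acquire, assuming a FastThreads-style thread_yield(): on contention the processor switches to another runnable thread instead of busy-waiting:

    #include <stdatomic.h>

    void thread_yield(void);             /* assumed user-level primitive */

    /* "Non-spinning" test-and-set lock: failure to acquire means run
     * another thread and retry later, rather than burn the processor. */
    void queue_lock(atomic_flag *lock)
    {
        while (atomic_flag_test_and_set(lock))
            thread_yield();              /* run someone else, retry later */
    }

    void queue_unlock(atomic_flag *lock)
    {
        atomic_flag_clear(lock);
    }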

Cross-address space call and thread management
There is a strong correlation between communication functions and thread management synchronization functions: RPCs are synchronous with respect to the calling thread, and each communication function has a corresponding thread management function (send <-> start, receive <-> stop).
Thread classification:
- Heavyweight: the kernel makes no distinction between thread and address space.
- Middleweight: threads are kernel-managed but decoupled from address spaces (there can be multiple threads per address space).
- Lightweight: threads are managed by user-level libraries.
Fine-grained parallel applications demand high-performance thread management, which is only possible with user-level threads. URPC uses two-level scheduling: lightweight user-level threads are scheduled on top of weightier kernel-managed threads, as sketched below.
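A sketch of the two-level structure, with assumed names: each kernel-managed middleweight thread runs a user-level dispatch loop that multiplexes the lightweight threads, so thread management stays out of the kernel:

    typedef struct thread thread_t;
    thread_t *ready_dequeue(void);       /* next lightweight thread */
    thread_t *current(void);
    void context_switch(thread_t *from, thread_t *to);
    void idle_loop(void);                /* see the reallocation policy */

    /* One dispatch loop per kernel-managed (middleweight) thread. */
    void kernel_thread_main(void)
    {
        for (;;) {
            thread_t *t = ready_dequeue();    /* lightweight user thread */
            if (t)
                context_switch(current(), t); /* no kernel involvement   */
            else
                idle_loop();
        }
    }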

URPC Performance
Processing times are close for client and server; nearly half of the time goes to thread management (dispatch).

URPC Performance
C = client processors, S = server processors, T = runnable threads in the client address space. Latency is traded for throughput: going from T=1 to T=2 increases latency by about 20% but improves throughput by about 75%.

And beyond…
- LRPC: improve RPC performance for cross-domain communication.
- URPC: move interprocess communication to user level as much as possible; threads are managed at user level but scheduled on top of middleweight kernel threads.
- Scheduler Activations (Feb 1992): provide a better abstraction than kernel threads for kernel support of user-level threads.