Presentation transcript:

User-Level Interprocess Communication for Shared Memory Multiprocessors
Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy
Presented by Arthur Strutzenberg

Interprocess Communication
The LRPC paper/presentation discussed the need for:
- Failure isolation
- Extensibility
- Modularity
There is usually a balance to strike between these three needs and performance, and that balance is a central theme of this paper as well.

Interprocess Communication
Traditionally this is the responsibility of the kernel, which suffers from two problems:
- Architectural performance
- The interaction between kernel-based communication and user-level threads
Designers generally take a pessimistic (non-cooperative) approach. This raises the question: "How can you have your cake and eat it too?"

Interprocess Communication
What if the communication layer were extracted from the kernel and made part of the user level? This can increase performance by allowing:
- Messages to be sent between address spaces directly
- Elimination of unnecessary processor reallocation
- Amortization: processor reallocation (when needed) is spread over several independent calls
- Exploitation of parallelism in message passing

User-Level Remote Procedure Call (URPC)
- Allows communication between address spaces without kernel mediation
- Isolates processor reallocation, thread management, and data transfer from one another
- The kernel is ONLY responsible for allocating processors to address spaces

URPC & Communication
Application/OS communication is typically:
- A narrow channel (ports)
- A limited number of operations: Create, Send, Receive, Destroy
Most modern OSes have support for RPC.

URPC & Communication
What does this buy URPC?
- The definition of RPC generally says little about how the channels of communication must operate
- It also generally does not specify how processor scheduling (reallocation) will interact with data transfer

URPC & Communication
URPC exploits this freedom:
- Messages passed through logical channels are kept in memory that is shared between client and server
- This memory, once allocated, stays intact across calls
- Thread management is user level (lightweight instead of "kernel weight") (Haven't we read this in another paper?)
A sketch of such a shared channel is given below.
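A minimal sketch, in C, of what a bidirectional shared-memory channel might look like. This is an illustration under assumed names and sizes (msg_queue, urpc_channel, QUEUE_SLOTS, MSG_BYTES), not the paper's actual data structures.

```c
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_SLOTS 64        /* illustrative capacity, not from the paper */
#define MSG_BYTES   256       /* illustrative fixed message size */

/* One direction of a logical channel, placed in memory that is mapped into
 * both the client's and the server's address spaces. */
typedef struct {
    atomic_flag lock;         /* test-and-set lock guarding the queue */
    uint32_t    head, tail;   /* producer / consumer indices */
    uint8_t     data[QUEUE_SLOTS][MSG_BYTES];
} msg_queue;

/* A bidirectional channel: one queue per direction.  The memory is allocated
 * once when the client binds to the server and is then reused for the
 * lifetime of the binding, so no per-call allocation or kernel copy occurs. */
typedef struct {
    msg_queue client_to_server;   /* requests */
    msg_queue server_to_client;   /* replies  */
} urpc_channel;
```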

URPC & Thread Management
Switching a processor to another thread in the same address space (a context switch) involves less overhead than reallocating it to a thread in a different address space (a processor reallocation). URPC and its user-level scheduler exploit this by always giving preference to threads within the same address space.

URPC & Thread Management
Some numbers for comparison:
- Context switch within an address space: 15 microseconds
- Processor reallocation: 55 microseconds

URPC & Processor Allocation
What happens when a client invokes a procedure on a server process that has no processors allocated to it? URPC calls such a server "underpowered."
- The paper identifies this as a load-balancing problem
- The solution is reallocation from client to server: a client with an idle processor can elect to reallocate that processor to the server
- This is not without cost: reallocation is expensive and requires a call to the kernel
A sketch of the client-side decision follows.
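A rough sketch of the decision a client's user-level scheduler might make when a processor runs out of local work. The helper names (has_pending_calls, server_is_underpowered, wait_for_local_work) and the Processor_Donate signature are illustrative assumptions, not the paper's code.

```c
/* Hypothetical hook, called when a processor in the client's address space
 * finds no runnable threads. */
void on_idle_processor(struct binding *srv)
{
    if (has_pending_calls(srv) && server_is_underpowered(srv)) {
        /* Expensive path: the only step here that traps into the kernel. */
        Processor_Donate(srv->server_address_space, srv->entry_point);
    } else {
        /* Cheap path: stay in the client's address space and wait for
         * local work; no kernel involvement. */
        wait_for_local_work();
    }
}
```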

Rationale for URPC
The design of the URPC package presented in this paper has three main components:
- Thread management
- Data transfer
- Processor reallocation

Let's kill two birds with one stone
URPC uses an "optimistic reallocation policy," which makes the following assumptions:
- The client will always have other work to do
- The server will (soon) have a processor available to service messages
This leads to "amortization of cost": the cost of a processor reallocation is spread over several calls. A sketch of an optimistic send follows.
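A minimal sketch of what the common-path client stub might look like under this policy. All helper names (channel_try_enqueue, block_current_thread_until_reply, copy_reply_from) and the urpc_call signature are assumptions for illustration, not the paper's API.

```c
/* Hypothetical optimistic client stub: the send is just a write into the
 * shared-memory queue, and the kernel is not involved on this path at all. */
int urpc_call(struct binding *srv, const void *args, size_t len, void *reply)
{
    struct msg *m = channel_try_enqueue(&srv->chan.client_to_server, args, len);
    if (m == NULL)
        return -1;                       /* queue full: caller can retry later */

    /* Optimism: assume the server will soon have a processor that notices
     * the message, so do NOT reallocate a processor now.  The user-level
     * scheduler parks this thread and switches (~15 us) to another runnable
     * thread in the same address space. */
    block_current_thread_until_reply(m);

    /* Resumed by the scheduler once the reply is in the shared queue. */
    copy_reply_from(&srv->chan.server_to_client, m, reply);
    return 0;
}
```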

Why the optimistic approach doesn't always hold
This approach does not work as well when the application:
- Runs as a single thread
- Is real-time
- Performs high-latency I/O
- Makes priority invocations
URPC handles these cases by allowing the client's address space to force a processor reallocation to the server's address space even though the client might still have work to do.

The Kernel handles Processor Reallocation
URPC performs reallocation through a kernel call named "Processor.Donate." This passes control of an idle processor down to the kernel, and then back up to a specified address in the receiving address space.
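A sketch of what this interface could look like. The type names and signature are assumptions for illustration; the paper describes the down-then-up semantics, but this is not its literal declaration.

```c
typedef unsigned long address_space_t;   /* opaque handle; illustrative */
typedef unsigned long vaddr_t;           /* virtual address in the recipient */

/* Hypothetical declaration: donate the calling processor to another address
 * space, resuming execution at 'entry' inside that space. */
void Processor_Donate(address_space_t recipient, vaddr_t entry);

/* Conceptually the kernel's part is small:
 *   1. trap in from the donor's address space,
 *   2. rebind the processor to the recipient's protection domain,
 *   3. return "upward" into the recipient at 'entry', where the recipient's
 *      user-level scheduler starts draining its incoming message queues.
 * This round trip is the ~55 microsecond processor reallocation cost that
 * URPC tries to amortize over many calls. */
```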

Voluntary Return of Processors
URPC's policy for a server holding donated processors is: "…Upon receipt of a processor from a client address, return the processor when all outstanding messages from the client have generated replies, or when the server determines that the client has become 'underpowered'…."

Parallels to the User Threads Paper
Even though URPC implements this policy/protocol, there is no way to enforce it, which can lead to some interesting side effects. This is very similar to some of the problems discussed in the user-level threads paper. For example, a server thread could conceivably continue to hold a donated processor and use it to handle requests from other clients.

What this leads to…
One word: STARVATION. URPC itself only reallocates processors directly for load balancing; in other words, the system also needs a notion of preemptive reallocation. Preemptive reallocation must adhere to two invariants:
- No higher-priority thread waits while a lower-priority thread runs
- No processor idles when there is work for it to do (even if the work is in another address space)

Controlling Channel Access
Data flowing between address spaces in URPC uses a bidirectional shared-memory queue. The queues have a test-and-set lock on either end, which the paper specifically states must be NON-SPINNING. The protocol: if the lock is free, acquire it; otherwise go do something else. Remember, this protocol operates under the assumption that there is always other work to do! A sketch of such a non-spinning acquire follows.
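A minimal sketch of a non-spinning test-and-set acquire using C11 atomics, reusing the msg_queue sketched earlier; try_lock, poll_channel, and drain_messages are illustrative names, not the paper's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One attempt to take the queue's test-and-set lock.  Crucially, this never
 * spins: atomic_flag_test_and_set() returns the previous value, so a 'true'
 * result means someone else holds the lock and we give up immediately. */
static bool try_lock(atomic_flag *lock)
{
    return !atomic_flag_test_and_set(lock);
}

static void unlock(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}

/* Example use while polling a channel: drain messages only if the lock can
 * be taken without waiting; otherwise return to the user-level scheduler,
 * which should have other work to run. */
void poll_channel(msg_queue *q)
{
    if (!try_lock(&q->lock))
        return;                 /* lock busy: do something else, poll later */
    drain_messages(q);          /* illustrative helper */
    unlock(&q->lock);
}
```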

Data Transfer Using Shared Memory
There is still the risk of what the paper refers to as the "abusability factor" of RPC, where clients and servers can:
- Overload each other
- Deny service
- Provide bogus results
- Violate communication protocols
URPC passes the responsibility for handling this off to the stubs.

Cross-Address Space Procedure Call and Thread Management
This section of the paper identifies a correspondence between Send/Receive (messaging) and Start/Stop (thread management). Doesn't this remind everybody of a classic paper that we had to read?

Another link to the User Threads Paper
The paper also makes three arguments about the thread-message relationship:
- High-performance thread management facilities are needed for fine-grained parallel programs
- High performance can only be provided at the user level
- The close interaction between communication and thread management can be exploited
A sketch of that interaction on the server side follows.
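A rough sketch of how the messaging and thread-management operations might interlock in a server's dispatch loop; channel_try_dequeue, yield_or_return_processor, thread_start, and run_procedure are assumed names for illustration, not the paper's interface.

```c
/* Hypothetical server-side dispatcher, run by any processor currently
 * assigned (or donated) to the server's address space. */
void server_dispatch_loop(urpc_channel *ch)
{
    for (;;) {
        struct msg *req = channel_try_dequeue(&ch->client_to_server);
        if (req == NULL) {
            /* Nothing to receive: let the user-level scheduler run other
             * server threads, or give the processor back if its donor has
             * become underpowered. */
            yield_or_return_processor();
            continue;
        }
        /* Receiving a request effectively starts a thread in this address
         * space to run the requested procedure... */
        thread_start(run_procedure, req);
        /* ...and run_procedure() finishes by enqueueing its reply on
         * ch->server_to_client, stopping the thread: the send/receive and
         * start/stop operations mirror each other. */
    }
}
```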

URPC Performance
Some comparisons (values are in microseconds):

Test               URPC FastThreads   Taos Threads   Ratio of Taos cost to URPC cost
Procedure Call     7                  --             1.0
Fork               43                 1192           27.7
Fork;Join          102                1574           15.4
Yield              37                 57             1.5
Acquire, Release   27                 --             --
PingPong           53                 271            5.1

URPC Performance
A URPC call can be broken down into four components: Send, Poll, Receive, and Dispatch. Per-component costs (values are in microseconds):

Component   Client   Server
Poll        18       13
Send        6        6
Receive     10       9
Dispatch    20       25
Total       54       53

Call Latency and Throughput
Call latency is the time from when a thread calls into the stub until control returns from the stub. Latency and throughput are load dependent; they depend on:
- The number of client processors (C)
- The number of server processors (S)
- The number of runnable threads in the client's address space (T)
The graphs measure how long it takes to make 100,000 "Null" procedure calls into the server in a tight loop, as sketched below.
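A minimal sketch of what such a measurement loop might look like, reusing the hypothetical urpc_call stub from earlier; the timing code is an illustration, not the paper's benchmark harness.

```c
#include <stdio.h>
#include <time.h>

#define CALLS 100000

/* Each client thread makes Null calls in a tight loop.  With T > 1 threads,
 * a caller that blocks waiting for its reply simply yields its processor to
 * another caller, which is how URPC trades single-call latency for
 * throughput. */
void benchmark_null_calls(struct binding *srv)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < CALLS; i++)
        urpc_call(srv, NULL, 0, NULL);   /* the "Null" procedure */

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("latency: %.2f us/call, throughput: %.0f calls/s\n",
           secs * 1e6 / CALLS, CALLS / secs);
}
```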

Call Latency and Throughput

Conclusions
- In certain circumstances it makes sense to move the communication layer from the kernel to user space.
- Most OSes are designed for a uniprocessor and then ported to a shared-memory multiprocessor (SMMP). URPC is one example of a system designed for an SMMP from the start, and it takes direct advantage of the characteristics of such a machine.

Conclusions
As a lead-in to Professor Walpole's discussion and Q&A, let's conclude by trying to fill out the following table:

RPC Type      Similarities   Differences
Generic RPC
LRPC
URPC