Improving IPC by Kernel Design

Improving IPC by Kernel Design. Jochen Liedtke. Proceedings of the 14th ACM Symposium on Operating Systems Principles, Asheville, North Carolina, 1993. I will be discussing two papers: 1. Improving IPC by Kernel Design, and 2. The Performance of µ-Kernel-Based Systems.

The Performance of µ-Kernel-Based Systems. H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter. Proceedings of the 16th Symposium on Operating Systems Principles, October 1997, pp. 66-77.

Jochen Liedtke (1953 – 2001).
1977 – Diploma in Mathematics from the University of Bielefeld. His thesis was on the programming language ELAN.
1984 – Moved to GMD (German National Research Center for Computer Science). Built L3. Known for overcoming ipc performance hurdles.
1996 – IBM T.J. Watson Research Center. Developed L4, a 12 KB second-generation microkernel.
His first operating system was EUMEL. L3, a 1st-generation micro-kernel, was a “native code” version of EUMEL. Note that L4 fits completely in the 1st-level cache of processors of the day.

The IPC Dilemma. Inter-process communication (ipc) by message passing is one of the central paradigms of µ-kernel and client/server architectures. It helps to increase modularity, flexibility, security and scalability, and is key for distributed computing. But most ipc implementations of the time performed poorly (1st-generation micro-kernels such as Mach or Chorus). Really fast message passing was needed to run device drivers and other performance-critical components at user level, so programmers started to circumvent ipc, for example by moving device drivers and other components back into the kernel. To gain acceptance, ipc has to become a very efficient basic mechanism.

What to Do? The author sets out to construct a µ-kernel that achieves a tenfold improvement in ipc performance over comparable systems. “ipc performance is the master” is a key design principle. The result is L3, a micro-kernel-based operating system built at GMD (German National Research Center for Computer Science), and finally L4. Use a synergistic approach; no single “silver bullet” exists.

Summary of Techniques. Seventeen in total.

Measured Performance Gains. Note the synergistic effect. For 8-byte ipc: 49% + 23% + 21% + 18% + 13% + 10% = 134%. Here 49% means that removing that item alone would increase ipc time by 49%.
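The arithmetic on this slide can be checked with a short sketch. The six percentages are the ones quoted above; the variable names are placeholders, since the transcript does not say which technique each figure belongs to.

```python
# Percentages quoted on the slide for 8-byte ipc. Each value is the increase
# in ipc time if that one technique were removed and all others kept.
removal_penalties = [49, 23, 21, 18, 13, 10]

total = sum(removal_penalties)
print(total)        # 134
print(total > 100)  # True: the per-technique figures add up to more than 100%
```

The sum exceeding 100% is what the slide calls the synergistic effect: the techniques reinforce each other, so their individual figures do not partition the total ipc time.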

Standard System Calls (Send, Receive). The kernel is entered and exited four times, at 107 cycles each time. [Diagram: the client (sender) issues the L4_ipc_send() system call and the server (receiver) issues the L4_ipc_receive() system call; each call enters and exits the kernel. The client is not blocked.]

Add New System Calls. The kernel is entered and exited two times, half as often. [Diagram: the client (sender) issues the L4_ipc_call() system call, which enters the kernel, allocates the processor to the server and suspends the client; the client IS blocked. The server (receiver), waiting in the L4_ipc_receive() system call, resumes from being suspended, returns to user level (exits the kernel) and inspects the message. The server then issues L4_ipc_reply_and_wait(), which sends the reply, allocates the processor to the client and waits for the next message.] This technique reduces the number of system calls from 4 to 2. A similar blocking technique was seen before with LRPC.
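The saving can be illustrated with a minimal sketch that only counts system calls per round trip. The function names and the tuple representation are invented for illustration and are not the L4 API; the 107-cycle figure is the one quoted on the previous slide.

```python
CYCLES_PER_KERNEL_ENTRY_EXIT = 107  # figure quoted on the slide


def rpc_with_send_receive():
    """System calls for one round trip using separate send and receive."""
    return [
        ("client", "send"),     # request goes out
        ("server", "receive"),  # server picks up the request
        ("server", "send"),     # reply goes out
        ("client", "receive"),  # client picks up the reply
    ]


def rpc_with_call_reply_and_wait():
    """System calls for one round trip using the combined operations."""
    return [
        ("client", "call"),            # send request and wait for the reply
        ("server", "reply_and_wait"),  # send reply and wait for the next request
    ]


def kernel_overhead(syscalls):
    """Cycles spent only on entering and leaving the kernel."""
    return len(syscalls) * CYCLES_PER_KERNEL_ENTRY_EXIT


print(kernel_overhead(rpc_with_send_receive()))         # 428
print(kernel_overhead(rpc_with_call_reply_and_wait()))  # 214
```

The sketch models a server in steady state, where waiting for the next request is folded into the previous reply_and_wait.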

Complex Message Structure. Combine a sequence of send operations into a single operation by supporting complex messages. Benefit: reduces the number of sends.

Direct Transfer by Temporary Mapping. LRPC and RPC share user-level memory between client and server to transfer messages, but this may affect security. Other micro-kernels transfer messages by a twofold copy: from process A's space into kernel space, and from there into process B's space. L4 provides single-copy transfers by temporarily sharing the target region with the sender.
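The difference between the twofold copy and the single copy can be sketched in user-space Python. This is only an analogy: a `memoryview` over the receiver's buffer stands in for the temporary mapping, and `CopyCounter` and the function names are hypothetical helpers, not anything from L3 or L4.

```python
class CopyCounter:
    """Counts bytes moved, as a stand-in for copy cost."""

    def __init__(self):
        self.bytes_copied = 0

    def copy(self, dst, src):
        dst[:len(src)] = src
        self.bytes_copied += len(src)


def two_copy_transfer(sender_buf, receiver_buf, counter):
    """Sender space -> kernel buffer -> receiver space."""
    kernel_buf = bytearray(len(sender_buf))
    counter.copy(kernel_buf, sender_buf)
    counter.copy(receiver_buf, kernel_buf)


def single_copy_transfer(sender_buf, receiver_buf, counter):
    """Sender space -> receiver space through a temporary window."""
    # The memoryview aliases the receiver's buffer for the duration of the
    # transfer, loosely mimicking a temporary mapping of the target region.
    with memoryview(receiver_buf) as window:
        counter.copy(window, sender_buf)


message = bytes(range(256)) * 16  # 4096 bytes
rx_a, rx_b = bytearray(len(message)), bytearray(len(message))
twofold, single = CopyCounter(), CopyCounter()
two_copy_transfer(message, rx_a, twofold)
single_copy_transfer(message, rx_b, single)
print(twofold.bytes_copied, single.bytes_copied)  # 8192 4096
```

Both receivers end up with the same bytes; only the amount of data moved differs. Unlike permanently shared memory, the window exists only while the transfer runs.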

Scheduling, Conventional. Conventionally, the ipc operations call and reply & receive require four scheduling actions: (1) delete the sending thread from the ready queue; (2) insert the sending thread into the waiting queue; (3) delete the receiving thread from the waiting queue; (4) insert the receiving thread into the ready queue. These operations, together with 4 expected TLB misses, take at least 1.2 µs (23% T).

Solution, Lazy Scheduling. Conventional ipc requires updating the thread scheduler queues on every operation. Performance can be improved by delaying the movement of threads within and between queues until the queues are queried. This "lazy" scheduling is achieved by setting state flags (ready / waiting) in the thread control blocks (tcb, which holds basic information about a thread) and then, at query time, scanning the queues for threads that should be moved to different queues.
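The idea can be sketched as follows. This is a simplified illustration with a single ready queue, not the L3/L4 implementation; `TCB`, `LazyScheduler` and the `queue_updates` counter are hypothetical names introduced here.

```python
from collections import deque
from dataclasses import dataclass

READY, WAITING = "ready", "waiting"


@dataclass
class TCB:
    name: str
    state: str = READY
    queued: bool = False  # True while the tcb is linked into the ready queue


class LazyScheduler:
    def __init__(self):
        self.ready_queue = deque()
        self.queue_updates = 0  # counts insertions and removals

    def enqueue(self, tcb):
        if not tcb.queued:
            self.ready_queue.append(tcb)
            tcb.queued = True
            self.queue_updates += 1

    def ipc_call(self, sender, receiver):
        """Fast path: flip the state flags only, leave the queue untouched."""
        sender.state = WAITING
        receiver.state = READY
        return receiver  # control passes directly to the receiver

    def pick_next(self, current=None):
        """Slow path: the queue is queried, so repair it now."""
        if current is not None and current.state == READY:
            self.enqueue(current)
        while self.ready_queue:
            tcb = self.ready_queue.popleft()
            tcb.queued = False
            self.queue_updates += 1
            if tcb.state == READY:
                return tcb
            # Stale entry: the thread started waiting after it was queued.
        return None


sched = LazyScheduler()
client, server, other = TCB("client"), TCB("server", state=WAITING), TCB("other")
sched.enqueue(client)
sched.enqueue(other)
updates_before = sched.queue_updates

running = sched.ipc_call(client, server)
print(sched.queue_updates - updates_before)  # 0: the ipc touched no queue

next_thread = sched.pick_next(current=running)
print(next_thread.name)                      # other
print([t.name for t in sched.ready_queue])   # ['server']
```

The stale `client` entry is dropped only when `pick_next` scans the queue, so a long run of ipc round trips between two scheduler queries costs no queue operations at all.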

Pass Short Messages in Registers. Typically, a high proportion of messages are very short: 8 bytes (plus 8 bytes of sender id). Examples are ack/error replies from device drivers and hardware-initiated interrupt messages. The 486 processor had enough registers to allow direct transfer of short messages via cpu registers. Performance gain: 2.4 µs, or 48% T.

IPC Performance. For an eight-byte message, ipc time for L3 is 5.2 µs compared to 115 µs for Mach, a 22-fold improvement. For a large message (4 KB), a 3-fold improvement is seen.

Monolithic Kernel vs. Microkernel. Improved ipc performance is good, but what is its impact on real applications? This is what the user is really concerned with! Three systems are compared:
Linux – native Linux
L4Linux – Linux on top of the L4 microkernel
MkLinux – Linux on top of a Mach-like 1st-generation microkernel
L4 incorporates the improvements outlined in the first paper. These benchmarks demonstrate that running a real operating system on top of the micro-kernel results in a maximum throughput penalty of 5% - 10% compared to native Linux. The corresponding penalty for MkLinux, a Linux version running on top of a first-generation Mach-derived microkernel, is 5X to 7X.

L4 Performance

Conclusion. Use a synergistic approach to achieve greater ipc performance; a single “silver bullet” may not exist. A thorough understanding of the interaction between the hardware architecture and the operating system is key to many of the improvements. Microkernels are not portable between hardware architectures. L4 demonstrated the viability of running applications on top of a micro-kernel.

References
http://i30www.ira.de/aboutus/people/liedtke/inmemoriam.php
Microkernels; Ulfar Erlingsson, Athanasios Kyparlis
Monolithic Kernel vs. Microkernel; Benjamin Roch; TU Wien