Improving IPC by Kernel Design

Improving IPC by Kernel Design
Jochen Liedtke. Proceedings of the 14th ACM Symposium on Operating Systems Principles, Asheville, North Carolina, 1993. I will be discussing two papers: 1. Improving IPC by Kernel Design, and 2. The Performance of µ-Kernel-Based Systems.

The Performance of µ-Kernel-Based Systems
H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter. Proceedings of the 16th ACM Symposium on Operating Systems Principles, October 1997, pp

Jochen Liedtke (1953 – 2001)
1977 – Diploma in Mathematics from the University of Bielefeld. His thesis project was the programming language ELAN. His first operating system was EUMEL.
1984 – Moved to GMD (German National Research Center). Built L3. Known for overcoming ipc performance hurdles.
1996 – IBM T.J. Watson Research Center. Developed L4, a 12 KB second-generation microkernel.
L3 was a “native code” version of EUMEL, a first-generation microkernel. Note that L4 fits completely in the first-level cache of the processors of the day.

The IPC Dilemma
Inter-process communication (ipc) by message passing is one of the central paradigms of µ-kernel and client/server architectures. It helps to increase modularity, flexibility, security and scalability, and is key for distributed computing. But most ipc implementations of the time performed poorly (first-generation microkernels such as Mach or Chorus). Really fast message passing is needed to run device drivers and other performance-critical components at user level, so programmers started to circumvent ipc, for example by co-locating device drivers and other components back into the kernel. To gain acceptance, ipc has to become a very efficient basic mechanism.

What to Do?
The author sets out to construct a µ-kernel that achieves a tenfold improvement in ipc performance over comparable systems. “ipc performance is the master” is a key design principle. The result is L3, a microkernel-based operating system built by GMD (German National Research Center for Computer Science), and finally L4. Use a synergistic approach: no single “silver bullet” exists.

Summary of Techniques
Seventeen in total.

Measured Performance Gains
Note the synergistic effect. For 8-byte ipc: 49% + 23% + 21% + 18% + 13% + 10% = 134%. Here 49% means that removing that item would increase ipc time by 49%.

Standard System Calls (Send, Receive)
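Kernel entered and exited four times per request/reply exchange (client send, server receive, server send, client receive), 107 cycles each time. Diagram: the client (sender) issues L4_ipc_send(), a system call that enters and exits the kernel; the client is not blocked. The server (receiver) issues L4_ipc_receive(), also a system call.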

Add New System Calls
Kernel entered and exited two times, half as often. Diagram: the client (sender) issues L4_ipc_call(), a single system call that enters the kernel, sends the request, suspends the client and allocates the processor to the server; the client IS blocked. The server (receiver), waiting in L4_ipc_receive(), resumes from being suspended, returns to user mode (exits the kernel) and inspects the message. It then issues L4_ipc_reply_and_wait(), which sends the reply, allocates the processor to the client, exits the kernel and waits for the next message. We can reduce the system calls from 4 to 2 with this technique. A similar blocking technique was seen before with LRPC.

Complex Message Structure
Combine a sequence of send operations into a single operation by supporting complex messages. Benefit: reduces number of sends.

Direct Transfer by Temporary Mapping
LRPC and RPC share user-level memory between client and server to transfer messages, but this may affect security. Other microkernels transfer messages by a twofold copy: from process A's address space into kernel space, then into process B's address space. L4 provides single-copy transfers by temporarily sharing the target region with the sender.

Scheduling, Conventional
Conventionally, the ipc operations call and reply & receive require these scheduling actions:
1. Delete the sending thread from the ready queue.
2. Insert the sending thread into the waiting queue.
3. Delete the receiving thread from the waiting queue.
4. Insert the receiving thread into the ready queue.
These operations, together with 4 expected TLB misses, take at least 1.2 us (23%T).

Solution, Lazy Scheduling
Conventional IPC requires updating the thread scheduler queues. Performance can be improved by delaying the movement of threads within and between queues until the queues are queried. This "lazy" scheduling is achieved by setting state flags (ready / waiting) in the Thread Control Blocks (TCB, which contains basic information about a thread) and then scanning the queues at query time for threads that should be moved to different queues.

Pass Short Messages in Register
Typically, a high proportion of messages are very short: 8 bytes (plus 8 bytes of sender id). Examples are ack/error replies from device drivers and hardware-initiated interrupt messages. The 486 processor has enough registers to allow direct transfer of short messages via CPU registers. Performance gain: 2.4 us, or 48%T.

IPC Performance
For an eight-byte message, ipc time for L3 is 5.2 us compared to 115 us for Mach, a 22-fold improvement. For large messages (4 KB), a threefold improvement is seen.
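The 22-fold figure follows directly from the two measured times:

```latex
\frac{115\ \mu\mathrm{s}}{5.2\ \mu\mathrm{s}} \approx 22.1
```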

Monolithic Kernel vs. Microkernel
While improved ipc performance is good, what is its impact on real applications? This is what the user really cares about!
Linux – native Linux
L4Linux – Linux on top of the L4 microkernel
MkLinux – Linux on top of a Mach-like first-generation microkernel
L4 incorporates the improvements outlined in the first paper, and these benchmarks demonstrate that running a real operating system on top of the microkernel results in a maximum throughput penalty of 5% to 10% compared to native Linux. The corresponding penalty for MkLinux, a Linux version running on top of a first-generation Mach-derived microkernel, is 5X to 7X.

L4 Performance

Conclusion
Use a synergistic approach to achieve greater ipc performance; a single “silver bullet” may not exist. A thorough understanding of the interaction between the hardware architecture and the operating system is key to many of the improvements. Microkernels are not portable between hardware architectures. L4 demonstrated the viability of running applications on top of a microkernel.

