Improving IPC by Kernel Design

Jochen Liedtke. Improving IPC by Kernel Design. Proceedings of the 14th ACM Symposium on Operating Systems Principles, Asheville, North Carolina, 1993.

Improving IPC by Kernel Design
Jochen Liedtke, German National Research Center for Computer Science (GMD)
Presented by Emalayan

Agenda
- Introduction to Microkernels
- Design Objectives
- L3 & Mach Architecture
- Design
  - Architectural Level
  - Algorithmic Level
  - Interface Level
  - Coding Level
- Results
- Conclusion

Microkernels
- Popular microkernels: L3 and Mach
- The rationale behind the microkernel design is to separate mechanism from policy, allowing many policy decisions to be made at user level
- Microkernel architectures rely heavily on IPC, particularly in modular systems
- Mach pioneered an approach to highly modular and configurable systems, but had poor IPC performance
- Poor IPC performance leads people to avoid microkernels entirely, or to architect their designs to reduce IPC
- The paper examines a performance-oriented design process and specific optimizations that achieve good performance

Design Objectives
- IPC performance is the primary objective
- All design decisions are discussed before implementation
- Where something performs poorly, look for a replacement technique
- Detailed design from top to bottom (from architecture to coding level)
- Consider synergetic effects
- Design on a concrete basis
- Design toward a concrete performance goal

L3 & Mach Architecture
Similarities
- Tasks and threads
- Virtual memory subsystem with external pager interface
- IPC via messages
Differences
- L3: synchronous IPC, addressed to threads
- Mach: asynchronous IPC, addressed to ports
L3 is used as the workbench

Performance Objective
- Find the best possible case: a NULL message transfer using IPC
- Goal: 350 cycles (7 µs) per short message transfer
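The pair of figures on this slide (350 cycles in 7 µs) implies a clock rate, which makes it easy to convert the other costs quoted in the talk between cycles and microseconds. A minimal sketch, assuming the clock rate is derived only from that pair; the helper names are illustrative and do not come from the paper:

```python
# Clock rate implied by the slide: 350 cycles take 7 microseconds.
GOAL_CYCLES = 350
GOAL_US = 7.0
CYCLES_PER_US = GOAL_CYCLES / GOAL_US  # cycles per microsecond (numerically, MHz)


def cycles_to_us(cycles: float) -> float:
    """Convert a cycle count to microseconds at the implied clock rate."""
    return cycles / CYCLES_PER_US


def us_to_cycles(us: float) -> float:
    """Convert microseconds to cycles at the implied clock rate."""
    return us * CYCLES_PER_US


if __name__ == "__main__":
    print(CYCLES_PER_US)       # 50.0
    print(us_to_cycles(2.0))   # 100.0, the cycle cost of a 2 µs operation
```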

Approach Needed
A synergetic approach to design and implementation, guided by IPC requirements, at four levels:
- Architectural Level
- Algorithmic Level
- Interface Level
- Coding Level

Architectural Level Design
a) System Calls
- System calls are expensive (about 2 µs each)
- Implement IPC with a minimum number of system calls
- New combined system calls: call, and reply & receive next
- A request/reply round trip then needs two system calls instead of four (send, receive, send, receive), saving about 4 µs
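The saving from combined system calls can be illustrated by counting kernel entries for one request/reply round trip. This is a toy model, not L3 code: the `Kernel` class and both functions are hypothetical, and the only figure taken from the slides is the 2 µs system call cost.

```python
from dataclasses import dataclass

SYSCALL_COST_US = 2.0  # per-system-call cost quoted on the slide


@dataclass
class Kernel:
    """Toy kernel that only counts how often user code enters it."""
    entries: int = 0

    def enter(self) -> None:
        self.entries += 1


def round_trip_separate(k: Kernel) -> None:
    """Round trip built from separate send and receive primitives."""
    k.enter()  # client: send request
    k.enter()  # server: receive request
    k.enter()  # server: send reply
    k.enter()  # client: receive reply


def round_trip_combined(k: Kernel) -> None:
    """Round trip built from the combined primitives."""
    k.enter()  # client: call (send request, then wait for the reply)
    k.enter()  # server: reply & receive next (send reply, wait for next request)


def saved_microseconds() -> float:
    """Time saved per round trip by the combined primitives in this model."""
    separate, combined = Kernel(), Kernel()
    round_trip_separate(separate)
    round_trip_combined(combined)
    return (separate.entries - combined.entries) * SYSCALL_COST_US


if __name__ == "__main__":
    print(saved_microseconds())  # 4.0
```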

Architectural Level Design …
b) Messages
- A message is a collection of objects passed to the kernel
- The kernel is responsible for delivering the message
- A message can contain direct strings, indirect strings, flex pages and data spaces
- The buffer dope and send dope are mandatory parts of every message
- Sending one complex message instead of several simple ones reduces the number of system calls and IPC operations, and therefore the number of address space crossings

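Architectural Level Design …
c) Direct Transfer by Temporary Mapping
- A two-copy message transfer (sender to kernel, kernel to receiver) costs 20 + 0.75n cycles for n bytes
- L3 instead copies the data once, through a special communication window in kernel space
- The communication window is mapped to the receiver's buffer for the duration of the call (via a page directory entry)
[Figure: address spaces A and B and the kernel. The kernel adds a mapping of the target region in space B, with kernel-only permission, and copies the message from A through it.]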
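The difference between the two-copy path and the temporary-mapping path can be sketched in user-level Python by treating each address space as a `bytearray` and the communication window as a `memoryview` onto the receiver's buffer. This is only an analogy under those assumptions; the class and method names are hypothetical, and a real kernel would manipulate page directory entries rather than Python objects.

```python
class AddressSpace:
    """Toy address space: a flat block of bytes."""

    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)


class Kernel:
    """Toy kernel that counts how many times message bytes are copied."""

    def __init__(self) -> None:
        self.copies = 0

    def two_copy_transfer(self, src, src_off, dst, dst_off, n) -> None:
        # Copy 1: sender's buffer into a kernel buffer.
        kernel_buffer = bytes(src.memory[src_off:src_off + n])
        self.copies += 1
        # Copy 2: kernel buffer into the receiver's buffer.
        dst.memory[dst_off:dst_off + n] = kernel_buffer
        self.copies += 1

    def mapped_transfer(self, src, src_off, dst, dst_off, n) -> None:
        # "Map" the receiver's target region into the communication window.
        window = memoryview(dst.memory)[dst_off:dst_off + n]
        source = memoryview(src.memory)[src_off:src_off + n]
        try:
            # Single copy: sender's buffer straight through the window.
            window[:] = source
            self.copies += 1
        finally:
            # Drop the temporary mapping once the transfer is done.
            window.release()
            source.release()


if __name__ == "__main__":
    a, b = AddressSpace(64), AddressSpace(64)
    a.memory[0:5] = b"hello"
    kernel = Kernel()
    kernel.mapped_transfer(a, 0, b, 16, 5)
    print(bytes(b.memory[16:21]), kernel.copies)  # b'hello' 1
```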

Architectural Level Design …
d) Thread Control Blocks (TCBs)
- Hold kernel-level and hardware-level thread-specific data
- Every operation on a thread requires a lookup, and possibly a modification, of that thread's TCB
- TCBs are kept in an array so that a TCB is located by an array offset, which gives faster access
- This saves 3 TLB misses per IPC
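Locating a TCB by array offset means its address is computed directly from the thread identifier, with no table lookup. A minimal sketch of that arithmetic follows; the base address, TCB size and identifier layout are all hypothetical values chosen for illustration and are not taken from the paper.

```python
# All constants below are hypothetical, chosen only to illustrate the arithmetic.
TCB_AREA_BASE = 0xE000_0000   # kernel virtual address where the TCB array starts
TCB_SIZE = 0x400              # bytes per TCB (a power of two)
THREAD_NO_BITS = 16           # low bits of a thread id that hold the thread number
THREAD_NO_MASK = (1 << THREAD_NO_BITS) - 1


def tcb_address(thread_id: int) -> int:
    """Compute the address of a thread's TCB from its identifier.

    The thread number is masked out of the identifier and used as an
    index into the TCB array, so no memory is touched to find the TCB.
    """
    thread_no = thread_id & THREAD_NO_MASK
    return TCB_AREA_BASE + thread_no * TCB_SIZE


if __name__ == "__main__":
    print(hex(tcb_address(3)))  # 0xe0000c00
```

Because the address is pure arithmetic on the identifier, the lookup itself cannot miss in the TLB; only the access to the TCB page can.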

Algorithmic Level
a) Thread identifiers
b) Handling virtual queues
c) Timeouts and wakeups
d) Direct process switch
- The scheduler is not invoked on the context switch between sender and receiver
- No sender may dominate the receiver
- Polling threads are queued; the kernel does not buffer messages
e) Short messages via registers
- Short messages are transferred directly in registers
- Gain of 2.4 µs (48%)
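The register path for short messages can be pictured as a size check in the send routine: a small payload travels in registers, and anything larger falls back to the memory path. The sketch below assumes two 32-bit registers (8 bytes), a limit suggested by the audience questions later in the deck; the function names are hypothetical.

```python
import struct
from typing import Tuple

# Assumption: two 32-bit registers are available for message payload.
MAX_REGISTER_PAYLOAD = 8


def pack_into_registers(payload: bytes) -> Tuple[int, int]:
    """Place a short payload into two 32-bit register values."""
    if len(payload) > MAX_REGISTER_PAYLOAD:
        raise ValueError("payload too large for the register path")
    padded = payload.ljust(MAX_REGISTER_PAYLOAD, b"\0")
    return struct.unpack("<II", padded)


def unpack_from_registers(registers: Tuple[int, int], length: int) -> bytes:
    """Recover the payload on the receiver's side."""
    return struct.pack("<II", *registers)[:length]


def send(payload: bytes):
    """Choose the transfer path by payload size."""
    if len(payload) <= MAX_REGISTER_PAYLOAD:
        return ("registers", pack_into_registers(payload))
    return ("memory", bytes(payload))  # stands in for the copy through memory


if __name__ == "__main__":
    path, regs = send(b"ping")
    print(path, unpack_from_registers(regs, 4))  # registers b'ping'
```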

Algorithmic Level …
f) Lazy Scheduling
- The scheduler maintains several queues to keep track of relevant thread-state information
  - The ready queue stores threads that are able to run
  - Wakeup queues store threads that are blocked waiting for an IPC operation to complete or time out (organized by region)
  - The polling-me queue stores threads waiting to send to some thread
- Efficient representation of data structures
  - Queues are stored as doubly-linked lists distributed across TCBs
  - Scheduling never causes page faults
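Storing queues as doubly-linked lists distributed across TCBs means the link fields live inside each TCB, so queue operations need no separate node allocation and removal takes constant time. A minimal sketch, assuming a single pair of link fields per TCB (a real kernel would keep one pair per queue a thread can be on); the class names are hypothetical.

```python
class TCB:
    """Toy thread control block that carries its own queue links."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.prev = self  # a TCB that is not queued points at itself
        self.next = self


class Queue:
    """Circular doubly-linked list threaded through the TCBs themselves."""

    def __init__(self) -> None:
        self.head = TCB("<sentinel>")

    def enqueue(self, tcb: TCB) -> None:
        """Append a TCB at the tail in constant time."""
        tail = self.head.prev
        tail.next = tcb
        tcb.prev = tail
        tcb.next = self.head
        self.head.prev = tcb

    @staticmethod
    def unlink(tcb: TCB) -> None:
        """Remove a TCB in constant time, without searching the queue."""
        tcb.prev.next = tcb.next
        tcb.next.prev = tcb.prev
        tcb.prev = tcb.next = tcb

    def names(self) -> list:
        """Walk the queue from head to tail and collect thread names."""
        result, node = [], self.head.next
        while node is not self.head:
            result.append(node.name)
            node = node.next
        return result


if __name__ == "__main__":
    ready = Queue()
    a, b, c = TCB("a"), TCB("b"), TCB("c")
    for tcb in (a, b, c):
        ready.enqueue(tcb)
    Queue.unlink(b)
    print(ready.names())  # ['a', 'c']
```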

Interface Level Design
a) Avoiding unnecessary copies
b) Parameter passing
- Registers can be used
- Input and output parameters in registers give compilers a better chance to optimize

Coding Level Design
a) Cache-aware design
- Reducing cache misses
- Minimizing TLB misses
b) Architecture-aware design
- Segment registers
- General registers
- Avoiding jumps and checks

Summary of Techniques
- The table completely ignores the synergetic effects
- Direct message transfer dominates for large messages
- For short messages, register transfer works well

Performance Comparison
- Measured using a ping-pong micro-benchmark that makes use of the unified send/receive calls
- For an n-byte message, the cost is 7 + 0.02n µs in L3

Performance Comparison
- Same benchmark with larger messages
- For n-byte messages larger than 2k, cache misses increase and the IPC time is 10 + 0.04n µs
  - Slightly higher base cost
  - Higher per-byte cost
- By comparison, Mach takes 120 + 0.08n µs
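The linear cost formulas from the two performance slides can be put side by side as a small model, with all times in microseconds. This sketch assumes that "2k" means 2048 bytes and that the first L3 formula applies up to that size; the function names are illustrative.

```python
LARGE_MESSAGE_THRESHOLD = 2048  # assumption: "2k" on the slide means 2048 bytes


def l3_cost_us(n: int) -> float:
    """L3 IPC time in microseconds for an n-byte message, per the slides."""
    if n <= LARGE_MESSAGE_THRESHOLD:
        return 7 + 0.02 * n
    return 10 + 0.04 * n  # more cache misses for larger messages


def mach_cost_us(n: int) -> float:
    """Mach IPC time in microseconds for an n-byte message, per the slides."""
    return 120 + 0.08 * n


if __name__ == "__main__":
    for size in (0, 512, 2048, 4096):
        l3, mach = l3_cost_us(size), mach_cost_us(size)
        print(f"{size:5d} bytes: L3 {l3:7.2f} us, Mach {mach:7.2f} us")
```

In this model the fixed base cost dominates for short messages, which is where most of the gap between the two systems comes from.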

Conclusion
- Efficient and effective IPC is mandatory for microkernel design; poor IPC performance was a major limitation of Mach
- L3 demonstrates that good performance (22 times faster) can be achieved by means of the above techniques
- The techniques demonstrated in the paper can be employed in any system, even if the specific optimizations cannot

THANK YOU

Questions
- Monolithic kernels perform better than L3 or even L4 (written in assembly!). Why should I even bother with microkernels and their ever-increasing IPC performance?
- The whole idea of IPC message passing seems to be plagued by performance problems.
- It would be interesting to know whether some of the assembly-level hacks discussed in this paper have been implemented in other production operating systems.

Questions
The paper talks about an IPC message having a "direct string", some number of "indirect strings", and some number of "memory objects". Later it discusses optimizing, through registers, the case of an IPC with less than 8 bytes of payload. How exactly does this work? Is there a separate system call for "small IPC"? It's not obvious how two registers containing arbitrary integer values map to the structured message concept involving strings and memory objects.

Questions
The majority of the performance gain in L3 comes from pushing out port-rights checking and message-validity checking. How does pushing this checking code into user space affect performance (assuming similar checking is done in user space) and security?

Questions
In section 5.6, the authors discuss how Mach's "mach_thread_self" call took nine times longer than the bare machine time for a user/kernel/user call sequence. However, this system call is completely unrelated to IPC. In fact, it shows that Mach's system call mechanism is somewhat inefficient in general. Can we blame this for the difference between Mach's and L3's IPC performance, rather than blaming the implementation of IPC itself?

Questions
- Can the techniques proposed in the paper be used on state-of-the-art multi-core platforms? Are these optimizations still entirely valid, or are there subtle problems that we may need to take care of?
- Can the proposed techniques be adopted in a regular kernel-based system?

Questions
- I do not think using registers as a mechanism to pass data is a good idea. To do that, don't we need an intelligent scheduler to make sure the correct thread accesses the registers to read the data?
- With so many modifications to the OS, wouldn't there be an impact on other operations in the OS? That is, to what extent does IPC count toward the overall performance of the OS?