1
Presented By Srinivas Sundaravaradan
2
MACH
- µ-kernel system based on message passing
- Over 5000 cycles to transfer a short message
- Buffering IPC
L3
- Similar to MACH
- Hardware interrupts delivered through messages
- No ports
3
Design Philosophy
Focus on IPC
- Any feature that will increase cost must be closely evaluated
- When in doubt, design in favor of IPC
Design for performance
- A poorly performing technique is unacceptable
- Evaluate feature cost compared to a concrete baseline
- Aim for a concrete performance goal
Comprehensive design
- Consider synergistic effects of all methods and techniques
- Cover all levels of implementation, from design to code
4
Making IPC faster
Fewer
- Call / Reply & Receive Next
- Combining messages
Faster
- 15 other optimizations
Architectural level
- Use redesign of L3 as opportunity to change kernel design
5
Methodology
Theoretical minimum
- Null message between address spaces; receiver is ready to receive it
- 107 cycles to enter & leave kernel
- 45 cycles for TLB misses
- 172 cycles
Goal: 350 cycles
Achieved: 250 cycles = T
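The cycle budget on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers listed above; the variable names are illustrative. Note that the two itemized costs do not add up to the stated minimum, so the remainder is reported as "not itemized" rather than attributed to anything.

```python
# Cycle figures exactly as listed on the slide.
KERNEL_ENTER_LEAVE = 107   # cycles to enter and leave the kernel
TLB_MISSES = 45            # cycles for TLB misses
THEORETICAL_MIN = 172      # total stated on the slide
GOAL = 350
ACHIEVED = 250             # the slide calls this T

itemized = KERNEL_ENTER_LEAVE + TLB_MISSES
not_itemized = THEORETICAL_MIN - itemized      # part of the minimum the slide does not break down
achieved_vs_min = ACHIEVED / THEORETICAL_MIN   # how far the result is from the minimum

print(f"itemized costs: {itemized} cycles")
print(f"not itemized on the slide: {not_itemized} cycles")
print(f"achieved / theoretical minimum: {achieved_vs_min:.2f}x")
print(f"achieved is {GOAL - ACHIEVED} cycles under the goal")
```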
6
Minimize system calls
Why minimize system calls?
- 60% of T
Traditional IPC
- 4 system calls
Solution
- Call
- Reply & Receive next
7
Minimize system calls
[Figure: client/server timeline showing when each side is blocked or unblocked. Traditional IPC: client Send, Receive (reply); server Receive, Send (reply), Receive (next). Combined: client Call; server Reply and receive next.]
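The saving can be illustrated by counting kernel entries per request/reply round trip. This is a toy model, not L3 code: `Kernel`, `syscall`, and the two round-trip functions are hypothetical names, and the kernel here does nothing except count how often it is entered.

```python
class Kernel:
    """Toy kernel that only records how often it is entered."""

    def __init__(self):
        self.entries = []

    def syscall(self, name):
        self.entries.append(name)


def traditional_round_trip(kernel):
    # One primitive per system call, as in the "traditional IPC" case.
    kernel.syscall("client: send")
    kernel.syscall("client: receive (reply)")
    kernel.syscall("server: send (reply)")
    kernel.syscall("server: receive (next)")


def combined_round_trip(kernel):
    # Call = send + receive reply; reply and receive next = reply + wait.
    kernel.syscall("client: call")
    kernel.syscall("server: reply and receive next")


def count_entries(round_trip):
    kernel = Kernel()
    round_trip(kernel)
    return len(kernel.entries)


print("traditional:", count_entries(traditional_round_trip))
print("combined:", count_entries(combined_round_trip))
```

With the server already waiting in a receive, each further round trip costs two kernel entries instead of four in this model.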
8
Complex Message
Direct String
- Data to be transferred directly from send buffer to receive buffer
Indirect String
- Location and size of data to be transferred by reference
Memory Object
- Description of a region of memory to be mapped in receiver address space (shared memory)
[Figure: a complex message]
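One way to picture the three parts of a complex message is as a small record type. The sketch below is an illustration only: the class and field names are invented, and the assumption that a message holds one direct string plus lists of indirect strings and memory objects is not stated on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DirectString:
    data: bytes      # copied straight from send buffer to receive buffer


@dataclass
class IndirectString:
    address: int     # location of the data in the sender's address space
    size: int        # number of bytes transferred by reference


@dataclass
class MemoryObject:
    base: int        # start of the region to map into the receiver
    length: int      # size of the region


@dataclass
class ComplexMessage:
    # Assumed layout: one direct string, any number of the other parts.
    direct: DirectString
    indirect: List[IndirectString] = field(default_factory=list)
    memory_objects: List[MemoryObject] = field(default_factory=list)

    def bytes_to_copy(self) -> int:
        """Strings are copied; memory objects are mapped, so they add nothing."""
        return len(self.direct.data) + sum(s.size for s in self.indirect)


msg = ComplexMessage(
    direct=DirectString(b"hdr"),
    indirect=[IndirectString(address=0x1000, size=4096)],
    memory_objects=[MemoryObject(base=0x400000, length=8192)],
)
print(msg.bytes_to_copy())
```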
9
Ways of Message Transfer
Twofold message copy
- user space A -> kernel space -> user space B
LRPC mechanism
- share user-level memory
- secure?
- does not support variable-to-variable transfer
10
Temporary Mapping…
- Two-copy message transfer costs 20 + 0.75n cycles
- L3 copies data once to a special communication window in kernel space
- Window is mapped to the receiver for the duration of the call (page directory entry)
[Figure labels: kernel copy; mapped with kernel-only permission; add mapping to space B]
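The difference between the twofold copy and the temporary mapping can be sketched in user-level Python by counting bytes moved. This is only an analogy: a `memoryview` over the receiver's buffer stands in for the communication window, and all function names are hypothetical. The cost function reproduces the slide's formula; the slide does not state the unit of n.

```python
def two_copy_cycles(n):
    """Two-copy transfer cost quoted on the slide, for a message of length n."""
    return 20 + 0.75 * n


def twofold_copy(send_buf: bytes, recv_buf: bytearray) -> int:
    """user space A -> kernel space -> user space B; returns bytes moved."""
    kernel_buf = bytearray(send_buf)             # copy 1: A -> kernel buffer
    recv_buf[:len(kernel_buf)] = kernel_buf      # copy 2: kernel buffer -> B
    return 2 * len(kernel_buf)


def temporary_mapping_copy(send_buf: bytes, recv_buf: bytearray) -> int:
    """Single copy through a window that aliases the receiver's buffer."""
    window = memoryview(recv_buf)                # stands in for the temporary mapping
    try:
        window[:len(send_buf)] = send_buf        # the only copy
    finally:
        window.release()                         # mapping lasts only for the call
    return len(send_buf)


message = b"x" * 64
buf_a, buf_b = bytearray(64), bytearray(64)
print("twofold copy moved", twofold_copy(message, buf_a), "bytes")
print("temporary mapping moved", temporary_mapping_copy(message, buf_b), "bytes")
print("two-copy cost for n = 64:", two_copy_cycles(64), "cycles")
```

Both paths leave the same data in the receiver's buffer; the windowed path touches each byte once instead of twice.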
11
Temporary Mapping…
[Figure: top-level page table, 2nd-level tables, frames in memory]
12
Temporary Mapping
13
Lazy Scheduling
- Scheduler overhead is a significant component of IPC cost
- Threads doing IPC are often moved to the wait queue only to be inserted back again onto the ready queue
Lazy scheduling
- avoids locking of queues
- avoids queue manipulation
  - instruction execution
  - TLB misses
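A minimal sketch of the idea, assuming the thread state lives in the thread control block and the ready queue is only cleaned up when the scheduler actually scans it. The class names, the `queued` flag, and the `queue_ops` counter are all illustrative, not taken from L3.

```python
from collections import deque


class Thread:
    def __init__(self, name):
        self.name = name
        self.ready = True      # state kept in the thread control block
        self.queued = False    # whether the thread still sits in the ready queue


class LazyReadyQueue:
    def __init__(self, threads):
        self.queue = deque(threads)
        for t in threads:
            t.queued = True
        self.queue_ops = 0     # counts insertions and removals after setup

    def block(self, thread):
        thread.ready = False   # no dequeue: the stale entry stays in place

    def unblock(self, thread):
        thread.ready = True
        if not thread.queued:  # re-insert only if the scheduler removed it
            self.queue.append(thread)
            thread.queued = True
            self.queue_ops += 1

    def pick_next(self):
        while self.queue:
            thread = self.queue[0]
            if thread.ready:
                self.queue.rotate(-1)   # round robin among ready threads
                return thread
            self.queue.popleft()        # stale entry removed lazily
            thread.queued = False
            self.queue_ops += 1
        return None


a, b = Thread("a"), Thread("b")
rq = LazyReadyQueue([a, b])
rq.block(a)      # a waits for an IPC partner
rq.unblock(a)    # the reply arrives before the scheduler runs
print("queue operations for the block/unblock pair:", rq.queue_ops)
```

A thread that blocks and becomes ready again before the next scheduling pass causes no queue manipulation at all in this model; only threads that are still blocked when the scheduler scans the queue are removed.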
14
Use registers for short messages
Messages are usually short!
- ack/error replies from drivers
- hardware interrupt messages
Intel 486 processor
- 7 general purpose registers
- sender info, data
May not work for CPUs with fewer registers
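The decision between the register path and the memory path can be sketched as a size check followed by packing the payload into register-sized words. The number of registers available for data is an assumption here (the slide only says the seven registers carry sender info and data), so it is exposed as the hypothetical parameter `data_registers`; the 4-byte word size reflects the 486's 32-bit registers.

```python
import struct

WORD_BYTES = 4  # the 486 has 32-bit general purpose registers


def pack_into_registers(payload: bytes, data_registers: int = 2):
    """Return the register words, or None if the payload needs the memory path.

    data_registers is an assumed value, not a figure from the slide.
    """
    capacity = data_registers * WORD_BYTES
    if len(payload) > capacity:
        return None
    padded = payload.ljust(capacity, b"\0")
    return list(struct.unpack(f"<{data_registers}I", padded))


def unpack_from_registers(words, length):
    """Receiver side: rebuild the payload from the register words."""
    return struct.pack(f"<{len(words)}I", *words)[:length]


words = pack_into_registers(b"ack")
print(unpack_from_registers(words, 3))
print(pack_into_registers(b"too long for two registers"))
```

A short acknowledgement fits in the assumed two data registers and never touches a message buffer, while a longer payload returns `None` and would fall back to a memory transfer.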
15
Summary of Optimizations
Architectural
- System Calls, Messages, Direct Transfer, Strict Process Orientation, Thread Control Blocks
Algorithmic
- Thread Identifier, Virtual Queues, Timeouts/Wakeups, Lazy Scheduling, Direct Process Switch, Short messages
Interface
- Unnecessary Copies, Parameter passing
Coding
- Cache misses, TLB misses, Segment registers, General registers, Jumps and Checks, Process Switch
16
Results…
17
Results
18
Conclusions
- L3’s message passing was 22 times faster than that of MACH
- Kernel redesign focused mainly on IPC
Caveats
- Ports and buffering
- Specific to the architecture
19
Thank You !