Published by Clemence Underwood. Modified over 6 years ago.
1
The Multikernel: A New OS Architecture for Scalable Multicore Systems
Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania. The multikernel: a new OS architecture for scalable multicore systems. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles (SOSP '09). ACM, New York, NY, USA,
2
Introduction
Increasing number of multicore systems
  Similar to HPC systems
  But with generic, more OS-intensive workloads
Shared memory with lock-protected structures causes scalability problems
Solutions:
  Message passing between cores
  Hardware-independent system structure
  Replicated, not shared, data state
3
Motivations
Diverse hardware
  Hardware-specific exploits and optimizations cannot be reused on other hardware
  Heterogeneous cores
  Interconnects mirror message passing
Optimization issues
  Sun Niagara reader-writer lock
  Windows 7 dispatcher lock
4
Messaging vs. Shared Memory
Messaging issues:
  No access to shared data
  Event-style programming
Claims:
  The convenience of shared data is superficial
  Fine tuning requires developers to think in terms of cache-coherency messages and protocols
  Monolithic kernels are essentially event-driven
  Message passing is already used in user interfaces, some network servers, and large-scale computations
5
Multikernel
All inter-core communication is explicit
  The only shared memory is the message-passing channels
  Exposes what is accessed by whom and when
  Enables pipelining and batching optimizations
  Calls can be made and work can continue (like FlexSC)
OS structure is hardware independent
  Only CPUs, devices, and message-passing systems are architecture specific
Replicated, not shared, data state
  No shared memory, which increases scalability
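The two ideas on this slide, replicated state and explicit split-phase messages, can be sketched in a few lines. This is a single-threaded simulation, not Barrelfish code: the `Core` class, its `inbox`, and the state keys are hypothetical names chosen for illustration, and the sketch assumes a single writer so it needs no conflict resolution.

```python
from collections import deque


class Core:
    """One simulated core: a private replica plus an inbox of messages."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.replica = {}      # private copy of OS state, never shared
        self.inbox = deque()   # the only way other cores reach this one

    def send(self, other, key, value):
        # Split-phase: enqueue the update and return immediately.
        other.inbox.append((key, value))

    def update(self, peers, key, value):
        # Apply locally, then tell every other core about the change.
        self.replica[key] = value
        for peer in peers:
            if peer is not self:
                self.send(peer, key, value)

    def drain(self):
        # Process the whole inbox as one batch; return how many were applied.
        applied = 0
        while self.inbox:
            key, value = self.inbox.popleft()
            self.replica[key] = value
            applied += 1
        return applied


def run_demo():
    cores = [Core(i) for i in range(4)]
    # Core 0 issues two updates back to back without waiting (pipelining).
    cores[0].update(cores, "run_queue_len", 3)
    cores[0].update(cores, "free_pages", 128)
    # The other cores catch up whenever they next drain their inbox.
    batch_sizes = [core.drain() for core in cores]
    return cores, batch_sizes
```

After `run_demo()`, every core holds the same replica, and each receiving core applied both updates in a single batch. No core ever read another core's `replica` directly.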
6
Goals
Comparable performance to existing systems
Evidence of scalability
Can be retargeted to different hardware
Can use pipelined and batched messages to improve performance
Can adapt OS functionality to load or hardware
7
Implementation
Test platforms
  2x4 Intel Xeon
  2x2, 4x4, 8x4 AMD Opteron
System structure
  Privileged-mode CPU driver (local to each core)
  User-mode monitor (inter-core communication)
  Microkernel design
8
Details
CPU driver
  Provides protection, time slices, and access to hardware
  Event driven, single threaded, non-preemptable
  Local messaging between processes
Monitors
  Coordinate system state
  Run in user space and are schedulable
  State replication and data consistency by agreement protocol
  Handle process wakeup, IPC setup, and core idling
Processes
  Dispatcher objects
  Scheduled by the local CPU driver
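The slide does not say which agreement protocol the monitors run. As one plausible illustration, the sketch below uses a simple two-phase commit: a change to replicated state is applied on every monitor or on none. The `Monitor` class, `agree` function, and state keys are hypothetical, and message delivery is modeled as direct method calls.

```python
class Monitor:
    """Hypothetical per-core monitor holding a replica of global state."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.state = {}
        self.pending = None   # change this monitor has voted for, if any

    def prepare(self, key, value):
        # Phase 1: vote yes unless another change is already pending here.
        if self.pending is not None:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        # Phase 2 (success): make the pending change visible locally.
        if self.pending is not None:
            key, value = self.pending
            self.state[key] = value
            self.pending = None

    def abort(self):
        # Phase 2 (failure): forget the pending change.
        self.pending = None


def agree(monitors, key, value):
    """Apply the change on every monitor, or on none of them."""
    voted_yes = []
    for monitor in monitors:
        if not monitor.prepare(key, value):
            break
        voted_yes.append(monitor)
    if len(voted_yes) == len(monitors):
        for monitor in monitors:
            monitor.commit()
        return True
    # Roll back only the monitors that accepted this proposal.
    for monitor in voted_yes:
        monitor.abort()
    return False
```

A real implementation would exchange these steps as messages over inter-core channels and would have to handle timeouts; the sketch only shows the all-or-nothing outcome.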
9
Details
Inter-core communication
  User-level RPC over a shared-memory channel
  Receiving through polling
  Optimized for the cache-coherency protocol
  Fewer intra-core context switches
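A polled shared-memory channel of this kind can be sketched as a ring of fixed-size slots in which the sender writes a sequence word last and the receiver polls that word. The sizes below (8 slots, 7 payload words plus 1 sequence word, roughly one 64-byte cache line of 8-byte words) are assumptions for illustration, as are the `Channel` class and its method names. Python cannot model cache behaviour, and the shared `acked` counter stands in for acknowledgements that a real channel would send back as messages.

```python
SLOTS = 8            # assumed ring size
PAYLOAD_WORDS = 7    # assumed: 7 payload words + 1 sequence word per slot


class Channel:
    """One-way, single-sender/single-receiver ring of fixed-size slots."""

    def __init__(self):
        # Each slot is [w0..w6, seq]; seq == 0 means "never written".
        self.ring = [[0] * (PAYLOAD_WORDS + 1) for _ in range(SLOTS)]
        self.send_seq = 1   # sequence number of the next message to send
        self.recv_seq = 1   # sequence number the receiver expects next
        self.acked = 0      # highest sequence number consumed so far

    def try_send(self, words):
        if len(words) > PAYLOAD_WORDS:
            raise ValueError("message does not fit in one slot")
        if self.send_seq - self.acked > SLOTS:
            return False    # ring full: receiver has not caught up
        slot = self.ring[(self.send_seq - 1) % SLOTS]
        padded = list(words) + [0] * (PAYLOAD_WORDS - len(words))
        slot[:PAYLOAD_WORDS] = padded
        slot[PAYLOAD_WORDS] = self.send_seq   # written last: publishes it
        self.send_seq += 1
        return True

    def poll(self):
        # The receiver only ever inspects the slot it expects next.
        slot = self.ring[(self.recv_seq - 1) % SLOTS]
        if slot[PAYLOAD_WORDS] != self.recv_seq:
            return None     # nothing new yet
        payload = slot[:PAYLOAD_WORDS]
        self.acked = self.recv_seq
        self.recv_seq += 1
        return payload
```

Because the receiver polls in user space, receiving a message needs no kernel crossing, which fits the slide's point about fewer context switches.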
10
Details
Memory management
  Needs consistent memory allocation
  Capability based
Shared address space
  Either share a hardware page table or replicate it with messages
  Also needs capabilities shared between cores
  Monitors provide sharing between cores
  Dispatchers can start, stop, and migrate threads
11
Details
System knowledge base
  Used to store hardware information
Other thoughts
  CPU/monitor separation, network stack, shared state
12
TLB Shootdown
Requires global synchronization
Uses messages instead of IPIs (IPIs are low latency, but invasive)
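A message-based shootdown can be sketched as a broadcast followed by a wait for acknowledgements: the initiating core sends an invalidate message to every other core and treats the unmap as complete only once all of them have replied. The `TlbCore` class, the message tuples, and `unmap` are hypothetical; cores are polled in turn to keep the simulation single-threaded.

```python
from collections import deque


class TlbCore:
    """Simulated core with a private TLB and a message inbox."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.tlb = {}          # virtual page -> physical frame
        self.inbox = deque()

    def handle_messages(self):
        # Runs when this core next polls; the core is not interrupted.
        while self.inbox:
            kind, vpage, sender = self.inbox.popleft()
            if kind == "invalidate":
                self.tlb.pop(vpage, None)
                sender.inbox.append(("ack", vpage, self))


def unmap(initiator, cores, vpage):
    """Remove vpage from every TLB; return True once all cores have acked."""
    initiator.tlb.pop(vpage, None)
    others = [core for core in cores if core is not initiator]
    for core in others:
        core.inbox.append(("invalidate", vpage, initiator))
    for core in others:
        core.handle_messages()   # simulated: each core polls its inbox
    acks = 0
    while initiator.inbox:
        kind, page, _sender = initiator.inbox.popleft()
        if kind == "ack" and page == vpage:
            acks += 1
    return acks == len(others)
```

The trade-off noted on the slide shows up in `handle_messages`: a remote core processes the invalidation when it chooses to poll, rather than being interrupted as it would be by an IPI.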
13
Computation
Used OpenMP and Splash-2 on a 4x4 AMD system
Showed similar performance to Linux on FFT, Barnes-Hut, and radiosity computations
14
Other Measurements
Network throughput: equal on Barrelfish and Linux (≈ 951 Mbit/s)
Web server
  Barrelfish: 640 Mbit/s
  Linux: 316 Mbit/s
  Avoids context switches by running in user space
IP loopback
  Barrelfish: 2154 Mbit/s
  Linux: 1823 Mbit/s
  Also avoids kernel crossings and shared memory
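To put the two comparisons on a common footing, the quoted throughputs can be turned into ratios. The figures are the ones on the slide; the `results` table and `speedup` helper are illustrative names only.

```python
# Throughput figures quoted on the slide, in Mbit/s.
results = {
    "web server": {"Barrelfish": 640, "Linux": 316},
    "IP loopback": {"Barrelfish": 2154, "Linux": 1823},
}


def speedup(benchmark):
    """Barrelfish throughput divided by Linux throughput."""
    row = results[benchmark]
    return row["Barrelfish"] / row["Linux"]
```

This gives roughly 2.0x for the web server and roughly 1.2x for IP loopback.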
15
Related Work
Performance optimization by reducing sharing
  Tornado, K42, Corey
Microkernel vs. multikernel
  Similar message-passing design
  Core management is different
Other distributed systems
  Higher-latency links
16
Future Work
Improving the message queues
  Efficient message passing based on cache coherence
Port to ARM processors
File systems: the current one is backed by NFS
17
Critique
Good idea: message passing between multiple individually managed cores, with state replication
Separation of CPU driver and monitor
  Similar to microkernel design
Message passing in Linux?
HPC/server-specific benchmarks
  Multicore OS benchmarks: we can do better
Possible approach to improve virtualization?