K42: Building a Complete OS
Orran Krieger, Marc Auslander, Bryan Rosenburg, Robert Wisniewski, Jimi Xenidis, Dilma Da Silva, Michal Ostrowski, Jonathan Appavoo, Maria Butrico, Mark Mergen, Amos Waterland, Volkmar Uhlig
How it all started
Our Predictions 1996
Microsoft Windows will dominate
Large-scale SMMP increasingly important
Within 5 years multi-core pervasive
Traditional OS structures not maintainable
Customizability and extensibility critical
Within 5 years 64-bit pervasive
Sufficient motivation to design entirely new OS.
Small aggressive research team.
Resulting K42 Goals
performance/scalability:
–up to large MP and large applications
–down for small-scale MP and small apps on large-scale MP
flexibility/customizability:
–policies/implementations of resource instances can be customized to application needs
–system can adapt without penalizing common case performance
applicability
–full functionality with multiple personalities
–support client to embedded to server
wide availability
–release open source and build community
–highly maintainable/extensible structure
enable problem domain experts
re-enable architectural innovation
re-enable OS research community
Technical directions 1996
Micro-kernel design
User-level implementation
OO design
Extensive infrastructure & programming model
Pervasive exploitation of 64 bits
Application manager for fault containment
[Figure: Micro-kernel, Servers, and two stacks of Application, K42 lib, and Legacy OS emulation]
Key technology/work
Scaling existing OSes
[Figure: Software Structures behind a Service Interface receiving External Service Requests; Memory/Processors bricks added to scale up]
Incremental approach of optimizing global data structures/policies … focuses on concurrency rather than locality.
Poor scaling of SW requires major HW investments to compensate, resulting in:
–Systems that are not cost competitive.
–Limits to the system scalability.
Our solution
[Figure: Software Structures behind a Service Interface receiving External Service Requests; Memory/Processors bricks added to scale up]
Key elements of our solution:
–System services that avoid sharing when possible.
–OO design with per-resource instance objects.
–Exploit sharing where workload demands or where performance is not critical.
–Tools to identify sharing problems, plus a basic design methodology and set of tools to simplify the task of fixing the SW.
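The idea of per-resource instance objects can be illustrated with a minimal sketch. This is not K42 code: the `Region` class, its `handle_fault` method, and the region names are hypothetical, and Python threads stand in for processors. The point is only that each resource carries its own lock and state, so independent workloads never touch a shared structure.

```python
import threading


class Region:
    """Hypothetical per-resource object: one instance per memory region,
    with its own lock and its own state (no global table, no global lock)."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._resident_pages = set()

    def handle_fault(self, page):
        # Only this region's lock is taken; faults on other regions
        # proceed without contending on anything held here.
        with self._lock:
            self._resident_pages.add(page)
            return len(self._resident_pages)


def run_workload(region, pages):
    for page in pages:
        region.handle_fault(page)


# Two independent workloads, each faulting on its own region.
region_a = Region("app-a-heap")
region_b = Region("app-b-heap")
threads = [
    threading.Thread(target=run_workload, args=(region_a, range(100))),
    threading.Thread(target=run_workload, args=(region_b, range(50))),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A design built around one global page table and one global lock would serialize both workloads; here sharing happens only if two threads fault on the same region.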
Independent workloads
Memclone benchmark: Memory intensive parallel application
Customization
User-level implementation allows per-application customizations.
Framework per service designed to:
–Separate mechanism and policy so that each can be independently customized.
–Let applications or agents determine which implementation to use for a workload.
Dynamic customizations: patches/updates, adaptive algorithms, specializing common case, monitoring, application optimizations
–Hot swapping: replacing O1 with O2 to adapt to new demands
–Dynamic upgrade: replace all objects of a type
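Hot swapping, replacing O1 with O2 behind a stable reference, can be sketched as follows. This is an illustration under stated assumptions, not the K42 mechanism: the `Pager`, `SequentialPager`, `RandomPager`, and `Handle` names are hypothetical, and the sketch omits the hard part of a real system, which must wait for in-flight calls on the old object to drain before transferring state.

```python
class Pager:
    """Hypothetical base class: a paging policy that tracks resident pages."""
    policy = "base"

    def __init__(self, resident=None):
        # Copy any state handed over from a previous implementation.
        self.resident = list(resident or [])

    def touch(self, page):
        if page not in self.resident:
            self.resident.append(page)
        return self.policy


class SequentialPager(Pager):
    policy = "sequential"


class RandomPager(Pager):
    policy = "random"


class Handle:
    """Indirection that callers hold; the implementation behind it can change."""

    def __init__(self, impl):
        self._impl = impl

    def touch(self, page):
        return self._impl.touch(page)

    def resident_pages(self):
        return list(self._impl.resident)

    def hot_swap(self, new_cls):
        # Build the replacement from the old object's state, then redirect
        # the handle. Quiescing concurrent callers is omitted in this sketch.
        self._impl = new_cls(self._impl.resident)


handle = Handle(SequentialPager())
before = [handle.touch(1), handle.touch(2)]
handle.hot_swap(RandomPager)
after = handle.touch(3)
```

Callers keep using the same `handle` throughout; only the object behind it changes, and the resident-page state survives the swap.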
Hot-swapping
[Figure panels: Adaptive paging; Adaptive file imp.]
Infrastructure & Programming model
Clustered objects
Pervasive use of RCU to avoid existence locks
Event based programming model
Performance monitoring
Scalable services
–Protected Procedure calls
–Locality aware memory allocation
–Processor specific memory
Automated interface generator/xobject services automate security, garbage collecting, …
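A clustered object presents one logical object to its callers while keeping a separate representative per processor. The sketch below is a simplified model of that idea, not the K42 implementation: `ClusteredCounter`, `CounterRep`, and the explicit `processor_id` argument are hypothetical, and representatives are created lazily on first use from each processor.

```python
class CounterRep:
    """Per-processor representative holding only local state."""

    def __init__(self):
        self.value = 0


class ClusteredCounter:
    """One logical counter backed by per-processor representatives."""

    def __init__(self):
        self._reps = {}  # processor id -> CounterRep, filled on demand

    def _rep(self, processor_id):
        # Lazy initialization: a representative is built only when a
        # processor first touches the object.
        rep = self._reps.get(processor_id)
        if rep is None:
            rep = CounterRep()
            self._reps[processor_id] = rep
        return rep

    def increment(self, processor_id):
        # Common case: update local state only.
        self._rep(processor_id).value += 1

    def value(self):
        # Rare case: a global read visits every existing representative.
        return sum(rep.value for rep in self._reps.values())

    def rep_count(self):
        return len(self._reps)


counter = ClusteredCounter()
for _ in range(3):
    counter.increment(processor_id=0)
for _ in range(2):
    counter.increment(processor_id=2)
```

Only processors 0 and 2 have touched the counter, so only two representatives exist; processors that never use the object pay no initialization cost.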
Massive investment in/on Linux
In late 90s Linux appeared to be taking off & we abandoned multiple personalities
Linux API/ABI compatibility largely in library, exceptions:
–Server code for process groups, ptys…
–Fork has had way too pervasive impact on kernel MM (we violated our programming style).
Support both unmodified glibc via trap reflection, and modified glibc.
Applications with specialized needs can reach past Linux personality, e.g., to instantiate object, handle events…
We are also compatible with Linux kernel modules, including device drivers, FS & TCP/IP stack:
–Tracking Linux is an ongoing nightmare
Bad predictions, mistakes and questions
Our Predictions 1996
Microsoft Windows will dominate
–Wasted huge amount of time on multiple personality support.
Large-scale SMMP increasingly important.
–True, but much slower than expected.
–Massive investment in HW: allows existing OSes to run reasonably well, makes SMMP not cost effective
Within 5 years multi-core pervasive
–Only common today, not compelling differentiator until now
Traditional OS structures not maintainable.
Customizability and extensibility critical
Within 5 years 64-bit pervasive.
–Only common today, this has been a huge barrier to building community
Mistakes/Questions
We should have had a 32-bit version.
Application manager was a bad idea, we totally missed on virtualization:
–Gets rid of the device driver nightmare
–Can deploy new OS to solve subset of problem.
While user-level implementation & micro-kernel clean, continuous challenge & orthogonal to OO design
We implemented fork wrong!!!
OO design, and infrastructure, obscures control flow:
–Much more difficult for Linux hacker to gain broad understanding.
–Requires more sophisticated debugging tools.
Does OO really help maintainability?
Concluding remarks
The good news
High degree of functionality:
–32 & 64 bit apps, support standard Gentoo tree, MPI.
–Applications/benchmarks include SPEC SDET, ReAIM, SPECfp, many HPC apps (DARPA & DOE)
–Recently provided enough support to run commercial JVM (J9) and DB2.
Object-oriented design has advantages...
–have found special casing easy
–hot-swapping simpler than adaptive algorithm
–Clustered objects relatively simple to do
–local fixes, publish interface not structure
–Domain experts/students can easily develop specialized component.
Have been able to work around global policies, e.g., paging.
The good news
General performance monitoring infrastructure key to identifying problems.
We achieved excellent base performance (although since degraded); can compensate for intrinsic overheads:
–advantages of Linux's hierarchical page tables: exception level traversal, identify PT entry for fast unmap and avoid segment unmapping, aggressive fork pre-mapping for anonymous memory
–user-level implementation: initialization cost, page fault costs on fork
–OO design: indirections, code replication, poor instruction cache locality, per-object data structures…
–initialization costs of scalable implementations
compensate by lazy initialization & hot swapping/specialization…
Ongoing projects
IBM PERCS for DARPA HPCS
–PEM and CPO & architectural evaluation
DOE/FastOS
–HEC with K42 at LBL, UNM, UofToronto
–SmartApps at Texas A&M
New South Wales (dynamic upgrade)
Device driver I/O & Super page support with LTC
Concluding remarks
Sufficient functionality & performance to run real workloads.
A great framework for fundamental OS research and HW architecture studies.
Basic architecture/technologies largely successful.
Virtualization, pervasive 64-bit processors, and pervasive multi-core make the design more relevant than at any time in project history.
Most of the IBM team no longer have K42 as day job, but are still passionate about it:
–We continue to be excited to support community.
–We are actively soliciting people to take over parts of the system.