Group Scheduling in System Software
Michael Frisbie, Douglas Niehaus, University of Kansas (niehaus@ittc.ku.edu)
Venkita Subramonian, Christopher Gill, Computer Science and Engineering, Washington University (cdgill@cse.wustl.edu)
Motivation
Real-time and embedded systems have widely varying computation semantics: varying policies for controlling specific computations, and varying levels of focus, from single-purpose embedded systems to general-purpose systems running an RT application (multimedia, machine control, and user interface). No single policy is likely to be appropriate for all of them.
Motivation
Computation in computer systems is not done exclusively under the publicly exposed scheduling model. OS computational components (interrupts, SoftIRQs, tasklets, and bottom halves) execute outside the exposed scheduling policies and manifest as noise in the scheduling model. Active middleware (the Linux PThreads library, TAO) manifests as competing load.
Goals
A highly configurable framework within which a wide range of policies can be specified, supporting both selection of predefined scheduling semantics and implementation of customized schedulers. An application-computation-oriented representation. Representation of all computation components on the system under the scheduling framework, with current semantics available as default policies; this requires some new types of information.
Platform
Linux: growing in popularity for real-time and embedded systems, with a middleware version for portability and a range of mechanism control. KU Real-Time Linux (KURT-Linux): integration of OS computation components, interrupt-handling modifications, and a number of related projects (Data Streams, Performance Evaluation Framework) that provide the ability to gather detailed scenario-oriented data.
Group Scheduling
An application-computation-centric view of scheduling. Computations are implemented by a group of one or more computation components: threads, IRQ handlers, SoftIRQs, tasklets, and BHs (bottom halves). Group scheduling is a flexible framework for composing and configuring the system scheduling decision function (SSDF), which chooses the computation component using the CPU at any given time. The framework explicitly supports description of both computations and the relations among computations.
Group Structure
Group: a set of computation components with an associated scheduling decision function (SDF). Elements within a group can be threads, other groups, or other computation components, and an element can belong to more than one group. A scheduling decision tree (SDT) is composed of one or more groups and defines the control semantics for a computation's components. The SDTs for all computations are composed to form the system scheduling decision tree (SSDT).
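The group structure can be illustrated with a small C model. This is a sketch under stated assumptions, not the framework's actual data structures: the names `gs_member`, `gs_group`, `gs_resolve`, and `sdf_sequential` are hypothetical. A member is either a leaf component or a nested group, each group carries its own SDF as a function pointer, and groups hold member pointers so that one component can belong to several groups. Evaluating the root group's SDF walks the SDT recursively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of groups, members, and per-group SDFs. */
enum gs_kind { GS_COMPONENT, GS_GROUP };

struct gs_group;

/* A member is a leaf computation component (thread, IRQ handler,
 * SoftIRQ, tasklet, BH) or a nested group. */
struct gs_member {
    enum gs_kind kind;
    const char *name;
    bool runnable;            /* used only when kind == GS_COMPONENT */
    struct gs_group *group;   /* used only when kind == GS_GROUP */
};

/* An SDF returns the leaf component its group wants to run, or NULL. */
typedef struct gs_member *(*gs_sdf)(struct gs_group *g);

#define GS_MAX_MEMBERS 8

struct gs_group {
    gs_sdf sdf;
    struct gs_member *members[GS_MAX_MEMBERS]; /* pointers, so a member can be shared */
    int n;
};

static void gs_add(struct gs_group *g, struct gs_member *m) {
    assert(g->n < GS_MAX_MEMBERS);
    g->members[g->n++] = m;
}

/* Resolve a member to a runnable leaf: leaves answer for themselves,
 * groups delegate to their own SDF (this recursion walks the SDT). */
static struct gs_member *gs_resolve(struct gs_member *m) {
    if (m->kind == GS_COMPONENT)
        return m->runnable ? m : NULL;
    return m->group->sdf(m->group);
}

/* Example SDF: first member, in list order, that resolves to a runnable leaf. */
static struct gs_member *sdf_sequential(struct gs_group *g) {
    for (int i = 0; i < g->n; i++) {
        struct gs_member *leaf = gs_resolve(g->members[i]);
        if (leaf != NULL)
            return leaf;
    }
    return NULL;
}
```

For example, a root group containing groups A = {t1, t2} and B = {t2, t3} shares t2 between both; if t1 and t2 cannot run, the root's SDF falls through group A and selects t3 from group B. Any other policy (priority, round robin) is just another function with the `gs_sdf` signature.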
System Scheduling Decision Tree (SSDT)
The SSDT controls the system's computation components, explicitly or implicitly; the ultimate goal is to make all of this control explicit and easily configurable across a wide semantic range. The SSDT can co-exist with the default system scheduler through two semantic hooks: invocation of the SSDT before the default scheduler (DS), and a method of making the DS skip components under SSDT control.
First-Refusal (FR) SSDT
The FR-SSDT uses a sequential SDF at the top level. The SDT controlling components under the group scheduling model has first refusal; the Linux SDF (the default scheduler) makes the decision only if no component under the group model should run. Excluding group-controlled components from the DS ensures precise control where needed.
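The first-refusal arrangement can be sketched as a sequential SDF with two children: the group scheduling SDT first and a default-scheduler stand-in second. All names here (`gs_sdt`, `default_sdf`, `sequential_sdf`, `fr_ssdt`) are hypothetical, and the `gs_eligible` field is an assumed way to represent "the GS policy does not want this component to run right now". The default-scheduler stand-in skips every GS-controlled task, which is what keeps the GS decision authoritative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task record for this sketch. */
struct task {
    const char *name;
    int prio;            /* larger = preferred by the default-scheduler stand-in */
    bool runnable;
    bool gs_controlled;  /* under the group scheduling model */
    bool gs_eligible;    /* assumed: GS policy currently allows it to run */
};

/* Every decision function returns a task or NULL (a refusal). */
typedef struct task *(*sdf_fn)(struct task *tasks, size_t n);

/* GS SDT stand-in: first eligible, runnable GS-controlled task in array order. */
static struct task *gs_sdt(struct task *tasks, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (tasks[i].gs_controlled && tasks[i].runnable && tasks[i].gs_eligible)
            return &tasks[i];
    return NULL;
}

/* Default-scheduler stand-in: highest-priority runnable task, skipping
 * anything under GS control (the exclusion hook). */
static struct task *default_sdf(struct task *tasks, size_t n) {
    struct task *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!tasks[i].runnable || tasks[i].gs_controlled)
            continue;
        if (best == NULL || tasks[i].prio > best->prio)
            best = &tasks[i];
    }
    return best;
}

/* Sequential SDF: ask each child in order; the first non-NULL answer wins. */
static struct task *sequential_sdf(const sdf_fn *children, size_t nchild,
                                   struct task *tasks, size_t n) {
    for (size_t c = 0; c < nchild; c++) {
        struct task *t = children[c](tasks, n);
        if (t != NULL)
            return t;
    }
    return NULL;
}

/* First-refusal SSDT: the GS SDT decides first, the default scheduler second. */
static struct task *fr_ssdt(struct task *tasks, size_t n) {
    static const sdf_fn order[] = { gs_sdt, default_sdf };
    return sequential_sdf(order, sizeof order / sizeof order[0], tasks, n);
}
```

When the GS SDT refuses, a GS-controlled task is still never picked by the fallback, even if it has the highest priority; without the exclusion, the default scheduler could run it behind the group scheduler's back.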
Multilevel Feedback Queue (MLFQ) SDT Example
A top-level priority SDF maintains the priority-equivalence-class view. Each priority class is a group that uses a round-robin policy to share the CPU among its members. Dynamic priority adjustment of processes can move them among priority classes.
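A minimal sketch of this SDT, assuming a fixed number of classes with class 0 as the highest priority: each class is a round-robin group with its own cursor, the top-level priority SDF returns the first class that yields a runnable member, and `mlfq_move` models a dynamic priority adjustment. The names `rr_group`, `rr_pick`, `mlfq_pick`, and `mlfq_move`, and the constants `NCLASSES` and `MAX_PER_CLASS`, are hypothetical; the feedback rule that decides when a task should move is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCLASSES 4        /* hypothetical number of priority classes; 0 = highest */
#define MAX_PER_CLASS 8

struct task { const char *name; bool runnable; };

/* One priority class: a group shared round-robin among its members. */
struct rr_group { struct task *members[MAX_PER_CLASS]; int n; int cursor; };

struct mlfq { struct rr_group cls[NCLASSES]; };

/* Round-robin SDF: next runnable member at or after the cursor. */
static struct task *rr_pick(struct rr_group *g) {
    for (int k = 0; k < g->n; k++) {
        int i = (g->cursor + k) % g->n;
        if (g->members[i]->runnable) {
            g->cursor = (i + 1) % g->n;
            return g->members[i];
        }
    }
    return NULL;
}

/* Top-level priority SDF: the highest class with a runnable member wins. */
static struct task *mlfq_pick(struct mlfq *q) {
    for (int c = 0; c < NCLASSES; c++) {
        struct task *t = rr_pick(&q->cls[c]);
        if (t != NULL)
            return t;
    }
    return NULL;
}

static bool rr_add(struct rr_group *g, struct task *t) {
    if (g->n >= MAX_PER_CLASS)
        return false;
    g->members[g->n++] = t;
    return true;
}

/* Remove t, keeping the cursor pointing at the same "next" member. */
static bool rr_remove(struct rr_group *g, struct task *t) {
    for (int i = 0; i < g->n; i++) {
        if (g->members[i] != t)
            continue;
        for (int j = i; j + 1 < g->n; j++)
            g->members[j] = g->members[j + 1];
        g->n--;
        if (i < g->cursor)
            g->cursor--;
        if (g->cursor >= g->n)
            g->cursor = 0;
        return true;
    }
    return false;
}

/* Dynamic priority adjustment: move a task from one class group to another. */
static bool mlfq_move(struct mlfq *q, struct task *t, int from, int to) {
    if (from < 0 || from >= NCLASSES || to < 0 || to >= NCLASSES)
        return false;
    if (!rr_remove(&q->cls[from], t))
        return false;
    if (!rr_add(&q->cls[to], t)) {
        rr_add(&q->cls[from], t);   /* undo: the slot we just freed is available */
        return false;
    }
    return true;
}
```

With tasks a and b in class 1 and c in class 2, successive picks alternate a, b, a; moving c to class 0 makes it win the next pick, and once c stops being runnable the round robin in class 1 resumes where it left off.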
Related Work
Hierarchical scheduling: Regehr and Stankovic (RTSS 2001). Likely computationally equivalent in capability; the approaches are distinguished by which abstractions they emphasize. CPU inheritance scheduling: Ford and Susarla (Flux Project). Group scheduling emphasizes application structure reflected in groups, and integration of all computation components (interrupts, tasklets, etc.).
Kernel Implementation
Modifies the default Linux scheduler so that the GS framework gets a chance to choose first. The hook that makes the default scheduler exclude a component is the most subtle change: existing code is changed to consult an exclusion notation, rather than trying to remove the component from the base data structure. Control of components other than threads is the most significant feature for real-time systems.
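The exclusion approach can be sketched with a toy run queue: rather than unlinking a task when group scheduling takes control of it, the framework sets a flag, and the default scheduler's selection loop gains one test that skips flagged entries. The `task` layout, the `gs_excluded` field, and the `default_pick`, `gs_take_control`, and `gs_release_control` functions are hypothetical and are not the actual Linux scheduler code or its data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical run-queue entry; `next` is the base scheduler's own link. */
struct task {
    const char *name;
    int prio;            /* larger = preferred in this sketch */
    bool gs_excluded;    /* exclusion notation set by the GS framework */
    struct task *next;
};

/* Default-scheduler stand-in: scan the unchanged run queue, skip excluded tasks. */
static struct task *default_pick(struct task *runqueue) {
    struct task *best = NULL;
    for (struct task *t = runqueue; t != NULL; t = t->next) {
        if (t->gs_excluded)
            continue;                 /* the only new test in the existing loop */
        if (best == NULL || t->prio > best->prio)
            best = t;
    }
    return best;
}

/* Taking or releasing GS control flips a flag; the run queue is never relinked. */
static void gs_take_control(struct task *t)    { t->gs_excluded = true; }
static void gs_release_control(struct task *t) { t->gs_excluded = false; }
```

One plausible reason for this choice: the base scheduler's bookkeeping for the task (queue position, accounting) stays intact, so handing the task back to the default scheduler is just clearing the flag.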
Middleware Implementation
Currently controls only threads, at user level; part of the DARPA PCES2 project. Layering on top of the supplied Linux scheduler requires indirect control through the available mechanisms. Managing and managed threads are separated into equivalence classes that determine CPU use, using the fixed-priority POSIX scheduling model as implemented by Linux SCHED_FIFO.
Middleware Implementation
The scheduler has two threads: the SSDT thread selects the current thread, and the API thread processes group operation requests. A Block Catcher detects when the current thread blocks and signals the SSDT thread. SIGSTOP and SIGCONT control whether a thread is available for execution. The model is incomplete because it cannot know when a previously blocked thread becomes unblocked.
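The SIGSTOP/SIGCONT mechanism can be sketched with child processes standing in for managed threads. This is an assumption worth stating: on current Linux (NPTL), SIGSTOP stops an entire process, so per-thread control this way depends on a threading model where each thread is its own process, as with the older LinuxThreads library. The helpers `spawn_worker`, `gs_switch`, and `wait_state` are hypothetical; only `fork`, `kill`, `pause`, and `waitpid` are standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a managed thread: a child process that idles in pause(). */
static pid_t spawn_worker(void) {
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;   /* child pid in the parent, or -1 if fork failed */
}

/* Make `next` the current component: stop the old one, continue the new one.
 * A pid <= 0 means "none" (never pass 0 to kill(): it signals the whole group).
 * Returns 0, or -1 if a kill() call failed. */
static int gs_switch(pid_t prev, pid_t next) {
    if (prev > 0 && prev != next && kill(prev, SIGSTOP) != 0)
        return -1;
    if (next > 0 && kill(next, SIGCONT) != 0)
        return -1;
    return 0;
}

/* Wait for a stop (WUNTRACED) or continue (WCONTINUED) report from a child. */
static int wait_state(pid_t pid, int options) {
    int status = 0;
    if (waitpid(pid, &status, options) != pid)
        return -1;
    return status;
}
```

A switch from A to B is then `gs_switch(a, b)`, and `wait_state(a, WUNTRACED)` confirms that A really stopped. Note what the mechanism cannot do: nothing in it reports when a blocked (but not stopped) worker becomes unblocked, which is the gap noted above.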
Thread Priority Classes
The Reaper spawns the scheduler and then blocks. Within the scheduler, the SSDT thread chooses the current thread, the API thread processes group operation requests, and the Block Catcher detects that the current thread has blocked and signals the SSDT thread. Non-current threads are both at lower priority and stopped with SIGSTOP. Ordinary Linux threads are at level 0.
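One way to lay out these classes under SCHED_FIFO is sketched here. The exact ordering is an assumption beyond what the slide states (only "non-current threads are lower" is given): the SSDT and API threads sit at the top, the current thread below them, and the Block Catcher just below the current thread, so that under fixed-priority scheduling the Block Catcher can only run when the current thread (and the scheduler threads) are blocked. The role names, `gs_role_priority`, and `gs_apply_role` are hypothetical; `sched_get_priority_max` and `sched_setscheduler` are standard POSIX calls, and the latter usually needs root or CAP_SYS_NICE on Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Hypothetical roles, from most to least important under the assumed ordering. */
enum gs_role {
    ROLE_SSDT,           /* chooses the current thread */
    ROLE_API,            /* processes group operation requests */
    ROLE_CURRENT,        /* the thread the SSDT has chosen */
    ROLE_BLOCK_CATCHER,  /* runs only when everything above it is blocked */
    ROLE_NONCURRENT      /* also SIGSTOPped, so normally never runs */
};

/* SCHED_FIFO priority for a role: count down from the policy maximum. */
static int gs_role_priority(enum gs_role role) {
    int max = sched_get_priority_max(SCHED_FIFO);
    if (max < 0)
        return -1;
    return max - (int)role;
}

/* Apply a role to a process or thread id; typically requires privileges. */
static int gs_apply_role(pid_t pid, enum gs_role role) {
    struct sched_param sp = { 0 };
    sp.sched_priority = gs_role_priority(role);
    if (sp.sched_priority < 0)
        return -1;
    return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

Ordinary Linux threads at level 0 are not modeled here; they run under the default time-sharing policy and so only get the CPU when no SCHED_FIFO thread is runnable.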
Context Switch Event Sequence
Thread A is the current thread. A timer or other event then blocks or pre-empts Thread A. The scheduling thread runs, selects Thread B, and blocks in nanosleep. The context switch to Thread B follows, and B begins its execution.
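The sequence can be sketched as the SSDT thread's loop: on each wake-up it selects the next thread, performs the switch if the choice changed, and then blocks in `nanosleep` so the chosen thread gets the CPU. In this sketch only quantum expiry wakes the loop (the Block Catcher's signal is not modeled), and the `select_fn`/`switch_fn` hooks, `ssdt_loop`, and the round-robin demo policy are hypothetical; in the middleware the switch hook would be the SIGSTOP/SIGCONT pair.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical hooks: choose the next component and perform the switch. */
typedef int (*select_fn)(void *ctx, int current);
typedef void (*switch_fn)(void *ctx, int from, int to);

/* Sleep for a full quantum, resuming after any signal interruption. */
static int sleep_quantum(long quantum_ns) {
    struct timespec req = { .tv_sec = quantum_ns / 1000000000L,
                            .tv_nsec = quantum_ns % 1000000000L };
    struct timespec rem;
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}

/* One pass per wake-up: select, switch if needed, then block in nanosleep
 * so the chosen thread runs. Returns the final current id, or -1 on error. */
static int ssdt_loop(void *ctx, int current, int rounds, long quantum_ns,
                     select_fn select_next, switch_fn do_switch) {
    for (int i = 0; i < rounds; i++) {
        int next = select_next(ctx, current);
        if (next != current) {
            do_switch(ctx, current, next);
            current = next;
        }
        if (sleep_quantum(quantum_ns) != 0)
            return -1;
    }
    return current;
}

/* Demo policy: round robin over ids 0..n-1, counting switches. */
struct rr_demo { int n; int switches; };

static int rr_select(void *ctx, int current) {
    struct rr_demo *d = ctx;
    return (current + 1) % d->n;
}

static void count_switch(void *ctx, int from, int to) {
    struct rr_demo *d = ctx;
    (void)from;
    (void)to;
    d->switches++;
}
```

Running four 1 ms rounds over three ids starting from id 0 visits 1, 2, 0, 1, performing four switches; every A-to-B switch costs at least two context switches (into and out of the SSDT thread), which is part of the middleware overhead discussed next.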
Middleware Implementation Tradeoffs
A portable, standards-based implementation: POSIX fixed-priority scheduling and socket-based access to the group API. Context-switch delay is significantly greater than in the existing kernel-based implementation, because of the SSDT thread's context switch, plus the Block Catcher's as well if the current thread blocks. The most significant need is notification to the SSDT thread when a thread unblocks, as scheduler activations would provide.
Performance Evaluation
Metrics: scheduling overhead and context-switch latency (A to B). Parameters: number of processes; CPU-bound or I/O-bound workload; user-level or kernel implementation; others, such as signal delivery details and semantics.
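As a rough, portable stand-in for an A-to-B latency measurement, a common pipe ping-pong microbenchmark passes one byte back and forth between two processes and divides the elapsed time by the number of one-way handoffs. This is not the instrumentation behind the results reported here; `handoff_latency_ns` and `now_ns` are hypothetical helpers. The figure includes pipe read/write cost, and on a multiprocessor the two processes may run on different CPUs, so both would need to be pinned to one CPU for every handoff to be a true context switch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Ping-pong one byte between a parent ("A") and a child ("B") over two pipes.
 * Returns the mean time per one-way handoff in ns, or -1.0 on error. */
static double handoff_latency_ns(int iterations) {
    int ab[2], ba[2];
    char tok = 'x';
    if (iterations <= 0 || pipe(ab) != 0)
        return -1.0;
    if (pipe(ba) != 0) {
        close(ab[0]); close(ab[1]);
        return -1.0;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(ab[0]); close(ab[1]); close(ba[0]); close(ba[1]);
        return -1.0;
    }
    if (pid == 0) {                       /* B: echo every token back to A */
        close(ab[1]); close(ba[0]);
        for (int i = 0; i < iterations; i++)
            if (read(ab[0], &tok, 1) != 1 || write(ba[1], &tok, 1) != 1)
                _exit(1);
        _exit(0);
    }
    close(ab[0]); close(ba[1]);
    int ok = 1;
    long long start = now_ns();
    for (int i = 0; i < iterations && ok; i++)
        ok = write(ab[1], &tok, 1) == 1 && read(ba[0], &tok, 1) == 1;
    long long elapsed = now_ns() - start;
    close(ab[1]); close(ba[0]);           /* B sees EOF if it is still reading */
    waitpid(pid, NULL, 0);
    return ok ? (double)elapsed / (2.0 * iterations) : -1.0;
}
```

Varying the number of processes, or making B do CPU-bound work between handoffs, would mirror two of the parameters listed above.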
Context Switch Overhead - Kernel
Kernel Performance
Constant with respect to CPU-bound or I/O-bound load, and considerably lower than the MW version: the SSDT is simple, and no signal delivery is required.
Context Switch Overhead – Compute Bound – MW
Context Switch Overhead – Blocking – MW
Middleware Performance
Differs between CPU-bound and I/O-bound loads. Requires signal delivery, and the Block Catcher mechanism adds latency. Considerably higher than the kernel version, even with a simple SSDT. Some extension to existing system semantics is required for completeness: an unblocking-notification upcall.
Group Scheduling – Summary
Provides a flexible control framework within which resource control and distributed end-to-end scheduling constraints can be expressed and enforced. The portable middleware version is limited by the lack of an unblocking-notification upcall. Possible sources of such notification: a simple implementation under KURT-Linux, ACE system call wrappers, and VxWorks thread state change notification.
Current Status
Integration of all KURT-Linux OS computational components under the group scheduling framework was recently completed as the topic of Michael Frisbie’s Master’s thesis. We are currently working on group scheduling control of service classes in Event Channel and TAO-based computations, including control of middleware threads, queues, etc.
Future Work
Middleware use of group scheduling to support service classes in the Event Channel and TAO. Concurrency constraint representation in KURT-Linux, to permit fine-grain control of computation components under group scheduling. Experimentation with application-aware scheduling decision functions. Integrated DSKI/DSUI instrumentation to diagnose and deduce scheduling-related optimizations and fine-grain points of inefficiency ("cruft sleuthing").