Published by Noreen Merritt. Modified over 9 years ago.
1
Scheduling as a Learned Art*
Christopher Gill, William D. Smart, Terry Tidwell, and Robert Glaubius
{cdgill, wds, ttidwell, rlg1}@cse.wustl.edu
Department of Computer Science and Engineering, Washington University, St. Louis, MO, USA
Fourth International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2008)
July 1, 2008, Prague, Czech Republic
*Research supported in part by NSF awards CNS-0716764 (Cybertrust) and CCF-0448562 (CAREER)
2
2 - Gill et al. – 9/14/2015
Motivation: Systems with (some) Autonomy
Interact with variable environment
»Varying degrees of autonomy
»Performance is deadline sensitive
Many activities must run at once
»Device interrupt handling, computation
»Communication with other systems/operators
Need reliable activity execution
»Scheduling with shared resources and competing, variable execution times
»How to guarantee utilizations?
[Figure labels: Remote Operator Station (for all but full autonomy); Wireless Communication; Lewis Media and Machines Lab, Washington University, St. Louis, MO, USA]
3
More Generally, Open Soft Real-Time Systems
Questions of interest are relevant well beyond mobile robotics
»Robotics is a good touchstone, though
»In many systems, platform features interact with physical environment
»Especially with increased embedding of OS/RTOS platforms everywhere ;-)
Abstract view of the problem
»Diverse concurrent application tasks
»Task execution times are variable
»(Soft) deadlines on application tasks
»Resources shared among tasks
»Need methods to design and verify scheduling policies accordingly
What Other Kinds of Embedded Systems Have Similar Platform Constraints?
4
Current System Model
Threads of execution depend on a shared resource
»Require mutually exclusive access (e.g., to a CPU) to run
Each thread binds the resource when it runs
»A thread binds resource for a duration then releases it
»Model duration with integer variables: count time quanta
Variable execution times with known distributions
»We assume that each thread’s run-time distribution is known and bounded, and independent of the others
Non-preemptive scheduler (repeats perpetually)
»Scheduler chooses which thread to run (based on policy)
»Scheduler dispatches thread which runs until it yields
»Scheduler waits until the thread releases the resource
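The non-preemptive loop above can be sketched as a small simulation. This is only an illustration under stated assumptions: the thread names, the run-time distributions, and the `least_served` policy are hypothetical and do not come from the slides.

```python
import random

# Hypothetical bounded run-time distributions: quanta -> probability.
RUN_TIME_DISTS = {
    "thread_a": {1: 0.5, 2: 0.5},
    "thread_b": {1: 0.2, 3: 0.8},
}

def sample_run_time(dist, rng):
    """Draw an integer number of time quanta from a discrete distribution."""
    durations = list(dist)
    weights = [dist[d] for d in durations]
    return rng.choices(durations, weights=weights, k=1)[0]

def run_scheduler(policy, steps, seed=0):
    """Repeat: choose a thread, dispatch it, wait until it releases the resource."""
    rng = random.Random(seed)
    usage = {name: 0 for name in RUN_TIME_DISTS}  # quanta consumed per thread
    for _ in range(steps):
        thread = policy(usage)                    # scheduler chooses by policy
        duration = sample_run_time(RUN_TIME_DISTS[thread], rng)
        usage[thread] += duration                 # thread held the resource, then released it
    return usage

def least_served(usage):
    """Illustrative policy (an assumption): run the thread with the fewest quanta so far."""
    return min(usage, key=lambda name: (usage[name], name))

usage = run_scheduler(least_served, steps=100)
```

Because the scheduler never preempts, each loop iteration advances time by exactly the sampled run time of the dispatched thread.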
5
Uncertainty (but with Observability Post-Hoc)
[Figure: two plots with axes labeled time and probability]
We summarize system state as a vector of integers
»Represent thread utilizations
Threads’ run times come from known, bounded distributions
Scheduling a thread changes the system’s (utilization) state
»Utilization is observed after the thread runs based on its run time
»State transition probabilities are based on the run time distributions
This forms a basis for policy design and optimization
From Tidwell et al., ATC 2008
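As a minimal sketch of how run-time distributions induce state transition probabilities, the function below maps a utilization state and a dispatched thread to a distribution over next states. The state encoding (cumulative quanta per thread) and the example numbers are assumptions made for illustration.

```python
def transitions(state, thread, run_time_dist):
    """Return {next_state: probability} for dispatching `thread` in `state`.

    `state` is a tuple of integers (quanta consumed per thread, an assumed
    encoding); `run_time_dist` maps a duration in quanta to its probability.
    """
    result = {}
    for duration, prob in run_time_dist.items():
        next_state = list(state)
        next_state[thread] += duration            # only the dispatched thread advances
        key = tuple(next_state)
        result[key] = result.get(key, 0.0) + prob
    return result

# Hypothetical distribution: 1 quantum with probability 0.25, 2 quanta with 0.75.
dist = {1: 0.25, 2: 0.75}
next_states = transitions((3, 5), 0, dist)
# next_states == {(4, 5): 0.25, (5, 5): 0.75}
```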
6
From Thread Run Times to a Scheduling Policy
We model thread scheduling decisions as a Markov Decision Process (MDP) based on thread run times (from ATC ’08)
MDP is given by 4-tuple: (X,A,R,T)
»X: set of process states (i.e., thread utilization states)
»A: set of actions (i.e., scheduling a particular thread)
»R: reward function for taking an action in a state
  Expected utility of taking that action
  Distance of the next state(s) from a desired utilization (vector)
»T: transition function
  For each action, encodes the probability of moving from a given state to another state
Solve MDP: optimal (per accumulated reward) policy
Fold periodic states: smaller space (recent advance)
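A small worked instance of the (X,A,R,T) formulation may help. The sketch below is not the authors' solver: it keeps the state space finite by truncating at a fixed number of quanta (`HORIZON`, an assumption) instead of folding periodic states, and it solves the resulting acyclic MDP by backward induction. The distributions, target shares, and discount factor are all hypothetical.

```python
from functools import lru_cache

DISTS = [{1: 0.5, 2: 0.5}, {1: 1.0}]  # hypothetical run-time distributions per thread
TARGET = (0.5, 0.5)                    # hypothetical desired utilization shares
HORIZON = 12                           # assumption: stop once this many quanta are used
GAMMA = 0.9                            # hypothetical discount factor

def reward(state):
    """R: negative L1 distance between observed and target utilization shares."""
    total = sum(state)
    return -sum(abs(x / total - t) for x, t in zip(state, TARGET))

def successors(state, action):
    """T: yield (probability, next_state) for dispatching thread `action`."""
    for duration, prob in DISTS[action].items():
        nxt = list(state)
        nxt[action] += duration
        yield prob, tuple(nxt)

@lru_cache(maxsize=None)
def q_value(state, action):
    """Expected reward of the next state plus discounted value from there."""
    return sum(p * (reward(n) + GAMMA * value(n)) for p, n in successors(state, action))

@lru_cache(maxsize=None)
def value(state):
    """Optimal accumulated reward from `state`; zero past the horizon."""
    if sum(state) >= HORIZON:
        return 0.0
    return max(q_value(state, a) for a in range(len(DISTS)))

def policy(state):
    """Greedy action with respect to the computed values."""
    return max(range(len(DISTS)), key=lambda a: q_value(state, a))
```

Every action consumes at least one quantum, so the truncated state graph has no cycles and the memoized recursion terminates. For example, `policy((4, 0))` selects thread 1, the thread that has been starved so far.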
7
Partial Observability
Local CPU usage is pretty easy to observe exactly
»E.g., using Pentium tick counter, or other good time source
However, other key properties are noisier
»E.g., robot location indoors
  No GPS “position sensor”; wheel slip etc. adds noise during motion
»How does this relate to scheduling?
  What if we consider robot’s progress along a navigation path … as an activity which must compete for resources with others?
  Then, robot’s position becomes part of the scheduling state
  Similar issues may arise for other scheduling cases (e.g., in CPS)
Noise in observation produces partial observability
»E.g., multiple different positions can be equally likely
»Possible approach: Partially Observable MDPs (POMDPs)
  Reason on belief states to get MDP transition function (a big space)
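Reasoning on belief states typically rests on a Bayesian belief update, which can be sketched as below. The two-position robot model, the action name, and the sensor probabilities are hypothetical values chosen only to make the arithmetic visible.

```python
def belief_update(belief, action, observation, trans, obs_model):
    """Standard POMDP belief update:
    b'(s') is proportional to O(o | s') * sum over s of T(s' | s, a) * b(s).
    """
    states = list(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = {
        s2: sum(trans[action][s][s2] * belief[s] for s in states) for s2 in states
    }
    # Correction step: weight each predicted state by the observation likelihood.
    unnorm = {s2: obs_model[s2][observation] * predicted[s2] for s2 in states}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {s2: w / total for s2, w in unnorm.items()}

# Hypothetical model: the robot is "near" or "far" from a waypoint.
trans = {"advance": {"near": {"near": 1.0, "far": 0.0},
                     "far": {"near": 0.8, "far": 0.2}}}
obs_model = {"near": {"close": 0.7, "distant": 0.3},
             "far": {"close": 0.3, "distant": 0.7}}

prior = {"near": 0.5, "far": 0.5}
posterior = belief_update(prior, "advance", "close", trans, obs_model)
```

In this example the predicted belief is 0.9 for "near" and 0.1 for "far", and the noisy "close" reading sharpens it to 0.63/0.66 for "near".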
8
Observation Lag
State observations also may incur temporal lag
»E.g., detailed scan of area with a range finding laser
»However, time passes while the scan is being taken
»Robot or environment may move while scan is being done
As with partial observability, need a new extension to basic MDP model to address observation lag
»In Semi-MDPs (SMDPs), an action causes 1 state change
»SMDP extensions to MDPs exist for finding optimal policy
9
Neglect Tolerance
Need to schedule >1 entire-system behavior at once
»Can transform into scheduling interim sub-tasks as before
»However, a behavior has own (possibly dynamic) structure
»Navigation to cover a room, while mapping its boundary
Resource contention, control/data dependence
»Scheduling becomes a multi-criteria optimization
»Sub-tasks may have (potentially hard) deadlines
»E.g., decide to turn or stop before hitting a wall
Spectrum: remote control to complete autonomy
»Higher neglect tolerance needs more on-board scheduling
»Uncertainty, observability, temporal lag issues as before
»Open problem: formalize tractably, model parametrically
»Multi-disciplinary (RT/ML) approach so far is still needed
10
Learning (aka “Good Scheduler, Bad Scheduler”)
We base scheduling decisions on a value function
»Captures state-action notion of long-term utility
  Based on expected rewards from current and future actions
»But, knowing complete distributions is daunting in practice
Reinforcement learning appears promising for this
»A stochastic variant of dynamic programming
»Control decisions learned from direct observation
Start by dividing time into discrete steps
»At each step, system is in one of a discrete set of states
»Scheduler observes state, chooses action from finite set
»Running action changes system state at next time step
»Scheduler receives reward for immediate effect of action
»Estimates value function, resulting model is exactly MDP
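The observe, act, reward, update cycle above can be illustrated with tabular Q-learning, one common reinforcement learning method; the slides do not say which algorithm the authors use. In this sketch the state is the clipped difference in quanta consumed by two threads, and the run-time distribution is hidden inside `step`. The clipping limit, rewards, and learning parameters are all assumptions.

```python
import random

LIMIT = 4                          # assumption: clip the usage gap to keep states finite
STATES = range(-LIMIT, LIMIT + 1)  # gap = quanta used by thread 0 minus thread 1
ACTIONS = (0, 1)                   # index of the thread to dispatch

def step(gap, action, rng):
    """Hidden environment: the chosen thread runs 1 or 2 quanta (unknown to the learner)."""
    duration = rng.choice((1, 2))
    gap += duration if action == 0 else -duration
    gap = max(-LIMIT, min(LIMIT, gap))
    return gap, -abs(gap)          # reward penalizes imbalance between the threads

def q_learning(steps=50_000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Estimate state-action values from observed transitions only."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    gap = 0
    for _ in range(steps):
        if rng.random() < epsilon:                 # explore
            action = rng.choice(ACTIONS)
        else:                                      # exploit current estimates
            action = max(ACTIONS, key=lambda a: q[(gap, a)])
        nxt, reward = step(gap, action, rng)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(gap, action)] += alpha * (reward + gamma * best_next - q[(gap, action)])
        gap = nxt
    return q

def greedy(q, gap):
    """Scheduling decision read off the learned value function."""
    return max(ACTIONS, key=lambda a: q[(gap, a)])

q = q_learning()
```

After training, `greedy(q, 2)` should prefer thread 1 and `greedy(q, -2)` thread 0, that is, the learned policy dispatches whichever thread has fallen behind, without ever being given the run-time distributions.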
11
Related Work
Reference monitor approaches
»Interposition architectures
  E.g., Ostia: user/kernel-level (Garfinkel et al.)
»Separation kernels
  E.g., ARINC-653, MILS (Vanfleet et al.)
Scheduling policy design
»Hierarchical scheduling
  E.g., HLS and its extensions (Regehr et al.)
  E.g., Group scheduling (Niehaus et al.)
State space construction and verification
»(Timed automata) model checking
  E.g., IF (Sifakis et al.)
»Quasi-cyclic state space reduction
  E.g., Bogor (Robby et al.)
12
Concluding Remarks
MDP approach maintains rational scheduling control
»Even when thread run times vary stochastically
»Encodes rather than presupposes utilizations
»Allows policy verification (e.g., over utilization states)
Ongoing and Future Work
»State space reduction via quasi-cyclic structure
»Verification over continuous/discrete states
»Kernel-level non-bypassable policy enforcement
»Automated learning to discover scheduling policies
  E.g., via RL for MDPs, POMDPs, SMDPs
Project web page
»Supported by NSF grant CNS-0716764
»http://www.cse.wustl.edu/~cdgill/Cybertrust/