
1 Scheduling Policy Design for Stochastic Non-preemptive Real-time Systems*
Chris Gill, Associate Professor
Department of Computer Science and Engineering, Washington University, St. Louis, MO, USA
CSE 131 Guest Lecture, September 12, 2011
*Research supported by NSF grants CNS (Cybertrust) and CCF (CAREER) and driven by numerous contributions from Drs. Robert Glaubius (PhD 2009) and Terry Tidwell (PhD 2011); undergraduates Braden Sidoti, David Pilla, Justin Meden, Eli Lasker, Micah Wylde, Carter Bass, Cameron Cross, and Percy Fang; and Prof. William D. Smart

2 Why Good Schedules Are Needed
System resources are usually limited: adding more raises cost, weight, and power, so there's rarely enough to go around.
Activities contend for resources: aiming the camera to find/photograph faces vs. aiming the camera to find/avoid obstacles. How do we share the camera between these?
Scheduling access to resources allows resources to be shared, and raises many other interesting questions.
[Image: Lewis, Media and Machines Lab, Washington University]

3 Why Good Schedules are Difficult to Find
How long an activity needs a resource may vary; we'll focus mainly on this issue in today's talk.
Issues we've addressed beyond that basic problem: we may have to learn distributions of times on-line, and different operating modes have different distributions.
[Figure: histogram of image capture times with occlusion modes; axes: probability vs. time]

4 Developing a System Model: a Good Start
A system model helps capture a problem rigorously: it gives a sound basis for reasoning about the problem and focuses attention on particular kinds of analysis.
It identifies the important abstractions to work with, for example resources, activities, and shares.
It captures key assumptions about the problem, e.g., is time treated as discrete or continuous? Is data available before, during, or after run-time?

5 Basic Scheduling Problem System Model
Time is considered to be discrete (e.g., a Linux jiffy is the time quantum).
Separate activities require a shared resource; access is mutually exclusive (an activity binds the resource), and binding intervals are independent and non-preemptive.
Each activity's distribution of intervals is known up front.
Goal: guarantee each activity a utilization fraction, for example 1/2 and 1/2, or 1/3 and 2/3.
We want to define a scheduling policy (which decides which activity gets the resource when) that best fits that goal.

6 Formal System Model Representation
A state space describes such a system model well: circles represent different combinations of utilizations, with the lower left corner at (0,0).
Vertical transitions give quanta to one activity; horizontal transitions give quanta to the other.
A dashed ray shows the goal, e.g., a 1/3 vs. 2/3 share (x, y).
The number of dimensions is the number of activities; this generalizes to 3-D, ..., n-D.
[Figure: 4x4 grid of utilization states from (0,0) to (3,3), with the goal ray overlaid]

7 Dealing with Uncertainty
Scheduling is easy if the resource is bound for one quantum at a time: just move closest to the goal ray.
However, we have a probability distribution of binding times, so there are multiple possibilities per action.
We need to consider the probable consequences of each action, which leads to our use of a Markov Decision Process (MDP) approach.
[Figures: two binding-time histograms; axes: probability vs. time]
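To make "considering probable consequences" concrete, here is a minimal sketch (illustrative names only, not the authors' code) of the expected immediate cost of an action, averaging a deviation-from-goal cost over that activity's binding-time distribution. The greedy heuristic described on slide 21 essentially picks the action that minimizes this quantity.

```python
def expected_cost(state, action, duration_dists, cost):
    """Expected immediate cost of granting the resource to `action`.

    `duration_dists[action]` maps each possible binding length (in quanta) to its
    probability; `cost` penalizes deviation from the target utilization ray."""
    total = 0.0
    for length, p in duration_dists[action].items():
        succ = list(state)
        succ[action] += length          # one possible outcome of this binding
        total += p * cost(tuple(succ))
    return total
```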

8 From Binding Times to a Scheduling MDP
We model these scheduling decisions as a Markov Decision Process (MDP) over use of the resource.
The MDP is given by a 4-tuple (X, A, R, T):
X: the set of resource utilization states (how much use)
A: the set of actions (giving the resource to an activity)
R: the reward function for taking an action in a state (how close to the goal ray are we likely to remain)
T: the transition function (probability of moving from one state to another state)
We want to solve the MDP to obtain a locally optimal policy.
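As one way to picture the 4-tuple, the sketch below (an illustrative Python rendering, not the authors' implementation) represents X implicitly as tuples of cumulative quanta, A as activity indices, and T as a function built from the known binding-time distributions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Tuple[int, ...]   # cumulative quanta used by each activity, e.g. (x1, x2)
Action = int              # index of the activity that receives the resource next

@dataclass
class SchedulingMDP:
    """Illustrative container for the (X, A, R, T) tuple on this slide."""
    actions: Tuple[Action, ...]                                 # A
    reward: Callable[[State, Action], float]                    # R(x, a)
    transition: Callable[[State, Action], Dict[State, float]]   # T: successor -> probability

def make_transition(duration_dists):
    """Build T from known binding-time distributions: `duration_dists[a]` maps a
    binding length (in quanta) to its probability."""
    def T(state: State, a: Action) -> Dict[State, float]:
        out: Dict[State, float] = {}
        for length, p in duration_dists[a].items():
            succ = list(state)
            succ[a] += length                      # activity a binds the resource
            out[tuple(succ)] = out.get(tuple(succ), 0.0) + p
        return out
    return T
```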

9 Policy Iteration Approach
Define a cost function r(x) that penalizes deviation from the target utilization ray.
Start with some initial policy π0.
Repeat for t = 0, 1, 2, ...
Compute the value Vt(x) -- the accumulated cost of following πt -- for each state x.
Obtain a new policy, πt+1, by choosing the greedy action at each state.
This is guaranteed to converge to the optimal policy, but requires storing Vt and πt in lookup tables. (A sketch of this loop appears below.)
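Here is a generic policy-iteration sketch over an already-bounded, finite state set, using a discounted accumulated cost; the discount factor and the linear-system evaluation step are assumptions of this illustration, not details taken from the slides.

```python
import numpy as np

def policy_iteration(states, actions, cost, transition, gamma=0.95):
    """Policy iteration over an already-bounded, finite list of states.

    cost(x) penalizes deviation from the target utilization ray;
    transition(x, a) returns a {successor: probability} dict."""
    index = {x: i for i, x in enumerate(states)}
    policy = {x: actions[0] for x in states}            # some initial policy pi_0
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = c for the current policy.
        P = np.zeros((len(states), len(states)))
        c = np.array([cost(x) for x in states])
        for x in states:
            for succ, p in transition(x, policy[x]).items():
                if succ in index:                       # ignore states outside the model
                    P[index[x], index[succ]] += p
        V = np.linalg.solve(np.eye(len(states)) - gamma * P, c)
        # Policy improvement: choose the greedy action in each state.
        improved = {x: min(actions,
                           key=lambda a: sum(p * V[index[s]]
                                             for s, p in transition(x, a).items()
                                             if s in index))
                    for x in states}
        if improved == policy:                          # fixed point: converged
            return policy, V
        policy = improved
```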

10 Can’t do Policy Iteration Quite Yet
Unfortunately, the state space we have is infinite, so we can't apply MDP solution techniques directly to the state space as it stands.
We need to bound the state space to solve for a policy.
Our approach: reduce the state space to a set of equivalence classes.
RLG: The story arc here is to point at the image here (prev. seen on slide 5) and notice that the state space extends infinitely. Thus we can't use policy iteration, since we can't store the policies and value functions. We need a way to reduce the size of the problem. Our approach is to collapse the state space into a number of equivalent states. The next slide goes into more detail.

11 Insight: State Value Equivalence
Two states co-linear along the target ray have the same cost.
They also have the same relative distribution of costs over future states (independent actions).
Any two states with the same cost have the same optimal value!
RLG: Need to make sure to qualify statements on this slide -- the result that two states with the same cost have the same value is NOT generally true, and only works because of the self-similarity of transitions and costs across the entire state space.
RLG: I'd tell the story as follows. We have a cost function that is based on the distance between x and the utilization ray (if questioned, the cost is actually the difference between x and the point where the utilization ray has the same cumulative utilization as x, ||x||). Under this definition of cost, states with displacement parallel to the utilization vector (co-linear along the utilization ray) have equal cost [advance animation, point out two examples of pairs of states with equal cost]. Since threads in this model have independent invocations, every state has a similar distribution over future states as well. In the paper, we show that these two premises result in states with equal cost having equal value.
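Following the note above (cost as the distance between x and the point on the utilization ray with the same cumulative utilization), here is a minimal sketch of such a cost function, with illustrative names:

```python
import numpy as np

def cost(x, shares):
    """Deviation of state x from the target utilization ray: the distance from x
    to the point on the ray with the same cumulative utilization (same sum)."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(shares, dtype=float) / np.sum(shares)   # e.g., (1/3, 2/3)
    on_ray = np.sum(x) * u                                 # point on the ray at x's horizon
    return float(np.linalg.norm(x - on_ray))

# States displaced by one step along the ray have equal cost:
# cost((1, 1), (1, 2)) == cost((2, 3), (1, 2))
```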

12 Technique: State Wrapping
This lets us collapse the equivalent states down into a set of exemplar states.
Notice how arrows (successors) wrap back into "earlier" states.
Now we can add "absorbing" states to bound the space: far enough from the target ray, the best decision is clear.
Now we can use policy iteration to obtain a policy (see the wrapping sketch below).
RLG: There are a couple of things that are important to stress here. One is that the wrapping action is not sufficient to make the set of states finite, but that there are only finitely many states with cost less than epsilon for any epsilon -- e.g., there are only finitely many states "close" to target utilization for any closeness threshold. For any state sufficiently far away from target utilization, the greedy policy and optimal policy are (I believe) the same. Therefore, we can represent all of the states at sufficient distance by absorbing states, and pass control to the greedy policy if the system ever actually ends up in them (which it shouldn't if our prior knowledge is correct and the system follows the computed policy).
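A minimal sketch of the wrapping idea, assuming rational target shares so that the smallest integer displacement along the ray (e.g., (1, 2) for a 1/3 vs. 2/3 share) is well defined; this illustrates collapsing co-linear states onto exemplars, not the paper's exact construction.

```python
def wrap_state(x, step):
    """Collapse a state onto its exemplar by sliding back along the target ray.

    `step` is the smallest integer displacement along the ray (e.g., (1, 2) for a
    1/3 vs. 2/3 share)."""
    k = min(xi // si for xi, si in zip(x, step) if si > 0)   # how far back we can slide
    return tuple(xi - k * si for xi, si in zip(x, step))

# wrap_state((3, 5), (1, 2)) -> (1, 1); wrap_state((2, 4), (1, 2)) -> (0, 0)
```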

13 Automating Model Discovery
ESPI: Expanding State Policy Iteration [3]
1. Start with a policy that only reaches finitely many states from (0, ..., 0), e.g., always run the most underutilized task.
2. Enumerate enough states to evaluate and improve that policy.
3. If the policy cannot be improved, stop.
4. Otherwise, repeat from (2) with the newly improved policy.
(A sketch of this loop follows.)
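The ESPI loop restated as a high-level sketch; the helper functions here are assumptions used only to show the control flow.

```python
def espi(initial_policy, reachable_states, evaluate_and_improve):
    """High-level sketch of the ESPI loop above (helper functions are assumed).

    `reachable_states(policy)` enumerates the finitely many states needed to
    evaluate and improve the policy starting from (0, ..., 0);
    `evaluate_and_improve(policy, states)` runs one round of policy evaluation
    followed by greedy improvement over those states."""
    policy = initial_policy            # e.g., always run the most underutilized task
    while True:
        states = reachable_states(policy)
        improved = evaluate_and_improve(policy, states)
        if improved == policy:         # step 3: no improvement possible, stop
            return policy
        policy = improved              # step 4: repeat with the improved policy
```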

14 What About Scalability?
The MDP representation allows consistent approximation of the optimal scheduling policy.
Empirically, the bounded-model and ESPI solutions appear to be near-optimal.
However, the approach scales exponentially in the number of tasks, so while it may be good for (e.g.) sharing an actuator, it won't apply directly to larger task sets.

15 What our Policies Say about Scalability
To overcome limitations of the MDP-based approach, we focus attention on a restricted class of appropriate scheduling policies.
Examining the policies produced by the MDP-based approach gives insights into choosing (and parameterizing) appropriate policies.

16 Two-task MDP Policy
Scheduling policies induce a partition on a 2-D state space with a boundary parallel to the share target.
Establish a decision offset d to identify the partition boundary.
This is sufficient in 2-D, but what about in higher dimensions?

17 Time Horizons Suggest a Generalization
Ht = {x : x1 + x2 + ... + xn = t}
The idea that needs to come across on this slide is that we can think about the problem of scheduling at every moment by partitioning time horizons.
[Figure: time horizons H0 through H4 shown for a 2-D state space (from (0,0)) and a 3-D state space (corners (2,0,0), (0,2,0), (0,0,2)), with the utilization vector u]

18 Three-task MDP Policy
Action partitions meet along a decision ray that is parallel to the utilization ray.
[Figure: three-task MDP policy shown on time horizons t = 10, t = 20, and t = 30]

19 Parameterizing a Partition
Specify a decision offset at the intersection of the partitions.
Anchor action vectors at the decision offset to approximate the partitions.
A "conic" policy selects the action vector best aligned with the displacement between the query state and the decision offset.
[Figure: action vectors a1, a2, a3 anchored at the decision offset, with a query state x]
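A small sketch of the conic selection rule, choosing the action vector with the largest cosine alignment to the displacement from the decision offset; the cosine measure is an assumption, since the slide only says "best aligned".

```python
import numpy as np

def conic_action(x, offset, action_vectors):
    """Pick the index of the action vector best aligned (by cosine) with the
    displacement from the decision offset to the query state x."""
    d = np.asarray(x, dtype=float) - np.asarray(offset, dtype=float)
    def alignment(a):
        a = np.asarray(a, dtype=float)
        denom = np.linalg.norm(a) * np.linalg.norm(d)
        return float(np.dot(a, d) / denom) if denom > 0 else 0.0
    return max(range(len(action_vectors)), key=lambda i: alignment(action_vectors[i]))
```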

20 Conic Policy Parameters
Decision offset d.
Action vectors a1, a2, ..., an.
These parameters are sufficient to partition each time horizon into n regions, and allow good policy parameters to be found through local search.
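Local search over these parameters could look like the following Monte Carlo sketch; the perturbation scheme, starting point, and `evaluate` interface are all illustrative assumptions rather than the method used in the experiments.

```python
import numpy as np

def conic_parameter_search(evaluate, n_tasks, iterations=100, step=0.5, seed=0):
    """Random local search over conic policy parameters (offset d, vectors a1..an).

    `evaluate(d, a)` is assumed to return the average cost of simulating the
    conic policy with those parameters (lower is better)."""
    rng = np.random.default_rng(seed)
    d, a = np.zeros(n_tasks), np.eye(n_tasks)   # start: axis-aligned action vectors
    best = evaluate(d, a)
    for _ in range(iterations):
        d_new = d + step * rng.standard_normal(n_tasks)
        a_new = a + step * rng.standard_normal((n_tasks, n_tasks))
        score = evaluate(d_new, a_new)
        if score < best:                        # keep the perturbation only if it helps
            d, a, best = d_new, a_new, score
    return d, a
```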

21 Comparing Policies
Policy found by ESPI (for small numbers of tasks):
πESPI(x) – chooses the action at state x per the solved MDP
Simple heuristics (for all numbers of tasks):
πunderused(x) – runs the most underutilized task
πgreedy(x) – minimizes immediate cost from state x
Conic approach (for all numbers of tasks):
πconic(x) – selects the action with the best-aligned action vector

22 Policy Comparison on a 4 Task Problem
Task durations: random histograms over [2,32].
100 iterations of Monte Carlo conic parameter search.
ESPI outperforms; the conic policy eventually approximates it well.

23 Policy Comparison on a Ten Task Problem
Repeated the same experiment for 10 tasks.
ESPI is omitted (intractable here).
Conic outperforms the greedy and underutilized heuristics.

24 Comparison with Varying #s of Tasks
100 independent problems for each number of tasks (average, 95% confidence intervals).
ESPI is only tractable through all 2- and 3-task cases.
Conic approximates ESPI, then outperforms the other policies.

25 Expanding our Notion of Utility
Previously, utility was proximity to the utilization target; now we let tasks' utility and job availability* vary over time.
[Figure: a time-utility function (TUF) annotated with its name, termination time, and period boundaries along the time axis]
* The availability variable qi is defined over {0,1}; {0, tmi/pi}; or {0,1}^(tmi/pi)

26 Utility × Execution → Utility Density
A task’s time-utility function and its execution time distribution (e.g., Di(1) = Di(2) = 50%) give a distribution of utility for scheduling the task
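A minimal sketch of that combination, assuming utility is evaluated at the task's completion time (an illustrative convention) and using the slide's 50/50 duration example with a hypothetical downward-step TUF:

```python
def utility_distribution(tuf, duration_dist, start_time):
    """Distribution over utility accrued by running the task starting now.

    `tuf(t)` is the task's time-utility function, evaluated here at completion;
    `duration_dist` maps execution times to probabilities."""
    out = {}
    for duration, p in duration_dist.items():
        u = tuf(start_time + duration)
        out[u] = out.get(u, 0.0) + p
    return out

# Slide's example: D(1) = D(2) = 50%, with a hypothetical downward-step TUF that
# pays 1 until a termination time of 2 and 0 afterwards.
step_tuf = lambda t: 1.0 if t <= 2 else 0.0
print(utility_distribution(step_tuf, {1: 0.5, 2: 0.5}, start_time=1))
# {1.0: 0.5, 0.0: 0.5}
```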

27 Actions and State Space Structure
The state space can be more compact here than before: dimensions are task availability, e.g., over (q1, q2), vs. time.
We can wrap the state space over the hyper-period of all tasks (e.g., D1(1) = D2(1) = 1; tm1 = p1 = 4; tm2 = p2 = 2).
Scheduling actions induce a transition structure over states (e.g., the idle action = do nothing; action i = run task i).
[Figure: transition structures over (availability, time) states for the idle action, action 1, and action 2]
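For reference, the hyper-period over which the state space wraps is the least common multiple of the task periods; a one-line sketch:

```python
from math import lcm

def hyper_period(periods):
    """Least common multiple of the task periods: the cycle after which the
    wrapped state space on this slide repeats (e.g., p1 = 4, p2 = 2 -> 4)."""
    return lcm(*periods)

print(hyper_period([4, 2]))  # 4
```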

28 Reachable States, Successors, Rewards
States with the same task availability and the same relative position within the hyper-period have the same successor-state and reward distributions.
[Figure: reachable states within the wrapped state space]

29 Evaluation
Different TUF shapes are useful to characterize tasks' utilities (e.g., deadline-driven, work-ahead, and jitter-sensitive cases).
We chose three representative shapes (downward step, linear drop, target sensitive) and randomized their key parameters: utility bounds ui, termination times tmi, and critical points cpi (we also randomized 80/20 task load parameters: li, thi, wi).
[Figure: the three representative TUF shapes, annotated with termination times, utility bounds, and critical points]

30 How Much Better is Optimal Scheduling?
Greedy (Generic Benefit*) vs. Optimal (MDP) utility accrual.
TUF nuances matter: e.g., a work-conserving approach degrades the target-sensitive policy.
[Figure: greedy vs. optimal utility accrual for 2-, 3-, 4-, and 5-task problems]
* P. Li, PhD Dissertation, VA Tech, 2004

31 Divergence Increases with # of Tasks
Note we can solve 5 task MDPs for periodic task sets (but even representing a policy may be expensive)

32 How Should Policies be Represented?
The scheduling policy can be stored as a lookup table (size = number of states) that tells the best action to take in each (modeled) state.
How do we minimize the run-time memory cost?
What do we do about unexpected states?
How do we take advantage of heuristics?
Policy table:
State:  0   1   2   3   4   5   6   7   8   9
Action: a1  a2  a2  a1  a1  a2  a2  a2  ?   a2

33 How to minimize memory footprint?
Decision trees compactly encode tabular data, and trees can be built to approximate the policy.
Inner nodes contain predicates over state variables (e.g., x < 5, x < 3); leaf nodes contain action mappings.
[Figure: decision tree with inner predicates x < 5 and x < 3 and action leaves, alongside the policy table (0,a1) (1,a2) (2,a2) (3,a1) (4,a1) (5,a2) (6,a2) (7,a2) (8,?) (9,a2)]
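One plausible reading of the pictured tree, as a small sketch; the exact split points and leaf order are inferred from the figure, so treat this as an approximation of the table (consistent with the slide's point that trees approximate the policy).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Inner nodes hold a `state < threshold` predicate; leaves hold an action."""
    threshold: Optional[int] = None
    left: Optional["Node"] = None      # subtree taken when the predicate holds
    right: Optional["Node"] = None     # subtree taken otherwise
    action: Optional[str] = None       # set only on leaf nodes

def decide(tree, x):
    """Walk the tree to find the action for state x."""
    if tree.action is not None:
        return tree.action
    return decide(tree.left if x < tree.threshold else tree.right, x)

# Tree from the figure: predicates x < 5 and x < 3, with action leaves.
policy_tree = Node(threshold=5,
                   left=Node(threshold=3,
                             left=Node(action="a2"), right=Node(action="a1")),
                   right=Node(action="a2"))
print([decide(policy_tree, x) for x in range(10)])
# ['a2', 'a2', 'a2', 'a1', 'a1', 'a2', 'a2', 'a2', 'a2', 'a2']
```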

34 What to do about Unexpected States?
Trees abstract the structure of the encoded policy: state x = 8 is assigned a "reasonable" action (a2).
[Figure: the same decision tree and policy table as the previous slide, highlighting the unmodeled state (8, ?)]

35 How to Take Advantage of Heuristics?
Leaf nodes also can recommend heuristics, trading run-time cost for accuracy of the encoding.
[Figure: the same decision tree with one leaf replaced by the heuristic greedy(x), alongside the policy table]

36 Optimal Tree Size Varies
[Figure: histogram of optimal tree sizes; x-axis: size of tree (20 to 100), y-axis: fraction of experiments (0.1 to 0.5)]

37 Comparative Performance of Trees
[Figure: cumulative distribution of performance as a fraction of optimal for the optimal, best tree, heuristic, greedy, and pseudo policies; x-axis: fraction of optimal (0.1 to 1), y-axis: fraction of experiments (0 to 1)]

38 Publications (so far)
T. Tidwell, C. Bass, E. Lasker, M. Wylde, C. Gill, and W. D. Smart, "Scalable Utility Aware Scheduling Heuristics for Real-time Tasks with Stochastic Non-preemptive Execution Intervals," 23rd Euromicro Conference on Real-Time Systems (ECRTS '11), Porto, Portugal, July 6-8, 2011.
T. Tidwell, R. Glaubius, C. Gill, and W. D. Smart, "Optimizing Expected Time Utility in Cyber-Physical Systems Schedulers," 31st IEEE Real-Time Systems Symposium (RTSS '10), San Diego, CA, USA, November 30 - December 3, 2010.
R. Glaubius, T. Tidwell, C. Gill, and W. D. Smart, "Real-Time Scheduling via Reinforcement Learning," UAI 2010.
R. Glaubius, T. Tidwell, B. Sidoti, D. Pilla, J. Meden, C. Gill, and W. D. Smart, "Scalable Scheduling Policy Design for Open Soft Real-Time Systems," RTAS 2010 (received Best Student Paper award).
R. Glaubius, T. Tidwell, C. Gill, and W. D. Smart, "Scheduling Policy Design for Autonomic Systems," International Journal on Autonomous and Adaptive Communications Systems, 2(3), 2009.
R. Glaubius, T. Tidwell, C. Gill, and W. D. Smart, "Scheduling Design and Verification for Open Soft Real-Time Systems," RTSS 2008.
T. Tidwell, R. Glaubius, C. Gill, and W. D. Smart, "Scheduling for Reliable Execution in Autonomic Systems," ATC 2008.

39 Concluding Remarks
Markov Decision Process (MDP) models are useful for scheduling stochastic real-time systems; they outperform heuristic approaches (sometimes by a lot).
Approximations of the resulting policies also help: where possible, geometric approximations do very well; otherwise, decision trees offer good trade-offs.
Research is a team sport: undergraduates on this project outnumbered doctoral students and faculty 4:1:1.
Check out the CSE Department's summer REU program.

