Slide 1: Scheduling Policy Design for Stochastic Non-preemptive Real-time Systems*
Chris Gill, Professor of Computer Science and Engineering, Washington University, St. Louis, MO, USA
CSE 131 Guest Lecture, October 8, 2012
*Research supported by NSF grants CNS (Cybertrust) and CCF (CAREER) and driven by numerous contributions from Drs. Robert Glaubius (PhD 2009) and Terry Tidwell (PhD 2011); undergraduates Braden Sidoti, David Pilla, Justin Meden, Eli Lasker, Micah Wylde, Carter Bass, Cameron Cross, and Percy Fang; and Prof. William D. Smart

Slide 2: Why Good Schedules Are Needed
System resources are usually limited
»Adding more raises cost, weight, power
»So, there's rarely enough to go around
Activities contend for resources
»Aiming camera to find/photograph faces
»Aiming camera to find/avoid obstacles
»How to share the camera between these?
Scheduling access to resources
»Allows resources to be shared
»Raises many other interesting questions
[Photo: Lewis Media and Machines Lab, Washington University]

Slide 3: Why Good Schedules are Difficult to Find
How long an activity needs a resource may vary
»We'll focus mainly on this issue in today's talk
Issues we've addressed beyond that basic problem
»We may have to learn distributions of times on-line
»Different distributions in different operating modes
[Figure: image capture time distributions (probability vs. time) with and without occlusion]

Slide 4: Developing a System Model: a Good Start
A system model helps capture a problem rigorously
»Gives a sound basis for reasoning about the problem
»Focuses attention on particular kinds of analysis
Identifies the important abstractions to work with
»For example, resources, activities, and shares
Captures key assumptions about the problem
»E.g., is time treated as discrete or continuous?
»E.g., is data available before, during, or after run-time?

Slide 5: Basic Scheduling Problem System Model
Time is considered to be discrete
»E.g., a Linux jiffy is the time quantum
Separate activities require a shared resource
»Access is mutually exclusive (an activity binds the resource)
»Binding intervals are independent and non-preemptive
»Each activity's distribution of intervals is known up front
Goal: guarantee each activity a utilization fraction
»For example, 1/2 and 1/2, or 1/3 and 2/3
»Want to define a scheduling policy (decides which activity gets the resource when) that best fits that goal (see the sketch below)
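To make the model concrete, here is a minimal Python sketch of its ingredients: activities with known distributions over non-preemptive binding intervals and target utilization fractions. The class name, field names, and example distributions are illustrative, not taken from the talk.

```python
import random

class Task:
    """One activity: a known distribution over non-preemptive binding
    intervals (in quanta) and a target share of the resource."""
    def __init__(self, name, duration_pmf, target_share):
        assert abs(sum(duration_pmf.values()) - 1.0) < 1e-9
        self.name = name
        self.duration_pmf = duration_pmf   # {interval length: probability}
        self.target_share = target_share

    def sample_duration(self):
        # Draw one non-preemptive binding interval from the known distribution.
        lengths = list(self.duration_pmf)
        weights = [self.duration_pmf[k] for k in lengths]
        return random.choices(lengths, weights=weights)[0]

# Example: two activities sharing the resource with a 1/3 vs. 2/3 goal.
tasks = [
    Task("find_faces",      {1: 0.5, 2: 0.5},            target_share=1/3),
    Task("avoid_obstacles", {1: 0.25, 2: 0.5, 3: 0.25},  target_share=2/3),
]
```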

Slide 6: Formal System Model Representation
A state space describes such a system model well (a sketch of one cost measure over it appears below)
»Circles represent different combinations of utilizations
»Lower left corner is (0,0)
»Vertical transitions give quanta to one activity
»Horizontal transitions give quanta to the other activity
Dashed ray shows goal
»E.g., 1/3 vs 2/3 share
Number of dimensions is number of activities
»Generalizes to 3-D, …, n-D
[Figure: 2-D grid of utilization states (0,0) through (3,3), with a dashed goal ray from the origin through a state (x, y)]
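A natural cost for such a state is its distance from the goal ray; the sketch below uses plain Euclidean distance, which is an assumption made here for illustration, since the slide does not spell out the exact deviation measure.

```python
import math

def distance_to_ray(state, shares):
    """Euclidean distance from `state` (cumulative quanta per activity)
    to the goal ray through the origin in direction `shares`."""
    norm = math.sqrt(sum(s * s for s in shares))
    u = [s / norm for s in shares]                 # unit vector along the ray
    proj = sum(x * ui for x, ui in zip(state, u))  # scalar projection onto the ray
    foot = [proj * ui for ui in u]                 # closest point on the ray
    return math.sqrt(sum((x - f) ** 2 for x, f in zip(state, foot)))

print(distance_to_ray((1, 2), (1/3, 2/3)))  # 0.0: this state sits on the goal ray
print(distance_to_ray((3, 1), (1/3, 2/3)))  # > 0: this state deviates from it
```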

Slide 7: Dealing with Uncertainty
[Figure: two binding-time distributions (probability vs. time)]
Easy if the resource is bound for one quantum at a time
»Just move closest to goal ray
However, we have a probability distribution of binding times
»Multiple possibilities per action
Need to consider probable consequences of each action
»Leads to our use of a Markov Decision Process (MDP) approach

Slide 8: From Binding Times to a Scheduling MDP
We model these scheduling decisions as a Markov Decision Process (MDP) over use of the resource
The MDP is given by the 4-tuple (X, A, R, T):
»X: the set of resource utilization states (how much use)
»A: the set of actions (giving the resource to an activity)
»R: reward function for taking an action in a state (how close to the goal ray are we likely to remain)
»T: transition function (probability of moving from one state to another state)
Want to solve the MDP to obtain a locally optimal policy (the ingredients are sketched below)
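The sketch below shows one way the four ingredients might be encoded for a two-activity problem; the specific distributions, the fractional-deviation reward, and all names are illustrative assumptions rather than the exact formulation used in the underlying papers.

```python
# X: states are tuples of cumulative quanta, e.g., (3, 1)
# A: one action per activity (give it the resource next)
duration_pmf = {
    0: {1: 0.5, 2: 0.5},   # activity 0 binds the resource for 1 or 2 quanta
    1: {2: 1.0},           # activity 1 always binds it for 2 quanta
}
shares = (1/3, 2/3)        # target utilization ray

def transition(state, action):
    """T: yield (next_state, probability) pairs for taking `action` in `state`."""
    for dur, p in duration_pmf[action].items():
        nxt = list(state)
        nxt[action] += dur
        yield tuple(nxt), p

def deviation(state):
    """How far the achieved split is from the target shares (L1 distance)."""
    total = sum(state)
    return 0.0 if total == 0 else sum(abs(x / total - s)
                                      for x, s in zip(state, shares))

def reward(state, action):
    """R: expected closeness to the goal ray after taking `action` in `state`."""
    return -sum(p * deviation(nxt) for nxt, p in transition(state, action))

print(reward((1, 1), 0), reward((1, 1), 1))  # here, giving activity 1 the resource is better
```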

Slide 9: Policy Iteration Approach
Define a cost function r(x) that penalizes deviation from the target utilization ray
Start with some initial policy π_0
Repeat for t = 0, 1, 2, …
»Compute the value V_t(x) -- the accumulated cost of following π_t -- for each state x
»Obtain a new policy, π_{t+1}, by choosing the greedy action at each state
Guaranteed to converge to the optimal policy; requires storing V_t and π_t in lookup tables (see the sketch below)
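Here is a self-contained policy-iteration sketch for a generic finite MDP with explicit tables; it minimizes accumulated cost and adds a discount factor gamma, which the slide does not mention, so treat the discounting (and all names) as assumptions of this sketch. The scheduling state space only becomes finite after the wrapping step on the following slides.

```python
def policy_iteration(states, actions, T, r, gamma=0.9, tol=1e-6):
    """T[s][a]: list of (next_state, probability); r[s]: immediate cost of s.
    Returns a cost-minimizing policy and its value function."""
    policy = {s: actions[0] for s in states}   # some initial policy pi_0
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: iterate V(s) = r(s) + gamma * E[V(s') | s, pi(s)]
        while True:
            delta = 0.0
            for s in states:
                v = r[s] + gamma * sum(p * V[s2] for s2, p in T[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: choose the greedy (cost-minimizing) action everywhere
        stable = True
        for s in states:
            best = min(actions, key=lambda a: r[s] + gamma *
                       sum(p * V[s2] for s2, p in T[s][a]))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V   # V_t and pi_t live in lookup tables, as on the slide
```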

Slide 10: Can't do Policy Iteration Quite Yet
Unfortunately, the state space we have is infinite
Can't apply MDP solution techniques directly to the state space as it stands
»Need to bound the state space to solve for a policy
Our approach
»Reduce the state space to a set of equivalence classes

Slide 11: Insight: State Value Equivalence
Two states co-linear along the target ray have the same cost
They also have the same relative distribution of costs over future states (independent actions)
Any two states with the same cost have the same optimal value!

Slide 12: Technique: State Wrapping
This lets us collapse the equivalent states down into a set of exemplar states (a wrapping sketch follows below)
»Notice how arrows (successors) wrap back into "earlier" states
Now we can add "absorbing" states to bound the space
»Far enough from the target ray, the best decision is clear
Now we can use policy iteration to obtain a policy
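A possible wrapping step is sketched below for rational shares, where the smallest integer displacement along the ray (e.g., (1, 2) for a 1/3 vs. 2/3 split) is known; the exact equivalence-class construction in the papers may differ, so this is only an illustration of the idea.

```python
def wrap(state, period):
    """Collapse states that are co-linear along the target ray onto one
    exemplar by subtracting the integer ray displacement `period` as many
    times as possible while keeping every coordinate non-negative."""
    k = min(x // p for x, p in zip(state, period))
    return tuple(x - k * p for x, p in zip(state, period))

# With a 1/3 vs. 2/3 goal, the smallest integer step along the ray is (1, 2):
print(wrap((1, 2), (1, 2)))   # (0, 0)
print(wrap((2, 4), (1, 2)))   # (0, 0) -- same exemplar as (1, 2)
print(wrap((5, 7), (1, 2)))   # (2, 1)
```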

Slide 13: Automating Model Discovery
ESPI: Expanding State Policy Iteration [3]
1. Start with a policy that only reaches finitely many states from (0,…,0); e.g., always run the most underutilized task
2. Enumerate enough states to evaluate and improve that policy
3. If the policy cannot be improved, stop
4. Otherwise, repeat from (2) with the newly improved policy
(The loop is sketched below.)
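The skeleton below mirrors those four steps; the three helper callables are placeholders standing in for the enumeration, evaluation, and improvement machinery, not the authors' implementation.

```python
def espi(initial_policy, enumerate_reachable, evaluate, improve):
    """initial_policy: e.g., 'always run the most underutilized task'.
    enumerate_reachable(policy): finite set of states reachable from (0, ..., 0).
    evaluate(policy, states):    value function over those states.
    improve(policy, values):     (new_policy, changed?) via greedy improvement."""
    policy = initial_policy
    while True:
        states = enumerate_reachable(policy)       # step 2: enumerate enough states
        values = evaluate(policy, states)          #         evaluate the policy
        policy, changed = improve(policy, values)  #         improve it where possible
        if not changed:                            # step 3: no improvement -> stop
            return policy                          # step 4: otherwise repeat the loop
```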

Slide 14: What About Scalability?
The MDP representation allows consistent approximation of the optimal scheduling policy
Empirically, the bounded-model and ESPI solutions appear to be near-optimal
However, the approach scales exponentially in the number of tasks, so while it may be good for (e.g.) sharing an actuator, it won't apply directly to larger task sets

Slide 15: What our Policies Say about Scalability
To overcome limitations of the MDP-based approach, we focus attention on a restricted class of appropriate scheduling policies
Examining the policies produced by the MDP-based approach gives insights into choosing (and into parameterizing) appropriate policies

Slide 16: Two-task MDP Policy
Scheduling policies induce a partition of the 2-D state space, with a boundary parallel to the share target
Establish a decision offset d to identify the partition boundary
Sufficient in 2-D, but what about in higher dimensions?

Slide 17: Time Horizons Suggest a Generalization
The time horizon H_t is the set of states whose coordinates sum to t:
H_t = { x : x_1 + x_2 + … + x_n = t }
[Figure: horizons H_0 through H_4 in the 2-D state space, and horizons H_0, H_1, H_2 in a 3-D state space with corner states (2,0,0), (0,2,0), (0,0,2); the utilization vector u crosses each horizon]

Slide 18: Three-task MDP Policy
Action partitions meet along a decision ray that is parallel to the utilization ray
[Figure: three-task policy partitions shown at horizons t = 10, t = 20, and t = 30]

Slide 19: Parameterizing a Partition
Specify a decision offset at the intersection of the partitions
Anchor action vectors at the decision offset to approximate the partitions
A "conic" policy selects the action vector best aligned with the displacement between the query state and the decision offset
[Figure: action vectors a_1, a_2, a_3 anchored at the decision offset, with a query state x]

Slide 20: Conic Policy Parameters
Decision offset d
Action vectors a_1, a_2, …, a_n
Sufficient to partition each time horizon into n regions
Allows good policy parameters to be found through local search (action selection is sketched below)
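The conic selection rule itself is simple; the sketch below scores each anchored action vector by cosine alignment with the displacement from the decision offset to the query state. The cosine measure and the example parameter values are assumptions made for illustration.

```python
import math

def conic_policy(state, offset, action_vectors):
    """Return the index of the action vector best aligned with the
    displacement from the decision offset to the query state."""
    disp = [x - o for x, o in zip(state, offset)]
    def alignment(v):
        dot = sum(d * vi for d, vi in zip(disp, v))
        norm = (math.sqrt(sum(d * d for d in disp)) *
                math.sqrt(sum(vi * vi for vi in v)))
        return dot / norm if norm > 0 else 0.0
    return max(range(len(action_vectors)), key=lambda i: alignment(action_vectors[i]))

# Two-task example: a decision offset d and one anchored vector per action.
d = (0.5, 0.5)
a = [(1.0, -0.5), (-0.5, 1.0)]         # directions of the cones for the two actions
print(conic_policy((3.0, 1.0), d, a))  # displacement (2.5, 0.5) falls in cone 0
```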

Slide 21: Comparing Policies
Policy found by ESPI (for small numbers of tasks)
»π_ESPI(x) – chooses the action at state x per the solved MDP
Simple heuristics (for all numbers of tasks; sketched below)
»π_underused(x) – runs the most underutilized task
»π_greedy(x) – minimizes immediate cost from state x
Conic approach (for all numbers of tasks)
»π_conic(x) – selects the action with the best-aligned action vector
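For comparison, the two simple heuristics can be written directly from their one-line descriptions above; the utilization bookkeeping and the cost argument below are illustrative choices, not the exact formulations used in the experiments.

```python
def pi_underused(state, shares):
    """Run the task whose achieved share falls furthest below its target."""
    total = sum(state) or 1
    return min(range(len(state)), key=lambda i: state[i] / total - shares[i])

def pi_greedy(state, shares, duration_pmf, cost):
    """Run the task that minimizes expected immediate cost from `state`,
    where `cost(next_state, shares)` penalizes deviation from the target."""
    def expected_cost(i):
        return sum(p * cost(tuple(x + (dur if j == i else 0)
                                  for j, x in enumerate(state)), shares)
                   for dur, p in duration_pmf[i].items())
    return min(range(len(state)), key=expected_cost)
```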

Slide 22: Policy Comparison on a 4-Task Problem
Task durations: random histograms over [2,32]
100 iterations of Monte Carlo conic parameter search
ESPI outperforms; conic eventually approximates it well

Slide 23: Policy Comparison on a Ten-Task Problem
Repeated the same experiment for 10 tasks
ESPI is omitted (intractable here)
Conic outperforms the greedy and underutilized heuristics

Slide 24: Comparison with Varying Numbers of Tasks
100 independent problems for each number of tasks (average, 95% confidence)
ESPI is only tractable through all 2- and 3-task cases
Conic approximates ESPI, then outperforms the others

Slide 25: Expanding our Notion of Utility
Previously, utility was proximity to the utilization target; now we let tasks' utility and job availability* vary
[Figure: a time-utility function (TUF), annotated with its name, period boundaries, and termination time]
* Availability variable q_i is defined over {0,1}; {0, tm_i/p_i}; or {0,1}^(tm_i/p_i)

Slide 26: Utility × Execution → Utility Density
A task's time-utility function and its execution time distribution (e.g., D_i(1) = D_i(2) = 50%) give a distribution of utility for scheduling the task (sketched below)
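The sketch below combines a TUF with an execution-time distribution to get the expected utility of dispatching a task now; the downward-step TUF, its termination time, and the 50/50 duration split are illustrative values echoing the slide's D_i(1) = D_i(2) = 50% example.

```python
def expected_utility(now, duration_pmf, tuf):
    """Expected utility if the task starts at `now`: each possible duration d
    completes at time now + d with probability p and accrues tuf(now + d)."""
    return sum(p * tuf(now + d) for d, p in duration_pmf.items())

def step_tuf(t, utility=1.0, termination=4):
    # Downward-step TUF: full utility up to the termination time, none after.
    return utility if t <= termination else 0.0

pmf = {1: 0.5, 2: 0.5}                      # D_i(1) = D_i(2) = 50%
print(expected_utility(2, pmf, step_tuf))   # 1.0: both outcomes finish in time
print(expected_utility(3, pmf, step_tuf))   # 0.5: only the 1-quantum outcome does
```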

Slide 27: Actions and State Space Structure
The state space can be more compact here than before: dimensions are task availability, e.g., over (q_1, q_2), vs. time
Can wrap the state space over the hyper-period of all tasks (e.g., D_1(1) = D_2(1) = 1; tm_1 = p_1 = 4; tm_2 = p_2 = 2)
Scheduling actions induce a transition structure over states (e.g., idle action = do nothing; action i = run task i)
[Figure: transition structures over the (availability, time) state space for the idle action, action 1, and action 2]

Slide 28: Reachable States, Successors, Rewards
States with the same task availability and the same relative position within the hyper-period have the same successor state and reward distributions
[Figure: the reachable states within the wrapped state space]

Slide 29: Evaluation
Different TUF shapes are useful to characterize tasks' utilities (e.g., deadline-driven, work-ahead, jitter-sensitive cases)
We chose three representative shapes (target sensitive, linear drop, downward step) and randomized their key parameters: utility bounds u_i, termination times tm_i, and critical points cp_i (we also randomized 80/20 task load parameters: l_i, th_i, w_i)

Slide 30: How Much Better is Optimal Scheduling?
Greedy (Generic Benefit*) vs. Optimal (MDP) Utility Accrual
[Figure: utility accrual comparison for problems with up to 5 tasks]
TUF nuances matter: e.g., a work-conserving approach degrades the target sensitive policy
* P. Li, PhD Dissertation, VA Tech

Slide 31: Divergence Increases with # of Tasks
Note we can solve 5-task MDPs for periodic task sets (but even representing a policy may be expensive)

Slide 32: How Should Policies be Represented?
A scheduling policy can be stored as a lookup table (size = # states)
»Tells the best action to take in each (modeled) state
How to minimize run-time memory cost?
What to do about unexpected states?
How to take advantage of heuristics?

Policy Table
State:  0   1   2   3   4   5   6   7   8   9
Action: a1  a2  a2  a1  a1  a2  a2  a2  ?   a2

Slide 33: How to Minimize the Memory Footprint?
Decision trees compactly encode tabular data
Trees can be built to approximate the policy (a sketch follows below)
[Figure: the policy table (0,a1)(1,a2)(2,a2)(3,a1)(4,a1)(5,a2)(6,a2)(7,a2)(8,?)(9,a2) encoded as a tree with predicates x < 5 and x < 3; inner nodes contain predicates over state variables, leaf nodes contain action mappings]
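One way to realize this is a tiny predicate tree over the state variable; the tree below reuses the slide's predicates (x < 5, x < 3) but its leaf assignment is a guess, so it only approximately reproduces the table, which is exactly the compactness-for-accuracy trade the slide describes.

```python
policy_table = {0: "a1", 1: "a2", 2: "a2", 3: "a1", 4: "a1",
                5: "a2", 6: "a2", 7: "a2", 9: "a2"}   # state 8 was never modeled

def tree_policy(x):
    # Inner nodes hold predicates over state variables; leaves hold actions.
    if x < 5:
        return "a2" if x < 3 else "a1"
    return "a2"

matches = sum(tree_policy(s) == a for s, a in policy_table.items())
print(f"{matches}/{len(policy_table)} table entries reproduced")  # 8/9
print(tree_policy(8))  # the unexpected state still gets a reasonable action: a2
```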

Slide 34: What to do about Unexpected States?
Trees abstract the structure of the encoded policy
The unmodeled state x = 8 is assigned a "reasonable" action (a2)
[Figure: the same decision tree, mapping state 8 to action a2]

Slide 35: How to Take Advantage of Heuristics?
Leaf nodes can also recommend heuristics (e.g., greedy(x))
This trades run-time cost for accuracy of encoding
[Figure: the same decision tree with one leaf replaced by the greedy(x) heuristic]

Slide 36: Optimal Tree Size Varies
[Figure: histogram of the fraction of experiments vs. the size of the best tree]

Slide 37: Comparative Performance of Trees
[Figure: fraction of experiments vs. fraction of optimal utility accrued, comparing the optimal policy, the best tree, the heuristic, greedy, and pseudo policies]

Slide 38: Publications (so far)
T. Tidwell, C. Bass, E. Lasker, M. Wylde, C. Gill, and W. D. Smart, "Scalable Utility Aware Scheduling Heuristics for Real-time Tasks with Stochastic Non-preemptive Execution Intervals," 23rd Euromicro Conference on Real-Time Systems (ECRTS'11), Porto, Portugal, July 6-8, 2011.
T. Tidwell, R. Glaubius, C. Gill, and W. D. Smart, "Optimizing Expected Time Utility in Cyber-Physical Systems Schedulers," 31st IEEE Real-Time Systems Symposium (RTSS '10), San Diego, CA, USA, November 30 - December 3, 2010.
R. Glaubius, T. Tidwell, C. Gill, and W. D. Smart, "Real-Time Scheduling via Reinforcement Learning," UAI 2010.
R. Glaubius, T. Tidwell, B. Sidoti, D. Pilla, J. Meden, C. Gill, and W. D. Smart, "Scalable Scheduling Policy Design for Open Soft Real-Time Systems," RTAS 2010 (received Best Student Paper award).
R. Glaubius, T. Tidwell, C. Gill, and W. D. Smart, "Scheduling Policy Design for Autonomic Systems," International Journal on Autonomous and Adaptive Communications Systems, 2(3), 2009.
R. Glaubius, T. Tidwell, C. Gill, and W. D. Smart, "Scheduling Design and Verification for Open Soft Real-Time Systems," RTSS 2008.
T. Tidwell, R. Glaubius, C. Gill, and W. D. Smart, "Scheduling for Reliable Execution in Autonomic Systems," ATC 2008.

Slide 39: Concluding Remarks
Markov Decision Process (MDP) models are useful for scheduling stochastic real-time systems
»Outperform heuristic approaches (sometimes by a lot)
Approximations of the resulting policies also help
»Where possible, geometric approximations do very well
»Otherwise, decision trees offer good trade-offs
Research is a team sport
»Undergraduates on this project outnumbered doctoral students and faculty 4:1:1
»Check out the CSE Department's summer REU program