CPS Scheduling Policy Design with Utility and Stochastic Execution* Chris Gill, Associate Professor, Department of Computer Science and Engineering, Washington University in St. Louis

Presentation transcript:

CPS Scheduling Policy Design with Utility and Stochastic Execution*
Chris Gill, Associate Professor, Department of Computer Science and Engineering, Washington University in St. Louis, MO, USA
Georgia Tech CPS Summer School, Atlanta, GA, June 23-25, 2010
*Research supported in part by NSF grants CNS (Cybertrust) and CCF (CAREER) and driven by numerous contributions from post-doctoral student Robert Glaubius; doctoral student Terry Tidwell; undergraduate students Braden Sidoti, David Pilla, Justin Meden, Carter Bass, Eli Lasker, Micah Wylde, and Cameron Cross; and Prof. William D. Smart

2 - Washington University in St. Louis

3 - Dept. of Computer Science and Engineering. 24 faculty members and 70 Ph.D. students working in real-time and embedded systems, robotics, graphics, computer vision, HCI, AI, bioinformatics, networking, high-performance architectures, chip multi-processors, mobile computing, sensor networks, and optimization. PhD students are fully funded, and we emphasize individual mentorship and interdisciplinary work. Recent graduates are on faculty at U. Mass, UT-Austin, Rochester, RIT, CMU, Michigan St., and UNC-Charlotte. Graduate study application deadline for Fall 2011 is January 15.

4 - Why Pursue CPS Research? Systems are increasingly being designed to interact with the physical world. This trend offers compelling new research challenges that motivate our work. Consider, for example, the domain of mobile robotics. [Image: "my name is Lewis" - Media and Machines Laboratory, Washington University in St. Louis]

5 - Why is This Work CPS Research? As in many other systems, resources must be shared among competing tasks. Fail-safe modes may reduce the consequences of resource-induced timing failures, but precise scheduling matters. The physical properties of some resources motivate new models and techniques. [Image: "my name is Lewis" - Media and Machines Laboratory, Washington University in St. Louis]

6 - Which Problem Features are Interesting? Sharing (e.g., a camera) between navigation and image capture tasks (1) in general doesn't allow efficient preemption and (2) involves stochastically distributed durations. Also important in general: (3) scalability (many tasks sharing such a resource) and (4) task utility/availability. [Image: Lewis - Media and Machines Laboratory, Washington University in St. Louis]

7 - System Model Assumptions
We model time as being discrete
» E.g., based on some multiple of the Linux jiffy
» States and scheduling decisions align with those quanta
Separate tasks require a shared resource
» Access is mutually exclusive (a task binds the resource)
» Binding durations are independent and non-preemptive
» Tasks' duration distributions are known (or learned [1])
» Each task is always available to run (relaxed in Part III)
Goal: precise resource allocation among tasks [5]
» E.g., 2:1 utilization share targets for tasks A vs. B
» Need a deterministic scheduling policy (decides which task gets the resource when) that best fits that goal
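To make these assumptions concrete, here is a minimal sketch (ours, not from the talk) of two tasks with discrete, non-preemptive duration distributions over integer quanta and a 2:1 share target; the class and variable names, and the example probabilities, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Task:
    name: str
    # Probability of each possible non-preemptive binding duration, in quanta.
    duration_dist: Dict[int, float]

# Two tasks sharing a single mutually exclusive resource.
task_a = Task("A", {1: 0.5, 2: 0.5})    # binds the resource for 1 or 2 quanta
task_b = Task("B", {2: 0.25, 3: 0.75})  # binds the resource for 2 or 3 quanta

# 2:1 utilization share target for tasks A vs. B.
share_target = (2/3, 1/3)
```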

8 - Part I: Utilization State Spaces and Markov Decision Processes

9 - Towards Optimal Policies
A Markov decision process (MDP) is a 4-tuple (X, A, C, T) that matches our system model well:
X: a finite set of states (e.g., utilizations of 8 vs. 17 quanta)
A: the set of actions (giving the resource to a particular task)
C: the cost function for taking an action in a state
T: the transition function (probability of moving from one state to another, given the action chosen)
Solving the MDP gives a policy that maps each state to an action so as to minimize long-term expected costs. However, to do that we need a finite set of states.
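Once the state space has been made finite (as the following slides describe), such an MDP can be solved with standard value iteration. The sketch below is a generic, discounted-cost solver written for illustration only; it assumes X, A, C, and T are supplied as plain Python containers and is not the talk's implementation.

```python
def value_iteration(X, A, C, T, gamma=0.95, eps=1e-6):
    """Generic finite-MDP solver.

    X: list of states; A: list of actions
    C[(x, a)]: immediate cost of taking action a in state x
    T[(x, a)]: dict mapping successor state y -> probability
    Returns a policy dict mapping each state to a cost-minimizing action.
    """
    V = {x: 0.0 for x in X}
    while True:
        delta = 0.0
        for x in X:
            q = [C[(x, a)] + gamma * sum(p * V[y] for y, p in T[(x, a)].items())
                 for a in A]
            best = min(q)
            delta = max(delta, abs(best - V[x]))
            V[x] = best
        if delta < eps:
            break
    # Extract the greedy policy from the converged value function.
    return {x: min(A, key=lambda a: C[(x, a)] + gamma *
                   sum(p * V[y] for y, p in T[(x, a)].items()))
            for x in X}
```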

10 - Share Aware Scheduling. A system state: the cumulative resource usage of each task. Dispatching a task moves the system stochastically through the state space according to that task's duration distribution. [Figure: 2-D grid of utilization states, with the state (8, 17) marked]
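Concretely (our sketch, with the same duration-distribution dicts as above): dispatching task i from state x advances only task i's usage, by a duration drawn from that task's distribution.

```python
def successor_distribution(x, i, duration_dist):
    """Distribution over successor states when task i is dispatched in state x.

    duration_dist maps each possible binding duration (quanta) to its probability.
    """
    successors = {}
    for d, p in duration_dist.items():
        y = tuple(xj + (d if j == i else 0) for j, xj in enumerate(x))
        successors[y] = successors.get(y, 0.0) + p
    return successors

# Example: dispatching task A (index 0) in state (8, 17) with durations {1: 0.5, 2: 0.5}
print(successor_distribution((8, 17), 0, {1: 0.5, 2: 0.5}))
# -> {(9, 17): 0.5, (10, 17): 0.5}
```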

11 - Share Aware Scheduling. The utilization target induces a ray {λu : λ ≥ 0} through the state space. Encode each state's "goodness" (relative to the share) as a cost. Require that costs grow with distance from the utilization ray u. [Figure: state space with the utilization ray for u = (1/3, 2/3)]
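One cost function with this property (our sketch; the slides do not commit to a particular norm) is the Euclidean distance from a state to its projection onto the utilization ray.

```python
import math

def ray_distance_cost(x, u):
    """Euclidean distance from utilization state x to the ray {lambda * u : lambda >= 0}."""
    dot_xu = sum(xi * ui for xi, ui in zip(x, u))
    dot_uu = sum(ui * ui for ui in u)
    lam = max(0.0, dot_xu / dot_uu)   # project onto the ray, clamped at the origin
    return math.sqrt(sum((xi - lam * ui) ** 2 for xi, ui in zip(x, u)))

# Example: with share target u = (1/3, 2/3), state (8, 17) lies close to the ray.
print(ray_distance_cost((8, 17), (1/3, 2/3)))   # ~0.447
```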

12 - Transition Structure. Transitions are state-independent, i.e., the relative distribution over successor states is the same in each state.

13 - Cost Structure. States along the same line parallel to the utilization ray have equal cost.

14 - Equivalence Classes. The transition and cost structure thus induce equivalence classes. Equivalent states have the same optimal long-term cost and policy!

15 - Periodicity. Periodic structure allows us to represent each equivalence class with a single exemplar [4].

16 - Wrapping the State Model. Remove all but one exemplar from each equivalence class. Actions and costs remain unchanged. Remap any dangling transitions (to removed states) to the corresponding exemplar. [Figure: wrapped state space with exemplar states near the origin (0,0)]
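As a hedged sketch of the wrapping step, assume the share target is rational so that some integer period vector p is parallel to the utilization ray (e.g., p = (1, 2) for a 1:2 share). Translating a state by p leaves its displacement from the ray, and hence its cost, unchanged, so each state can be reduced to an exemplar. This is our simplification of the construction in [4], not a transcription of it.

```python
def wrap_to_exemplar(x, p):
    """Map state x to a class exemplar by subtracting the integer period vector p
    (parallel to the utilization ray) as many times as possible while keeping
    every coordinate non-negative."""
    k = min(xi // pi for xi, pi in zip(x, p) if pi > 0)
    return tuple(xi - k * pi for xi, pi in zip(x, p))

# Example with a 1:2 share (period vector p = (1, 2)):
print(wrap_to_exemplar((8, 17), (1, 2)))   # -> (0, 1)
```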

17 - Truncating the State Model. Inexpensive states are nearer the utilization target. Good policies should keep costs small. We can truncate the state space by bounding the sizes of the costs considered.

18 - Bounding the State Model. Map any dangling transitions produced by truncation to a high-cost absorbing state. This guarantees that we will be able to find bounded-cost policies if they exist. Bounded costs also guarantee bounded deviation from the resource share (precision).

19 - A Scheduling Policy Design Approach. Iteratively increase the bounds and re-solve the resulting MDP. As the bounds increase, the bounded model solution converges towards the optimal wrapped model policy.

20 - Automating Model Discovery. ESPI: Expanding State Policy Iteration [3]
1. Start with a policy that only reaches finitely many states from (0,…,0), e.g., always run the most underutilized task.
2. Enumerate enough states to evaluate and improve that policy.
3. If the policy cannot be improved, stop.
4. Otherwise, repeat from (2) with the newly improved policy.
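Schematically, the ESPI loop looks like the sketch below. The envelope construction and the evaluation/improvement steps are passed in as callables and left abstract here; their details are in [3], and the function names are ours.

```python
def espi(initial_policy, initial_state, expand_envelope, evaluate, improve):
    """Expanding State Policy Iteration, schematically.

    initial_policy must reach only finitely many states from initial_state
    (e.g., "always run the most underutilized task").
    expand_envelope(policy, state)    -> set of states to evaluate/improve over
    evaluate(policy, envelope)        -> value estimates within the envelope
    improve(policy, values, envelope) -> greedily improved policy
    """
    policy = initial_policy
    while True:
        # Enumerate enough states to evaluate and improve the current policy.
        envelope = expand_envelope(policy, initial_state)
        values = evaluate(policy, envelope)
        improved = improve(policy, values, envelope)
        # Stop when no state's action changes; otherwise iterate with the new policy.
        if improved == policy:
            return policy
        policy = improved
```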

21 - Policy Evaluation Envelope. Enumerate the states reachable from the initial state. Explore the state space breadth-first under the current policy, starting from the initial state (0,0).

22 - Policy Improvement Envelope. Consider alternative actions. Close under the current policy using breadth-first expansion. Evaluate and improve the policy within this envelope.

23 - ESPI Termination. As long as the initial policy has finite closure, each ESPI iteration terminates (this is satisfied by starting with the heuristic policy that always runs the most underutilized task). The policy strictly improves at each iteration. Anecdotally, ESPI terminates on all of the task scheduling MDPs to which we have applied it.

24 - Comparing Design Methods. Policy performance is shown normalized and centered on the ESPI solution data. Larger bounded state models yield the ESPI solution.

25 - Part II: Scalability and Approximation Techniques

26 - What About Scalability? The MDP representation allows consistent approximation of the optimal scheduling policy. Empirically, the bounded model and ESPI solutions appear to be near-optimal. However, the approach scales exponentially in the number of tasks, so while it may be good for (e.g.) sharing an actuator, it won't apply directly to larger task sets.

27 - What our Policies Say about Scalability. To overcome limitations of the MDP-based approach, we focus attention on a restricted class of appropriate scheduling policies. Examining the policies produced by the MDP-based approach gives insights into choosing (and parameterizing) appropriate policies [2].

28 - Two-task MDP Policy. Scheduling policies induce a partition of the 2-D state space with a boundary parallel to the share target. Establish a decision offset d to identify the partition boundary. Sufficient in 2-D, but what about in higher dimensions?

29 - Time Horizons Suggest a Generalization. The time horizon H_t = { x : x_1 + x_2 + … + x_n = t } collects the states in which t quanta of resource usage have accumulated. [Figure: horizons H_0 through H_4 in the two-task state space starting at (0,0), and horizons H_0 through H_2 in a three-task state space with corners (2,0,0), (0,2,0), (0,0,2), each shown with the utilization ray u]

30 - Three-task MDP Policy. Action partitions meet along a decision ray that is parallel to the utilization ray. Action partitions are roughly cone-shaped. [Figure: policy cross-sections at time horizons t = 10, t = 20, and t = 30]

31 - Parameterizing a Partition. Specify a decision offset at the intersection of the partitions. Anchor action vectors at the decision offset to approximate the partitions. A conic policy selects the action vector best aligned with the displacement between the query state and the decision offset, as sketched below. [Figure: action vectors a_1, a_2, a_3 anchored at the decision offset, with a query state x]
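A minimal sketch of that selection rule (the names and signature are ours): normalize the displacement from the decision offset and return the index of the action vector with the greatest cosine similarity to it.

```python
import math

def conic_policy(x, d, action_vectors):
    """Pick the action whose vector is best aligned with x - d (cosine similarity)."""
    disp = [xi - di for xi, di in zip(x, d)]
    disp_norm = math.sqrt(sum(v * v for v in disp)) or 1.0

    def alignment(a):
        a_norm = math.sqrt(sum(ai * ai for ai in a)) or 1.0
        return sum(vi * ai for vi, ai in zip(disp, a)) / (disp_norm * a_norm)

    return max(range(len(action_vectors)), key=lambda i: alignment(action_vectors[i]))
```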

32 - Conic Policy Parameters. Decision offset d and action vectors a_1, a_2, …, a_n. These suffice to partition each time horizon into n regions, and allow good policy parameters to be found through local search.

33 - Comparing Policies
Policy found by ESPI (for small numbers of tasks): π_ESPI(x) chooses the action at state x per the solved MDP
Simple heuristics (for all numbers of tasks): π_underused(x) runs the most underutilized task; π_greedy(x) minimizes the immediate cost from state x
Conic approach (for all numbers of tasks): π_conic(x) selects the action with the best-aligned action vector
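For reference, the two heuristics have simple forms. The sketch below assumes per-task duration distributions (dicts of duration to probability) and a state-cost function like the ray-distance cost sketched earlier; the function names are ours.

```python
def most_underutilized(x, u):
    """pi_underused: run the task whose fraction of usage is furthest below its share target u."""
    total = sum(x) or 1
    return min(range(len(x)), key=lambda i: x[i] / total - u[i])

def greedy(x, u, duration_dists, cost):
    """pi_greedy: run the task with the smallest expected immediate cost from state x."""
    def expected_cost(i):
        return sum(p * cost(tuple(xj + (d if j == i else 0) for j, xj in enumerate(x)), u)
                   for d, p in duration_dists[i].items())
    return min(range(len(duration_dists)), key=expected_cost)
```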

34 - Policy Comparison on a 4-Task Problem. Task durations: random histograms over [2,32]. 100 iterations of Monte Carlo conic parameter search. ESPI outperforms; conic eventually approximates it well.

35 - Policy Comparison on a 10-Task Problem. Repeated the same experiment for 10 tasks. ESPI is omitted (intractable here). Conic outperforms the greedy and underutilized heuristics.

36 - Comparison with Varying Numbers of Tasks. 100 independent problems for each number of tasks (average, 95% confidence). ESPI is only tractable through the 2- and 3-task cases. Conic approximates ESPI, then outperforms the other policies.

37 - Part III: Expanding our Notions of Utility and Availability

38 - Time-Utility Functions. Previously, utility was proximity to the utilization target; now we let tasks' utility and job availability* vary. [Figure: a time-utility function (TUF) over time, annotated with the TUF name, period boundaries, and termination time] * The availability variable q_i is defined over {0,1}; {0, tm_i/p_i}; or {0,1}^(tm_i/p_i).

39 - Utility × Execution → Utility Density. A task's time-utility function and its execution time distribution (e.g., D_i(1) = D_i(2) = 50%) give a distribution of utility for scheduling the task.
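A small sketch of that combination, under our assumption that a job's utility is its TUF evaluated at the completion time: the expected utility of dispatching a task at time t is the duration-weighted average of the TUF over the possible completion times.

```python
def expected_utility(t, duration_dist, tuf):
    """Expected utility of dispatching a task at time t.

    duration_dist: dict mapping duration (quanta) -> probability
    tuf: time-utility function, giving utility as a function of completion time
    """
    return sum(p * tuf(t + d) for d, p in duration_dist.items())

def step_tuf(t):
    """Downward-step TUF: utility 10 until a termination time of 4, then 0 (example values)."""
    return 10.0 if t <= 4 else 0.0

# Example with D_i(1) = D_i(2) = 50%, dispatched at t = 3.
print(expected_utility(3, {1: 0.5, 2: 0.5}, step_tuf))   # -> 5.0
```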

40 - Actions and State Space Structure. The state space can be more compact here than in Parts I and II: the dimensions are task availability (e.g., over (q_1, q_2)) vs. time. We can wrap the state space over the hyper-period of all tasks (e.g., D_1(1) = D_2(1) = 1; tm_1 = p_1 = 4; tm_2 = p_2 = 2). Scheduling actions induce a transition structure over states (e.g., idle action = do nothing; action i = run task i). [Figure: transitions for action 1, action 2, and the idle action along the time axis]
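A tiny sketch of the hyper-period wrap, assuming integer task periods (the function name is ours): the time dimension is taken modulo the least common multiple of the periods, which keeps this state space finite.

```python
from math import lcm

def wrap_time(t, periods):
    """Map absolute time t to its position within the hyper-period of the given task periods."""
    hyper_period = lcm(*periods)
    return t % hyper_period

# Example from the slide: p_1 = 4 and p_2 = 2 give a hyper-period of 4.
print(wrap_time(9, (4, 2)))   # -> 1
```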

41 - Reachable States, Successors, Rewards. States with the same task availability and the same relative position within the hyper-period have the same successor state and reward distributions. [Figure: the reachable states within one hyper-period]

42 - Evaluation. Different TUF shapes are useful to characterize tasks' utilities (e.g., deadline-driven, work-ahead, and jitter-sensitive cases). We chose three representative shapes (target sensitive, linear drop, and downward step) and randomized their key parameters: utility bounds u_i, termination times tm_i, and critical points cp_i (we also randomized 80/20 task load parameters: l_i, th_i, w_i).

43 - How Much Better is Optimal Scheduling? Greedy (Generic Benefit*) vs. Optimal (MDP) utility accrual. TUF nuances matter: e.g., a work-conserving approach degrades the target sensitive policy. [Figure: comparison panels for different numbers of tasks (3, 4, and 5 shown)] * P. Li, PhD Dissertation, Virginia Tech.

44 - Divergence Increases with the Number of Tasks. Note that we can solve 5-task MDPs for periodic task sets (smaller state spaces; scalability is an ongoing issue).

45 - Conclusions. We have developed new techniques for designing non-preemptive scheduling policies for tasks with stochastic resource usage durations. MDP-based methods are effective for 2- or 3-task utilization share problems (e.g., for an actuator). Conic policy performance is competitive with ESPI for smaller problems, and for larger problems improves on the underutilized and greedy policies. Ongoing work is focused on identifying and evaluating important categories of time-utility functions and tailoring our approach to address their nuances.

46 - Publications
[1] R. Glaubius, T. Tidwell, C. Gill, and W. D. Smart, "Real-Time Scheduling via Reinforcement Learning", UAI 2010
[2] R. Glaubius, T. Tidwell, B. Sidoti, D. Pilla, J. Meden, C. Gill, and W. D. Smart, "Scalable Scheduling Policy Design for Open Soft Real-Time Systems", RTAS 2010 (received Best Student Paper award)
[3] R. Glaubius, T. Tidwell, C. Gill, and W. D. Smart, "Scheduling Policy Design for Autonomic Systems", International Journal of Autonomous and Adaptive Communications Systems, 2(3), 2009
[4] R. Glaubius, T. Tidwell, C. Gill, and W. D. Smart, "Scheduling Design and Verification for Open Soft Real-Time Systems", RTSS 2008
[5] T. Tidwell, R. Glaubius, C. Gill, and W. D. Smart, "Scheduling for Reliable Execution in Autonomic Systems", ATC 2008

Thanks, and hope to see you at CPSWeek 2011! Chris Gill, Associate Professor of Computer Science and Engineering