Scheduling as a Learned Art*
Christopher Gill, William D. Smart, Terry Tidwell, and Robert Glaubius
{cdgill, wds, ttidwell,
Department of Computer Science and Engineering, Washington University, St. Louis, MO, USA
Fourth International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2008)
July 1, 2008, Prague, Czech Republic
*Research supported in part by NSF awards CNS (Cybertrust) and CCF (CAREER)

Slide 2: Motivation: Systems with (some) Autonomy
Interact with variable environment
  » Varying degrees of autonomy
  » Performance is deadline sensitive
Many activities must run at once
  » Device interrupt handling, computation
  » Comm w/ other systems/operators
Need reliable activity execution
  » Scheduling with shared resources and competing, variable execution times
  » How to guarantee utilizations?
[Figure labels: Remote Operator Station (for all but full autonomy); Wireless Communication; Lewis; Media and Machines Lab, Washington University, St. Louis, MO, USA]

Slide 3: More Generally, Open Soft Real-Time Systems
Questions of interest are relevant well beyond mobile robotics
  » Robotics is a good touchstone, though
  » In many systems, platform features interact with the physical environment
  » Especially with increased embedding of OS/RTOS platforms everywhere ;-)
Abstract view of the problem
  » Diverse concurrent application tasks
  » Task execution times are variable
  » (Soft) deadlines on application tasks
  » Resources shared among tasks
  » Need methods to design and verify scheduling policies accordingly
What Other Kinds of Embedded Systems Have Similar Platform Constraints?

Slide 4: Current System Model
Threads of execution depend on a shared resource
  » Require mutually exclusive access (e.g., to a CPU) to run
Each thread binds the resource when it runs
  » A thread binds the resource for a duration, then releases it
  » Model duration with integer variables: count time quanta
Variable execution times with known distributions
  » We assume that each thread's run-time distribution is known and bounded, and independent of the others
Non-preemptive scheduler (repeats perpetually)
  » Scheduler chooses which thread to run (based on policy)
  » Scheduler dispatches the thread, which runs until it yields
  » Scheduler waits until the thread releases the resource
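
To make this model concrete, here is a minimal simulation sketch of the non-preemptive scheduling loop just described. It is illustrative only: the thread names, run-time distributions, and the simple least-used-first policy are assumptions for the example, not part of the original design.

```python
import random

# Hypothetical bounded run-time distributions (in time quanta) for three threads.
# Each maps a possible duration to its probability; values are illustrative only.
RUN_TIME_DIST = {
    "sensing":  {1: 0.6, 2: 0.4},
    "planning": {2: 0.5, 3: 0.5},
    "control":  {1: 0.8, 2: 0.2},
}

def sample_run_time(thread):
    """Sample a duration (in quanta) from the thread's known, bounded distribution."""
    durations, probs = zip(*RUN_TIME_DIST[thread].items())
    return random.choices(durations, weights=probs)[0]

def scheduler_loop(policy, decisions=100):
    """Non-preemptive loop: choose a thread, let it bind the resource until it yields."""
    usage = {t: 0 for t in RUN_TIME_DIST}          # per-thread utilization, in quanta
    for _ in range(decisions):
        thread = policy(usage)                     # scheduling decision, based on state
        usage[thread] += sample_run_time(thread)   # resource stays bound for the duration
    return usage

# Example policy: always run the thread with the least accumulated quanta.
least_used_first = lambda usage: min(usage, key=usage.get)
print(scheduler_loop(least_used_first))
```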

Slide 5: Uncertainty (but with Observability Post-Hoc)
[Figure: two run-time probability distributions (probability vs. time)]
We summarize system state as a vector of integers
  » Represents thread utilizations
Threads' run times come from known, bounded distributions
Scheduling a thread changes the system's (utilization) state
  » Utilization is observed after the thread runs, based on its run time
  » State transition probabilities are based on the run-time distributions
This forms a basis for policy design and optimization
From Tidwell et al., ATC 2008

Slide 6: From Thread Run Times to a Scheduling Policy
We model thread scheduling decisions as a Markov Decision Process (MDP) based on thread run times (from ATC '08)
MDP is given by a 4-tuple (X, A, R, T):
  » X: set of process states (i.e., thread utilization states)
  » A: set of actions (i.e., scheduling a particular thread)
  » R: reward function for taking an action in a state
      Expected utility of taking that action
      Distance of the next state(s) from a desired utilization (vector)
  » T: transition function
      For each action, encodes the probability of moving from a given state to another state
Solve the MDP: optimal (per accumulated reward) policy
Fold periodic states: smaller space (recent advance)
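
For illustration, below is a small value-iteration sketch over bounded utilization states, in the spirit of the (X, A, R, T) formulation above. The run-time distributions, target utilization, reward shape (negative distance from the target share), discount factor, and state-space bound are all assumptions made for the example, not the paper's exact construction.

```python
import itertools

# Assumed inputs for the example: two threads' run-time distributions (quanta ->
# probability), a desired utilization vector, a discount factor, and a bound that
# truncates the utilization state space so it can be enumerated.
RUN_TIME_DIST = [{1: 0.6, 2: 0.4}, {2: 0.5, 3: 0.5}]
TARGET_SHARE = [0.5, 0.5]
MAX_QUANTA = 12
GAMMA = 0.9

def reward(next_state):
    """R: negative distance of the next utilization state from the target share."""
    total = sum(next_state)
    return -sum(abs(q / total - s) for q, s in zip(next_state, TARGET_SHARE))

def successors(state, thread):
    """T: successor states and probabilities when `thread` is scheduled in `state`."""
    out = {}
    for duration, p in RUN_TIME_DIST[thread].items():
        nxt = list(state)
        nxt[thread] += duration
        out[tuple(nxt)] = out.get(tuple(nxt), 0.0) + p
    return out

def action_value(state, thread, V):
    """Expected reward plus discounted value of successors for one action."""
    return sum(p * (reward(n) + GAMMA * V.get(n, 0.0))
               for n, p in successors(state, thread).items())

# X: enumerate bounded utilization states, then run value iteration.
states = [s for s in itertools.product(range(MAX_QUANTA + 1), repeat=2)
          if sum(s) <= MAX_QUANTA]
V = {s: 0.0 for s in states}
for _ in range(100):
    for s in states:
        V[s] = max(action_value(s, a, V) for a in range(len(RUN_TIME_DIST)))

# The greedy policy w.r.t. V says which thread to schedule in each utilization state.
policy = {s: max(range(len(RUN_TIME_DIST)), key=lambda a: action_value(s, a, V))
          for s in states}
print(policy[(0, 0)])
```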

Slide 7: Partial Observability
Local CPU usage is pretty easy to observe exactly
  » E.g., using the Pentium tick counter, or other good time source
However, other key properties are noisier
  » E.g., robot location indoors
      No GPS "position sensor"; wheel slip etc. adds noise during motion
  » How does this relate to scheduling?
      What if we consider the robot's progress along a navigation path ...
      ... as an activity which must compete for resources with others?
      Then the robot's position becomes part of the scheduling state
      Similar issues may arise for other scheduling cases (e.g., in CPS)
Noise in observation produces partial observability
  » E.g., multiple different positions can be equally likely
  » Possible approach: Partially Observable MDPs (POMDPs)
      Reason on belief states to get an MDP transition function (a big space)
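
To make the belief-state idea concrete, here is a toy discrete Bayes-filter update over a discretized robot position (the positions, sensor noise model, and reading are hypothetical, not from the paper). A POMDP-style approach would reason over such belief distributions rather than the unobservable true position.

```python
# Toy discrete Bayes filter over a robot's position along a path. The scheduler
# would carry this belief vector in its state instead of an exact position.
POSITIONS = list(range(5))       # discretized positions along a navigation path
NOISE = 0.2                      # assumed chance of an off-by-one sensor reading

def observation_likelihood(reading, position):
    """P(reading | position) under the assumed noise model."""
    if reading == position:
        return 1.0 - 2 * NOISE
    if abs(reading - position) == 1:
        return NOISE
    return 0.0

def update_belief(belief, reading):
    """Bayes update: reweight each position by its likelihood, then normalize."""
    weighted = [belief[p] * observation_likelihood(reading, p) for p in POSITIONS]
    total = sum(weighted)
    return [w / total for w in weighted] if total > 0 else belief

belief = [1.0 / len(POSITIONS)] * len(POSITIONS)   # start maximally uncertain
belief = update_belief(belief, reading=2)
print(belief)   # mass concentrates on positions 1, 2, and 3
```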

Slide 8: Observation Lag
State observations also may incur temporal lag
  » E.g., a detailed scan of an area with a range-finding laser
  » However, during the time it takes to scan, time passes
  » Robot or environment may move while the scan is being done
As with partial observability, need a new extension to the basic MDP model to address observation lag
  » In Semi-MDPs (SMDPs), an action causes ≥ 1 state change
  » SMDP extensions to MDPs exist for finding an optimal policy
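
One common way SMDP-style formulations account for variable-duration actions is to discount future value by the (random) duration of the action; the toy backup below sketches that idea, with all durations, rewards, and values assumed for illustration.

```python
# Toy SMDP-style backup: an action occupies the resource for a random duration tau,
# so the future is discounted by gamma**tau instead of a single step. The duration
# distribution, reward rate, and values below are illustrative assumptions.
GAMMA = 0.95

def smdp_backup(next_state_value, duration_dist, reward_per_step=1.0):
    """Expected discounted return of one variable-duration action.

    duration_dist maps a duration tau (in time steps) to its probability.
    """
    total = 0.0
    for tau, p in duration_dist.items():
        accrued = sum(reward_per_step * GAMMA ** t for t in range(tau))
        total += p * (accrued + GAMMA ** tau * next_state_value)
    return total

# E.g., a laser scan whose processing takes 3 or 4 steps with equal probability:
print(smdp_backup(next_state_value=10.0, duration_dist={3: 0.5, 4: 0.5}))
```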

Slide 9: Neglect Tolerance
Need to schedule >1 entire-system behavior at once
  » Can transform into scheduling interim sub-tasks as before
  » However, a behavior has its own (possibly dynamic) structure
  » E.g., navigation to cover a room, while mapping its boundary
Resource contention, control/data dependence
  » Scheduling becomes a multi-criteria optimization
  » Sub-tasks may have (potentially hard) deadlines
  » E.g., decide to turn or stop before hitting a wall
Spectrum: remote control to complete autonomy
  » Higher neglect tolerance needs more on-board scheduling
  » Uncertainty, observability, and temporal lag issues as before
  » Open problem: formalize tractably, model parametrically
  » Multi-disciplinary (RT/ML) approach so far is still needed

Slide 10: Learning (aka "Good Scheduler, Bad Scheduler")
We base scheduling decisions on a value function
  » Captures a state-action notion of long-term utility
      Based on expected rewards from current and future actions
  » But knowing complete distributions is daunting in practice
Reinforcement learning appears promising for this
  » A stochastic variant of dynamic programming
  » Control decisions learned from direct observation
Start by dividing time into discrete steps
  » At each step, the system is in one of a discrete set of states
  » Scheduler observes the state, chooses an action from a finite set
  » Running the action changes the system state at the next time step
  » Scheduler receives a reward for the immediate effect of the action
  » Scheduler estimates the value function; the resulting model is exactly an MDP
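
As a sketch of how such a value function might be learned from direct observation, here is a minimal tabular Q-learning loop. The learning rate, discount, exploration rate, and action set are assumptions for illustration; the reward and next state would come from actually running the scheduler.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch: the scheduler learns state-action values from
# observed rewards, rather than requiring complete run-time distributions up front.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1     # assumed learning rate / discount / exploration
ACTIONS = [0, 1]                          # e.g., which of two threads to dispatch

Q = defaultdict(float)                    # Q[(state, action)] -> long-term utility estimate

def choose_action(state):
    """Epsilon-greedy: usually exploit current estimates, occasionally explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def learn(state, action, reward, next_state):
    """One Q-learning update from a single observed transition."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Usage inside the scheduler loop: after dispatching `action` in `state` and observing
# the new utilization state and a reward (e.g., negative distance from the target
# utilization), call learn(state, action, reward, next_state).
```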

Slide 11: Related Work
Reference monitor approaches
  » Interposition architectures
      E.g., Ostia: user/kernel-level (Garfinkel et al.)
  » Separation kernels
      E.g., ARINC-653, MILS (Vanfleet et al.)
Scheduling policy design
  » Hierarchical scheduling
      E.g., HLS and its extensions (Regehr et al.)
      E.g., Group scheduling (Niehaus et al.)
State space construction and verification
  » (Timed automata) model checking
      E.g., IF (Sifakis et al.)
  » Quasi-cyclic state space reduction
      E.g., Bogor (Robby et al.)

Slide 12: Concluding Remarks
MDP approach maintains rational scheduling control
  » Even when thread run times vary stochastically
  » Encodes rather than presupposes utilizations
  » Allows policy verification (e.g., over utilization states)
Ongoing and future work
  » State space reduction via quasi-cyclic structure
  » Verification over continuous/discrete states
  » Kernel-level non-bypassable policy enforcement
  » Automated learning to discover scheduling policies
      E.g., via RL for MDPs, POMDPs, SMDPs
Project web page
  » Supported by NSF grant CNS