An Overview of Dynamic Programming
Seminar Series, Joe Hartman, ISE, October 14, 2004

Goals of this Talk
- Overview of Dynamic Programming
- Benefits of DP
- Difficulties of DP
  – Art vs. Science
  – Curse of Dimensionality
- Overcoming Difficulties
  – Approximation Methods

Dynamic Programming
- Introduced by Richard Bellman in the 1950s
- DP has many applications, but is best known for solving sequential decision processes
- Equipment replacement was one of the first applications

Sequential Decision Processes
- At each stage in a process, a decision is made given the state of the system.
- Based on the decision and state, a reward or cost is incurred and the system transforms to another state, where the process is repeated at the next stage.
- The goal is to find the optimal policy, which is the best decision for each state of the system.

Stages
- Stages define when decisions are to be made; they are defined such that decisions can be ordered.
- Stages are generally discrete and numbered accordingly (1, 2, 3, …); however, they may be continuous if decisions are made at arbitrary times.

States
- A state is a description of the variables that define the condition (state) of the system under study.
- The state space is defined by all possible states the system can achieve.
- States may be single variables, vectors, or matrices.
- States may be discrete or continuous, although they are usually made discrete for analysis.

Decisions
- For each given state, there is a set of possible decisions that can be made.
- Decisions are defined ONLY by the current state of the system at a given stage.
- A decision (or decision variable) is one of the choices available from the decision set defined by the state of the system.

Rewards and/or Costs
- Generally, a reward or cost is incurred when a decision is made for a given state in a given stage.
- This reward is based only on the current state of the system and the decision.

Transformation
- Once a decision has been made, the system transforms from an initial state to its final state according to a transformation function.
- The transformation function and decision define how states change from stage to stage.
- These transformations may be deterministic (known) or stochastic (random).

Policies
- A decision is made at each stage in the process.
- As a number of stages are evaluated, the decisions for each state in each stage comprise a policy.
- The set of all policies is the policy space.

Returns
- A return function is defined for a given state and policy.
- The return is what is obtained if the process starts at a given state and the decisions associated with the policy are used at each state the process progresses through.
- The optimal policy achieves the optimal return (minimum or maximum, depending on the objective).

Functional Equation
These terms are all defined in the functional equation, which is used to evaluate different policies (sets of decisions):
\[
f_t(i) = \min_{x \in X_t(i)} \left\{ c_t(i, x) + \alpha\, f_{t+1}\bigl(T_t(i, x)\bigr) \right\}
\]
where i is the state, t the stage, X_t(i) the decision set, x the decision, c_t(i, x) the reward (or cost), T_t(i, x) the transformation function, and α the discount factor.

Functional Equation
The recursion may be stochastic in that the resulting state is probabilistic. Note the recursion is backwards here:
\[
f_t(i) = \min_{x \in X_t(i)} \Bigl\{ c_t(i, x) + \alpha \sum_{j \in S} p_t(j \mid i, x)\, f_{t+1}(j) \Bigr\}
\]
where S represents the set of possible outcomes, with probability p for each outcome.
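To make the recursion concrete, here is a minimal backward-recursion sketch in Python (not from the talk); the decisions, cost, transition, and terminal_value callables are hypothetical placeholders that a specific model would supply.

```python
def solve_backwards(num_stages, states, decisions, cost, transition,
                    discount=1.0, terminal_value=lambda s: 0.0):
    """Finite-horizon backward recursion:
    f_t(s) = min over x in decisions(t, s) of
             cost(t, s, x) + discount * f_{t+1}(transition(t, s, x))."""
    # Boundary condition: value of each state at the end of the horizon.
    f = {s: terminal_value(s) for s in states}
    policy = {}
    for t in reversed(range(num_stages)):
        f_next, f = f, {}
        for s in states:
            best_val, best_x = float("inf"), None
            for x in decisions(t, s):
                val = cost(t, s, x) + discount * f_next[transition(t, s, x)]
                if val < best_val:
                    best_val, best_x = val, x
            f[s] = best_val
            policy[(t, s)] = best_x
    return f, policy  # stage-0 values and the optimal decision for every (stage, state)
```

A stochastic version would replace the single successor value with an expectation, i.e. a probability-weighted sum of f_{t+1}(j) over the outcome set S, exactly as in the equation above.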

Principle of Optimality
The key (and intuitive) idea behind Dynamic Programming: if we are in a given state, a necessary condition for optimality is that the remaining decisions must be chosen optimally with respect to that state.

Principle of Optimality
Requires:
- Separability of the objective function
  – Allows the process to be analyzed in stages
- State separation property
  – Decisions for a given stage are only dependent on the current state of the system (not the past)
  – The Markov property

Why Use DP?
- Extremely general in its ability to model systems
- Can tackle various "difficult" issues in optimization (e.g., non-linearity, integrality, infinite horizons)
- Ideal for "dynamic" processes

Why NOT Use DP?
- Curse of dimensionality: each added dimension in the state space generally leads to an explosion of possible states, and hence exponential run times
- There is no "software package" for solution
- Modeling is often an art… not a science

Art vs. Science
Many means to an end… Let's look at an equipment replacement problem.

Replacement Analysis
Let's put this all in the context of replacement analysis.
- Stage: Periods when keep/replace decisions are to be made. Generally years or quarters.
- State: Information to describe the system. For the simplest problem, all costs are defined by the age of the asset; thus, age is the state variable.
- Decisions: Keep or replace the asset at each stage.

Replacement Analysis
- Rewards and/or costs:
  – Keep decision: pay utilization cost
  – Replace decision: receive salvage value, pay purchase and utilization cost
- Transformation:
  – Keep decision: asset ages one period from stage to stage
  – Replace decision: asset is new upon purchase, so it is one period old at the end of the stage
- Goal: Minimize costs or maximize returns over the horizon

Replacement Analysis
Let's start easy and assume stationary costs. Assume the following notation:
- Age of asset: i
- Purchase cost: P
- Utilization cost: C(i)
- Salvage value: S(i)
Assume S and P occur at the beginning of a period and C occurs at the end of a period.

Example
Many solution approaches to the problem -- even with DP!
- Map out the decision possibilities and analyze by solving the recursion backwards.
- Define the initial state and solve forwards (with reaching).

Decision Map
[Network diagram: asset-age states i, i+1, i+2, i+3 over stages 0, 1, 2, 3, …, T, connected by keep (K) and replace (R) arcs.]

Example Decision Map
[Diagram: keep (K) and replace (R) decision arcs over stages 0, 1, 2, 3, …, T.]

Functional Equation
- Write the functional equation.
- Write a boundary condition for the final period (where we sell the asset).
- Traditional approach: solve backwards (a reconstructed sketch of the recursion follows).
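One plausible way to write the keep/replace recursion with the notation above (a reconstruction; the slide's own equation is not shown in the transcript, and discounting is omitted):

\[
f_t(i) = \min \Bigl\{ \underbrace{C(i) + f_{t+1}(i+1)}_{\text{keep}},\;\; \underbrace{P - S(i) + C(0) + f_{t+1}(1)}_{\text{replace}} \Bigr\},
\qquad f_T(i) = -S(i),
\]

where C(0) denotes the utilization cost of a new asset in its first period of service (the exact indexing depends on the timing convention), and the boundary condition reflects salvaging the asset at the end of the horizon.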

Functional Equation
- Or the problem can be solved forwards, or with reaching. The functional equation does not change.
- Write a boundary condition for the initial period.
- Benefit: you don't have to build the network first.

Art vs. Science
However, there are more approaches…

Replacement Analysis II
A new approach which mimics that of lot-sizing:
- Stage: Decision period.
- State: Decision period.
- Decisions: Number of periods to retain an asset.

Example Decision Map
[Diagram: arcs K1, K2, K3, K4 representing keeping an asset for 1, 2, 3, or 4 periods between decision points.]

Functional Equation
- Can be solved forwards or backwards.
- Write a boundary condition for the final period (see the sketch below).
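A sketch of one way this recursion could look (a reconstruction, not the original slide's equation): let g(k) be the total cost of buying an asset and keeping it for k periods, e.g. g(k) = P - S(k) + \sum_{j=0}^{k-1} C(j) under the stationary assumptions above (the exact indexing of C depends on the timing convention). With decision periods as states,

\[
f(t) = \min_{1 \le k \le T - t} \bigl\{ g(k) + f(t + k) \bigr\}, \qquad f(T) = 0,
\]

which can be swept backwards from T or forwards from 0 with reaching.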

Replacement Analysis III
A new approach which mimics that of solving integer knapsack problems:
- Stage: One for each possible age of asset.
- State: Number of years of accumulated service.
- Decisions: Number of times an asset is utilized for a given length of time over the horizon.
- Note: this is only valid for stationary costs.

Example Decision Map
[Diagram: accumulated-service states built from multiples of the chosen keep lengths, e.g. 0, i, 2i, 3i and i+j, i+2j, up to the horizon T.]

Functional Equation
- Can be solved forwards or backwards.
- Write a boundary condition for the first period (see the sketch below).
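One plausible reading of this formulation (a reconstruction; valid only for stationary costs, as noted above): treat it as an integer knapsack in which keeping n assets for exactly i periods each contributes n·i periods of service, and the keep lengths chosen must cover the whole horizon. With g(i) the cost of one keep cycle of length i (as in the previous formulation) and s the accumulated service still to be provided,

\[
f_i(s) = \min_{\,n \ge 0,\; n i \le s\,} \bigl\{ n\, g(i) + f_{i+1}(s - n i) \bigr\},
\]

with the convention that all T periods of service must be covered once the last stage (the longest keep length considered) has been processed.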

Art vs. Science
- Age as the state space:
  – Conceptually simple, easy to explain
- Period as the state space:
  – Computationally efficient
  – Can be generalized to non-stationary costs and multiple challengers easily
- Length of service as the state space:
  – Easy to bound the problem
  – Relates to infinite horizon solutions

Curse of Dimensionality
To give an idea of state space explosion, consider a fleet management problem:
- Assign trucks to loads
- Loads must move from one location to another within some given time frame
- The arrivals of loads are probabilistic
State space: the number of trucks (of a given type) at each location at each point in time.
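As an illustration with made-up numbers: with 20 locations and anywhere from 0 to 100 trucks of a single type at each location, the state is a 20-dimensional vector and there are 101^20, roughly 10^40, possible states. Enumerating them, as an exact DP would require, is hopeless, and adding truck types or load attributes only makes it worse.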

Approximation Methods
These can generally be categorized as follows:
- Reduction in granularity
- Interpolation
- Policy approximation
- Bounding/fathoming
- Cost-to-go function approximations
Unfortunately, art wins over science here too: each requires intimate knowledge of the problem.

Decision Network
[Diagram: decision network over stages 0, 1, 2, 3, …, T.]

Adjusting Granularity
- Simply reduce the number of possible states: instead of evaluating 1, 2, 3, …, 10, evaluate 1, 5, 10.
- Advocate: Bellman

Granularity continued…
- Solve problems of continuously finer granularity, each based on the previous solution
- Advocates: Bean and Smith (Michigan), Bailey (Pittsburgh)

Interpolation
- Solve for some of the states exactly and then interpolate solutions for the "skipped" states
- Advocates: Kitanidis (Stanford)

Interpolation
- Interpolations over the entire state space are often called spline methods; neural networks are also used
- Advocates: Johnson (WPI), Bertsekas (MIT)
[Diagram: some states solved exactly, with the intermediate states interpolated.]
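A minimal sketch of the interpolation idea, assuming a one-dimensional state that has been discretized; the decisions, cost, transition, and terminal_value callables are hypothetical placeholders, and a coarse grid of "exact" states stands in for the skipped ones via linear interpolation.

```python
import numpy as np

def coarse_grid_dp(num_stages, grid, decisions, cost, transition, terminal_value):
    """Backward recursion solved only on a coarse, increasing grid of states;
    cost-to-go values at off-grid successor states are linearly interpolated."""
    values = np.array([terminal_value(s) for s in grid], dtype=float)
    for t in reversed(range(num_stages)):
        new_values = np.empty_like(values)
        for k, s in enumerate(grid):
            candidates = []
            for x in decisions(t, s):
                s_next = transition(t, s, x)
                v_next = np.interp(s_next, grid, values)  # interpolate skipped states
                candidates.append(cost(t, s, x) + v_next)
            new_values[k] = min(candidates)
        values = new_values
    return values  # approximate stage-0 cost-to-go at each grid state
```

Spline or neural-network approximations replace np.interp with a fitted function, but the structure of the recursion stays the same.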

Policy Approximation
- Reduce the number of possible decisions to evaluate
- This merely reduces the number of arcs in the network
- Advocate: Bellman

Fathoming Paths
- Like branch and bound: use an upper bound (to a minimization problem) to eliminate inferior decisions (paths)
- Note: a typical DP must be solved completely in order to find an upper bound to the problem
- Most easily implemented in "forward" solution problems (not always possible)
- Advocate: Martsen

Approximating Cost-to-Go Functions
- This is the hot topic in approximation methods
- Highly problem specific
- Idea:
  – Solving a DP determines the "cost-to-go" value for each state in the system -- the value or cost to move from that state in a given stage to the final state in the final stage
  – If I know this function a priori (or can approximate it), then I don't need to solve the entire DP
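A minimal sketch of how an approximate cost-to-go function is used once it exists, with hypothetical decisions, cost, transition, and approx_cost_to_go callables: each decision is taken greedily against the approximation instead of solving the full recursion.

```python
def greedy_decision(t, state, decisions, cost, transition, approx_cost_to_go):
    """Pick the decision minimizing immediate cost plus the approximate
    cost-to-go of the resulting state; no full DP solve is needed."""
    best_x, best_val = None, float("inf")
    for x in decisions(t, state):
        val = cost(t, state, x) + approx_cost_to_go(t + 1, transition(t, state, x))
        if val < best_val:
            best_x, best_val = x, val
    return best_x
```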

Example: Fleet Management
[Plot: value as a function of the number of trucks, for a given location.]
If I know this function for each location, then this problem is solved…

How Approximate?
- It helps to know what the function looks like (can be found by plotting small instances)
- Powell (Princeton): Simulate demand and solve the deterministic problem (as a network flow problem)
  – Repeat and take the average of the values of each state to approximate the functions
  – Use dual variables from the network solutions to build cost-to-go functions
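A toy, single-location sketch of the "simulate demand and average" idea with made-up numbers; the actual approach solves network flow problems and uses their dual variables, which is beyond this sketch.

```python
import random

def sampled_value_curve(max_trucks, demand_sampler, revenue_per_load, n_samples=1000):
    """Approximate the value of holding k trucks (k = 0..max_trucks) at one location
    by sampling demand and averaging the deterministic value of each sample."""
    totals = [0.0] * (max_trucks + 1)
    for _ in range(n_samples):
        d = demand_sampler()
        for k in range(max_trucks + 1):
            totals[k] += revenue_per_load * min(k, d)  # serve at most d loads with k trucks
    return [t / n_samples for t in totals]

# Hypothetical numbers: demand uniform on 0..8 loads, $100 of revenue per load served.
curve = sampled_value_curve(10, lambda: random.randint(0, 8), revenue_per_load=100.0)
```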

How Approximate?
- Bertsimas (MIT) proposes the use of heuristics to approximate the value function
- Specifically, when solving a multidimensional knapsack problem, the value function is approximated by adaptively rounding LP relaxations of the problem

Implementing Approximations
- Can be used to approximate the final period values and then solve the "full" DP from there
- Can use approximations for each state and just "read" the solution from a table (always approximating and updating the approximations)

Summary
- DP is a very useful and powerful modeling tool
- Ideal for sequential decision processes
- Can handle various situations: integrality, non-linearity, or stochastic problems
- DP is limited by:
  – Art vs. science in modeling
  – Curse of dimensionality
  – No commercial software
- But approximation approaches show promise