
Regrets and Kidneys
Andrew Mao, Stacy Wong

Intro to Online Stochastic Optimization
- Data is revealed over time
- The distribution of future events is known
- Under time constraints, which limits the amount of sampling/simulation
- Solve these problems with two black boxes:
  - A conditional sampling distribution
  - An offline optimization algorithm
- Why is the problem nontrivial?

Modeling with a Special Class of MDPs
- Consider the exogenous Markov Decision Process (X-MDP) as a way to model certain problems
- Exogenous uncertainty: random events are independent of our actions
- We can sample arbitrarily far into the future, then optimize actions with respect to the samples
- Traditional methods to solve MDPs: search algorithms, Bellman equations
- The following problems and algorithms all apply to this case

Motivation: Packet Scheduling Problem
- Schedule an unknown sequence of jobs
- Goal: maximize the total value of the jobs processed
- Assumptions:
  - A set of jobs with different arrival times
  - A schedule horizon H = [H_min, H_max] over which to schedule jobs
  - Each job j has weight w(j)
  - Each job j requires a single time unit to process
  - Only one job can be processed per time unit
  - Job j must be scheduled in the time window [a(j), a(j) + d], where d is the same for all j
  - A schedule is a function σ: H → Jobs (sketched in code below)
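To make the setup concrete, here is a minimal Python sketch of the job and schedule representation, using hypothetical names (Job, is_feasible, total_value) that are not from the original slides:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    name: str
    arrival: int   # a(j): earliest time the job may be scheduled
    weight: float  # w(j): value gained if the job is processed

def is_feasible(schedule, d):
    """Check that a schedule (a dict {time_step: Job}, i.e. sigma: H -> Jobs)
    assigns each job at most one time step and keeps every job j inside its
    window [a(j), a(j) + d]."""
    seen = set()
    for t, job in schedule.items():
        if job in seen:                              # one time unit per job
            return False
        if not (job.arrival <= t <= job.arrival + d):
            return False
        seen.add(job)
    return True

def total_value(schedule):
    """Objective: total weight of the jobs actually processed."""
    return sum(job.weight for job in schedule.values())
```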

Generic Online Optimization Algorithm
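The generic algorithm can be summarized as a loop: at each step, sample futures with the first black box, score candidate actions using the second (offline optimization) black box, commit to one action, and advance the state. Below is a minimal sketch with hypothetical interfaces (sample_scenarios, offline_optimize, choose_action, apply_action), not the authors' exact pseudocode:

```python
def generic_online_optimization(horizon, initial_state, sample_scenarios,
                                offline_optimize, choose_action, apply_action):
    """Generic online stochastic optimization loop (sketch)."""
    state = initial_state
    decisions = []
    for t in horizon:
        # Black box 1: sample futures conditioned on what is known at time t.
        scenarios = sample_scenarios(state, t)
        # Black box 2 (the offline optimizer) is used inside the decision rule,
        # which is what distinguishes greedy / expectation / consensus / regret.
        action = choose_action(state, t, scenarios, offline_optimize)
        decisions.append(action)
        state = apply_action(state, t, action)
    return decisions
```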

Oblivious Algorithms
- Greedy Algorithm
- Local Optimal Algorithm
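Oblivious algorithms ignore the sampled futures entirely. For the packet-scheduling example, a greedy rule might look like the sketch below (reusing the hypothetical Job type from the earlier sketch; an illustration, not the slide's exact rule):

```python
def greedy_action(pending_jobs, t, d):
    """Oblivious greedy rule: at time t, ignore future arrivals and serve the
    currently available job of largest weight, or None if nothing is available."""
    available = [j for j in pending_jobs if j.arrival <= t <= j.arrival + d]
    if not available:
        return None
    return max(available, key=lambda j: j.weight)
```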

Stochastic: Expectation Algorithm
- Maximizes the expected value of serving each request (sketched below)
- Bad complexity: it spends m|A| offline optimizations on only m samples, so m is small when the optimization budget is fixed and |A| is large
- Estimates are very optimistic: the score function has an overestimation bias (but it applies to every action!)
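A sketch of the expectation rule, written against the hypothetical interfaces of the generic loop above (an illustration of the idea, not the authors' code):

```python
def expectation_action(state, t, scenarios, offline_optimize, actions):
    """Expectation algorithm (sketch): for each candidate action and each sampled
    scenario, fix that action at time t and let the offline optimizer solve the
    rest; choose the action with the best average value. Note the m * |A| calls
    to the offline optimizer, which is the complexity issue noted above."""
    best_action, best_value = None, float("-inf")
    for a in actions:
        total = 0.0
        for scenario in scenarios:
            total += offline_optimize(state, t, scenario, first_action=a)
        average = total / len(scenarios)
        if average > best_value:
            best_action, best_value = a, average
    return best_action
```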

Stochastic: Consensus Algorithm
- This is an "elitist" algorithm: the mode wins! (see the sketch below)
- The expected value (and the distribution of values) is ignored
- Variants: the C+E_k algorithm and the quantitative consensus (QC) algorithm
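A sketch of the plain consensus rule under the same hypothetical interfaces (the C+E_k and QC variants refine this scoring and are not shown):

```python
from collections import Counter

def consensus_action(state, t, scenarios, offline_optimize, first_action_of):
    """Consensus algorithm (sketch): solve each sampled scenario once, record the
    action that the offline solution takes first at time t, and return the most
    frequently chosen action. Solution values are ignored; only the mode counts.
    `first_action_of(solution, t)` is an assumed helper that extracts that action."""
    votes = Counter()
    for scenario in scenarios:
        solution = offline_optimize(state, t, scenario)
        votes[first_action_of(solution, t)] += 1
    action, _count = votes.most_common(1)[0]
    return action
```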

A New Approach: Regret
- "Regret": what would happen if we chose a sub-optimal solution now?
- Assume we can cheaply estimate the regret of a request r at time t
- This combines stochastic optimization over each sample with the ability to produce an estimate for each action
- We use RegretUB() as an upper bound on the regret

Stochastic: Regret Algorithm
- Perform the offline optimization only once per sample (sketched below)
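A sketch of the regret rule against the same hypothetical interfaces; `regret_ub` stands in for the RegretUB() estimator, and `solution.value` is an assumed attribute of the offline solver's output:

```python
def regret_action(state, t, scenarios, offline_optimize, regret_ub, actions):
    """Regret algorithm (sketch): run the offline optimizer once per sampled
    scenario, then score every other candidate action with the cheap upper
    bound on regret. An action's score for a scenario is the scenario's optimal
    value minus the estimated regret of forcing that action first."""
    scores = {a: 0.0 for a in actions}
    for scenario in scenarios:
        solution = offline_optimize(state, t, scenario)        # once per sample
        for a in actions:
            scores[a] += solution.value - regret_ub(solution, scenario, t, a)
    return max(scores, key=scores.get)
```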

Results

Discussion of Stochastic Algorithms
- How many times do we call the offline optimization, and how efficiently (i.e., |A| times per sample, or once per sample)?
  - The comparison is normalized for the algorithms described here
- The regret algorithm depends on how cheap and accurate the RegretUB estimator is
- E is R with an exact regret; QC is R with a worst-case regret (and is the fastest)

Kidney Problem Formulation
- Patients and donors arrive and expire (literally)
- Blood and tissue types can be modeled as a (conditional) distribution
- The randomness does not depend on our actions, so we can use scenario sampling as described earlier
- Problem formulation:
  - Directed graph: vertices are patient-donor pairs, compatibilities are edges
  - Each edge (u -> v) is assigned a weight describing the "goodness" of the donor kidney at vertex v for the patient at vertex u
  - L is the limit on cycle length, usually 3
  - Maximization problem over exchange cycles (see the enumeration sketch below)
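For illustration, a brute-force sketch of enumerating the candidate exchange cycles of length at most L in such a compatibility graph (hypothetical names; only practical for small instances):

```python
import itertools

def enumerate_cycles(weights, L=3):
    """Enumerate directed cycles of length 2..L in a compatibility graph.
    `weights` is a dict {(u, v): w} of directed edge weights between
    patient-donor pairs; each cycle is returned with its total weight."""
    vertices = {v for edge in weights for v in edge}
    cycles = []
    for k in range(2, L + 1):
        for combo in itertools.permutations(sorted(vertices), k):
            if combo[0] != min(combo):        # keep one rotation per cycle
                continue
            edges = list(zip(combo, combo[1:] + combo[:1]))
            if all(e in weights for e in edges):
                cycles.append((combo, sum(weights[e] for e in edges)))
    return cycles
```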

The Problem with the Prior-Free Approach
- Proposition 1: No deterministic prior-free algorithm can do better than a factor of L/2
- Proposition 2: No randomized prior-free algorithm can do better than a factor of (2L-2)/L, which approaches 2 as L grows; for L = 3 this means only 75% of possible lives saved

Adapting the Regrets Algorithm to Kidneys
- The natural action space is the set of all collections of vertex-disjoint cycles processed at time t
  - Exponential in the number of cycles; not tractable
- So we split the sets of cycles into individual cycles for the optimization, then use another optimization to recombine the cycles (a stand-in sketch follows below)
- Do we lose correlated information?
- We assume the number of cycles is still manageable when the cycle length is limited
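The recombination step is itself an optimization in the paper; purely as a stand-in for illustration (not the authors' method), a greedy vertex-disjoint packing of individually scored cycles could look like this:

```python
def recombine_cycles(scored_cycles):
    """Greedy recombination sketch: take cycles in order of decreasing score and
    keep each one whose vertices are all unused, yielding a vertex-disjoint set
    of exchanges. `scored_cycles` is a list of (cycle_vertices, score) pairs."""
    chosen, used = [], set()
    for cycle, score in sorted(scored_cycles, key=lambda cs: cs[1], reverse=True):
        if used.isdisjoint(cycle):
            chosen.append((cycle, score))
            used.update(cycle)
    return chosen
```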

A1: "Direct adaptation of regrets"
- Is this a correct implementation of regrets?
- It somewhat resembles the QC algorithm

Algorithm 1 is not optimal!

A2: Optimizing regrets for each action
- What is the complexity of this algorithm?
- Is the number of offline optimizations normalized?
- According to the simulations, the number of scenarios used to test Algorithms 1 and 2 is the same

Do Regrets Make You GAG?
- Local Anticipatory Gap (LAG): the minimum expected loss in a given state from committing to a certain action
- Intuitive explanation: the LAG is low if there is an action that is close to optimal for most future scenarios
- Global Anticipatory Gap (GAG): a (maximal) sum of the local anticipatory gaps
- A smaller GAG is correlated with better performance of regrets-based algorithms
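Read literally, the bullets above suggest the following formalization; this is a reconstruction from the slide text with assumed notation, not the paper's own definitions:

```latex
% Local anticipatory gap at state s_t: smallest expected loss, over actions a,
% relative to the clairvoyant offline optimum on scenario \xi (notation assumed).
\Delta_{\ell}(s_t) = \min_{a \in A} \; \mathbb{E}_{\xi}\!\left[ f^{*}(s_t,\xi) - f(s_t,a,\xi) \right]

% Global anticipatory gap: a (maximal) sum of the local gaps over the decision
% steps; what the maximum ranges over is not specified on the slide.
\Delta_{g} = \max \sum_{t} \Delta_{\ell}(s_t)
```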

A3: The AMSAA algorithm using MDPs
- Converts the exogenous MDP to a standard MDP and solves it with a heuristic search algorithm
- Bad scaling properties: it could not be used for the simulated data

Characterization of Results
- Both of the following increase complexity:
  - The look-ahead period: the number of time steps into the future
  - The number of samples: how many random paths are considered
- With the kidney data, the batch size is the time step over which actions are taken
- Things to note:
  - Algorithm 1 was only 'questionably' a regrets algorithm
  - Algorithms 1 and 2 ran with the same number of scenarios, but runtime differences were not noted
  - Algorithm 3 did not scale to large data sets (even though it was the most theoretically complex)
  - What do you think the contributions of these algorithms were?

Results
- Graphs of performance tuned to optimal parameters
- Is it reasonable to compare algorithms that are not normalized (in look-ahead and number of samples)?
- Why use Algorithm 3 if it doesn't scale?

Recap
- We examined online stochastic optimization problems for MDPs where the randomness does not depend on past actions
- Algorithms like regrets outperform others by maximizing both the number of samples taken and the information extracted from each sample, for each action
- In real problems such as online kidney exchange, we make adaptations such as breaking the action space into smaller problems and recombining the results
- For discrete real-time data, the batch size and look-ahead are parameters that need to be calibrated
- Normalizing the performance of algorithms for comparison is important!

THANKS! Questions?