Presentation transcript:

Optimal Learning On A Graph
INFORMS Annual Meeting, October 11, 2009
Ilya O. Ryzhov, Warren Powell, Princeton University
© 2009 Ilya O. Ryzhov, Princeton University

Motivation: Learning on a graph
» We need to quickly plan the fastest (least congested) travel route.
» GPS-enabled smartphones in the area can provide an estimate of local congestion.
» We can make a small number of queries before we have to recommend a route.
» Which areas should we measure in the limited time available?
» We are solving a problem on a graph, but we can measure any individual component of the graph at any time.

Information collection on a graph
We have a shortest-path problem on a graph: find the path $p$ minimizing the total length $\sum_{(i,j) \in p} \mu_{ij}$.
If the edge lengths $\mu_{ij}$ were deterministic, the problem would have a simple solution:
» Algorithms by Bellman, Bellman-Ford, Dijkstra…
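The deterministic case is the computational workhorse for everything that follows, so it helps to have it in hand. Below is a minimal sketch of Dijkstra's algorithm in Python, assuming nonnegative edge lengths stored as a nested dict; all names here are illustrative, not from the talk.

    import heapq

    def dijkstra(graph, source, target):
        """Deterministic shortest path; graph maps node -> {neighbor: length}."""
        dist = {source: 0.0}
        parent = {}
        visited = set()
        heap = [(0.0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if node in visited:
                continue
            visited.add(node)
            if node == target:
                break
            for nbr, length in graph.get(node, {}).items():
                nd = d + length
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr] = nd
                    parent[nbr] = node
                    heapq.heappush(heap, (nd, nbr))
        # Walk the parent pointers back from the target to recover the edge list.
        path, node = [], target
        while node != source:
            path.append((parent[node], node))
            node = parent[node]
        return dist[target], list(reversed(path))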

Information collection on a graph
If the edge lengths were stochastic with known distribution:
» We could run a deterministic shortest-path algorithm with edge lengths given by the expected values $\mathbb{E}\,\mu_{ij}$.
» We could compute or approximate the distribution of the stochastic shortest path (Kulkarni 1986, Fan et al. 2005, Peer & Sharma 2007).

Information collection on a graph
In the problem of learning on a graph, the edge lengths are stochastic, with unknown distribution.
We use Bayesian statistics to learn the distributions sequentially.

Information collection on a graph
At first, we believe that the length of edge $(i,j)$ is $\mu_{ij} \sim N\left(\mu^n_{ij}, 1/\beta^n_{ij}\right)$, where $\beta^n_{ij}$ is the precision of our belief.
But then we measure this edge and observe a sample $\hat{\mu}^{n+1}_{ij}$.
Our beliefs change:
$\mu^{n+1}_{ij} = \frac{\beta^n_{ij} \mu^n_{ij} + \beta_W \hat{\mu}^{n+1}_{ij}}{\beta^n_{ij} + \beta_W}, \qquad \beta^{n+1}_{ij} = \beta^n_{ij} + \beta_W,$
where $\beta_W$ is the measurement precision. Thus, our beliefs about the edge lengths are gradually improved over measurements.
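This is the standard conjugate normal update with known measurement precision; a minimal sketch, with illustrative names (mu, beta, beta_W are conventions of this sketch, not fixed by the slides):

    def update_beliefs(mu, beta, observation, beta_W):
        """One Bayesian update of a normal belief about an edge length.

        mu, beta: prior mean and precision (1 / variance) of the belief.
        observation: the sampled edge length; beta_W: measurement precision.
        Returns the posterior mean and precision.
        """
        beta_new = beta + beta_W
        mu_new = (beta * mu + beta_W * observation) / beta_new
        return mu_new, beta_new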

Information collection on a graph
After n measurements, our beliefs about the entire graph are encoded in the knowledge state $s^n = (\mu^n, \beta^n)$.
We can solve a deterministic shortest-path problem with edge lengths given by the estimates $\mu^n_{ij}$.
This gives us a path $p^n$ that seems to be the shortest, based on our beliefs.
» The length of this path is believed to be $\sum_{(i,j) \in p^n} \mu^n_{ij}$.
This is not necessarily the real shortest path:
» The true length of the path $p^n$ is $\sum_{(i,j) \in p^n} \mu_{ij}$.
» The true length of the real shortest path is $\min_p \sum_{(i,j) \in p} \mu_{ij}$.
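In code, finding $p^n$ just means running the deterministic routine on the posterior means; a sketch reusing the hypothetical dijkstra from earlier:

    def best_path(beliefs, source, target):
        """p^n: the deterministic shortest path under the current estimates mu^n.

        beliefs maps each edge (i, j) -> (mu, beta); reuses dijkstra from above.
        """
        graph = {}
        for (i, j), (mu, beta) in beliefs.items():
            graph.setdefault(i, {})[j] = mu
        return dijkstra(graph, source, target)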

Information collection on a graph
Optimal routing over a graph:
» The best path according to our beliefs
The black path is the path $p^n$, with time-n length $\sum_{(i,j) \in p^n} \mu^n_{ij}$.

Information collection on a graph
Optimal routing over a graph:
» The best path according to our beliefs
» The edge we measure
The black path is the path $p^n$, with time-n length $\sum_{(i,j) \in p^n} \mu^n_{ij}$.

Information collection on a graph
Optimal routing over a graph:
» The best path according to our beliefs
» The edge we measure
» The best path according to our new beliefs
» How do we decide which links to measure?
The black path is the path $p^{n+1}$, with time-(n+1) length $\sum_{(i,j) \in p^{n+1}} \mu^{n+1}_{ij}$.

Learning policies
Let $X^\pi$ be a function that takes the knowledge state $s^n$ and gives us an edge to measure.
A learning policy $\pi$ is a set of such functions.
Simple examples of learning policies (see the sketch after this list):
» Pure exploitation: find the time-n shortest path, then measure the shortest edge on that path.
» Variance-exploitation: find the time-n shortest path, then measure the edge on it that we are least certain about.
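A minimal sketch of both policies, reusing the hypothetical best_path helper from above and reading "shortest edge" as the edge with the smallest estimated length and "least certain" as the belief with the lowest precision:

    def pure_exploitation(beliefs, source, target):
        """Measure the edge with the smallest estimated length on the time-n best path."""
        _, path = best_path(beliefs, source, target)
        return min(path, key=lambda edge: beliefs[edge][0])

    def variance_exploitation(beliefs, source, target):
        """Measure the edge on the time-n best path whose belief has the lowest precision."""
        _, path = best_path(beliefs, source, target)
        return min(path, key=lambda edge: beliefs[edge][1])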

Implementation policies
The problem consists of two phases:
» Learning (times n = 0, …, N-1): measuring individual edges
» Implementation (time N): choosing a path
An implementation policy is a single function which maps the final state $s^N$ to some path.
Simple examples of implementation policies (see the sketch after this list):
» Find the path $p^N$: solve a deterministic shortest-path problem with edge lengths given by $\mu^N_{ij}$.
» $\alpha$-percentile: solve a deterministic shortest-path problem with edge lengths given by the $\alpha$-percentile of our beliefs.
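The $\alpha$-percentile variant only changes the edge lengths fed into the deterministic solver. A minimal sketch under the normal-belief model (the helper name and alpha are illustrative):

    from statistics import NormalDist

    def percentile_lengths(beliefs, alpha):
        """Edge lengths set to the alpha-percentile of each normal belief."""
        return {edge: NormalDist(mu, beta ** -0.5).inv_cdf(alpha)
                for edge, (mu, beta) in beliefs.items()}

    # Usage sketch: build a graph dict from these lengths (e.g. alpha = 0.9
    # for pessimistic lengths) and run the dijkstra sketch from earlier on it.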

Objective function
Choose a measurement policy $\pi$ and an implementation policy $\rho$ to minimize the true length of the path chosen by the implementation policy.
Objective: $\min_{\pi, \rho} \; \mathbb{E}^\pi \sum_{(i,j) \in p^\rho} \mu_{ij}$.

Learning policies
Theorem. The best possible implementation policy is the one that finds the path $p^N$.
This result eliminates the problem of finding an implementation policy.
We only have to find a learning policy that makes our estimate $\min_p \sum_{(i,j) \in p} \mu^N_{ij}$ of the shortest path length small.

The KG decision rule: one-period look-ahead
The KG rule chooses the edge that maximizes the expected one-period improvement in our estimate of the length of the shortest path.

Learning using knowledge gradients
Proposition. If we measure the edge $(i,j)$ at time n, then the best path at time n+1 (the path $p^{n+1}$ that achieves $\min_p \sum_{(k,l) \in p} \mu^{n+1}_{kl}$) will be either
» the best time-n path containing the edge $(i,j)$, or
» the best time-n path not containing the edge $(i,j)$.
At time n, we know that the best time-(n+1) path can only be one of two things.

Computation of the knowledge gradient
The best path containing the edge $(i,j)$

Computation of the knowledge gradient
The best path not containing the edge $(i,j)$
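Both quantities reduce to deterministic shortest-path computations on the time-n estimates. A minimal sketch, reusing the hypothetical dijkstra from earlier (graph here holds the estimated lengths $\mu^n$):

    def best_with_edge(graph, source, target, i, j):
        """Length of the shortest source-target path forced to use edge (i, j)."""
        d_to_i, _ = dijkstra(graph, source, i)
        d_from_j, _ = dijkstra(graph, j, target)
        return d_to_i + graph[i][j] + d_from_j

    def best_without_edge(graph, source, target, i, j):
        """Length of the shortest source-target path that avoids edge (i, j)."""
        pruned = {u: {v: l for v, l in nbrs.items() if (u, v) != (i, j)}
                  for u, nbrs in graph.items()}
        return dijkstra(pruned, source, target)[0]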

Main result: KG formula
It can be shown that
$\nu^{KG,n}_{ij} = \tilde{\sigma}^n_{ij} \, f\!\left( -\frac{\left| V^n_{ij} - W^n_{ij} \right|}{\tilde{\sigma}^n_{ij}} \right)$
where
» $f(z) = z \Phi(z) + \phi(z)$, with $\Phi$ and $\phi$ the standard normal cdf and pdf,
» $V^n_{ij}$ is the time-n length of the best path containing $(i,j)$,
» $W^n_{ij}$ is the time-n length of the best path not containing $(i,j)$,
» $\tilde{\sigma}^n_{ij}$ is the reduction in the standard deviation of our belief about $(i,j)$ produced by one more measurement.
The marginal value of a measurement is bigger if these values are closer together.
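A minimal sketch of this formula in code, under the normal-belief model sketched earlier (V and W would come from the path helpers above; beta_W is the assumed measurement precision):

    from math import sqrt
    from statistics import NormalDist

    STD_NORMAL = NormalDist()  # standard normal, supplies Phi (cdf) and phi (pdf)

    def kg_factor(V, W, beta, beta_W):
        """Knowledge gradient of one edge, following the formula above.

        V, W: time-n lengths of the best paths with / without the edge.
        beta: current precision of the belief; beta_W: measurement precision.
        """
        # sigma-tilde: drop in the belief's standard deviation from one measurement.
        sigma_tilde = sqrt(1.0 / beta - 1.0 / (beta + beta_W))
        z = -abs(V - W) / sigma_tilde
        f = z * STD_NORMAL.cdf(z) + STD_NORMAL.pdf(z)  # f(z) = z*Phi(z) + phi(z)
        return sigma_tilde * f

    # The KG policy then measures the edge (i, j) with the largest kg_factor.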

Asymptotic optimality property
Jensen's inequality gives a global lower bound on the value of any learning policy: no policy can achieve an expected estimate below the expected true shortest path length $\mathbb{E}\left[\min_p \sum_{(i,j) \in p} \mu_{ij}\right]$.
Theorem. If the number of measurements is infinite, the KG policy attains the global lower bound.
In other words, given infinitely many measurements, the KG policy will find the true shortest path.

Asymptotic optimality property
The proof is technical, but the key details are:
» The KG factor of an edge is zero if and only if the length of that edge is known perfectly (with infinite precision).
» It can be shown that the KG factor is continuous in our beliefs about the edge.
» The precision $\beta^n_{ij}$ always increases when we measure $(i,j)$.
» As we measure $(i,j)$ more often, its KG factor goes to zero.
» Since we always measure the edge with the largest KG factor, eventually we will switch over to another edge.

Asymptotic optimality property
There are many simple methods that are asymptotically optimal.
» If we have infinitely many measurements, we could just measure every edge in a round-robin fashion.
However, KG is also myopically optimal:
» If N = 1, KG allocates the sole measurement optimally.
KG is the only stationary method that is both myopically and asymptotically optimal.
This suggests that KG may yield good performance for general finite time horizons.

Knowledge gradient on a graph
Consider a simple layered graph (14 nodes, 24 edges).
The true shortest path is highlighted in black.
The path that we think is the shortest is highlighted in blue.
Let's see how the KG method changes our beliefs about the best path.

Knowledge gradient on a graph
Edge measured by KG: (5,8)
Our estimate of this edge's length has changed enough to change our beliefs about the best path!

Knowledge gradient on a graph
Edge measured by KG: (1,5)
Our estimate of this edge's length has changed enough to change our beliefs about the best path!

Knowledge gradient on a graph
Edge measured by KG: (2,7)
Not every measurement changes our beliefs about the best path…

Knowledge gradient on a graph
Edge measured by KG: (7,10)
Notice how we always measure edges that are close to the blue path, but not always on it.

Knowledge gradient on a graph
Edges measured: (1,2), (5,8), (1,5), (2,7), (7,10)
We have found the best path!

Experimental results
Ten layered graphs (22 nodes, 50 edges)
Ten larger layered graphs (38 nodes, 102 edges)

Conclusion
We have defined a new class of optimal learning problems, beyond the scope of the traditional literature.
We have derived a one-period look-ahead method for the problem of learning on a graph.
The method produces an easily computable decision rule and has certain theoretical advantages:
» Optimal for N = 1 by design: if we have only one measurement, we get as much value out of it as possible.
» Asymptotic optimality: if we have infinitely many measurements, we find the true shortest path.
Experimental evidence shows that KG performs well for values of N in between.