Optimal Learning on a Graph
Ilya O. Ryzhov and Warren B. Powell, Princeton University
INFORMS Annual Meeting, October 11, 2009
Motivation: learning on a graph
» We need to quickly plan the fastest (least congested) travel route
» GPS-enabled smartphones in the area can provide an estimate of local congestion
» We can make a small number of queries before we have to recommend a route
» Which areas should we measure in the limited time available?
» We are solving a problem on a graph, but we can measure any individual component of the graph at any time
Information collection on a graph
We have a shortest-path problem on a graph: $\min_p \sum_{(i,j) \in p} \mu_{ij}$, where $p$ ranges over paths from source to destination. If the edge lengths $\mu_{ij}$ were deterministic, the problem would have a simple solution
» Algorithms by Bellman, Bellman-Ford, Dijkstra…
Information collection on a graph
We have a shortest-path problem on a graph: $\min_p \sum_{(i,j) \in p} \mu_{ij}$. If the edge lengths were stochastic with known distribution:
» We could run a deterministic shortest-path algorithm with edge lengths given by $\mathbb{E}\,\mu_{ij}$
» We could compute or approximate the distribution of the stochastic shortest path (Kulkarni 1986, Fan et al. 2005, Peer & Sharma 2007)
Information collection on a graph
We have a shortest-path problem on a graph: $\min_p \sum_{(i,j) \in p} \mu_{ij}$. In the problem of learning on a graph, the edge lengths are stochastic with unknown distribution. We use Bayesian statistics to learn the distributions sequentially.
Information collection on a graph
At first, we believe that $\mu_{ij} \sim N\!\left(\mu^n_{ij}, (\beta^n_{ij})^{-1}\right)$, where $\beta^n_{ij}$ is the precision of our belief. But we measure this edge and observe $\hat{\mu}^{n+1}_{ij}$. Our beliefs change:
$\mu^{n+1}_{ij} = \frac{\beta^n_{ij}\,\mu^n_{ij} + \beta^W \hat{\mu}^{n+1}_{ij}}{\beta^n_{ij} + \beta^W}, \qquad \beta^{n+1}_{ij} = \beta^n_{ij} + \beta^W$
where $\beta^W$ is the measurement precision. Thus, our beliefs about the edge lengths are gradually improved over measurements.
[Diagram: the edge (i,j) before and after measurement]
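A minimal sketch of this update in Python, assuming the normal-normal model above (all names are illustrative; `beta_W` stands for the known measurement precision $\beta^W$):

```python
def update_beliefs(mu, beta, edge, observation, beta_W):
    """Bayesian update of the belief about one edge length.

    mu, beta    : dicts mapping an edge (i, j) to the mean and the
                  precision (1 / variance) of our belief about its length
    edge        : the edge (i, j) that was just measured
    observation : the noisy observed length of that edge
    beta_W      : precision of the measurement noise (assumed known)
    """
    # Posterior mean: precision-weighted average of prior mean and observation.
    mu[edge] = (beta[edge] * mu[edge] + beta_W * observation) / (beta[edge] + beta_W)
    # Measuring always increases the precision of our belief.
    beta[edge] += beta_W
    return mu, beta
```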
Information collection on a graph
After n measurements, our beliefs about the entire graph are encoded in the knowledge state $s^n = (\mu^n, \beta^n)$. We can solve a deterministic shortest-path problem with edge lengths given by $\mu^n_{ij}$. This gives us a path $p^n$ that seems to be the shortest, based on our beliefs
» The length of this path is believed to be $V^n = \sum_{(i,j) \in p^n} \mu^n_{ij} = \min_p \sum_{(i,j) \in p} \mu^n_{ij}$
This is not necessarily the real shortest path
» The true length of path $p^n$ is $\sum_{(i,j) \in p^n} \mu_{ij}$
» The true length of the real shortest path is $\min_p \sum_{(i,j) \in p} \mu_{ij}$
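The path $p^n$ and its believed length $V^n$ then come from any deterministic solver run on the current means; a sketch using networkx (graph construction and names are illustrative assumptions):

```python
import networkx as nx

def estimated_shortest_path(edges, mu, source, target):
    """Solve the deterministic shortest-path problem with edge lengths
    given by the current belief means mu[(i, j)]."""
    G = nx.DiGraph()
    for (i, j) in edges:
        G.add_edge(i, j, weight=mu[(i, j)])
    # Dijkstra on the estimated lengths yields the path p^n ...
    path = nx.shortest_path(G, source, target, weight="weight")
    # ... and its believed length V^n.
    V = nx.shortest_path_length(G, source, target, weight="weight")
    return path, V
```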
Information collection on a graph
Optimal routing over a graph
» The best path according to our beliefs
[Diagram] The black path is the path $p^n$, with time-n length $V^n$.
Information collection on a graph
Optimal routing over a graph
» The best path according to our beliefs
» The edge we measure
[Diagram] The black path is the path $p^n$, with time-n length $V^n$.
Information collection on a graph
Optimal routing over a graph
» The best path according to our beliefs
» The edge we measure
» The best path according to our new beliefs
» How do we decide which links to measure?
[Diagram] The black path is the path $p^{n+1}$, with time-(n+1) length $V^{n+1}$.
Learning policies
Let $X^\pi(s^n)$ be a function that takes the knowledge state and gives us an edge to measure. A learning policy $\pi$ is a set of such functions.
Simple examples of learning policies (sketched in code below):
» Pure exploitation: find the time-n shortest path, then measure the shortest edge on that path
» Variance-exploitation: find the time-n shortest path, then measure the edge on that path that we are least certain about
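Hedged sketches of these two rules, reusing the `estimated_shortest_path` helper from above (illustrative code, not from the talk):

```python
def pure_exploitation(edges, mu, beta, source, target):
    """Measure the shortest edge on the current estimated shortest path."""
    path, _ = estimated_shortest_path(edges, mu, source, target)
    path_edges = list(zip(path, path[1:]))
    return min(path_edges, key=lambda e: mu[e])

def variance_exploitation(edges, mu, beta, source, target):
    """Measure the edge on the current path whose belief has the lowest
    precision, i.e. the edge we are least certain about."""
    path, _ = estimated_shortest_path(edges, mu, source, target)
    path_edges = list(zip(path, path[1:]))
    return min(path_edges, key=lambda e: beta[e])
```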
Implementation policies
The problem consists of two phases:
» Learning ($n = 0, \dots, N-1$): measuring individual edges
» Implementation (time $N$): choosing a path
An implementation policy is a single function which maps the final state $s^N$ to some path.
Simple examples of implementation policies (see the sketch below):
» Find the path $p^N$: solve a deterministic shortest-path problem with edge lengths given by $\mu^N_{ij}$
» $\alpha$-percentile: solve a deterministic shortest-path problem with edge lengths given by the $\alpha$-percentiles of our beliefs, $\mu^N_{ij} + z_\alpha\,(\beta^N_{ij})^{-1/2}$
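For instance, the $\alpha$-percentile rule only changes the lengths fed to the solver; a sketch under the normal-belief assumption (the quantile form is my inference, not stated on the slide):

```python
from scipy.stats import norm

def percentile_path(edges, mu, beta, source, target, alpha):
    """alpha-percentile implementation policy: run the deterministic
    solver on the alpha-percentiles of the final beliefs."""
    z = norm.ppf(alpha)  # standard normal alpha-quantile
    tilted = {e: mu[e] + z / beta[e] ** 0.5 for e in edges}
    return estimated_shortest_path(edges, tilted, source, target)
```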
Objective function
Choose a measurement policy $\pi$ and an implementation policy $\rho$ to minimize the true length of the path chosen by the implementation policy.
Objective: $\min_{\pi,\rho}\; \mathbb{E}^\pi \sum_{(i,j) \in \rho(s^N)} \mu_{ij}$
Learning policies
Theorem. The best possible implementation policy is the one that finds the path $p^N$.
This result eliminates the problem of finding an implementation policy. We only have to find a learning policy that makes the expected true length of the path $p^N$, $\mathbb{E}^\pi \sum_{(i,j) \in p^N} \mu_{ij}$, as small as possible.
The KG decision rule: one-period look-ahead
The KG rule chooses an edge to maximize the expected one-period improvement in our estimate of the shortest path.
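In symbols, a plausible formalization of this rule (the original formula did not survive extraction, so this display is an assumption, written in the minimization convention of these slides):

$X^{KG}(s^n) = \arg\max_{(i,j)} \nu^{KG,n}_{ij}, \qquad \nu^{KG,n}_{ij} = V^n - \mathbb{E}^n\!\left[ V^{n+1} \mid \text{measure } (i,j) \right]$

where $V^n$ is the believed length of the shortest path at time $n$; new information can only lower this estimate in expectation, so $\nu^{KG,n}_{ij} \ge 0$.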
Learning using knowledge gradients
Proposition. If we measure the edge $(i,j)$ at time n, then the best path at time n+1 (the path $p^{n+1}$ that achieves $V^{n+1}$) will be either
» The best time-n path containing the edge $(i,j)$, or
» The best time-n path not containing the edge $(i,j)$.
At time n, we know that the best time-(n+1) path can only be one of two things.
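Both candidates can be computed with ordinary shortest-path calls (a sketch reusing the helper above; forcing a path through $(i,j)$ by concatenation and excluding $(i,j)$ by deleting it are my illustrative choices):

```python
def candidate_path_lengths(edges, mu, edge, source, target):
    """Time-n lengths of the best paths containing / not containing `edge`."""
    i, j = edge
    # Best path containing (i, j): shortest source -> i, then the edge
    # itself, then shortest j -> target.
    _, to_i = estimated_shortest_path(edges, mu, source, i)
    _, from_j = estimated_shortest_path(edges, mu, j, target)
    V = to_i + mu[edge] + from_j
    # Best path not containing (i, j): solve on the graph with the edge removed.
    remaining = [e for e in edges if e != edge]
    _, W = estimated_shortest_path(remaining, mu, source, target)
    return V, W
```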
Computation of the knowledge gradient
[Diagram: the best path containing the edge $(i,j)$, with time-n length $V^n_{ij}$]
Computation of the knowledge gradient
[Diagram: the best path not containing the edge $(i,j)$, with time-n length $W^n_{ij}$]
Main result: KG formula
It can be shown that
$\nu^{KG,n}_{ij} = \tilde{\sigma}^n_{ij}\, f\!\left( -\frac{\left| V^n_{ij} - W^n_{ij} \right|}{\tilde{\sigma}^n_{ij}} \right), \qquad f(z) = z\,\Phi(z) + \phi(z)$
where
» $\Phi, \phi$: standard normal cdf and pdf
» $V^n_{ij}$: time-n length of the best path containing $(i,j)$
» $W^n_{ij}$: time-n length of the best path not containing $(i,j)$
» $\tilde{\sigma}^n_{ij}$: the standard deviation of the change in our belief $\mu^n_{ij}$ produced by one more measurement
The marginal value of a measurement is bigger if these values are closer together.
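A direct transcription of the formula (a sketch; computing $V^n_{ij}$, $W^n_{ij}$, and $\tilde{\sigma}^n_{ij}$ is assumed to happen elsewhere, e.g. with `candidate_path_lengths` above):

```python
from scipy.stats import norm

def kg_factor(V, W, sigma_tilde):
    """Knowledge-gradient factor of one edge.

    V, W        : time-n lengths of the best paths containing /
                  not containing the edge
    sigma_tilde : standard deviation of the change in our belief about
                  the edge after one more measurement
    """
    if sigma_tilde == 0.0:  # edge length already known perfectly
        return 0.0
    z = -abs(V - W) / sigma_tilde
    # f(z) = z * Phi(z) + phi(z), with Phi / phi the standard normal cdf / pdf
    return sigma_tilde * (z * norm.cdf(z) + norm.pdf(z))
```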
Asymptotic optimality property
Jensen's inequality gives a global lower bound on the value of any policy:
$\mathbb{E}^\pi \sum_{(i,j) \in p^N} \mu_{ij} \;\ge\; \mathbb{E}\, \min_p \sum_{(i,j) \in p} \mu_{ij}$
Theorem. If the number of measurements is infinite, the KG policy attains the global lower bound.
If we have infinitely many measurements, then the KG policy will find the true shortest path.
Asymptotic optimality property
The proof is technical, but the key details are:
» The KG factor of an edge is zero if and only if the length of that edge is known perfectly (with infinite precision $\beta^n_{ij}$)
» It can be shown that the KG factor is continuous in $\beta^n_{ij}$
» The precision always increases when we measure $(i,j)$
» As we measure $(i,j)$ more often, we have $\nu^{KG,n}_{ij} \to 0$
» Since we measure the edge with the largest KG factor, eventually we will switch over to another edge
Asymptotic optimality property
There are many simple methods that are asymptotically optimal
» If we have infinitely many measurements, we could just measure every edge in a round-robin fashion
However, KG is also myopically optimal
» If N = 1, KG allocates the sole measurement optimally
KG is the only stationary method that is both myopically and asymptotically optimal. This suggests that KG may yield good performance for general finite time horizons.
Knowledge gradient on a graph
Consider a simple layered graph (14 nodes, 24 edges). The true shortest path is highlighted in black; the path that we think is the shortest is highlighted in blue. Let's see how the KG method changes our beliefs about the best path.
[Diagram: layered graph on nodes 1-14]
Knowledge gradient on a graph
Edge measured by KG: (5,8). Our beliefs about this edge have increased enough to change our beliefs about the best path!
Knowledge gradient on a graph
Edge measured by KG: (1,5). Our beliefs about this edge have increased enough to change our beliefs about the best path!
Knowledge gradient on a graph
Edge measured by KG: (2,7). Not every measurement changes our beliefs about the best path…
Knowledge gradient on a graph
Edge measured by KG: (7,10). Notice how we always measure edges that are close to the blue path, but not always on it.
Knowledge gradient on a graph
Edges measured: (1,2), (5,8), (1,5), (2,7), (7,10). We have found the best path!
Experimental results
» Ten layered graphs (22 nodes, 50 edges)
» Ten larger layered graphs (38 nodes, 102 edges)
Conclusion
We have defined a new class of optimal learning problems, beyond the scope of the traditional literature.
We have derived a one-period look-ahead method for the problem of learning on a graph. The method produces an easily computable decision rule and has certain theoretical advantages:
» Optimal for N = 1 by design: if we have only one measurement, we get as much value out of it as possible
» Asymptotic optimality: if we have infinitely many measurements, we find the true shortest path
Experimental evidence shows that KG performs well for values of N in between.