Discounted Deterministic Markov Decision Processes

1 Discounted Deterministic Markov Decision Processes
and Discounted All-Pairs Shortest Paths
Omid Madani – SRI International, AI Center
Mikkel Thorup – AT&T Labs, Research
Uri Zwick – Tel Aviv University

2 Markov Decision Processes [Bellman ’57] [Howard ’60] …
States, actions, costs/rewards, distributions, strategies, objectives

3 Markov Decision Processes [Bellman ’57] [Howard ’60] …
The i-th action taken
Limiting average version
Discounted version
Discount factor
Optimal positional strategies can be found using linear programming (LP).
Is there a strongly polynomial time algorithm?
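The formulas on this slide did not survive transcription. As a hedged reminder of the standard definitions (the symbols a_i for the i-th action taken, c for its cost, and λ for the discount factor are assumed notation, not necessarily the slide's), the two objectives are usually written as:

```latex
\[
\text{Limiting average:}\quad
\liminf_{N\to\infty}\; \frac{1}{N}\sum_{i=0}^{N-1} c(a_i)
\qquad\qquad
\text{Discounted:}\quad
\sum_{i=0}^{\infty} \lambda^{i}\, c(a_i), \qquad 0 < \lambda < 1 .
\]
```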

4 Deterministic MDPs
One player, deterministic actions
Limiting average version
Discounted version

5 Deterministic MDPs = “The truck problem”
Traverse a single edge each day.
Maximize profit per day (in the “long run”).

6 Discounted Deterministic MDPs = “The discounted/unreliable truck problem”
Traverse a single edge each day.
Maximize (expected) total profit.
Each day, the truck breaks down with probability 1 − λ.
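The link between the unreliable truck and discounting can be checked on a finite route. The sketch below assumes the breakdown check happens after each day's traversal, so the profit of day i is collected with probability λ^i. The function names and the finite horizon are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction

def discounted_sum(profits, lam):
    # Sum of lam**i * profit_i: the discounted objective.
    return sum(p * lam**i for i, p in enumerate(profits))

def expected_profit(profits, lam):
    # Expected total profit when the truck breaks down with
    # probability 1 - lam after each day (assumed timing).
    days = len(profits)
    expected = Fraction(0)
    collected = 0
    for j in range(days):
        collected += profits[j]
        if j < days - 1:
            # Survived j checks, then broke down right after day j.
            expected += lam**j * (1 - lam) * collected
        else:
            # Survived every check before the last day.
            expected += lam**j * collected
    return expected

lam = Fraction(3, 4)
profits = [4, -1, 7]
print(discounted_sum(profits, lam), expected_profit(profits, lam))
```

Both quantities agree, which is why the discount factor can be read as a daily survival probability.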

7 Deterministic MDPs – limiting average version
For each vertex, find a cycle of minimum mean cost reachable from it.
[Karp ’78]: O(mn)
[Young-Tarjan-Orlin ’91]: O(mn + n^2 log n), better performance in practice

8 Discounted DMDPs … Both the path and the cycle matter
As λ → 1, the discounted version approaches the limiting average case.
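To see why both the path and the cycle matter, consider a strategy that follows a path with edge costs p_0, …, p_{k-1} and then repeats a cycle with edge costs q_0, …, q_{ℓ-1} forever. With discount factor λ (the names p, q, k, ℓ are assumed here for illustration), the geometric series gives:

```latex
\[
\text{cost}
= \sum_{i=0}^{k-1} \lambda^{i} p_i
\;+\; \lambda^{k}\,
\frac{\sum_{j=0}^{\ell-1} \lambda^{j} q_j}{1-\lambda^{\ell}} .
\]
```

The first term depends only on the path, the second only on the cycle and on the path length k.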

9 Discounted DMDPs – optimal strategies

10 Discounted DMDPs - results
Running time          Authors
O(n^4)                Papadimitriou-Tsitsiklis ’87
O(mn^2 log n)         Madani ’02
O(mn^2)               Andersson-Vorobyov ’06
O(mn)                 New
O(mn + n^2 log n)     New
n – number of states/vertices
m – number of actions/edges

11 Karp’s algorithm for finding minimum mean cost cycles [Karp’78]
d_k(u): the smallest cost of a k-edge path starting at u
P_n(v): a cheapest n-edge path starting at v
Complexity: O(mn)
There is a vertex v such that all cycles on P_n(v) are optimal [Madani ’00]
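A minimal sketch of Karp's O(mn) computation, using the slide's convention that d_k(u) is the cheapest k-edge path starting at u. It returns only the global minimum mean cycle cost, via Karp's min-max formula, rather than one cycle per vertex. The function name, the edge-list input, and the restriction to integer costs are assumptions made for this example.

```python
from fractions import Fraction

def min_mean_cycle_value(n, edges):
    """Minimum mean cost of a cycle, or None if the graph is acyclic.

    edges: list of (u, v, cost), vertices 0..n-1, integer costs (assumed).
    """
    INF = float("inf")
    # d[k][u] = smallest cost of a k-edge path starting at u (INF if none).
    d = [[0] * n]
    for _ in range(n):
        prev, cur = d[-1], [INF] * n
        for u, v, cost in edges:
            if prev[v] != INF and cost + prev[v] < cur[u]:
                cur[u] = cost + prev[v]
        d.append(cur)

    best = None
    for u in range(n):
        if d[n][u] == INF:
            continue  # no n-edge path from u, so no cycle reachable from u
        # Karp's formula: max over k of (d_n(u) - d_k(u)) / (n - k).
        worst = max(Fraction(d[n][u] - d[k][u], n - k) for k in range(n))
        if best is None or worst < best:
            best = worst
    return best

# Cycle 0 -> 1 -> 0 has mean cost 2; the self-loop at 2 has mean cost 1.
print(min_mean_cycle_value(3, [(0, 1, 1), (1, 0, 3), (1, 2, 10), (2, 2, 1)]))
```

Exact fractions are used so that ties between cycles are compared without rounding error.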

12 Discounted DMDPs
x(u): the smallest discounted cost of an infinite path starting at u
Each vertex has an optimal outgoing edge
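In equation form, this is the usual optimality condition for discounted problems. The slide's own formula is not in the transcript, so the notation below (edge cost c(u,v), discount factor λ, edge set E) is an assumed reconstruction:

```latex
\[
x(u) \;=\; \min_{(u,v)\in E} \bigl( c(u,v) + \lambda\, x(v) \bigr)
\qquad \text{for every } u \in V .
\]
```

An edge attaining the minimum at u is an optimal outgoing edge for u.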

13 A Karp-like/Value-iteration algorithm for DMDPs
d_k(u): the smallest discounted cost of a k-edge path starting at u
Claim 1: For every v ∈ V we have x(v) ≤ y(v)
Claim 2: On every optimal cycle there is at least one vertex v such that x(v) = y(v)
How do we know which cycles are optimal?
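The values d_k(u) follow directly from their definition through the recurrence d_k(u) = min over edges (u,v) of c(u,v) + λ d_{k-1}(v), with d_0(u) = 0. The sketch below computes them. The function name and input format are assumptions for illustration, and it does not compute y(v), whose definition is not in the transcript.

```python
from fractions import Fraction

def k_edge_discounted_costs(n, edges, lam, k_max):
    """Return d with d[k][u] = smallest discounted cost of a k-edge
    path starting at u, for k = 0..k_max (inf if no such path)."""
    INF = float("inf")
    d = [[0] * n]
    for _ in range(k_max):
        prev, cur = d[-1], [INF] * n
        for u, v, cost in edges:
            candidate = cost + lam * prev[v]
            if candidate < cur[u]:
                cur[u] = candidate
        d.append(cur)
    return d

edges = [(0, 1, 1), (1, 0, 3), (1, 2, 10), (2, 2, 1)]
d = k_edge_discounted_costs(3, edges, Fraction(1, 2), 2)
print(d[2])
```

Each round costs O(m), so n rounds take O(mn) time. As k grows, d_k(u) converges to x(u) with an error of at most λ^k · max|c| / (1 − λ).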

14 Discounted DMDPs – optimal strategies

15 A Karp-like algorithm for DMDPs
First Bellman-Ford phase: k = 1,2,…,n
A Karp phase:
Second Bellman-Ford phase: k = 1,2,…,n
Theorem: For every v ∈ V we have x(v) = y_n(v)

16 The Andersson-Vorobyov algorithm [’06]
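We want to solve the following equations: val(u) = min over edges (u,v) of c(u,v) + λ val(v)
Start with values that satisfy: val(u) ≤ c(u,v) + λ val(v) for every edge (u,v)
If each vertex has a tight outgoing edge, we are done.
Otherwise, increase the values in a controlled manner.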

17 The Andersson-Vorobyov algorithm [’06]
For each vertex, select an outgoing tight edge, if any. The result is a pseudo-forest.
val(u) ≤ c(u,v) + λ val(v), tight for pseudo-forest edges
(Figure annotations: “Cannot become tight”, “Ideal!”)
val[u] ← val[u] + λ^depth[u] t for every u not in a pseudo-tree
Pseudo-forest edges remain tight, for every t.
Find the smallest t for which a non-pseudo-forest edge becomes tight.
The sum of depths increases ⇒ at most n^2 iterations ⇒ O(mn^2) time
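The iteration described on this slide can be sketched naively as follows. This is an illustration under stated assumptions, not the authors' implementation: the starting values c_min / (1 − λ), the rule of keeping earlier edge choices, the update val[u] += λ^depth[u] · t, and all names are choices made for this example. Costs are assumed to be integers or fractions, and every vertex is assumed to have an outgoing edge.

```python
from fractions import Fraction

def solve_dmdp(n, edges, lam):
    """Raise values until every vertex has a tight outgoing edge.

    edges: list of (u, v, cost). Returns (val, choice), where
    choice[u] is the selected tight successor of u.
    """
    lam = Fraction(lam)
    cmin = min(c for _, _, c in edges)
    # Feasible start: val(u) <= c(u,v) + lam * val(v) on every edge.
    val = [Fraction(cmin) / (1 - lam)] * n
    choice = [None] * n

    def depth_in_tree(u):
        # Distance from u to a root without a tight edge, or None when
        # u's pointers run into a cycle (u is in a pseudo-tree).
        d = 0
        while choice[u] is not None:
            u = choice[u]
            d += 1
            if d > n:
                return None
        return d

    while True:
        depth = [depth_in_tree(u) for u in range(n)]
        if all(d is None for d in depth):
            return val, choice
        # Speed at which each value rises; pseudo-tree vertices are fixed.
        rate_of = [Fraction(0) if d is None else lam**d for d in depth]
        best = None
        for u, v, c in edges:
            rate = rate_of[u] - lam * rate_of[v]  # how fast the slack shrinks
            if rate > 0:
                slack = c + lam * val[v] - val[u]
                t = slack / rate
                if best is None or t < best[0]:
                    best = (t, u, v)
        t, u, v = best
        for w in range(n):
            val[w] += rate_of[w] * t
        choice[u] = v  # the newly tight edge joins the pseudo-forest

edges = [(0, 1, 1), (1, 0, 3), (1, 2, 10), (2, 2, 1)]
val, choice = solve_dmdp(3, edges, Fraction(1, 2))
print(val, choice)
```

Selected edges shrink at rate zero, so they stay tight. A root always has an edge with positive rate, so each iteration finds some t. Each iteration here costs O(m + n^2) because depths are recomputed from scratch.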

18 Speeding-up the algorithm
Do not reset the clock after each iteration.
time(u,v): the ‘time’ at which the edge (u,v) becomes tight

19 An O(mn+n2log n) algorithm
We would like to use Fibonacci heaps.
Unfortunately, time(u,v) may increase as well as decrease.
Luckily, time(u) = min_v time(u,v) can only decrease.
The resulting algorithm is similar to the algorithm of [Young-Tarjan-Orlin ’91].
The running time is ‘typically’ (m + n log n) ???
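Because the per-vertex keys time(u) only decrease, a priority queue with decrease-key is enough. The sketch below is a stand-in built on Python's binary heap with lazy deletion, not a Fibonacci heap. The class and method names are hypothetical. Each operation costs O(log n), so it would not reach the O(mn + n^2 log n) bound that Fibonacci heaps give.

```python
import heapq

class DecreaseOnlyQueue:
    """Min-priority queue whose keys may only decrease (lazy deletion)."""

    def __init__(self):
        self._heap = []   # (key, item) entries, possibly stale
        self._key = {}    # current key of each live item

    def decrease(self, item, key):
        # Insert item, or lower its key; larger keys are ignored.
        if item not in self._key or key < self._key[item]:
            self._key[item] = key
            heapq.heappush(self._heap, (key, item))

    def pop_min(self):
        # Skip entries that were superseded by a later decrease.
        while self._heap:
            key, item = heapq.heappop(self._heap)
            if self._key.get(item) == key:
                del self._key[item]
                return item, key
        raise IndexError("pop from an empty queue")

q = DecreaseOnlyQueue()
q.decrease("u", 5)
q.decrease("w", 3)
q.decrease("u", 1)   # time(u) decreased
print(q.pop_min(), q.pop_min())
```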

20 Discounted All-Pairs Shortest Paths
Shortest paths may be ‘infinite’.
(Figure: example graphs with vertices a, b, c, d)
The prefix of a shortest path is not necessarily a shortest path!

21 Discounted All-Pairs Shortest Paths
Naïve algorithm runs in O(n^4) time [Papadimitriou-Tsitsiklis ’87]
A randomized O*(m^(1/2) n^2)-time algorithm

22 Open problems
Equivalence of non-discounted DMDPs and discounted DMDPs?
o(mn)-time algorithms?
Different discount factors for different edges?
(Non-deterministic) Markov Decision Processes?

