Published by Sudomo Setiabudi. Modified over 6 years ago.
1
Discounted Deterministic Markov Decision Processes and Discounted All-Pairs Shortest Paths
Omid Madani (SRI International, AI Center)
Mikkel Thorup (AT&T Labs, Research)
Uri Zwick (Tel Aviv University)
2
Markov Decision Processes [Bellman ’57] [Howard ’60] …
States, Actions, Costs/Rewards, Distributions, …
Strategies, Objectives
3
Markov Decision Processes [Bellman ’57] [Howard ’60] …
i-th action taken
Limiting average version
Discounted version
Discount factor
Optimal positional strategies can be found using linear programming (LP)
Is there a strongly polynomial time algorithm?
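The formulas for the two objectives did not survive extraction. As a reminder, the standard textbook forms are sketched below. The notation is an assumption: c(a_i) stands for the cost of the i-th action taken, actions are indexed from 0, and λ with 0 < λ < 1 is the discount factor.

```latex
\text{Limiting average:}\quad \liminf_{N \to \infty} \frac{1}{N} \sum_{i=0}^{N-1} c(a_i)
\qquad\qquad
\text{Discounted:}\quad \sum_{i=0}^{\infty} \lambda^{i}\, c(a_i)
```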
4
Deterministic MDPs Limiting average version Discounted version
One player, deterministic actions
5
Deterministic MDPs = “The truck problem”
Traverse a single edge each day
Maximize profit per day (in the “long run”)
6
Discounted Deterministic MDPs = “The discounted/unreliable truck problem”
Traverse a single edge each day
Maximize (expected) total profit
Each day, the truck breaks down with probability 1 − λ
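Why the unreliable truck gives the discounted objective can be seen from a one-line calculation. This is a sketch under stated assumptions: c_i is the profit of the edge traversed on day i (days indexed from 0), and breakdowns happen independently at the end of each day, so the truck is still running on day i with probability λ^i.

```latex
\mathbb{E}[\text{total profit}]
  = \sum_{i \ge 0} \Pr[\text{truck still running on day } i]\; c_i
  = \sum_{i \ge 0} \lambda^{i}\, c_i
```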
7
Deterministic MDPs – limiting average version
For each vertex, find a cycle of minimum mean cost reachable from it
[Karp ’78] O(mn)
[Young-Tarjan-Orlin ’91] O(mn + n^2 log n)
Better performance in practice
8
Discounted DMDPs … Both the path and the cycle matter
As λ → 1, approaches the limiting average case
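To see why both the path and the cycle matter, here is a minimal sketch that evaluates a "lasso" (a finite path followed by a cycle repeated forever). The function names `discounted_cost` and `lasso_value` are hypothetical, and the convention that the i-th edge (from 0) is weighted by λ^i is an assumption.

```python
from fractions import Fraction


def discounted_cost(costs, lam):
    """Discounted cost of a finite edge sequence: sum of lam**i * costs[i]."""
    return sum(lam**i * c for i, c in enumerate(costs))


def lasso_value(path_costs, cycle_costs, lam):
    """Discounted cost of following path_costs once, then cycle_costs forever."""
    k, length = len(path_costs), len(cycle_costs)
    # One pass around the cycle, repeated forever, is a geometric series.
    cycle_forever = discounted_cost(cycle_costs, lam) / (1 - lam**length)
    return discounted_cost(path_costs, lam) + lam**k * cycle_forever


# Path of one edge with cost 5, then a 2-cycle with costs 1 and 3.
value = lasso_value([5], [1, 3], Fraction(1, 2))

# With lam close to 1, (1 - lam) * value is close to the cycle mean (1 + 3) / 2.
lam = Fraction(999, 1000)
normalized = (1 - lam) * lasso_value([5], [1, 3], lam)
```

With λ = 1/2 the path edge contributes most of the value; with λ near 1 the normalized value is dominated by the mean cost of the cycle, which is the limiting average objective.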
9
Discounted DMDPs – optimal strategies
10
Discounted DMDPs - results
Running time          Authors
O(n^4)                Papadimitriou-Tsitsiklis ’87
O(mn^2 log n)         Madani ’02
O(mn^2)               Andersson-Vorobyov ’06
O(mn)                 New
O(mn + n^2 log n)     New
n: number of states/vertices
m: number of actions/edges
11
Karp’s algorithm for finding minimum mean cost cycles [Karp’78]
d_k(u): the smallest cost of a k-edge path starting at u
P_n(v): a cheapest n-edge path starting at v
Complexity: O(mn)
There is a vertex v such that all cycles on P_n(v) are optimal [Madani ’00]
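A minimal sketch of Karp's dynamic program, using the slide's convention that d_k(u) is the smallest cost of a k-edge path starting at u. It returns only the global minimum cycle mean, via Karp's formula min over v of max over k < n of (d_n(v) − d_k(v)) / (n − k), rather than a per-vertex answer. The function name `min_mean_cycle`, the edge-list input format, and integer costs are assumptions made for illustration.

```python
import math
from fractions import Fraction


def min_mean_cycle(n, edges):
    """Minimum cycle mean of a digraph on vertices 0..n-1.

    edges is a list of (u, v, cost) triples with integer costs.
    Returns a Fraction, or None if the graph has no cycle.
    """
    inf = math.inf
    d = [[0] * n]  # d[k][u]: smallest cost of a k-edge path starting at u
    for _ in range(n):
        prev, cur = d[-1], [inf] * n
        for u, v, c in edges:
            if prev[v] != inf and c + prev[v] < cur[u]:
                cur[u] = c + prev[v]
        d.append(cur)

    best = None
    for v in range(n):
        if d[n][v] == inf:  # no n-edge path starts at v
            continue
        worst = max(Fraction(d[n][v] - d[k][v], n - k) for k in range(n))
        if best is None or worst < best:
            best = worst
    return best


# Two cycles: 0 -> 1 -> 0 with mean 2, and 1 -> 2 -> 1 with mean 1.
example_edges = [(0, 1, 1), (1, 0, 3), (1, 2, 2), (2, 1, 0)]
mu = min_mean_cycle(3, example_edges)
```

The table has n + 1 rows and each row scans all m edges, which matches the O(mn) bound stated on the slide.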
12
Discounted DMDPs
x(u): the smallest discounted cost of an infinite path starting at u
Each vertex has an optimal outgoing edge
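The values x(u) satisfy x(u) = min over edges (u,v) of c(u,v) + λ·x(v). Plain value iteration, sketched below, approximates them numerically; it is not one of the strongly polynomial algorithms discussed in this talk. The function name `value_iteration`, the edge-list format, and the fixed number of sweeps are assumptions, and every vertex is assumed to have at least one outgoing edge.

```python
def value_iteration(n, edges, lam, sweeps=200):
    """Approximate x(u) and pick one minimizing outgoing edge per vertex.

    edges is a list of (u, v, cost); 0 < lam < 1 is the discount factor.
    Returns (x, policy) where policy[u] is the chosen successor of u.
    """
    out = [[] for _ in range(n)]
    for u, v, c in edges:
        out[u].append((v, c))

    x = [0.0] * n
    for _ in range(sweeps):
        # Each sweep shrinks the error by a factor of lam.
        x = [min(c + lam * x[v] for v, c in out[u]) for u in range(n)]

    policy = [min(out[u], key=lambda e: e[1] + lam * x[e[0]])[0] for u in range(n)]
    return x, policy


example_edges = [(0, 1, 1), (1, 0, 3), (1, 2, 2), (2, 1, 0)]
x, policy = value_iteration(3, example_edges, 0.5)
```

In this example the chosen edges send every vertex into the cycle 1 → 2 → 1, and the resulting values are x = (7/3, 8/3, 4/3) up to rounding.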
13
A Karp-like/Value-iteration algorithm for DMDPs
d_k(u): smallest discounted cost of a k-edge path starting at u
Claim 1: For every v ∈ V we have x(v) ≤ y(v)
Claim 2: On every optimal cycle there is at least one vertex v such that x(v) = y(v)
How do we know which cycles are optimal?
14
Discounted DMDPs – optimal strategies
15
A Karp-like algorithm for DMDPs
First Bellman-Ford phase: k = 1, 2, …, n
A Karp phase:
Second Bellman-Ford phase: k = 1, 2, …, n
Theorem: For every v ∈ V we have x(v) = y_n(v)
16
The Andersson-Vorobyov algorithm [’06]
We want to solve the following equations:
Start with values that satisfy:
If each vertex has a tight outgoing edge, we are done
Otherwise, increase values, in a controlled manner
17
The Andersson-Vorobyov algorithm [’06]
For each vertex, select an outgoing tight edge, if any. The result is a pseudo-forest.
[Figure labels: “Cannot become tight”, “Ideal!”]
val(u) ≤ c(u,v) + λ val(v), tight for pseudo-forest edges
val[u] ← val[u] + λ^depth[u] · t, for every u not in a pseudo-tree
Pseudo-forest edges remain tight, for every t
Find the smallest t for which a non-pseudo-forest edge becomes tight
Sum of depths increases, so at most n^2 iterations and O(mn^2) time
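The first step of each iteration, selecting one tight outgoing edge per vertex, can be sketched as follows. This is an illustration, not the Andersson-Vorobyov implementation: it assumes the invariant val(u) ≤ c(u,v) + λ·val(v) for every edge and calls an edge tight when equality holds. The function name `select_tight_edges` is hypothetical, and exact arithmetic with `Fraction` is used so that tightness can be tested with ==.

```python
from fractions import Fraction


def select_tight_edges(n, edges, val, lam):
    """Return choice[u] = v for one tight edge (u, v), or None if u has none."""
    choice = [None] * n
    for u, v, c in edges:
        slack = c + lam * val[v] - val[u]
        if slack < 0:
            raise ValueError("values violate val(u) <= c(u,v) + lam * val(v)")
        if slack == 0 and choice[u] is None:
            choice[u] = v
    return choice


example_edges = [(0, 1, 1), (1, 0, 3), (1, 2, 2), (2, 1, 0)]
lam = Fraction(1, 2)

# All-zero values are feasible here because every cost is non-negative.
start = select_tight_edges(3, example_edges, [Fraction(0)] * 3, lam)

# At the optimal values every vertex has a tight outgoing edge.
optimal = [Fraction(7, 3), Fraction(8, 3), Fraction(4, 3)]
done = select_tight_edges(3, example_edges, optimal, lam)
```

From the all-zero start only vertex 2 has a tight edge, so the values of the other vertices still have to be raised. Once no entry of the result is None, the selected edges form a strategy that satisfies the optimality equations.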
18
Speeding-up the algorithm
Do not reset the clock after each iteration
The ‘time’ at which the edge (u,v) becomes tight
19
An O(mn+n2log n) algorithm
We would like to use Fibonacci heaps
Unfortunately, time(u,v) may increase, as well as decrease
Luckily, time(u) = min_v time(u,v) can only decrease
The resulting algorithm is similar to the algorithm of [Young-Tarjan-Orlin ’91]
The running time is ‘typically’ (m + n log n) ???
20
Discounted All-Pairs Shortest Paths
Shortest paths may be ‘infinite’
[Figure: example graph(s) with vertex labels a, b and a, b, c, d]
The prefix of a shortest path is not necessarily a shortest path!
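One way a discounted shortest path can fail to be finite is sketched below. This is a hypothetical example, not the graph from the slide's figure: vertex a has a self-loop of cost −1 and an edge of cost 0 to b, and the discounted length of a walk is assumed to weight its i-th edge (from 0) by λ^i. Every extra trip around the loop makes the walk from a to b strictly cheaper, so no finite walk attains the infimum −1/(1 − λ).

```python
from fractions import Fraction


def loop_then_exit_cost(k, lam, loop_cost=-1, exit_cost=0):
    """Discounted cost of taking the self-loop at a k times, then the edge to b."""
    return sum(loop_cost * lam**i for i in range(k)) + exit_cost * lam**k


lam = Fraction(1, 2)
costs = [loop_then_exit_cost(k, lam) for k in range(20)]
infimum = Fraction(-1) / (1 - lam)
```

The list `costs` is strictly decreasing and stays above `infimum`, which illustrates why the ‘shortest’ a-to-b path in this example would have to loop forever.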
21
Discounted All-Pairs Shortest Paths
Naïve algorithm runs in O(n^4) time [Papadimitriou-Tsitsiklis ’87]
A randomized O*(m^(1/2) n^2)-time algorithm
22
Open problems
Equivalence of non-discounted DMDPs and discounted DMDPs?
o(mn)-time algorithms?
Different discount factors for different edges?
(Non-deterministic) Markov Decision Processes?
23
THE END