Published by Cleopatra Gilbert. Modified over 8 years ago.
1
About the latest complexity bounds for Policy Iteration
Romain Hollanders, UCLouvain
Joint work with: Balázs Gerencsér, Jean-Charles Delvenne and Raphaël Jungers
Benelux Meeting in Systems and Control 2015
2
Policy Iteration to solve Markov Decision Processes
Order-Regular matrices: a powerful tool for the analysis
12
How much will we pay in the long run?
(Formula annotations: starting state, cost vector, discount factor.)
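The three annotations on these slides (starting state, cost vector, discount factor) are the ingredients of the standard discounted-cost formula for a Markov chain. The slide's own notation did not survive, so the symbols below are assumptions: $P$ is the transition matrix, $c$ the cost vector, $\gamma \in [0, 1)$ the discount factor, and $v(s)$ the expected long-run cost when starting from state $s$.

```latex
v \;=\; \sum_{t=0}^{\infty} \gamma^{t} P^{t} c \;=\; (I - \gamma P)^{-1} c
```

The closed form holds because $\gamma < 1$ makes the geometric series of $\gamma P$ converge.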
15
Markov chains
16
Markov Decision Processes one action per state in general
18
(Figure labels: action, action cost, transition probability.)
Goal: find the optimal policy.
The value of a policy = the long-term costs of the corresponding Markov chain.
Proposition: there always exists an optimal policy. That is what we aim for!
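To make "the value of a policy" concrete, here is a minimal sketch that evaluates every policy of a tiny MDP and keeps the cheapest one. The two-state MDP, the names `MDP`, `GAMMA`, `evaluate` and `brute_force`, and the discount factor 0.9 are all hypothetical and chosen for illustration; they do not come from the slides.

```python
from itertools import product

# Hypothetical 2-state MDP (not from the slides): MDP[s] lists the actions
# available in state s, each as (cost, transition probabilities).
MDP = [
    [(1.0, [1.0, 0.0]), (2.0, [0.0, 1.0])],
    [(0.0, [0.0, 1.0]), (3.0, [1.0, 0.0])],
]
GAMMA = 0.9  # assumed discount factor


def evaluate(mdp, policy, gamma, sweeps=2000):
    """Approximate the discounted cost of a fixed policy by repeated substitution."""
    n = len(mdp)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            mdp[s][policy[s]][0]
            + gamma * sum(p * v[t] for t, p in enumerate(mdp[s][policy[s]][1]))
            for s in range(n)
        ]
    return v


def brute_force(mdp, gamma):
    """Evaluate every policy and return one with the lowest total cost.

    An optimal policy is at least as good as any other in every state,
    so it also minimizes the sum over states.
    """
    choices = [range(len(actions)) for actions in mdp]
    return min(product(*choices), key=lambda pol: sum(evaluate(mdp, pol, gamma)))


best = brute_force(MDP, GAMMA)
values = evaluate(MDP, best, GAMMA)
```

With n states and k actions per state there are k^n policies to try, which is why brute force is only viable on toy instances like this one.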
23
How do we solve a Markov Decision Process? Policy Iteration.
24
POLICY ITERATION
0. Choose an initial policy.
while the policy keeps changing
1. Evaluate: compute the value of the current policy.
2. Improve: the new policy is the best action in each state based on that value.
end while
Stop! We found the optimal policy.
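The evaluate/improve loop on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-state MDP, the discount factor 0.9, and the helper names `solve` and `policy_iteration` are hypothetical, and all states are switched simultaneously in the improvement step.

```python
# Hypothetical 2-state MDP (not from the slides): MDP[s] lists the actions
# available in state s, each as (cost, transition probabilities).
MDP = [
    [(1.0, [1.0, 0.0]), (2.0, [0.0, 1.0])],
    [(0.0, [0.0, 1.0]), (3.0, [1.0, 0.0])],
]
GAMMA = 0.9  # assumed discount factor


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def policy_iteration(mdp, gamma, policy=None):
    """Return (policy, values, number of iterations)."""
    n = len(mdp)
    # 0. Choose an initial policy (default: first action everywhere).
    policy = list(policy) if policy is not None else [0] * n
    iterations = 0
    while True:
        iterations += 1
        # 1. Evaluate: solve (I - gamma * P_policy) v = c_policy.
        a = [
            [(1.0 if s == t else 0.0) - gamma * mdp[s][policy[s]][1][t]
             for t in range(n)]
            for s in range(n)
        ]
        c = [mdp[s][policy[s]][0] for s in range(n)]
        v = solve(a, c)

        def q(s, act):
            cost, probs = mdp[s][act]
            return cost + gamma * sum(p * v[t] for t, p in enumerate(probs))

        # 2. Improve: pick the best action in each state based on v.
        new_policy = []
        for s in range(n):
            best = policy[s]
            for act in range(len(mdp[s])):
                if q(s, act) < q(s, best) - 1e-12:  # switch only if strictly better
                    best = act
            new_policy.append(best)
        if new_policy == policy:
            return policy, v, iterations  # Stop: no state can improve.
        policy = new_policy


policy, values, iterations = policy_iteration(MDP, GAMMA)
```

The small tolerance in the comparison guards against endless switching between actions whose costs are equal up to rounding.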
28
Bad news: Policy Iteration has exponential complexity, at least in general [Fearnley 2010, Friedmann 2009, H. et al. 2012].
But we still aim for upper bounds…
29
Policy Iteration needs at most iterations
30
Policy Iteration needs at most iterations [Mansour & Singh 1999]
31
Policy Iteration needs at most iterations [H. et al. 2014] not possible to improve using « standard » tools
32
Can we do even better?
47
The matrix is “Order-Regular”
50
How large are the largest Order-Regular matrices that we can build?
51
The answer of exhaustive search ?? Conjecture (Hansen & Zwick, 2012) the Fibonacci number the golden ratio
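The exact statement of the conjecture did not survive in this transcript, but the two quantities it names are closely linked: Fibonacci numbers grow like powers of the golden ratio. A small sketch of that link, using the indexing convention F_0 = 0, F_1 = 1 (an assumption; the slide may index differently):

```python
import math


def fibonacci(n):
    """Return the n-th Fibonacci number, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


PHI = (1 + math.sqrt(5)) / 2  # the golden ratio

# The ratio of consecutive Fibonacci numbers approaches PHI ...
ratio = fibonacci(31) / fibonacci(30)
# ... and F_n is the nearest integer to PHI**n / sqrt(5).
approx = round(PHI ** 20 / math.sqrt(5))
```

So a bound stated as a Fibonacci number is exponential in the index, with base equal to the golden ratio (about 1.618).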
52
The answer of exhaustive search Theorem (H. et al., 2014) for (Proof: a “smart” exhaustive search)
53
How large are the largest Order-Regular matrices that we can build?
54
A constructive approach
56
Iterate and build matrices of size
57
Can we do better?
58
Yes! We can build matrices of size
59
So, what do we know about Order-Regular matrices?
60
currently the best bounds for MDPs
61
For papers and slides… and much more: perso.uclouvain.be/romain.hollanders/