Romain Hollanders, UCLouvain Joint work with: Balázs Gerencsér, Jean-Charles Delvenne and Raphaël Jungers Benelux Meeting in Systems and Control 2015 About the latest complexity bounds for Policy Iteration
Policy Iteration to solve Markov Decision Processes
Order-Regular matrices: a powerful tool for the analysis
Starting state, cost vector, discount factor: how much will we pay in the long run?
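The question above can be made concrete with a small sketch. It assumes the standard discounted model, in which the long-run cost vector v satisfies v = c + gamma * P v. The slide formulas are not shown here, so the names `P`, `c`, `gamma` and the function `discounted_cost` are illustrative assumptions rather than the authors' notation.

```python
def discounted_cost(P, c, gamma, tol=1e-12, max_iter=100_000):
    """Approximate the solution of v = c + gamma * P v by repeated substitution.

    P     : row-stochastic transition matrix as a list of lists (assumed format)
    c     : cost paid in each state
    gamma : discount factor, with 0 <= gamma < 1
    """
    n = len(c)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [c[i] + gamma * sum(P[i][j] * v[j] for j in range(n))
                 for i in range(n)]
        # Stop once two successive estimates agree up to the tolerance.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Toy two-state chain: state 0 costs 1 per step, state 1 is free and absorbing.
P = [[0.5, 0.5],
     [0.0, 1.0]]
c = [1.0, 0.0]
v = discounted_cost(P, c, gamma=0.9)
```

Each entry `v[s]` is the expected discounted cost paid when starting in state `s`. Solving the linear system directly would give the same vector; the iteration is used here only to keep the sketch dependency-free.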
Markov chains
Markov Decision Processes one action per state in general
Each action comes with a cost and transition probabilities. Goal: find the optimal policy. The value of a policy = the long-term cost of the corresponding Markov chain. Proposition: an optimal policy always exists; that is what we aim for!
How do we solve a Markov Decision Process? Policy Iteration
POLICY ITERATION
0. Choose an initial policy.
while the policy keeps changing
1. Evaluate: compute the value of the current policy.
2. Improve: the new policy takes the best action in each state based on that value.
end while
Stop! We found the optimal policy.
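The Evaluate / Improve loop can be sketched in a few lines of Python. This is a minimal illustration for discounted-cost MDPs, not the authors' implementation. The data layout (`mdp[s]` is a list of `(cost, probabilities)` pairs, one per action), the helper names `q_value`, `evaluate` and `policy_iteration`, and the tolerance `eps` are all assumptions made for this example.

```python
def q_value(mdp, s, a, v, gamma):
    """Cost of playing action a in state s, then paying the values in v."""
    cost, probs = mdp[s][a]
    return cost + gamma * sum(p * v[t] for t, p in enumerate(probs))


def evaluate(mdp, policy, gamma, tol=1e-12, max_iter=100_000):
    """Step 1: the value of a policy is the discounted cost of its Markov chain."""
    v = [0.0] * len(mdp)
    for _ in range(max_iter):
        new_v = [q_value(mdp, s, policy[s], v, gamma) for s in range(len(mdp))]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def policy_iteration(mdp, gamma, policy=None, eps=1e-9):
    """Return (optimal policy, its value, number of policy changes)."""
    n = len(mdp)
    policy = list(policy) if policy is not None else [0] * n  # step 0
    iterations = 0
    while True:
        v = evaluate(mdp, policy, gamma)                      # step 1
        new_policy = []
        for s in range(n):                                    # step 2
            best = min(range(len(mdp[s])),
                       key=lambda a: q_value(mdp, s, a, v, gamma))
            # Switch only on a strict improvement, so ties cannot cause cycling.
            gain = q_value(mdp, s, policy[s], v, gamma) - q_value(mdp, s, best, v, gamma)
            new_policy.append(best if gain > eps else policy[s])
        if new_policy == policy:                              # no change: stop
            return policy, v, iterations
        policy = new_policy
        iterations += 1


# Toy MDP with two states and two actions per state (hypothetical numbers).
mdp = [
    [(2.0, [1.0, 0.0]), (3.0, [0.0, 1.0])],  # state 0: stay, or move to state 1
    [(1.0, [0.0, 1.0]), (0.0, [1.0, 0.0])],  # state 1: stay, or move to state 0
]
policy, v, iterations = policy_iteration(mdp, gamma=0.9)
```

On this toy instance the loop changes the policy once and then stops. The complexity question discussed in the talk is how large `iterations` can become as the number of states grows.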
Bad news: Policy Iteration has exponential complexity, at least in general [Fearnley 2010, Friedmann 2009, H. et al. 2012]. But we still aim for upper bounds…
Policy Iteration needs at most iterations
Policy Iteration needs at most iterations [Mansour & Singh 1999]
Policy Iteration needs at most iterations [H. et al. 2014] not possible to improve using « standard » tools
Can we do even better?
The matrix is “Order-Regular”
How large are the largest Order-Regular matrices that we can build?
The answer of exhaustive search ?? Conjecture (Hansen & Zwick, 2012) the Fibonacci number the golden ratio
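For reference, the two quantities named in the conjecture are standardly defined as follows. The indexing F_0 = 0, F_1 = 1 is an assumption; the slide's own convention is not shown.

```latex
\[
F_0 = 0, \qquad F_1 = 1, \qquad F_{n} = F_{n-1} + F_{n-2} \quad (n \ge 2),
\]
\[
\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618, \qquad
F_n = \frac{\varphi^{n} - (1 - \varphi)^{n}}{\sqrt{5}} \sim \frac{\varphi^{n}}{\sqrt{5}} .
\]
```

Fibonacci numbers therefore grow like powers of the golden ratio, which is strictly slower than 2^n.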
The answer of exhaustive search Theorem (H. et al., 2014) for (Proof: a “smart” exhaustive search)
How large are the largest Order-Regular matrices that we can build?
A constructive approach
Iterate and build matrices of size
Can we do better?
Yes! We can build matrices of size
So, what do we know about Order-Regular matrices?
currently the best bounds for MDPs
For papers, slides, and much more: perso.uclouvain.be/romain.hollanders/