About the latest complexity bounds for Policy Iteration
Romain Hollanders, UCLouvain
Joint work with: Balázs Gerencsér, Jean-Charles Delvenne and Raphaël Jungers
Benelux Meeting in Systems and Control 2015

Policy Iteration to solve Markov Decision Processes
Order-Regular matrices: a powerful tool for the analysis

How much will we pay in the long run?
Diagram labels: starting state, cost vector, discount factor
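The slide's formula did not survive extraction. As a minimal sketch of the standard discounted-cost model (an assumption about what the slide showed), the long-run cost vector v of a Markov chain satisfies v = c + γ P v, where P is the transition matrix, c the cost vector and γ the discount factor. The function name `discounted_cost` and the two-state chain below are hypothetical, chosen only for illustration.

```python
def discounted_cost(P, c, gamma, tol=1e-12, max_iter=100_000):
    """Approximate the solution of v = c + gamma * P v by fixed-point iteration.

    P     : list of rows, P[i][j] = probability of moving from state i to j
    c     : list, c[i] = cost paid in state i
    gamma : discount factor, assumed to satisfy 0 <= gamma < 1
    """
    n = len(c)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [c[i] + gamma * sum(P[i][j] * v[j] for j in range(n))
                 for i in range(n)]
        # Stop once successive iterates agree up to the tolerance.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical chain: state 0 costs 1 per step, state 1 is free and absorbing.
P = [[0.5, 0.5],
     [0.0, 1.0]]
c = [1.0, 0.0]
v = discounted_cost(P, c, gamma=0.9)
print(v)  # v[i] is the expected discounted cost when starting in state i
```

Each entry of `v` answers the slide's question for one starting state. A direct linear solve of (I − γP) v = c would work just as well; the iteration is used here only to keep the sketch dependency-free.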

Markov chains

Markov Decision Processes one action per state in general

Diagram labels: action, action cost, transition probability
Goal: find the optimal policy
The value of a policy = the long-term costs of the corresponding Markov chain
Proposition: there always exists an optimal policy, which is what we aim for!

How do we solve a Markov Decision Process?
Policy Iteration

POLICY ITERATION
0. Choose an initial policy.
while the policy can still be improved
1. Evaluate: compute the value of the current policy
2. Improve: the new policy is the best action in each state based on that value
end while
Stop! We found the optimal policy.
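The evaluate/improve loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: costs are discounted and minimized, evaluation uses fixed-point iteration instead of a linear solve, an action is switched only when it is strictly better (to avoid cycling on ties), and the names `evaluate`, `policy_iteration` and the two-state MDP are hypothetical.

```python
def evaluate(policy, P, c, gamma, tol=1e-12, max_iter=100_000):
    """Value of a fixed policy: solve v = c_pi + gamma * P_pi v iteratively."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [c[s][policy[s]]
                 + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def policy_iteration(P, c, gamma, eps=1e-9):
    """P[s][a][t]: transition probability, c[s][a]: cost of action a in state s."""
    n = len(P)
    policy = [0] * n                      # 0. choose an initial policy
    iterations = 0
    while True:
        v = evaluate(policy, P, c, gamma)  # 1. evaluate

        def q(s, a):
            # Cost of playing a in s once, then following the current policy.
            return c[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))

        new_policy = []
        for s in range(n):                 # 2. improve, state by state
            best = policy[s]
            for a in range(len(P[s])):
                if q(s, a) < q(s, best) - eps:
                    best = a
            new_policy.append(best)
        if new_policy == policy:           # no improving action: optimal
            return policy, v, iterations
        policy = new_policy
        iterations += 1


# Hypothetical MDP: in state 0, action 0 stays (cost 2), action 1 moves to 1 (cost 1);
# in state 1, action 0 stays (cost 0), action 1 moves back to 0 (cost 3).
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
c = [[2.0, 1.0],
     [0.0, 3.0]]
policy, v, iterations = policy_iteration(P, c, gamma=0.9)
print(policy, v, iterations)
```

The number of passes through the loop (`iterations` here) is the quantity that the complexity bounds in this talk are about.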

Bad news: Policy Iteration has exponential complexity, at least in general [Fearnley 2010, Friedmann 2009, H. et al. 2012].
But we still aim for upper bounds…

Policy Iteration needs at most iterations

Policy Iteration needs at most iterations [Mansour & Singh 1999]

Policy Iteration needs at most iterations [H. et al. 2014]
Not possible to improve using «standard» tools

Can we do even better?

The matrix is “Order-Regular”

How large are the largest Order-Regular matrices that we can build?

The answer of exhaustive search
Conjecture (Hansen & Zwick, 2012), stated in terms of the Fibonacci number and the golden ratio

The answer of exhaustive search
Theorem (H. et al., 2014) for
(Proof: a “smart” exhaustive search)

How large are the largest Order-Regular matrices that we can build?

A constructive approach

Iterate and build matrices of size

Can we do better?

Yes! We can build matrices of size

So, what do we know about Order-Regular matrices?

currently the best bounds for MDPs

For papers and slides… and much more: perso.uclouvain.be/romain.hollanders/
