
1 Romain Hollanders, UCLouvain. Joint work with: Balázs Gerencsér, Jean-Charles Delvenne and Raphaël Jungers. Benelux Meeting in Systems and Control 2015. About the latest complexity bounds for Policy Iteration

2 Policy Iteration to solve Markov Decision Processes; Order-Regular matrices: a powerful tool for the analysis

3-11 (figure-only slides; no transcript text)

12 Starting state: how much will we pay in the long run?

13 Starting state: how much will we pay in the long run? (cost vector)

14 Starting state: how much will we pay in the long run? (discount factor)
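
A minimal sketch of the quantity these slides annotate, assuming the standard discounted-cost setting (the symbols x_t, c and gamma are illustrative; they do not appear in the transcript):

```latex
% Expected long-run cost when starting from state x_0,
% with cost vector c and discount factor 0 <= gamma < 1:
v(x_0) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c_{x_t} \right]
```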

15 Markov chains

16 Markov Decision Processes: a Markov chain has one action per state; an MDP has several actions per state in general

17 (figure only; no transcript text)

18 Action, action cost, transition probability. Goal: find the optimal policy. The value of a policy = the long-term cost of the corresponding Markov chain. Proposition: what we aim for (an optimal policy) always exists!
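
A minimal sketch of the policy-evaluation idea behind this slide, assuming the standard discounted setting (the notation v^pi, c^pi, P^pi is illustrative, not taken from the slides):

```latex
% The value v^pi of a policy pi is the long-run cost of the Markov chain it induces,
% i.e. the unique solution of a linear system:
v^{\pi} = c^{\pi} + \gamma P^{\pi} v^{\pi}
\quad\Longleftrightarrow\quad
v^{\pi} = (I - \gamma P^{\pi})^{-1} c^{\pi}
% where P^pi is the transition matrix of the induced chain and c^pi its cost vector.
```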

19-22 (figure-only slides; no transcript text)

23 How do we solve a Markov Decision Process? Policy Iteration

24 POLICY ITERATION

25 POLICY ITERATION
0. Choose an initial policy.
while the policy keeps changing:
1. Evaluate: compute the value of the current policy.
2. Improve: the new policy takes, in each state, the best action based on that value.
end while

26 (Same algorithm, repeated as an animation build: evaluate the current policy, then improve it.)

27 When the improvement step returns the same policy: stop! We found the optimal policy.
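
A minimal runnable sketch of the loop above (greedy Policy Iteration for a discounted-cost MDP); the toy arrays and all names below are illustrative and not taken from the talk:

```python
import numpy as np

def policy_iteration(P, c, gamma=0.9):
    """Greedy Policy Iteration sketch.
    P[a, s, :] = transition probabilities when playing action a in state s,
    c[a, s]    = immediate cost of playing action a in state s."""
    n = P.shape[1]
    policy = np.zeros(n, dtype=int)              # 0. choose an initial policy
    while True:
        # 1. Evaluate: solve v = c_pi + gamma * P_pi v  (value of the current policy)
        P_pi = P[policy, np.arange(n), :]
        c_pi = c[policy, np.arange(n)]
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, c_pi)
        # 2. Improve: take, in each state, the cheapest action with respect to v
        q = c + gamma * P @ v                    # q[a, s] = cost of a in s, then follow pi
        new_policy = q.argmin(axis=0)
        if np.array_equal(new_policy, policy):   # no change: stop, the policy is optimal
            return policy, v
        policy = new_policy

# Tiny made-up example: 2 states, 2 actions per state.
P = np.array([[[0.5, 0.5], [0.2, 0.8]],
              [[0.9, 0.1], [0.6, 0.4]]])
c = np.array([[1.0, 2.0],
              [0.5, 3.0]])
print(policy_iteration(P, c))
```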

28 Bad news: Policy Iteration has exponential complexity… at least in general [Fearnley 2010, Friedmann 2009, H. et al. 2012]. But we still aim for upper bounds…

29 Policy Iteration needs at most … iterations

30 Policy Iteration needs at most … iterations [Mansour & Singh 1999]

31 Policy Iteration needs at most … iterations [H. et al. 2014]; not possible to improve using "standard" tools

32 Can we do even better?

33-46 (figure-only slides; no transcript text)

47 The matrix is “Order-Regular”

48-49 (figure-only slides; no transcript text)

50 How large are the largest Order-Regular matrices that we can build?

51 The answer of exhaustive search: ?? Conjecture (Hansen & Zwick, 2012): the answer is given by the Fibonacci numbers, which grow like powers of the golden ratio.
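
For reference, the two quantities named on the slide (the conjectured bound itself appears only in the slide's figure): the Fibonacci numbers and the golden ratio that governs their growth.

```latex
% Fibonacci numbers and their asymptotic growth rate, the golden ratio:
F_1 = F_2 = 1, \qquad F_{n} = F_{n-1} + F_{n-2},
\qquad F_{n} \sim \frac{\varphi^{n}}{\sqrt{5}},
\qquad \varphi = \frac{1+\sqrt{5}}{2} \approx 1.618
```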

52 The answer of exhaustive search. Theorem (H. et al., 2014): the conjecture holds for … (proof: a "smart" exhaustive search)

53 How large are the largest Order-Regular matrices that we can build?

54 A constructive approach

55 (figure only; no transcript text)

56 Iterate and build matrices of size …

57 Can we do better?

58 Yes! We can build matrices of size …

59 So, what do we know about Order-Regular matrices?

60 Currently the best bounds for MDPs

61 For papers and slides …and much more: perso.uclouvain.be/romain.hollanders/

62 Romain Hollanders, UCLouvain. Joint work with: Balázs Gerencsér, Jean-Charles Delvenne and Raphaël Jungers. Benelux Meeting in Systems and Control 2015. About the latest complexity bounds for Policy Iteration

