
1 Multi-Agent Shared Hierarchy Reinforcement Learning
Neville Mehta, Prasad Tadepalli
School of Electrical Engineering and Computer Science, Oregon State University

2 Highlights
• Sharing value functions
• Coordination
• A framework to express sharing & coordination with hierarchies
• RTS domain

3 Previous Work
• MAXQ, Options, ALisp
• Coordination in the hierarchical setting (Makar, Mahadevan)
• Sharing flat value functions (Tan)
• Concurrent reinforcement learning for multiple effectors (Marthi, Russell, …)

4 Outline
• Average Reward Learning
• RTS domain
• Hierarchical ARL
• MASH framework
• Experimental results
• Conclusion & future work

5 SMDP
• A Semi-Markov Decision Process (SMDP) extends MDPs by allowing temporally extended actions
  – States S
  – Actions A
  – Transition function P(s′, N | s, a)
  – Reward function R(s′ | s, a)
  – Time function T(s′ | s, a)
• Given an SMDP, the gain of an agent in state s following policy π is
  $\rho^{\pi}(s) = \lim_{N \to \infty} \dfrac{E\left[\sum_{i=0}^{N} r_i\right]}{E\left[\sum_{i=0}^{N} t_i\right]}$
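As a concrete reading of the gain formula above, the following sketch estimates the gain empirically from a sampled SMDP trajectory as the ratio of accumulated reward to accumulated time. It is an illustrative helper, not code from the talk; the trajectory format (a list of (reward, duration) pairs) is an assumption.

def estimate_gain(trajectory):
    """Empirical gain of an SMDP trajectory.

    `trajectory` is assumed to be a list of (reward, duration) pairs, one per
    (possibly temporally extended) action.  The gain is the ratio of total
    reward to total elapsed time, mirroring the limit definition on the slide.
    """
    total_reward = sum(r for r, _ in trajectory)
    total_time = sum(t for _, t in trajectory)
    return total_reward / total_time if total_time > 0 else 0.0

# Example: three actions with rewards 5, 0, 10 lasting 2, 1, 4 time units.
print(estimate_gain([(5, 2), (0, 1), (10, 4)]))  # 15 / 7 ≈ 2.14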

6 Average Reward Learning
• Taking action a in state s yields
  – an immediate reward r(s, a)
  – an action duration t(s, a)
• Average-adjusted reward = $r(s, a) - \rho\, t(s, a)$
• Average-adjusted value (bias) of a policy π (a tabular update based on this recurrence is sketched below):
  $h^{\pi}(s_0) = E\left[(r(s_0, a) - \rho\, t(s_0, a)) + (r(s_1, a) - \rho\, t(s_1, a)) + \cdots\right]$
  $\Rightarrow\; h^{\pi}(s_0) = E\left[r(s_0, a) - \rho\, t(s_0, a)\right] + h^{\pi}(s_1)$
• The optimal policy π* maximizes the right-hand side and leads to the optimal gain: $\rho^{\pi^*} \ge \rho^{\pi}$
• [Figure: a state trajectory s_0, s_1, s_2, …, s_n with each transition labeled by its average-adjusted reward, and a parent-task/child-task diagram annotated with $r(s, a) - \rho^{\pi} t(s, a)$]
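The recurrence $h^{\pi}(s_0) = E[r - \rho t] + h^{\pi}(s_1)$ suggests a simple tabular update rule. The sketch below is a generic average-reward update in the spirit of H-learning/R-learning, not the authors' exact algorithm; the learning rate, the greedy-step condition, and the way the gain is re-estimated are all assumptions.

from collections import defaultdict

h = defaultdict(float)        # average-adjusted value (bias) per state
rho = 0.0                     # current gain estimate
alpha = 0.1                   # learning rate for h (assumed value)
total_reward = total_time = 0.0

def update(s, r, t, s_next, greedy):
    """Generic average-reward update after an action taken in state s yields
    reward r over duration t and leads to state s_next.

    h(s) <- (1 - alpha) h(s) + alpha [r - rho * t + h(s_next)], mirroring the
    recurrence on the slide.  The gain rho is re-estimated as cumulative
    reward over cumulative time on greedy steps only, one common convention
    (an assumption here, not necessarily the authors' choice).
    """
    global rho, total_reward, total_time
    h[s] += alpha * (r - rho * t + h[s_next] - h[s])
    if greedy:
        total_reward += r
        total_time += t
        rho = total_reward / total_time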

7 RTS Domain
• Grid-world domain
• Multiple peasants mine resources (wood, gold) to replenish the home stock
• Avoid collisions with one another
• Attack the enemy's base

8 RTS Domain Task Hierarchy
• [Task-hierarchy figure: composite tasks Root, Harvest(l), Deposit, Goto(k), Offense(e), Idle; primitive tasks East, South, North, West, Pick, Put, Attack]
• MAXQ task hierarchy
  – The original SMDP is split into sub-SMDPs (subtasks)
  – Solving the Root task solves the entire SMDP
• Each subtask M_i is defined by (see the sketch below)
  – State abstraction B_i
  – Actions A_i
  – Termination (goal) predicate G_i
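A subtask described this way maps naturally onto a small data structure. The sketch below is one possible representation of a MAXQ-style subtask M_i with its state abstraction B_i, child actions A_i, and termination predicate G_i; the class, field, and state-key names are illustrative assumptions, not the paper's code.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Subtask:
    name: str
    abstraction: Callable[[dict], tuple]   # B_i: maps the full state to an abstracted state
    children: List[str]                    # A_i: child subtasks or primitive actions
    is_terminated: Callable[[dict], bool]  # G_i: termination (goal) predicate

# Hypothetical fragment of the RTS hierarchy in the figure: the Goto(k) subtask.
goto = Subtask(
    name="Goto(k)",
    abstraction=lambda s: (s["agent_pos"], s["target_pos"]),
    children=["East", "South", "North", "West"],
    is_terminated=lambda s: s["agent_pos"] == s["target_pos"],
)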

9 Hierarchical Average Reward Learning
• Value function decomposition for a recursively gain-optimal policy in Hierarchical H-learning [the decomposition equations appear as images on the original slide]
• If the state abstractions are sound, $h_a(B_a(s)) = h_a(s)$ (one way to realize this is sketched below)
• At the Root task, the decomposition reduces to the ordinary Bellman equation
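One practical reading of the sound-abstraction condition $h_a(B_a(s)) = h_a(s)$ is that each subtask can key its value table on the abstracted state rather than the full state, so any two states with the same abstraction share one value. The sketch below shows that indexing scheme; it is an implementation assumption, not the paper's code.

from collections import defaultdict

class SubtaskValues:
    """Per-subtask value table keyed by the abstracted state B_a(s).

    Because every lookup goes through the abstraction, h_a(B_a(s)) and h_a(s)
    refer to the same table entry by construction, which is the property the
    slide requires of a sound abstraction.
    """
    def __init__(self, abstraction):
        self.abstraction = abstraction     # B_a
        self.h = defaultdict(float)        # h_a over abstracted states

    def value(self, state):
        return self.h[self.abstraction(state)]

    def set_value(self, state, v):
        self.h[self.abstraction(state)] = v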

10 Hierarchical Average Reward Learning
• No pseudo-rewards
• No completion function
• Scheduling is a learned behavior

11 Hierarchical Average Reward Learning
• Sharing requires coordination
• Coordination is expressed in the state, not the action (Mahadevan); see the sketch below
• No need for each subtask to see the reward
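The point that coordination lives in the state rather than in joint actions can be illustrated by widening a navigation subtask's state abstraction to include the other agents' positions, so collision avoidance is learnable from state alone while each agent still picks its own action. The function and feature names below are hypothetical.

def goto_abstraction(state, agent_id):
    # Expose the other agents' positions in the abstracted state (coordination
    # through state), rather than selecting actions jointly.
    others = tuple(sorted(pos for aid, pos in state["positions"].items()
                          if aid != agent_id))
    return (state["positions"][agent_id], state["targets"][agent_id], others)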

12 Single Hierarchical Agent
• [Figure: a single agent executing the task hierarchy; its current task stack is Root → Harvest(W1) → Goto(W1) → North, shown against the full hierarchy (Root; Harvest(l), Deposit, Goto(k), Offense(e), Idle; East, South, North, West, Pick, Put, Attack)]

13 Simple Multi-Agent Setup
• [Figure: each agent runs its own separate copy of the task hierarchy; one agent's task stack is Root → Offense(E1) → Attack, the other's is Root → Harvest(W1) → Goto(W1) → North]

14 MASH Setup
• [Figure: all agents execute within a single shared task hierarchy; one agent's task stack is Root → Offense(E1) → Attack, the other's is Root → Harvest(W1) → Goto(W1) → North]
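The contrast with the slide 13 setup can be made concrete: there, each agent owns its own value tables, whereas in the MASH setup every agent reads and writes one shared table per subtask, so experience gathered by any agent benefits all of them. The sketch below shows that sharing pattern; the class, method, and table-layout choices are illustrative assumptions, not the paper's code.

from collections import defaultdict

class MASHAgent:
    """Agent with its own copy of the task hierarchy but value tables shared
    by all agents: one table per subtask, keyed by that subtask's abstracted
    state.  Illustrative sketch only.
    """
    def __init__(self, agent_id, shared_tables):
        self.agent_id = agent_id
        self.tables = shared_tables                 # {subtask_name: {abstract_state: value}}

    def value(self, subtask_name, abstract_state):
        return self.tables[subtask_name][abstract_state]

    def update(self, subtask_name, abstract_state, new_value):
        self.tables[subtask_name][abstract_state] = new_value

# Every agent is built around the same dictionary of per-subtask tables,
# which is what distinguishes this setup from the separate-agents one.
shared = defaultdict(lambda: defaultdict(float))
agents = [MASHAgent(i, shared) for i in range(4)]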

15 Experimental Results
• 2 agents in a 15 × 15 grid; Pr(Resource Regeneration) = 5%; Pr(Enemy) = 1%; Rewards = (-1, 100, -5, 50); 30 runs
• 4 agents in a 25 × 25 grid; Pr(Resource Regeneration) = 7.5%; Pr(Enemy) = 1%; Rewards = (0, 100, -5, 50); 30 runs
• The separate-agents coordination configuration could not be run for the 4-agent, 25 × 25 case
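For reference, the two experimental settings above can be collected into a configuration structure like the one below. The field names, and the treatment of the four reward components as an opaque tuple (the slide does not say what each component rewards), are assumptions for illustration.

experiments = [
    {"agents": 2, "grid": (15, 15), "p_resource_regen": 0.05,
     "p_enemy": 0.01, "rewards": (-1, 100, -5, 50), "runs": 30},
    {"agents": 4, "grid": (25, 25), "p_resource_regen": 0.075,
     "p_enemy": 0.01, "rewards": (0, 100, -5, 50), "runs": 30},
]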

16 Experimental Results

17 Experimental Results (2)

18 Conclusion
• Sharing value functions
• Coordination
• A framework to express sharing & coordination with hierarchies

19 Future Work
• Non-Markovian & non-stationary settings
• Learning the task hierarchy
  – Task–subtask relationships
  – State abstractions
  – Termination conditions
• Combining the MASH framework with factored action models
• Recognizing opportunities for sharing & coordination

20 Current Work
• Marthi & Russell-style features

