Using Hierarchical Reinforcement Learning to Balance Conflicting Sub-problems
By: Stephen Robertson
Supervisor: Phil Sterne
Presentation Outline
Project Motivation
Project Aim
Rules of the Gridworld
Flat Reinforcement Learning
Feudal Reinforcement Learning
State Variable Combination Approach
Project Motivation
Reinforcement Learning is an attractive form of machine learning, but because of the curse of dimensionality it becomes inefficient on complex problems
Hierarchical Reinforcement Learning is a method for dealing with this curse of dimensionality
Project Aim
Apply various Hierarchical Reinforcement Learning algorithms to a complex gridworld problem
Compare the algorithms to each other and to flat Reinforcement Learning
Rules of the Gridworld
Possible actions: Left, Right, Up, Down and Rest
Collecting food and drink increases nourishment and hydration respectively
After landing on the tree, the creature carries wood which it can use to repair its shelter
Rules of the Gridworld
Resting in a repaired shelter increases health in proportion to the shelter condition
Landing on the lion decreases health and results in a direct punishment
Every 4 steps, nourishment, hydration and shelter condition each decrease by 1; every 10 steps, health decreases by 1
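A minimal sketch of the periodic drive decay described above. The variable names and the dict-based state representation are illustrative assumptions, not the project's actual code:

```python
def decay_drives(state, step):
    """Apply the periodic decreases to the creature's internal drives."""
    if step % 4 == 0:
        for var in ("nourishment", "hydration", "shelter_condition"):
            state[var] = max(0, state[var] - 1)
    if step % 10 == 0:
        state["health"] = max(0, state["health"] - 1)
    return state

# Starting with all drives full, run 20 steps of decay.
state = {"nourishment": 4, "hydration": 4, "health": 4, "shelter_condition": 4}
for step in range(1, 21):
    decay_drives(state, step)
print(state)  # {'nourishment': 0, 'hydration': 0, 'health': 2, 'shelter_condition': 0}
```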
Flat Reinforcement Learning
Sarsa with eligibility traces was used
To get Flat Reinforcement Learning working, the task needed to be simplified slightly
Limited to a 6x6 gridworld
Nourishment, Hydration, Health and Shelter Condition reduced to 5 discrete levels each
Total states: 6 x 6 x 5 x 5 x 5 x 5 x 2 = 45 000, which is manageable
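A minimal sketch of a tabular Sarsa(lambda) backup with replacing traces, the algorithm named above. The learning rate, discount and trace-decay values are illustrative assumptions:

```python
import numpy as np

n_states, n_actions = 45000, 5       # 6x6x5x5x5x5x2 states, 5 primitive actions
alpha, gamma, lam = 0.1, 0.99, 0.9   # assumed hyperparameters

Q = np.zeros((n_states, n_actions))  # action-value table
E = np.zeros((n_states, n_actions))  # eligibility traces

def sarsa_lambda_update(s, a, r, s_next, a_next):
    """One on-policy Sarsa(lambda) backup over all currently eligible pairs."""
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    E[s, a] = 1.0            # replacing trace for the visited state-action pair
    Q += alpha * delta * E   # propagate the TD error along the traces
    E *= gamma * lam         # decay all traces
```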
Flat Reinforcement Learning
The given task requires a large amount of exploration in order to find the optimal solution
Total exploration at first, decreasing gradually until finally total exploitation
Optimistic initialisation of the tables to the maximum possible reward of 6400 encourages efficient exploration
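A sketch of the exploration scheme described above, assuming a linear anneal from fully random to fully greedy action selection; only the optimistic value of 6400 comes from the slide, the rest is illustrative:

```python
import numpy as np

OPTIMISTIC_INIT = 6400.0
Q = np.full((45000, 5), OPTIMISTIC_INIT)  # optimistic initialisation of the table

def epsilon(episode, total_episodes):
    """Anneal from total exploration (1.0) down to total exploitation (0.0)."""
    return max(0.0, 1.0 - episode / total_episodes)

def select_action(s, eps, rng=np.random.default_rng()):
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))  # explore: random action
    return int(np.argmax(Q[s]))               # exploit: greedy action
```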
Flat Reinforcement Learning Results
Feudal Reinforcement Learning
Needs to be modified for the given problem
In the simple maze problem, state variables change independently and don't change by more than 1
In the simple maze problem, high-level actions can be defined as the same as low-level actions
Feudal Reinforcement Learning
The main difficulty in the complex problem is that high-level actions are hard to define
State variables can change simultaneously and by more than one, e.g. the creature can move to the left and fully satisfy hunger in one step, changing two state variables at once
High-level actions are therefore defined as a desired high-level state
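One possible encoding of "high-level action = desired high-level state", sketched under assumed variable names and 0-4 drive levels; the goal representation is hypothetical, not taken from the project code:

```python
HIGH_LEVEL_VARS = ("nourishment", "hydration", "health", "shelter_condition")

def make_goal(variable, target_level):
    """A high-level action: the manager names a state variable and the level it wants."""
    assert variable in HIGH_LEVEL_VARS
    return (variable, target_level)

def goal_satisfied(high_state, goal):
    """The low-level worker has succeeded once the named variable reaches its target."""
    variable, target_level = goal
    return high_state[variable] >= target_level

# Example: the manager asks for hunger to be fully satisfied.
goal = make_goal("nourishment", 4)
print(goal_satisfied({"nourishment": 4, "hydration": 2,
                      "health": 3, "shelter_condition": 1}, goal))  # True
```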
Feudal Reinforcement Learning Results
Feudal Reinforcement Learning failed horribly on this task
State Variable Combination Approach
In a problem with conflicting sub-problems, each sub-problem tends to be defined by a limited set of state variables
Sub-agents are created, each in charge of a limited set of state variables
Some sub-agents will be inherently equipped to solve a sub-problem
Some sub-agents will not hold any useful information
By incorporating all possible combinations of state variables, we minimise the amount of designer intervention
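A sketch of enumerating one sub-agent per subset of state variables, as the last point describes. The variable names are assumptions; the slide only states that all possible combinations are used:

```python
from itertools import combinations

STATE_VARS = ("x", "y", "nourishment", "hydration", "health",
              "shelter_condition", "carrying_wood")

def all_sub_agents(variables=STATE_VARS):
    """Yield every non-empty combination of state variables a sub-agent could watch."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)

# e.g. the sub-agent watching (x, y, nourishment) is well placed to learn food
# collection, while one watching only (carrying_wood,) holds little useful information.
print(sum(1 for _ in all_sub_agents()))  # 127 sub-agents for 7 state variables
```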
Examples of Sub-agents
Choosing Between Sub-agents
If the sub-agent that predicts the highest reward for a given state is obeyed, the best action should be chosen
The problem is that sub-agents which hold no useful information might falsely predict a high reward
The reliability of each sub-agent therefore also needs to be taken into account
This is achieved by keeping track of the variance of each sub-agent's predicted rewards
High variance = unreliable prediction; low variance = reliable prediction
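A sketch of one way to combine predicted reward with reliability: penalise each sub-agent's prediction by a multiple of its standard deviation and obey the best adjusted score. The exact penalty form is an assumption; the slide only states that the variance of predicted rewards is tracked:

```python
import numpy as np
from collections import namedtuple

Prediction = namedtuple("Prediction", "reward variance action")

def choose_action(predictions, k=1.0):
    """Pick the action of the sub-agent with the best reliability-adjusted prediction."""
    scores = [p.reward - k * np.sqrt(p.variance) for p in predictions]
    return predictions[int(np.argmax(scores))].action

# A reliable, modest prediction beats an unreliable, inflated one.
preds = [Prediction(reward=50.0, variance=4.0, action="Right"),    # low variance: trusted
         Prediction(reward=80.0, variance=2500.0, action="Rest")]  # high variance: distrusted
print(choose_action(preds))  # Right
```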
Results
Questions?