1
Hierarchical Reinforcement Learning
Ersin Basaran
19/03/2005
2
Outline
- Reinforcement Learning
  - RL Agent
  - Policy
- Hierarchical Reinforcement Learning
  - The Need
  - Sub-Goal Detection
  - State Clusters
  - Border States
  - Continuous State and/or Action Spaces
  - Options
  - Macro Q-Learning with Parallel Option Discovery
- Experimental Results
3
Reinforcement Learning
- The agent observes the state and takes an action according to its policy
- A policy is a function from the state space to the action space
- A policy can be deterministic or non-deterministic (stochastic)
- State and action spaces can be discrete, continuous, or hybrid
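To make the deterministic/non-deterministic distinction concrete, here is a minimal Python sketch. It assumes a small discrete state space stored in dictionaries; the state names, action names, and helper functions are hypothetical and only illustrate the two kinds of policy.

```python
import random
from typing import Callable, Dict, Hashable

State = Hashable
Action = Hashable
Policy = Callable[[State], Action]


def deterministic_policy(table: Dict[State, Action]) -> Policy:
    """pi(s) = a: the same state always yields the same action."""
    return lambda s: table[s]


def stochastic_policy(probs: Dict[State, Dict[Action, float]], rng: random.Random) -> Policy:
    """pi(a | s): sample an action from a per-state probability distribution."""
    def act(s: State) -> Action:
        actions = list(probs[s])
        weights = [probs[s][a] for a in actions]
        return rng.choices(actions, weights=weights, k=1)[0]
    return act


# Hypothetical two-state, two-action example.
pi_det = deterministic_policy({"s0": "right", "s1": "left"})
pi_sto = stochastic_policy({"s0": {"right": 0.8, "left": 0.2}, "s1": {"left": 1.0}},
                           random.Random(0))
print(pi_det("s0"))  # right
print(pi_sto("s1"))  # left (the only action with non-zero probability)
```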
4
RL Agent
- No model of the environment
- The agent observes state s, takes action a, moves to state s', and observes reward r
- The agent tries to maximize the total expected reward (the return)
- Finite state machine model: s --(a, r)--> s'
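The model-free interaction loop can be sketched as follows. This assumes a hypothetical five-state chain environment (start in state 0, reward 1 on reaching state 4) and a discount factor gamma, which the slide does not specify; gamma = 1 gives the undiscounted total reward. The agent only calls reset() and step(), so it never uses a model of the environment.

```python
from typing import Callable, List, Tuple


class ChainEnv:
    """Hypothetical 5-state chain: start in 0, goal in 4, actions -1 / +1.

    The agent never sees these dynamics; it only calls reset() and step().
    """

    def __init__(self, n_states: int = 5):
        self.n = n_states
        self.s = 0

    def reset(self) -> int:
        self.s = 0
        return self.s

    def step(self, a: int) -> Tuple[int, float, bool]:
        self.s = min(max(self.s + a, 0), self.n - 1)
        done = self.s == self.n - 1
        return self.s, (1.0 if done else 0.0), done


def run_episode(env: ChainEnv, policy: Callable[[int], int], max_steps: int = 100) -> List[float]:
    """Observe s, take a = policy(s), receive (s', r); repeat until the episode ends."""
    rewards: List[float] = []
    s = env.reset()
    for _ in range(max_steps):
        s, r, done = env.step(policy(s))
        rewards.append(r)
        if done:
            break
    return rewards


def discounted_return(rewards: List[float], gamma: float = 0.9) -> float:
    """G = r_1 + gamma * r_2 + gamma^2 * r_3 + ...; gamma = 1 gives the plain total."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = run_episode(ChainEnv(), policy=lambda s: +1)
print(rewards)                     # [0.0, 0.0, 0.0, 1.0]
print(discounted_return(rewards))  # approximately 0.729 (= 0.9 ** 3)
```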
5
Policy
- In a flat RL model, the policy maps each state to a primitive action
- Under the optimal policy, the action taken at each step yields the highest expected return
- The policy can be stored in tabular form for small state and action spaces
- Function approximators can be used for large (or continuous) state or action spaces
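A tabular policy of this kind can be learned with standard one-step Q-learning. The following minimal sketch uses a hypothetical chain (start in 0, goal in 4); alpha, gamma, epsilon, and the episode count are illustrative choices, not values from the slides. The greedy policy read off the Q-table picks, in each state, the action with the highest estimated return.

```python
import random
from collections import defaultdict

# Hypothetical 5-state chain: start in 0, goal in 4, actions move left/right.
N_STATES, GOAL, ACTIONS = 5, 4, (-1, +1)


def step(s, a):
    s_next = min(max(s + a, 0), N_STATES - 1)   # moves are clipped at the ends
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL


def greedy_action(Q, s, rng):
    # Break ties randomly so the untrained agent explores both directions.
    best = max(Q[(s, b)] for b in ACTIONS)
    return rng.choice([b for b in ACTIONS if Q[(s, b)] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)                       # table Q[(state, action)], all zeros
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy_action(Q, s, rng)
            s_next, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q


Q = q_learning()
# The learned greedy policy, read off the table for every non-goal state.
greedy = {s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(GOAL)}
print(greedy)
```

For large or continuous spaces the dictionary would be replaced by a function approximator, as the slide notes.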
6
The Need for Hierarchical RL
- Improves performance
- Makes it feasible to apply RL to problems with large state and/or action spaces
- Detecting sub-goals lets the agent define abstract actions over the primitive actions
- Sub-goals and abstract actions can be reused in different tasks in the same domain, so knowledge is transferred between tasks
- The agent's policy can be translated into natural language
7
Sub-goal Detection
- A sub-goal can be a single state, a subset of the state space, or a constraint on the state space
- Reaching a sub-goal should help the agent reach the main goal (i.e., obtain the highest return)
- Sub-goals must be discovered by the agent autonomously
8
State Clusters
- States within a cluster are strongly connected to each other
- The number of state transitions between clusters is small
- The states at the two ends of a transition between two different clusters are sub-goal candidates
- Clusters can be hierarchical: different clusters can belong to the same cluster at a higher level
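The slide does not say how the clusters are computed. Assuming a cluster label per state is already available (for example from some graph-partitioning method), the sub-goal candidates are simply the endpoints of transitions that cross a cluster boundary. A minimal sketch with a hypothetical two-room example; the function names are illustrative.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Transition = Tuple[Hashable, Hashable]   # (s, s') pairs observed while exploring


def subgoal_candidates(transitions: Iterable[Transition],
                       cluster_of: Dict[Hashable, int]) -> Set[Hashable]:
    """Both endpoints of every transition that crosses a cluster boundary."""
    candidates: Set[Hashable] = set()
    for s, s_next in transitions:
        if cluster_of[s] != cluster_of[s_next]:
            candidates.update((s, s_next))
    return candidates


def inter_cluster_fraction(transitions: List[Transition],
                           cluster_of: Dict[Hashable, int]) -> float:
    """Share of transitions between clusters; a good clustering keeps this small."""
    crossing = sum(cluster_of[s] != cluster_of[t] for s, t in transitions)
    return crossing / len(transitions)


# Hypothetical two-room example: rooms {a1, a2, a3} and {b1, b2, b3}, door between a3 and b1.
transitions = [("a1", "a2"), ("a2", "a3"), ("a3", "a1"), ("a3", "b1"),
               ("b1", "a3"), ("b1", "b2"), ("b2", "b3"), ("b3", "b1")]
cluster_of = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}

candidates = subgoal_candidates(transitions, cluster_of)
ratio = inter_cluster_fraction(transitions, cluster_of)
print(sorted(candidates), ratio)  # ['a3', 'b1'] 0.25
```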
9
Border States
- Some actions cannot be applied in some states; these states are defined as border states
- Border states are assumed to form a transition sequence: the agent can travel along the border states by taking certain actions
- Each end of this transition sequence is a candidate sub-goal, assuming the agent has explored the environment sufficiently
10
Border State Detection
For discrete state and action spaces:
- F(s): the set of states that can be reached from state s in one time step
- G(s): the set of actions that cause no state transition when applied in state s
- H(s): the set of actions that move the agent to a different state when applied in state s
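These three sets can be collected directly from experience. A minimal sketch, assuming a deterministic environment, enough exploration that every relevant (s, a) pair has been tried, and that F(s) excludes s itself; the function name and the three-state corridor are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Experience = Tuple[Hashable, Hashable, Hashable]   # (s, a, s')


def border_sets(experience: Iterable[Experience]):
    """Build F(s), G(s), H(s) from observed (s, a, s') transitions.

    Assumes a deterministic environment explored well enough that every
    relevant (s, a) pair appears at least once. F(s) excludes s itself.
    """
    F: Dict[Hashable, Set[Hashable]] = {}
    G: Dict[Hashable, Set[Hashable]] = {}
    H: Dict[Hashable, Set[Hashable]] = {}
    for s, a, s_next in experience:
        for table in (F, G, H):            # every seen state gets an entry
            table.setdefault(s, set())
            table.setdefault(s_next, set())
        if s_next == s:
            G[s].add(a)                     # action had no effect in s
        else:
            H[s].add(a)                     # action moved the agent
            F[s].add(s_next)
    return F, G, H


# Hypothetical corridor 0 - 1 - 2: moving left in 0 or right in 2 hits a wall.
experience = [(0, "L", 0), (0, "R", 1), (1, "L", 0),
              (1, "R", 2), (2, "L", 1), (2, "R", 2)]
F, G, H = border_sets(experience)
border_states = sorted(s for s in G if G[s])   # states with at least one blocked action
print(border_states)  # [0, 2]
```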
11
Border State Detection
Detect the longest state sequence $s_0, s_1, s_2, \ldots, s_{k-1}, s_k$ that satisfies the following constraints:
- $s_i \in F(s_{i+1})$ or $s_{i+1} \in F(s_i)$ for $0 \le i < k$
- $G(s_i) \cap G(s_{i+1}) \neq \emptyset$ for $0 < i < k-1$
- $H(s_0) \cap G(s_1) \neq \emptyset$
- $H(s_k) \cap G(s_{k-1}) \neq \emptyset$
$s_0$ and $s_k$ are candidate sub-goals
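The search can be sketched as a brute-force depth-first enumeration of simple paths. The relations between the G and H sets are an assumption in this sketch: consecutive interior states are taken to share at least one blocked action, $G(s_i) \cap G(s_{i+1}) \neq \emptyset$, and at each end some action blocked at the neighbouring state moves the agent, $H(s_0) \cap G(s_1) \neq \emptyset$ and $H(s_k) \cap G(s_{k-1}) \neq \emptyset$. The example (a row of states 0 to 4 under a wall, with "U" blocked in states 1 to 3 and gaps leading to hypothetical extra states u0 and u4) is illustrative. Enumerating simple paths is exponential, so this only suits small state spaces.

```python
from typing import Dict, Hashable, List, Set

Sets = Dict[Hashable, Set[Hashable]]


def adjacent(u, v, F: Sets) -> bool:
    return u in F[v] or v in F[u]


def is_border_sequence(seq: List[Hashable], F: Sets, G: Sets, H: Sets) -> bool:
    k = len(seq) - 1
    if k < 1:
        return False
    if not all(adjacent(seq[i], seq[i + 1], F) for i in range(k)):
        return False
    # interior pairs (0 < i < k-1) share at least one blocked action (assumed operator)
    if not all(G[seq[i]] & G[seq[i + 1]] for i in range(1, k - 1)):
        return False
    # at each end, an action blocked at the neighbour moves the agent (assumed operator)
    return bool(H[seq[0]] & G[seq[1]]) and bool(H[seq[k]] & G[seq[k - 1]])


def longest_border_sequence(F: Sets, G: Sets, H: Sets) -> List[Hashable]:
    """Exhaustive DFS over simple paths; exponential, for small examples only."""
    neighbours = {s: [t for t in F if t != s and adjacent(s, t, F)] for s in F}
    best: List[Hashable] = []

    def extend(path: List[Hashable]) -> None:
        nonlocal best
        if len(path) > len(best) and is_border_sequence(path, F, G, H):
            best = list(path)
        for t in neighbours[path[-1]]:
            if t in path:
                continue
            # appending t turns (path[-2], path[-1]) into an interior pair
            if len(path) >= 3 and not (G[path[-2]] & G[path[-1]]):
                continue
            path.append(t)
            extend(path)
            path.pop()

    for s in F:
        extend([s])
    return best


F = {0: {1, "u0"}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, "u4"}, "u0": {0}, "u4": {4}}
G = {0: {"L"}, 1: {"U"}, 2: {"U"}, 3: {"U"}, 4: {"R"}, "u0": set(), "u4": set()}
H = {0: {"R", "U"}, 1: {"L", "R"}, 2: {"L", "R"}, 3: {"L", "R"}, 4: {"L", "U"},
     "u0": {"D"}, "u4": {"D"}}

best = longest_border_sequence(F, G, H)
print(best, "candidate sub-goals:", best[0], best[-1])  # [0, 1, 2, 3, 4] ... 0 4
```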
12
Border States on Continuous State and Action Spaces
- The environment is assumed to be bounded
- State and action vectors can include both continuous and discrete dimensions
- The derivative of the state vector with respect to the action vector can be used to locate border states
- In border state regions, these derivatives are small for some action vectors
- A large change in these derivatives indicates a border state region
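The derivative idea can be illustrated with finite differences. This sketch assumes a hypothetical bounded environment (a point in the unit square, moved by the action vector and clipped at the walls) and estimates the Jacobian of the next state with respect to the action by central differences. A column that drops from about 1 in the interior to (near) zero flags an action direction that is blocked, i.e. a border region. The step size h and tolerance tol are illustrative choices.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def step(s: Sequence[float], a: Sequence[float]) -> Vector:
    """Hypothetical bounded environment: a point in the unit square, moved by a, clipped at the walls."""
    return [min(max(si + ai, 0.0), 1.0) for si, ai in zip(s, a)]


def action_jacobian(f: Callable[[Sequence[float], Sequence[float]], Vector],
                    s: Sequence[float], a: Sequence[float], h: float = 1e-4) -> List[List[float]]:
    """Central-difference estimate of J[i][j] = d s'_i / d a_j at (s, a)."""
    n_out = len(f(s, a))
    J = [[0.0] * len(a) for _ in range(n_out)]
    for j in range(len(a)):
        a_plus, a_minus = list(a), list(a)
        a_plus[j] += h
        a_minus[j] -= h
        out_plus, out_minus = f(s, a_plus), f(s, a_minus)
        for i in range(n_out):
            J[i][j] = (out_plus[i] - out_minus[i]) / (2 * h)
    return J


def blocked_action_dims(J: List[List[float]], tol: float = 1e-3) -> List[int]:
    """Action dimensions whose (near-)zero column means they no longer change the state."""
    return [j for j in range(len(J[0])) if max(abs(row[j]) for row in J) < tol]


a = [0.05, 0.0]   # small push along the first action dimension
print(blocked_action_dims(action_jacobian(step, [0.5, 0.5], a)))   # [] (interior)
print(blocked_action_dims(action_jacobian(step, [0.98, 0.5], a)))  # [0] (next to the wall x = 1)
```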
13
Options
- An option is a policy
- It can be local (defined on a subset of the state space) or global
- An option's policy can use primitive actions or other options, so options are hierarchical
- Options are used to reach sub-goals
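One common way to make this concrete is the options framework of Sutton, Precup and Singh (1999), where an option carries an initiation set, a policy, and a termination condition. The sketch below assumes that representation (not necessarily the author's exact one), a deterministic termination test, and a hypothetical chain environment. An option's policy may return either a primitive action or another option, which gives the hierarchy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Tuple, Union

State = int
Primitive = int  # -1 = move left, +1 = move right


@dataclass
class Option:
    name: str
    can_start: Callable[[State], bool]                     # initiation set I
    policy: Callable[[State], Union[Primitive, Option]]    # primitive action or sub-option
    should_stop: Callable[[State], bool]                   # termination condition


def env_step(s: State, a: Primitive) -> Tuple[State, float]:
    """Hypothetical chain 0..4; reward 1.0 on entering state 4."""
    s_next = min(max(s + a, 0), 4)
    return s_next, (1.0 if s_next == 4 else 0.0)


def run_option(o: Option, s: State, gamma: float = 0.9) -> Tuple[State, float, int]:
    """Run o from s until it terminates; return (final state, discounted reward, steps)."""
    if not o.can_start(s):
        raise ValueError(f"option {o.name!r} cannot start in state {s}")
    total, k = 0.0, 0
    while True:
        choice = o.policy(s)
        if isinstance(choice, Option):
            s, r, steps = run_option(choice, s, gamma)   # hierarchical call
        else:
            s, r = env_step(s, choice)
            steps = 1
        total += gamma ** k * r   # a sub-option's reward is discounted from its own start
        k += steps
        if o.should_stop(s):
            return s, total, k


# A local option (only defined left of state 2) and a global option that uses it.
to_two = Option("to_two", lambda s: s < 2, lambda s: +1, lambda s: s == 2)
to_goal = Option("to_goal", lambda s: True,
                 lambda s: to_two if s < 2 else +1, lambda s: s == 4)

print(run_option(to_goal, 0))   # final state 4 after 4 primitive steps
```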
14
Macro Q-Learning with Parallel Option Discovery
- The agent starts with no sub-goals and no options
- It detects sub-goals and learns the option policies and the main policy simultaneously
- Options are added to and removed from the model according to the sub-goal detection algorithm
- When a possible sub-goal is detected, a new option is added to the model to learn a policy that reaches this sub-goal
- All option policies are updated in parallel
- The agent generates an internal reward when a sub-goal is reached
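A minimal sketch of the parallel updates, under stated assumptions: each detected sub-goal gets an option whose policy is learned by ordinary off-policy Q-learning over primitive actions, with an internal reward of 1 on reaching the sub-goal (where the option terminates) and 0 otherwise. Because Q-learning is off-policy, every option can learn from every transition, whichever policy produced it. The class names, hyperparameters, and the hand-picked sub-goals 2 and 4 (standing in for the unspecified detection algorithm) are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Tuple

ACTIONS = (-1, +1)


def env_step(s: int, a: int) -> int:
    """Hypothetical 5-state chain (states 0..4); moves are clipped at the ends."""
    return min(max(s + a, 0), 4)


class SubgoalOption:
    """Option whose policy is learned to reach one detected sub-goal."""

    def __init__(self, subgoal: Hashable, alpha: float = 0.5, gamma: float = 0.9):
        self.subgoal, self.alpha, self.gamma = subgoal, alpha, gamma
        self.Q: Dict[Tuple[Hashable, int], float] = defaultdict(float)

    def internal_reward(self, s_next) -> float:
        return 1.0 if s_next == self.subgoal else 0.0

    def update(self, s, a, s_next) -> None:
        if s_next == self.subgoal:
            target = self.internal_reward(s_next)   # option terminates at its sub-goal
        else:
            # internal reward is 0 away from the sub-goal, so only the bootstrap remains
            target = self.gamma * max(self.Q[(s_next, b)] for b in ACTIONS)
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])

    def greedy(self, s) -> int:
        return max(ACTIONS, key=lambda b: self.Q[(s, b)])


class OptionPool:
    """Options are added when a sub-goal is detected and removed when it is dropped."""

    def __init__(self):
        self.options: Dict[Hashable, SubgoalOption] = {}

    def add(self, subgoal) -> None:
        self.options.setdefault(subgoal, SubgoalOption(subgoal))

    def remove(self, subgoal) -> None:
        self.options.pop(subgoal, None)

    def observe(self, s, a, s_next) -> None:
        for option in self.options.values():   # every option learns from every step
            option.update(s, a, s_next)


pool = OptionPool()
pool.add(2)   # hypothetical sub-goals, e.g. returned by a detector
pool.add(4)
rng, s = random.Random(1), 0
for _ in range(2000):   # random exploration
    a = rng.choice(ACTIONS)
    s_next = env_step(s, a)
    pool.observe(s, a, s_next)
    s = s_next
```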
15
Macro Q-Learning with Parallel Option Discovery
- An option is defined as $O = (\pi_o, \beta_o, I_o, Q_o, r_o)$, where $\pi_o$ is the option's policy, $\beta_o$ its termination condition, $I_o$ its initiation set, $Q_o$ the Q-values for the option, and $r_o$ the internal reward signal associated with the option
- The intra-option learning method is used
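For reference, the standard intra-option Q-learning update of Sutton, Precup and Singh (1999) updates $Q(s, o)$ for every Markov option whose policy would have chosen the observed action $a$ in $s$: $Q(s,o) \leftarrow Q(s,o) + \alpha\,[r + \gamma U(s',o) - Q(s,o)]$, with $U(s',o) = (1-\beta_o(s'))\,Q(s',o) + \beta_o(s') \max_{o'} Q(s',o')$. The sketch assumes deterministic option policies, a hypothetical chain, and illustrative $\alpha$ and $\gamma$; it is not necessarily the exact variant used in this work.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Tuple


@dataclass
class MarkovOption:
    name: str
    initiation: Callable[[Hashable], bool]   # I_o: can the option start in s?
    policy: Callable[[Hashable], int]        # pi_o: deterministic action choice
    beta: Callable[[Hashable], float]        # beta_o: termination probability in s


def intra_option_update(Q: Dict[Tuple[Hashable, str], float], options: List[MarkovOption],
                        s, a, r, s_next, done, alpha=0.5, gamma=0.9) -> None:
    """One intra-option Q-learning step after observing (s, a, r, s_next)."""
    best_next = max((Q[(s_next, o.name)] for o in options if o.initiation(s_next)),
                    default=0.0)
    for o in options:
        if not o.initiation(s) or o.policy(s) != a:
            continue                     # only options that would have taken a in s
        if done:
            u = 0.0
        else:
            b = o.beta(s_next)
            # U(s', o): continue o with prob 1 - beta, otherwise switch to the best option
            u = (1.0 - b) * Q[(s_next, o.name)] + b * best_next
        Q[(s, o.name)] += alpha * (r + gamma * u - Q[(s, o.name)])


def everywhere(s) -> bool:
    return True


options = [
    MarkovOption("right", everywhere, lambda s: +1, lambda s: 1.0 if s == 4 else 0.0),
    MarkovOption("left", everywhere, lambda s: -1, lambda s: 1.0 if s == 0 else 0.0),
    MarkovOption("step_right", everywhere, lambda s: +1, lambda s: 1.0),  # primitive as one-step option
]
Q = defaultdict(float)
intra_option_update(Q, options, s=3, a=+1, r=1.0, s_next=4, done=True)
intra_option_update(Q, options, s=2, a=+1, r=0.0, s_next=3, done=False)
print(Q[(3, "right")], Q[(3, "step_right")], Q[(3, "left")])  # 0.5 0.5 0.0
print(Q[(2, "right")], Q[(2, "step_right")])                  # 0.225 0.225
```

A single primitive transition updates several options at once, which is what lets options learn without being executed to completion.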
16
Experiments
[Figure: Flat RL vs. Hierarchical RL]
17
Options in HRL
18
Questions and Suggestions!!!