1
Continuous-Action Q-Learning
José del R. Millán et al., Machine Learning 49 (2002). Summarized by Seung-Joon Yi. (C) 2003, SNU Biointelligence Lab, http://bi.snu.ac.kr/
2
ITPM (Incremental Topology Preserving Map)
Consists of units and edges between pairs of units. Maps the current sensory situation x onto an action a. Units are created incrementally and incorporate bias. After being created, each unit's sensory component is tuned by self-organizing rules, and its action component is updated through reinforcement learning.
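As a rough sketch of how such a map could be represented, the following Python fragment pairs each unit's sensory prototype with per-action Q-values and keeps edges as index pairs. The class names, field names, and the fixed per-unit set of discrete actions are illustrative assumptions, not the authors' implementation.

    import numpy as np

    class Unit:
        """One ITPM unit: a localized receptive field in sensory space plus
        a set of candidate discrete actions with associated Q-values."""
        def __init__(self, sensory_center, discrete_actions):
            self.w = np.asarray(sensory_center, dtype=float)          # sensory component (prototype)
            self.actions = np.asarray(discrete_actions, dtype=float)  # candidate discrete actions
            self.q = np.zeros(len(self.actions))                      # one Q-value per discrete action

    class ITPM:
        """Incremental topology preserving map: units plus edges between pairs of units."""
        def __init__(self):
            self.units = []      # created incrementally; starts empty
            self.edges = set()   # undirected edges stored as index pairs (i, j)

        def nearest(self, x, k=1):
            """Indices of the k units whose sensory component is closest to situation x."""
            d = [np.linalg.norm(u.w - np.asarray(x, dtype=float)) for u in self.units]
            return list(np.argsort(d)[:k])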
3
ITPM Units and bias
Initially the ITPM has no units; they are created as the robot uses its built-in reflexes. Units in the network have overlapping, localized receptive fields. When the neural controller makes an incorrect generalization, the reflexes take control of the robot and a new unit is added to the ITPM.
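Reusing the Unit/ITPM sketch above, the unit-creation rule might look like the following. The test for "the reflexes took control" (the controller produced no action, or the reflex overrode it) is an illustrative assumption.

    def maybe_add_unit(itpm, x, reflex_action, controller_action, discrete_actions):
        """Grow the ITPM when the built-in reflexes drive the robot: either no unit
        covers the current situation yet, or the reflex overrides the controller."""
        reflex_used = controller_action is None or (
            reflex_action is not None
            and not np.allclose(reflex_action, controller_action))
        if not reflex_used:
            return
        new_index = len(itpm.units)
        if itpm.units:  # link the new unit to the currently nearest existing unit
            j = itpm.nearest(x, k=1)[0]
            itpm.edges.add((j, new_index))
        itpm.units.append(Unit(x, discrete_actions))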
4
ITPM Self-organizing rules
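The update equations appear only as a figure on the original slide. As a placeholder, a standard self-organizing rule of the kind used in topology-preserving maps moves the winning unit's sensory component, and more weakly its topological neighbours, toward the current situation x; the learning rates \epsilon_w and \epsilon_n are assumptions:

    w_{\mathrm{win}} \leftarrow w_{\mathrm{win}} + \epsilon_w \, (x - w_{\mathrm{win}}),
    \qquad
    w_j \leftarrow w_j + \epsilon_n \, (x - w_j) \;\; \text{for every neighbour } j \text{ of the winner},
    \qquad 0 < \epsilon_n \ll \epsilon_w .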
5
ITPM Advantages
Automatically allocates units in the visited parts of the input space. Dynamically adjusts the resolution used in different regions. Experiments show that, on average, every unit is connected to 5 others at the end of the learning episodes.
6
ITPM General learning algorithm
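The algorithm itself is shown as a figure on the original slide. A hedged reconstruction of the overall loop, assembled only from the pieces described on the surrounding slides (nearest-unit lookup, ε-greedy choice among the unit's discrete actions, one-step Q-update, self-organizing tuning of the winner, and unit creation when the reflexes take over) and reusing the sketches above, could look like this. The environment interface env.reset/step/reflex, the learning rates, and the use of the plain discrete ε-greedy rule (the continuous rule of the later slides replaces the argmax with a Q-weighted average) are assumptions:

    def run_episode(itpm, env, discrete_actions, alpha=0.1, gamma=0.95, eps=0.1):
        """One learning episode: act, observe reward, update Q-values, tune the
        winning unit's sensory component, and grow the map when reflexes fire."""
        x = env.reset()
        done = False
        while not done:
            reflex_a = env.reflex(x)                 # built-in reflex suggestion
            if itpm.units:
                i = itpm.nearest(x, k=1)[0]
                unit = itpm.units[i]
                if np.random.rand() < eps:           # epsilon-greedy over the unit's actions
                    k = np.random.randint(len(unit.actions))
                else:
                    k = int(np.argmax(unit.q))
                a = unit.actions[k]
            else:
                unit, k, a = None, None, reflex_a    # no units yet: follow the reflexes
            x_next, r, done = env.step(a)
            if unit is not None:
                # one-step Q-learning target, bootstrapped from the unit nearest to x_next
                j = itpm.nearest(x_next, k=1)[0]
                target = r + gamma * np.max(itpm.units[j].q)
                unit.q[k] += alpha * (target - unit.q[k])
                unit.w += 0.05 * (x - unit.w)        # self-organizing tuning of the winner
            maybe_add_unit(itpm, x, reflex_a,
                           a if unit is not None else None, discrete_actions)
            x = x_next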
7
Discrete-action Q-Learning
Action selection rule: ε-greedy policy
Q-value update rule
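Both rules are given as equations on the original slide. Their standard forms (the paper's exact notation may differ) are, for situation x_t, action a_t, reward r_{t+1}, learning rate \alpha, and discount factor \gamma:

    a_t =
    \begin{cases}
    \arg\max_a Q(x_t, a) & \text{with probability } 1-\varepsilon,\\
    \text{a random discrete action} & \text{with probability } \varepsilon,
    \end{cases}

    Q(x_t, a_t) \leftarrow Q(x_t, a_t)
      + \alpha \big( r_{t+1} + \gamma \max_{a'} Q(x_{t+1}, a') - Q(x_t, a_t) \big).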
8
Continuous-action Q-Learning
Action selection rule: an average of the discrete actions of the nearest unit, weighted by their Q-values
The Q-value of the selected continuous action a is:
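The equation is not reproduced in this transcript. A plausible reconstruction, consistent with the description above, lets i be the unit nearest to x, a_k its discrete actions, and w_k normalized positive weights derived from the Q-values Q(i, a_k); the softmax weighting with temperature T is an assumption about the exact form:

    a = \sum_k w_k\, a_k,
    \qquad
    w_k = \frac{e^{Q(i,a_k)/T}}{\sum_j e^{Q(i,a_j)/T}},
    \qquad
    Q(i, a) = \sum_k w_k\, Q(i, a_k).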
9
Continuous-action Q-Learning
Q-value update rule
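As a hedged sketch rather than the slide's exact equation: one natural rule propagates the one-step temporal-difference error of the executed continuous action a back to the discrete actions of the winning unit i, in proportion to the weights w_k used to compose a (i' denotes the unit nearest to the next situation and a' its selected continuous action):

    Q(i, a_k) \leftarrow Q(i, a_k)
      + \alpha \, w_k \big( r_{t+1} + \gamma\, Q(i', a') - Q(i, a) \big)
    \quad \text{for every discrete action } a_k \text{ of unit } i.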
10
Average-Reward RL
Q-value update rule
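Average-reward Q-learning (R-learning-style) replaces the discount factor by an estimate \rho of the average reward per step. A standard form of the updates, which may differ in detail from the slide's equation, is (\beta is the step size for \rho, and the \rho update is usually applied only after greedy actions):

    Q(x_t, a_t) \leftarrow Q(x_t, a_t)
      + \alpha \big( r_{t+1} - \rho + \max_{a'} Q(x_{t+1}, a') - Q(x_t, a_t) \big),

    \rho \leftarrow \rho
      + \beta \big( r_{t+1} + \max_{a'} Q(x_{t+1}, a') - \max_{a} Q(x_t, a) - \rho \big).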
11
Experiments
Wall-following task. Reward (specified on the original slide).
12
Experiments
Performance comparison between discrete- and continuous-action discounted-reward RL
13
Experiments
Performance comparison between discrete- and continuous-action average-reward RL
14
Experiments
Performance comparison between discounted-reward and average-reward RL, discrete-action case
15
Conclusion
Presented a simple Q-learning method that works in continuous domains. The ITPM represents the continuous input space. Compared discounted-reward RL against average-reward RL.