Continuous-Action Q-Learning


Continuous-Action Q-Learning. José del R. Millán et al., Machine Learning 49, 247-265 (2002). Summarized by Seung-Joon Yi. (C) 2003, SNU Biointelligence Lab, http://bi.snu.ac.kr/

ITPM (Incremental Topology Preserving Map). Consists of units and edges between pairs of units, and maps the current sensory situation x onto an action a. Units are created incrementally and incorporate bias. After a unit is created, its sensory component is tuned by self-organizing rules, while its action component is updated through reinforcement learning.
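As a rough illustration of the structure described above, here is a minimal Python sketch; the class and method names (Unit, ITPM, nearest, add_unit) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

class Unit:
    """One ITPM unit: a sensory prototype plus Q-values for a set of discrete actions."""
    def __init__(self, x, actions):
        self.w = np.asarray(x, dtype=float)    # sensory prototype (receptive-field center)
        self.actions = np.asarray(actions)     # discrete action values attached to this unit
        self.q = np.zeros(len(actions))        # one Q-value per discrete action
        self.edges = set()                     # indices of topologically neighboring units

class ITPM:
    """Incrementally built map from sensory situations to actions."""
    def __init__(self, actions):
        self.units = []
        self.actions = actions

    def nearest(self, x):
        """Index of the unit whose prototype is closest to the situation x."""
        d = [np.linalg.norm(u.w - x) for u in self.units]
        return int(np.argmin(d))

    def add_unit(self, x):
        """Create a new unit centered on x (e.g. when the built-in reflexes take over)."""
        self.units.append(Unit(x, self.actions))
        return len(self.units) - 1
```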

ITPM: units and bias. Initially the ITPM has no units; they are created as the robot uses its built-in reflexes. Units in the network have overlapping, localized receptive fields. When the neural controller makes an incorrect generalization, the reflexes take control of the robot and a new unit is added to the ITPM.
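In the sketch above, the creation rule described on this slide could look roughly like this; reflexes_fired is a hypothetical flag set by the built-in reflexes, not something named in the paper.

```python
def maybe_add_unit(itpm, x, reflexes_fired):
    """When the built-in reflexes had to override the learned controller, the current
    situation was generalized incorrectly, so allocate a new unit centered on it."""
    if reflexes_fired:
        return itpm.add_unit(x)   # new receptive field at the badly handled situation
    return itpm.nearest(x)        # otherwise keep using the closest existing unit
```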

ITPM: self-organizing rules.
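The update equations on this slide were shown as an image. As a reference point only, a Kohonen/GNG-style prototype update (which the tuning described here resembles) is sketched below; the learning rates and the exact neighborhood rule are assumptions, not the paper's equations.

```latex
% Assumed Kohonen/GNG-style tuning of the sensory prototypes (not the paper's exact rule).
% b = \arg\min_i \lVert x_t - w_i \rVert is the winning unit for the current situation x_t.
\[
  w_b \leftarrow w_b + \epsilon_b\,(x_t - w_b), \qquad
  w_n \leftarrow w_n + \epsilon_n\,(x_t - w_n) \quad \forall\, n \in \mathcal{N}(b),
  \ \epsilon_n \ll \epsilon_b
\]
```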

ITPM advantages: automatically allocates units in the visited parts of the input space, and dynamically adjusts the resolution in different regions as needed. Experiments show that, on average, every unit is connected to 5 others at the end of the learning episodes.

ITPM: general learning algorithm.
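The algorithm itself appears as a figure in the original slides. A minimal sketch of the kind of loop described, reusing the hypothetical ITPM class from the earlier sketch and a placeholder env with reset/step methods, could look like this.

```python
import numpy as np

def run_episode(itpm, env, alpha=0.1, gamma=0.95, epsilon=0.1, steps=1000):
    """One learning episode: localize the situation on the ITPM, pick an action,
    act, and update the nearest unit's Q-values (placeholder environment assumed)."""
    x = env.reset()
    for _ in range(steps):
        if not itpm.units:
            itpm.add_unit(x)                 # bootstrap the map from the first situation
        u = itpm.units[itpm.nearest(x)]

        # epsilon-greedy over the unit's discrete actions (continuous case: later slides)
        if np.random.rand() < epsilon:
            j = np.random.randint(len(u.actions))
        else:
            j = int(np.argmax(u.q))
        x_next, r, done = env.step(u.actions[j])

        # one-step Q-learning backup toward the best action at the next nearest unit
        u_next = itpm.units[itpm.nearest(x_next)]
        u.q[j] += alpha * (r + gamma * np.max(u_next.q) - u.q[j])

        x = x_next
        if done:
            break
```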

Discrete-action Q-Learning. Action selection rule: ε-greedy policy. Q-value update rule:
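The update equation on this slide was an image; the textbook one-step Q-learning backup is reproduced below, written (as an assumption) with Q indexed by the nearest ITPM unit and the chosen discrete action.

```latex
% Textbook one-step Q-learning backup; u_t is the nearest unit to x_t, a_t the chosen action.
\[
  Q(u_t, a_t) \leftarrow Q(u_t, a_t)
    + \alpha \bigl[ r_{t+1} + \gamma \max_{a'} Q(u_{t+1}, a') - Q(u_t, a_t) \bigr]
\]
```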

Continuous-action Q-Learning. Action selection rule: the continuous action is an average of the discrete actions of the nearest unit, weighted by their Q-values. The Q-value of the selected continuous action a is:
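The formulas on this slide were images. The weighted average described in the slide text can be written as below; the expression for the Q-value of the resulting continuous action is not reconstructed here.

```latex
% Q-value-weighted average over the discrete actions a_i of the nearest unit u
% (reconstructed from the slide text; the paper's exact formula was an image):
\[
  a \;=\; \frac{\sum_i Q(u, a_i)\, a_i}{\sum_i Q(u, a_i)}
\]
```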

Continuous-action Q-Learning. Q-value update rule:
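This equation was also an image; the textbook form of the one-step backup, written with the continuous-action values from the previous slide, is shown as a stand-in. How the paper distributes the error over the unit's discrete actions is not reproduced.

```latex
% Textbook one-step backup in terms of the continuous-action values Q(x_t, a_t):
\[
  Q(x_t, a_t) \leftarrow Q(x_t, a_t)
    + \alpha \bigl[ r_{t+1} + \gamma \max_{a'} Q(x_{t+1}, a') - Q(x_t, a_t) \bigr]
\]
```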

Average-Reward RL. Q-value update rule:
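The slide's equation was an image; the standard average-reward (R-learning) update is shown below as a reference point rather than the paper's exact formulation. Here ρ is the estimated average reward per step and β its learning rate.

```latex
% Standard average-reward (R-learning) update, given as an assumed reference form:
\[
  Q(x_t, a_t) \leftarrow Q(x_t, a_t)
    + \alpha \bigl[ r_{t+1} - \rho + \max_{a'} Q(x_{t+1}, a') - Q(x_t, a_t) \bigr]
\]
\[
  \rho \leftarrow \rho
    + \beta \bigl[ r_{t+1} + \max_{a'} Q(x_{t+1}, a') - \max_{a'} Q(x_t, a') - \rho \bigr]
\]
```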

Experiments: wall-following task. Reward.

Experiments: performance comparison between discrete- and continuous-action, discounted-reward RL.

Experiments: performance comparison between discrete- and continuous-action, average-reward RL.

Experiments: performance comparison between discounted-reward and average-reward RL, discrete-action case.

Conclusion: presented a simple Q-learning method that works in continuous domains. The ITPM represents the continuous input space. Compared discounted-reward RL against average-reward RL.