
Design and Implementation of General Purpose Reinforcement Learning Agents
Tyler Streeter
November 17, 2005

Motivation
Intelligent agents are becoming increasingly important.

Motivation
Most intelligent agents today are designed for very specific tasks. Ideally, agents could be reused for many tasks.
Goal: provide a general-purpose agent implementation. Reinforcement learning provides practical algorithms.

Initial Work
Humanoid motor control through neuroevolution, i.e., controlling physically simulated characters with neural networks optimized with genetic algorithms. (videos)

What is Reinforcement Learning?
Learning how to behave in order to maximize a numerical reward signal.
Very general: almost any problem can be formulated as a reinforcement learning problem.
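Concretely (the standard formulation, not spelled out on the slide): at each step t the agent observes state s_t, takes action a_t, and receives reward r_t; it seeks a policy that maximizes the expected discounted return E[r_t + γr_{t+1} + γ²r_{t+2} + …], with discount factor 0 ≤ γ < 1.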

Basic RL Agent Implementation
Main components:
– Value function: maps states to “values”
– Policy: maps states to actions
A state representation converts observations to features, which allows linear function approximation methods for the value function and policy. Temporal difference (TD) prediction errors train both, as sketched below.
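A minimal Python sketch of these components, assuming a linear value function and a softmax policy over a feature vector φ(s). The class and variable names are illustrative only; the talk does not show Verve's actual internals.

    import numpy as np

    class LinearAgent:
        """Value function and policy as linear functions of state features."""

        def __init__(self, n_features, n_actions):
            self.v_weights = np.zeros(n_features)               # value function weights
            self.p_weights = np.zeros((n_actions, n_features))  # action preference weights

        def value(self, features):
            # V(s) = w . phi(s)
            return self.v_weights @ features

        def action_probs(self, features):
            # Softmax over linear action preferences gives action probabilities
            prefs = self.p_weights @ features
            e = np.exp(prefs - prefs.max())
            return e / e.sum()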

RBF State Representation
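The slide itself is a figure. As a hedged sketch, a radial basis function (RBF) state representation maps each raw observation to a vector of Gaussian activations; the grid layout and width below are illustrative assumptions, not values from the talk.

    import numpy as np

    def rbf_features(observation, centers, width=0.1):
        """Map an observation vector to Gaussian RBF activations.

        Each feature responds most strongly when the observation is near
        its center, giving a smooth, localized encoding that suits the
        linear function approximation described on the previous slide.
        """
        dists = np.linalg.norm(centers - observation, axis=1)
        return np.exp(-(dists ** 2) / (2 * width ** 2))

    # Example: 25 centers on a 5x5 grid covering a 2-D observation space
    grid = np.linspace(0.0, 1.0, 5)
    centers = np.array([[x, y] for x in grid for y in grid])
    phi = rbf_features(np.array([0.4, 0.7]), centers)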

Temporal Difference Learning
Learning to predict the difference in value between successive time steps.
Compute the TD error: δ_t = r_t + γV(s_{t+1}) − V(s_t)
Train the value function: V(s_t) ← V(s_t) + ηδ_t
Train the policy by adjusting action probabilities in proportion to the TD error.
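Continuing the sketch above, one TD update step might look like the following. The value update implements the slide's equations directly; the softmax policy-gradient form of the policy update is an assumption, since the talk only says that action probabilities are adjusted.

    import numpy as np

    def td_update(agent, features, action, reward, next_features,
                  gamma=0.99, eta_v=0.1, eta_p=0.01):
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = reward + gamma * agent.value(next_features) - agent.value(features)

        # Train value function: V(s_t) <- V(s_t) + eta * delta_t
        agent.v_weights += eta_v * delta * features

        # Train policy: shift probability toward (or away from) the chosen
        # action in proportion to the TD error
        probs = agent.action_probs(features)
        grad = -np.outer(probs, features)
        grad[action] += features
        agent.p_weights += eta_p * delta * grad
        return delta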

Biological Inspiration
Biological brains: the proof of concept that intelligence actually works.
Midbrain dopamine neuron activity is very similar to temporal difference errors.
Figure from Suri, R.E. (2002). TD Models of Reward Predictive Responses in Dopamine Neurons. Neural Networks, 15.

Internal Predictive Model
An accurate predictive model can temporarily replace actual experience from the environment.

Training a Predictive Model
Given the previous observation and action, the predictive model tries to predict the current observation and reward. Training signals are computed from the error between actual and predicted information.
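A minimal sketch of that error-driven training step, assuming a linear predictor with a delta-rule update; the talk does not specify the model's form. The running error average is an extra assumption, kept here so the planning sketch below has an uncertainty signal to consult.

    import numpy as np

    class PredictiveModel:
        """Predicts the current observation and reward from the previous
        observation and a one-hot encoding of the previous action."""

        def __init__(self, obs_dim, n_actions):
            self.n_actions = n_actions
            self.W = np.zeros((obs_dim + 1, obs_dim + n_actions))  # last row predicts reward
            self.avg_error = 1.0   # crude global uncertainty estimate (assumption)

        def _input(self, prev_obs, prev_action):
            return np.concatenate([prev_obs, np.eye(self.n_actions)[prev_action]])

        def predict(self, prev_obs, prev_action):
            out = self.W @ self._input(prev_obs, prev_action)
            return out[:-1], out[-1]    # predicted observation, predicted reward

        def train(self, prev_obs, prev_action, obs, reward, lr=0.05):
            x = self._input(prev_obs, prev_action)
            error = np.concatenate([obs, [reward]]) - self.W @ x   # actual - predicted
            self.W += lr * np.outer(error, x)
            # Track a running average of prediction error; a real agent would
            # estimate uncertainty per region of state space instead.
            self.avg_error = 0.9 * self.avg_error + 0.1 * np.abs(error).mean()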

Planning
Reinforcement learning from simulated experiences. Planning sequences end when uncertainty is too high.
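Combining the earlier sketches, a planning loop might roll the model forward and apply the same TD updates to the simulated experience, ending the sequence once uncertainty grows too high. The threshold value and the use of a single running prediction error as "uncertainty" are assumptions.

    import numpy as np

    def plan(agent, model, features_fn, start_obs, max_steps=10,
             uncertainty_threshold=0.5):
        """Reinforcement learning from simulated experience (a sketch)."""
        obs = start_obs
        for _ in range(max_steps):
            phi = features_fn(obs)
            action = np.random.choice(len(agent.p_weights), p=agent.action_probs(phi))
            next_obs, reward = model.predict(obs, action)

            # End the planning sequence when uncertainty is too high
            # (here: the model's running prediction error).
            if model.avg_error > uncertainty_threshold:
                break

            # Learn from the simulated transition exactly as from a real one
            td_update(agent, phi, action, reward, features_fn(next_obs))
            obs = next_obs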

Curiosity Rewards
Intrinsic drive to explore unfamiliar states. Provide extra rewards proportional to uncertainty.
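As a sketch of that recipe, the reward used for learning could simply add an uncertainty-proportional bonus to the external reward; the scale factor is an assumption.

    def curious_reward(external_reward, uncertainty, curiosity_scale=0.5):
        # Extra reward proportional to uncertainty draws the agent
        # toward unfamiliar, poorly predicted states.
        return external_reward + curiosity_scale * uncertainty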

Verve
Verve is an open source implementation of curious, planning reinforcement learning agents.
– Intended to be an out-of-the-box solution, e.g., for game development or robotics
– Distributed as a cross-platform library written in C++
– Agents can be saved to and loaded from XML files
– Includes Python bindings

1D Hot Plate Task

1D Signaled Hot Plate Task

2D Hot Plate Task

2D Maze Task #1

2D Maze Task #2

Pendulum Swing-Up Task

Pendulum Neural Networks

Cart-Pole/Inverted Pendulum Task

Planning in Maze #2

Curiosity Task

Future Work
More applications (real robots, game development, interactive training)
Hierarchies of motor programs:
– Constructed from low-level primitive actions
– High-level planning
– High-level exploration