Verve: A General Purpose Open Source Reinforcement Learning Toolkit Tyler Streeter, James Oliver, & Adrian Sannier ASME IDETC & CIE, September 13, 2006.

Motivation Intelligent agents are becoming increasingly important.

Motivation
– Most intelligent agents today are carefully designed for very specific tasks
– Ideally, we could avoid a lot of that work by letting agents train themselves
– Goal: provide a general purpose agent implementation based on reinforcement learning
– Target audience: application developers (especially roboticists and game developers)

Reinforcement Learning
– Learning how to behave in order to maximize a numerical reward signal
– Very general: many real-world problems can be formulated as reinforcement learning problems

Reinforcement Learning
Typical challenges:
– Temporal credit assignment
– Structural credit assignment
– Exploration vs. exploitation
– Continuous state spaces
Solutions:
– TD learning with value function and policy represented as single-layer neural networks
– Eligibility traces for connection weights
– Softmax action selection (see the sketch below)
– Function approximation with Gaussian radial basis functions
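
Softmax (Boltzmann) action selection can be illustrated with a minimal sketch; this is illustrative code only, not Verve's internal implementation, and the function name and temperature parameter are assumptions:

#include <cmath>
#include <cstdlib>
#include <vector>

// Minimal sketch of softmax (Boltzmann) action selection. 'values' holds the
// policy's preference for each action; 'temperature' controls the
// exploration/exploitation trade-off (higher = more random).
int softmaxSelect(const std::vector<double>& values, double temperature)
{
    std::vector<double> probs(values.size());
    double sum = 0.0;
    for (size_t i = 0; i < values.size(); ++i)
    {
        probs[i] = std::exp(values[i] / temperature);
        sum += probs[i];
    }

    // Sample an action index according to the normalized probabilities.
    double r = sum * (static_cast<double>(std::rand()) / RAND_MAX);
    double cumulative = 0.0;
    for (size_t i = 0; i < probs.size(); ++i)
    {
        cumulative += probs[i];
        if (r <= cumulative)
            return static_cast<int>(i);
    }
    return static_cast<int>(values.size()) - 1; // numerical safety fallback
}

A high temperature makes all actions nearly equally likely (exploration); a low temperature greedily favors the highest-valued action (exploitation).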

RL Agent Implementation
– Value function: maps states to “values”
– Policy: maps states to actions
– State representation converts observations to features (allows linear function approximation methods for the value function and policy)
– Temporal difference (TD) prediction errors train the value function and policy (see the sketch below)
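
As a rough illustration of that TD training step, the sketch below computes the TD error and nudges the linear value-function weights toward it. This is a simplified sketch, not Verve's actual code; the variable names, learning rate, and discount factor are assumptions:

#include <vector>

// Sketch of one TD(0) update for a linear value function over state features.
// features:  feature activations for the previous state
// reward:    reward received after taking the action
// valuePrev: V(s) computed from the current weights
// valueNext: V(s') computed from the current weights
// gamma:     reward discount factor; alpha: learning rate
void tdUpdate(std::vector<double>& valueWeights,
              const std::vector<double>& features,
              double reward, double valuePrev, double valueNext,
              double gamma, double alpha)
{
    // TD error: delta = r + gamma * V(s') - V(s)
    double tdError = reward + gamma * valueNext - valuePrev;

    // Move each weight in the direction that reduces the prediction error.
    for (size_t i = 0; i < valueWeights.size(); ++i)
    {
        valueWeights[i] += alpha * tdError * features[i];
    }
}

In an actor-critic arrangement like the one described here, the same TD error also adjusts the policy weights, and eligibility traces spread each update over recently active features.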

RBF State Representation
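
As a minimal sketch of the idea (illustrative only; the width parameter and function name are assumptions), each Gaussian RBF unit responds most strongly when the observation is near its center and falls off with distance:

#include <cmath>
#include <vector>

// Sketch of a Gaussian RBF feature: activation decays with the squared
// distance between the current observation and the unit's center.
double rbfActivation(const std::vector<double>& observation,
                     const std::vector<double>& center,
                     double width)
{
    double sqDist = 0.0;
    for (size_t i = 0; i < observation.size(); ++i)
    {
        double d = observation[i] - center[i];
        sqDist += d * d;
    }
    return std::exp(-sqDist / (2.0 * width * width));
}

The activations of all RBF units form the feature vector that the linear value function and policy operate on.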

Verve Software Library
– Cross-platform library written in C++ with Python bindings
– License: BSD or LGPL
– Unit tested, heavily commented source code
– Complete API documentation
– Widely applicable: user-defined sensors, actuators, sensor resolution, and reward function
– Optimized to reduce computational requirements (e.g., dynamically growing RBF array)

Free Parameters
– Inputs
  – Number of sensors
  – Choice of discrete or continuous (RBF)
  – Continuous sensor resolution
  – Circular continuous sensors
– Number of outputs
– Reward function
– Agent update rate (step size)
– Learning rates
– Eligibility trace decay time constant
– Reward discounting time constant

C++ Code Sample (1/3)

// Define an AgentDescriptor.
verve::AgentDescriptor agentDesc;
agentDesc.addDiscreteSensor(4); // Use 4 possible values.
agentDesc.addContinuousSensor();
agentDesc.addContinuousSensor(); // Second continuous sensor, so indices 0 and 1 used in sample 2/3 are valid.
agentDesc.setContinuousSensorResolution(10);
agentDesc.setNumOutputs(3); // Use 3 actions.

// Create the Agent and an Observation initialized to fit this Agent.
verve::Agent agent(agentDesc);
verve::Observation obs;
obs.init(agent);

// Set the initial state of the world.
initEnvironment();

C++ Code Sample (2/3)

// Loop forever (or until some desired learning performance is achieved).
while (1)
{
    // Set the Agent and environment update rate to 10 Hz.
    verve::real dt = 0.1;

    // Update the Observation based on the current state of the world.
    // Each sensor is accessed via an index.
    obs.setDiscreteValue(0, computeDiscreteInput());
    obs.setContinuousValue(0, computeContinuousInput0());
    obs.setContinuousValue(1, computeContinuousInput1());

    // Compute the current reward, which is application-dependent.
    verve::real reward = computeReward();

    // Update the Agent with the Observation and reward.
    unsigned int action = agent.update(reward, obs, dt);

C++ Code Sample (3/3)

    // Apply the chosen action to the environment.
    switch (action)
    {
        case 0: performAction0(); break;
        case 1: performAction1(); break;
        case 2: performAction2(); break;
        default: break;
    }

    // Simulate the environment ahead by 'dt' seconds.
    updateEnvironment(dt);
}

Examples
– 2D Maze
– Pendulum swing-up
– Cart-pole/inverted pendulum

2D Maze Task

Pendulum Swing-Up Task

Pendulum Neural Networks

Cart-Pole/Inverted Pendulum Task
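
To make the cart-pole example concrete, one possible agent configuration using only the API calls shown in the code samples above might look like the following. The sensor choices, resolution, action meanings, and reward scheme are assumptions for illustration, not the presentation's actual setup:

// Hypothetical cart-pole setup (illustrative only).
verve::AgentDescriptor cartPoleDesc;
cartPoleDesc.addContinuousSensor(); // cart position
cartPoleDesc.addContinuousSensor(); // cart velocity
cartPoleDesc.addContinuousSensor(); // pole angle
cartPoleDesc.addContinuousSensor(); // pole angular velocity
cartPoleDesc.setContinuousSensorResolution(10);
cartPoleDesc.setNumOutputs(3); // push left, push right, do nothing

verve::Agent cartPoleAgent(cartPoleDesc);
verve::Observation cartPoleObs;
cartPoleObs.init(cartPoleAgent);
// The reward function could give +1 for each step the pole stays upright
// and a penalty when it falls or the cart leaves the track.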

Experimental Feature - Planning
– Planning: training the value function and policy from a learned model of the environment (i.e., reinforcement learning from simulated experiences; see the sketch below)
– Reduces training time significantly
– 2D Maze Task with Planning
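
The planning idea resembles Dyna-style reinforcement learning: after each real step, the agent replays transitions from its learned model as if they were real experience. The sketch below is an assumption about the general scheme, not Verve's internal code; the transition store and update callback are hypothetical placeholders:

#include <cstdlib>
#include <vector>

// Sketch of Dyna-style planning. The learned model is represented here as a
// store of experienced transitions; planning replays them to train the value
// function and policy without touching the real environment.
struct Transition { int state; int action; double reward; int nextState; };

void planFromModel(const std::vector<Transition>& model, int numPlanningSteps,
                   void (*tdUpdateFromTransition)(int, int, double, int))
{
    if (model.empty()) return;
    for (int i = 0; i < numPlanningSteps; ++i)
    {
        // Replay a randomly chosen remembered transition as if it were real.
        const Transition& t = model[std::rand() % model.size()];
        tdUpdateFromTransition(t.state, t.action, t.reward, t.nextState);
    }
}

Because these simulated updates are cheap compared to real environment steps, even a simple model can reduce the number of real trials needed, which is consistent with the faster training reported for the 2D maze task.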

Experimental Feature - Curiosity
– Curiosity: an intrinsic drive to explore unfamiliar states
– Provide extra rewards proportional to uncertainty or “learning progress” (see the sketch below)
– Drives agents to improve mental models of the environment (used for planning)
– Multiple Rewards Task with Curiosity
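
One common way to realize "extra rewards proportional to uncertainty" is to reward the agent for its model's prediction error; this is a minimal sketch under that assumption, not necessarily the presentation's exact formulation:

#include <cmath>
#include <vector>

// Sketch of a curiosity bonus: the worse the learned model's prediction of the
// next observation, the larger the intrinsic reward, which pushes the agent
// toward states where its model is still inaccurate.
double curiosityReward(const std::vector<double>& predictedNextObs,
                       const std::vector<double>& actualNextObs,
                       double scale)
{
    double squaredError = 0.0;
    for (size_t i = 0; i < actualNextObs.size(); ++i)
    {
        double d = actualNextObs[i] - predictedNextObs[i];
        squaredError += d * d;
    }
    return scale * std::sqrt(squaredError);
}

The total reward passed to agent.update() would then be the external task reward plus this curiosity bonus.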

Future Work
– The exhaustive RBF state representation is too slow for high-dimensional state spaces. Possible solutions: dimensionality reduction (e.g., using PCA or ICA), hierarchical state and action representations, and focused attention
– Temporal state representation (e.g., tapped delay lines)