Curiosity-Driven Exploration with Planning Trajectories Tyler Streeter PhD Student, Human Computer Interaction Iowa State University

Overview Reinforcement learning (RL) agents can reduce learning time dramatically by planning with learned predictive models. Planning trajectories, sequences of imagined interactions with the environment, are one way to use such predictive models. In complex environments, planning agents need an intrinsic drive to improve their predictive models, such as a curiosity drive that rewards agents for experiencing novel states. Curiosity acts as a higher form of exploration than simple random action selection because it encourages targeted investigation of interesting situations. In a task with multiple external rewards, we show that curious planning RL agents outperform non-curious planning agents in the long run.

Agent Architecture
The agent architecture combines several components:
– Separate feedforward neural networks for the policy and the value function
– Temporal difference learning with eligibility traces
– Planning with a feedforward neural network for sensory/reward prediction
– Prediction uncertainty estimation with a feedforward neural network
– Curiosity rewards based on prediction uncertainty
An open source implementation of this architecture, the Verve toolkit, is available online (see Links).
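As a rough illustration of how these pieces could fit together, the sketch below groups them into a single agent object. All class, function, and parameter names are hypothetical; this is not the Verve API, just a minimal stand-in under those assumptions.

```python
import numpy as np


def make_mlp(n_in, n_hidden, n_out):
    """Tiny one-hidden-layer feedforward network returned as a forward function
    (illustration only; a real implementation would also support training)."""
    rng = np.random.default_rng()
    w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    w2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
    return lambda x: np.tanh(np.asarray(x) @ w1) @ w2


class CuriousPlanningAgent:
    """Hypothetical container for the components listed above."""

    def __init__(self, obs_dim, num_actions, hidden=32):
        # Separate feedforward networks for the policy and the value function
        # (trained with temporal difference learning and eligibility traces).
        self.policy_net = make_mlp(obs_dim, hidden, num_actions)
        self.value_net = make_mlp(obs_dim, hidden, 1)
        # Predictive model used for planning: (observation, action) -> (next observation, reward).
        self.predictor = make_mlp(obs_dim + num_actions, hidden, obs_dim + 1)
        # Uncertainty estimator: (observation, action) -> estimate of the predictor's MSE,
        # which is the basis for curiosity rewards.
        self.uncertainty_net = make_mlp(obs_dim + num_actions, hidden, 1)
```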

Main Algorithm
loop forever:
– based on the previous observation and action, predict the current observation and reward
– train the predictor using the actual current observation and reward
– train the uncertainty estimator using the predictor's MSE
– loop until uncertainty is too high:
  – total reward = predicted current reward + curiosity reward (proportional to the uncertainty estimate)
  – perform TD learning on the value function and policy using the predicted current observation and the total reward
– use the policy to choose an action
– update the environment with the chosen action; generate the next observation and reward
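Below is a compact, self-contained sketch of this loop, with simple linear models standing in for the feedforward networks and a toy placeholder environment. The constants (learning rate, uncertainty threshold), the extra MAX_PLAN_STEPS safety cap, and the environment dynamics are illustrative assumptions, not details of the original implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, NUM_ACTIONS = 2, 5
GAMMA, LR, MAX_UNCERTAINTY, MAX_PLAN_STEPS = 0.95, 0.05, 0.5, 10

# Linear stand-ins for the feedforward networks in the architecture.
W_pred = np.zeros((OBS_DIM + NUM_ACTIONS, OBS_DIM + 1))  # (obs, action) -> (next obs, reward)
W_unc = np.zeros((OBS_DIM + NUM_ACTIONS, 1))             # (obs, action) -> expected prediction MSE
W_val = np.zeros((OBS_DIM, 1))                           # obs -> state value
W_pol = np.zeros((OBS_DIM, NUM_ACTIONS))                 # obs -> action preferences


def one_hot(a):
    v = np.zeros(NUM_ACTIONS)
    v[a] = 1.0
    return v


def choose_action(obs):
    # Softmax over the policy's action preferences.
    prefs = obs @ W_pol
    p = np.exp(prefs - prefs.max())
    return rng.choice(NUM_ACTIONS, p=p / p.sum())


def env_step(obs, action):
    # Placeholder environment: a bounded random walk with a small reward near the origin.
    next_obs = np.clip(obs + rng.normal(0.0, 0.1, OBS_DIM), -1.0, 1.0)
    reward = 1.0 if np.linalg.norm(next_obs) < 0.2 else 0.0
    return next_obs, reward


prev_obs = np.zeros(OBS_DIM)
prev_action = choose_action(prev_obs)
obs, reward = env_step(prev_obs, prev_action)

for t in range(100):  # "loop forever" (truncated here)
    x = np.concatenate([prev_obs, one_hot(prev_action)])

    # Predict the current observation and reward from the previous observation/action.
    prediction = x @ W_pred
    target = np.concatenate([obs, [reward]])

    # Train the predictor using the actual current observation and reward.
    error = target - prediction
    W_pred += LR * np.outer(x, error)

    # Train the uncertainty estimator using the predictor's MSE.
    mse = np.mean(error ** 2)
    W_unc += LR * np.outer(x, mse - x @ W_unc)

    # Planning: roll the model forward until its estimated uncertainty is too high.
    plan_obs, plan_action = obs, choose_action(obs)
    for _ in range(MAX_PLAN_STEPS):  # safety cap (an added assumption)
        px = np.concatenate([plan_obs, one_hot(plan_action)])
        uncertainty = (px @ W_unc).item()
        if uncertainty > MAX_UNCERTAINTY:
            break
        pred = px @ W_pred
        next_plan_obs, pred_reward = pred[:OBS_DIM], float(pred[-1])
        total_reward = pred_reward + uncertainty  # curiosity reward proportional to uncertainty
        # TD learning on the value function and policy using the imagined transition.
        td_error = total_reward + GAMMA * (next_plan_obs @ W_val).item() - (plan_obs @ W_val).item()
        W_val += LR * td_error * plan_obs.reshape(-1, 1)
        W_pol[:, plan_action] += LR * td_error * plan_obs
        plan_obs, plan_action = next_plan_obs, choose_action(next_plan_obs)

    # Use the policy to choose a real action; update the environment.
    prev_obs, prev_action = obs, choose_action(obs)
    obs, reward = env_step(prev_obs, prev_action)
```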

Curiosity Implementation Curiosity here is simply a reward proportional to prediction errors (i.e., the MSE of the model's predictions), similar to the curiosity reward used in Barto, Singh, & Chentanez (2004). To generate curiosity rewards during planning (where prediction errors cannot be measured directly), a second predictor is required that learns to estimate the model's prediction errors. This curiosity implementation is sufficient for the present investigation. However, more powerful curiosity mechanisms are available which use curiosity rewards proportional to the decrease in prediction errors over time (e.g., Schmidhuber, 1991; Oudeyer & Kaplan, 2004).
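A small sketch of this two-part scheme is shown below: curiosity from the measured prediction error during real experience, and from the second (error-estimating) predictor during planning. The linear estimator and all names here are illustrative assumptions, not the original implementation.

```python
import numpy as np


def curiosity_from_error(predicted, actual, scale=1.0):
    """During real experience: curiosity reward proportional to the model's prediction MSE."""
    mse = float(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))
    return scale * mse, mse


class UncertaintyEstimator:
    """Second predictor: learns to estimate the model's prediction MSE so that curiosity
    rewards can be generated during planning, where real errors cannot be measured."""

    def __init__(self, input_dim, lr=0.01):
        self.w = np.zeros(input_dim)
        self.lr = lr

    def estimate(self, features):
        return float(self.w @ features)

    def train(self, features, measured_mse):
        # Move the estimate toward the error actually observed on this input.
        self.w += self.lr * (measured_mse - self.estimate(features)) * features


def curiosity_during_planning(estimator, features, scale=1.0):
    """During planning: substitute the estimated prediction error for the real one."""
    return scale * estimator.estimate(features)
```

In the main loop, the measured MSE from real experience serves both as the training target for the estimator and as the curiosity reward, while the planning loop only ever calls the estimator.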

Agent Architecture (architecture diagram)

Experiment In this discrete environment an agent is able to sense its 2D position and can move left, right, up, down, or remain in place. The agent starts each trial in space ‘A’ and interacts for 100 time steps. There are four external rewards (three small and one large). Performance = sum of rewards received over the course of a single trial.
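For concreteness, a minimal version of this environment might look like the sketch below. The grid size, reward locations, and reward magnitudes are assumptions; the slide only specifies the action set, the start state 'A', four external rewards (three small, one large), and 100-step trials.

```python
import numpy as np


class GridWorld:
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # left, right, up, down, stay

    def __init__(self, size=10, start=(0, 0)):
        self.size = size
        self.start = start  # start state 'A'
        # Three small rewards and one large reward at assumed positions.
        self.rewards = {(2, 3): 0.2, (7, 1): 0.2, (4, 8): 0.2, (9, 9): 1.0}
        self.pos = start

    def reset(self):
        self.pos = self.start
        return np.array(self.pos, dtype=float)  # the agent senses its 2D position

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        reward = self.rewards.get(self.pos, 0.0)
        return np.array(self.pos, dtype=float), reward


def run_trial(env, policy, steps=100):
    """Performance = sum of rewards received over one 100-step trial."""
    obs = env.reset()
    total = 0.0
    for _ in range(steps):
        obs, r = env.step(policy(obs))
        total += r
    return total


# Example with a random policy:
rng = np.random.default_rng(0)
print(run_trial(GridWorld(), lambda obs: rng.integers(5)))
```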

Results (plot: performance per trial for curious and non-curious planning agents)

Discussion The curious agent plot shows a noticeable delay before any visible improvement, indicating a curiosity-driven exploration phase. Around trial 15, the agent “runs out” of interesting situations to explore, so it begins maximizing external reward intake. In general, curious agents may take longer to improve performance on specific tasks, but they learn a broader knowledge base that might be applicable to a wide variety of tasks.

Future Work
– Experimentation with various curiosity mechanisms (e.g., curiosity rewards proportional to the decrease in prediction errors)
– Higher-dimensional state spaces with dimensionality reduction techniques (e.g., PCA, SOMs, ICA)

Links
– My Website
– Verve Agent Implementation
– Iowa State Virtual Reality Applications Center
– Iowa State HCI Graduate Program

References
Barto, A., Singh, S., & Chentanez, N. 2004. Intrinsically motivated learning of hierarchical collections of skills. In Proceedings of the 3rd International Conference on Development and Learning.
Oudeyer, P.-Y., & Kaplan, F. 2004. Intelligent adaptive curiosity: a source of self-development. In Berthouze, L., et al., eds., Proceedings of the 4th International Workshop on Epigenetic Robotics, volume 117. Lund University Cognitive Studies.
Schmidhuber, J. 1991. Curious model-building control systems. In Proceedings of the International Joint Conference on Neural Networks, vol. 2. IEEE.