Reinforcement Learning with Multiple, Qualitatively Different State Representations
Harm van Seijen (TNO / UvA), Bram Bakker (UvA), Leon Kester (TNO)
NIPS 2007 workshop

The Reinforcement Learning Problem
The agent takes an action a in the environment; the environment returns a state s and a reward r.
Goal: maximize the cumulative discounted reward.
Question: what is the best way to represent the environment?
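For reference (my addition, not on the slide), the cumulative discounted reward the agent maximizes is usually written as the return below, with discount factor gamma assumed in [0, 1):

```latex
% Standard discounted return; gamma is the discount factor (assumed, not stated on the slide).
G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}
```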

Explanation of our Approach

Suppose 3 agents work in the same environment and have the same action space, but different state spaces:

agent 1: state space S1 = {s11, s12, s13, ..., s1N1}, state space size N1
agent 2: state space S2 = {s21, s22, s23, ..., s2N2}, state space size N2
agent 3: state space S3 = {s31, s32, s33, ..., s3N3}, state space size N3

(mutual) action space A = {a1, a2}, action space size 2

Extension of the action space

External actions:
a_e1: old a1
a_e2: old a2

Switch actions:
a_s1: 'switch to representation 1'
a_s2: 'switch to representation 2'
a_s3: 'switch to representation 3'

New action space (each new action combines one external action with one switch action):
a1: a_e1 + a_s1
a2: a_e1 + a_s2
a3: a_e1 + a_s3
a4: a_e2 + a_s1
a5: a_e2 + a_s2
a6: a_e2 + a_s3
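A minimal sketch (not from the slides) of how the combined action set can be enumerated as the Cartesian product of external and switch actions; the variable names are my own:

```python
from itertools import product

# Hypothetical names; the slide only lists the combined actions a1..a6.
external_actions = ["a_e1", "a_e2"]          # the original environment actions
switch_actions = ["a_s1", "a_s2", "a_s3"]    # 'switch to representation k'

# Each new action pairs one external action with one switch action;
# the enumeration order matches a1..a6 on the slide.
combined_actions = list(product(external_actions, switch_actions))

print(len(combined_actions))  # 6 = 2 external actions x 3 switch actions
```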

Extension of the state space

agent 1: state space S1 = {s11, s12, s13, ..., s1N1}, state space size N1
agent 2: state space S2 = {s21, s22, s23, ..., s2N2}, state space size N2
agent 3: state space S3 = {s31, s32, s33, ..., s3N3}, state space size N3

switch agent: state space S = {s11, s12, ..., s1N1, s21, s22, ..., s2N2, s31, s32, ..., s3N3}, state space size N1 + N2 + N3
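A small illustrative sketch (my own, not from the slides) of the switch agent's state space as the disjoint union of the three representations, together with a tabular Q-table over the combined state-action space; the concrete sizes are assumptions matching the example table later in the talk:

```python
import numpy as np

# Illustrative sizes; the slide keeps N1, N2, N3 symbolic.
N1, N2, N3 = 100, 50, 100
n_external, n_switch = 2, 3

# Disjoint union of the three state spaces: tag each state with its representation id.
switch_states = [(rep, i) for rep, n in enumerate([N1, N2, N3], start=1) for i in range(n)]
assert len(switch_states) == N1 + N2 + N3  # 250 states in total

# Combined action space: every external action paired with every switch action.
n_actions = n_external * n_switch  # 6 actions

# One tabular Q-value per (state, action) pair of the switch agent.
Q = np.zeros((len(switch_states), n_actions))
print(Q.shape)  # (250, 6) -> 1,500 state-action values
```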

Requirements and Advantages

Requirements for Convergence

Theoretical requirement: if the individual representations obey the Markov property, then convergence to the optimal solution is guaranteed.

Empirical requirement: each representation should contain information that is useful for deciding which external action to take and for deciding when to switch.
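For reference (my addition), the Markov property required of each representation can be stated as:

```latex
% Markov property: the next state and reward depend only on the current state and action.
\Pr(s_{t+1}, r_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0)
  = \Pr(s_{t+1}, r_{t+1} \mid s_t, a_t)
```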

State-Action Space Sizes: Example

Representation   States    Actions   State-Actions
Rep 1            100       2         200
Rep 2            50        2         100
Rep 3            100       2         200
Switch (OR)      250       6         1,500
Union (AND)      500,000   2         1,000,000
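Worked arithmetic behind the table (my reconstruction; the blank cells in the original were filled in assuming 2 external actions per representation):

```latex
% Switch (OR): states add, actions multiply (external x switch)
|S_{\text{switch}}| = N_1 + N_2 + N_3 = 100 + 50 + 100 = 250, \qquad
|A_{\text{switch}}| = 2 \times 3 = 6, \qquad 250 \times 6 = 1{,}500

% Union (AND): states multiply, the action count stays at 2
|S_{\text{union}}| = N_1 \times N_2 \times N_3 = 100 \times 50 \times 100 = 500{,}000, \qquad
500{,}000 \times 2 = 1{,}000{,}000
```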

Switching is advantageous if:
- the state space is very large, AND
- the state space is heterogeneous.

Results

Traffic Scenario

Situation: a crossroad of two one-way roads.
Task: at each time step, the traffic agent decides whether the vertical or the horizontal lane gets a green light. Changing the lights involves an orange period of 5 time steps.
Reward: -1 times the total number of cars waiting in front of the traffic light.
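A minimal sketch of the reward signal as described on the slide (the function name and arguments are my own):

```python
def traffic_reward(cars_waiting_vertical: int, cars_waiting_horizontal: int) -> float:
    """Reward at one time step: -1 per car waiting in front of the traffic light."""
    return -1.0 * (cars_waiting_vertical + cars_waiting_horizontal)

# Example: 3 cars waiting on the vertical lane, 5 on the horizontal lane -> reward -8
assert traffic_reward(3, 5) == -8.0
```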

Representation 1

Representation 2

Representations Compared

Representation   States   Actions   State-Actions
Rep 1            64       2         128
Rep 2            24       2         48
Switch           88       4         352
Rep 1+           256      2         512
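The same bookkeeping as before (my annotation; the blank cells were filled in assuming 2 external actions throughout):

```latex
% Switch: states add, actions multiply (2 external x 2 switch)
|S_{\text{switch}}| = 64 + 24 = 88, \qquad |A_{\text{switch}}| = 2 \times 2 = 4, \qquad 88 \times 4 = 352

% Rep 1+ (a single, larger representation): 256 \times 2 = 512
```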

On-line performance for the Traffic Scenario

Demo

Conclusions and Future Work

Conclusions

We introduced an extension to the standard RL problem that allows the decision agent to dynamically switch between a number of qualitatively different representations.
This approach offers advantages in RL problems with large, heterogeneous state spaces.
Experiments on a (simulated) traffic control problem showed good results: the switching agent reached a higher end-performance, with a convergence rate similar to that of a single representation of comparable state-action space size.

Future Work

- Use larger state spaces (a few hundred states per representation) and more than 2 different representations.
- Explore the application domain of sensor management (for example, switching between radar settings).
- Combine the switching approach with function approximation.
- Examine the convergence properties of the switch representation in more detail.
- Use representations that describe realistic sensor output.
- Explore new methods for switching.

Thank you.

Switching Algorithm versus POMDP

POMDP approach:
- update an estimate of a hidden variable and base decisions on a probability distribution over all possible values of this hidden variable
- not possible to choose between different representations

Switch algorithm:
- hidden information is present but not taken into account; the price for this is a more stochastic action outcome
- when the hidden information is very important for the decision-making process, the agent can decide to switch to a different representation that does take that information into account
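For contrast (my addition, not on the slide), the standard POMDP belief update that the switching approach avoids maintaining is:

```latex
% Standard POMDP belief update: b is the belief over hidden states, T the transition model,
% O the observation model, and eta a normalizing constant.
b'(s') = \eta \, O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)
```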