ReinforcementLearning: A package for replicating human behavior in R


Nicolas Pröllochs, Stefan Feuerriegel & Dirk Neumann
University of Freiburg, Germany
R User Conference, July 6, 2017

Motivation

Reinforcement learning
- Has recently gained a great deal of traction in studies of human-like learning
- Learns optimal behavior through trial-and-error interactions with a dynamic environment
- The approach appears quite natural because it mimics the fundamental way humans learn

Applications
- Robotics and production control
- Finance, economics & behavioral research
- Artificial intelligence (e.g. learning to play Atari games from raw input pixels (Mnih et al., 2015))
- ...

Trial-and-error learning is based on continuous interactions between an agent and its environment.

Reinforcement learning

Reinforcement learning problem
- Feedback to the agent is restricted to a reward signal that indicates how well the agent is behaving
- Any instruction concerning how to improve its behavior is absent

Agent-environment interface
- The model consists of environment states S, agent actions A, and rewards R
- At each time step t, the agent observes a representation of the environment's state s_t and selects an action a_t
- Subsequently, it receives a reward r_{t+1} and observes the next state s_{t+1}
- Result: a state-action function and an optimal policy that specifies the best possible action in each state (see the note below)
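For reference, the state-action function mentioned here is the usual expected discounted return (the standard textbook definition, not spelled out on the slide): when taking action a in state s and following policy \pi thereafter,

Q^{\pi}(s, a) = E_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\, a_t = a \right],

and the optimal policy acts greedily with respect to the optimal value function, \pi^{*}(s) = \arg\max_{a} Q^{*}(s, a). Here \gamma \in [0, 1] is the discount factor that reappears as the gamma control parameter later on.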

Experience replay

Experience replay
- Allows reinforcement learning agents to remember and reuse experiences from the past
- Speeds up convergence by replaying observed state transitions repeatedly to the agent, as if they were new observations collected while interacting with the system
- Only requires input data in the form of sample sequences consisting of states, actions and rewards

Batch reinforcement learning
1. Collect sample sequences from a running system without direct interaction
2. Use the collected samples to learn an optimal policy from every state transition in the input data
3. Apply the resulting policy to the system for validation purposes or to collect new data points (e.g. in order to iteratively improve the current policy)

Package: ReinforcementLearning

Motivation
- Reinforcement learning results in a highly interpretable policy from which new insights for research and practice can be generated
- The tools available in R do not live up to the needs of researchers and practitioners
- The ReinforcementLearning package is intended to partially close this gap

Main features
- Learn an optimal policy from a fixed set of a priori known transition samples
- Predefined learning rules and action selection modes
- A highly customizable framework for model-free reinforcement learning tasks

Installation

Installation via CRAN
install.packages("ReinforcementLearning")
library(ReinforcementLearning)

Installation via GitHub
devtools::install_github("nproellochs/ReinforcementLearning")

Data format

Each training example consists of a state transition tuple (s, a, r, s_new):
- s      The current environment state
- a      The action selected in the current state
- r      The immediate reward received after transitioning from the current state to the next state
- s_new  The next environment state

The input data must be a data frame in which each row represents one state transition tuple (s, a, r, s_new); a minimal example follows below.
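A minimal sketch of such a data frame, built by hand with made-up transitions purely for illustration (the column names State, Action, Reward and NextState match the arguments passed to ReinforcementLearning() later on):

# Hypothetical hand-crafted experience; any column names work as long as they are
# passed to ReinforcementLearning() via the s, a, r and s_new arguments
transitions <- data.frame(
  State     = c("s1", "s2", "s3"),
  Action    = c("down", "right", "up"),
  Reward    = c(-1, -1, 10),
  NextState = c("s2", "s3", "s4"),
  stringsAsFactors = FALSE
)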

Generating experience

A  Read in sample experience
data("tictactoe")
head(tictactoe, 5)
##       State Action NextState Reward
## 1 ......... c7     ......X.B      0
## 2 ......X.B c6     ...B.XX.B      0
## 3 ...B.XX.B c2     .XBB.XX.B      0
## 4 .XBB.XX.B c8     .XBBBXXXB      0
## 5 .XBBBXXXB c1     XXBBBXXXB      0

B  Experience sampling using an environment function
environment <- function(state, action) {
  ...
  return(list("NextState" = newState, "Reward" = reward))
}

Case study: Gridworld

Learning problem
- The agent navigates from a random starting position to a goal position on a 2x2 grid
- Each cell represents one state (4 states)
- A wall surrounds the grid and separates s1 from s4 (the agent cannot move off the grid)
- Crossing a square leads to a reward of -1
- Reaching the goal position (s4) results in a reward of 10

|-----|-----|
|  s1 |  s4 |
|  s2    s3 |
|-----------|

# Define state and action sets
states <- c("s1", "s2", "s3", "s4")
actions <- c("up", "down", "left", "right")

Case study: Gridworld

Environment function
- A function that defines the dynamics of the environment
- The function has to be implemented manually and must take a state and an action as input
- The return value must be a list containing the name of the next state and the reward

function (state, action) {
  next_state <- state
  if (state == state("s1") && action == "down")  next_state <- state("s2")
  if (state == state("s2") && action == "up")    next_state <- state("s1")
  if (state == state("s2") && action == "right") next_state <- state("s3")
  if (state == state("s3") && action == "left")  next_state <- state("s2")
  if (state == state("s3") && action == "up")    next_state <- state("s4")
  if (next_state == state("s4") && state != state("s4")) {
    reward <- 10
  } else {
    reward <- -1
  }
  return(list(NextState = next_state, Reward = reward))
}
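Assuming the function above is bound to a name such as env (the package ships a predefined gridworld environment as gridworldEnvironment; if that name differs in your version, simply assign the function above to env), a single call illustrates the input/output contract:

# Hypothetical check of the environment contract: one state-action pair in,
# a list with NextState and Reward out
env <- gridworldEnvironment
env(state = "s3", action = "up")    # should return list(NextState = "s4", Reward = 10)
env(state = "s1", action = "down")  # should return list(NextState = "s2", Reward = -1)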

Case study: Gridworld

Generate sample experience
- The package ships with the built-in capability to sample experience from a function that defines the dynamics of the environment
- This also makes it easy to validate the performance of reinforcement learning by applying the learned policy to newly generated samples

# Define state and action sets
states <- c("s1", "s2", "s3", "s4")
actions <- c("up", "down", "left", "right")

# Sample N = 1000 random sequences
data <- sampleExperience(N = 1000, env = env, states = states, actions = actions)
head(data)
##   State Action Reward NextState
## 1    s4   left     -1        s4
## 2    s2  right     -1        s3
## 3    s2  right     -1        s3
## 4    s3   left     -1        s2
## 5    s4     up     -1        s4
## 6    s1   down     -1        s2
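Conceptually, this random sampling amounts to repeatedly drawing a state and an action at random and querying the environment function; a simplified stand-in for sampleExperience() (an illustration only, not the package's actual implementation) could look like this:

# Simplified illustration of random experience sampling
sampleRandomExperience <- function(N, env, states, actions) {
  s   <- sample(states, N, replace = TRUE)
  a   <- sample(actions, N, replace = TRUE)
  out <- Map(env, s, a)  # query the environment once per state-action pair
  data.frame(
    State     = s,
    Action    = a,
    Reward    = sapply(out, `[[`, "Reward"),
    NextState = sapply(out, `[[`, "NextState"),
    stringsAsFactors = FALSE
  )
}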

Case study: Gridworld

Performing reinforcement learning
- ReinforcementLearning(...) learns an optimal policy from the input data
- The user is required to specify the column names of the individual tuple elements in 'data'
- Control parameters customize the learning behavior of the agent

# Define reinforcement learning parameters
control <- list(alpha = 0.1, gamma = 0.5, epsilon = 0.1)

# Perform reinforcement learning
model <- ReinforcementLearning(
  data,
  s = "State", a = "Action", r = "Reward", s_new = "NextState",
  control = control
)

Learning parameters
- alpha    Learning rate, set between 0 and 1
- gamma    Discount factor, set between 0 and 1
- epsilon  Exploration parameter, set between 0 and 1

See the package manual for details.
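The alpha and gamma parameters map onto the familiar Q-learning update applied to each observed transition (s, a, r, s_new); as a simplified sketch (not the package's internal code), one such update on a Q matrix with states as rows and actions as columns is:

# Simplified single Q-learning update; alpha = learning rate, gamma = discount factor
updateQ <- function(Q, s, a, r, s_new, alpha = 0.1, gamma = 0.5) {
  target  <- r + gamma * max(Q[s_new, ])           # reward plus discounted best next-state value
  Q[s, a] <- Q[s, a] + alpha * (target - Q[s, a])  # move the current estimate toward the target
  Q
}

The epsilon parameter does not enter this update itself; it controls epsilon-greedy action selection when new experience is sampled, as shown two slides below.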

Case study: Gridworld

Inspect results
- print(model) shows the state-action table and the optimal policy (i.e. the best possible action in each state)

# Print result
print(model)
## State-Action function Q
##          right         up       down       left
## s1  -0.6633782 -0.6687457  0.7512191 -0.6572813
## s2   3.5806843 -0.6893860  0.7760491  0.7394739
## s3   3.5702779  9.1459425  3.5765323  0.6844573
## s4  -1.8005634 -1.8567931 -1.8244368 -1.8377018
##
## Policy
##      s1      s2      s3      s4
##  "down" "right"    "up" "right"
##
## Reward (last iteration)
## [1] -263

Case study – next steps

A  Sample new experience using the learned policy
# Sample N = 1000 sequences using epsilon-greedy action selection
data_new <- sampleExperience(
  N = 1000, env = env, states = states, actions = actions,
  model = model, actionSelection = "epsilon-greedy", control = control
)

B  Update the existing policy
model_new <- ReinforcementLearning(
  data_new,
  s = "State", a = "Action", r = "Reward", s_new = "NextState",
  control = control, model = model
)
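Steps A and B can be alternated to improve the policy iteratively; a minimal sketch of such a loop, using only the functions shown above (the number of iterations is an arbitrary illustrative choice):

# Repeatedly sample new experience with the current policy and re-learn from it
for (i in 1:10) {
  data_new <- sampleExperience(
    N = 1000, env = env, states = states, actions = actions,
    model = model, actionSelection = "epsilon-greedy", control = control
  )
  model <- ReinforcementLearning(
    data_new,
    s = "State", a = "Action", r = "Reward", s_new = "NextState",
    control = control, model = model
  )
}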

Case study – next steps

C  Inspect results from the updated policy
summary(model_new)
Model details
Learning rule:           experienceReplay
Learning iterations:     2
Number of states:        4
Number of actions:       4
Total Reward:            1409

Reward details (per iteration)
Min:                 -461
Max:                 1409
Average:             474
Median:              474
Standard deviation:  1322.29

D  Plot learning curve
plot(model_new)

Summary
- Reinforcement learning teaches an agent via interaction with its environment, without any supervision other than its own decision-making policy
- The ReinforcementLearning R package allows an agent to learn optimal behavior from sample experience consisting of states, actions and rewards (experience replay)
- The package provides a remarkably flexible framework and is easily applied to a wide range of different problems
- The ReinforcementLearning package is available on CRAN
