
Matthew E. Taylor 1 Autonomous Inter-Task Transfer in Reinforcement Learning Domains Matthew E. Taylor Learning Agents Research Group Department of Computer Sciences University of Texas at Austin 6/24/2008

Matthew E. Taylor 2 Inter-Task Transfer Learning tabula rasa can be unnecessarily slow Humans can use past information –Soccer with different numbers of players Agents leverage learned knowledge in novel tasks

Matthew E. Taylor 3 Primary Questions
Source task (S_SOURCE, A_SOURCE) → Target task (S_TARGET, A_TARGET)
Is it possible to transfer learned knowledge?
Is it possible to transfer without providing a task mapping?
Only reinforcement learning tasks are considered.

Matthew E. Taylor 4 Reinforcement Learning (RL): Key Ideas
Markov Decision Process (MDP): ⟨S, A, T, R⟩
  S: states in the task
  A: actions the agent can take
  T: transition function, T(s, a) → s'
  R: reward function, R(s) → ℜ
Policy: π(s) = a
Action-value function: Q(s, a) ∈ ℜ
State variables: s = ⟨x_1, x_2, …, x_n⟩
[Diagram: agent–environment loop — the agent takes an action, the environment returns a state and reward.]
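These objects can be written down concretely. Below is a minimal, hypothetical tabular sketch in Python; the agents in this talk actually use function approximation rather than a table, so this only fixes notation.

```python
import random
from collections import defaultdict

# Minimal tabular stand-in for the RL objects on this slide.
class QAgent:
    def __init__(self, actions, epsilon=0.1):
        self.actions = actions           # A: actions the agent can take
        self.q = defaultdict(float)      # Q(s, a) -> real value
        self.epsilon = epsilon

    def policy(self, state):
        """pi(s) = a : epsilon-greedy with respect to the action-value function."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])
```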

Matthew E. Taylor 5 Outline Reinforcement Learning Background Inter-Task Mappings Value Function Transfer MASTER: Learning Inter-Task Mappings Related Work Future Work and Conclusion

Matthew E. Taylor 6 Enabling Transfer
Source task: agent–environment loop learning Q_S: S_S × A_S → ℜ
Target task: agent–environment loop learning Q_T: S_T × A_T → ℜ

Matthew E. Taylor 7 Inter-Task Mappings
[Diagram: mapping from the target task back to the source task.]

Matthew E. Taylor 8 Inter-Task Mappings
χ_X: s_target → s_source
  Given a state variable in the target task (some x from s = ⟨x_1, x_2, …, x_n⟩), return the corresponding state variable in the source task.
χ_A: a_target → a_source
  Similar, but for actions.
Intuitive mappings exist in some domains (oracle).
Used to construct the transfer functional.
[Diagram: target task ⟨x_1 … x_n⟩, {a_1 … a_m} mapped by χ_X and χ_A onto source task ⟨x_1 … x_k⟩, {a_1 … a_j}.]
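As a concrete (hypothetical) encoding, each mapping can be stored as a dictionary from target names to source names; the helper below re-expresses a target (state, action) pair in source-task terms.

```python
# Hypothetical encoding: chi_X maps each target state variable to a source
# state variable, chi_A maps each target action to a source action.

def to_source(target_state, target_action, chi_X, chi_A):
    """Re-express a target-task (state, action) in source-task terms."""
    source_state = {chi_X[var]: value for var, value in target_state.items()}
    # Note: several target variables may map to the same source variable;
    # later values simply overwrite earlier ones in this naive sketch.
    source_action = chi_A[target_action]
    return source_state, source_action
```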

Matthew E. Taylor Keepaway [Stone, Sutton, and Kuhlmann 2005]
Goal: maintain possession of the ball.
3 vs. 2: 5 agents, 3 (stochastic) actions, 13 (noisy, continuous) state variables. The keeper with the ball may hold it or pass to either teammate; both takers move toward the player with the ball.
4 vs. 3: 7 agents, 4 actions, 19 state variables.
[Field diagram: keepers K1–K3, takers T1–T2.]

Matthew E. Taylor Keepaway Hand-coded χ_A
Actions in 4 vs. 3 have "similar" actions in 3 vs. 2:
  Hold_4v3 → Hold_3v2
  Pass1_4v3 → Pass1_3v2
  Pass2_4v3 → Pass2_3v2
  Pass3_4v3 → Pass2_3v2
[Field diagrams of 3 vs. 2 and 4 vs. 3, with the 4 vs. 3 pass actions labeled.]
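Written as the kind of dictionary used above, the hand-coded action mapping on this slide would look like the following sketch (action names are illustrative):

```python
# Hand-coded chi_A for 4 vs. 3 -> 3 vs. 2 Keepaway, as listed on the slide.
chi_A_keepaway = {
    "Hold_4v3":  "Hold_3v2",
    "Pass1_4v3": "Pass1_3v2",
    "Pass2_4v3": "Pass2_3v2",
    "Pass3_4v3": "Pass2_3v2",  # the extra pass maps to the most similar 3 vs. 2 action
}
```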

Matthew E. Taylor 11 Keepaway Hand-coded χ_X
Define similar state variables in the two tasks. Example: distances from the player with the ball to its teammates.
[Field diagrams of 3 vs. 2 and 4 vs. 3 with the corresponding distances marked.]

Matthew E. Taylor 12 Reinforcement Learning Background Inter-Task Mappings Value Function Transfer MASTER: Learning Inter-Task Mappings Related Work Future Work and Conclusion Outline

Matthew E. Taylor 13 Value Function Transfer: source task (S_SOURCE, A_SOURCE) → target task (S_TARGET, A_TARGET)

Matthew E. Taylor 14 Value Function Transfer
ρ(Q_S(S_S, A_S)) = Q_T(S_T, A_T)
The action-value function is transferred. ρ is task-dependent and relies on the inter-task mappings, because Q_S is not defined on S_T and A_T.
[Diagram: source-task loop learning Q_S: S_S × A_S → ℜ and target-task loop learning Q_T: S_T × A_T → ℜ, connected by ρ.]
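A sketch of what ρ might look like for weight-based function approximators, under the simplifying assumption that each task's Q-function is a dictionary of weights keyed by (state variable, action); the real CMAC case copies tile weights, as the next slides describe.

```python
# Sketch of rho: initialize the target task's Q-weights from the source
# task's learned weights via the inter-task mappings. Assumes a simplified
# representation with weights keyed by (state_variable, action).

def rho(q_source_weights, target_state_vars, target_actions, chi_X, chi_A):
    q_target_weights = {}
    for var in target_state_vars:
        for act in target_actions:
            source_key = (chi_X[var], chi_A[act])
            q_target_weights[(var, act)] = q_source_weights.get(source_key, 0.0)
    return q_target_weights
```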

Matthew E. Taylor 15 Learning Keepaway
Sarsa update; CMAC, RBF, and neural-network function approximation all successful.
Q^π(s, a): predicted number of steps the episode will last (reward = +1 for every timestep).
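For reference, a minimal tabular version of the Sarsa update used here; the actual agents use CMAC/RBF/neural-network approximation, and a discount factor may be applied in practice.

```python
from collections import defaultdict

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0):
    """One Sarsa step: Q(s,a) <- Q(s,a) + alpha * [r + gamma*Q(s',a') - Q(s,a)].
    With reward +1 per timestep, Q(s,a) predicts how many steps the episode
    will last from (s, a). q should behave like defaultdict(float)."""
    td_error = r + gamma * q[(s_next, a_next)] - q[(s, a)]
    q[(s, a)] += alpha * td_error

q = defaultdict(float)  # tabular stand-in for the CMAC/RBF/NN approximators
```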

Matthew E. Taylor 16 ρ's Effect on CMACs
For each weight in the 4 vs. 3 function approximator, use the inter-task mapping to find the corresponding 3 vs. 2 weight.
[Diagram: 4 vs. 3 and 3 vs. 2 CMAC tilings side by side.]

Matthew E. Taylor 17 Transfer Evaluation Metrics
Set a threshold performance that the majority of agents can achieve by learning (here: 8.5).
Two distinct scenarios:
1. Target Time Metric: transfer is successful if target-task learning time is reduced. This is the AI goal: effectively utilize past knowledge. Only the target task matters; the source task(s) are independently useful, so their training time is a "sunk cost" and is ignored.
2. Total Time Metric: transfer is successful if total (source + target) time is reduced. This is the engineering goal: minimize total training; the source task(s) are not independently useful.
[Plot: performance vs. training time for the target task with and without transfer, and target + source with transfer, with the threshold marked.]
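Both metrics reduce to a time-to-threshold computation over learning curves; a small sketch follows (hypothetical helper names; the performance and time arrays are assumed to be aligned).

```python
def time_to_threshold(performance, times, threshold):
    """Training time at which performance first reaches the threshold."""
    for perf, t in zip(performance, times):
        if perf >= threshold:
            return t
    return float("inf")  # threshold never reached

# Target Time Metric: ignore source-task training ("sunk cost").
# Total Time Metric: charge the source-task training time as well.
def transfer_cost(target_perf, target_times, threshold,
                  source_time=0.0, metric="target"):
    t = time_to_threshold(target_perf, target_times, threshold)
    return t if metric == "target" else t + source_time
```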

Matthew E. Taylor 18 Value Function Transfer: Time to Threshold in 4 vs. 3
[Chart: target-task time and total time to threshold, compared with the no-transfer baseline.]

Matthew E. Taylor 19 Value Function Transfer Flexibility
Different function approximators: radial basis function & neural network.
Different actuators (pass accuracy): "accurate" passers have normal actuators; "inaccurate" passers have less capable kick actuators. Value Function Transfer also reduces target-task time and total time for:
  Inaccurate 3 vs. 2 → Inaccurate 4 vs. 3
  Accurate 3 vs. 2 → Inaccurate 4 vs. 3
  Inaccurate 3 vs. 2 → Accurate 4 vs. 3

Matthew E. Taylor Value Function Transfer Flexibility Different Function Approximators Different Actuators Different Keepaway Tasks –5 vs. 4, 6 vs. 5, 7 vs. 6 20

Matthew E. Taylor Value Function Transfer Flexibility
Different function approximators, different actuators, different Keepaway tasks, partial mappings.
[Field diagrams: 3 vs. 2 and 4 vs. 3 with only partially mapped players.]
[Table (values in the original slide): transfer functional — none, full, partial — vs. # 3 vs. 2 episodes and avg. 4 vs. 3 time.]

Matthew E. Taylor 22 Value Function Transfer Flexibility
Different function approximators, different actuators, different Keepaway tasks, partial mappings, different domains: Knight Joust to 4 vs. 3 Keepaway.
Knight Joust: goal is to travel from the start to the goal line. 2 agents, 3 actions, 3 state variables; fully observable, discrete state space (Q-table with ~600 (s, a) pairs), deterministic actions. The opponent moves directly toward the player; the player may move North or take a knight jump to either side.
[Table (values in the original slide): # Knight Joust episodes vs. 4 vs. 3 time.]

Matthew E. Taylor 23 Value Function Transfer Flexibility
Different function approximators, different actuators, different Keepaway tasks, partial mappings, different domains: Knight Joust to 4 vs. 3 Keepaway; 3 vs. 2 Flat Reward; 3 vs. 2 Giveaway.
[Table (values in the original slide): source task — none, 3 vs. 2 Keepaway, 3 vs. 2 Flat Reward, 3 vs. 2 Giveaway — vs. episodes and 4 vs. 3 time.]

Matthew E. Taylor 24 Transfer Methods
Transfer Method         | Source RL Base Method | Target RL Base Method | Knowledge Transferred
Value Function Transfer | Temporal Difference   | Temporal Difference   | Q-values or Q-value weights
Q-Value Reuse           | Temporal Difference   | Temporal Difference   | Function approximator
Policy Transfer         | Policy Search         | Policy Search         | Neural network weights
TIMBREL                 | Any                   | Model-Learning        | Experienced instances
Rule Transfer           | Any                   | Temporal Difference   | Rules
Representation Transfer | multiple              | multiple              | multiple

Matthew E. Taylor Empirical Evaluation
Keepaway: 3 vs. 2, 4 vs. 3, 5 vs. 4, 6 vs. 5, 7 vs. 6.
Server Job Scheduling: an autonomic computing task in which a server processes jobs in a queue while new jobs arrive; the policy selects between jobs with different utility functions. Source job types: 1, 2. Target job types: …

Matthew E. Taylor Empirical Evaluation
Keepaway: 3 vs. 2, 4 vs. 3, 5 vs. 4, 6 vs. 5, 7 vs. 6.
Server Job Scheduling: an autonomic computing task in which a server processes jobs in a queue while new jobs arrive; the policy selects between jobs with different utility functions.
Mountain Car: 2D and 3D.
Cross-Domain Transfer: Ringworld to Keepaway; Knight's Joust to Keepaway.
Task differences covered: # actions, # state variables, discrete vs. continuous, deterministic vs. stochastic, fully vs. partially observable, single-agent vs. multi-agent.
[Field diagram: 3 vs. 2 Keepaway.]

Matthew E. Taylor 27 Reinforcement Learning Background Inter-Task Mappings Value Function Transfer MASTER: Learning Inter-Task Mappings Related Work Future Work and Conclusion Outline

Matthew E. Taylor 28 Learning Task Relationships
Sometimes task relationships are unknown, yet learning them is necessary for autonomous transfer, and finding similarities (analogies) can be very hard.
Key idea: agents may generate data (experience) in both tasks, so existing machine learning techniques can be leveraged.
Two techniques follow, differing in the amount of background knowledge required.

Matthew E. Taylor 29 Context
Steps to enable autonomous transfer:
1. Select a relevant source task, given a target task
2. Learn how the source and target tasks are related
3. Effectively transfer knowledge between tasks
Transfer is feasible (step 3). Steps toward finding mappings between tasks (step 2):
– Leverage full QDBNs to search for mappings [Liu and Stone, 2006]
– Test possible mappings on-line [Soni and Singh, 2006]
– Mapping learning via classification?

Matthew E. Taylor 30 Context
Steps to enable autonomous transfer:
1. Select a relevant source task, given a target task
2. Learn how the source and target tasks are related
3. Effectively transfer knowledge between tasks
Transfer is feasible (step 3). Steps toward finding mappings between tasks (step 2):
– Leverage full QDBNs to search for mappings [Liu and Stone, 2006]
– Test possible mappings on-line [Soni and Singh, 2006]
– Mapping learning via classification
[Diagram: (s, a, r, s') tuples from the source and target tasks feed an action classifier that induces the action mapping A → A.]

Matthew E. Taylor MASTER Overview
MASTER: Modeling Approximate State Transitions by Exploiting Regression
Goals: learn the inter-task mapping between tasks, minimize data complexity, and require no background knowledge.
Algorithm overview:
1. Record data in the source task
2. Record a small amount of data in the target task
3. Analyze the data off-line to determine the best mapping
4. Use the mapping in the target task
[Diagram: source-task and target-task agent–environment loops both feeding MASTER.]

Matthew E. Taylor MASTER Algorithm
Record observed (s_source, a_source, s'_source) tuples in the source task.
Record a small number of (s_target, a_target, s'_target) tuples in the target task.
Learn a one-step transition model T(S_T, A_T) for the target task: M(s_target, a_target) → s'_target.
for every possible action mapping χ_A:
  for every possible state variable mapping χ_X:
    Transform the recorded source task tuples.
    Calculate the error of the transformed source task tuples on the target task model: ∑(M(s_transformed, a_transformed) − s'_transformed)²
return the χ_A, χ_X with lowest error.
[Diagram: source-task and target-task agent–environment loops both feeding MASTER.]
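A compact sketch of this loop is below. It assumes the learned target model M is available as a callable returning the predicted next target state, with states stored as dictionaries keyed by variable name; helper names are hypothetical, and this naive version enumerates every assignment of target variables/actions to source ones (a larger candidate set than the restricted one counted on a later slide), reflecting the exponential cost noted under Observations.

```python
import itertools

def master_select_mapping(source_tuples, target_model,
                          source_state_vars, target_state_vars,
                          source_actions, target_actions):
    """Score every (chi_X, chi_A) pair by the squared error of the
    transformed source tuples under the learned target-task model M."""
    best, best_err = None, float("inf")
    for x_choice in itertools.product(source_state_vars, repeat=len(target_state_vars)):
        chi_X = dict(zip(target_state_vars, x_choice))
        for a_choice in itertools.product(source_actions, repeat=len(target_actions)):
            chi_A = dict(zip(target_actions, a_choice))
            err = 0.0
            for s, a, s_next in source_tuples:
                # Each target state slot reads the value of its mapped source variable.
                s_t = [s[chi_X[v]] for v in target_state_vars]
                actual = [s_next[chi_X[v]] for v in target_state_vars]
                # A source action corresponds to every target action that maps to it.
                for a_t in (at for at in target_actions if chi_A[at] == a):
                    pred = target_model(s_t, a_t)
                    err += sum((p - q) ** 2 for p, q in zip(pred, actual))
            if err < best_err:
                best, best_err = (chi_X, chi_A), err
    return best
```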

Matthew E. Taylor 33 Observations
Pros: very little target task data needed (sample complexity); the analysis for discovering mappings is off-line.
Cons: exponential in the number of state variables and actions.

Matthew E. Taylor 34 Generalized Mountain Car
2D Mountain Car: state variables x, ẋ; actions Left, Neutral, Right.
3D Mountain Car (novel task): state variables x, y, ẋ, ẏ; actions Neutral, West, East, South, North.

Matthew E. Taylor 35 Generalized Mountain Car
2D Mountain Car: state variables x, ẋ; actions Left, Neutral, Right.
3D Mountain Car (novel task): state variables x, y, ẋ, ẏ; actions Neutral, West, East, South, North.
χ_X: x, y → x; ẋ, ẏ → ẋ
χ_A: Neutral → Neutral; West, South → Left; East, North → Right
Both tasks: episodic, scaled state variables, Sarsa with CMAC function approximation.
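Written as dictionaries of the form used earlier (with `xdot`/`ydot` standing in for ẋ/ẏ; names illustrative), these hand-coded mappings would be:

```python
# Hand-coded mappings for 3D -> 2D Mountain Car, as on the slide.
chi_X_mc = {"x": "x", "y": "x", "xdot": "xdot", "ydot": "xdot"}
chi_A_mc = {
    "Neutral": "Neutral",
    "West": "Left",  "South": "Left",
    "East": "Right", "North": "Right",
}
```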

Matthew E. Taylor 36 MASTER Algorithm
Record observed (s_source, a_source, s'_source) tuples in the source task.
Record a small number of (s_target, a_target, s'_target) tuples in the target task.
Learn a one-step transition model T(S, A) for the target task: M(s_target, a_target) → s'_target.
for every possible action mapping χ_A:
  for every possible state variable mapping χ_X:
    Transform the recorded source task tuples.
    Calculate the error of the transformed source task tuples on the target task model: ∑(M(s_transformed, a_transformed) − s'_transformed)²
return the χ_A, χ_X with lowest error.

Matthew E. Taylor 37 MASTER and Mountain Car
Record observed (x, ẋ, a_2D, x', ẋ') tuples in the 2D task.
Record a small number of (x, y, ẋ, ẏ, a_3D, x', y', ẋ', ẏ') tuples in the 3D task.
Learn a one-step transition model T(S, A) for the 3D task: M(x, y, ẋ, ẏ, a_3D) → x', y', ẋ', ẏ'.
for every possible action mapping χ_A:
  for every possible state variable mapping χ_X:
    Transform the recorded source task tuples.
    Calculate the error of the transformed source task tuples on the target task model: ∑(M(s_transformed, a_transformed) − s'_transformed)²
return the χ_A, χ_X with lowest error.

Matthew E. Taylor 38 MASTER and Mountain Car
Record observed (x, ẋ, a_2D, x', ẋ') tuples in the 2D task.
Record a small number of (x, y, ẋ, ẏ, a_3D, x', y', ẋ', ẏ') tuples in the 3D task.
Learn a one-step transition model T(S, A) for the 3D task: M(x, y, ẋ, ẏ, a_3D) → x', y', ẋ', ẏ'.
Example candidate mapping:
  χ_A: {Neutral, West} → Neutral; {South} → Left; {East, North} → Right
  χ_X: {x, y, ẋ} → x; {ẏ} → ẋ
Under this mapping the model is queried as M(x, x, x, ẋ, a_3D) → x', x', x', ẋ', so the recorded 2D tuple
  (-0.50, 0.01, Right, -0.49, 0.02)
is transformed into the 3D tuples
  (-0.50, -0.50, -0.50, 0.01, East, -0.49, -0.49, -0.49, 0.02)
  (-0.50, -0.50, -0.50, 0.01, North, -0.49, -0.49, -0.49, 0.02)

Matthew E. Taylor 39 MASTER and Mountain Car
Record observed (x, ẋ, a_2D, x', ẋ') tuples in the 2D task.
Record a small number of (x, y, ẋ, ẏ, a_3D, x', y', ẋ', ẏ') tuples in the 3D task.
Learn a one-step transition model T(S, A) for the 3D task: M(x, y, ẋ, ẏ, a_3D) → x', y', ẋ', ẏ'.
for every possible action mapping χ_A:
  for every possible state variable mapping χ_X:
    Transform the recorded source task tuples.
    Calculate the error of the transformed source task tuples on the target task model: ∑(M(s_transformed, a_transformed) − s'_transformed)²
return the χ_A, χ_X with lowest error (of 240 possible mappings: 16 state variable mappings × 15 action mappings).

Matthew E. Taylor 40 Q-value Reuse
[Diagram: source-task and target-task agent–environment loops, each with its own Q-value function.]

Matthew E. Taylor 41 Q-value Reuse
The source Q-function is fixed; the target Q-function is learned:
Q(s_target, a_target) = Q_learned(s_target, a_target) + Q_source(χ_X(s_target), χ_A(a_target))
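A sketch of this combination, assuming the learned and source Q-functions are callables and states are dictionaries keyed by variable name (function names hypothetical):

```python
# Q-Value Reuse: the (fixed) source Q-function is consulted through the
# inter-task mappings and added to the target Q-values being learned.
def q_value_reuse(q_learned, q_source, chi_X, chi_A, s_target, a_target):
    s_source = {chi_X[var]: val for var, val in s_target.items()}
    a_source = chi_A[a_target]
    return q_learned(s_target, a_target) + q_source(s_source, a_source)
```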

Matthew E. Taylor 42 Utilizing Mappings in 3D Mountain Car
[Learning curves: no transfer vs. hand-coded mappings.]

Matthew E. Taylor 43 Experimental Setup
Learn in 2D Mountain Car for 100 episodes.
Learn in 3D Mountain Car for 25 episodes.
Apply MASTER (train the transition model off-line using backprop in Weka).
Transfer from 2D to 3D via Q-Value Reuse.
Learn the 3D task.

Matthew E. Taylor 44 State Variable Mappings Evaluated
[Table (values in the original slide): each candidate assignment of the 3D state variables (x, y, ẋ, ẏ) to the 2D state variables (x, ẋ), with the resulting model MSE.]

Matthew E. Taylor 45 Action Mappings Evaluated
[Table (values in the original slide): target-task action × source-task action pairs — Neutral→Left, Neutral→Neutral, Neutral→Right, West→Left, West→Neutral, West→Right, East→Left, East→Neutral, East→Right, … — with the resulting model MSE.]
Example transformed tuples: the 2D tuple (-0.50, 0.01, Right, -0.49, 0.02) becomes the 3D tuples (-0.50, -0.50, 0.01, 0.01, East, -0.49, -0.49, 0.02, 0.02) and (-0.50, -0.50, 0.01, 0.01, North, -0.49, -0.49, 0.02, 0.02).

Matthew E. Taylor 46 Transfer in 3D Mountain Car
[Learning curves: no transfer, hand-coded, average actions, average both, and 1/MSE.]

Matthew E. Taylor 47 Transfer in 3D Mountain Car: Zoom
[Zoomed learning curves: no transfer vs. average actions.]

Matthew E. Taylor 48 MASTER Wrap-up
First fully autonomous mapping-learning method; learning is done off-line. It can be used to select the most relevant source task or to transfer from multiple source tasks.
Future work: incorporate heuristic search, use in more complex domains, formulate as an optimization problem?

Matthew E. Taylor 49 Reinforcement Learning Background Inter-Task Mappings Value Function Transfer MASTER: Learning Inter-Task Mappings Related Work Future Work and Conclusion Outline

Matthew E. Taylor 50 Related Work: Framework
Dimensions: allowed task differences, source task selection, type of knowledge transferred, allowed base learners (+ 3 others).

Matthew E. Taylor 51 Selected Related Work: Transfer Methods
1. Same state variables and actions [Selfridge+, 1985]
2. Multi-task learning [Fernandez and Veloso, 2006]
3. Methods to avoid inter-task mappings [Konidaris and Barto, 2007]
4. Different state variables and actions [Torrey+, ]
[Diagram: agent–environment loop with T(s, a) = s' and s = ⟨x_1, …, x_n⟩.]

Matthew E. Taylor 52 Selected Related Work: Mapping Learning Methods
On-line:
– Test possible mappings on-line as new actions are taken [Soni and Singh, 2006]
– k-armed bandit, where each arm is a mapping [Talvitie and Singh, 2007]
Off-line:
– Full Qualitative Dynamic Bayes Networks (QDBNs) [Liu and Stone, 2006]
Assume T types of task-independent objects; the Keepaway domain has 2 object types, keepers and takers.
[Diagram: QDBN for the Hold action in 2 vs. 1 Keepaway.]

Matthew E. Taylor 53 Reinforcement Learning Background Inter-Task Mappings Value Function Transfer MASTER: Learning Inter-Task Mappings Related Work Future Work and Conclusion Outline

Matthew E. Taylor 54 Open Question 1: Optimize for Metrics
To minimize target time: more source task training? To minimize total time: a "moderate" amount of training? The answer depends on task similarity.
[Plot: 3 vs. 2 to 4 vs. 3 transfer results.]

Matthew E. Taylor 55 Open Question 2: Effects of Task Similarity
Tasks lie on a spectrum from "source identical to target" (transfer trivial) to "source unrelated to target" (transfer impossible).
Is transfer beneficial for a given pair of tasks? Can negative transfer be avoided?

Matthew E. Taylor Open Question 3: Avoiding Negative Transfer Currently depends on heuristics and human knowledge Very similar tasks may not transfer Need more theoretical analysis –Approximate bisimulation metrics? [Ferns et al., ] –Utilize homomorphisms? [Soni and Singh, 2006] 56

Matthew E. Taylor Acknowledgements Advisor: Peter Stone Committee: Risto Miikkulainen, Ray Mooney, Bruce Porter, and Rich Sutton Other co-authors for material in the dissertation: Nick Jong, Greg Kuhlmann, Shimon Whiteson, and Yaxin Liu LARG 57

Matthew E. Taylor 58 Conclusion Inter-task mappings can be: –Used with many different RL algorithms –Used in many domains –Learned from interacting with an environment Plausibility and efficacy have been demonstrated Next up: Broaden applicability and autonomy

Matthew E. Taylor 59 Thanks for your attention! Questions?