Transfer Learning Via Advice Taking Jude Shavlik University of Wisconsin-Madison

Acknowledgements: Lisa Torrey, Trevor Walker, & Rich Maclin. DARPA IPTO Grant HR…, NRL Grant N…G002, DARPA IPTO Grant FA…C-7606.

What Would You Like to Say to This Penguin? IF a Bee is (Near and West) & an Ice is (Near and North) THEN Begin Move East, Move North End

Empirical Results [learning curves: with advice vs. without advice]

Our Approach to Transfer Learning: Source Task → Extraction → Extracted Knowledge → Mapping → Transferred Knowledge → Refinement → Target Task

Potential Benefits of Transfer [plot: performance vs. training, with transfer vs. without transfer]: higher start, steeper slope, higher asymptote

Outline: Reinforcement Learning w/ Advice; Transfer via Rule Extraction & Advice Taking; Transfer via Macros; Transfer via Markov Logic Networks (time permitting); Wrap Up

Reinforcement Learning (RL) Overview: the agent senses its state (described by a set of features), chooses an action, and receives a reward; it uses the rewards to estimate the Q-values of actions in states. Policy: choose the action with the highest Q-value in the current state.

The RoboCup Domain

RoboCup Subtasks: Mobile KeepAway (variant of Stone & Sutton, ICML 2001), BreakAway, MoveDownfield

Q Learning (Watkins PhD, 1989): the Q function maps (state, action) to a value, and policy(state) = argmax_action Q(state, action). For large state spaces, we need function approximation.
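To make the Q-learning loop concrete, here is a minimal tabular sketch (not the function-approximation version used in the RoboCup tasks). The environment interface (`env.reset`, `env.step`, `env.actions`) and the hyperparameter values are assumptions for illustration.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Minimal tabular Q-learning; env is assumed to expose reset()/step()/actions."""
    q = defaultdict(float)                      # Q[(state, action)] -> value

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy policy: usually argmax_a Q(s, a), occasionally explore
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])

            next_state, reward, done = env.step(action)

            # One-step update toward reward + discounted best next-state value
            best_next = max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```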

Learning the Q Function. A standard approach: linear support-vector regression. The Q-value is the dot product of a weight vector and a feature vector (features such as distance(me, teammate1), distance(me, opponent1), angle(opponent1, me, teammate1), …): Q-value = weightsᵀ · features. Set the weights to minimize Model size + C × Data misfit.
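A sketch of the linear formulation the slide describes, written in standard support-vector-regression notation; the symbols w, b, f(s), and the slack variables ξ_i follow common usage rather than the talk's exact notation.

```latex
% Linear Q-model and support-vector-regression objective (sketch; notation assumed)
\begin{aligned}
  Q(s) &= w^{\top} f(s) + b \\[2pt]
  \min_{w,\,b,\,\xi}\;\; & \underbrace{\|w\|_{1}}_{\text{model size}}
      \;+\; C \underbrace{\sum_{i} \xi_i}_{\text{data misfit}} \\
  \text{s.t.}\;\; & \bigl|\, w^{\top} f(s_i) + b - y_i \,\bigr| \;\le\; \xi_i,
      \qquad \xi_i \ge 0 .
\end{aligned}
```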

Advice in RL: advice provides constraints on Q values under specified conditions. Example: IF an opponent is near me AND a teammate is open THEN Q(pass(teammate)) > Q(move(ahead)). Apply advice as soft constraints in the optimization: minimize Model size + C × Data misfit + μ × Advice misfit.
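A hedged sketch of how the advice term enters the same optimization; the slack z_j for advice constraint j is illustrative notation, and the precise knowledge-based formulation is the one given in the Fung, Mangasarian & Shavlik papers cited on the next slides.

```latex
% Objective with a soft advice-penalty term (sketch; z_j is illustrative notation)
\begin{aligned}
  \min_{w,\,b,\,\xi,\,z}\;\; & \|w\|_{1}
      \;+\; C \sum_{i} \xi_i
      \;+\; \mu \sum_{j} z_j \\
  \text{s.t.}\;\; & \bigl|\, w^{\top} f(s_i) + b - y_i \,\bigr| \;\le\; \xi_i
      && \text{(fit the training data)} \\
  & \text{advice constraint } j \text{ is violated by at most } z_j
      && \text{(soft advice)} \\
  & \xi_i \ge 0, \quad z_j \ge 0 .
\end{aligned}
```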

Aside: Generalizing the Idea of a Training Example for Support Vector Machines (SVMs) Can extend the SVM linear program to handle “regions as training examples” Fung, Mangasarian, & Shavlik: NIPS 2003, COLT 2004

Specifying Advice for Support Vector Regression: Bx ≤ d ⇒ y ≥ h′x + β. If the input (x) is in the region specified by B and d, then the output (y) should be above some line (h′x + β).

Sample Advice. Advice format: Bx ≤ d ⇒ f(x) ≥ hx + β. If distanceToGoal ≤ 10 and shotAngle ≥ 30, then Q(shoot) ≥ …

Sample Advice-Taking Results: if distanceToGoal ≤ 10 and shotAngle ≥ 30, then prefer shoot over all other actions, i.e., Q(shoot) > Q(pass) and Q(shoot) > Q(move). [Learning curves: advice vs. standard RL on 2-vs-1 BreakAway, rewards +1, −1.]

Outline: Reinforcement Learning w/ Advice; Transfer via Rule Extraction & Advice Taking; Transfer via Macros; Transfer via Markov Logic Networks; Wrap Up

Close-Transfer Scenarios 2-on-1 BreakAway 3-on-2 BreakAway 4-on-3 BreakAway

Distant-Transfer Scenarios 3-on-2 BreakAway 3-on-2 KeepAway 3-on-2 MoveDownfield

Our First Transfer-Learning Approach: exploit the fact that the models and the advice are in the same language.
Source Q functions: Q_x = w_x1 f_1 + w_x2 f_2 + b_x; Q_y = w_y1 f_1 + b_y; Q_z = w_z2 f_2 + b_z.
Mapped Q functions: Q′_x = w_x1 f′_1 + w_x2 f′_2 + b_x; Q′_y = w_y1 f′_1 + b_y; Q′_z = w_z2 f′_2 + b_z.
Advice: if Q′_x > Q′_y and Q′_x > Q′_z, then prefer x′.
Advice (expanded): if w_x1 f′_1 + w_x2 f′_2 + b_x > w_y1 f′_1 + b_y and w_x1 f′_1 + w_x2 f′_2 + b_x > w_z2 f′_2 + b_z, then prefer x′ to y′ and z′.
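A small sketch of how mapped source models could be turned into the expanded advice above. The dictionary representation of a linear Q function and the feature-name mapping are illustrative assumptions, not the actual system's data structures.

```python
def map_q_function(source_q, feature_map):
    """Rename source-task features (f1, f2, ...) to target-task features (f1', f2', ...)."""
    mapped = {"bias": source_q["bias"]}
    for feat, weight in source_q.items():
        if feat != "bias":
            mapped[feature_map[feat]] = weight
    return mapped

def linear_expr(q):
    """Render a linear Q model (dict of feature -> weight, plus a bias) as a string."""
    terms = ["{:+g}*{}".format(weight, feat) for feat, weight in q.items() if feat != "bias"]
    return " ".join(terms) + " {:+g}".format(q["bias"])

def advice_prefer(preferred, others):
    """Expanded advice: prefer the action whose mapped Q expression exceeds every other."""
    return ["({}) > ({})".format(linear_expr(preferred), linear_expr(other))
            for other in others]

# Toy source Q functions Q_x, Q_y, Q_z and an assumed source-to-target feature mapping
q_x = {"f1": 0.8, "f2": -0.3, "bias": 0.1}
q_y = {"f1": 0.5, "bias": 0.0}
q_z = {"f2": 0.9, "bias": -0.2}
fmap = {"f1": "f1_prime", "f2": "f2_prime"}

mapped_x, mapped_y, mapped_z = (map_q_function(q, fmap) for q in (q_x, q_y, q_z))
print(advice_prefer(mapped_x, [mapped_y, mapped_z]))   # constraints saying "prefer x'"
```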

User Advice in Skill Transfer: there may be new skills in the target task that cannot be learned from the source task, so we allow (human) users to add their own advice about these skills. Example user advice for KeepAway to BreakAway: IF distance(me, GoalPart) < 10 AND angle(GoalPart, me, goalie) > 40 THEN prefer shoot(GoalPart).

Sample Human Interaction: “Use what you learned in KeepAway, and add in this new action SHOOT.” “Here is some advice about shooting …” “Now go practice for a while.”

Policy Transfer to 3-on-2 BreakAway Torrey, Walker, Shavlik & Maclin: ECML 2005

Our Second Approach: use Inductive Logic Programming (ILP) on the SOURCE task to extract advice. Given: positive and negative examples for each action. Do: learn first-order rules that describe most positive examples but few negative examples. Example training atoms: good_action(pass(t1), state1), good_action(pass(t1), state2), good_action(pass(t1), state3), good_action(pass(t2), state2), good_action(pass(t2), state3). A learned rule: good_action(pass(Teammate), State) :- distance(me, Teammate, State) > 10, distance(Teammate, goal, State) < 15.

Searching for an ILP Clause (top-down search using A*)
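As a rough illustration of top-down clause construction: the sketch below greedily adds the literal that best separates positive from negative examples. It is a simplified, propositional stand-in for the A*-guided first-order search the slide refers to, and the toy data and literals are assumptions for illustration only.

```python
def covers(clause, example):
    """A clause is a list of feature tests; it covers an example if all tests pass."""
    return all(test(example) for test in clause)

def learn_clause(candidate_literals, positives, negatives):
    """Greedy top-down refinement: repeatedly add the literal that most improves
    (positives covered - negatives covered); stop when nothing helps."""
    clause = []
    while negatives:
        current_score = len(positives) - len(negatives)   # clause covers all remaining examples
        def score_with(lit):
            trial = clause + [lit]
            return (sum(covers(trial, e) for e in positives)
                    - sum(covers(trial, e) for e in negatives))
        best = max(candidate_literals, key=score_with)
        if score_with(best) <= current_score:
            break                                          # no literal improves the clause
        clause.append(best)
        positives = [e for e in positives if covers(clause, e)]
        negatives = [e for e in negatives if covers(clause, e)]
    return clause

# Toy usage: examples are feature dicts, literals are named tests on those features
positives = [{"dist": 12, "angle": 50}, {"dist": 20, "angle": 35}]
negatives = [{"dist": 5,  "angle": 50}, {"dist": 12, "angle": 10}]
literals  = [lambda e: e["dist"] > 10, lambda e: e["angle"] > 30]
rule = learn_clause(literals, positives, negatives)        # ends up using both tests
```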

Skill Transfer to 3-on-2 BreakAway Torrey, Shavlik, Walker & Maclin: ECML 2006, ICML Workshop 2006

MoveDownfield to BreakAway Torrey, Shavlik, Walker & Maclin: ECML 2006, ICML Workshop 2006

Approach #3: Relational Macros. A relational macro is a finite-state machine. Nodes represent internal states of the agent in which independent policies apply. Conditions for transitions and actions are learned via ILP. [Example two-node FSM: a hold node (hold ← true) and a pass node (pass(Teammate) ← isOpen(Teammate)), with transitions conditioned on isClose(Opponent) and allOpponentsFar.]

Step 1: Learning Macro Structure. Objective: find (via ILP) an action pattern that separates good and bad games. Example: macroSequence(Game, StateA) ← actionTaken(Game, StateA, move, ahead, StateB), actionTaken(Game, StateB, pass, _, StateC), actionTaken(Game, StateC, shoot, _, gameEnd). [Resulting macro structure: move(ahead) → pass(Teammate) → shoot(GoalPart).]

Step 2: Learning Macro Conditions. Objective: describe when transitions and actions should be taken. For the transition from move to pass: transition(State) ← distance(Teammate, goal, State) < 15. For the policy in the pass node: action(State, pass(Teammate)) ← angle(Teammate, me, Opponent, State) > 30.

Learned 2-on-1 BreakAway Macro: the player with the ball executes the macro move(Direction) → pass(Teammate) → shoot(goalRight) → shoot(goalLeft); the first shoot action is apparently a leading pass.
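A minimal sketch of executing a relational macro as a finite-state machine. The node/transition representation, the condition callbacks, and the environment interface are illustrative assumptions rather than the actual implementation.

```python
class MacroNode:
    def __init__(self, name, action_rules):
        self.name = name
        self.action_rules = action_rules        # list of (condition, action) pairs
        self.transitions = []                   # list of (condition, next_node) pairs

    def choose_action(self, state):
        # Fire the first action rule whose learned ILP condition holds
        for condition, action in self.action_rules:
            if condition(state):
                return action
        return None                             # no rule fires: fall back to the learned Q policy

def run_macro(start_node, env, state):
    """Follow the macro until the game ends or no rule applies."""
    node = start_node
    while True:
        # Take the first transition whose learned condition holds in this state
        for condition, next_node in node.transitions:
            if condition(state):
                node = next_node
                break
        action = node.choose_action(state)
        if action is None:
            return state                        # hand control back to standard RL
        state, reward, done = env.step(action)
        if done:
            return state
```

For the learned 2-on-1 macro above, the nodes would correspond to move(Direction), pass(Teammate), and the two shoot actions, chained by the learned transition conditions.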

Transfer via Demonstration: 1. Execute the macro strategy to get Q-value estimates. 2. Infer low Q values for actions not taken by the macro. 3. Compute an initial Q function with these examples. 4. Continue learning with standard RL. Advantage: potential for a large immediate jump in performance. Disadvantage: risk that the agent will blindly follow an inappropriate strategy.
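A rough sketch of how steps 1-2 could label training examples for the initial Q function; the high/low pseudo-Q values and the episode format are assumptions for illustration, and the actual system fits its support-vector Q function to the resulting examples.

```python
def build_demonstration_dataset(episodes, actions, high_q=1.0, low_q=0.0):
    """Label the action the macro actually took with a high Q estimate and all
    other actions with a low one, producing (state, action, target_q) examples."""
    dataset = []
    for episode in episodes:                    # each episode: list of (state, macro_action)
        for state, taken in episode:
            for action in actions:
                target = high_q if action == taken else low_q
                dataset.append((state, action, target))
    return dataset

# Step 3 fits an initial Q function to `dataset`; step 4 continues with standard RL.
```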

Macro Transfer to 3-on-2 BreakAway Torrey, Shavlik, Walker & Maclin: ILP 2007 Variant of Taylor & Stone

Macro Transfer to 4-on-3 BreakAway Torrey, Shavlik, Walker & Maclin: ILP 2007

Outline: Reinforcement Learning w/ Advice; Transfer via Rule Extraction & Advice Taking; Transfer via Macros; Transfer via Markov Logic Networks; Wrap Up

Approach #4: Markov Logic Networks (Richardson and Domingos, MLj 2006). [Example network over the nodes 0 ≤ Q < 0.5, 0.5 ≤ Q < 1.0, dist1 > 5, ang1 > 45, dist2 < 10.] Weighted formulas: IF dist1 > 5 AND ang1 > 45 THEN 0 ≤ Q < 0.5 (weight = 2.1); IF dist2 < 10 AND ang1 > 45 THEN 0.5 ≤ Q < 1.0 (weight = 1.7).

Using MLNs to Learn a Q Function: perform hierarchical clustering to find a set of good Q-value bins; use ILP to learn rules that classify examples into bins (e.g., IF dist1 > 5 AND ang1 > 45 THEN 0 ≤ Q < 0.1); use MLN weight-learning methods to choose weights for these formulas.
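A sketch of how a distribution over Q-value bins could be collapsed back into a scalar Q estimate for action selection. The bin representation and the use of bin midpoints are assumptions for illustration; the bin probabilities themselves come from MLN inference.

```python
def expected_q(bin_probabilities):
    """Collapse a distribution over Q-value bins into a single Q estimate by
    taking the probability-weighted midpoint of each bin."""
    return sum(prob * (low + high) / 2.0
               for (low, high), prob in bin_probabilities.items())

# Example: inference says Q is probably small for this state/action pair
bins = {(0.0, 0.1): 0.7, (0.1, 0.5): 0.2, (0.5, 1.0): 0.1}
q_estimate = expected_q(bins)    # 0.7*0.05 + 0.2*0.3 + 0.1*0.75 = 0.17
```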

MLN Transfer to 3-on-2 BreakAway Torrey, Shavlik, Natarajan, Kuppili & Walker: AAAI TL Workshop 2008

Outline: Reinforcement Learning w/ Advice; Transfer via Rule Extraction & Advice Taking; Transfer via Macros; Transfer via Markov Logic Networks; Wrap Up

Summary of Our Transfer Methods 1. Directly reuse weighted sums as advice 2. Use ILP to learn generalized advice for each action 3. Use ILP to learn macro-operators 4. Use Markov Logic Networks to learn probability distributions for Q functions

Our Desiderata for Transfer in RL: transfer knowledge in first-order logic; accept advice from humans expressed naturally; refine transferred knowledge; improve performance in related target tasks. Major challenge: avoid negative transfer.

Related Work in RL Transfer: value-function transfer (Taylor & Stone 2005); policy reuse (Fernandez & Veloso 2006); state abstractions (Walsh et al. 2006); options (Croonenborghs et al. 2007). A Torrey and Shavlik survey paper is available online.

Conclusion: Transfer learning is an important perspective for machine learning: it moves beyond isolated learning tasks. Appealing ways to do transfer learning are the advice-taking and demonstration perspectives. Long-term goal: instructable computing, teaching computers the same way we teach humans.