1 Transfer Learning Via Advice Taking Jude Shavlik University of Wisconsin-Madison

2 Acknowledgements: Lisa Torrey, Trevor Walker, & Rich Maclin; DARPA IPTO Grant HR0011-04-1-0007; NRL Grant N00173-06-1-G002; DARPA IPTO Grant FA8650-06-C-7606

3 What Would You Like to Say to This Penguin? IF a Bee is (Near and West) & an Ice is (Near and North) THEN BEGIN Move East; Move North END

4 Empirical Results [plot: learning curves with advice vs. without advice]

5 Our Approach to Transfer Learning [diagram: Source Task → (Extraction) → Extracted Knowledge → (Mapping) → Transferred Knowledge → (Refinement) → Target Task]

6 Potential Benefits of Transfer [plot: performance vs. training, with transfer vs. without transfer; transfer can give a higher start, a steeper slope, and a higher asymptote]

7 Outline: Reinforcement Learning w/ Advice; Transfer via Rule Extraction & Advice Taking; Transfer via Macros; Transfer via Markov Logic Networks (time permitting); Wrap Up

8 Reinforcement Learning (RL) Overview [diagram: the agent repeatedly senses its state, chooses an action, and receives a reward]. The state is described by a set of features. Use the rewards to estimate the Q-values of actions in states. Policy: choose the action with the highest Q-value in the current state.
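
A minimal sketch of the "choose the action with the highest Q-value" idea (the actions and Q estimates below are made up for illustration):

    def greedy_policy(q_values):
        """q_values maps each action name to its estimated Q-value in the current state."""
        return max(q_values, key=q_values.get)

    current_state_q = {"pass": 0.4, "move": 0.7, "shoot": 0.2}   # made-up estimates
    print(greedy_policy(current_state_q))                        # -> move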

9 The RoboCup Domain

10 RoboCup Subtasks: KeepAway (variant of Stone & Sutton, ICML 2001), BreakAway, MoveDownfield

11 Q Learning (Watkins PhD, 1989). The Q function maps (state, action) → value; policy(state) = argmax_action Q(state, action). For large state spaces, need function approximation.
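
For intuition, a toy tabular version of the Q-learning update (not the talk's implementation; for the large RoboCup state spaces the table is replaced by a function approximator, as the slide notes):

    # Tabular Q-learning update:
    #   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    from collections import defaultdict

    ACTIONS = ["pass", "move", "shoot"]
    Q = defaultdict(float)                 # (state, action) -> value, default 0.0

    def q_update(s, a, r, s_next, alpha=0.1, gamma=0.9):
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    # One made-up experience tuple (state, action, reward, next state):
    q_update(s="state1", a="shoot", r=1.0, s_next="gameEnd")
    print(Q[("state1", "shoot")])          # 0.1 after a single +1 reward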

12 Learning the Q Function. A standard approach: linear support-vector regression. Q-value = wᵀx, where x is the feature vector (distance(me, teammate1), distance(me, opponent1), angle(opponent1, me, teammate1), …) and w is the weight vector (0.2, -0.1, 0.9, …). Set weights to minimize: Model size + C × Data misfit.
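
A rough sketch of fitting one linear Q-function from (state features, Q-target) pairs; here ordinary ridge regression stands in for the support-vector regression actually used, and all numbers are invented:

    import numpy as np

    # Rows: training states described by features such as
    # distance(me, teammate1), distance(me, opponent1), angle(opponent1, me, teammate1)
    X = np.array([[12.0,  5.0, 30.0],
                  [ 4.0,  9.0, 60.0],
                  [20.0,  2.0, 15.0]])
    q_targets = np.array([0.3, 0.8, 0.1])      # made-up Q-value estimates

    lam = 0.1   # the L2 penalty plays the "model size" role; 1/lam plays the role of C
    # Minimize lam * ||w||^2 + ||X w - q||^2  (ridge regression, closed form)
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ q_targets)

    def q_value(features):
        return float(features @ w)             # Q-value = w^T x

    print(q_value(np.array([10.0, 6.0, 45.0])))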

13 Advice in RL. Advice provides constraints on Q values under specified conditions, e.g.: IF an opponent is near me AND a teammate is open THEN Q(pass(teammate)) > Q(move(ahead)). Apply as soft constraints in optimization: Model size + C × Data misfit + μ × Advice misfit.
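
A schematic rendering of the resulting optimization problem (notation simplified relative to the actual linear program in the papers):

    \min_{w,\,b,\,\xi \ge 0} \;\;
        \underbrace{\|w\|_1}_{\text{model size}}
        \;+\; C \underbrace{\sum_i \bigl| w^{\top} x_i + b - q_i \bigr|}_{\text{data misfit}}
        \;+\; \mu \underbrace{\sum_j \xi_j}_{\text{advice misfit}}

    where each slack variable \xi_j \ge 0 measures how far the learned Q-function
    falls short of an advice constraint, such as Q(pass(teammate)) > Q(move(ahead)),
    over the states that satisfy the advice's IF-condition.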

14 Aside: Generalizing the Idea of a Training Example for Support Vector Machines (SVMs). Can extend the SVM linear program to handle "regions as training examples" (Fung, Mangasarian, & Shavlik: NIPS 2003, COLT 2004).

15 Specifying Advice for Support Vector Regression. Bx ≤ d ⇒ y ≥ h'x + β: if the input x is in the region specified by B and d, then the output y should be above the line h'x + β. [plot: the advice region in x and the lower-bounding line in y]

16 Sample Advice. Advice format: Bx ≤ d ⇒ f(x) ≥ hx + β. Example: If distanceToGoal ≤ 10 and shotAngle ≥ 30, Then Q(shoot) ≥ 0.9.
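
To make the mapping onto the Bx ≤ d ⇒ f(x) ≥ hx + β format concrete, a small numeric sketch that encodes this sample advice and spot-checks a made-up linear model for Q(shoot) against it; the real system enforces the constraint over the whole region inside the optimization rather than by sampling:

    import numpy as np

    # Features: x = [distanceToGoal, shotAngle]
    B = np.array([[1.0,  0.0],      # distanceToGoal <= 10
                  [0.0, -1.0]])     # -shotAngle <= -30, i.e. shotAngle >= 30
    d = np.array([10.0, -30.0])
    h, beta = np.zeros(2), 0.9      # requirement inside the region: Q(shoot) >= h.x + 0.9

    w_shoot, b_shoot = np.array([-0.05, 0.02]), 1.0   # made-up linear model for Q(shoot)

    def violates(x):
        in_region = bool(np.all(B @ x <= d))
        return in_region and (w_shoot @ x + b_shoot) < (h @ x + beta)

    # Spot-check on a grid of states inside and outside the advice region.
    grid = [np.array([float(dist), float(ang)]) for dist in range(0, 21, 2)
                                                for ang in range(0, 91, 10)]
    print(any(violates(x) for x in grid))   # False: this model satisfies the advice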

17 Sample Advice-Taking Results. Advice: if distanceToGoal ≤ 10 and shotAngle ≥ 30, then prefer shoot over all other actions, i.e. Q(shoot) > Q(pass) and Q(shoot) > Q(move). [plot: advice vs. standard RL on 2-vs-1 BreakAway, rewards +1, -1]

18 Outline: Reinforcement Learning w/ Advice; Transfer via Rule Extraction & Advice Taking; Transfer via Macros; Transfer via Markov Logic Networks; Wrap Up

19 Close-Transfer Scenarios 2-on-1 BreakAway 3-on-2 BreakAway 4-on-3 BreakAway

20 Distant-Transfer Scenarios 3-on-2 BreakAway 3-on-2 KeepAway 3-on-2 MoveDownfield

21 Our First Transfer-Learning Approach: exploit the fact that the models and the advice are in the same language.
Source Q functions: Q_x = w_x1 f_1 + w_x2 f_2 + b_x; Q_y = w_y1 f_1 + b_y; Q_z = w_z2 f_2 + b_z
Mapped Q functions: Q'_x = w_x1 f'_1 + w_x2 f'_2 + b_x; Q'_y = w_y1 f'_1 + b_y; Q'_z = w_z2 f'_2 + b_z
Advice: if Q'_x > Q'_y and Q'_x > Q'_z then prefer x'
Advice (expanded): if w_x1 f'_1 + w_x2 f'_2 + b_x > w_y1 f'_1 + b_y and w_x1 f'_1 + w_x2 f'_2 + b_x > w_z2 f'_2 + b_z then prefer x' to y' and z'
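
A small sketch of how such advice can be assembled mechanically from the source weights and a feature mapping; the weights, feature names, and mapping below are invented for illustration:

    # Source Q-functions as ({feature: weight}, bias) pairs.
    source_q = {
        "x": ({"f1": 0.8, "f2": 0.3}, 0.1),    # Q_x = 0.8 f1 + 0.3 f2 + 0.1
        "y": ({"f1": 0.5},            0.2),    # Q_y = 0.5 f1 + 0.2
        "z": ({"f2": 0.9},           -0.1),    # Q_z = 0.9 f2 - 0.1
    }
    feature_map = {"f1": "f1_prime", "f2": "f2_prime"}   # source -> target features

    def remap(q):
        weights, bias = q
        return {feature_map[f]: w for f, w in weights.items()}, bias

    def linear_expr(q):
        weights, bias = q
        terms = [f"{w:+g}*{f}" for f, w in weights.items()]
        return " ".join(terms) + f" {bias:+g}"

    mapped = {a: remap(q) for a, q in source_q.items()}

    # "Prefer x' over y' and z'" expanded into inequalities over target features:
    advice = [f"{linear_expr(mapped['x'])} > {linear_expr(mapped[a])}" for a in ("y", "z")]
    print(" AND ".join(advice))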

22 User Advice in Skill Transfer. There may be new skills in the target that cannot be learned from the source. We allow (human) users to add their own advice about these skills. User advice for KeepAway to BreakAway: IF distance(me, GoalPart) < 10 AND angle(GoalPart, me, goalie) > 40 THEN prefer shoot(GoalPart).

23 Sample Human Interaction “Use what you learned in KeepAway, and add in this new action SHOOT.” “Here is some advice about shooting …” “Now go practice for awhile.”

24 Policy Transfer to 3-on-2 BreakAway Torrey, Walker, Shavlik & Maclin: ECML 2005

25 Our Second Approach: use Inductive Logic Programming (ILP) on the SOURCE task to extract advice.
Given: positive and negative examples for each action.
Do: learn first-order rules that describe most positive examples but few negative examples.
Example instances: good_action(pass(t1), state1), good_action(pass(t1), state2), good_action(pass(t1), state3), good_action(pass(t2), state2), good_action(pass(t2), state3)
Learned rule: good_action(pass(Teammate), State) :- distance(me, Teammate, State) > 10, distance(Teammate, goal, State) < 15.
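
Read operationally, the learned clause is just a test over a state's relational facts; a sketch with an invented state representation (not the actual ILP/RoboCup interface):

    # Hypothetical state: relational facts stored as {("distance", obj1, obj2): value}.
    state = {
        ("distance", "me", "t1"):   14.0,
        ("distance", "t1", "goal"): 12.0,
        ("distance", "me", "t2"):    6.0,
        ("distance", "t2", "goal"):  9.0,
    }

    def good_action_pass(teammate, state):
        # good_action(pass(Teammate), State) :-
        #     distance(me, Teammate, State) > 10,
        #     distance(Teammate, goal, State) < 15.
        return (state[("distance", "me", teammate)] > 10 and
                state[("distance", teammate, "goal")] < 15)

    print([t for t in ("t1", "t2") if good_action_pass(t, state)])   # -> ['t1']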

26 Searching for an ILP Clause (top-down search using A*)

27 Skill Transfer to 3-on-2 BreakAway Torrey, Shavlik, Walker & Maclin: ECML 2006, ICML Workshop 2006

28 MoveDownfield to BreakAway Torrey, Shavlik, Walker & Maclin: ECML 2006, ICML Workshop 2006

29 Approach #3: Relational Macros. A relational macro is a finite-state machine. Nodes represent internal states of the agent in which independent policies apply. Conditions for transitions and actions are learned via ILP. [diagram: two-node machine with node policies hold ← true and pass(Teammate) ← isOpen(Teammate), and transitions labeled isClose(Opponent) and allOpponentsFar]
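
A minimal sketch of executing a two-node macro like the one pictured; here the node policies and transition tests are hard-coded predicates over a made-up world description, whereas in the talk's approach both are learned via ILP:

    def is_close(world): return world["nearest_opponent_dist"] < 5
    def all_opponents_far(world): return world["nearest_opponent_dist"] > 15
    def open_teammates(world): return [t for t, is_open in world["open"].items() if is_open]

    def macro_step(node, world):
        """One step of the finite-state macro: maybe transition, then act."""
        if node == "hold":
            if is_close(world) and open_teammates(world):
                node = "pass"                          # transition: isClose(Opponent)
            else:
                return node, "hold"                    # node policy: hold <- true
        if node == "pass":
            if all_opponents_far(world):
                return "hold", "hold"                  # transition back: allOpponentsFar
            teammates = open_teammates(world)
            if teammates:
                return node, "pass(" + teammates[0] + ")"   # policy: pass(T) <- isOpen(T)
        return node, "hold"

    world = {"nearest_opponent_dist": 3, "open": {"t1": True, "t2": False}}
    print(macro_step("hold", world))   # -> ('pass', 'pass(t1)')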

30 Step 1: Learning Macro Structure. Objective: find (via ILP) an action pattern that separates good and bad games.
macroSequence(Game, StateA) ← actionTaken(Game, StateA, move, ahead, StateB), actionTaken(Game, StateB, pass, _, StateC), actionTaken(Game, StateC, shoot, _, gameEnd).
[diagram: macro nodes move(ahead) → pass(Teammate) → shoot(GoalPart)]

31 Step 2: Learning Macro Conditions. Objective: describe when transitions and actions should be taken.
For the transition from move to pass: transition(State) ← distance(Teammate, goal, State) < 15.
For the policy in the pass node: action(State, pass(Teammate)) ← angle(Teammate, me, Opponent, State) > 30.
[diagram: macro nodes move(ahead) → pass(Teammate) → shoot(GoalPart)]

32 Learned 2-on-1 BreakAway Macro. The player with the BALL executes the macro. [diagram: learned macro over move(Direction), pass(Teammate), shoot(goalRight), and shoot(goalLeft); one of the shoot actions is apparently a leading pass]

33 Transfer via Demonstration.
1. Execute the macro strategy to get Q-value estimates.
2. Infer low Q values for actions not taken by the macro.
3. Compute an initial Q function with these examples.
4. Continue learning with standard RL.
Advantage: potential for a large immediate jump in performance. Disadvantage: risk that the agent will blindly follow an inappropriate strategy.
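
A schematic of steps 1-3, turning a macro demonstration into seed Q-value training examples; the high/low target values and the episode format are placeholders, not the method's actual Q-value estimates:

    # Each demonstration step records the state features and the action the
    # macro chose; every other legal action gets a low target Q-value.
    ACTIONS = ["pass", "move", "shoot"]
    HIGH_Q, LOW_Q = 1.0, 0.0                # placeholder targets

    def seed_examples(demonstration):
        """demonstration: list of (feature_vector, action_taken_by_macro)."""
        examples = []                       # (features, action, target_q) triples
        for features, taken in demonstration:
            for action in ACTIONS:
                target = HIGH_Q if action == taken else LOW_Q
                examples.append((features, action, target))
        return examples                     # step 3: fit an initial Q-model to these,
                                            # then continue with standard RL (step 4)

    demo = [([12.0, 30.0], "move"), ([4.0, 55.0], "shoot")]
    print(len(seed_examples(demo)))         # 2 steps x 3 actions = 6 seed examples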

34 Macro Transfer to 3-on-2 BreakAway. Torrey, Shavlik, Walker & Maclin: ILP 2007. [plot includes a comparison to a variant of Taylor & Stone]

35 Macro Transfer to 4-on-3 BreakAway Torrey, Shavlik, Walker & Maclin: ILP 2007

36 Outline: Reinforcement Learning w/ Advice; Transfer via Rule Extraction & Advice Taking; Transfer via Macros; Transfer via Markov Logic Networks; Wrap Up

37 Approach #4: Markov Logic Networks (Richardson and Domingos, MLj 2003). [diagram: network connecting evidence nodes dist1 > 5, ang1 > 45, dist2 < 10 to query nodes 0 ≤ Q < 0.5 and 0.5 ≤ Q < 1.0]. Weighted formulas: IF dist1 > 5 AND ang1 > 45 THEN 0 ≤ Q < 0.5 (Wgt = 2.1); IF dist2 < 10 AND ang1 > 45 THEN 0.5 ≤ Q < 1.0 (Wgt = 1.7).

38 Using MLNs to Learn a Q Function. Perform hierarchical clustering to find a set of good Q-value bins. Use ILP to learn rules that classify examples into bins, e.g. IF dist1 > 5 AND ang1 > 45 THEN 0 ≤ Q < 0.1. Use MLN weight-learning methods to choose weights for these formulas.
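
One way to picture the result: the MLN assigns a probability to each Q-value bin, and these can be collapsed into a single Q estimate, for instance by probability-weighting the bin centers (the bins and probabilities below are invented):

    # Q-value bins found by hierarchical clustering, with hypothetical
    # probabilities that an MLN assigns to the current state/action
    # landing in each bin.
    bins      = [(0.0, 0.1), (0.1, 0.5), (0.5, 1.0)]   # [low, high) Q intervals
    bin_probs = [0.2, 0.5, 0.3]                        # from MLN inference (made up)

    expected_q = sum(p * (lo + hi) / 2.0 for p, (lo, hi) in zip(bin_probs, bins))
    print(round(expected_q, 3))    # 0.2*0.05 + 0.5*0.3 + 0.3*0.75 = 0.385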

39 MLN Transfer to 3-on-2 BreakAway Torrey, Shavlik, Natarajan, Kuppili & Walker: AAAI TL Workshop 2008

40 Outline: Reinforcement Learning w/ Advice; Transfer via Rule Extraction & Advice Taking; Transfer via Macros; Transfer via Markov Logic Networks; Wrap Up

41 Summary of Our Transfer Methods.
1. Directly reuse weighted sums as advice.
2. Use ILP to learn generalized advice for each action.
3. Use ILP to learn macro-operators.
4. Use Markov Logic Networks to learn probability distributions for Q functions.

42 Our Desiderata for Transfer in RL: transfer knowledge in first-order logic; accept advice from humans expressed naturally; refine transferred knowledge; improve performance in related target tasks. Major challenge: avoid negative transfer.

43 Related Work in RL Transfer: value-function transfer (Taylor & Stone 2005); policy reuse (Fernandez & Veloso 2006); state abstractions (Walsh et al. 2006); options (Croonenborghs et al. 2007). A Torrey and Shavlik survey paper is online.

44 Conclusion. Transfer learning is an important perspective for machine learning: move beyond isolated learning tasks. Appealing ways to do transfer learning are via the advice-taking and demonstration perspectives. Long-term goal: instructable computing, teaching computers the same way we teach humans.

