Lisa Torrey University of Wisconsin – Madison CS 540
Transfer in human education:
- Hierarchical curriculum: learning tasks share common stimulus-response elements.
- Abstract problem-solving: learning tasks share general underlying principles.
- Multilingualism: knowing one language affects learning in another.
Transfer can be both positive and negative.
The transfer learning setting: given a source task S, learn a target task T.
[Learning-curve figure: performance vs. training. Transfer can give a higher start, a higher slope, and a higher asymptote.]
Inductive transfer narrows the hypothesis search: of all hypotheses, only an allowed subset is searched. Thrun and Mitchell 1995: transfer slopes for gradient descent.
Bayesian learning and Bayesian transfer: prior distribution + data = posterior distribution. In Bayesian methods, transferred knowledge enters through the prior. Raina et al. 2006: transfer a Gaussian prior.
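The prior + data = posterior recipe can be sketched for a linear model: a Gaussian prior centered on weights fit in a data-rich source task regularizes a MAP estimate in a data-poor target task. This is a sketch in the spirit of Raina et al., not their method; all data, dimensions, and parameter values are illustrative assumptions.

```python
import numpy as np

def map_estimate(X, y, prior_mean, prior_precision, noise_var=1.0):
    """MAP weights for y = X w + noise under a Gaussian prior
    N(prior_mean, prior_precision^-1): solve
    (X^T X / noise_var + P) w = X^T y / noise_var + P @ prior_mean."""
    P = prior_precision
    A = X.T @ X / noise_var + P
    b = X.T @ y / noise_var + P @ prior_mean
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])

# Source task: plenty of data; its fitted weights become the prior mean.
X_src = rng.normal(size=(200, 2))
y_src = X_src @ w_true + 0.1 * rng.normal(size=200)
w_src = np.linalg.lstsq(X_src, y_src, rcond=None)[0]

# Target task: only 5 examples; the transferred prior stabilizes the fit.
X_tgt = rng.normal(size=(5, 2))
y_tgt = X_tgt @ w_true + 0.1 * rng.normal(size=5)
w_tgt = map_estimate(X_tgt, y_tgt,
                     prior_mean=w_src,
                     prior_precision=10.0 * np.eye(2))
```

With a weak or misleading source prior the same machinery can hurt the target fit, which is one way negative transfer shows up in the Bayesian setting.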
Hierarchical methods: simple concepts become components of more complex ones (e.g., line and curve build toward circle; surface and circle toward pipe). Stracuzzi 2006: learn Boolean concepts that can depend on each other.
Dealing with missing data or labels: use source task S when target task T has few labels. Shi et al. 2008: transfer via active learning.
The reinforcement learning loop: in state s1 the agent, starting with Q(s1, a) = 0, chooses a1 = π(s1); the environment returns s2 = δ(s1, a1) and reward r2 = r(s1, a1); the agent updates Q(s1, a1) ← Q(s1, a1) + Δ, chooses a2 = π(s2), and continues with s3 = δ(s2, a2), r3 = r(s2, a2), and so on.
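The loop above can be sketched as tabular Q-learning on a toy chain MDP. The chain, its reward, and all parameter values are illustrative assumptions, not from the talk.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a chain: action 1 moves right, action 0 moves
    left; entering the last state gives reward 1 and ends the episode."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    rng = random.Random(0)
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection: pi(s)
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            # environment step: s' = delta(s, a), r = r(s, a)
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
```

After training, the table prefers moving right everywhere, and Q for the last move approaches the reward of 1.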
Approaches to transfer in RL: starting-point methods, hierarchical methods, alteration methods, imitation methods, and new RL algorithms.
Starting-point methods: instead of a blank initial Q-table, begin target-task training from values transferred from the source task (transfer vs. no transfer). Taylor et al. 2005: value-function transfer.
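A minimal sketch of a starting-point method, assuming hand-coded inter-task mappings; the dictionaries, state names, and action names here are hypothetical, not from the cited work.

```python
def transfer_q_table(source_q, state_map, action_map,
                     target_states, target_actions):
    """Initialize a target-task Q-table from a source-task Q-table.
    state_map/action_map send target states/actions to their source
    analogues; unmapped entries keep the default value of zero."""
    Q = {(s, a): 0.0 for s in target_states for a in target_actions}
    for s in target_states:
        for a in target_actions:
            src = (state_map.get(s), action_map.get(a))
            if src in source_q:
                Q[(s, a)] = source_q[src]
    return Q

# Hypothetical example: one source value carries over, the rest start at 0.
source_q = {("near_goal", "shoot"): 0.8}
Q0 = transfer_q_table(
    source_q,
    state_map={"near_goalL": "near_goal"},
    action_map={"shoot_left": "shoot"},
    target_states=["near_goalL", "midfield"],
    target_actions=["shoot_left", "pass"])
```

Target-task training then proceeds from Q0 rather than from all zeros, which is what gives the "higher start" on the learning curve.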
Hierarchical methods: compose low-level skills (Run, Kick) into higher-level ones (Pass, Shoot) and ultimately Soccer. Mehta et al. 2008: transfer a learned hierarchy.
Alteration methods: alter task S itself, turning its original states, actions, and rewards into new states, actions, and rewards. Walsh et al. 2006: transfer aggregate states.
New RL algorithms: build the transferred knowledge into the learning algorithm itself, inside the standard agent-environment loop. Torrey et al. 2006: transfer advice about skills.
Imitation methods: during part of target-task training, the agent follows the source policy rather than its own. Torrey et al. 2007: demonstrate a strategy.
Approaches recap: starting-point methods, imitation methods, hierarchical methods, new RL algorithms. Two case studies follow: skill transfer and macro transfer.
RoboCup soccer tasks: 3-on-2 BreakAway, 3-on-2 KeepAway, 3-on-2 MoveDownfield, 2-on-1 BreakAway.
Candidate rules explored while learning the pass(Teammate) skill:
IF [ ] THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 THEN pass(Teammate)
IF distance(Teammate) ≤ 10 THEN pass(Teammate)
…
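The search over candidate rules can be illustrated with a toy ILP-style scoring loop: each candidate precondition is a predicate, and candidates are scored by how many positive example states they cover versus negatives. The feature names (dist, angle), example states, and scoring function are invented for illustration.

```python
def score_rule(rule, positives, negatives):
    """Score a candidate IF-THEN rule: positive states covered
    minus negative states covered (a crude ILP-style criterion)."""
    tp = sum(rule(s) for s in positives)
    fp = sum(rule(s) for s in negatives)
    return tp - fp

# Candidate preconditions for pass(Teammate), general to specific.
candidates = {
    "always": lambda s: True,
    "dist<=10": lambda s: s["dist"] <= 10,
    "dist<=5": lambda s: s["dist"] <= 5,
    "dist<=5 & angle>=30": lambda s: s["dist"] <= 5 and s["angle"] >= 30,
}
# Hypothetical labeled states: positives are states where passing worked.
positives = [{"dist": 4, "angle": 35}, {"dist": 3, "angle": 40}]
negatives = [{"dist": 4, "angle": 10}, {"dist": 12, "angle": 50}]

best = max(candidates,
           key=lambda name: score_rule(candidates[name], positives, negatives))
```

On these toy examples the most specific clause wins because it excludes both negatives while still covering every positive.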
Batch reinforcement learning via support vector regression (RL-SVR): the agent alternates between collecting batches of experience from the environment and computing Q-functions that minimize ModelSize + C × DataMisfit.
Batch reinforcement learning with advice (KBKR): the same loop, but the Q-functions now minimize ModelSize + C × DataMisfit + µ × AdviceMisfit, so transferred advice shapes the solution.
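The objective ModelSize + C × DataMisfit + µ × AdviceMisfit can be sketched for a linear Q-model, with the advice term as a hinge penalty ("Q should be at least this high in these states"). This is a rough gradient-descent stand-in for the KBKR solver, not the original formulation; all data and parameter values are invented.

```python
import numpy as np

def fit_with_advice(X, y, X_adv, y_adv_min, C=1.0, mu=1.0,
                    lr=0.01, steps=2000):
    """Fit linear Q ~= w.x by gradient descent on
    ||w||^2 + C * mean squared data misfit + mu * mean hinge advice misfit,
    where advice demands Q >= y_adv_min on the advice states."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * w                                    # ModelSize term
        resid = X @ w - y
        grad += C * 2 * X.T @ resid / len(y)            # DataMisfit term
        slack = np.maximum(0.0, y_adv_min - X_adv @ w)  # AdviceMisfit (hinge)
        grad += mu * (-X_adv.T @ (slack > 0)) / len(y_adv_min)
        w -= lr * grad
    return w

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])
# Satisfied advice leaves the data fit alone; demanding advice pulls w up.
w0 = fit_with_advice(X, y, np.array([[1.0]]), np.array([0.0]), C=10, mu=20)
w1 = fit_with_advice(X, y, np.array([[1.0]]), np.array([1.5]), C=10, mu=20)
```

The trade-off is explicit: µ controls how much a learned Q-function may deviate from the transferred advice, which lets bad advice be overridden by data.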
Skill transfer pipeline: ILP learns skill rules in the source task (e.g., IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)); a mapping translates them to the target task; advice taking incorporates them, possibly alongside human advice.
Skill transfer to 3-on-2 BreakAway from several tasks
A macro is a strategy over action nodes: pass(Teammate) → move(Direction) → shoot(goalRight) → shoot(goalLeft). Each node carries its own rule set, for example:
IF [... ] THEN pass(Teammate)
IF [... ] THEN move(ahead)
IF [... ] THEN move(left)
IF [... ] THEN shoot(goalRight)
IF [... ] THEN shoot(goalLeft)
Macro transfer is an imitation method: during early target-task training, the agent uses the policy learned in the source task.
Macro transfer pipeline: ILP learns a macro in the source task; demonstration carries it into the target task.
Learning macro structures with ILP. Positive examples: BreakAway games that score; negative examples: BreakAway games that do not score. An example learned structure:
IF actionTaken(Game, StateA, pass(Teammate), StateB)
AND actionTaken(Game, StateB, move(Direction), StateC)
AND actionTaken(Game, StateC, shoot(goalRight), StateD)
AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)
Learning rules for the macro's arcs (e.g., from pass(Teammate) to shoot(goalRight)) with ILP. Positive examples: states in good games that took the arc; negative examples: states in good games that could have taken the arc but did not. Rule forms:
IF [ … ] THEN enter(State)
IF [ … ] THEN loop(State, Teammate)
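A macro with arc rules can be executed as a small finite-state machine: fire the current node's action, then follow the first arc whose rule holds. The node names, feature names, and rule predicates below are hypothetical stand-ins for learned ILP rules.

```python
def run_macro(macro, start, state_features, max_steps=20):
    """Execute a macro strategy. `macro` maps each node to
    (action, [(rule, next_node), ...]); a rule is a predicate over
    state features. Execution stops when no arc rule fires."""
    actions, node = [], start
    for _ in range(max_steps):
        action, arcs = macro[node]
        actions.append(action)
        nxt = next((dst for rule, dst in arcs if rule(state_features)), None)
        if nxt is None:
            break
        node = nxt
    return actions

# Hypothetical 3-node macro in the spirit of the BreakAway strategy above.
macro = {
    "pass":  ("pass(Teammate)",   [(lambda s: s["dist_teammate"] <= 5, "move")]),
    "move":  ("move(ahead)",      [(lambda s: s["dist_goal"] <= 10, "shoot")]),
    "shoot": ("shoot(goalRight)", []),
}
plan = run_macro(macro, "pass", {"dist_teammate": 4, "dist_goal": 8})
```

Demonstrating such a macro in the target task gives the learner a sensible initial policy to imitate before its own Q-function takes over.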
Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway
Machine learning is often designed for standalone tasks. Transfer is a natural learning ability that we would like to incorporate into machine learners. There are some successes, but challenges remain, such as avoiding negative transfer and automating the inter-task mapping.