Slide 1
Lisa Torrey, University of Wisconsin – Madison, CS 540
Slide 2
Transfer in human learning:
- Education: a hierarchical curriculum works because learning tasks share common stimulus-response elements.
- Abstract problem-solving: learning tasks share general underlying principles.
- Multilingualism: knowing one language affects learning in another.
Transfer can be both positive and negative.
Slide 3
Given a source task S, learn a target task T.
Slide 4
[Figure: performance vs. training curves illustrating the three potential benefits of transfer: a higher start, a higher slope, and a higher asymptote.]
Slide 5
[Figure: hypothesis-space view of transfer — the search is narrowed from all hypotheses to a smaller set of allowed hypotheses.]
Slide 6
[Figure: the same hypothesis-space view.] Thrun and Mitchell 1995: transfer slopes for gradient descent.
Slide 7
Bayesian learning and Bayesian transfer: prior distribution + data = posterior distribution. In Bayesian methods, the transferred knowledge enters as the prior. Raina et al. 2006: transfer a Gaussian prior.
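To make the "prior + data = posterior" picture concrete, here is a minimal sketch of Gaussian-prior transfer for Bayesian linear regression, where weights learned on the source task serve as the prior mean for the target task. The names and numbers are illustrative, not from Raina et al. 2006.

import numpy as np

def posterior_weights(X, y, prior_mean, prior_var=1.0, noise_var=0.1):
    # Bayesian linear regression: posterior combines the (transferred)
    # Gaussian prior with the target-task data.
    d = X.shape[1]
    prior_prec = np.eye(d) / prior_var
    post_cov = np.linalg.inv(prior_prec + X.T @ X / noise_var)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

# Source-task weights act as the prior mean for the target task.
w_source = np.array([0.5, -1.2])           # learned on the source task
X_target = np.random.randn(10, 2)          # small target-task sample
y_target = X_target @ np.array([0.6, -1.0]) + 0.1 * np.random.randn(10)
w_post, _ = posterior_weights(X_target, y_target, prior_mean=w_source)

With little target data the posterior stays close to the transferred prior; with more data it is dominated by the target task, which is exactly the behavior transfer should have.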
Slide 8
Hierarchical methods: concepts are learned in a hierarchy, with higher-level concepts built from lower-level ones. [Figure: a concept hierarchy over line, curve, surface, circle, and pipe.] Stracuzzi 2006: learn Boolean concepts that can depend on each other.
Slide 9
Dealing with missing data or labels: the source task can compensate when target-task data or labels are scarce. [Diagram: Task S → Task T.] Shi et al. 2008: transfer via active learning.
Slide 10
[Diagram: the reinforcement learning loop between agent and environment.]
The agent begins with Q(s1, a) = 0, chooses π(s1) = a1, and the environment responds with δ(s1, a1) = s2 and reward r(s1, a1) = r2. The agent updates Q(s1, a1) ← Q(s1, a1) + Δ, chooses π(s2) = a2, and the environment responds with δ(s2, a2) = s3 and reward r(s2, a2) = r3.
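A minimal tabular Q-learning sketch of this loop, assuming a generic environment object with reset(), step(), and an actions list (a hypothetical interface, not any particular library's):

import random
from collections import defaultdict

def q_learning_episode(env, Q, alpha=0.1, gamma=0.9, epsilon=0.1):
    s = env.reset()
    done = False
    while not done:
        # pi(s): epsilon-greedy choice over Q(s, a)
        if random.random() < epsilon:
            a = random.choice(env.actions)
        else:
            a = max(env.actions, key=lambda act: Q[(s, act)])
        s_next, r, done = env.step(a)  # delta(s, a) = s_next, r(s, a) = r
        # Q(s, a) <- Q(s, a) + Delta, with the TD error as Delta
        best_next = 0.0 if done else max(Q[(s_next, act)] for act in env.actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

Q = defaultdict(float)  # Q(s, a) = 0 initially, as on the slide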
Slide 11
Transfer approaches in RL:
- Starting-point methods
- Hierarchical methods
- Alteration methods
- Imitation methods
- New RL algorithms
Slide 12
Starting-point methods: instead of beginning target-task training from an all-zero initial Q-table (no transfer), begin from Q-values transferred from the source task. [Figure: an all-zero initial Q-table vs. one seeded with source-task values.] Taylor et al. 2005: value-function transfer.
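A minimal sketch of this idea, assuming hand-written map_state and map_action helpers that translate target-task entries into source-task ones (hypothetical placeholders, not Taylor et al.'s exact mapping):

from collections import defaultdict

def init_target_q(Q_source, map_state, map_action, target_states, target_actions):
    # Unmapped entries default to 0, which is the "no transfer" behavior.
    Q_target = defaultdict(float)
    for s in target_states:
        for a in target_actions:
            key = (map_state(s), map_action(a))
            if key in Q_source:
                # Seed the target-task Q-table with the source-task value.
                Q_target[(s, a)] = Q_source[key]
    return Q_target

Target-task training then proceeds as usual; the transferred values only change where learning starts, not the learning algorithm itself.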
Slide 13
Hierarchical methods: [Figure: a soccer task hierarchy — Run and Kick at the bottom, Pass and Shoot above them, Soccer at the top.] Mehta et al. 2008: transfer a learned hierarchy.
Slide 14
Alteration methods: modify the target task itself — its states, actions, or rewards. [Diagram: Task S with original states, original actions, and original rewards mapped to new states, new actions, and new rewards.] Walsh et al. 2006: transfer aggregate states.
Slide 15
New RL algorithms: build transfer directly into the learner. [Diagram: the same agent-environment RL loop as on slide 10.] Torrey et al. 2006: transfer advice about skills.
Slide 16
Imitation methods: [Figure: a training timeline showing where the source-task policy is used during target-task training.] Torrey et al. 2007: demonstrate a strategy.
Slide 17
- Starting-point methods
- Imitation methods
- Hierarchical methods
- New RL algorithms
Focus: Skill Transfer and Macro Transfer.
Slide 18
RoboCup soccer tasks: 3-on-2 BreakAway, 3-on-2 KeepAway, 3-on-2 MoveDownfield, 2-on-1 BreakAway.
Slide 19
Candidate rules explored for pass(Teammate):
- IF [ ] THEN pass(Teammate)
- IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
- IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
- IF distance(Teammate) ≤ 5 THEN pass(Teammate)
- IF distance(Teammate) ≤ 10 THEN pass(Teammate)
- …
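As an illustration of what such rules mean (not the ILP system's internal representation), each one translates directly into an executable predicate over state features; the state interface below is hypothetical:

def pass_rule_general(state, teammate):
    # IF distance(Teammate) <= 10 THEN pass(Teammate)
    return state.distance(teammate) <= 10

def pass_rule_refined(state, teammate, opponent):
    # IF distance(Teammate) <= 5 AND angle(Teammate, Opponent) >= 30
    # THEN pass(Teammate)
    return (state.distance(teammate) <= 5
            and state.angle(teammate, opponent) >= 30)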
Slide 20
Batch reinforcement learning via support vector regression (RL-SVR): the agent alternates between collecting a batch of experience from the environment and computing Q-functions from it. Find Q-functions that minimize: ModelSize + C × DataMisfit.
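A minimal sketch of the ModelSize + C × DataMisfit trade-off for a linear Q-function, solved in closed form as ridge-style regression (an illustration of the objective, not the exact RL-SVR formulation):

import numpy as np

def fit_q_batch(X, q_targets, C=10.0):
    # Minimize ||w||^2 + C * ||X w - q||^2 in closed form.
    # Setting the gradient to zero gives (I + C X^T X) w = C X^T q.
    d = X.shape[1]
    return np.linalg.solve(np.eye(d) + C * X.T @ X, C * X.T @ q_targets)

Larger C emphasizes fitting the batch data; smaller C keeps the model small, which is the regularization trade-off the slide's objective expresses.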
Slide 21
Batch reinforcement learning with advice (KBKR): the same batch loop, with advice contributing an extra term. Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit.
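A sketch of how the advice term changes the objective: advice of the form "in states matching a condition, the advised action's Q-value should be at least some bound" adds a hinge penalty weighted by µ. This is an illustration of the idea, not the exact KBKR linear program:

import numpy as np

def objective(w, X, q_targets, advice_X, advice_bounds, C=10.0, mu=1.0):
    model_size = w @ w
    data_misfit = np.sum((X @ w - q_targets) ** 2)
    # Hinge penalty: only states where the model's Q-value falls below
    # the advised bound contribute to AdviceMisfit.
    advice_misfit = np.sum(np.maximum(0.0, advice_bounds - advice_X @ w))
    return model_size + C * data_misfit + mu * advice_misfit

Because the advice only adds a soft penalty, the learner can overrule bad advice when the data contradicts it, which is one defense against negative transfer.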
Slide 22
Skill transfer pipeline: ILP learns a rule in the source task, e.g., IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate). A mapping translates the rule to the target task, where it is given to the learner via advice taking, optionally alongside human advice.
Slide 23
[Results figure: skill transfer to 3-on-2 BreakAway from several tasks.]
Slide 24
Macro structure over the actions pass(Teammate), move(Direction), shoot(goalRight), and shoot(goalLeft), with a rule at each step:
- IF [ … ] THEN pass(Teammate)
- IF [ … ] THEN move(ahead)
- IF [ … ] THEN shoot(goalRight)
- IF [ … ] THEN shoot(goalLeft)
- IF [ … ] THEN pass(Teammate)
- IF [ … ] THEN move(left)
- IF [ … ] THEN shoot(goalRight)
- IF [ … ] THEN shoot(goalRight)
Slide 25
An imitation method: [Figure: a training timeline showing where the source-task policy is used during target-task training.]
Slide 26
Demonstration pipeline: ILP learns a macro from the source task, and the macro is demonstrated in the target task.
Slide 27
Learning macro structures with ILP. Positive examples: BreakAway games that score. Negative examples: BreakAway games that do not score. Example learned clause:
IF actionTaken(Game, StateA, pass(Teammate), StateB)
AND actionTaken(Game, StateB, move(Direction), StateC)
AND actionTaken(Game, StateC, shoot(goalRight), StateD)
AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)
Slide 28
Learning rules for arcs with ILP. Positive examples: states in good games that took the arc. Negative examples: states in good games that could have taken the arc but did not. Example rule forms (for an arc between pass(Teammate) and shoot(goalRight)):
- IF [ … ] THEN enter(State)
- IF [ … ] THEN loop(State, Teammate)
Slide 29
[Results figure: macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway.]
Slide 30
Machine learning is usually designed for standalone tasks. Transfer is a natural learning ability that we would like to incorporate into machine learners. There are some successes, but challenges remain, such as avoiding negative transfer and automating the mapping between tasks.