1
Machine Learning via Advice Taking Jude Shavlik
2
Thanks To... Rich Maclin, Lisa Torrey, Trevor Walker, Prof. Olvi Mangasarian, Glenn Fung, Ted Wild, and DARPA
3
Quote (2002) from DARPA Sometimes an assistant will merely watch you and draw conclusions. Sometimes you have to tell a new person, 'Please don't do it this way' or 'From now on when I say X, you do Y.' It's a combination of learning by example and by being guided.
4
Widening the “Communication Pipeline” between Humans and Machine Learners
[figure: teacher (human) and pupil (the machine learner)]
5
Our Approach to Building Better Machine Learners
Human partner expresses advice “naturally” and without knowledge of the ML agent’s internals
Agent incorporates advice directly into the function it is learning
Additional feedback (rewards, I/O pairs, inferred labels, more advice) is used to continually refine the learner
6
“Standard” Machine Learning vs. Theory Refinement
Positive examples (“should see doctor”): temp = 102.1, age = 21, sex = F, …; temp = 101.7, age = 37, sex = M, …
Negative examples (“take two aspirins”): temp = 99.1, age = 43, sex = M, …; temp = 99.6, age = 24, sex = F, …
Approximate domain knowledge: if temp = high and age = young … then negative example
Related work by the labs of Mooney, Pazzani, Cohen, Giles, etc.
7
Rich Maclin’s PhD (1995)
IF   a Bee is (Near and West) &
     an Ice is (Near and North)
THEN BEGIN
     Move East
     Move North
END
8
Sample Results
[plot: learning curves with advice vs. without advice]
9
Our Motto Give advice rather than commands to your computer
10
Outline
Prior Knowledge and Support Vector Machines: intro to SVMs, linear separation, non-linear separation, function fitting (“regression”)
Advice-Taking Reinforcement Learning
Transfer Learning via Advice Taking
11
Support Vector Machines: Maximizing the Margin between Bounding Planes
[figure: classes A+ and A−, with support vectors on the bounding planes and the margin between them]
12
Linear Algebra for SVMs
Given p points in n-dimensional space, represent them by a p-by-n matrix A of reals; each point A_i is in class +1 or −1
Separate the classes by two bounding planes, written more succinctly in matrix form (where e is a vector of ones); a sketch of this formulation follows below
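A sketch of the matrix form these slides appear to rely on, in standard Mangasarian-style SVM notation (the diagonal label matrix D is an assumption; it is not named on the slide):

$$
x'w = \gamma + 1, \qquad x'w = \gamma - 1 \qquad \text{(the two bounding planes)},
$$
$$
D(Aw - e\gamma) \ge e \qquad \text{(the separation conditions, written succinctly)},
$$

where D is the p-by-p diagonal matrix with D_ii = +1 or −1 according to the class of A_i, so rows labeled +1 satisfy A_i w ≥ γ + 1 and rows labeled −1 satisfy A_i w ≤ γ − 1.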
13
“Slack” Variables: Dealing with Data that is not Linearly Separable
[figure: overlapping classes A+ and A−, support vectors, and slack y for points that fall on the wrong side of their bounding plane]
14
Support Vector Machines: Quadratic Programming Formulation
Solve this quadratic program:
  min over (w, γ, y):   ½ w'w + C e'y
  s.t.  D(Aw − eγ) + y ≥ e,   y ≥ 0
Maximize the margin (2/‖w‖) by minimizing ½ w'w; minimize the sum of slack variables y, weighted by C
15
Support Vector Machines: Linear Programming Formulation
Use the 1-norm instead of the 2-norm (typically runs faster; better feature selection; might generalize better, NIPS ’03):
  min over (w, γ, y):   ‖w‖₁ + C e'y
  s.t.  D(Aw − eγ) + y ≥ e,   y ≥ 0
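A minimal sketch of solving this 1-norm SVM as a linear program with scipy.optimize.linprog, following the reconstruction above; the auxiliary variables a = |w|, the parameter name C, and the whole encoding are illustrative choices, not the authors' code:

import numpy as np
from scipy.optimize import linprog

def linear_svm_1norm(A, labels, C=1.0):
    """1-norm soft-margin SVM as an LP:  min ||w||_1 + C * sum(y)
       s.t. D(Aw - e*gamma) + y >= e,  y >= 0  (|w| handled via a >= +/- w)."""
    p, n = A.shape
    D = np.diag(labels)                     # diagonal matrix of +/-1 class labels
    e = np.ones(p)

    # variable vector z = [w (n), a (n), gamma (1), y (p)]
    c = np.concatenate([np.zeros(n), np.ones(n), [0.0], C * np.ones(p)])

    # D(Aw - e*gamma) + y >= e   ->   -DA w + De gamma - y <= -e
    row1 = np.hstack([-D @ A, np.zeros((p, n)), (D @ e)[:, None], -np.eye(p)])
    #  w - a <= 0  and  -w - a <= 0   (so a >= |w|)
    row2 = np.hstack([np.eye(n), -np.eye(n), np.zeros((n, 1)), np.zeros((n, p))])
    row3 = np.hstack([-np.eye(n), -np.eye(n), np.zeros((n, 1)), np.zeros((n, p))])
    A_ub = np.vstack([row1, row2, row3])
    b_ub = np.concatenate([-e, np.zeros(2 * n)])

    bounds = [(None, None)] * (2 * n + 1) + [(0, None)] * p   # only slacks y are >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w, gamma = res.x[:n], res.x[2 * n]
    return w, gamma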
16
Knowledge-Based SVMs: Generalizing an “Example” from a POINT to a REGION
[figure: a knowledge-set region, shown alongside the A+ and A− points]
17
Incorporating “Knowledge Sets” Into the SVM Linear Program
Suppose the knowledge set {x : Bx ≤ d} belongs to class A+; hence it must lie in the half-space {x : x'w ≥ γ + 1}
We therefore have the implication:  Bx ≤ d  ⟹  x'w ≥ γ + 1
This implication is equivalent to a set of linear constraints (proof in the NIPS ’02 paper)
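One way to write the equivalent constraints, via linear-programming duality as in the knowledge-based SVM papers (the exact notation here is assumed, not taken from the slide): assuming the knowledge set {x : Bx ≤ d} is nonempty,

$$
\big(Bx \le d \;\Rightarrow\; x'w \ge \gamma + 1\big)
\;\Longleftrightarrow\;
\exists\, u \ge 0 :\;\; B'u + w = 0,\;\; d'u + \gamma + 1 \le 0 .
$$

These conditions are linear in (u, w, γ), so they can be added directly to the SVM linear program.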
18
Resulting LP for KBSVMs
We get a linear program (LP) whose advice constraints range over the number of knowledge-set regions (one copy of the constraints, with its own multipliers, per region)
19
KBSVM with Slack Variables
The advice constraints are given slack variables (right-hand sides that were 0 in the hard version are relaxed), so violations of the advice are penalized rather than forbidden
20
SVMs and Non-Linear Separating Surfaces
Non-linearly map to a new space, e.g., (f_1, f_2) → (h(f_1, f_2), g(f_1, f_2))
Linearly separate in the new space (using kernels)
The result is a non-linear separator in the original space
Fung et al. (2003) present knowledge-based non-linear SVMs
[figure: + and − points that are not linearly separable in (f_1, f_2) become linearly separable after the mapping]
21
Support Vector Regression (aka Kernel Regression)
Linearly approximate a function, given an array A of inputs and a vector y of (numeric) outputs: f(x) ≈ x'w + b
Find weights such that Aw + be ≈ y
In the dual space, w = A'α, so we get (AA')α + be ≈ y
Kernelizing (to get a non-linear approximation): K(A, A')α + be ≈ y
22
What to Optimize?
Linear program to optimize:  min ‖w‖₁ + C ‖s‖₁  s.t.  y − s ≤ Aw + be ≤ y + s
The 1st term (‖w‖₁) is a “regularizer” that minimizes model complexity
The 2nd term is the approximation error s, weighted by the parameter C
This reduces to the classical “least squares” fit if the quadratic (2-norm) version is used and the first term is ignored
23
Predicting y for a New x
y = K(x', A')α + b
Use the kernel to compute a “distance” to each training point (i.e., each row in A)
Weight by α_i (hopefully many α_i are zero) and sum
Add b (a scalar)
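A minimal numpy sketch of this fit-and-predict pipeline, using a Gaussian kernel and a regularized least-squares solve in place of the 1-norm linear program described on these slides (the kernel choice and the lam/gamma parameters are illustrative):

import numpy as np

def rbf_kernel(X, Z, gamma=0.1):
    # K[i, j] = exp(-gamma * ||X_i - Z_j||^2)
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def fit(A, y, lam=1e-2, gamma=0.1):
    # Solve for alpha and b in  K(A, A') alpha + b e  ≈  y  (regularized least squares
    # here, instead of the 1-norm LP on the slides)
    p = A.shape[0]
    K = rbf_kernel(A, A, gamma)
    M = np.hstack([K + lam * np.eye(p), np.ones((p, 1))])   # last column carries b
    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
    return coef[:-1], coef[-1]                               # alpha (length p), scalar b

def predict(x_new, A, alpha, b, gamma=0.1):
    # y = K(x', A') alpha + b: kernel "distances" to each training row, weighted and summed
    return rbf_kernel(np.atleast_2d(x_new), A, gamma) @ alpha + b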
24
Knowledge-Based SVR (Mangasarian, Shavlik, & Wild, JMLR ’04)
Add soft constraints to the linear program (so the learner need only follow the advice approximately):
  minimize  ‖w‖₁ + C ‖s‖₁ + a penalty for violating advice
  such that  y − s ≤ Aw + be ≤ y + s,  together with a “slacked” match to the advice
Example advice: “in this region, y should exceed 4”
25
Testbeds: Subtasks of RoboCup
Mobile KeepAway: keep the ball from opponents [Stone & Sutton, ICML 2001]
BreakAway: score a goal [Maclin et al., AAAI 2005]
26
Reinforcement Learning Overview
Receive a state (described by a set of features)
Take an action
Receive a reward
Use the rewards to estimate the Q-values of actions in states
Policy: choose the action with the highest Q-value in the current state
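A generic tabular Q-learning sketch of this receive-state / take-action / receive-reward loop (the env.reset()/env.step() interface and all parameter values are hypothetical; the system described in this talk approximates Q-values with kernel regression rather than a table):

import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, lr=0.1, gamma=0.9, eps=0.1):
    # Q[(state, action)] -> current estimate of that action's value in that state
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()                      # receive a state (a tuple of features)
        done = False
        while not done:
            if random.random() < eps:            # occasional exploration
                action = random.choice(actions)
            else:                                 # policy: highest-Q action in current state
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)   # take an action, receive a reward
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += lr * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q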
27
Incorporating Advice in KBKR
Advice format:  Bx ≤ d  ⟹  f(x) ≥ hx + β
Example:  If distanceToGoal ≤ 10 and shotAngle ≥ 30, then Q(shoot) ≥ 0.9
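One way to instantiate that format for the example, assuming the state vector is x = (distanceToGoal, shotAngle) in that order (the ordering is an assumption):

$$
B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},\qquad
d = \begin{pmatrix} 10 \\ -30 \end{pmatrix},\qquad
h = (0 \;\; 0),\qquad \beta = 0.9,
$$

so Bx ≤ d encodes distanceToGoal ≤ 10 and shotAngle ≥ 30, and f(x) ≥ hx + β becomes Q(shoot) ≥ 0.9.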
28
Giving Advice About Relative Values of Multiple Functions (Maclin et al., AAAI ’05)
When the input satisfies preconditions(input), then f_1(input) > f_2(input)
29
Sample Advice-Taking Results
Advice: if distanceToGoal ≤ 10 and shotAngle ≥ 30, then prefer shoot over all other actions (Q(shoot) > Q(pass), Q(shoot) > Q(move))
[plot: learning curves for advice vs. standard RL on 2-vs-1 BreakAway, rewards +1 / −1]
30
Transfer Learning
Agent learns Task A (the source), then encounters a related Task B (the target)
Agent uses knowledge from Task A to learn Task B faster
Ideally the agent discovers how the tasks are related; we use a user-provided mapping to tell the agent this
31
Transfer Learning: The Goal for the Target Task
[plot: performance vs. training, for learning with transfer and without transfer]
Desired benefits: better start, faster rise, better asymptote
32
Our Transfer Algorithm
Observe source-task games
Use ILP to learn skills from those games
Translate the learned skills into transfer advice for the target task
If there is user advice, add it in
Learn the target task with KBKR
33
Learning Skills By Observation
Source-task games are sequences of (state, action) pairs
Learning skills is like learning to classify states by their correct actions
ILP = Inductive Logic Programming
Example state: distBetween(me, teammate2) = 15, distBetween(me, teammate1) = 10, distBetween(me, opponent1) = 5, ...; action = pass(teammate2); outcome = caught(teammate2)
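A rough sketch of how observed traces might be split into positive and negative examples for one skill before handing them to ILP (the trace format, the success test on the outcome, and the function name are all illustrative; the real system works over relational state descriptions):

def build_skill_examples(games, skill):
    """Split observed (state, action, outcome) triples into positive and negative
    training examples for one skill, e.g. skill = "pass"."""
    positives, negatives = [], []
    for game in games:                       # each game: a list of (state, action, outcome)
        for state, action, outcome in game:
            if action.startswith(skill) and outcome.startswith("caught"):
                positives.append((state, action))    # the skill was used and it worked
            elif not action.startswith(skill):
                negatives.append((state, action))    # a different action was chosen here
    return positives, negatives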
34
ILP: Searching for First-Order Rules
The search starts from the most general clause, P :- true, and refines it one literal at a time: P :- Q, P :- R, P :- S; then P :- R, Q and P :- R, S; and so on, down to clauses such as P :- R, S, V, W, X
We also use a random-sampling approach
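A toy sketch of that random-sampling refinement search over clause bodies (the score function, which in practice would measure how well a body covers the observed states and actions, is a placeholder):

import random

def refinements(body, literals):
    # all ways to extend a clause body by one new literal (top-down refinement)
    return [body + [lit] for lit in literals if lit not in body]

def random_sampling_search(literals, score, depth=5, width=10):
    # Start from the most general clause (empty body, i.e. "P :- true") and,
    # instead of expanding the whole refinement tree, sample a few children
    # at each level, keeping the best-scoring body seen so far.
    best_body, best_score = [], score([])
    frontier = [[]]
    for _ in range(depth):
        children = [c for body in frontier for c in refinements(body, literals)]
        if not children:
            break
        frontier = random.sample(children, min(width, len(children)))
        for body in frontier:
            s = score(body)
            if s > best_score:
                best_body, best_score = body, s
    return best_body, best_score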
35
Advantages of ILP
Can produce first-order rules for skills, e.g., pass(Teammate) rather than separate rules pass(teammate1), ..., pass(teammateN)
Captures only the essential aspects of the skill; we expect these aspects to transfer better
Can incorporate background knowledge
36
Example of a Skill Learned by ILP from KeepAway
pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.
We also gave “human” advice about shooting, since that is a new skill in BreakAway
37
TL Level 7: KA to BA Raw Curves
38
TL Level 7: KA to BA Averaged Curves
39
TL Level 7: Statistics (TL metrics, average reward)

Type  Metric                                  KA to BA             MD to BA
                                              Score    P value     Score    P value
I     Jump start                              0.05     0.0312      0.08     0.0086
I     Jump start smoothed                     0.08     0.0002      0.06     0.0014
II    Transfer ratio                          1.82     0.0034      1.86     0.0004
II    Transfer ratio (truncated)              1.82     0.0032      1.86     0.0004
II    Average relative reduction (narrow)     0.58     0.0042      0.54     0.0004
II    Average relative reduction (wide)       0.70     0.0018      0.71     0.0008
II    Ratio (of area under the curves)        1.37     0.0056      1.41     0.0012
II    Transfer difference                     503.57   0.0046      561.27   0.0008
II    Transfer difference (scaled)            1017.00  0.0040      1091.2   0.0016
III   Asymptotic advantage                    0.09     0.0086      0.11     0.0040
III   Asymptotic advantage smoothed           0.08     0.0116      0.10     0.0030

Boldface indicates a significant difference was found
40
Conclusion
Can use much more than I/O pairs in ML
Give advice to computers; they automatically refine it based on feedback from the user or the environment
Advice is an appealing mechanism for transferring learned knowledge computer-to-computer
41
Some Papers (on-line, use Google :-)
Creating Advice-Taking Reinforcement Learners, Maclin & Shavlik, Machine Learning 1996
Knowledge-Based Support Vector Machine Classifiers, Fung, Mangasarian, & Shavlik, NIPS 2002
Knowledge-Based Nonlinear Kernel Classifiers, Fung, Mangasarian, & Shavlik, COLT 2003
Knowledge-Based Kernel Approximation, Mangasarian, Shavlik, & Wild, JMLR 2004
Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression, Maclin, Shavlik, Torrey, Walker, & Wild, AAAI 2005
Skill Acquisition via Transfer Learning and Advice Taking, Torrey, Shavlik, Walker, & Maclin, ECML 2006
42
Backups
43
Breakdown of Results
44
What if User Advice is Bad?
45
Related Work on Transfer
Q-function transfer in RoboCup: Taylor & Stone (AAMAS 2005, AAAI 2005)
Transfer via policy reuse: Fernandez & Veloso (AAMAS 2006, ICML workshop 2006); Madden & Howley (AI Review 2004); Torrey et al. (ECML 2005)
Transfer via relational RL: Driessens et al. (ICML workshop 2006)