1
Machine Learning via Advice Taking Jude Shavlik
2
Thanks To... Rich Maclin, Lisa Torrey, Trevor Walker, Prof. Olvi Mangasarian, Glenn Fung, Ted Wild, and DARPA
3
Quote (2002) from DARPA Sometimes an assistant will merely watch you and draw conclusions. Sometimes you have to tell a new person, 'Please don't do it this way' or 'From now on when I say X, you do Y.' It's a combination of learning by example and by being guided.
4
Widening the “Communication Pipeline” between Humans and Machine Learners
[figure: teacher (human) and pupil (the machine learner)]
5
Our Approach to Building Better Machine Learners
Human partner expresses advice “naturally” and without knowledge of the ML agent’s internals
Agent incorporates advice directly into the function it is learning
Additional feedback (rewards, I/O pairs, inferred labels, more advice) is used to continually refine the learner
6
“Standard” Machine Learning vs. Theory Refinement
Positive examples (“should see doctor”): temp = 102.1, age = 21, sex = F, …; temp = 101.7, age = 37, sex = M, …
Negative examples (“take two aspirins”): temp = 99.1, age = 43, sex = M, …; temp = 99.6, age = 24, sex = F, …
Approximate domain knowledge: if temp = high and age = young … then negative example
Related work by the labs of Mooney, Pazzani, Cohen, Giles, etc.
7
Rich Maclin’s PhD (1995)
IF   a Bee is (Near and West) &
     an Ice is (Near and North)
THEN BEGIN
     Move East
     Move North
END
8
Sample Results
[plot: learning curves with advice vs. without advice]
9
Our Motto Give advice rather than commands to your computer
10
Outline
Prior Knowledge and Support Vector Machines: intro to SVMs, linear separation, non-linear separation, function fitting (“regression”)
Advice-Taking Reinforcement Learning
Transfer Learning via Advice Taking
11
Support Vector Machines: Maximizing the Margin between Bounding Planes
[figure: classes A+ and A−, with support vectors on the bounding planes and the margin between them]
12
Linear Algebra for SVMs
Given p points in n-dimensional space, represent them by a p-by-n matrix A of reals; each point A_i is in class +1 or −1
Separate the classes by two bounding planes, written more succinctly in matrix form (where e is a vector of ones); a sketch of this formulation follows below
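A sketch of the matrix form these slides appear to rely on, in standard Mangasarian-style SVM notation (the diagonal label matrix D is an assumption; it is not named on the slide):

$$
x'w = \gamma + 1, \qquad x'w = \gamma - 1 \qquad \text{(the two bounding planes)},
$$
$$
D(Aw - e\gamma) \ge e \qquad \text{(the separation conditions, written succinctly)},
$$

where D is the p-by-p diagonal matrix with D_ii = +1 or −1 according to the class of A_i, so rows labeled +1 satisfy A_i w ≥ γ + 1 and rows labeled −1 satisfy A_i w ≤ γ − 1.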
13
“Slack” Variables: Dealing with Data that is not Linearly Separable
[figure: overlapping classes A+ and A−, support vectors, and slack y for points that fall on the wrong side of their bounding plane]
14
Support Vector Machines: Quadratic Programming Formulation
Solve this quadratic program:
  min over (w, γ, y):   ½ w'w + C e'y
  s.t.  D(Aw − eγ) + y ≥ e,   y ≥ 0
Maximize the margin (2/‖w‖) by minimizing ½ w'w; minimize the sum of slack variables y, weighted by C
15
Support Vector Machines: Linear Programming Formulation
Use the 1-norm instead of the 2-norm (typically runs faster; better feature selection; might generalize better, NIPS ’03):
  min over (w, γ, y):   ‖w‖₁ + C e'y
  s.t.  D(Aw − eγ) + y ≥ e,   y ≥ 0
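A minimal sketch of solving this 1-norm SVM as a linear program with scipy.optimize.linprog, following the reconstruction above; the auxiliary variables a = |w|, the parameter name C, and the whole encoding are illustrative choices, not the authors' code:

import numpy as np
from scipy.optimize import linprog

def linear_svm_1norm(A, labels, C=1.0):
    """1-norm soft-margin SVM as an LP:  min ||w||_1 + C * sum(y)
       s.t. D(Aw - e*gamma) + y >= e,  y >= 0  (|w| handled via a >= +/- w)."""
    p, n = A.shape
    D = np.diag(labels)                     # diagonal matrix of +/-1 class labels
    e = np.ones(p)

    # variable vector z = [w (n), a (n), gamma (1), y (p)]
    c = np.concatenate([np.zeros(n), np.ones(n), [0.0], C * np.ones(p)])

    # D(Aw - e*gamma) + y >= e   ->   -DA w + De gamma - y <= -e
    row1 = np.hstack([-D @ A, np.zeros((p, n)), (D @ e)[:, None], -np.eye(p)])
    #  w - a <= 0  and  -w - a <= 0   (so a >= |w|)
    row2 = np.hstack([np.eye(n), -np.eye(n), np.zeros((n, 1)), np.zeros((n, p))])
    row3 = np.hstack([-np.eye(n), -np.eye(n), np.zeros((n, 1)), np.zeros((n, p))])
    A_ub = np.vstack([row1, row2, row3])
    b_ub = np.concatenate([-e, np.zeros(2 * n)])

    bounds = [(None, None)] * (2 * n + 1) + [(0, None)] * p   # only slacks y are >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w, gamma = res.x[:n], res.x[2 * n]
    return w, gamma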
16
Knowledge-Based SVMs: Generalizing an “Example” from a POINT to a REGION
[figure: a knowledge-set region, shown alongside the A+ and A− points]
17
Incorporating “Knowledge Sets” Into the SVM Linear Program
Suppose the knowledge set {x : Bx ≤ d} belongs to class A+; hence it must lie in the half-space {x : x'w ≥ γ + 1}
We therefore have the implication:  Bx ≤ d  ⟹  x'w ≥ γ + 1
This implication is equivalent to a set of linear constraints (proof in the NIPS ’02 paper)
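One way to write the equivalent constraints, via linear-programming duality as in the knowledge-based SVM papers (the exact notation here is assumed, not taken from the slide): assuming the knowledge set {x : Bx ≤ d} is nonempty,

$$
\big(Bx \le d \;\Rightarrow\; x'w \ge \gamma + 1\big)
\;\Longleftrightarrow\;
\exists\, u \ge 0 :\;\; B'u + w = 0,\;\; d'u + \gamma + 1 \le 0 .
$$

These conditions are linear in (u, w, γ), so they can be added directly to the SVM linear program.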
18
Resulting LP for KBSVMs
We get a linear program (LP) whose advice constraints range over the number of knowledge-set regions (one copy of the constraints, with its own multipliers, per region)
19
KBSVM with Slack Variables
The advice constraints are given slack variables (right-hand sides that were 0 in the hard version are relaxed), so violations of the advice are penalized rather than forbidden
20
SVMs and Non-Linear Separating Surfaces
Non-linearly map to a new space, e.g., (f_1, f_2) → (h(f_1, f_2), g(f_1, f_2))
Linearly separate in the new space (using kernels)
The result is a non-linear separator in the original space
Fung et al. (2003) present knowledge-based non-linear SVMs
[figure: + and − points that are not linearly separable in (f_1, f_2) become linearly separable after the mapping]
21
Support Vector Regression (aka Kernel Regression)
Linearly approximate a function, given an array A of inputs and a vector y of (numeric) outputs: f(x) ≈ x'w + b
Find weights such that Aw + be ≈ y
In the dual space, w = A'α, so we get (AA')α + be ≈ y
Kernelizing (to get a non-linear approximation): K(A, A')α + be ≈ y
22
What to Optimize?
Linear program to optimize:  min ‖w‖₁ + C ‖s‖₁  s.t.  y − s ≤ Aw + be ≤ y + s
The 1st term (‖w‖₁) is a “regularizer” that minimizes model complexity
The 2nd term is the approximation error s, weighted by the parameter C
This reduces to the classical “least squares” fit if the quadratic (2-norm) version is used and the first term is ignored
23
Predicting y for a New x
y = K(x', A')α + b
Use the kernel to compute a “distance” to each training point (i.e., each row in A)
Weight by α_i (hopefully many α_i are zero) and sum
Add b (a scalar)
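A minimal numpy sketch of this fit-and-predict pipeline, using a Gaussian kernel and a regularized least-squares solve in place of the 1-norm linear program described on these slides (the kernel choice and the lam/gamma parameters are illustrative):

import numpy as np

def rbf_kernel(X, Z, gamma=0.1):
    # K[i, j] = exp(-gamma * ||X_i - Z_j||^2)
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def fit(A, y, lam=1e-2, gamma=0.1):
    # Solve for alpha and b in  K(A, A') alpha + b e  ≈  y  (regularized least squares
    # here, instead of the 1-norm LP on the slides)
    p = A.shape[0]
    K = rbf_kernel(A, A, gamma)
    M = np.hstack([K + lam * np.eye(p), np.ones((p, 1))])   # last column carries b
    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
    return coef[:-1], coef[-1]                               # alpha (length p), scalar b

def predict(x_new, A, alpha, b, gamma=0.1):
    # y = K(x', A') alpha + b: kernel "distances" to each training row, weighted and summed
    return rbf_kernel(np.atleast_2d(x_new), A, gamma) @ alpha + b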
24
Knowledge-Based SVR (Mangasarian, Shavlik, & Wild, JMLR ’04)
Add soft constraints to the linear program (so the learner need only follow the advice approximately):
  minimize  ‖w‖₁ + C ‖s‖₁ + a penalty for violating advice
  such that  y − s ≤ Aw + be ≤ y + s,  together with a “slacked” match to the advice
Example advice: “in this region, y should exceed 4”
25
Testbeds: Subtasks of RoboCup
Mobile KeepAway: keep the ball from opponents [Stone & Sutton, ICML 2001]
BreakAway: score a goal [Maclin et al., AAAI 2005]
26
Reinforcement Learning Overview
Receive a state (described by a set of features)
Take an action
Receive a reward
Use the rewards to estimate the Q-values of actions in states
Policy: choose the action with the highest Q-value in the current state
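A generic tabular Q-learning sketch of this receive-state / take-action / receive-reward loop (the env.reset()/env.step() interface and all parameter values are hypothetical; the system described in this talk approximates Q-values with kernel regression rather than a table):

import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, lr=0.1, gamma=0.9, eps=0.1):
    # Q[(state, action)] -> current estimate of that action's value in that state
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()                      # receive a state (a tuple of features)
        done = False
        while not done:
            if random.random() < eps:            # occasional exploration
                action = random.choice(actions)
            else:                                 # policy: highest-Q action in current state
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)   # take an action, receive a reward
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += lr * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q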
27
Incorporating Advice in KBKR
Advice format:  Bx ≤ d  ⟹  f(x) ≥ hx + β
Example:  If distanceToGoal ≤ 10 and shotAngle ≥ 30, then Q(shoot) ≥ 0.9
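One way to instantiate that format for the example, assuming the state vector is x = (distanceToGoal, shotAngle) in that order (the ordering is an assumption):

$$
B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},\qquad
d = \begin{pmatrix} 10 \\ -30 \end{pmatrix},\qquad
h = (0 \;\; 0),\qquad \beta = 0.9,
$$

so Bx ≤ d encodes distanceToGoal ≤ 10 and shotAngle ≥ 30, and f(x) ≥ hx + β becomes Q(shoot) ≥ 0.9.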
28
Giving Advice About Relative Values of Multiple Functions (Maclin et al., AAAI ’05)
When the input satisfies preconditions(input), then f_1(input) > f_2(input)
29
Sample Advice-Taking Results
Advice: if distanceToGoal ≤ 10 and shotAngle ≥ 30, then prefer shoot over all other actions (Q(shoot) > Q(pass), Q(shoot) > Q(move))
[plot: learning curves for advice vs. standard RL on 2-vs-1 BreakAway, rewards +1 / −1]
30
Transfer Learning
Agent learns Task A (the source), then encounters a related Task B (the target)
Agent uses knowledge from Task A to learn Task B faster
Ideally the agent discovers how the tasks are related; we use a user-provided mapping to tell the agent this
31
Transfer Learning: The Goal for the Target Task
[plot: performance vs. training, for learning with transfer and without transfer]
Desired benefits: better start, faster rise, better asymptote
32
Our Transfer Algorithm
Observe source-task games
Use ILP to learn skills from those games
Translate the learned skills into transfer advice for the target task
If there is user advice, add it in
Learn the target task with KBKR
33
Learning Skills By Observation
Source-task games are sequences of (state, action) pairs
Learning skills is like learning to classify states by their correct actions
ILP = Inductive Logic Programming
Example state: distBetween(me, teammate2) = 15, distBetween(me, teammate1) = 10, distBetween(me, opponent1) = 5, ...; action = pass(teammate2); outcome = caught(teammate2)
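A rough sketch of how observed traces might be split into positive and negative examples for one skill before handing them to ILP (the trace format, the success test on the outcome, and the function name are all illustrative; the real system works over relational state descriptions):

def build_skill_examples(games, skill):
    """Split observed (state, action, outcome) triples into positive and negative
    training examples for one skill, e.g. skill = "pass"."""
    positives, negatives = [], []
    for game in games:                       # each game: a list of (state, action, outcome)
        for state, action, outcome in game:
            if action.startswith(skill) and outcome.startswith("caught"):
                positives.append((state, action))    # the skill was used and it worked
            elif not action.startswith(skill):
                negatives.append((state, action))    # a different action was chosen here
    return positives, negatives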
34
ILP: Searching for First-Order Rules
The search starts from the most general clause, P :- true, and refines it one literal at a time: P :- Q, P :- R, P :- S; then P :- R, Q and P :- R, S; and so on, down to clauses such as P :- R, S, V, W, X
We also use a random-sampling approach
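A toy sketch of that random-sampling refinement search over clause bodies (the score function, which in practice would measure how well a body covers the observed states and actions, is a placeholder):

import random

def refinements(body, literals):
    # all ways to extend a clause body by one new literal (top-down refinement)
    return [body + [lit] for lit in literals if lit not in body]

def random_sampling_search(literals, score, depth=5, width=10):
    # Start from the most general clause (empty body, i.e. "P :- true") and,
    # instead of expanding the whole refinement tree, sample a few children
    # at each level, keeping the best-scoring body seen so far.
    best_body, best_score = [], score([])
    frontier = [[]]
    for _ in range(depth):
        children = [c for body in frontier for c in refinements(body, literals)]
        if not children:
            break
        frontier = random.sample(children, min(width, len(children)))
        for body in frontier:
            s = score(body)
            if s > best_score:
                best_body, best_score = body, s
    return best_body, best_score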
35
Advantages of ILP
Can produce first-order rules for skills, e.g., pass(Teammate) rather than separate rules pass(teammate1), ..., pass(teammateN)
Captures only the essential aspects of the skill; we expect these aspects to transfer better
Can incorporate background knowledge
36
Example of a Skill Learned by ILP from KeepAway
pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.
We also gave “human” advice about shooting, since that is a new skill in BreakAway
37
TL Level 7: KA to BA Raw Curves
38
TL Level 7: KA to BA Averaged Curves
39
TL Level 7: Statistics (TL metrics, average reward)

Type  Metric                                  KA to BA             MD to BA
                                              Score    P value     Score    P value
I     Jump start                              0.05     0.0312      0.08     0.0086
I     Jump start smoothed                     0.08     0.0002      0.06     0.0014
II    Transfer ratio                          1.82     0.0034      1.86     0.0004
II    Transfer ratio (truncated)              1.82     0.0032      1.86     0.0004
II    Average relative reduction (narrow)     0.58     0.0042      0.54     0.0004
II    Average relative reduction (wide)       0.70     0.0018      0.71     0.0008
II    Ratio (of area under the curves)        1.37     0.0056      1.41     0.0012
II    Transfer difference                     503.57   0.0046      561.27   0.0008
II    Transfer difference (scaled)            1017.00  0.0040      1091.2   0.0016
III   Asymptotic advantage                    0.09     0.0086      0.11     0.0040
III   Asymptotic advantage smoothed           0.08     0.0116      0.10     0.0030

Boldface indicates a significant difference was found
40
Conclusion
Can use much more than I/O pairs in ML
Give advice to computers; they automatically refine it based on feedback from the user or the environment
Advice is an appealing mechanism for transferring learned knowledge computer-to-computer
41
Some Papers (on-line, use Google :-)
Creating Advice-Taking Reinforcement Learners, Maclin & Shavlik, Machine Learning 1996
Knowledge-Based Support Vector Machine Classifiers, Fung, Mangasarian, & Shavlik, NIPS 2002
Knowledge-Based Nonlinear Kernel Classifiers, Fung, Mangasarian, & Shavlik, COLT 2003
Knowledge-Based Kernel Approximation, Mangasarian, Shavlik, & Wild, JMLR 2004
Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression, Maclin, Shavlik, Torrey, Walker, & Wild, AAAI 2005
Skill Acquisition via Transfer Learning and Advice Taking, Torrey, Shavlik, Walker, & Maclin, ECML 2006
42
Backups
43
Breakdown of Results
44
What if User Advice is Bad?
45
Related Work on Transfer
Q-function transfer in RoboCup: Taylor & Stone (AAMAS 2005, AAAI 2005)
Transfer via policy reuse: Fernandez & Veloso (AAMAS 2006, ICML workshop 2006); Madden & Howley (AI Review 2004); Torrey et al. (ECML 2005)
Transfer via relational RL: Driessens et al. (ICML workshop 2006)