Richard Maclin University of Minnesota - Duluth


A Simple and Effective Method for Incorporating Advice into Kernel Methods
Richard Maclin (University of Minnesota - Duluth)
Jude Shavlik, Trevor Walker, Lisa Torrey (University of Wisconsin - Madison)

The Setting
Given:
- Examples of a classification/regression task
- Advice from an expert about the task
Do:
- Learn an accurate model
Knowledge-Based Classification/Regression

Advice
IF goal center is close and goalie isn't covering it
THEN Shoot!

IF distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
THEN Qshoot(x) ≥ 0.9

Knowledge-Based Classification

Knowledge-Based Support Vector Methods
[Fung et al., 2002, 2003 (KBSVM); Mangasarian et al., 2005 (KBKR)]

min   size of model + C|s| + penalties for not following advice
      (hence advice can be refined)
such that
      f(x) = y ± s        (s are slack terms)
      + constraints that represent advice

Our Motivation
- KBKR adds many terms to the optimization problem
  - Want an accurate but more efficient method
  - Want to scale to a large number of rules
- KBKR alters advice in ways that are somewhat hard to understand (rotation and translation)
  - Focus on a simpler method

Our Contribution – ExtenKBKR
- A method for incorporating advice that is more efficient than KBKR
- Advice is defined extensionally (by sampled points) rather than intensionally (as in KBKR)

Support Vector Machines

Knowledge-Based SVM
Also a penalty for rotation and translation of the advice region

Our Extensional KBSVM
Note: a point from one class may be pseudo-labeled with the other class
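The extensional idea on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the axis-aligned bounds, the sample count, and the rule itself are assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch: rather than adding the advice region to the
# optimization directly, sample points that satisfy the rule and give
# them pseudo-labels matching the rule's conclusion.
def sample_under_advice(low, high, n_points):
    """Uniformly sample n_points inside an axis-aligned advice region."""
    low, high = np.asarray(low, float), np.asarray(high, float)
    return rng.uniform(low, high, size=(n_points, len(low)))

# Hypothetical rule: IF x1 >= .7 and x2 >= .7 THEN label +1
pseudo_X = sample_under_advice([0.7, 0.7], [1.0, 1.0], 5)
pseudo_y = np.ones(len(pseudo_X))  # pseudo-label from the rule's THEN part

print(pseudo_X.shape)           # (5, 2)
print(np.all(pseudo_X >= 0.7))  # True: all samples lie in the advice region
```

The pseudo-labeled points can then be appended to the training set and handled by a standard learner, which is what makes the approach simple.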

Incorporating Advice in KBKR
Advice format:  Bx ≤ d ⇒ f(x) ≥ β
Example:  IF distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
          THEN Qshoot(x) ≥ 0.9
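The slide's example rule can be encoded in the Bx ≤ d format as a small sketch. The feature ordering below is an assumption made for illustration (x = [distGoalCenter, angleGoalieGCenter]); note that a "≥" condition becomes a "≤" row by negation.

```python
import numpy as np

# Encoding "IF distGoalCenter <= 15 and angleGoalieGCenter >= 25
# THEN Qshoot(x) >= 0.9" in the advice format  Bx <= d  =>  f(x) >= beta.
B = np.array([[1.0,  0.0],    # distGoalCenter <= 15
              [0.0, -1.0]])   # -angleGoalieGCenter <= -25
d = np.array([15.0, -25.0])
beta = 0.9                    # required output on the region

def satisfies_advice(x):
    """True when x lies in the advice region Bx <= d."""
    return bool(np.all(B @ x <= d))

print(satisfies_advice(np.array([10.0, 30.0])))  # True: inside the region
print(satisfies_advice(np.array([20.0, 30.0])))  # False: too far from goal
```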

Linear Program with Advice

KBKR:
min   Σ_a ( ||w_a||₁ + |b_a| + C|s_a| ) + Σ_k ( μ₁||z_k||₁ + μ₂ζ_k )
such that
      for each action a:   w_a·x + b_a = Q_a(x) ± s_a
      for each advice k:   w_k + B_kᵀu_k = 0 ± z_k
                           -dᵀu_k + ζ_k ≥ β_k – b_k

ExtenKBKR:
min   Σ_a ( ||w_a||₁ + |b_a| + C|s_a| ) + Σ_k ( μ/|M_k| ) ||m_k||₁
such that
      for each action a:   w_a·x + b_a = Q_a(x) ± s_a
      for each advice k:   M_k w_k + b_k + m_k ≥ β_k
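A hedged sketch of the ExtenKBKR program for a single action, solved here with scipy's general-purpose LP solver rather than the authors' own setup. All data, the toy target, and the values of C, μ, and β are illustrative; variables are split into nonnegative parts (w = w⁺ − w⁻, b = b⁺ − b⁻) so the 1-norms become linear.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
E, M, n = 20, 5, 2                      # examples, advice samples, features
X = rng.uniform(0, 1, (E, n))
y = 2 * X[:, 0] - X[:, 1]               # toy regression target (assumed)
P = rng.uniform(0.8, 1.0, (M, n))       # points sampled under the advice
beta, C, mu = 1.0, 10.0, 10.0           # advice: f(p) >= beta on the region

# Variable layout: [wp(n), wn(n), bp, bn, s(E), m(M)], all >= 0
# (linprog's default bounds). Objective matches the slide's form.
c = np.concatenate([np.ones(2 * n + 2), C * np.ones(E), (mu / M) * np.ones(M)])
rows, rhs = [], []
for i in range(E):                      # fit constraints: |w.x_i + b - y_i| <= s_i
    base = np.concatenate([X[i], -X[i], [1, -1]])
    r = np.zeros(len(c)); r[:2 * n + 2] = base;  r[2 * n + 2 + i] = -1
    rows.append(r);  rhs.append(y[i])
    r2 = np.zeros(len(c)); r2[:2 * n + 2] = -base; r2[2 * n + 2 + i] = -1
    rows.append(r2); rhs.append(-y[i])
for j in range(M):                      # advice constraints: w.p_j + b + m_j >= beta
    r = np.zeros(len(c))
    r[:2 * n + 2] = -np.concatenate([P[j], -P[j], [1, -1]])
    r[2 * n + 2 + E + j] = -1
    rows.append(r); rhs.append(-beta)

res = linprog(c, A_ub=np.vstack(rows), b_ub=np.array(rhs))
w = res.x[:n] - res.x[n:2 * n]
b = res.x[2 * n] - res.x[2 * n + 1]
print(res.success, np.round(P @ w + b, 2))  # predictions on the advice samples
```

Because advice enters only as M extra rows of ordinary inequality constraints, the program stays in the same standard form any LP solver accepts, which is the point of the extensional formulation.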

Choosing Examples "Under" Advice
- Training data – adds a second label
  - more weight if labeled the same as the advice
  - less if labeled differently
- Unlabeled data – a semi-supervised method
- Generated data – but it can be complex to generate meaningful data
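The first option above (reweighting training examples that fall under a rule) can be sketched as a tiny helper. The specific weights 2.0 and 0.5 are illustrative assumptions, not values from the paper.

```python
# Sketch of reweighting a training example according to advice:
# points under a rule count more when their label agrees with the
# rule's conclusion and less when it disagrees.
def advice_weight(label, advice_label, under_advice,
                  w_agree=2.0, w_disagree=0.5):
    if not under_advice:
        return 1.0                      # advice says nothing about this point
    return w_agree if label == advice_label else w_disagree

print(advice_weight(+1, +1, True))   # agrees with advice -> 2.0
print(advice_weight(-1, +1, True))   # disagrees -> 0.5
print(advice_weight(-1, +1, False))  # not under advice -> 1.0
```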

Size of Linear Program
Additional items per advice rule:

                    KBKR     ExtenKBKR
Variables           E + 1    M_k
Constraint terms    E²       E·M_k

E – number of examples
M_k – number of examples per advice item (expect M_k << E)
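The table's growth rates can be made concrete with a quick arithmetic sketch; the sizes E = 1000 and M_k = 10 are illustrative, chosen so that M_k << E as the slide expects.

```python
# Extra LP size per advice rule, following the slide's table
# (E = number of examples, M_k = examples sampled per advice item).
def kbkr_cost(E):
    return {"variables": E + 1, "constraint_terms": E ** 2}

def extenkbkr_cost(E, M_k):
    return {"variables": M_k, "constraint_terms": E * M_k}

E, M_k = 1000, 10
print(kbkr_cost(E))            # {'variables': 1001, 'constraint_terms': 1000000}
print(extenkbkr_cost(E, M_k))  # {'variables': 10, 'constraint_terms': 10000}
```

With these numbers each rule costs ExtenKBKR 100x fewer constraint terms, which is where the efficiency claim on the later slides comes from.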

Artificial Data: Methodology
- 10 input variables
- Two functions:
  f1 = 20·x1·x2·x3·x4 – 1.25
  f2 = 5·x5 – 5·x2 + 3·x6 – 2·x4 – 0.5
- Selected C, μ₁, μ₂, μ with a tuning set
- Considered adding 0 or 5 pseudo points
- Used a Gaussian kernel
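The two target functions above are easy to reproduce; this sketch generates a dataset of that shape, assuming inputs drawn uniformly from [0, 1] (the slide does not state the sampling range).

```python
import numpy as np

rng = np.random.default_rng(42)

# Sketch of the artificial-data setup: 10 input variables and the two
# target functions from the slide (1-based x1..x10 -> 0-based columns).
def make_data(n):
    X = rng.uniform(0.0, 1.0, size=(n, 10))
    x = X.T
    f1 = 20 * x[0] * x[1] * x[2] * x[3] - 1.25
    f2 = 5 * x[4] - 5 * x[1] + 3 * x[5] - 2 * x[3] - 0.5
    return X, f1, f2

X, f1, f2 = make_data(100)
print(X.shape, f1.shape, f2.shape)  # (100, 10) (100,) (100,)
```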

Artificial Data: Advice IF x1 ≥ .7  x2 ≥ .7  x3 ≥ .7  x4 ≥ .7 THEN f1(x) ≥ 4 IF x5 ≥ .7  x2 ≤ .3  x6 ≥ .7  x4 ≤ .3 THEN f2(x) ≥ 5 IF x5 ≥ .6  x6 ≥ .6 THEN PREFER f2(x) TO f1(x) BY .1 IF x5 ≤ .3  x6 ≤ .3 THEN PREFER f1(x) TO f2(x) BY .1 IF x2 ≥ .7  x4 ≥ .7 THEN PREFER f1(x) TO f2 (x) BY .1 IF x2 ≤ .3  x4 ≤ .3 THEN PREFER f2(x) TO f1(x) BY .1

Error on Artificial Data

Time Taken on Artificial Data

RoboCup: Methodology
- Tested on 2-on-1 BreakAway
- 13 tiled features
- Averaged over 10 runs
- Selected C, μ₁, μ₂, μ with a tuning set
- Used a linear model (the tiled features provide non-linearity)

RoboCup Performance
ExtenKBKR is twice as fast as KBKR in CPU cycles

Related Work
Knowledge-based kernel methods:
- Fung et al., NIPS 2002; COLT 2003
- Mangasarian et al., JMLR 2005
- Maclin et al., AAAI 2005
- Le et al., ICML 2006
- Mangasarian and Wild, IEEE Trans. Neural Networks 2006
Other methods using prior knowledge:
- Schoelkopf et al., NIPS 1998
- Epshteyn & DeJong, ECML 2005
- Sun & DeJong, ICML 2005
Semi-supervised SVMs:
- Wu & Srihari, KDD 2004
- Franz et al., DAGM 2004

Future Work
- Label "near" examples to allow advice to expand
- Analyze predictions for pseudo-labeled examples to determine how advice was refined
- Test on semi-supervised learning tasks

Conclusions
ExtenKBKR:
- Key idea: sample advice (an extensional definition) and train using standard methods
- Empirically as accurate as KBKR
- Empirically more efficient than KBKR
- Easily adapted to other forms of advice

Acknowledgements
- US Naval Research Laboratory grant N00173-06-1-G002 (to RM)
- DARPA grant HR0011-04-1-0007 (to JS)

Questions?