
Feature Shaping for Linear SVM Classifiers
George Forman, Martin Scholz, Shyam Rajaram
HP Labs, Palo Alto, CA, USA
© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Linear SVMs? In reality: high-dimensional data, features of varying predictiveness, heterogeneous features (the setting where feature selection is commonly applied).

Example: Useful Non-linear Feature

Feature Transformations and SVMs (effect of a change to a single feature)
− Affine transformations: no
− Linear transformation: relative
− Distance between examples: yes
− Non-monotonic transformations: yes

Wishlist: Raw Data, Things to Fix
− Detection of irrelevant features
− Appropriate scaling of feature ranges (blood pressure vs. BMI: does scale equal importance?)
− Linear dependence of the feature on the target (FIX: speeding, where the death rate doubles every 10 mph)
− Monotonic relationships with the target (FIX: blood pressure etc. are healthy within a specific interval)

The Transformation Landscape (ordered by complexity & costs)
Individual features (raw feature x_i → transformed x_i'):
− Feature selection: x_i' := w_i x_i, with w_i ∈ {0, 1}
− Feature scaling: w_i ∈ R+
− Feature shaping: an arbitrary per-feature mapping
Feature sets:
− Non-linear kernels
− Feature construction
− Kernel learning
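To make the distinction concrete, here is a tiny illustrative sketch (names are mine, not from the paper) of the three per-feature transform families on the individual-feature side of the landscape:

```python
# Illustrative only: the three per-feature transform families, from cheapest
# to most expressive. All operate on a single raw feature value x_i.
def select(x_i, keep):          # feature selection: w_i in {0, 1}
    return x_i if keep else 0.0

def scale(x_i, w_i):            # feature scaling: w_i in R+ (w_i > 0)
    return w_i * x_i

def shape(x_i, f_i):            # feature shaping: arbitrary per-feature map f_i
    return f_i(x_i)
```

Selection and scaling are special cases of shaping in which f_i is restricted to multiplication by a constant.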

Feature Selection Metrics [Forman CIKM'08]

BNS for feature selection [Forman, JMLR'03]

Scaling beats selection [Forman CIKM'08]: F-measure results comparing BNS scaling, binary features, BNS selection, and IG selection.

Shaping Example

Estimating class distributions
Input: labeled examples projected onto feature x_i
Goal: estimate p_i := P(y | x_i = v)
Large variety of cases:
− Nominal, binary features
− Ordinal features
− Continuous features
Output: p_i : R → [0, 1] (the blue curve in the slide's figure)
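The slide leaves the estimator unspecified. As a minimal sketch (not the authors' implementation), one could estimate p_i for a continuous feature by equal-frequency binning with Laplace smoothing; the bin count, smoothing constant, and function names below are illustrative assumptions:

```python
import numpy as np

def estimate_conditional_probability(x_col, y, n_bins=20, alpha=1.0):
    """Estimate p_i(v) = P(y=1 | x_i = v) for one continuous feature column
    by equal-frequency binning with Laplace smoothing (illustrative choices)."""
    x_col = np.asarray(x_col, dtype=float)
    y = np.asarray(y)
    # Equal-frequency bin edges over the observed values of this feature.
    edges = np.unique(np.quantile(x_col, np.linspace(0.0, 1.0, n_bins + 1)))
    n_b = max(len(edges) - 1, 1)
    bin_ids = np.clip(np.searchsorted(edges, x_col, side="right") - 1, 0, n_b - 1)
    # Smoothed positive rate per bin.
    pos = np.bincount(bin_ids, weights=(y == 1).astype(float), minlength=n_b)
    tot = np.bincount(bin_ids, minlength=n_b)
    p_per_bin = (pos + alpha) / (tot + 2.0 * alpha)

    def p_i(v):
        """Map raw feature values to the estimated P(y=1 | x_i = v)."""
        b = np.clip(np.searchsorted(edges, np.asarray(v, dtype=float),
                                    side="right") - 1, 0, n_b - 1)
        return p_per_bin[b]

    return p_i
```

For nominal or binary features the same idea reduces to a smoothed positive rate per distinct value.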

Reshaping Features
Input: p_i : R → [0, 1]
Goal: make x_i "more linearly dependent" on the target
Local probability (LP) shaper:
− x_i' := p_i(x_i)
− non-monotonic transformation
Monotonic transformations:
− Use the rank as the new feature value
− Derive values from ROC plots
Output: a function for each i, mapping x_i to x_i'
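A hedged sketch of two of the shapers named on the slide, reusing the estimator from the previous sketch; the monotonic variant shown uses the normalized empirical rank, and the function names are mine:

```python
import numpy as np
from scipy.stats import rankdata

def lp_shaper(x_col, p_i):
    """Local-probability (LP) shaper: x_i' := p_i(x_i).
    Replaces each raw value by its estimated class probability; in general
    this is a non-monotonic transformation of the feature."""
    return p_i(np.asarray(x_col, dtype=float))

def rank_shaper(x_col):
    """Monotonic shaper: use the normalized rank as the new feature value."""
    x_col = np.asarray(x_col, dtype=float)
    return rankdata(x_col, method="average") / len(x_col)
```

The ROC-based monotonic shaper mentioned on the slide is omitted from this sketch.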

Coherent Data Processing Blocks
− PDF estimation
− Reshaping features
− Feature scaling
− Normalization
− Preserving sparsity

Feature Scaling
Scale of features should reflect importance
BNS scaling for binary features: BNS(x_i) = |Φ⁻¹(tpr) − Φ⁻¹(fpr)|, where Φ⁻¹ is the inverse Normal CDF
For the continuous case: use the BNS score of the best binary split
Diffing: scale each feature to [0, |BNS(x_i')|]
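A sketch of BNS scoring as defined in Forman's work, assuming labels in {0, 1}; the quantile-based threshold scan for the continuous case is an illustrative choice, not necessarily the paper's procedure:

```python
import numpy as np
from scipy.stats import norm

def bns_score(x_bin, y, eps=0.0005):
    """Bi-Normal Separation of a binary feature:
    |Phi^-1(tpr) - Phi^-1(fpr)|, with Phi^-1 the inverse Normal CDF.
    Assumes labels y in {0, 1}; eps guards against ppf(0) / ppf(1)."""
    x_bin, y = np.asarray(x_bin, dtype=float), np.asarray(y)
    tpr = np.clip(x_bin[y == 1].mean(), eps, 1 - eps)  # P(feature present | pos)
    fpr = np.clip(x_bin[y == 0].mean(), eps, 1 - eps)  # P(feature present | neg)
    return abs(norm.ppf(tpr) - norm.ppf(fpr))

def bns_best_split(x_col, y):
    """Continuous case, as on the slide: BNS score of the best binary split."""
    x_col = np.asarray(x_col, dtype=float)
    thresholds = np.unique(np.quantile(x_col, np.linspace(0.05, 0.95, 19)))
    return max(bns_score((x_col > t).astype(float), y) for t in thresholds)
```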

Normalization
Options tested in our experiments:
− L2 normalization (standard in text mining)
− L1 normalization (sparse solutions)
− No normalization
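For reference, the three options map directly onto scikit-learn's per-row `normalize` (a sketch, not the authors' code):

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[3.0, 4.0, 0.0],
              [0.0, 1.0, 1.0]])       # toy example rows
X_l2 = normalize(X, norm="l2")        # each row rescaled to unit L2 norm
X_l1 = normalize(X, norm="l1")        # each row rescaled to unit L1 norm
X_raw = X                             # no normalization
```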

Preserving Sparsity
Text data usually very sparse
Substantial impact on complexity
Discussed transformations: not sparsity-preserving
Solution:
− Affine transformation → no effect on SVMs
− Adapt f_i so that f_i(x_{i,m}) = 0, where x_{i,m} is the mode of x_i
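One way to realize the second bullet, as a sketch under the assumption that f_i is an arbitrary shaping function like the ones above (names are mine):

```python
import numpy as np

def sparsity_preserving(f_i, x_col):
    """Wrap a shaping function f_i so that the feature's most frequent raw
    value x_{i,m} (typically 0 in text data) maps to exactly 0. The shift is
    affine, which the slide notes has no effect on the SVM solution."""
    values, counts = np.unique(np.asarray(x_col, dtype=float), return_counts=True)
    mode_value = values[np.argmax(counts)]        # x_{i,m}
    offset = f_i(mode_value)
    return lambda v: f_i(v) - offset
```

For text features the mode is usually 0, so shaped values for absent terms stay exactly 0 and the sparse matrix layout is preserved.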

Experiments
Benchmarks:
− Text: news articles, TREC, Web data, …
− UCI: 11 popular datasets, mixed attribute types
− Used as binary classification problems, 50+ positives
Learner:
− Linear SVM (SMO)
− 5x cross-validation to determine C (out of {0.01, 0.1, 1, 10, 100})
− No internal normalization of the input
− Logistic scaling activated for the output
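A rough reconstruction of the learner setup, using scikit-learn rather than the SMO implementation the authors used; the F-measure scoring choice is an assumption:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

def fit_linear_svm(X_train, y_train):
    """Linear SVM with C chosen by 5-fold cross-validation over the grid
    listed on the slide."""
    grid = GridSearchCV(LinearSVC(),
                        param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                        scoring="f1", cv=5)
    grid.fit(X_train, y_train)
    return grid.best_estimator_
```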

Text: Accuracy vs. training set size

UCI data: AUC vs. training set size

Overview: All binary UCI tasks

Lesion Study on UCI data: disabling one processing block at a time (PDF estimation, Reshaping features, Feature scaling, Normalization, Preserving sparsity)

Conclusions
Data representation is crucial in data mining
"Feature Shaping":
− an expressive, local technique for transforming features
− generalizes selection and scaling
− computationally cheap, very practical
− tuned locally for each feature
Simplistic implementation → decent improvements
Case-dependent, smart implementation → ?
Questions?