2
Feature Shaping for Linear SVM Classifiers
George Forman, Martin Scholz, Shyam Rajaram
HP Labs, Palo Alto, CA, USA
© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
3
Linear SVMs?
In reality:
− high-dimensional data
− features of varying predictiveness
− heterogeneous features
A common setting for feature selection.
4
Example: Useful Non-linear Feature
5
Feature Transformations and SVMs
Change to single feature → Effect:
− Affine transformations → no
− Linear transformation → relative
− Distance between examples → yes
− Non-monotonic transform. → yes
6
Wishlist: Raw Data - Things to Fix
Detection of irrelevant features
Appropriate scaling of feature ranges
− Blood pressure vs. BMI: scale = importance?
Linear dependence of feature on target
− FIX: speeding - death rate doubles every 10 mph
Monotonic relationships with the target
− FIX: blood pressure etc. healthy in a specific interval
7
The Transformation Landscape (increasing complexity & costs)
Individual features (raw feature x_i → transformed x_i'):
− Feature Selection: x_i' := w_i x_i, w_i ∈ {0, 1}
− Feature Scaling: w_i ∈ R+
− Feature Shaping
Feature sets:
− Non-linear kernels
− Feature Construction
− Kernel Learning
8
Feature Selection Metrics [Forman CIKM’08]
9
BNS for feature selection [Forman, JMLR’02]
10
Scaling beats selection [Forman CIKM’08]
(plot: F-measure for BNS scaling of binary features vs. BNS selection vs. IG selection)
12
Shaping Example
13
Estimating class distributions
Input: labeled examples projected to feature x_i
Goal: estimate p_i := P(y | x_i = v)
Large variety of cases:
− nominal, binary features
− ordinal features
− continuous features
Output: p_i : R → [0, 1] - compute the blue curve!
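As a rough illustration of this estimation step, the sketch below computes P(y=1 | x_i = v) for a single continuous feature with equal-frequency bins and smoothing toward the base rate; the function name, binning scheme, and smoothing constant are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def estimate_local_probability(x, y, n_bins=20, smoothing=1.0):
    """Estimate p_i(v) ~ P(y=1 | x_i = v) for one feature column x.

    Continuous values are grouped into equal-frequency bins; each bin's
    positive rate is smoothed toward the overall base rate.
    Returns bin edges and per-bin probability estimates (the 'curve').
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=int)
    base_rate = y.mean()
    # Equal-frequency bin edges (quantiles), so sparse regions get wide bins.
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    if len(edges) < 2:                      # constant feature: curve is flat
        return edges, np.array([base_rate])
    bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1,
                      0, len(edges) - 2)
    probs = np.empty(len(edges) - 1)
    for b in range(len(edges) - 1):
        mask = bin_idx == b
        pos, n = y[mask].sum(), mask.sum()
        # Shrink small bins toward the base rate to avoid 0/1 estimates.
        probs[b] = (pos + smoothing * base_rate) / (n + smoothing)
    return edges, probs
```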
14
Reshaping Features
Input: p_i : R → [0, 1]
Goal: make x_i “more linearly dependent”
Local probability (LP) shaper:
− x_i' := p_i(x_i)
− non-monotonic transformation
Monotonic transformations:
− use rank as new feature value
− derive values from ROC plots
Output: a function for each i, mapping x_i to x_i'
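Building on the sketch above, a minimal LP shaper just looks each value up in the estimated curve, and a rank transform gives one possible monotonic alternative; both helpers are illustrative assumptions rather than the paper's exact procedures.

```python
import numpy as np

def lp_shape(x_new, edges, probs):
    """Local-probability (LP) shaper: x_i' := p_i(x_i).
    Looks up each value's bin in the estimated curve (edges, probs);
    the result is in general a non-monotonic transform of x_i."""
    x_new = np.asarray(x_new, dtype=float)
    idx = np.clip(np.searchsorted(edges, x_new, side="right") - 1,
                  0, len(probs) - 1)
    return probs[idx]

def rank_shape(x_train, x_new):
    """Monotonic shaper: replace each value by its normalized rank
    among the training values (unseen values fall between neighbors)."""
    sorted_train = np.sort(np.asarray(x_train, dtype=float))
    ranks = np.searchsorted(sorted_train, np.asarray(x_new, dtype=float),
                            side="right")
    return ranks / len(sorted_train)
```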
15
Coherent Data Processing Blocks
− PDF estimation
− Reshaping Features
− Feature Scaling
− Normalization
− Preserving sparsity
16
Feature Scaling
Scale of features should reflect importance
BNS scaling for binary features: BNS(x_i) = |F^-1(tpr_i) - F^-1(fpr_i)|, with F^-1 the inverse Normal CDF
For the continuous case:
− use the BNS score of the best binary split
Diffing: scale each feature to [0, |BNS(x_i')|]
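A hedged sketch of BNS scaling for binary features, multiplying each column by its Bi-Normal Separation score; the clipping constant and function names are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def bns_score(feature_present, y, eps=5e-4):
    """Bi-Normal Separation: |F^-1(tpr) - F^-1(fpr)|, F^-1 = inverse
    standard-Normal CDF; rates are clipped away from 0 and 1."""
    pos, neg = (y == 1), (y == 0)
    tpr = np.clip(feature_present[pos].mean(), eps, 1 - eps)
    fpr = np.clip(feature_present[neg].mean(), eps, 1 - eps)
    return abs(norm.ppf(tpr) - norm.ppf(fpr))

def bns_scale(X_binary, y):
    """Scale each binary column by its BNS score, so feature magnitude
    reflects importance; for continuous features one could instead use
    the BNS score of the best binary split."""
    scores = np.array([bns_score(X_binary[:, j], y)
                       for j in range(X_binary.shape[1])])
    return X_binary * scores, scores
```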
17
Normalization
Options tested in our experiments:
− L2 normalization - standard in text mining
− L1 normalization - sparse solutions
− No normalization
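For completeness, a tiny example of the per-example normalization options using scikit-learn (a convenience choice, not necessarily what the authors used):

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[3.0, 4.0, 0.0],
              [0.0, 1.0, 1.0]])        # toy shaped-and-scaled feature matrix

X_l2 = normalize(X, norm="l2")          # each row rescaled to unit L2 norm
X_l1 = normalize(X, norm="l1")          # each row rescaled to unit L1 norm
# "No normalization" simply skips this step.
```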
18
Preserving Sparsity
Text data is usually very sparse
Sparsity has a substantial impact on computational complexity
The transformations discussed so far are not sparsity-preserving
Solution:
− affine transformations have no effect on SVMs
− adapt f_i so that f_i(x_i,m) = 0 if x_i,m is the mode of x_i
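One way to implement this shift, sketched under the assumption that a shaper is any callable mapping raw values to shaped values (as in the earlier sketches): the wrapper subtracts the shaped value of the mode so that the mode, usually 0 in text data, stays 0.

```python
import numpy as np

def sparsity_preserving(shaper, x_train):
    """Wrap a per-feature shaping function f_i so that the feature's mode
    maps to 0 after shaping. The subtraction is an affine shift, which the
    slides note has no effect on the SVM solution, yet it keeps the sparse
    (mode-valued) entries at zero."""
    values, counts = np.unique(np.asarray(x_train, dtype=float),
                               return_counts=True)
    mode = values[np.argmax(counts)]
    offset = shaper(np.array([mode]))[0]
    return lambda x: shaper(np.asarray(x, dtype=float)) - offset
```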
19
Experiments
Benchmarks:
− Text: news articles, TREC, Web data, …
− UCI: 11 popular datasets, mixed attribute types
− used as binary classification problems with 50+ positives
Learner:
− linear SVM (SMO)
− 5-fold cross-validation to determine C (out of {.01, .1, 1, 10, 100})
− no internal normalization of input
− logistic scaling activated for output
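A rough stand-in for this learner setup using scikit-learn; the dataset is synthetic and SVC's libsvm solver plays the role of SMO here, so this only approximates the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy stand-in for a shaped, scaled, and normalized benchmark task.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Linear SVM with C chosen by 5-fold cross-validation over {.01, .1, 1, 10, 100};
# probability=True turns on Platt-style logistic scaling of the output scores.
search = GridSearchCV(SVC(kernel="linear", probability=True),
                      {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print("best C:", search.best_params_["C"])
```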
20
Text: Accuracy vs. training set size
21
UCI data: AUC vs. training set size
22
Overview: All binary UCI tasks
23
Lesion Study on UCI data
Effect of each processing block:
− PDF estimation
− Reshaping Features
− Feature Scaling
− Normalization
− Preserving sparsity
24
Conclusions
Data representation is crucial in data mining
“Feature Shaping”:
− expressive, local technique for transforming features
− generalizes selection and scaling
− computationally cheap, very practical
− tuned locally for each feature
Even a simplistic implementation gives decent improvements
Case-dependent, smarter implementations: an open question
Questions?