Intrinsic complexity of classification problems
Tin Kam Ho, Bell Laboratories
With contributions from Mitra Basu, Ester Bernado-Mansilla, Richard Baumgartner, Martin Law, Erinija Pranckeviciene, Albert Orriols-Puig, Nuria Macia
Supervised Learning: Many Methods, Data-Dependent Performance
- Bayesian classifiers, logistic regression, linear & polynomial discriminators, nearest neighbors, decision trees & forests, neural networks, support vector machines, ensemble methods, …
- No clear winner is good for all problems
- Often, accuracy reaches a limit for a practical problem, even with the best known method
Accuracy Depends on the Goodness of Match between Classifiers and Problems
[Figure: XCS and NN compared on two problems; on Problem A one classifier wins decisively (0.06% vs. 1.9% error), while on Problem B the ranking reverses by a narrow margin (0.6% vs. 0.7%)]
Measuring Geometrical Complexity of Classification Problems
Our goal: tools and languages for studying
- characteristics of the geometry & topology of high-dimensional data sets
- how they change with feature transformations and sampling
- how they interact with classifier geometry
We want to know:
- What are real-world problems like?
- What is my problem like?
- What can be expected of a method on a specific problem?
Parameterization of Data Complexity
Some Useful Measures of Geometric Complexity
- Fisher's Discriminant Ratio: the classical measure of class separability; maximize over all features to find the most discriminating one
- Degree of Linear Separability: find a separating hyperplane by linear programming; error counts and distances to the plane measure separability
- Length of Class Boundary: compute a minimum spanning tree and count the class-crossing edges
- Shapes of Class Manifolds: cover same-class points with maximal balls; the ball counts describe the shape of the class manifold
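Two of these measures are simple enough to sketch directly. Below is a minimal, illustrative Python implementation (assuming NumPy/SciPy, a feature matrix X, and 0/1 labels y as NumPy arrays; the function names are ours, not from any published library) of the per-feature Fisher discriminant ratio, f = (m1 - m2)^2 / (s1^2 + s2^2), and the fraction of class-crossing edges in a minimum spanning tree:

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def fisher_ratio(X, y):
    # Max over features of (m1 - m2)^2 / (s1^2 + s2^2); larger = more separable.
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)
    return float(np.max(num / np.maximum(den, 1e-12)))

def boundary_fraction(X, y):
    # Build an MST on pairwise distances; the fraction of edges joining
    # opposite-class points approximates the length of the class boundary.
    mst = minimum_spanning_tree(squareform(pdist(X))).tocoo()
    return float(np.mean(y[mst.row] != y[mst.col]))

A high boundary_fraction (near the random-labeling ceiling) indicates a long, convoluted boundary; a high fisher_ratio indicates at least one strongly discriminating feature.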
Using Complexity Measures to Study Problem Distributions
Real-world data sets: benchmarking data from the UC-Irvine archive
- 844 two-class problems: 452 are linearly separable, 392 non-separable
Synthetic data sets: random labeling of randomly located points
- 100 problems in 1-100 dimensions (a generation sketch follows below)
[Scatter plot in complexity space (metric 1 vs. metric 2): random labeling, linearly separable real-world data, and linearly non-separable real-world data]
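The synthetic collection is easy to regenerate; here is a hedged sketch (uniform point locations and the per-problem sample size are our assumptions, as the slide does not specify them):

import numpy as np

rng = np.random.default_rng(0)
problems = []
for dim in range(1, 101):               # one problem per dimensionality, 1..100
    X = rng.uniform(size=(1000, dim))   # randomly located points (size assumed)
    y = rng.integers(0, 2, size=1000)   # random labels: no true class structure
    problems.append((X, y))

By construction these problems have no learnable structure, so they anchor the maximally complex end of the complexity space.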
Measures of Geometrical Complexity
Distribution of Problems in Complexity Space
[Pairwise scatter plots of complexity measures; legend: lin.sep, lin.nonsep, random]
The First 6 Principal Components
Interpretation of the First 4 PCs
- PC 1 (50% of variance): linearity of the boundary and proximity of the opposite-class neighbor
- PC 2 (12% of variance): balance between within-class scatter and between-class distance
- PC 3 (11% of variance): concentration & orientation of intrusion into the opposite class
- PC 4 (9% of variance): within-class scatter
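Such a decomposition can be reproduced with standard tools. A sketch, assuming a matrix M with one row per problem and one column per complexity measure (scikit-learn is used for convenience, and the placeholder M must be replaced with real measurements):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

M = np.random.rand(392, 9)              # placeholder: real complexity measures go here
Z = StandardScaler().fit_transform(M)   # measures live on very different scales
pca = PCA(n_components=6).fit(Z)
print(pca.explained_variance_ratio_)    # cf. the reported ~50%, 12%, 11%, 9%, ...
coords = pca.transform(Z)               # problem coordinates in PC space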
Problem Distribution in 1st & 2nd Principal Components
- Continuous distribution
- Known easy & difficult problems occupy opposite ends
- Few outliers
- Empty regions
[Scatter plot of problems in the first two PCs; labeled extremes: random labels vs. linearly separable]
Relating Classifier Behavior to Data Complexity
Class Boundaries Inferred by Different Classifiers
[Figure: decision boundaries induced on the same data by XCS (a genetic algorithm), a nearest-neighbor classifier, and a linear classifier]
Domains of Competence of Classifiers
- Which classifier works best for a given classification problem?
- Can data complexity give us a hint?
[Schematic: problems plotted by complexity metric 1 vs. metric 2, with candidate regions labeled NN, LC, XCS, Decision Forest, ?]
Domain of Competence Experiment
- Use a set of 9 complexity measures: Boundary, Pretop, IntraInter, NonLinNN, NonLinLP, Fisher, MaxEff, VolumeOverlap, Npts/Ndim
- Characterize 392 two-class problems from UCI data, all shown to be linearly non-separable
- Evaluate 6 classifiers (the experiment loop is sketched below):
  - NN (1-nearest neighbor)
  - LP (linear classifier by linear programming)
  - Odt (oblique decision tree)
  - Pdfc (random-subspace decision forest) and Bdfc (bagging-based decision forest), the two ensemble methods
  - XCS (a genetic-algorithm-based classifier)
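The shape of the experiment, reduced to a loop. This is a sketch only: problems stands for the 392 UCI data sets (loading not shown), fisher_ratio and boundary_fraction come from the earlier sketch, and scikit-learn models stand in for the original classifiers (the LP, Odt, and XCS implementations are not reproduced here):

from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

classifiers = {
    "nn":   KNeighborsClassifier(n_neighbors=1),
    "lp":   LogisticRegression(max_iter=1000),  # stand-in for the LP classifier
    "bdfc": BaggingClassifier(),                # bagging-based decision forest
    "pdfc": RandomForestClassifier(),           # stand-in for the random-subspace forest
}

results = []
for X, y in problems:                           # the 392 two-class UCI problems
    measures = (fisher_ratio(X, y), boundary_fraction(X, y))  # ...plus the other 7
    scores = {n: cross_val_score(c, X, y).mean() for n, c in classifiers.items()}
    results.append((measures, max(scores, key=scores.get)))   # position + winner

Plotting each winner at its problem's measure coordinates is what produces the domain-of-competence maps on the following slides.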
Identifiable Domains of Competence by NN and LP
[Figure: best classifier for each benchmarking data set, plotted in complexity space; NN and LP dominate distinct regions]
Less Identifiable Domains of Competence
Regions in complexity space where the best classifier is one of nn, lp, or odt vs. an ensemble technique.
[Scatter plots over three measure pairs: Boundary vs. NonLinNN, IntraInter vs. Pretop, MaxEff vs. VolumeOverlap; legend: ensemble + nn, lp, odt]
Difficulties in Estimating Data Complexity
Apparent vs. True Complexity: Uncertainty in Measures due to Sampling Density
[Figure: the same problem sampled with 2, 10, 100, 500, and 1000 points]
A problem may appear deceptively simple or complex with small samples.
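The effect is easy to reproduce: subsample one problem at the sizes shown on the slide and watch a complexity estimate drift (a sketch reusing boundary_fraction from above, for any given data set (X, y) with at least 1000 points):

import numpy as np

rng = np.random.default_rng(1)
for n in (2, 10, 100, 500, 1000):
    idx = rng.choice(len(y), size=n, replace=False)
    print(n, boundary_fraction(X[idx], y[idx]))  # estimates stabilize only as n grows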
Uncertainty of Estimates at Two Levels
- Sparse training data in each problem & complex geometry cause ill-posedness of class boundaries (uncertainty in feature space)
- A sparse sample of problems causes difficulty in identifying regions of dominant competence (uncertainty in complexity space)
Complexity Estimates and Dimensionality Reduction
Feature selection/transformation may change the difficulty of a classification problem by:
- widening the gap between classes
- compressing the discriminatory information
- removing irrelevant dimensions
It is often unclear to what extent these happen; we seek a quantitative description of such changes (see the sketch below).
[Diagram: feature selection feeding into discrimination]
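One way to quantify the changes: run greedy forward selection and record both accuracy and complexity at each step. A sketch; scikit-learn's SequentialFeatureSelector is a convenience stand-in for the original forward-selection procedure, and fisher_ratio/boundary_fraction are the earlier illustrative measures:

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)
for k in range(1, X.shape[1]):          # grow the feature subset one step at a time
    sel = SequentialFeatureSelector(knn, n_features_to_select=k).fit(X, y)
    Xk = sel.transform(X)
    acc = cross_val_score(knn, Xk, y).mean()
    print(k, acc, fisher_ratio(Xk, y), boundary_fraction(Xk, y))

Tracking the complexity measures alongside accuracy shows whether selection genuinely simplifies the class geometry or merely discards dimensions.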
Spread of classification accuracy and geometrical complexity due to forward feature selection
Conclusions
Summary: Early Discoveries
- Problems distribute in a continuum in complexity space
- Several key measures provide independent characterization
- There exist identifiable domains of classifiers' dominant competence
- Sparse sampling, feature selection, and feature transformation induce variability in complexity estimates
For the Future
- Further progress in statistical learning will need systematic, scientific evaluation of algorithms on problems that are difficult for different reasons.
- A "problem synthesizer" would provide a complete evaluation platform and reveal the "blind spots" of current learning algorithms.
- Rigorous statistical characterization of complexity estimates from limited training data will help gauge the uncertainty and determine the applicability of data-complexity methods.
Ongoing:
- DCol: Data Complexity Library
- ICPR 2010 Contest on Domain of Dominant Competence