On the Role of Dataset Complexity in Case-Based Reasoning. Derek Bridge, University College Cork, Ireland (based on work done with Lisa Cummins).


[Title-slide diagram relating a dataset, a classifier, a case base (CB), and a case base maintenance algorithm.]

Overview
Dataset complexity measures
Classification experiment
Case base maintenance experiment
Going forward

Dataset Complexity Measures
Measures of classification difficulty: apparent difficulty, since we measure a dataset which samples the problem space.
Little impact on CBR: Fornells et al., ICCBR 2009; Cummins & Bridge, ICCBR 2009. (Little impact on ML in general!)

Dataset Complexity Measures
Survey of 12 geometrical measures: Ho & Basu, 2002.
DCoL: open-source C++ library of 13 measures: Orriols-Puig et al., 2009.
We have found 4 candidate measures in the CBR literature.

Overlap of Attribute Values
F1: Maximum Fisher's Discriminant Ratio
F2': Volume of Overlap Region
F3': Maximum Attribute Efficiency
F4': Collective Attribute Efficiency

Separability of Classes
N1': Fraction of Instances on a Boundary
N2: Ratio of Average Intra/Inter Class Distance
N3: Error Rate of a 1-NN Classifier
L1: Minimized Sum of Error Distance of a Linear Classifier
L2: Training Error of a Linear Classifier
C1: Complexity Profile
C2: Similarity-Weighted Complexity Profile
N5: Separability Emphasis Measure

Manifold Topology & Density
L3: Nonlinearity of a Linear Classifier
N4: Nonlinearity of a 1-NN Classifier
T1: Fraction of Maximum Covering Spheres
T2: Number of Instances per Attribute
T3: Dataset Competence

Dataset Complexity Measures: Desiderata
Predictive
Independent of what is being analyzed
Widely applicable across datasets
Cheap to compute
Incremental
Transparent/explainable

Overview
Dataset complexity measures (done)
Classification experiment
Case base maintenance experiment
Going forward

Classification Experiment
25 datasets: 14 Boolean classification, 11 multi-class; 21 have numeric-valued attributes only (12 Boolean classification, 9 multi-class).
4 Weka classifiers trained on 60% of each dataset: Neural Net with 1 hidden layer; SVM with SMO; J48; IBk with k = 3.
Error measured on 20% of each dataset.
Repeated 10 times.
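The experiment above used Weka; a minimal sketch of the same protocol with scikit-learn stand-ins (MLPClassifier for the Neural Net, SVC for SMO, DecisionTreeClassifier for J48, KNeighborsClassifier for IBk) might look like this, shown on Iris only:

```python
# Sketch of the classification experiment: 4 classifiers, 60% train,
# 20% test (the remaining 20% is held out), 10 repetitions.
# The scikit-learn classifiers are stand-ins for the Weka ones.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
errors = {}
for name, clf in [("NN", MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000)),
                  ("SVM", SVC()),
                  ("Tree", DecisionTreeClassifier()),
                  ("3-NN", KNeighborsClassifier(n_neighbors=3))]:
    runs = []
    for seed in range(10):                       # 10 repetitions
        X_tr, X_rest, y_tr, y_rest = train_test_split(
            X, y, train_size=0.6, random_state=seed)   # 60% train
        X_te, _, y_te, _ = train_test_split(
            X_rest, y_rest, train_size=0.5, random_state=seed)  # 20% test
        runs.append(1 - clf.fit(X_tr, y_tr).score(X_te, y_te))
    errors[name] = float(np.mean(runs))
```

The mean error per classifier is then correlated with a complexity measure across datasets, as on the next slides.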

An example of the results: a table with columns Dataset, NN, SVM, J48, IBk, Mean, and N1', with rows for Iris and Lung Cancer (the numeric values are not preserved in the transcript).

An example of the results. Correlation coefficient: 0.96.

N1': Fraction of Instances on a Boundary
Build a minimum spanning tree.
Compute the fraction of instances directly connected to instances of a different class.
Shuffle the dataset, repeat, and average.
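The steps above can be sketched as follows (a minimal single-pass version, without the shuffle-and-average step; it assumes SciPy and no duplicate points, since zero-distance edges are dropped by the sparse MST routine):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def boundary_fraction(X, y):
    """N1' sketch: fraction of instances joined by a minimum-spanning-tree
    edge to an instance of a different class."""
    d = squareform(pdist(X))                 # pairwise Euclidean distances
    mst = minimum_spanning_tree(d).toarray() # nonzero entries are MST edges
    on_boundary = set()
    for i, j in zip(*np.nonzero(mst)):
        if y[i] != y[j]:                     # edge crosses a class boundary
            on_boundary.update((i, j))
    return len(on_boundary) / len(y)
```

On two tight, well-separated clusters the MST has a single cross-class edge, so only its two endpoints count as boundary instances.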

Other Competitive Measures
N3: Error Rate of a 1-NN Classifier: leave-one-out error rate of 1-NN on the dataset.
N2: Ratio of Average Intra/Inter Class Distance: sum the distances to the nearest neighbour of the same class; divide by the sum of the distances to the nearest neighbour of a different class.
L2: Training Error of a Linear Classifier: build, e.g., an SVM on the dataset and compute its error on the original dataset; problems with multi-class datasets and with symbolic values.
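N3 and N2 as described above are straightforward to sketch; this version assumes scikit-learn for the leave-one-out loop and Euclidean distance throughout:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def n3_loo_1nn_error(X, y):
    """N3 sketch: leave-one-out error rate of a 1-NN classifier."""
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                          X, y, cv=LeaveOneOut()).mean()
    return 1.0 - acc

def n2_intra_inter_ratio(X, y):
    """N2 sketch: sum of nearest same-class distances divided by
    sum of nearest different-class distances."""
    X, y = np.asarray(X, float), np.asarray(y)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self-distance
    intra = sum(d[i][y == y[i]].min() for i in range(len(y)))
    inter = sum(d[i][y != y[i]].min() for i in range(len(y)))
    return intra / inter
```

Small N2 (tight classes, far apart) and small N3 both indicate an easy dataset.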

C1: Complexity Profile
Computed for each instance, with parameter K [Massie et al. 2006].
For a dataset measure, compute the average complexity.
(The slide illustrates the profile for K = 3.)
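A minimal sketch of one common reading of the complexity profile (assuming, per Massie et al., that an instance's complexity averages, over k = 1..K, the fraction of its k nearest neighbours with a different class):

```python
import numpy as np

def complexity_profile(X, y, K=3):
    """C1 sketch: per-instance complexity is the mean over k = 1..K of
    the fraction of the k nearest neighbours with a different class;
    the dataset measure is the mean over all instances."""
    X, y = np.asarray(X, float), np.asarray(y)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # an instance is not its own neighbour
    order = np.argsort(d, axis=1)            # neighbours, nearest first
    per_instance = []
    for i in range(len(y)):
        diff = y[order[i, :K]] != y[i]       # different-class flags
        fracs = [diff[:k].mean() for k in range(1, K + 1)]
        per_instance.append(np.mean(fracs))
    return float(np.mean(per_instance))
```

A cleanly separated dataset scores 0; a dataset where every nearest neighbour belongs to the other class scores 1.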

Other Measures from CBR
C2: Similarity-Weighted Complexity Profile: use similarity values when computing Pk.
N5: Separability Emphasis Measure [Fornells et al. 2009]: N5 = N1' × N2.
T3: Dataset Competence [Smyth & McKenna 1998]: competence groups based on overlapping coverage sets; group coverage based on size and similarity; dataset competence as the sum of group coverages.

Their Predictivity
C1: Complexity Profile: correlation coefficient 0.98.
C2: Similarity-Weighted Complexity Profile: correlation coefficient 0.97.
N5: Separability Emphasis Measure: between N1' and N2.
T3: Dataset Competence: correlation coefficient near zero.

Summary of the Experiment
Very predictive: C1 Complexity Profile; N3 Error Rate of a 1-NN Classifier; N1' Fraction of Instances on a Boundary.
Predictive but with applicability problems: L2 Training Error of a Linear Classifier.
Moderately predictive: N2 Ratio of Average Intra/Inter Class Distance.
All are measures of separability of classes.

Overview
Dataset complexity measures (done)
Classification experiment (done)
Case base maintenance experiment
Going forward

Meta-CBR for Maintenance
Case base maintenance algorithms seek to delete noisy cases and delete redundant cases.
Different case bases require different maintenance algorithms.
The same case base may require different maintenance algorithms at different times in its life cycle.
We have been building classifiers to select maintenance algorithms.

[Diagram: a kNN classifier over a meta-case base; each meta-case pairs complexity-measure values (F1, F2, …) with a maintenance algorithm such as RENN or CRR, and a new case base CB is described by its measure values to retrieve an algorithm.]

Case Base Maintenance Experiment: Training (building the meta-case base)
From 60% of each dataset, create a case base.
Create a meta-case to describe this case base: its problem attributes are complexity measures.
For its solution: run a small set of maintenance algorithms on each case base; record the % deleted; record accuracy on the next 20% of each dataset; the maintenance algorithm with the highest harmonic mean of % deleted and accuracy becomes this meta-case's solution.
But we use feature selection to choose a subset of the complexity measures: wrapper method, best-first search.
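The solution-selection step above, choosing the algorithm with the highest harmonic mean of % deleted and accuracy, can be sketched as follows (the function name and the algorithm names in the example are illustrative):

```python
def pick_maintenance_algorithm(results):
    """Choose the maintenance algorithm whose %-deleted and accuracy
    have the highest harmonic mean. `results` maps an algorithm name
    to a (deleted_pct, accuracy_pct) pair, both in [0, 100]."""
    def hmean(d, a):
        return 2 * d * a / (d + a) if d + a > 0 else 0.0
    return max(results, key=lambda alg: hmean(*results[alg]))
```

The harmonic mean rewards algorithms that do well on both criteria: deleting many cases at the cost of accuracy, or preserving accuracy while deleting almost nothing, both score poorly.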

Case Base Maintenance Experiment: Testing
The target problem is a case base built from the remaining 20% of each dataset; its attributes again are complexity measures.
Ask the classifier to predict a maintenance algorithm.
Run the algorithm; record % deleted, accuracy, and their harmonic mean.
Compare meta-CBR with a perfect classifier and with classifiers that choose the same algorithm each time.

Example results: a table with columns Classifier, Cases deleted (%), Accuracy (%), and Harmonic mean, comparing Choose-best, Meta-CBR, Choose ICF, and Choose CBE (the numeric values are not preserved in the transcript).

Which measures get selected?

F2': Volume of Overlap Region
For a given attribute, a measure of how many values for that attribute appear in instances labelled with different classes. (The slide illustrates two class distributions along an attribute a.)

Quick computation of F2: for each attribute a, take the per-class minima and maxima; the overlap region runs from the larger of the two class minima to the smaller of the two class maxima. (The slide illustrates the min and max markers for each class along a.)

A problem for F2: the interval-based computation can mislead when the per-class minima and maxima interleave along attribute a, as the slide illustrates.

F2': Our Version
o'(a) = count how many values are in the overlap.
r'(a) = count the number of values of a.
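A minimal sketch of this value-counting variant for a two-class dataset, assuming the per-attribute ratios o'(a)/r'(a) are averaged over attributes (the aggregation across attributes is our assumption; the slide defines only the per-attribute counts):

```python
import numpy as np

def f2_prime(X, y):
    """F2' sketch for two classes (labels 0 and 1): per attribute,
    the fraction of observed values lying inside the class overlap
    region [max of class minima, min of class maxima], averaged
    over attributes. Counts values rather than interval widths."""
    X, y = np.asarray(X, float), np.asarray(y)
    c0, c1 = X[y == 0], X[y == 1]
    ratios = []
    for a in range(X.shape[1]):
        lo = max(c0[:, a].min(), c1[:, a].min())   # larger class minimum
        hi = min(c0[:, a].max(), c1[:, a].max())   # smaller class maximum
        vals = X[:, a]
        o = np.sum((vals >= lo) & (vals <= hi))    # o'(a): values in overlap
        ratios.append(o / len(vals))               # r'(a) = len(vals)
    return float(np.mean(ratios))
```

When the class ranges do not overlap the measure is 0; when they interleave, the count of values inside the overlap reflects the mixing that the interval-width version of F2 can misjudge.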

Summary of the Experiment
Feature selection chose between 2 and 18 attributes, average 9.2.
It chose a range of measures across Ho & Basu's categories: always at least one measure of overlap of attribute values (e.g. F2'), but measures of class separability only about 50% of the time.
But this is just one experiment.

Overview
Dataset complexity measures (done)
Classification experiment (done)
Case base maintenance experiment (done)
Going forward

Going Forward
Use of complexity measures in CBR (and ML).
More research into complexity measures: experiments with more datasets, different datasets, more classifiers, …; new measures, e.g. Information Gain.
Applicability of measures: missing values; loss functions; dimensionality reduction, e.g. PCA.
The CBR similarity assumption and measures of case alignment [Lamontagne 2006, Hüllermeier 2007, Raghunandan et al. 2008].