A Unifying View on Instance Selection

Presentation transcript:

1 A Unifying View on Instance Selection
Thomas Reinartz DaimlerChrysler AG, Research and Technology, Germany

2 Outline
Introduction
Focusing Tasks
Evaluation Criteria for Instance Selection
Unifying Framework for Instance Selection
Evaluation
Conclusions

3 Introduction
CRISP-DM (Cross-Industry Standard Process for Data Mining) phases:
Business Understanding
Data Understanding
Data Preparation: data selection, cleaning, construction, integration, formatting
Modeling
Evaluation
Deployment
Data selection: for very large data sets, the data must be shrunk or reduced; this reduction step is called focusing.

4 Focusing Tasks (1)
Data as a table (A, T)
A: attributes, each characterized by a name, a type, and a domain of values
T: tuples (instances), each a sequence of attribute values
Focusing specification
Focusing input: a table or a component of a table
Focusing output: a subset of the focusing input, which may be a simple subset, a constrained subset, or a constructed subset
Focusing criterion: the relation between input and output
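The table (A, T) and a simple-subset focusing output can be sketched in code. This is a minimal illustration; the class name, field names, and example data are mine, not from the presentation:

```python
from dataclasses import dataclass

@dataclass
class Attribute:
    name: str
    type: str      # e.g. "continuous" or "discrete"
    domain: tuple  # admissible values, or a value range

# A table (A, T): a list of attributes A and a list of tuples T,
# where each tuple is a sequence of attribute values.
A = [
    Attribute("age", "continuous", (0, 120)),
    Attribute("class", "discrete", ("yes", "no")),
]
T = [(34, "yes"), (57, "no"), (41, "yes")]

# Focusing output as a simple subset of the input tuples:
output = [t for t in T if t[1] == "yes"]
print(output)  # [(34, 'yes'), (41, 'yes')]
```

A constrained or constructed subset would replace the list comprehension with a filter over several attributes or with newly built prototype tuples, respectively.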

5 Focusing Tasks (2)
Focusing context
Data mining goal: classification, prediction, description, concept description, summarization, dependency analysis
Data characteristics: simple statistics, information quality
Data mining algorithm
Instance selection: the particular focusing task whose input is a set of cases and whose output is a subset of that input

6 Evaluation Criteria for Instance Selection
Different evaluation strategies:
Filter vs. wrapper evaluation
Filter approach: considers only the data reduction itself, e.g. how well the subset preserves the mean, variance, distribution, and joint distribution
Wrapper approach: evaluates the subset through a specific data mining aspect, e.g. execution time, storage requirements, accuracy, complexity
Isolated vs. comparative evaluation (for solutions)
Separated vs. combined evaluation (for criteria)
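The filter/wrapper distinction can be made concrete with a short sketch. The data, the 1-NN classifier, and the statistics compared are my own illustrative choices, not the presentation's:

```python
import random
import statistics

random.seed(0)

# Labeled cases: a value and a label determined by that value.
values = [random.gauss(50, 10) for _ in range(500)]
data = [(x, "hi" if x > 50 else "lo") for x in values]
sample = random.sample(data, 50)

# Filter evaluation: judge the subset only by how well it preserves
# data characteristics of the full set (here: mean and variance).
full_vals = [x for x, _ in data]
samp_vals = [x for x, _ in sample]
mean_gap = abs(statistics.mean(full_vals) - statistics.mean(samp_vals))
var_gap = abs(statistics.variance(full_vals) - statistics.variance(samp_vals))

# Wrapper evaluation: judge the subset by actually running the data
# mining algorithm on it (here: a 1-NN classifier) and measuring
# its accuracy on held-out cases.
def nn_predict(train, x):
    return min(train, key=lambda t: abs(t[0] - x))[1]

test_cases = data[:100]
accuracy = sum(nn_predict(sample, x) == y for x, y in test_cases) / len(test_cases)
print(mean_gap, var_gap, accuracy)
```

The filter score is cheap but says nothing about mining quality; the wrapper score directly measures what the final model can do, at the cost of running the algorithm.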

7 Unifying Framework for Instance Selection
Pipeline: Input → Sampling → Clustering → Prototyping → Output, accompanied by evaluations
Sampling: simple random sampling, systematic sampling, stratified sampling
Clustering
Prototyping
The order of the steps is not important.

8 Evaluation
Generic Sampling (GENSAM): implements the unifying framework through additional preparation steps
Sorting: select cases by ordering their values on the most important attribute
Stratification: separate cases by attribute intervals (continuous attributes) or attribute values (discrete attributes), in order of attribute relevance
Intelligent sampling: random sampling, stratified sampling, systematic sampling, leader sampling, and similarity-driven sampling, obtained by combining the methods above
Experimental setting
Goal: classification; algorithms: C4.5 and instance-based learning (nearest-neighbor classifier)
Instance selection methods: simple random sampling (R), simple random sampling with stratification (RS), systematic sampling (S), systematic sampling with sorting (SS), leader sampling (L), leader sampling with sorting and stratification (LS)
Data: training set (80%), test set (20%)
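Of the methods compared, leader sampling is the least standard: it selects cluster representatives in a single pass. A minimal sketch, assuming a user-chosen distance threshold (the threshold value and distance function below are arbitrary illustrations):

```python
def leader_sampling(cases, threshold, distance):
    # Single pass over the cases: a case within `threshold` of an
    # existing leader is absorbed by that leader's cluster; otherwise
    # it becomes a new leader. The leaders form the selected subset.
    leaders = []
    for case in cases:
        if not any(distance(case, leader) <= threshold for leader in leaders):
            leaders.append(case)
    return leaders

selected = leader_sampling([1, 2, 3, 10, 11, 20], threshold=2,
                           distance=lambda a, b: abs(a - b))
print(selected)  # [1, 10, 20]
```

Because the result depends on the order in which cases arrive, the sorting and stratification preparation steps of GENSAM directly influence which leaders are chosen.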

10 Conclusions
Evaluation criteria
More intelligent focusing solutions
Analytical and experimental studies of different instance selection techniques

