Published by Karin Haynes. Modified over 9 years ago.
1
An Excel-Based Data Mining Tool: iData Analyzer
Summarized by Sungjick Lee
Data Mining Laboratory (DM.Lab), University of Seoul
April 24th, 2008
2
Contents
- The iData Analyzer
- ESX: A Multipurpose Tool for Data Mining
- iDAV Format for Data Mining
- An Approach for Unsupervised Clustering
- An Approach for Supervised Learning
3
The iData Analyzer
- Scans the data for errors: illegal numeric values, blank lines, and missing items
- Allows users to extract a representative subset of the data
- Includes an exemplar-based data mining tool that builds a concept hierarchy to generalize data
- Includes a backpropagation neural network for supervised learning
- Includes a self-organizing feature map for unsupervised clustering
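The three error checks listed above (illegal numeric values, blank lines, missing items) can be sketched in a few lines of Python. This is only an illustration of the idea, not the iData Analyzer's own preprocessing code: the function name `scan_for_errors`, the list-of-strings row layout, and the `numeric_columns` parameter are all assumptions made for this example.

```python
def scan_for_errors(rows, numeric_columns):
    """Return (row_index, problem) tuples for a table given as lists of strings.

    Hypothetical helper: `numeric_columns` is the set of column indexes
    that are expected to hold numbers.
    """
    problems = []
    for i, row in enumerate(rows):
        # A row whose cells are all empty counts as a blank line.
        if all(cell.strip() == "" for cell in row):
            problems.append((i, "blank line"))
            continue
        for j, cell in enumerate(row):
            if cell.strip() == "":
                problems.append((i, f"missing item in column {j}"))
            elif j in numeric_columns:
                try:
                    float(cell)
                except ValueError:
                    problems.append(
                        (i, f"illegal numeric value {cell!r} in column {j}")
                    )
    return problems


# Made-up rows: column 0 is expected to be numeric.
sample_rows = [["35", "Male"], ["", ""], ["abc", "Female"], ["40", ""]]
sample_problems = scan_for_errors(sample_rows, numeric_columns={0})
```

Here `sample_problems` flags row 1 as a blank line, row 2 for the non-numeric `'abc'`, and row 3 for a missing item in column 1.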
4
ESX: A Multipurpose Tool for Data Mining (1/2)
- Supports both supervised learning and unsupervised clustering
- Makes no statistical assumptions about the nature of the data
- Provides an automated method for dealing with missing attribute values
- Works in domains containing both categorical and numerical data
- For supervised classification: determines the instances and attributes best able to classify new instances of unknown origin
- For unsupervised clustering: uses a globally optimizing evaluation function that encourages a best instance clustering
5
ESX: A Multipurpose Tool for Data Mining (2/2)
- Defines the concept classes
- Summary statistics about the attribute values found within
- Instance-level summary information about the domain
- Report Generator: summary report in spreadsheet format
- Class resemblance scores
6
iDAV Format for Data Mining
Attribute type codes:
- C: categorical (nominal)
- R: real-valued (numerical)
Attribute usage codes:
- I: input attribute
- U: not used
- D: not used for classification or clustering, but attribute value summary information is displayed
- O: used as an output attribute
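As a rough illustration of how these codes annotate a data sheet, the sketch below validates a set of attribute declarations. It assumes a layout in which each attribute has a name, a type code, and a usage code. That layout, the function name `describe_attributes`, and the long labels in the dictionaries are assumptions for this example and not a description of the tool's internals.

```python
# Codes taken from the slide; the long labels are illustrative wording.
TYPE_CODES = {"C": "categorical", "R": "real-valued"}
USAGE_CODES = {
    "I": "input",
    "U": "not used",
    "D": "display-only",
    "O": "output",
}


def describe_attributes(names, types, usages):
    """Pair each attribute name with its decoded type and usage.

    Raises ValueError on an unknown code or mismatched row lengths.
    """
    if not (len(names) == len(types) == len(usages)):
        raise ValueError("header rows must have the same length")
    described = []
    for name, type_code, usage_code in zip(names, types, usages):
        if type_code not in TYPE_CODES:
            raise ValueError(f"unknown type code {type_code!r} for {name}")
        if usage_code not in USAGE_CODES:
            raise ValueError(f"unknown usage code {usage_code!r} for {name}")
        described.append(
            {
                "name": name,
                "type": TYPE_CODES[type_code],
                "usage": USAGE_CODES[usage_code],
            }
        )
    return described


# Attribute names as they appear in the rule output later in the deck;
# the codes assigned to them here are made up for the example.
attributes = describe_attributes(
    ["Income Range", "Age", "Life Ins Promo"],
    ["C", "R", "C"],
    ["I", "I", "O"],
)
```

With these made-up codes, `Life Ins Promo` is the single output attribute, as a supervised session would require. An unsupervised session would mark no attribute with `O`.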
7
An Approach for Unsupervised Clustering
1. Enter data into a new Excel spreadsheet
2. Perform a data mining session
3. Read and interpret summary results
4. Read and interpret results for individual clusters
5. Visualize and interpret rules defining the individual clusters
8
An Approach for Unsupervised Clustering
Enter data into a new Excel spreadsheet
- CreditCardPromotion.xls
9
An Approach for Unsupervised Clustering
Perform a data mining session (1/2)
- Instance similarity: a value closer to 100 encourages the formation of new clusters
- Instance similarity: a value closer to 0 discourages the formation of new clusters
- Similarity criterion for real-valued attributes: 1.0 is usually appropriate
- 8 classes are too many: change the instance similarity value and try again
10
An Approach for Unsupervised Clustering
Perform a data mining session (2/2)
Attribute significance = (largest class mean (Class 1 age = 43.33) - smallest class mean (Class 2 age = 37.00)) / domain standard deviation
11
An Approach for Unsupervised Clustering
Result: RES RUL (the generated production rules)
Rules for Class 1, Rules for Class 2, Rules for Class 3

**Total Percent Coverage = 0.00%

Income Range = "20-30,000" : rule accuracy 100.00%, rule coverage 80.00%
19.00 <= Age <= 29.00 : rule accuracy 100.00%, rule coverage 60.00%
19.00 <= Age <= 29.00 and Income Range = "20-30,000" : rule accuracy 100.00%, rule coverage 60.00%
19.00 <= Age <= 29.00 and Magazine Promo = No : rule accuracy 100.00%, rule coverage 60.00%
(middle portion omitted)
**Total Percent Coverage = 80.00%

Income Range = "30-40,000" : rule accuracy 80.00%, rule coverage 57.14%
Magazine Promo = Yes : rule accuracy 75.00%, rule coverage 85.71%
Life Ins Promo = Yes : rule accuracy 77.78%, rule coverage 100.00%
35.00 <= Age <= 43.00 : rule accuracy 77.78%, rule coverage 100.00%
(middle portion omitted)
**Total Percent Coverage = 100.00%
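The slide reports a rule accuracy and a rule coverage for each rule but does not define them. The sketch below uses the common reading, which is an assumption here: accuracy is the share of instances satisfying the rule's condition that belong to the class, and coverage is the share of the class's instances that satisfy the condition. The function name `rule_accuracy_and_coverage` and the toy instances are made up for illustration and are not taken from CreditCardPromotion.xls.

```python
def rule_accuracy_and_coverage(instances, condition, cls):
    """Return (accuracy, coverage) of a rule `condition -> cls`, as percentages.

    Assumed definitions (not stated on the slide):
      accuracy = matching instances in the class / all matching instances
      coverage = matching instances in the class / all instances in the class
    """
    matching = [x for x in instances if condition(x)]
    in_class = [x for x in instances if x["class"] == cls]
    hits = [x for x in matching if x["class"] == cls]
    accuracy = 100.0 * len(hits) / len(matching)
    coverage = 100.0 * len(hits) / len(in_class)
    return accuracy, coverage


# Made-up instances, for illustration only.
toy_instances = [
    {"class": 2, "Age": 19},
    {"class": 2, "Age": 25},
    {"class": 2, "Age": 29},
    {"class": 2, "Age": 40},
    {"class": 2, "Age": 45},
    {"class": 3, "Age": 35},
]

accuracy, coverage = rule_accuracy_and_coverage(
    toy_instances, lambda x: 19 <= x["Age"] <= 29, cls=2
)
```

On this toy data the rule `19 <= Age <= 29` matches three instances, all in class 2, so its accuracy is 100% while it covers 3 of the 5 class members (60%).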
12
An Approach for Unsupervised Clustering
Result: RES SUM (summary statistics) (1/2)
Resemblance score
- Are the within-class resemblance scores higher than the domain resemblance value?
- If not, why?
  - Bad choice of attributes
  - Bad choice of instances
  - The domain does not contain definable classes
Attribute significance = (largest class mean (Class 1 age = 43.33) - smallest class mean (Class 2 age = 37.00)) / domain standard deviation (9.51)
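The attribute significance formula above can be checked with the numbers given on the slide. The short sketch below is only a worked example. The function name `attribute_significance` is an assumption, and only the two class means quoted on the slide are used, since the mean for the third class is not given.

```python
def attribute_significance(class_means, domain_std):
    """(largest class mean - smallest class mean) / domain standard deviation."""
    return (max(class_means) - min(class_means)) / domain_std


# Values from the slide: Class 1 age mean, Class 2 age mean, domain std. dev.
age_significance = attribute_significance([43.33, 37.00], 9.51)
print(round(age_significance, 2))  # prints 0.67
```

For the age attribute, the largest and smallest class means are therefore about two thirds of a domain standard deviation apart.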
13
An Approach for Unsupervised Clustering
Result: RES CLS (statistics about the individual classes) (1/2)
Typicality
- The average similarity of an instance to all other members of its cluster
Predictiveness (the state of being predicted)
- The probability that an instance resides in the class, given that it has the attribute value
- A between-class measure
- If the score is 1, the value is sufficient for class membership
Predictability (the degree to which a correct forecast can be made)
- The percentage of instances within a class that have the attribute value
- A within-class measure
- If the score is 1, the value is necessary for class membership
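The difference between the two attribute-value scores is easier to see in code. The sketch below computes both for a categorical attribute value, reading predictability as a within-class fraction and predictiveness as a between-class fraction. The function names, the dictionary-based instance layout, and the toy instances are assumptions made for illustration, not ESX's implementation or the real data set.

```python
def predictability(instances, cls, attribute, value):
    """Within-class measure: fraction of the class's instances that have `value`."""
    members = [x for x in instances if x["class"] == cls]
    return sum(1 for x in members if x[attribute] == value) / len(members)


def predictiveness(instances, cls, attribute, value):
    """Between-class measure: fraction of instances having `value` that are in `cls`."""
    holders = [x for x in instances if x[attribute] == value]
    return sum(1 for x in holders if x["class"] == cls) / len(holders)


# Made-up instances, for illustration only.
toy = [
    {"class": 1, "Magazine Promo": "Yes"},
    {"class": 1, "Magazine Promo": "Yes"},
    {"class": 1, "Magazine Promo": "No"},
    {"class": 2, "Magazine Promo": "No"},
    {"class": 2, "Magazine Promo": "No"},
]

# Every class 2 member has "No": predictability is 1, so "No" is necessary.
necessary = predictability(toy, 2, "Magazine Promo", "No")
# Every instance with "Yes" is in class 1: predictiveness is 1, so "Yes" is sufficient.
sufficient = predictiveness(toy, 1, "Magazine Promo", "Yes")
```

In the toy data, `No` is necessary for class 2 but not sufficient, because one class 1 instance also has it, so its predictiveness for class 2 is only 2/3.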
14
An Approach for Unsupervised Clustering
Result: RES CLS (statistics about the individual classes) (1/2)
- "Highly": a score greater than or equal to 0.80
15
An Approach for Supervised Learning
1. Enter data into a new Excel spreadsheet and choose an output attribute
2. Perform a data mining session
3. Read and interpret summary results
4. Read and interpret test set results
5. Read and interpret results for individual clusters
6. Visualize and interpret class rules