Chapter 4: An Excel-based Data Mining Tool
4.1 The iData Analyzer
4.2 ESX: A Multipurpose Tool for Data Mining
ESX:
- Supports supervised learning and unsupervised clustering.
- Does not make statistical assumptions.
- Deals with missing attribute values.
- Can be applied to both categorical and numerical data.
- Points out inconsistencies and unusual values.
For supervised classification, ESX can determine the instances and attributes best able to classify new instances. For unsupervised clustering, ESX incorporates a globally optimizing evaluation function that encourages a best clustering of the instances.
4.3 iDAV Format for Data Mining
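The iDAV layout itself appears only in figures that are not reproduced here. As a rough sketch, the Python below assumes a CSV export with three header rows (attribute names; type codes C for categorical and R for real-valued; usage codes I, O, U and D for input, output, unused and display-only) followed by one instance per row, with blank cells treated as missing values. These header codes, the `read_idav` function and the sample data are illustrative assumptions, not the iDA implementation.

```python
import csv
import io

# Hypothetical sample in an assumed iDAV-style layout:
#   row 1 = attribute names
#   row 2 = attribute type  (C = categorical, R = real-valued)
#   row 3 = attribute usage (I = input, O = output, U = unused, D = display only)
#   remaining rows = instances; an empty cell is a missing value
SAMPLE = """color,size,weight,class
C,C,R,C
I,I,I,O
red,small,1.5,A
blue,large,,B
"""

def read_idav(text):
    """Parse an iDAV-style CSV (assumed layout) into attribute metadata and instance dicts."""
    rows = list(csv.reader(io.StringIO(text)))
    names, types, usages = rows[0], rows[1], rows[2]
    attributes = [
        {"name": n, "type": t.strip().upper(), "usage": u.strip().upper()}
        for n, t, u in zip(names, types, usages)
    ]
    instances = []
    for row in rows[3:]:
        record = {}
        for attr, raw in zip(attributes, row):
            raw = raw.strip()
            if raw == "":
                record[attr["name"]] = None          # missing attribute value
            elif attr["type"] == "R":
                record[attr["name"]] = float(raw)    # real-valued attribute
            else:
                record[attr["name"]] = raw           # categorical attribute
        instances.append(record)
    return attributes, instances

attrs, data = read_idav(SAMPLE)
# The output attribute is the one marked "O" in the usage row.
output_attrs = [a["name"] for a in attrs if a["usage"] == "O"]
```

Keeping missing values as `None` rather than dropping the instance mirrors the point that ESX deals with missing attribute values instead of requiring complete records.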
A Five-Step Approach for Unsupervised Clustering
Step 1: Enter the Data to Be Mined
Step 2: Perform a Data Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Individual Class Results
Step 5: Visualize Individual Class Rules
Step 1: Enter the Data to Be Mined
Step 2: Perform a Data Mining Session
Step 3: Read and Interpret Summary Results
- Class resemblance scores
- Domain resemblance score
  - Attributes, instances, no model
- Domain predictability
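To make the two resemblance scores concrete, here is a minimal sketch that treats a class resemblance score as the average pairwise similarity of the instances in one cluster, and the domain resemblance score as the same average taken over every instance in the data. Simple matching similarity (the fraction of equal attribute values) is an assumed stand-in for whatever measure ESX actually uses, and the cluster data are hypothetical.

```python
from itertools import combinations

def similarity(a, b):
    """Simple matching similarity (assumed measure): fraction of attributes with equal values."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def resemblance(instances):
    """Average similarity over all unordered pairs of instances."""
    pairs = list(combinations(instances, 2))
    if not pairs:
        return 1.0  # assumed convention: a single instance trivially resembles itself
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical clusters of categorical instances (color, size).
clusters = {
    "C1": [("red", "small"), ("red", "small"), ("red", "large")],
    "C2": [("blue", "large"), ("blue", "large")],
}
# The domain is every instance, ignoring cluster membership.
domain = [inst for members in clusters.values() for inst in members]

domain_score = resemblance(domain)
class_scores = {name: resemblance(members) for name, members in clusters.items()}
```

In this toy data both clusters score above the domain value, which suggests that each cluster groups instances that are more alike than the data set as a whole.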
Step 4: Read and Interpret Individual Class Results
- Class predictability is a within-class measure.
- Class predictiveness is a between-class measure.
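One way to read the two measures, sketched below under stated assumptions: for an attribute value v and a class C, predictability is the fraction of instances in C that have v (a within-class measure), while predictiveness is the fraction of instances having v that belong to C (a between-class measure). The function names and the toy data are hypothetical.

```python
def predictability(instances, labels, attr, value, cls):
    """Within-class: fraction of instances in class `cls` whose `attr` equals `value`."""
    in_class = [x for x, y in zip(instances, labels) if y == cls]
    if not in_class:
        return 0.0
    return sum(1 for x in in_class if x[attr] == value) / len(in_class)

def predictiveness(instances, labels, attr, value, cls):
    """Between-class: fraction of instances with `attr == value` that belong to `cls`."""
    with_value = [y for x, y in zip(instances, labels) if x[attr] == value]
    if not with_value:
        return 0.0
    return sum(1 for y in with_value if y == cls) / len(with_value)

# Hypothetical data: one attribute ("color") across two classes.
instances = [{"color": "red"}, {"color": "red"}, {"color": "blue"},
             {"color": "blue"}, {"color": "blue"}]
labels = ["A", "A", "A", "B", "B"]

p_red_in_a = predictability(instances, labels, "color", "red", "A")   # 2 of 3 class-A instances are red
v_red_for_a = predictiveness(instances, labels, "color", "red", "A")  # both red instances are in class A
```

Here red is fully predictive of class A, since every red instance is in A, even though only two of the three class-A instances are red. The two measures answer different questions and can diverge sharply.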
Step 5: Visualize Individual Class Rules
A Six-Step Approach for Supervised Learning
Step 1: Choose an Output Attribute
Step 2: Perform the Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Test Set Results
Step 5: Read and Interpret Class Results
Step 6: Visualize and Interpret Class Rules
Step 4: Read and Interpret Test Set Results
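Test set results are commonly summarized as a confusion matrix together with an overall accuracy. The sketch below tallies both from parallel lists of actual and predicted class labels. The row/column orientation (rows = actual, columns = predicted) and the sample labels are assumptions, so check the orientation used in the iDA output before comparing numbers.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs; classes are sorted for a stable layout."""
    classes = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    # Assumed orientation: rows are actual classes, columns are predicted classes.
    matrix = [[counts[(a, p)] for p in classes] for a in classes]
    return classes, matrix

def accuracy(actual, predicted):
    """Fraction of test instances whose predicted class matches the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Hypothetical test set results.
actual    = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no",  "no", "no", "yes", "yes"]

classes, matrix = confusion_matrix(actual, predicted)
test_accuracy = accuracy(actual, predicted)
```

Correct classifications fall on the main diagonal; every off-diagonal count is a specific kind of error, which is more informative than the single accuracy figure.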
Techniques for Generating Rules
1. Choose an attribute.
2. Use the attribute to subdivide the instances into subclasses.
3. If the instances in a subclass satisfy a predefined criterion, generate a defining rule for that subclass. If not, repeat from step 1 with that subclass.
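The three steps above describe a recursive procedure. The sketch below is one possible implementation, not the ESX rule generator: the predefined criterion is assumed to be a minimum majority-class fraction (`min_purity`), and the attribute-choice heuristic in `choose_attribute` is a hypothetical stand-in. Ties between attributes go to the one listed first.

```python
from collections import Counter

def majority(labels):
    """Return (most common label, fraction of labels that share it)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

def choose_attribute(rows, attrs):
    """Hypothetical heuristic: pick the attribute whose split yields the most majority-class instances."""
    def score(attr):
        groups = {}
        for x, y in rows:
            groups.setdefault(x[attr], []).append(y)
        return sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())
    return max(attrs, key=score)

def generate_rules(rows, attrs, min_purity=1.0, conditions=()):
    """rows: list of (instance_dict, class_label). Returns a list of (conditions, class) rules."""
    label, purity = majority([y for _, y in rows])
    if purity >= min_purity or not attrs:
        # Assumed criterion met (or no attributes left): emit a defining rule.
        return [(conditions, label)]
    attr = choose_attribute(rows, attrs)                 # step 1: choose an attribute
    remaining = [a for a in attrs if a != attr]
    subsets = {}
    for x, y in rows:                                    # step 2: subdivide the instances
        subsets.setdefault(x[attr], []).append((x, y))
    rules = []
    for value, subset in sorted(subsets.items()):        # step 3: test each subclass or repeat
        rules += generate_rules(subset, remaining, min_purity,
                                conditions + ((attr, value),))
    return rules

# Hypothetical training data.
rows = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "no"}, "stay"),
]
rules = generate_rules(rows, ["outlook", "windy"])
```

Lowering `min_purity` below 1.0 stops the recursion earlier and yields shorter but less exact rules.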
Techniques for Generating Rules
1. Define the scope of the rules.
2. Choose the instances.
3. Set the minimum rule correctness.
4. Define the minimum rule coverage.
5. Choose an attribute significance value.
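Minimum rule correctness and minimum rule coverage act as filters on candidate rules. The sketch assumes correctness is the fraction of instances matching a rule's conditions that belong to the rule's class, and coverage is the fraction of the class's instances that match those conditions. The threshold defaults, the function names and the sample rows are hypothetical.

```python
def satisfies(instance, conditions):
    """True if the instance meets every (attribute, value) condition of a rule."""
    return all(instance.get(attr) == value for attr, value in conditions)

def rule_stats(rows, conditions, cls):
    """Return (correctness, coverage) for the rule `conditions -> cls` (assumed definitions)."""
    covered = [y for x, y in rows if satisfies(x, conditions)]
    in_class = [x for x, y in rows if y == cls]
    correct = sum(1 for y in covered if y == cls)
    correctness = correct / len(covered) if covered else 0.0
    coverage = correct / len(in_class) if in_class else 0.0
    return correctness, coverage

def keep_rule(rows, conditions, cls, min_correctness=0.8, min_coverage=0.1):
    """Keep a rule only if it meets both (hypothetical) minimum thresholds."""
    correctness, coverage = rule_stats(rows, conditions, cls)
    return correctness >= min_correctness and coverage >= min_coverage

# Hypothetical training rows: (instance, class).
rows = [
    ({"income": "high", "age": "young"}, "yes"),
    ({"income": "high", "age": "old"}, "yes"),
    ({"income": "high", "age": "old"}, "no"),
    ({"income": "low", "age": "young"}, "yes"),
    ({"income": "low", "age": "old"}, "no"),
]
broad = rule_stats(rows, (("income", "high"),), "yes")
narrow = rule_stats(rows, (("income", "high"), ("age", "young")), "yes")
```

The broad rule covers more of class yes but is less often right; the narrow rule is always right but covers only one of three class members. The two thresholds control that trade-off.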
Instance Typicality
Typicality scores:
- Identify prototypical and outlier instances.
- Help select a best set of training instances.
- Are used to compute individual instance classification confidence scores.
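A minimal typicality sketch, assuming an instance's typicality is its average similarity to the other members of its class, with simple matching similarity as an assumed measure. High scores mark prototypical instances and low scores mark candidate outliers; the class members below are hypothetical.

```python
def similarity(a, b):
    """Simple matching similarity (assumed measure): fraction of attribute positions with equal values."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def typicality(index, members):
    """Average similarity of members[index] to every other member of the same class."""
    target = members[index]
    others = [m for i, m in enumerate(members) if i != index]
    if not others:
        return 1.0  # assumed convention for a single-member class
    return sum(similarity(target, o) for o in others) / len(others)

# Hypothetical members of one class: (color, size, shape).
members = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
]
scores = [typicality(i, members) for i in range(len(members))]
prototype = max(range(len(members)), key=scores.__getitem__)  # most typical member
outlier = min(range(len(members)), key=scores.__getitem__)    # least typical member
```

Training on the highest-scoring instances is one way to act on the "select a best set of training instances" point, although how iDA combines typicality into confidence scores is not shown here.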
Special Considerations and Features
- Avoid mining delays
- The Quick Mine feature
- Erroneous and missing data