A Comparison of Capabilities of Data Mining Tools
Jong-Hee Lee, Yong-Seok Choi
Dept. of Statistics, Pusan National University
1. Introduction
Data Mining (DM): the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules (Berry and Linoff, 1997, p. 5).
Data Mining Tools
There are many commercial DM tools (http://kdnuggets.com), the number of DM tool vendors keeps increasing, and the tools are updated rapidly.
Existing comparisons such as Abbott et al. (1998) and Elder and Abbott (1998) are out of date, and comparisons published by the vendors themselves are not objective.
Because the tools change so quickly, up-to-date and objective comparisons are needed.
Tools Selected for Our Comparison
Product              Company          Version Tested
Clementine           SPSS             5.2
Enterprise Miner     SAS Institute    3.0
Intelligent Miner    IBM              6.1
Comparisons of DM Tools
2. Constitution of Main Windows and Nodes
3. Techniques
   3.1 Market Basket Analysis
   3.2 Decision Tree
2. Constitution of Main Windows and Nodes
Comparative criteria:
Main windows: visualization of the analysis using streams
Nodes: arranged in order of use, or according to the purpose of modeling
Main Window of Clementine
Visualization using streams; nodes arranged in order of use
Main Window of Enterprise Miner
Visualization using streams; nodes arranged in order of use
Main Window of Intelligent Miner
Nodes arranged according to the purpose of modeling and in order of use
Conclusion of Comparison for Constitution of Main Windows and Nodes
Main Windows: (excellent) Clementine, (poor) Intelligent Miner
Nodes: (excellent) Intelligent Miner
3. Techniques
3.1 Market Basket Analysis
3.2 Decision Tree
For the other techniques, see J.H. Lee (2000).
3.1 Market Basket Analysis
Market Basket Analysis (MBA) gives insight into the merchandise by telling us which products tend to be purchased together (Berry and Linoff, 1997, p. 124).
Comparative Criteria for MBA
Algorithm, Options, Results
Algorithm of MBA (○ = supported, × = not supported)
Tools compared: Clementine, Intelligent Miner, Enterprise Miner
Algorithms: ○ Association Rule × Sequential Pattern ○
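The two algorithm rows differ in whether purchase order matters: an association rule only asks which items occur in the same basket, while a sequential pattern asks whether items appear in successive purchases in a given order. The Python sketch below counts the support of a sequential pattern, assuming each customer's history is a time-ordered list of baskets. The functions `contains_sequence` and `sequence_support` and the sample histories are hypothetical illustrations, not the method used by any of the three tools.

```python
from typing import FrozenSet, List, Sequence

Basket = FrozenSet[str]

def contains_sequence(history: Sequence[Basket], pattern: Sequence[Basket]) -> bool:
    """True if each pattern element is a subset of a distinct basket in the
    history, with those baskets appearing in the same order as the pattern."""
    i = 0  # index of the next pattern element still to be matched
    for basket in history:
        # Greedy earliest matching is sufficient for subsequence containment.
        if i < len(pattern) and pattern[i] <= basket:
            i += 1
    return i == len(pattern)

def sequence_support(histories: List[Sequence[Basket]], pattern: Sequence[Basket]) -> float:
    """Fraction of customers whose purchase history contains the pattern."""
    hits = sum(contains_sequence(h, pattern) for h in histories)
    return hits / len(histories)

# Hypothetical customer histories; each inner list is ordered by purchase time.
histories = [
    [frozenset({"A"}), frozenset({"B", "C"})],
    [frozenset({"B"}), frozenset({"A"})],
    [frozenset({"A", "D"}), frozenset({"C"}), frozenset({"B"})],
]
pattern = [frozenset({"A"}), frozenset({"B"})]  # "A, then later B"
print(sequence_support(histories, pattern))
```

Only the first and third customers buy A before B, so the sequential support is 2/3; if order were ignored, as in an association rule, all three customers would contain both A and B.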
Options of MBA (○ = supported, × = not supported)
Tools compared: Clementine, Intelligent Miner, Enterprise Miner
Options: × Minimum of Confidence ○ Minimum of Support ○ Minimum of Coverage × ○ Maximum of Items × Item Constraints ○
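Two of these options, Minimum of Support and Maximum of Items, act directly on the search for frequent itemsets. The Python sketch below is a generic Apriori-style level-wise search that applies both; it is a textbook version written under our own assumptions, not the algorithm inside Clementine, Intelligent Miner or Enterprise Miner. The function `frequent_itemsets`, its parameter names and the sample transactions are hypothetical. Minimum of Confidence would apply afterwards, when rules are derived from the frequent itemsets.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def frequent_itemsets(transactions: List[Set[str]], min_support: float,
                      max_items: int) -> Dict[FrozenSet[str], float]:
    """Level-wise search for itemsets with support >= min_support,
    considering itemsets of at most max_items items."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    result: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates and k <= max_items:
        # Count the support of every size-k candidate.
        level: Dict[FrozenSet[str], float] = {}
        for c in candidates:
            support = sum(c <= t for t in transactions) / n
            if support >= min_support:
                level[c] = support
        result.update(level)
        # Build size-(k+1) candidates; keep one only if all of its
        # size-k subsets are frequent (the Apriori pruning property).
        next_candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in level for s in combinations(union, k)
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)
        k += 1
    return result

# Hypothetical baskets.
transactions = [{"A", "B"}, {"B", "C", "D"}, {"A", "B", "C"}, {"B", "D"}]
result = frequent_itemsets(transactions, min_support=0.5, max_items=2)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Raising `max_items` lets the loop continue to larger itemsets; here the pairs {A, C}, {A, D} and {C, D} fall below the 0.5 support threshold.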
Results of MBA (○ = provided, × = not provided)
Tools compared: Clementine, Intelligent Miner, Enterprise Miner
Results: ○ Confidence × Support Lift ○ Coverage × × Textual Display ○ ○ Visualization ×
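The numeric results listed above can all be computed from itemset supports. For a rule X ⇒ Y, support is the fraction of baskets containing both X and Y, confidence is support(X ∪ Y) / support(X), and lift is confidence / support(Y). Definitions of coverage differ between tools; this sketch assumes coverage means the support of the antecedent X alone. The functions `support` and `rule_measures` and the sample data are hypothetical.

```python
from typing import Dict, List, Set

def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_measures(antecedent: Set[str], consequent: Set[str],
                  transactions: List[Set[str]]) -> Dict[str, float]:
    """Interest measures for antecedent => consequent.
    Assumes the antecedent occurs at least once (otherwise confidence is undefined)."""
    sup_x = support(antecedent, transactions)
    sup_y = support(consequent, transactions)
    sup_xy = support(antecedent | consequent, transactions)
    confidence = sup_xy / sup_x
    return {
        "coverage": sup_x,           # assumption: support of the antecedent alone
        "support": sup_xy,           # antecedent and consequent bought together
        "confidence": confidence,    # estimate of P(consequent | antecedent)
        "lift": confidence / sup_y,  # > 1 means a positive association
    }

# Hypothetical baskets.
transactions = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"D"}]
print(rule_measures({"A"}, {"B"}, transactions))
```

Here every basket with A also contains B (confidence 1.0), and because B appears in only three of the four baskets the lift is 4/3.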
Data Format in MBA

< Horizontal Format >: one row per ID, with a Y under each purchased item
ID    A   B   C   D   ...
132   Y   Y
428       Y   Y   Y

< Vertical Format >: one row per (ID, item) pair
ID    Item
132   A
132   B
428   B
428   C
428   D

The tables above are taken from the xore web site (http://www.Exclusiveore.com/index.html).
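Converting between the two layouts is a simple grouping step. The Python sketch below assumes IDs are integers and items are strings; the functions `vertical_to_horizontal`, `horizontal_to_vertical` and `horizontal_to_flags` and the sample rows are hypothetical illustrations, not features of any of the three tools.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def vertical_to_horizontal(rows: List[Tuple[int, str]]) -> Dict[int, Set[str]]:
    """Group (ID, item) pairs into one item set per ID."""
    baskets: Dict[int, Set[str]] = defaultdict(set)
    for basket_id, item in rows:
        baskets[basket_id].add(item)
    return dict(baskets)

def horizontal_to_vertical(baskets: Dict[int, Set[str]]) -> List[Tuple[int, str]]:
    """Expand item sets back into sorted (ID, item) pairs."""
    return [(bid, item) for bid in sorted(baskets) for item in sorted(baskets[bid])]

def horizontal_to_flags(baskets: Dict[int, Set[str]], items: List[str]) -> List[List[str]]:
    """One row per ID with "Y" under each purchased item and "" elsewhere."""
    table = [["ID"] + items]
    for bid in sorted(baskets):
        table.append([str(bid)] + ["Y" if i in baskets[bid] else "" for i in items])
    return table

# Hypothetical (ID, item) rows.
vertical = [(132, "A"), (132, "B"), (428, "B"), (428, "C"), (428, "D")]
baskets = vertical_to_horizontal(vertical)
for row in horizontal_to_flags(baskets, ["A", "B", "C", "D"]):
    print(row)
```

The vertical layout is convenient when transactions arrive as individual purchase records, while the flag layout suits tools that expect one column per item.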
Conclusion of Comparison for MBA
Algorithm: (poor) Clementine
Option: (excellent) Intelligent Miner, (poor) Clementine
Result: (poor) Clementine
3.2 Decision Tree
Decision trees are used for classification and prediction.
Comparative Criteria for Decision Tree
Algorithm, Options, Results
Decision Tree Algorithms (○ = supported, × = not supported)
Tools compared: Clementine, Intelligent Miner, Enterprise Miner
Algorithms: × CHAID ○ CART C4.5 or C5.0 ○ ID3 × × SPRINT ○
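The algorithms in this table differ mainly in how they score a candidate split: CART uses the Gini index, ID3 uses information gain based on entropy, C4.5 refines this with the gain ratio, and CHAID uses chi-square tests. The Python sketch below computes the first two criteria for a hypothetical binary split; the helper names and labels are our own and say nothing about how the three tools implement them.

```python
from collections import Counter
from math import log2
from typing import Callable, List, Sequence

def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def impurity_decrease(parent: Sequence[str], children: List[Sequence[str]],
                      measure: Callable[[Sequence[str]], float]) -> float:
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * measure(ch) for ch in children)
    return measure(parent) - weighted

# Hypothetical node with 4 "yes" and 4 "no", split into two children.
parent = ["yes"] * 4 + ["no"] * 4
children = [["yes", "yes", "yes", "no"], ["yes", "no", "no", "no"]]
print(impurity_decrease(parent, children, gini))     # Gini decrease (CART-style)
print(impurity_decrease(parent, children, entropy))  # information gain (ID3-style)
```

A tree grower evaluates every candidate split this way and keeps the one with the largest decrease.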
Decision Tree Options (○ = supported, × = not supported)
Tools compared: Clementine, Intelligent Miner, Enterprise Miner
Options: × Misclassification Costs ○ Priors ○ Pruning Severity ○ Stopping Rule ○ Missing Value
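Misclassification costs and priors matter because they can change the class a leaf predicts. The Python sketch below follows the textbook CART formulation under our own assumptions: with priors, each class count in a leaf is rescaled by prior / (training count of that class), and the leaf predicts the class with the lowest expected misclassification cost. It is not a description of how any of the three tools applies these options; the functions `class_probabilities` and `min_cost_class`, the cost matrix layout `cost[actual][predicted]` and the numbers are hypothetical. Pruning severity, stopping rules and missing-value handling are not shown.

```python
from typing import Dict, Optional

Cost = Dict[str, Dict[str, float]]  # cost[actual][predicted]

def class_probabilities(leaf_counts: Dict[str, int],
                        priors: Optional[Dict[str, float]] = None,
                        class_totals: Optional[Dict[str, int]] = None) -> Dict[str, float]:
    """Class proportions in a leaf; if priors are given, each count is
    rescaled by prior / (training count of that class) before normalizing."""
    if priors is None:
        weights = {c: float(n) for c, n in leaf_counts.items()}
    else:
        weights = {c: priors[c] * n / class_totals[c] for c, n in leaf_counts.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def min_cost_class(probs: Dict[str, float], cost: Cost) -> str:
    """Predicted class with the lowest expected misclassification cost."""
    def expected_cost(predicted: str) -> float:
        return sum(p * cost[actual][predicted] for actual, p in probs.items())
    return min(cost, key=expected_cost)

leaf = {"good": 8, "bad": 2}  # hypothetical leaf counts
unit = {"good": {"good": 0, "bad": 1}, "bad": {"good": 1, "bad": 0}}
skewed = {"good": {"good": 0, "bad": 1}, "bad": {"good": 10, "bad": 0}}
totals = {"good": 900, "bad": 100}  # hypothetical training class counts
equal_priors = {"good": 0.5, "bad": 0.5}

print(min_cost_class(class_probabilities(leaf), unit))    # plain majority
print(min_cost_class(class_probabilities(leaf), skewed))  # costly to miss "bad"
print(min_cost_class(class_probabilities(leaf, equal_priors, totals), unit))
```

With unit costs the leaf predicts "good"; making a missed "bad" ten times as costly, or giving the rare "bad" class an equal prior, flips the prediction to "bad".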
Decision Tree Results (○ = provided, × = not provided)
Tools compared: Clementine, Intelligent Miner, Enterprise Miner
Results: × Tree View ○ × Confusion Matrix ○
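A confusion matrix cross-tabulates actual classes against predicted classes, so its diagonal holds the correct predictions. The Python sketch below assumes rows are actual classes and columns are predicted classes (tools differ on this convention); the function `confusion_matrix` and the sample labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     labels: List[str]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical actual and predicted classes for six records.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
labels = ["yes", "no"]
matrix = confusion_matrix(actual, predicted, labels)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)
print(matrix, accuracy)
```

Off-diagonal cells show which classes are confused with each other, which an overall accuracy figure alone hides.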
Conclusion of Comparison for Decision Tree
Algorithm: (excellent) Enterprise Miner
Option: (excellent) Enterprise Miner, (poor) Clementine
Result: (poor) Clementine
4. Concluding Remarks
Constitution of Main Windows: (excellent) Clementine, (poor) Intelligent Miner
Constitution of Nodes: (excellent) Intelligent Miner
Market Basket Analysis: (excellent) Intelligent Miner, (poor) Clementine
Decision Tree: (excellent) Enterprise Miner, (poor) Clementine
The comparison of the capabilities of DM tools in this study is intended for potential purchasers (enterprises and universities) and for DM companies.