Becerra-Fernandez, et al. -- Knowledge Management 1/e -- © 2004 Prentice Hall Chapter 12 Discovering New Knowledge – Data Mining
Chapter Objectives: Introduce the student to the concept of data mining (DM), including how it differs from knowledge elicitation from experts and from extracting existing knowledge from databases. Present the objectives of data mining: explanation of past events (descriptive DM) and prediction of future events (predictive DM). (continued)
Chapter Objectives (cont.): Introduce the student to the different classes of methods available for DM: symbolic (induction), connectionist (neural networks), and statistical. Introduce the student to the details of some of the methods described in the chapter.
Section Objectives: Introduce the chapter contents.
Section Objectives: Define the concept of data mining and the reasons for performing DM studies. Define the objectives of data mining: descriptive DM and predictive DM. Introduce the three basic approaches to DM: symbolic (induction), connectionist (neural networks), and statistical (curve fitting, others).
Section Objectives: Present a detailed description of the symbolic approach to data mining, rule induction. Present the main algorithm for rule induction, C5.0, and its ancestors, ID3 and CLS. Present several example applications of rule induction. Present alternative algorithms for rule induction: CART and CHAID.
Section Objectives: Provide a detailed description of the connectionist approach to data mining, neural networks. Present the basic neural network architecture, the multilayer feedforward neural network. Present the main supervised learning algorithm, backpropagation. Present the main unsupervised neural network architecture, the Kohonen network.
Section Objectives: Provide a detailed description of the most important statistical methods for data mining: curve fitting with the least squares method, multivariate correlation, K-means clustering, market basket analysis, discriminant analysis, and logistic regression.
Section Objectives: Provide useful guidelines for determining which technique to use for specific problems.
Section Objectives: Discuss the importance of errors in data mining studies. Define the types of errors possible in data mining studies.
Section Objectives: Summarize the chapter. Provide key terms, review questions, and review exercises.
Figure 12.1 (decision tree). Node questions: "Is the stock's price/earnings ratio > 5?" (root node); "Has the company's quarterly profit increased over the last year by 10% or more?"; "Is the company's management stable?". Branch labels: Yes, No. Leaf nodes: Buy, Don't buy.
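A decision tree such as the one in Figure 12.1 can be read as a chain of nested tests that ends at a leaf node. The sketch below is only an illustration: the three questions and the Buy / Don't buy leaves come from the figure, but the exact branch layout (which answer leads to which node) is an assumption, and the function and parameter names are hypothetical.

```python
def recommend(pe_ratio, profit_growth_pct, management_stable):
    """Walk a small decision tree from the root node down to a leaf node.

    Assumption: a "No" at any question leads straight to "Don't buy";
    the slide does not make the branch layout recoverable.
    """
    # Root node: is the stock's price/earnings ratio greater than 5?
    if pe_ratio <= 5:
        return "Don't buy"
    # Has quarterly profit increased over the last year by 10% or more?
    if profit_growth_pct < 10:
        return "Don't buy"
    # Is the company's management stable?
    if not management_stable:
        return "Don't buy"
    # All three tests passed: this path ends at the single "Buy" leaf.
    return "Buy"


print(recommend(pe_ratio=8, profit_growth_pct=12, management_stable=True))
```

Under this assumed layout the tree has three "Don't buy" leaves and one "Buy" leaf, which matches the number of leaf labels in the figure.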
Figure 12.2 (decision tree). Root: Outlook over {DS1, DS2, DS3, DS4}. Branch labels: Sunny, Cloudy, Rainy. Leaf labels: DS1 (Enjoyable), DS4 (Not Enjoyable), DS2 (Not Enjoyable), DS3 (Not Enjoyable).
Figure 12.3 (decision tree). Root: Outlook over {DS1, DS2, DS3, DS4}, with branch labels Sunny, Cloudy, Rain. Second node: Temperature over DS1 (Enjoyable) and DS4 (Not Enjoyable), with branch labels Cold, Hot, Mild. Leaf labels: DS1 (Enjoyable), DS4 (Not Enjoyable), DS2 (Not Enjoyable), DS3 (Not Enjoyable), None.
Figure 12.4 (decision tree). Node: Language, with branch labels Lisp, C++, Java. Product labels: MilliExpert, ThoughtGen, OffSite, Genie, XS, SilverWorks.
Figure 12.5 (decision tree). Node: Language, with branch labels Lisp, C++, Java. Further branch labels: Forward, Backward. Product labels: MilliExpert, ThoughtGen, OffSite, Genie, XS, SilverWorks.
Figure 12.6 (decision tree). Node: Language, with branch labels Lisp, C++, Java. Further branch labels: Forward, Backward. Node: Devices, with labels SpreadsheetXL, ASCII, dBase. Product labels: MilliExpert, ThoughtGen, OffSite, Genie, XS, SilverWorks.
Figure 12.7 (decision tree). Root: Humidity over {DS1, DS2, DS3, DS4}, with branch labels Dry and Humid. Leaf labels: DS1 (Enjoyable); DS2, DS3, DS4 (Not Enjoyable).
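Figures 12.2 and 12.7 split the same four data sets on different attributes. ID3-style induction chooses between such splits by information gain, that is, the reduction in entropy of the class labels. The sketch below is a minimal illustration under one stated assumption: DS1 and DS4 share a branch in the Outlook split (as Figure 12.3 suggests by splitting them further on Temperature). The function names are hypothetical, and this is not the C5.0 implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, subsets):
    """Entropy of the parent set minus the weighted entropy of its subsets."""
    total = len(parent)
    remainder = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - remainder


E, N = "Enjoyable", "Not Enjoyable"
parent = [E, N, N, N]                 # DS1, DS2, DS3, DS4

# Assumed reading of Figure 12.2: {DS1, DS4}, {DS2}, {DS3}
outlook_split = [[E, N], [N], [N]]
# Figure 12.7: Dry -> {DS1}, Humid -> {DS2, DS3, DS4}
humidity_split = [[E], [N, N, N]]

print("Outlook gain: ", round(information_gain(parent, outlook_split), 3))
print("Humidity gain:", round(information_gain(parent, humidity_split), 3))
```

On these four examples Humidity yields pure subsets, so its gain equals the full entropy of the parent set (about 0.811 bits), while Outlook leaves one mixed subset and gains only about 0.311 bits. That is why a single Humidity test suffices in Figure 12.7.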
Figure 12.8 (artificial neuron). Inputs x1, x2, ..., xk, with weights W1, W2, ..., Wn, feed an activation function f(), which produces the output y.
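The neuron in Figure 12.8 can be sketched as a weighted sum of the inputs passed through the activation function f(). The code below assumes one weight per input and no bias term, since the figure shows none. The name `neuron_output` is hypothetical.

```python
def neuron_output(inputs, weights, activation):
    """Compute y = f(sum of W_i * x_i) for a single neuron.

    Assumption: exactly one weight per input and no bias term.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Net input: weighted sum of all inputs.
    net = sum(w * x for w, x in zip(weights, inputs))
    # The activation function maps the net input to the output y.
    return activation(net)


# Usage with an identity activation, so the output is the raw weighted sum.
y = neuron_output([1.0, 2.0], [0.5, -1.0], activation=lambda v: v)
print(y)
```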
Figure 12.9 (activation functions). Panels: threshold function, piecewise-linear function, sigmoid function. Axis mark: 1.0.
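The three activation functions named in Figure 12.9 might be written as follows. The breakpoints (`theta`, `lower`, `upper`) are illustrative assumptions, not values taken from the figure. Only the output ceiling of 1.0 is shown there.

```python
import math


def threshold(v, theta=0.0):
    """Step function: 1.0 once the net input reaches theta, else 0.0."""
    return 1.0 if v >= theta else 0.0


def piecewise_linear(v, lower=-0.5, upper=0.5):
    """Ramp from 0.0 to 1.0 between lower and upper, clamped outside."""
    if v <= lower:
        return 0.0
    if v >= upper:
        return 1.0
    return (v - lower) / (upper - lower)


def sigmoid(v):
    """Logistic function 1 / (1 + e^-v), written to avoid overflow."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


for v in (-2.0, 0.0, 2.0):
    print(v, threshold(v), piecewise_linear(v), round(sigmoid(v), 3))
```

All three saturate at 1.0 for large inputs. The sigmoid is the smooth, differentiable member of the family, which is what gradient-based training such as backpropagation requires.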
Figure. Labels: Inputs, Outputs.
Figure 12.11
Figure. Axes: Variable A, Variable B. Labels: Cluster #1, Cluster #2.
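A two-variable clustering like the one pictured (Variable A against Variable B, two clusters) can be illustrated with a bare-bones K-means loop. This is a sketch under stated assumptions: the sample points are invented for illustration, the centroids are naively initialized from the first k points, and the function name `kmeans` is hypothetical.

```python
def kmeans(points, k, iterations=100):
    """Minimal K-means: alternate assignment and centroid update steps."""
    # Naive initialization (assumption): use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the loop has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Invented (Variable A, Variable B) pairs forming two visible groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

K-means is sensitive to the starting centroids, so practical implementations usually restart from several random initializations and keep the best result.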
Figure. Labels: Inputs, Wi.
Figure. Axes: x, y.
Figure. Axes: x, y. Label: Best fitting equation.
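The "best fitting equation" through a set of (x, y) points is commonly obtained by least squares. The sketch below assumes the simplest case, a straight line y = slope * x + intercept, and uses invented sample data. The name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

These closed-form expressions minimize the sum of squared vertical distances between the points and the line.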
Table 12.1
Table 12.2
Table 12.3
Table 12.3 (cont.)
Table 12.4
Table 12.5
Table 12.5 (cont.)
Table 12.6
Table 12.7
Conclusions: The student should be able to use the C5.0 algorithm to capture rules from examples; basic feedforward neural networks with supervised learning; unsupervised learning, clustering techniques, and Kohonen networks; curve-fitting algorithms; statistical methods for clustering; and other statistical techniques.