Part II Tools for Knowledge Discovery Ch 5. Knowledge Discovery in Databases Ch 6. The Data Warehouse Ch 7. Formal Evaluation Technique.


1 Part II Tools for Knowledge Discovery Ch 5. Knowledge Discovery in Databases Ch 6. The Data Warehouse Ch 7. Formal Evaluation Technique

2 Knowledge Discovery in Databases Chapter 5

3 5.1 A KDD Process Model 1. Goal identification 2. Creating an initial target data set 3. Data preprocessing 4. Data transformation 5. Data mining 6. Interpretation and evaluation 7. Taking action

4 Figure 5.1 A seven-step KDD process model

5 Figure 5.2 Applying the scientific method to data mining

6 Step 1: Goal Identification Define the Problem. Choose a Data Mining Tool. Estimate Project Cost. Estimate Project Completion Time. Address Legal Issues. Develop a Maintenance Plan.

7 Step 2: Creating a Target Dataset
(1) Flat file or spreadsheet format
(2) Relational database: a collection of tables (rows and columns)
    RDB → Reduce data redundancy (decomposition)
    DM → Uncover the inherent redundancy in data (a join is required)
(3) (1) + (2) → Data transformation is required.
(4) Data warehouse: a historical database designed specifically for decision support.
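The contrast in (2) between decomposed relational tables and the flat view that data mining needs can be sketched in a few lines. This is a minimal illustration, assuming two hypothetical tables (`customers` and `purchases`) that are not taken from the Acme database in Figure 5.3. The join copies customer attributes onto every purchase row, which reintroduces the redundancy that decomposition removed.

```python
# Hypothetical normalized tables (illustrative only, not the Acme schema).
customers = {
    1: {"name": "Ann", "income_range": "40-50K"},
    2: {"name": "Bob", "income_range": "20-30K"},
}
purchases = [
    {"customer_id": 1, "item": "magazine", "amount": 12.0},
    {"customer_id": 1, "item": "watch", "amount": 85.0},
    {"customer_id": 2, "item": "magazine", "amount": 12.0},
]

def join_to_flat(customers, purchases):
    """Inner-join purchases with customers on customer_id: one flat row per purchase."""
    flat = []
    for purchase in purchases:
        customer = customers.get(purchase["customer_id"])
        if customer is None:
            continue  # inner join: skip purchases with no matching customer
        flat.append({**customer, **purchase})
    return flat

flat_rows = join_to_flat(customers, purchases)
for row in flat_rows:
    print(row)
```

Ann's name and income range now appear on two rows, which is the redundancy a flat-file mining tool expects to see.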

8 Figure 5.3 The Acme credit card database

9 Step 3: Data Preprocessing
Noisy Data
    Locate duplicate records.
    Locate incorrect attribute values.
    Outliers
Missing Data
    Discard records with missing values.
    Replace missing real-valued items with the class mean.
    Replace missing values with values found within highly similar instances.
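Two of the preprocessing options listed above, locating duplicate records and replacing a missing real-valued item with the class mean, can be sketched as follows. This is only one possible reading of those bullets; the record layout, the attribute names `income` and `class`, and the use of `None` to mark a missing value are all assumptions made for illustration.

```python
from collections import defaultdict

def drop_duplicates(records):
    """Keep the first copy of each record; later exact duplicates are dropped."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def fill_with_class_mean(records, attribute, class_attribute):
    """Replace missing (None) values of `attribute` with the mean of that
    attribute over the records in the same class."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        if record[attribute] is not None:
            sums[record[class_attribute]] += record[attribute]
            counts[record[class_attribute]] += 1
    filled = []
    for record in records:
        record = dict(record)  # copy so the input is left unchanged
        cls = record[class_attribute]
        if record[attribute] is None and counts[cls] > 0:
            record[attribute] = sums[cls] / counts[cls]
        filled.append(record)
    return filled

# Hypothetical records: one exact duplicate and one missing income.
records = [
    {"income": 40.0, "class": "accept"},
    {"income": 60.0, "class": "accept"},
    {"income": None, "class": "accept"},
    {"income": 20.0, "class": "reject"},
    {"income": 20.0, "class": "reject"},
]
clean = fill_with_class_mean(drop_duplicates(records), "income", "class")
```

Here the duplicate "reject" record is removed and the missing "accept" income becomes 50.0, the mean of 40.0 and 60.0.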

10 Step 4: Data Transformation (1) Data Normalization (2) Data Type Conversion (3) Attribute and Instance Selection

11 (1) Data Normalization
    Decimal scaling
    Min-max normalization
    Normalization using Z-scores
    Logarithmic normalization
(2) Data Conversion
    Categorical → numeric equivalent
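The four normalization methods named above can be sketched with the standard library alone. The function names are hypothetical, and two choices are assumptions rather than anything the slides state: the Z-score version uses the population standard deviation, and the logarithmic version defaults to base 10.

```python
import math
import statistics

def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |value| below 1."""
    max_abs = max(abs(v) for v in values)
    k = 0
    while max_abs / 10 ** k >= 1:
        k += 1
    return [v / 10 ** k for v in values]

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant attribute: nothing to scale
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Express each value as a number of standard deviations from the mean.
    Assumption: population standard deviation (pstdev)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

def log_normalize(values, base=10):
    """Replace each (strictly positive) value with its base-`base` logarithm."""
    return [math.log(v, base) for v in values]

print(decimal_scaling([120, 450, 980]))
print(min_max([10, 20, 30]))
print(z_score([10, 20, 30]))
print(log_normalize([1, 10, 100]))
```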

12 (3) Attribute and Instance Selection a. Eliminating Attributes 1. Highly correlated 2. High domain predictability 3. Low attribute significance score
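The first elimination criterion, dropping attributes that are highly correlated with one already kept, can be sketched with a hand-written Pearson correlation. The 0.95 threshold, the column names, and the greedy keep-first strategy are assumptions for illustration. The other two criteria (domain predictability and attribute significance) are not sketched here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant attribute has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.95):
    """Greedily keep an attribute only if its |r| with every attribute
    kept so far is below `threshold` (hypothetical cutoff)."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical attributes: height in inches duplicates height in centimetres.
columns = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "age": [30, 22, 41, 35],
}
kept_attributes = drop_correlated(columns)
print(kept_attributes)
```

Because the loop keeps the first attribute of a correlated pair, the result depends on column order; a real project would choose which attribute to keep on domain grounds.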

13 b. Creating Attributes
    - Combining attributes, e.g. P/E ratio
    - Differences between attributes
    - Percent increase or decrease
c. Instance Selection
    - Use instance typicality
      Supervised: use highly and moderately typical training instances
      Unsupervised: eliminate the most atypical instances → well-defined clusters
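The three ways of creating attributes listed under (b) amount to simple arithmetic on existing columns. A minimal sketch follows, assuming a hypothetical stock record with fields `price`, `earnings_per_share`, and `last_year_price` that do not come from the slides; the P/E ratio is the share price divided by earnings per share.

```python
def pe_ratio(price, earnings_per_share):
    """Combine two attributes into one: price divided by earnings per share."""
    return price / earnings_per_share

def difference(new_value, old_value):
    """Difference between two attributes."""
    return new_value - old_value

def percent_change(old_value, new_value):
    """Percent increase (positive) or decrease (negative) from old to new."""
    return (new_value - old_value) / old_value * 100.0

# Hypothetical record, used only to show the derived attributes.
stock = {"price": 50.0, "earnings_per_share": 2.5, "last_year_price": 40.0}
stock["pe_ratio"] = pe_ratio(stock["price"], stock["earnings_per_share"])
stock["price_change"] = difference(stock["price"], stock["last_year_price"])
stock["price_pct_change"] = percent_change(stock["last_year_price"], stock["price"])
print(stock)
```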

14 Step 5: Data Mining 1. Choose training and test data. 2. Designate a set of input attributes. 3. If learning is supervised, choose one or more output attributes. 4. Select learning parameter values. 5. Invoke the data mining tool.
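The five steps above can be walked through on a toy problem. In this sketch a small k-nearest-neighbour classifier stands in for "the data mining tool"; that substitution, the toy data, and every name in the code are assumptions for illustration and are not the tool used in the slides.

```python
import math
import random
from collections import Counter

# Hypothetical, well-separated toy data (not from the slides).
low = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5)]
high = [(8, 8), (8, 9), (9, 8), (9, 9), (8.5, 8.5)]
instances = (
    [{"x": x, "y": y, "label": "low"} for x, y in low]
    + [{"x": x, "y": y, "label": "high"} for x, y in high]
)

def split_train_test(data, n_test, seed=0):
    """Shuffle reproducibly, then hold out the first n_test instances for testing."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, inputs, output, query, k):
    """Predict the majority output value among the k nearest training instances."""
    def distance(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in inputs))
    nearest = sorted(train, key=lambda r: distance(r, query))[:k]
    votes = Counter(r[output] for r in nearest)
    return votes.most_common(1)[0][0]

train, test = split_train_test(instances, n_test=3)  # 1. training and test data
inputs = ["x", "y"]                                   # 2. input attributes
output = "label"                                      # 3. output attribute (supervised)
k = 3                                                 # 4. learning parameter value
predictions = [knn_predict(train, inputs, output, q, k) for q in test]  # 5. invoke
accuracy = sum(p == q[output] for p, q in zip(predictions, test)) / len(test)
print(f"test-set accuracy: {accuracy:.2f}")
```

Fixing the random seed makes the split repeatable, which matters when the mining step is rerun with different parameter values and the results are compared.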

15 Step 6: Interpretation and Evaluation Statistical analysis. Heuristic analysis. Experimental analysis. Human analysis.

16 Step 7: Taking Action Create a report. Relocate retail items. Mail promotional information. Detect fraud. Fund new research.

17 5.9 The CRISP-DM Process Model (Cross-Industry Standard Process for Data Mining) 1. Business understanding 2. Data understanding 3. Data preparation 4. Modeling 5. Evaluation 6. Deployment

18 5.10 Experimenting with ESX A Four-Step Model for Knowledge Discovery 1. Identify the goal. 2. Prepare the data. 3. Apply data mining. 4. Interpret and evaluate the results.

19 Experiment 1: Attribute Evaluation
*Applying the Four-Step Process Model to the Credit Screening Dataset*
Use unsupervised clustering to see how well the set of input attributes is able to define the classes
    - Domain Summary: eight, eleven / nine, ten, twelve
    - Class Summary: / nine, ten
      Class 1: Accept 84%
      Class 2: Reject 90%
Repeat the DM process to create the best data model

20

21 Experiment 2: Parameter Evaluation *Applying the Four-Step Process Model to the Satellite Image Dataset*

22 Figure 5.4 Satellite image data

