Data Mining: A Closer Look Chapter 2. 2.1 Data Mining Strategies.

1 Data Mining: A Closer Look Chapter 2

2 2.1 Data Mining Strategies

3 Figure 2.1 A hierarchy of data mining strategies

4 Classification: Learning is supervised. The dependent variable is categorical. Well-defined classes. Current rather than future behavior.

5 Estimation: Learning is supervised. The dependent variable is numeric. Well-defined classes. Current rather than future behavior.

6 Prediction: The emphasis is on predicting future rather than current outcomes. The output attribute may be categorical or numeric.

7 The Cardiology Patient Dataset

10 A Healthy Class Rule for the Cardiology Patient Dataset: IF 169 <= Maximum Heart Rate <= 202 THEN Concept Class = Healthy. Rule accuracy: 85.07%. Rule coverage: 34.55%.

11 A Sick Class Rule for the Cardiology Patient Dataset: IF Thal = Rev & Chest Pain Type = Asymptomatic THEN Concept Class = Sick. Rule accuracy: 91.14%. Rule coverage: 52.17%.

12 Unsupervised Clustering: Determine if concepts can be found in the data. Evaluate the likely performance of a supervised model. Determine a best set of input attributes for supervised learning. Detect outliers.

13 Market Basket Analysis: Find interesting relationships among retail products. Uses association rule algorithms.

14 2.2 Supervised Data Mining Techniques

15 The Credit Card Promotion Database

17 A Hypothesis for the Credit Card Promotion Database: A combination of one or more of the dataset attributes differentiates Acme Credit Card Company card holders who have taken advantage of the life insurance promotion from those card holders who have chosen not to participate in the promotional offer.

18 A Production Rule for the Credit Card Promotion Database: IF Sex = Female & 19 <= Age <= 43 THEN Life Insurance Promotion = Yes. Rule accuracy: 100.00%. Rule coverage: 66.67%.

19 Production Rules: Rule accuracy is a between-class measure. Rule coverage is a within-class measure.
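The two rule measures above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual reading of the terms: accuracy is the share of instances matching the rule's IF part that actually belong to the THEN class, and coverage is the share of that class's instances that the rule matches. The function name, the attribute names, and the five toy rows are hypothetical; they are not the credit card promotion data from the slides.

```python
def rule_accuracy_and_coverage(rows, antecedent, target_attr, target_value):
    """Return (accuracy, coverage) of the rule: IF antecedent THEN target_attr = target_value."""
    rows = list(rows)
    # Instances that satisfy the IF part of the rule.
    covered = [r for r in rows if antecedent(r)]
    # Instances that belong to the class named in the THEN part.
    in_class = [r for r in rows if r[target_attr] == target_value]
    # Instances that satisfy the IF part and belong to the class.
    correct = [r for r in covered if r[target_attr] == target_value]
    accuracy = len(correct) / len(covered) if covered else 0.0
    coverage = len(correct) / len(in_class) if in_class else 0.0
    return accuracy, coverage


# Hypothetical toy rows (assumption: NOT the dataset shown on the slides).
toy_rows = [
    {"sex": "Female", "age": 25, "life_ins_promo": "Yes"},
    {"sex": "Female", "age": 40, "life_ins_promo": "Yes"},
    {"sex": "Male", "age": 30, "life_ins_promo": "Yes"},
    {"sex": "Male", "age": 50, "life_ins_promo": "No"},
    {"sex": "Female", "age": 55, "life_ins_promo": "No"},
]


def female_19_to_43(row):
    # IF Sex = Female & 19 <= Age <= 43
    return row["sex"] == "Female" and 19 <= row["age"] <= 43


acc, cov = rule_accuracy_and_coverage(
    toy_rows, female_19_to_43, "life_ins_promo", "Yes"
)
print(f"accuracy={acc:.2%} coverage={cov:.2%}")
```

On the toy rows the rule matches two instances, both in the Yes class, while the Yes class has three instances in total, so the rule is fully accurate but covers only part of its class.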

20 Neural Networks

21 Figure 2.2 A multilayer fully connected neural network

23 Statistical Regression: Life insurance promotion = 0.5909 (credit card insurance) - 0.5455 (sex) + 0.7727
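To see how the regression equation above produces a score, the sketch below evaluates it for each combination of inputs. The coefficients come from the slide; the 0/1 coding of the two attributes (credit card insurance: 1 = yes, 0 = no; sex: 1 = male, 0 = female) is an assumption, since the transcript does not state the encoding, and the function name is hypothetical.

```python
def life_insurance_promotion_score(credit_card_insurance: int, sex: int) -> float:
    """Evaluate the linear equation from the slide.

    Assumed coding (not stated in the transcript):
      credit_card_insurance: 1 = yes, 0 = no
      sex: 1 = male, 0 = female
    """
    return 0.5909 * credit_card_insurance - 0.5455 * sex + 0.7727


# Print the score for every combination of the two binary inputs.
for cc_ins in (0, 1):
    for sex in (0, 1):
        score = life_insurance_promotion_score(cc_ins, sex)
        print(f"credit_card_insurance={cc_ins} sex={sex} -> {score:.4f}")
```

Under this assumed coding, a higher score would be read as a stronger indication of accepting the life insurance promotion. Note that a linear equation is not confined to the 0 to 1 range.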

24 2.3 Association Rules

25 An Association Rule for the Credit Card Promotion Database: IF Sex = Female & Age = over40 & Credit Card Insurance = No THEN Life Insurance Promotion = Yes

26 2.4 Clustering Techniques

27 Figure 2.3 An unsupervised cluster of the credit card database

28 2.5 Evaluating Performance

29 Evaluating Supervised Learner Models

30 Confusion Matrix: A matrix used to summarize the results of a supervised classification. Entries along the main diagonal are correct classifications. Entries other than those on the main diagonal are classification errors.
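The description above can be made concrete with a small sketch that builds a confusion matrix and reads accuracy off the main diagonal. It assumes rows index the actual class and columns the predicted class, which is one common convention. The function names and the five toy labels are hypothetical and are not results from the cardiology dataset.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows index the actual class, columns the predicted class (assumed convention)."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Correct classifications are the entries on the main diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical toy labels (assumption: not real results).
classes = ["Healthy", "Sick"]
actual = ["Healthy", "Healthy", "Sick", "Sick", "Sick"]
predicted = ["Healthy", "Sick", "Sick", "Sick", "Healthy"]

matrix = confusion_matrix(actual, predicted, classes)
for label, row in zip(classes, matrix):
    print(label, row)
print("accuracy:", accuracy_from_matrix(matrix))
```

Every off-diagonal entry counts one kind of error, so the matrix shows which classes are being confused with which, and not only how many mistakes were made.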

32 Two-Class Error Analysis

35 Evaluating Numeric Output: Mean absolute error. Mean squared error. Root mean squared error.
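The three error measures listed above follow their standard definitions and can be sketched as below. The function names and the four toy values are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2; penalizes large errors more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error, in the same units as the output."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical toy values (assumption: not taken from the slides).
actual = [1.0, 0.0, 1.0, 0.0]
predicted = [0.5, 0.0, 1.0, 1.0]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f}")
```

Here the individual errors are 0.5, 0, 0, and 1, so the single large miss dominates the squared measures more than it does the absolute one.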

36 Comparing Models by Measuring Lift

37 Figure 2.4 Targeted vs. mass mailing

38 Computing Lift
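A minimal sketch of the calculation, assuming the common definition of lift as the response rate within a model-selected sample divided by the response rate in the whole population. The slide's own formula did not survive in this transcript, so this definition, the function name, and the mailing numbers below are assumptions for illustration.

```python
def lift(sample_hits, sample_size, population_hits, population_size):
    """Lift = P(class | sample) / P(class | population) (assumed definition)."""
    sample_rate = sample_hits / sample_size
    population_rate = population_hits / population_size
    return sample_rate / population_rate


# Hypothetical numbers (assumption): a mass mailing to 10,000 people gets
# 100 responses, while a targeted mailing to 2,000 model-selected people
# gets 60 responses.
targeted_lift = lift(
    sample_hits=60, sample_size=2000,
    population_hits=100, population_size=10000,
)
print(f"lift = {targeted_lift:.2f}")
```

A lift above 1 means the selected sample contains a higher concentration of responders than the population at large. A lift of exactly 1 means the model does no better than mailing at random.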

41 Unsupervised Model Evaluation