
1 The CRISP Data Mining Process

2 The Data Mining Process (August 28, 2004, Data Mining)
  [Diagram: Business understanding, Data evaluation, Data preparation, Modeling, Evaluation, Deployment, Data]

3 Business Understanding
  - Project objectives
  - Project requirements
  - DM problem formulation
  - Preliminary plan

4 Case Study
  - Data mining project done for a large insurance company
  - Consider the use of data mining to improve understanding of customer databases
  - Led by the data warehousing team, which also wanted to improve its expertise

5 Business Objectives
  - Understand what coverage packages are of interest to a customer group
    - Targeting of new customers
    - Cross-selling opportunities to existing customers
  - Understand why a customer group terminates coverage
    - Know in advance what groups are likely to terminate
    - Understand what factors influence termination

6 What are the Goals?
  - The business goals
    - Improve customer retention
    - Increase cross-selling
  - Success criteria
    - Customer turnover rate
    - Amount of cross-selling

7 Data Mining Problems
  - Classify new and existing customers as either interested or not interested in a particular coverage
  - Classify existing customers as either likely or unlikely to terminate coverage

8 The Data Mining Process
  [Diagram: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment, Data]

9 Data Evaluation
  - Initial data collections
  - Data quality
  - Initial insights
  - Interesting subsets
  - Data warehousing team

10 Case Study: Data Evaluation
  - Data was extracted from select customer databases by company personnel
  - Coverage programs with few customers selected for pilot project
  - Five separate files extracted for five coverage programs

11 The Data Mining Process
  [Diagram: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment, Data]

12 Data Preparation
  - Raw data -> finished data set
  - Technical tasks:
    - Data selection
    - Attribute selection
    - Data cleaning

13 Case Study: Data Preparation
  - Some initial formatting of data in MS Excel
  - Cleaning of data file
  - Combine headers/instances
  - Add a new attribute: interest (yes/no)
  - Must create the no-interest cases
  - End up with a CSV-formatted file
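The labeling step on this slide can be sketched roughly as follows. This is a minimal illustration, not the procedure the project actually used: the record layout, the `customer_id` key, and the idea that "no interest" cases are customers who do not hold the coverage are all assumptions made for the example.

```python
import csv
import io


def label_interest(customers, coverage_holders):
    """Attach an 'interest' attribute: 'yes' if the customer holds the coverage, else 'no'."""
    labeled = []
    for row in customers:
        labeled_row = dict(row)
        # Assumption: a customer who does not hold the coverage is a "no interest" case.
        labeled_row["interest"] = "yes" if row["customer_id"] in coverage_holders else "no"
        labeled.append(labeled_row)
    return labeled


def to_csv(rows):
    """Serialize rows to CSV text with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records; the attribute names are illustrative only.
customers = [
    {"customer_id": "C1", "state": "IN", "small_group_flag": "Y"},
    {"customer_id": "C2", "state": "OH", "small_group_flag": "N"},
    {"customer_id": "C3", "state": "IN", "small_group_flag": "N"},
]
holders_of_coverage = {"C1", "C3"}

labeled = label_interest(customers, holders_of_coverage)
csv_text = to_csv(labeled)
print(csv_text)
```

The resulting text has one header row followed by one instance per line, which is the shape a CSV loader generally expects.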

14 Weka Data Mining Software
  - Data in CSV format loaded into Weka:
    - Data preprocessing
    - Attribute selection
    - Modeling
      - Classification
      - Clustering
      - Association rule mining
    - Visualization

15 Data Preprocessing in Weka
  - Initial data inspection
    - Missing values
    - Useless attributes
    - Numeric attributes as nominal
  - Some helpful Weka filters
    - RemoveUseless
    - ReplaceMissingValues
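To make the two filters concrete, here is a plain-Python sketch of the ideas behind them. It does not call Weka and is not Weka's implementation: it assumes missing values are marked with `?`, fills them with the attribute mean (numeric) or mode (nominal), and treats only constant attributes as useless. The sample rows and attribute names are hypothetical.

```python
from collections import Counter

MISSING = "?"  # assumed marker for a missing value


def replace_missing(rows, numeric_attrs):
    """Fill missing values with the attribute mean (numeric) or mode (nominal)."""
    fill = {}
    for attr in rows[0]:
        present = [r[attr] for r in rows if r[attr] != MISSING]
        if not present:
            continue  # nothing to estimate a replacement from
        if attr in numeric_attrs:
            fill[attr] = sum(float(v) for v in present) / len(present)
        else:
            fill[attr] = Counter(present).most_common(1)[0][0]
    return [
        {a: (fill.get(a, v) if v == MISSING else v) for a, v in r.items()}
        for r in rows
    ]


def remove_useless(rows):
    """Drop attributes that take a single value across all instances."""
    keep = [a for a in rows[0] if len({r[a] for r in rows}) > 1]
    return [{a: r[a] for a in keep} for r in rows]


# Hypothetical data: 'country' never varies, two cells are missing.
rows = [
    {"state": "IN", "country": "US", "case_size": "10"},
    {"state": "?", "country": "US", "case_size": "?"},
    {"state": "IN", "country": "US", "case_size": "30"},
    {"state": "OH", "country": "US", "case_size": "20"},
]

cleaned = remove_useless(replace_missing(rows, numeric_attrs={"case_size"}))
print(cleaned[1])
```

Filling before removing matters in this sketch: a column that looks variable only because of `?` markers is reduced to its real values first.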

16 Data Preprocessing in Weka
  - Data reduction:
    - Instance dimension
      - RemovePercentage and Resample filters
    - Attribute dimension
      - Remove redundant attributes
      - Remove irrelevant attributes
      - Identify most important attributes
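Reduction along the instance dimension can be illustrated with two small functions. They are loosely modeled on what the filter names suggest and are not Weka code: the choice to drop instances from the start of the data set and to sample with replacement are assumptions of this sketch.

```python
import random


def remove_percentage(rows, percentage):
    """Drop the first `percentage` percent of instances, keeping the rest in order."""
    cut = round(len(rows) * percentage / 100)
    return rows[cut:]


def resample(rows, sample_percent, seed=1):
    """Draw a random subsample with replacement, sized as a percentage of the input."""
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    size = round(len(rows) * sample_percent / 100)
    return [rng.choice(rows) for _ in range(size)]


instances = list(range(10))  # stand-in for ten data rows

kept = remove_percentage(instances, 30)
subsample = resample(instances, 50)
print(len(kept), len(subsample))
```

A fixed seed is used so that repeated runs of an experiment see the same reduced data set.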

17 Attribute Selection Methods
  - Three main methods used:
    - InfoGain
    - ChiSquared
    - Relief
  - Combined results from complementary methods
  - Final pruning of attribute list to twenty attributes
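As an illustration of the first method, the sketch below computes information gain for nominal attributes and then merges several rankings. The slides do not say how the three rankings were combined, so averaging rank positions is purely an assumption here, as are the toy rows and attribute names.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, attr, target):
    """Reduction in class entropy obtained by splitting on a nominal attribute."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder


def combine_rankings(rankings):
    """Order attributes by their mean rank position across methods (lower is better)."""
    attrs = rankings[0]
    mean_rank = {a: sum(r.index(a) for r in rankings) / len(rankings) for a in attrs}
    return sorted(attrs, key=lambda a: mean_rank[a])


# Hypothetical instances: the flag separates the classes perfectly, the state not at all.
rows = [
    {"small_group_flag": "Y", "state": "IN", "interest": "yes"},
    {"small_group_flag": "Y", "state": "OH", "interest": "yes"},
    {"small_group_flag": "N", "state": "IN", "interest": "no"},
    {"small_group_flag": "N", "state": "OH", "interest": "no"},
]

gain_flag = info_gain(rows, "small_group_flag", "interest")
gain_state = info_gain(rows, "state", "interest")
combined = combine_rankings([["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]])
print(gain_flag, gain_state, combined)
```

Pruning to a fixed number of attributes would then amount to taking the first entries of the combined list.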

18 Selected Attributes
  - Location
    - Tax State
    - Contract State
    - State Code
    - Zip Code

19 Selected Attributes
  - Size
    - Case Size Range
  - Industry
    - Industry Classification
    - Industry Classification Name
    - SIC Code

20 Selected Attributes
  - Timing
    - New Sale Flag
    - Decision Maker Effective Month
    - Decision Maker Effective Year
    - Next Renewal Month
    - Next Renewal Year

21 Selected Attributes
  - Internal
    - Agency Number
    - Office Name
    - Pricing Category Code
    - Product Line Name
    - Small Group Flag

22 Relevance of Attribute Selection
  - Improved modeling
    - Faster model induction
    - Higher accuracy
    - Easier to interpret models
  - Structural knowledge gained from the selection of attributes

23 Most Important Attributes
  - What attributes affect the purchasing decision of a customer group?
  - E.g., the five most important factors that determine whether a customer group purchases a particular insurance coverage:
    - Agency Number
    - Small Group Flag
    - Zip Code
    - Decision Maker Effective Year
    - Next Renewal Month

24 Customer Segmentation
  - Unique groups of customers
    - Similar characteristics
    - Similar behavior in terms of interest in coverage
  - For example, separate predictive models for customer segments for a particular type of insurance

25 Customer Segments Used for Modeling
  - Results
    - Three segments for one database
    - Two segments for two databases
    - One segment for two databases
  - Continue modeling for each segment independently
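Modeling each segment independently might look like the sketch below. The slides do not state how segments were defined, so segmenting on a single attribute is an assumption, and the majority-class "model" is only a stand-in for whatever learner is trained per segment.

```python
from collections import Counter, defaultdict


def split_by_segment(rows, segment_of):
    """Group instances by the segment key returned by `segment_of`."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_of(row)].append(row)
    return dict(segments)


def majority_model(rows, target="interest"):
    """Stand-in for a real learner: always predicts the segment's most common class."""
    majority = Counter(r[target] for r in rows).most_common(1)[0][0]
    return lambda row: majority


# Hypothetical instances, segmented on one attribute for illustration.
rows = [
    {"small_group_flag": "Y", "interest": "yes"},
    {"small_group_flag": "Y", "interest": "yes"},
    {"small_group_flag": "Y", "interest": "no"},
    {"small_group_flag": "N", "interest": "no"},
    {"small_group_flag": "N", "interest": "no"},
]

segments = split_by_segment(rows, lambda r: r["small_group_flag"])
models = {name: majority_model(members) for name, members in segments.items()}
print({name: model({}) for name, model in models.items()})
```

Keeping one model per segment in a dictionary makes it straightforward to route a new customer to the model for its segment.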

26 The Data Mining Process
  [Diagram: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment, Data]

27 Modeling
  - Select modeling technique(s)
  - Calibrate modeling techniques
  - Make adjustments to data

28 Modeling
  - Mathematical models for predicting if a customer is interested in a coverage
  - Understand why a customer is interested
  - For example: If a customer's state is Indiana and the office is Indianapolis_Office1, then the customer is interested in Coverage_3
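The example rule reads naturally as a small predicate. In this sketch the dictionary keys `state` and `office` are assumed names for the corresponding attributes; the values come from the slide.

```python
def interested_in_coverage_3(customer):
    """Rule from the slide: state is Indiana and office is Indianapolis_Office1."""
    return (
        customer.get("state") == "Indiana"
        and customer.get("office") == "Indianapolis_Office1"
    )


# Hypothetical customers used to exercise the rule.
matching = {"state": "Indiana", "office": "Indianapolis_Office1"}
other_office = {"state": "Indiana", "office": "Some_Other_Office"}

print(interested_in_coverage_3(matching), interested_in_coverage_3(other_office))
```

A rule in this form is what makes such models easy to explain: each prediction can be traced to the attribute tests that fired.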

29 Modeling Techniques
  - Three modeling techniques tried for predicting customer interest:
    - Decision trees
    - Artificial neural networks (ANN)
    - Support vector machines (SVM)
  - Decision trees have the advantage of transparency
  - ANN and SVM did not have significantly better prediction accuracy

30 Insurance Coverage Interest (Type 6)
  [Figure: decision tree splitting on Small Group Flag (Y/N) and Product Line Name (Group_1/Group_2), with Yes/No leaves]

31 Insurance Coverage Interest (Type 7)
  [Figure: decision tree splitting on Pricing Category Code (A2, A4, Others), Industry Classification Name (Legal_Services, Transportation_and Public_Utilities), Agency Number (<= 430 / > 430), and Next Renewal Year (<= 2000 / > 2000; <= 2002 / > 2002), with Group_1/Group_2 branches and Yes/No leaves; some branches omitted]

32 Accuracy of Predicting Customer Interest
  Coverage  Accuracy
  Type 1    84.0%
  Type 2    97.2%
  Type 3    98.3%
  Type 4    99.5%
  Type 5    88.4%
  Type 6    100%
  Type 7    76.3%
  Type 8    85.0%
  Type 9    94.8%
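For reference, accuracy figures like these are conventionally the share of instances whose predicted class matches the actual class. The slides do not say how the evaluation was set up (for example, which data the models were tested on), so the sketch below shows only the basic calculation on made-up labels.

```python
def accuracy(predicted, actual):
    """Fraction of instances whose predicted class equals the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels: four of five predictions match.
predicted = ["yes", "no", "yes", "yes", "no"]
actual = ["yes", "no", "no", "yes", "no"]

score = accuracy(predicted, actual)
print(f"{score:.1%}")  # prints 80.0%
```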

33 Modeling
  - Mathematical models for predicting if a customer will terminate coverage
  - Why do customers terminate a specific type of coverage?
  - What are the important factors in a customer's decision to terminate coverage?

34 Who Terminates Type 3 Coverage?
  [Figure: decision tree splitting on Customer Effective Year, Next Renewal Month, and Coverage Effective Year, with Active/Terminated leaves]
  - Correct for 95% of customers

35 Who Terminates Type 1 Coverage?
  - Decision tree based on:
    - Distribution number
    - Underwriting department number
    - Price category
    - Rate type
    - Rate Plan Year
  - Predicts 96.3% of terminations correctly

36 Accuracy of Predicting Termination
  Model   Accuracy
  Type 1  96.3%
  Type 2  96.5%
  Type 3  95.3%
  Type 4  88.9%
  Type 5  88.3%

37 The Data Mining Process
  [Diagram: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment, Data]

38 Evaluation
  - Data analysis results in a good model
  - Are business objectives being achieved?
  - Is there an important business issue that has not been considered?
  - Should the results be used?

39 The Data Mining Process
  [Diagram: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment, Data]

40 Deployment
  - Incorporate the results in the organization's decision-making process
    - Report
    - Decision support system
    - Personalization of web pages
  - Repeatable data mining process

