Published by Arron Davidson. Modified over 9 years ago.
1
The CRISP Data Mining Process
2
August 28, 2004 | Data Mining
The Data Mining Process
[Process diagram; labels:]
- Business understanding
- Data evaluation
- Data preparation
- Modeling
- Evaluation
- Deployment
- Data
3
Business Understanding
- Project objectives
- Project requirements
- DM problem formulation
- Preliminary plan
4
Case Study
- Data mining project done for a large insurance company
- Consider the use of data mining to improve understanding of customer databases
- Led by the data warehousing team, which also wanted to improve its expertise
5
Business Objectives
- Understand what coverage packages are of interest to a customer group
  - Targeting of new customers
  - Cross-selling opportunities to existing customers
- Understand why a customer group terminates coverage
  - Know in advance what groups are likely to terminate
  - Understand what factors influence termination
6
What are the Goals?
- The business goals
  - Improve customer retention
  - Increase cross-selling
- Success criteria
  - Customer turnover rate
  - Amount of cross-selling
7
Data Mining Problems
- Classify new and existing customers as either interested or not interested in a particular coverage
- Classify existing customers as either likely or unlikely to terminate coverage
8
The Data Mining Process
[Process diagram; labels: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment, Data]
9
Data Evaluation
- Initial data collections
- Data quality
- Initial insights
- Interesting subsets
- Data warehousing team
10
Case Study: Data Evaluation
- Data was extracted from selected customer databases by company personnel
- Coverage programs with few customers were selected for the pilot project
- Five separate files were extracted, one for each of five coverage programs
11
The Data Mining Process
[Process diagram; labels: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment, Data]
12
Data Preparation
- Raw data -> finished data set
- Technical tasks:
  - Data selection
  - Attribute selection
  - Data cleaning
13
Case Study: Data Preparation
- Some initial formatting of data in MS Excel
- Cleaning of data file
- Combine headers/instances
- Add a new attribute: interest (yes/no)
  - Must create the no-interest cases
- End up with a CSV-formatted file
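The slides do not say how the no-interest cases were created. One plausible reading, offered here only as a sketch, is that customers found in the extract for a given coverage program are labeled "yes" for that coverage, and customers from the other extracts are labeled "no". The function names, the `files` structure, and the sample values below are hypothetical.

```python
import csv
import io


def label_interest(files, coverage):
    """Add an 'interest' attribute to every row.

    `files` maps a coverage name to the rows extracted for it (hypothetical
    structure). Rows from the target coverage get "yes"; all others get "no".
    This labeling rule is an assumption, not the method used in the case study.
    """
    labeled = []
    for name, rows in files.items():
        for row in rows:
            labeled.append({**row, "interest": "yes" if name == coverage else "no"})
    return labeled


def to_csv(rows):
    """Serialize the labeled rows as CSV text with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data for two coverage programs.
files = {
    "Coverage_1": [{"Zip Code": "46202", "Small Group Flag": "Y"}],
    "Coverage_2": [{"Zip Code": "47401", "Small Group Flag": "N"}],
}
labeled = label_interest(files, "Coverage_1")
csv_text = to_csv(labeled)
```

A customer that appears in more than one extract would be labeled both "yes" and "no" under this rule, so a real pipeline would need to deduplicate on a customer key first.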
14
Weka Data Mining Software
Data in CSV format loaded into Weka:
- Data preprocessing
- Attribute selection
- Modeling
  - Classification
  - Clustering
  - Association rule mining
- Visualization
15
Data Preprocessing in Weka
- Initial data inspection
  - Missing values
  - Useless attributes
  - Numeric attributes as nominal
- Some helpful Weka filters
  - RemoveUseless
  - ReplaceMissingValues
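To make the two preprocessing steps concrete, the sketch below loosely mirrors what the named filters are for: dropping attributes that carry no information and filling missing values with the column mean (numeric) or mode (nominal). It is plain Python rather than Weka code, it simplifies both filters, and the function names, the "?" missing marker for CSV data, and the sample rows are assumptions.

```python
from collections import Counter
from statistics import mean

MISSING = "?"  # assumed marker for a missing value in the CSV data


def remove_constant_columns(rows):
    """Drop columns in which every non-missing value is identical."""
    if not rows:
        return rows
    keep = []
    for name in rows[0]:
        values = {r[name] for r in rows if r[name] != MISSING}
        if len(values) > 1:
            keep.append(name)
    return [{name: r[name] for name in keep} for r in rows]


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def fill_missing(rows):
    """Replace missing values with the column mean (numeric) or mode (nominal)."""
    if not rows:
        return rows
    fills = {}
    for name in rows[0]:
        present = [r[name] for r in rows if r[name] != MISSING]
        if not present:
            continue  # nothing to estimate from; leave the column untouched
        if all(_is_number(v) for v in present):
            fills[name] = str(mean(float(v) for v in present))
        else:
            fills[name] = Counter(present).most_common(1)[0][0]
    return [
        {k: (fills.get(k, v) if v == MISSING else v) for k, v in r.items()}
        for r in rows
    ]


# Made-up rows; "Source" is constant and therefore useless for modeling.
rows = [
    {"Small Group Flag": "Y", "Case Size": "10", "Source": "pilot"},
    {"Small Group Flag": "?", "Case Size": "?", "Source": "pilot"},
    {"Small Group Flag": "Y", "Case Size": "30", "Source": "pilot"},
    {"Small Group Flag": "N", "Case Size": "20", "Source": "pilot"},
]
cleaned = fill_missing(remove_constant_columns(rows))
```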
16
Data Preprocessing in Weka
Data reduction:
- Instance dimension
  - RemovePercentage and Resample filters
- Attribute dimension
  - Remove redundant attributes
  - Remove irrelevant attributes
  - Identify most important attributes
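Reducing the instance dimension amounts to keeping only a fraction of the rows. The following is a minimal sketch of seeded random subsampling in plain Python, in the spirit of a resampling filter; it is not Weka's implementation, and the function name, parameters, and sample data are hypothetical.

```python
import random


def subsample(rows, percent, seed=1, replacement=False):
    """Return a random subsample containing `percent` percent of `rows`."""
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    k = round(len(rows) * percent / 100)
    if replacement:
        return [rng.choice(rows) for _ in range(k)]
    return rng.sample(rows, k)


# Stand-in for 200 customer instances.
instances = list(range(200))
reduced = subsample(instances, 25)
```

Sampling without replacement keeps each retained instance unique, which is usually what is wanted when the goal is only to shrink the data set for faster experiments.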
17
Attribute Selection Methods
- Three main methods used:
  - InfoGain
  - ChiSquared
  - Relief
- Combined results from complementary methods
- Final pruning of attribute list to twenty attributes
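Of the three methods, information gain is the simplest to illustrate: an attribute scores highly when splitting on it reduces the entropy of the class label. The sketch below ranks nominal attributes this way in plain Python. It is an illustration of the idea, not Weka's evaluator, and the function names and the toy rows are assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(rows, attribute, target):
    """Entropy of the target minus its expected entropy after splitting on attribute."""
    labels = [r[target] for r in rows]
    by_value = {}
    for r in rows:
        by_value.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in by_value.values())
    return entropy(labels) - remainder


def rank_attributes(rows, target):
    """Order attributes from most to least informative about the target."""
    names = [n for n in rows[0] if n != target]
    return sorted(names, key=lambda n: info_gain(rows, n, target), reverse=True)


# Toy data: the flag determines interest completely, the state not at all.
rows = [
    {"Small Group Flag": "Y", "State Code": "IN", "interest": "yes"},
    {"Small Group Flag": "Y", "State Code": "OH", "interest": "yes"},
    {"Small Group Flag": "N", "State Code": "IN", "interest": "no"},
    {"Small Group Flag": "N", "State Code": "OH", "interest": "no"},
]
ranking = rank_attributes(rows, "interest")
```

Information gain tends to favor attributes with many distinct values, which is one reason to combine it with other methods as the slide describes.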
18
Selected Attributes
Location
- Tax State
- Contract State
- State Code
- Zip Code
19
Selected Attributes
Size
- Case Size Range
Industry
- Industry Classification
- Industry Classification Name
- SIC Code
20
Selected Attributes
Timing
- New Sale Flag
- Decision Maker Effective Month
- Decision Maker Effective Year
- Next Renewal Month
- Next Renewal Year
21
Selected Attributes
Internal
- Agency Number
- Office Name
- Pricing Category Code
- Product Line Name
- Small Group Flag
22
Relevance of Attribute Selection
- Improved modeling
  - Faster model induction
  - Higher accuracy
  - Easier to interpret models
- Structural knowledge gained from the selection of attributes
23
Most Important Attributes
- What attributes affect the purchasing decision of a customer group?
- E.g., the five most important factors that determine whether a customer group purchases a particular insurance coverage:
  - Agency Number
  - Small Group Flag
  - Zip Code
  - Decision Maker Effective Year
  - Next Renewal Month
24
Customer Segmentation
- Unique groups of customers
  - Similar characteristics
  - Similar behavior in terms of interest in coverage
- For example, separate predictive models for customer segments for a particular type of insurance
25
Customer Segments Used for Modeling
- Results
  - Three segments for one database
  - Two segments for two databases
  - One segment for two databases
- Continue modeling for each segment independently
26
The Data Mining Process
[Process diagram; labels: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment, Data]
27
Modeling
- Select modeling technique(s)
- Calibrate modeling techniques
- Make adjustments to data
28
Modeling
- Mathematical models for predicting if a customer is interested in a coverage
- Understand why a customer is interested
- For example: if a customer's state is Indiana and the office is Indianapolis_Office1, then the customer is interested in Coverage_3
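A rule of this kind translates directly into a predicate over a customer record, which is part of why rule- and tree-based models are easy to explain. The sketch below encodes the example rule in plain Python; the dictionary keys "State" and "Office Name" and the sample customers are assumptions about how the record might be stored.

```python
def interested_in_coverage_3(customer):
    """Return True when the example rule fires for this customer record.

    Rule from the slide: state is Indiana and office is Indianapolis_Office1.
    The key names "State" and "Office Name" are hypothetical.
    """
    return (
        customer.get("State") == "Indiana"
        and customer.get("Office Name") == "Indianapolis_Office1"
    )


# Made-up customer records.
customers = [
    {"State": "Indiana", "Office Name": "Indianapolis_Office1"},
    {"State": "Indiana", "Office Name": "Other_Office"},
    {"State": "Ohio", "Office Name": "Indianapolis_Office1"},
]
predictions = [interested_in_coverage_3(c) for c in customers]
```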
29
Modeling Techniques
- Three modeling techniques tried for predicting customer interest:
  - Decision trees
  - Artificial neural networks (ANN)
  - Support vector machines (SVM)
- Decision trees have the advantage of transparency
- ANN and SVM did not have significantly better prediction accuracy
30
Insurance Coverage Interest (Type 6)
[Figure: decision tree. Test nodes: Small Group Flag (Y / N), Product Line Name (Group_1 / Group_2). Leaves: Yes / No.]
31
Insurance Coverage Interest (Type 7)
[Figure: decision tree, some branches omitted. Test nodes: Pricing Category Code (A2, A4, Others), Industry Classification Name (Legal_Services, Transportation_and Public_Utilities, Group_1, Group_2), Agency Number (<= 430 / > 430), Next Renewal Year (<= 2000 / > 2000 and <= 2002 / > 2002). Leaves: Yes / No.]
32
Accuracy of Predicting Customer Interest
Coverage   Accuracy
Type 1     84.0%
Type 2     97.2%
Type 3     98.3%
Type 4     99.5%
Type 5     88.4%
Type 6     100%
Type 7     76.3%
Type 8     85.0%
Type 9     94.8%
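The slides report accuracy per coverage type but do not state the evaluation protocol (for example, a held-out test set or cross-validation). As a reminder of what the figure itself measures, here is a minimal sketch of classification accuracy as the fraction of correct predictions; the function name and the sample labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of instances whose predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty data set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels: three of four predictions match.
score = accuracy(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"])
```

Accuracy alone can be misleading when one class dominates, so per-class error rates are worth checking alongside it.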
33
Modeling
- Mathematical models for predicting if a customer will terminate coverage
- Why do customers terminate a specific type of coverage?
- What are the important factors in a customer's decision to terminate coverage?
34
Who Terminates Type 3 Coverage?
[Figure: decision tree. Test nodes: Customer Effective Year, Next Renewal Month, Coverage Effective Year. Leaves: Active / Terminated.]
Correct for 95% of customers
35
Who Terminates Type 1 Coverage?
- Decision tree based on:
  - Distribution number
  - Underwriting department number
  - Price category
  - Rate type
  - Rate Plan Year
- Predicts 96.3% of terminations correctly
36
Accuracy of Predicting Termination
Model    Accuracy
Type 1   96.3%
Type 2   96.5%
Type 3   95.3%
Type 4   88.9%
Type 5   88.3%
37
The Data Mining Process
[Process diagram; labels: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment, Data]
38
Evaluation
- Data analysis results in a good model
- Are business objectives being achieved?
- Is there an important business issue that has not been considered?
- Should the results be used?
39
The Data Mining Process
[Process diagram; labels: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment, Data]
40
Deployment
- Incorporate the results in the organization's decision-making process
  - Report
  - Decision support system
  - Personalization of web pages
- Repeatable data mining process