Business Intelligence / Decision Models, Week 3: Data Preparation and Transformation
Last Week
- OLTP, data warehouse repository and data mart structures (flat and relational files)
- Data integrity and normalization
- DB interrogation (SQL) for OLAP and reporting
- Migration into data mining suites
[Figure; axis labels: Time/Cost, Cumulated Productivity]
Learning by association or problem solving
This Week
- CRISP-DM (Cross-Industry Standard Process for Data Mining)
- Data preparation (import, aggregate and merge)
- Data transformation (for analytics)
CRISP-DM Phases (Source: SPSS Inc., 2008)
Case Study: A large telecom (XYZ PHONE) has discovered that it is losing customers at a much higher rate than in previous years. Reporting through the corporate dashboard (OLAP) has shown churn rates growing by a large margin last year.
Define Business Objectives (Source: SPSS Inc.)
- Strategic objective definition: increase revenues by retaining more customers
- Related business goal identification: retain high-value customers; identify process problems that need to be changed
- Clear success factor (metric): decrease customer churn by 1%
- Cost-benefit analysis: increase revenues by $750,000
- Actionable BI objectives: XYZ wants to retain more customers by identifying likely churners 2 months in advance and putting an action in place to retain them
Timeline Example (Source: SPSS Inc.)
XYZ's project: 13 weeks
- 8 weeks: (a) business understanding and (b) data preparation
  - Involved the line-of-business manager and the data expert
  - Included refining the definitions of "high-value customer" and "churner"
- 2 weeks: data understanding
  - Heavy reliance on the data expert and the database administrator
- 2 weeks: modeling and evaluation
  - Models developed by the data miner; results evaluated by the line-of-business manager
- 1 week: deployment
  - Heavy involvement of the database administrator
  - Model deployment entailed setting up a data model for monthly scoring of the customer base, with the resulting reports feeding a mail offer
Time Allocation (Source: SPSS Inc.)
Generally accepted industry timeline standards:
- 50 to 70 percent: data preparation
- 20 to 30 percent: data understanding
- 10 to 20 percent: modeling, evaluation, and business understanding
- 5 to 10 percent: deployment
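As a rough check, the XYZ timeline can be converted into percentage shares and set beside the industry ranges. The sketch below uses only the week counts from the XYZ timeline; the dictionary name and phase labels are illustrative choices, not part of the slides.

```python
# Phase durations for the XYZ project, in weeks, as listed on the timeline slide.
xyz_weeks = {
    "business understanding + data preparation": 8,
    "data understanding": 2,
    "modeling + evaluation": 2,
    "deployment": 1,
}

total_weeks = sum(xyz_weeks.values())

# Share of the whole project spent in each phase, as a percentage
# rounded to one decimal place.
xyz_share = {
    phase: round(100 * weeks / total_weeks, 1)
    for phase, weeks in xyz_weeks.items()
}

for phase, pct in xyz_share.items():
    print(f"{phase}: {pct}%")
```

The comparison is only approximate: XYZ groups business understanding with data preparation, whereas the industry figures group it with modeling and evaluation.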
Data Import and Transformation
Lab Objectives
- Extract data from:
  - Customer file
  - Transactional file
- Transform data into information
  - Data preparation
    - Aggregate data from the transactional file
    - Merge the aggregated data with the customer file
Data Import Step by Step
1. Import the Customer and Transaction files from Access or Excel
2. Document variable labels and value labels using the data dictionary
3. Aggregate the transaction file by cust_id, producing summary data and key variables
4. Merge the Customer file and the aggregated transaction file using cust_id as the common key
Aggregating Transaction File
Transaction file columns (one row per order): Order_id, Date, Cust_id, Prod_num, Amt
Aggregated file columns (one row per customer): Cust_id, Freq, Date1, Date2, Amt_sum
[Example rows not recoverable from the source]
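The aggregation step could be sketched in Python with pandas as follows. The column names follow the slide; the transaction values are invented for illustration, and reading Date1 and Date2 as the first and last purchase dates is an assumption based on the tenure formula used later in the lab.

```python
import pandas as pd

# Hypothetical transaction rows: column names follow the slide,
# values are invented for illustration only.
transactions = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4, 5],
        "date": pd.to_datetime(
            ["2008-10-21", "2008-10-30", "2008-11-05", "2008-11-05", "2008-11-12"]
        ),
        "cust_id": [1011, 2234, 1011, 2876, 1011],
        "prod_num": [501, 502, 503, 501, 504],
        "amt": [100.0, 40.0, 60.0, 25.0, 80.0],
    }
)

# Collapse to one row per customer: number of orders, first and last
# purchase date (assumed meaning of Date1/Date2), and total amount spent.
summary = (
    transactions.groupby("cust_id")
    .agg(
        freq=("order_id", "count"),
        date1=("date", "min"),
        date2=("date", "max"),
        amt_sum=("amt", "sum"),
    )
    .reset_index()
)

print(summary)
```

Named aggregation keeps the output column names explicit, which makes the later merge on `cust_id` easier to follow.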
Lab Objectives (Cont.)
- Data transformation
  - Compute customers' length on file
  - Compute recency of last purchase
  - Compute frequency of purchases
  - Compute amount spent
  - Compute customer status
- Purpose
  - CLV (Week 4)
  - RFM (Week 5)
Data Transformation Step by Step
1. Revisit the measurement level of each variable (nominal, ordinal, scale)
2. Define date formats
3. Auto-recode nominal string variables
4. Define missing values
5. Calculate length on file, or tenure (date of last purchase – date of first purchase)
6. Calculate time since last purchase (date of current file – date of last purchase)
7. Define customer status (active or lapsed)
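The date calculations above can be sketched with pandas. The customer rows, the file date (`FILE_DATE`), and the 60-day cutoff for "lapsed" (`LAPSED_AFTER_DAYS`) are all assumptions made for this example; the slides do not specify a cutoff.

```python
import pandas as pd

# Hypothetical aggregated file (one row per customer); values are invented.
summary = pd.DataFrame(
    {
        "cust_id": [1011, 2234, 2876],
        "date1": pd.to_datetime(["2008-10-21", "2008-09-01", "2008-11-05"]),
        "date2": pd.to_datetime(["2008-11-12", "2008-09-15", "2008-11-05"]),
    }
)

FILE_DATE = pd.Timestamp("2008-11-30")  # assumed "date of current file"
LAPSED_AFTER_DAYS = 60                  # assumed cutoff, not from the slides

# Tenure: date of last purchase minus date of first purchase, in days.
summary["tenure_days"] = (summary["date2"] - summary["date1"]).dt.days

# Recency: date of current file minus date of last purchase, in days.
summary["recency_days"] = (FILE_DATE - summary["date2"]).dt.days

# Status: lapsed if the last purchase is older than the assumed cutoff.
summary["status"] = (
    summary["recency_days"]
    .gt(LAPSED_AFTER_DAYS)
    .map({True: "lapsed", False: "active"})
)

print(summary)
```

Keeping the cutoff in a named constant makes it easy to change once the business definition of a lapsed customer is agreed.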
Merging Customer and Transaction Summary Files

Customer file:
Cust_id  Name   Address  Type  CC
1011     Jean   NY       1     Visa
2234     John   OH       1     MC
2876     Janet  CA       2     Visa
3454     Jane   NY       3     Amex

Transaction summary columns merged on Cust_id: Freq, Date1, Date2, Amt_sum
[Summary values not recoverable from the source]
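A minimal sketch of the merge in pandas, using the customer rows from the slide. The summary values are hypothetical, since the originals did not survive, and the choice of a left join (keeping customers with no transactions) is an assumption rather than something the slides state.

```python
import pandas as pd

# Customer file, as shown on the slide.
customers = pd.DataFrame(
    {
        "cust_id": [1011, 2234, 2876, 3454],
        "name": ["Jean", "John", "Janet", "Jane"],
        "address": ["NY", "OH", "CA", "NY"],
        "type": [1, 1, 2, 3],
        "cc": ["Visa", "MC", "Visa", "Amex"],
    }
)

# Hypothetical aggregated transaction file; values are invented.
summary = pd.DataFrame(
    {
        "cust_id": [1011, 2234, 2876],
        "freq": [3, 1, 1],
        "amt_sum": [240.0, 40.0, 25.0],
    }
)

# Left join on the common key: every customer is kept, and customers
# without transactions get missing values in the summary columns.
# validate="one_to_one" raises an error if cust_id is duplicated on either side.
merged = customers.merge(summary, on="cust_id", how="left", validate="one_to_one")

print(merged)
```

The `validate` argument is a cheap guard: if the aggregation step accidentally left several rows per customer, the merge fails instead of silently duplicating customers.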
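Data Transformation

Customer file after recoding (code/label):
Cust_id  Name   Address  Type   CC
1011     Jean   1/NY     1/Res  1/Visa
2234     John   2/OH     1/Res  2/MC
2876     Janet  3/CA     2/Bus  1/Visa
3454     Jane   1/NY     3/DNK  3/Amx

Transaction summary and computed columns: Freq, Dte1, Dte2, Amt, Days, Recency
[Values not recoverable from the source]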
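The code/label pairs in the recoded table can be reproduced with a small pandas sketch. It assigns integer codes in order of first appearance, which matches the codes shown on the slide; whether the lab tool assigns codes in the same order is an assumption. The `value_labels` dictionary is a hypothetical stand-in for the value labels kept in the data dictionary.

```python
import pandas as pd

# Nominal string variables from the customer file on the slide.
customers = pd.DataFrame(
    {
        "cust_id": [1011, 2234, 2876, 3454],
        "address": ["NY", "OH", "CA", "NY"],
        "cc": ["Visa", "MC", "Visa", "Amex"],
    }
)

value_labels = {}  # hypothetical data dictionary: column -> {code: label}

for col in ["address", "cc"]:
    # factorize returns 0-based codes in order of first appearance,
    # plus the distinct labels in that same order.
    codes, labels = pd.factorize(customers[col])
    customers[col + "_code"] = codes + 1  # shift to 1-based codes
    value_labels[col] = {i + 1: label for i, label in enumerate(labels)}

print(customers)
print(value_labels)
```

Storing the code-to-label mapping alongside the recoded columns keeps the numeric codes interpretable in later reports.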
Purpose of this exercise? Prepare data for the next two weeks: Customer Lifetime Value (CLV) and RFM analysis …