Mail Order Company in USA
› Would like to find out whether there is a way to reduce mailing costs by analyzing past data.

Business Objectives:
› To find out which customers are good candidates to purchase products
› To explore the data to determine the company’s valuable customers

Assess the Situation:
› One CSV file from 3 data sources:
   • Census Group A
   • Census Group B
   • Tax Filers
› Personnel:
   • Six MTech students
   • Minimal experience in data mining
› Software:
   • MS Excel
   • Clementine, Data Scope

Data Mining Goals
› Identify which variables affect customers’ buying decisions
› Build models and compare their mailing cost against mailing to randomly chosen customers
› Suggest a model that achieves a mailing response rate above 1%
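
To make the cost comparison concrete, the sketch below computes the expected mailing cost per responder for a random mailing and for a model-selected mailing. It assumes a fixed cost per mailed piece; the $1.00 cost and the 2% model response rate in the example are hypothetical, and only the ~1% base response rate comes from the data description.

```python
def mailing_cost_per_response(cost_per_piece: float, response_rate: float) -> float:
    """Expected mailing cost needed to obtain one responder."""
    if response_rate <= 0:
        raise ValueError("response_rate must be positive")
    return cost_per_piece / response_rate


def compare_to_random(cost_per_piece: float, base_rate: float, model_rate: float) -> dict:
    """Compare a model-selected mailing with a randomly chosen mailing."""
    random_cost = mailing_cost_per_response(cost_per_piece, base_rate)
    model_cost = mailing_cost_per_response(cost_per_piece, model_rate)
    return {
        "lift": model_rate / base_rate,
        "random_cost_per_response": random_cost,
        "model_cost_per_response": model_cost,
        "savings_per_response": random_cost - model_cost,
    }


# Hypothetical figures: $1.00 per piece, 1% base rate, 2% model response rate.
result = compare_to_random(cost_per_piece=1.0, base_rate=0.01, model_rate=0.02)
print(result)
```

With a ~1% base rate, the >1% response goal is equivalent to a lift above 1 over random mailing.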

| Activities | Days/Resources |
|---|---|
| Data Preparation: prepare Excel/CSV file | n/a |
| Data Understanding: explore each variable; perform some normalizations; derive new useful variables | Each team member for 4 days |
| Knowledge Discovery: generate decision tree; suggest the most important variables | Each team member for 2 days |
| Modelling: build predictive model; iterate steps to improve results | Each team member for 3 days |
| Reporting: consolidate all results | 2 persons for 2 days |
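
The Knowledge Discovery step ranks variables by how well they separate responders from non-responders. The team used Clementine’s decision tree for this; as a tool-independent sketch, the code below ranks categorical variables by information gain, the entropy-based split criterion used by ID3-style trees. This is not necessarily the criterion Clementine applied, and the data and column values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels):
    """Reduction in label entropy from splitting on a categorical variable."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_variables(columns, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: Objective (1 = responder) and two categorical inputs.
objective = [1, 1, 0, 0, 1, 0]
columns = {
    "IncomeGroup": ["high", "high", "low", "low", "high", "low"],
    "Gender": ["F", "M", "F", "M", "M", "F"],
}
ranking = rank_variables(columns, objective)
print(ranking)
```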

First Insights Discovery
› Total number of records: 2158
› Distribution by Objective
› Distribution by Gender
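
A minimal standard-library sketch of these first counts, assuming the CSV has an Objective column (1 = responder) and the derived Gender column; the Gender codes and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def distributions(rows):
    """Total record count, counts by Objective, and counts by (Gender, Objective)."""
    by_objective = Counter(row["Objective"] for row in rows)
    by_gender = Counter((row["Gender"], row["Objective"]) for row in rows)
    return len(rows), by_objective, by_gender


# Made-up sample; a real file would be opened with open(path, newline="").
sample = io.StringIO("Objective,Gender\n1,F\n0,M\n1,F\n0,F\n")
total, by_objective, by_gender = distributions(list(csv.DictReader(sample)))
print(total, by_objective, by_gender)
```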

Data Quality Problems
› Some columns are normalized, others are not
› All values are numeric, which makes them harder to visualize
› Much of the data is incomplete
› Recency, number of transactions, and dollar spending are missing for individual products
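
Incompleteness can be quantified per column before deciding what to drop or impute. This sketch assumes missing values appear as empty strings, "NA" or "?"; the actual encoding in the company’s CSV is not stated, and the column names are hypothetical.

```python
def missing_summary(rows, columns, missing_markers=("", "NA", "?")):
    """Fraction of records with a missing value in each column."""
    n = len(rows)
    summary = {}
    for col in columns:
        missing = 0
        for row in rows:
            value = row.get(col)
            # Treat absent keys and the assumed marker strings as missing.
            if value is None or str(value).strip() in missing_markers:
                missing += 1
        summary[col] = missing / n if n else 0.0
    return summary


# Hypothetical per-product columns.
rows = [
    {"recency_p1": "3", "trans_p1": ""},
    {"recency_p1": "NA", "trans_p1": "2"},
    {"recency_p1": "1", "trans_p1": "?"},
    {"recency_p1": "5", "trans_p1": "4"},
]
print(missing_summary(rows, ["recency_p1", "trans_p1"]))
```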

Describe Data
› Gross properties of data
   • The data is extracted from a larger set with a response rate of ~1%: all 1079 responders and 1079 randomly chosen non-responders
› Relationship between attributes
   • firstmonth and tenure have a linear relationship, so tenure can be omitted
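
The sampling described above (every responder plus an equal number of randomly chosen non-responders) can be sketched as follows. The Objective column name and the "1" coding for responders are assumptions. Because the result is a 50/50 sample, response rates measured on it overstate the ~1% rate of the full mailing list, so cost comparisons need to be rescaled to the base rate.

```python
import random


def balanced_sample(rows, target="Objective", positive="1", seed=0):
    """All responders plus an equal-sized random draw of non-responders."""
    responders = [r for r in rows if r[target] == positive]
    non_responders = [r for r in rows if r[target] != positive]
    rng = random.Random(seed)
    # Raises ValueError if there are fewer non-responders than responders.
    drawn = rng.sample(non_responders, len(responders))
    sample = responders + drawn
    rng.shuffle(sample)
    return sample


# Hypothetical list with a 3% response rate.
rows = [{"id": i, "Objective": "1" if i < 3 else "0"} for i in range(100)]
sample = balanced_sample(rows)
print(len(sample), sum(r["Objective"] == "1" for r in sample))
```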

• Select Data
  › Variables chosen
• Clean Data
  › Some normalizations
• Construct Data
  › Choose the input variables
• Data Transformation
  › Rescaling
  › Derive new variables

Reduce redundancy caused by data integration
› Replace lowincome and highincome with IncomeGroup
› Replace gender1, gender2 and gender3 with Gender
› Discard V171 (total taxfilers with unemployment benefits)
› Discard V175, V181, V184, V190, V193 and V196; each equals the male data plus the female data
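
Two of these reductions can be sketched directly: collapsing indicator columns into one categorical field, and verifying that a total column really equals the sum of its male and female parts before discarding it. This assumes gender1 to gender3 are 0/1 indicators; the labels G1 to G3 and the column names V_male and V_female are hypothetical.

```python
def collapse_one_hot(row, columns, labels):
    """Return the label of the first indicator column set to 1, else None."""
    for col, label in zip(columns, labels):
        if row[col] == 1:
            return label
    return None


def is_sum_of(rows, total_col, part_cols, tol=1e-9):
    """True if total_col equals the sum of part_cols in every record."""
    return all(abs(r[total_col] - sum(r[c] for c in part_cols)) <= tol for r in rows)


# Hypothetical labels and column names.
row = {"gender1": 0, "gender2": 1, "gender3": 0}
gender = collapse_one_hot(row, ["gender1", "gender2", "gender3"], ["G1", "G2", "G3"])

regions = [{"V175": 120, "V_male": 70, "V_female": 50},
           {"V175": 80, "V_male": 30, "V_female": 50}]
can_discard = is_sum_of(regions, "V175", ["V_male", "V_female"])
print(gender, can_discard)
```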

Rescaling
› Take log() of totalspend and totaltrans to reduce the effect of large values
Derive Data
› Derive ActAccInMostRecMon (number of active accounts in the most recent month) from the product recency data
› Derive the low-taxfiler-income ratio from V156-V163
› Value = V156 / sum(V156:V163)
› Convert the value to 5 categories
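
A sketch of the rescaling and derivation steps. It uses log(1 + x) rather than a plain log so that zero spend or zero transactions stay finite, returns 0.0 when V156 to V163 sum to zero, and bins the ratio into 5 equal-width categories on [0, 1]; all three choices are assumptions, since the slide does not say how zeros or category boundaries were handled.

```python
import math


def log_rescale(x):
    """log(1 + x): compresses large totals while keeping zeros finite."""
    return math.log1p(x)


def low_income_ratio(row):
    """V156 / sum(V156..V163); returns 0.0 when the sum is zero (assumption)."""
    cols = [f"V{i}" for i in range(156, 164)]  # V156 through V163 inclusive
    total = sum(row[c] for c in cols)
    return row["V156"] / total if total else 0.0


def to_category(ratio, n_bins=5):
    """Equal-width bins on [0, 1] mapped to categories 1..n_bins."""
    return min(int(ratio * n_bins) + 1, n_bins)


# Hypothetical region: V156 = 30, V157..V163 = 10 each, so the ratio is 30 / 100.
region = {"V156": 30, **{f"V{i}": 10 for i in range(157, 164)}}
ratio = low_income_ratio(region)
print(ratio, to_category(ratio), log_rescale(0), log_rescale(999))
```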

• Histogram of the new variable with Objective overlaid

• Inverse correlation between English- and French-speaking regions
• No region has a significant Tagalog-, Spanish- or other-language-speaking population
• Can probably discard amtspanish, amttagalog, amtsingres, amtengnon, amtmultilin
• Cluster/segment English and French areas

• Linear relationship for English and French between Census A and Census B
• Can merge amtenglish and bhlenglish
• Can merge amtfrench and bhlfrench
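
Before merging two census measures, the strength of their linear relationship can be checked with the Pearson correlation coefficient; a value near −1 corresponds to the inverse English/French relationship noted above, and a value near +1 to agreement between Census A and Census B. Merging by averaging and the 0.95 threshold are assumptions, and the example values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def merge_if_redundant(xs, ys, threshold=0.95):
    """Average two measures when they are strongly positively correlated."""
    r = pearson(xs, ys)
    merged = [(x + y) / 2 for x, y in zip(xs, ys)] if r >= threshold else None
    return merged, r


# Made-up proportions for three regions.
amtenglish = [0.10, 0.50, 0.90]
bhlenglish = [0.12, 0.48, 0.91]
merged, r = merge_if_redundant(amtenglish, bhlenglish)
print(round(r, 3), merged)
```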

• Linear relationship between acflonepar and bfslonepar
• Merge acflonepar and bfslonepar
• Filter out noisy data
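
One way to filter noisy points from a linear relationship such as acflonepar versus bfslonepar is to fit a least-squares line and drop points whose residual lies more than k standard deviations from it. The rule and k = 2 are assumptions; the slide does not specify how noise was identified, and the data below is made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def filter_noisy(xs, ys, k=2.0):
    """Indices of points whose residual is within k standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    return [i for i, r in enumerate(residuals) if sd == 0 or abs(r) <= k * sd]


# Made-up data: a clean linear trend with one noisy point at index 5.
acflonepar = list(range(10))
bfslonepar = [x + (10 if x == 5 else 0) for x in acflonepar]
keep = filter_noisy(acflonepar, bfslonepar)
print(keep)
```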

• Most values are below 0.1
• Objective remains constant across the range
• Not important to the business objective; discard
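
Whether the objective stays constant across a variable’s range can be checked by computing the response rate within bins of that variable. Values that fall outside every bin are skipped. The bin edges and data here are hypothetical.

```python
from collections import defaultdict


def response_rate_by_bin(values, objective, edges):
    """Mean of a 0/1 objective within each half-open bin [edges[i], edges[i+1])."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for value, y in zip(values, objective):
        for lo, hi in zip(edges, edges[1:]):
            if lo <= value < hi:
                totals[(lo, hi)] += 1
                hits[(lo, hi)] += y
                break  # each value belongs to at most one bin
    return {b: hits[b] / totals[b] for b in totals}


# Made-up values for one variable, mostly below 0.1.
values = [0.01, 0.02, 0.05, 0.08, 0.2, 0.3]
objective = [1, 0, 1, 0, 1, 0]
rates = response_rate_by_bin(values, objective, [0.0, 0.1, 1.0])
print(rates)
```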

• Lack of data for other age groups
• Very specific marketing targeted at a female group
• Normalize values to the range 0 to 0.1 if necessary
• Objective improves as the proportion increases
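
One reading of "normalize values from 0 to 0.1" is min-max scaling into the range [0, 0.1], sketched below; the slide does not say which normalization was intended.

```python
def min_max_scale(values, new_min=0.0, new_max=0.1):
    """Linearly map values so the smallest becomes new_min and the largest new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant column: no spread to preserve
    span = new_max - new_min
    return [new_min + (v - lo) / (hi - lo) * span for v in values]


print(min_max_scale([2, 4, 6]))
```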

• Objective clearly improves when afp1child is at the lower end of the normal curve

7 regions with acfwchcom = 0.19 and objective = 1

• Most regions have more than 60% married couples, assuming the data is normalized
• acftotmar and acfhuswife mirror one another
• Can discard either field
• Filter noisy data
• Categorical: lone-parent and husband-wife

• Like the other census and taxfiler data, these data represent the distribution for the region.

• There is a similar trend: the amount of construction in the two periods is roughly the same
• The sample population covers only a small share of the construction in the region

• Those who do regular maintenance need neither major nor minor repairs

• Those who need major repairs tend to need fewer minor repairs

• This sample population represents the majority of people of English or British ethnic origin in the region
• Those of British ethnic origin also have English ethnic origin
• Fewer people have English ethnic origin than British ethnic origin

• This data represents only a very small number of people of French ethnic origin

• Both show the same trend; some who did not answer the family income question did answer the household income question

• Both fields have the same description; need to check which one is which

• The population sample is mostly local residents