 Mail Order Company in USA › Would like to find out if there is a way › To reduce mailing cost › By analyzing the past data.

Presentation transcript:

1

2  Mail Order Company in USA › Would like to find out if there is a way › To reduce mailing cost › By analyzing the past data

3

4  Business Objectives: › To find out which customers are good candidates to purchase products › To explore the data to determine the company’s valuable customers

5  Assess the Situation: › One CSV file from 3 data sources: Census Group A, Census Group B, Tax Filers › Personnel: six MTech students, minimal experience in data mining › Software: MS Excel, Clementine, Data Scope

6  Data Mining Goals › Predict which variables affect customers’ buying decisions › Build models and compare their cost against mailing to randomly chosen customers › Suggest a model that achieves a mailing response rate above 1%
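
One way to make the comparison against randomly chosen customers concrete is to compute the lift: the response rate among the highest-scored customers divided by the overall response rate. The sketch below is illustrative only. The function names, scores, and response flags are made up, and it assumes every mailing costs the same, so a higher response rate translates directly into a lower cost per response.

```python
def response_rate(responded):
    """Fraction of mailed customers who responded (flags are 0 or 1)."""
    return sum(responded) / len(responded)


def lift_at(scores, responded, fraction):
    """Response rate in the top `fraction` of customers by model score,
    divided by the overall response rate (what random mailing would get).
    Raises ZeroDivisionError if nobody responded at all."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_flags = [flag for _, flag in ranked[:k]]
    return response_rate(top_flags) / response_rate(responded)


# Hypothetical model scores and observed responses for ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

baseline = response_rate(responded)            # random mailing
lift_top20 = lift_at(scores, responded, 0.2)   # mail only the top 20%
print(f"baseline={baseline:.2f} lift at 20%={lift_top20:.2f}")
```

A lift above 1 means the model-selected group responds more often than a random group of the same size.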

7 Activities | Days/Resources
Data Preparation: prepare Excel/CSV file | n/a
Data Understanding: explore each variable; perform some normalizations; derive new useful variables | Each team member for 4 days
Knowledge Discovery: generate decision tree; suggest the most important variables | Each team member for 2 days
Modelling: build predictive model; iterate steps to improve results | Each team member for 3 days
Reporting: consolidate all results | 2 persons for 2 days

8  First Insights Discovery › Total number of records is 2158 › Distribution by Objective › Distribution by Gender

9  Data Quality Problems › Some columns are normalized, others are not › All values are numeric, which makes them harder to visualize › Much of the data is incomplete › Missing recency, number of transactions and dollar spending data for individual products

10  Describe Data › Gross properties of data  The data is extracted from a larger set with a response rate of ~1%: all 1079 responders and 1079 randomly chosen non-responders › Relationship between attributes  firstmonth and tenure have a linear relationship, thus tenure can be omitted.
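
A linear relationship such as the one between firstmonth and tenure can be checked with the Pearson correlation coefficient: a value close to +1 or -1 suggests one of the two variables carries no extra information. The following is a minimal sketch, not the team's actual procedure. The helper names, the 0.95 threshold, and the sample values are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def is_redundant(xs, ys, threshold=0.95):
    """True when the two variables are almost perfectly linearly related."""
    return abs(pearson_r(xs, ys)) >= threshold


# Made-up values in which tenure is an exact linear function of firstmonth.
firstmonth = [1, 5, 12, 20, 33]
tenure = [59, 55, 48, 40, 27]

r = pearson_r(firstmonth, tenure)
print(f"r={r:.3f} redundant={is_redundant(firstmonth, tenure)}")
```

The sign of r does not matter for redundancy, which is why the check uses the absolute value.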

11

12

13

14

15

16  Select Data › Variables chosen  Clean Data › Some normalizations  Construct Data › Chose the variables as input  Data Transformation › Rescaling › Derive new variables

17  Reduce redundancy caused by data integration › Replace lowincome and highincome with IncomeGroup. › Replace gender1, gender2 and gender3 with Gender. › Discard V171 (total taxfilers with unemployment benefits). › Discard V175, V181, V184, V190, V193, V196: they equal the male data plus the female data.
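
The two kinds of redundancy on this slide can be sketched in a few lines: collapsing several indicator columns into one categorical column, and confirming that a total column equals the sum of its parts before discarding it. This is a hedged illustration. It assumes gender1, gender2 and gender3 are 0/1 flags, and the labels G1 to G3 and the columns total, male and female are hypothetical placeholders, since the transcript does not say what each flag means or which columns hold the male and female figures.

```python
GENDER_COLS = ["gender1", "gender2", "gender3"]
GENDER_LABELS = ["G1", "G2", "G3"]  # placeholder labels, meaning unknown


def collapse_indicators(row, columns, labels):
    """Return the label of the single indicator column set to 1.
    Returns None when zero or several flags are set (inconsistent row)."""
    active = [label for col, label in zip(columns, labels) if row.get(col) == 1]
    return active[0] if len(active) == 1 else None


def is_sum_of(rows, total_col, part_cols, tol=1e-9):
    """True if total_col equals the sum of part_cols in every row."""
    return all(
        abs(row[total_col] - sum(row[col] for col in part_cols)) <= tol
        for row in rows
    )


# Made-up rows for illustration only.
rows = [
    {"gender1": 1, "gender2": 0, "gender3": 0, "total": 10, "male": 4, "female": 6},
    {"gender1": 0, "gender2": 1, "gender3": 0, "total": 7, "male": 3, "female": 4},
]

genders = [collapse_indicators(row, GENDER_COLS, GENDER_LABELS) for row in rows]
total_is_redundant = is_sum_of(rows, "total", ["male", "female"])
print(genders, total_is_redundant)
```

Running the sum check first guards against discarding a total column that only approximately matches its parts.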

18  Rescaling › Log() of totalspend and totaltrans to reduce the effect of large values  Derive Data › Derive ActAccInMostRecMon (number of active accounts in the most recent month) from product recency data › Derive the ratio of low taxfiler income from V156-V163: Value = V156/sum(V156:V163) › Convert the value to 5 categories.
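
The rescaling and derivation steps on this slide could look roughly like the sketch below. Several choices here are assumptions rather than facts from the presentation: log1p (log of 1 + x) is used so that a zero total maps to 0, the 5 categories are taken to be equal-width bins over the range 0 to 1, and the function names and sample counts are invented.

```python
import math


def log_rescale(value):
    """Compress large totals; log1p(x) = log(1 + x) keeps zero at zero."""
    return math.log1p(value)


def low_income_ratio(counts):
    """Value = V156 / sum(V156:V163), where counts lists V156..V163 in order.
    Returns 0.0 when the sum is zero to avoid dividing by zero."""
    total = sum(counts)
    return counts[0] / total if total else 0.0


def to_category(ratio, n_bins=5):
    """Map a ratio in [0, 1] to a category 1..n_bins using equal-width bins.
    A ratio of exactly 1.0 is placed in the last bin."""
    return min(int(ratio * n_bins), n_bins - 1) + 1


# Hypothetical taxfiler counts for V156..V163 in one region.
v156_to_v163 = [25, 15, 10, 10, 10, 10, 10, 10]

ratio = low_income_ratio(v156_to_v163)
category = to_category(ratio)
print(f"ratio={ratio:.2f} category={category} log_spend={log_rescale(1500):.3f}")
```

Quantile-based bins would be an equally plausible reading of "5 categories"; equal-width bins are simply the shortest to show.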

19  Histogram of new variable with Objective overlaid

20

21

22

23

24

25

26  Inverse correlation between English and French speaking regions  No region with significant Tagalog, Spanish or other language-speaking populations  Can probably discard amtspanish, amttagalog, amtsingres, amtengnon, amtmultilin  Cluster/segment English/French areas

27

28

29  Linear relationship for English and French across Census A & B  Can merge amtenglish and bhlenglish  Can merge amtfrench and bhlfrench

30

31  Linear relationship  Merge acflonepar & bfslonepar  Filter out noisy data

32

33  Most data below 0.1  Objective remains constant throughout  Not important to business objective – discard

34

35  Lack of data from other age groups  Very specific targeted marketing to the 40-44 female group  Normalize values from 0 to 0.1 if necessary  Objective improves as proportion increases

36

37  Objective clearly improves when afp1child is on lower end of normal curve

38  Seven regions with acfwchcom = 0.19 and objective = 1

39

40  Most regions have above 60% married couples, assuming normalized data  Acftotmar and acfhuswife mirror one another  Can discard either field  Filter noisy data  Categorical : lone-parent and husband- wife

41  As with the other census and taxfiler data, these data represent the distribution of the region.

42  There is a similar trend: the number of constructions in the two periods is more or less the same.  The sample population represents only a small number of the constructions in the region.

43  Those who do regular maintenance have neither major nor minor repairs

44  Those who have major repairs tend to have fewer minor repairs.

45  This sample population represents the majority of those of English or British ethnic origin in the region.  Those who have British ethnic origin also have English ethnic origin.  Fewer people have English ethnic origin than British ethnic origin.

46  This data represents only a very small number of people of French ethnic origin.

47  Both have the same trend; some who did not answer for family income answered for household income

48  Both of them have the same description. Need to check which one is which.

49  The population sample is mostly locals

