Published by Eustace Powell. Modified over 9 years ago.
1
DATA MINING FINAL REPORT Vipin Saini M964011062 許博淞 M964020009 陳昀志 M964020043
2
Outline
- Introduction
- DM Methodology (Steps 1-3)
- DM Methodology (Steps 4-8)
- DM Methodology (Steps 9-10)
- Conclusion
3
Introduction
- Direct marketing
- Response rate
- Telecommunications company
- Publicly available business data
- Addition of random companies
4
Step 2: Records
Some characteristics about each prospect:
- Number of employees at a particular office
- Number of employees for the entire company
- Annual sales (in thousands) at a particular office
- Annual sales (in thousands) for the entire company
- Whether or not the company does business outside the United States
- Annual advertising expense
- Whether the company has moved recently or is a new business
- The type of ownership
- Specific industry code
- General industry code
- Age of the company (in years)
5
Step 3: Data Type
Correcting the data types:
- Make sure "Buyer" is of type Yes/No.
- Change the type of "Age" to integer.
- Make sure the "International" type is string or Boolean.
- Change "Local Employees" to integer.
- Change "Local Sales" to integer.
- Change "Industry Type" to categorical.
- Change "Total Employees" to integer.
- Change "Total Sales" to integer.
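The type corrections above could be applied to a raw CSV row roughly as follows. This is only a sketch: the column names come from the slide, but the function name `coerce_record`, the accepted Yes/No spellings, and the treatment of empty strings as missing values are assumptions.

```python
def coerce_record(raw):
    """Convert one row of string fields to the types listed in Step 3."""

    def to_int(text):
        # Assumption: an empty field means the value is missing.
        text = text.strip()
        return int(text) if text else None

    def to_bool(text):
        # Assumption: these spellings count as "Yes"; anything else is "No".
        return text.strip().lower() in ("yes", "y", "true", "1")

    return {
        "Buyer": to_bool(raw["Buyer"]),
        "Age": to_int(raw["Age"]),
        "International": to_bool(raw["International"]),
        "Local Employees": to_int(raw["Local Employees"]),
        "Local Sales": to_int(raw["Local Sales"]),
        "Industry Type": raw["Industry Type"].strip(),  # kept as a category label
        "Total Employees": to_int(raw["Total Employees"]),
        "Total Sales": to_int(raw["Total Sales"]),
    }


# Made-up example row, not taken from the dataset.
typed = coerce_record({
    "Buyer": "Yes", "Age": "3", "International": "No",
    "Local Employees": "12", "Local Sales": "450", "Industry Type": "E",
    "Total Employees": "", "Total Sales": "9000",
})
```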
6
Step 4: Create a Model Set
The number of employees and the amount of sales differ with the size of the company. All of these characteristics give a picture of company size, so we derive three ratios from them: Employee Ratio, Sales Ratio, Productivity Ratio.
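The slide names the three derived ratios but does not give their formulas. The sketch below assumes one plausible reading: local employees over total employees, local sales over total sales, and total sales per employee. The function name and the field names are hypothetical.

```python
def add_ratios(record):
    """Return a copy of record with three derived ratio fields.

    The formulas are assumptions; the slides only name the ratios.
    """

    def safe_div(num, den):
        # The ratio is undefined (None) when a value is missing or the
        # denominator is zero.
        if num is None or den is None or den == 0:
            return None
        return num / den

    out = dict(record)
    out["employee_ratio"] = safe_div(
        record.get("local_employees"), record.get("total_employees"))
    out["sales_ratio"] = safe_div(
        record.get("local_sales"), record.get("total_sales"))
    out["productivity_ratio"] = safe_div(
        record.get("total_sales"), record.get("total_employees"))
    return out


# Made-up example values.
example = add_ratios({
    "local_employees": 20, "total_employees": 80,
    "local_sales": 500, "total_sales": 4000,
})
```

An undefined ratio is kept as `None` rather than dropped, which would fit the "Sales Ratio = N/A" branch that appears in the tree.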
7
Step4: Create a Model Set With our newly applied rules, the World dataset now has redundant columns.
8
Step5: Fix Problems with the Data Categorical variables with too many values
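One common way to handle a categorical variable with too many values is to merge the rare values into a single catch-all category. The slides do not say which fix was used, so the sketch below is an illustration only; `collapse_rare`, the threshold, and the "OTHER" label are assumptions.

```python
from collections import Counter


def collapse_rare(values, min_count, other="OTHER"):
    """Replace values seen fewer than min_count times with a catch-all label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Made-up industry codes.
codes = ["7372", "7372", "7372", "5812", "5812", "0111"]
collapsed = collapse_rare(codes, min_count=2)
```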
9
Step 6: Transform the Data
Create a training set and a testing set. Total records: 13117
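A random half-and-half split of the 13117 records could look like the sketch below. The slides do not describe how the split was actually performed, so the shuffling approach, the seed, and the name `split_half` are assumptions.

```python
import random


def split_half(n_records, seed=0):
    """Shuffle record indices and cut them into two halves (train, test)."""
    indices = list(range(n_records))
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(indices)
    half = n_records // 2
    return indices[:half], indices[half:]


train_idx, test_idx = split_half(13117)
```

With an odd record count the second half receives the extra record.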
10
Step 7: Build Model
We used PolyAnalyst 5.0 to mine the data.
11
Step 7: Build Model
We used the MarketData.CSV file, which we had edited, as the source. After the software filtered out missing values, we obtained the decision tree.
12
The Decision Tree
[Figure: decision tree diagram. Node labels shown: Root; Local Employee < 23; Age < 3; Age < 2; Sales Ratio < 0.0027; Sales Ratio >= 0.0027; Sales Ratio = N/A; Age >= 2; Age >= 3; Local Employee < 10; Local Employee >= 10; Local Employee >= 23; Industry Category = C, H, F, E, D, A, B, G, I; Employee Ratio < 0.214]
13
The Decision Tree
We made a decision tree with:
- Number of non-terminal nodes: 41
- Number of leaves: 91
- Depth of the tree: 8
14
Step 8: Assess Model
Decision tree results on the training set:
- Total classification error: 14.04%
- Classification accuracy: 85.96%
- Classification error for class No: 14.89%
- Classification error for class Yes: 13.01%

Real \ Predicted    No      Yes     Undefined
No                  3018    528     49
Yes                 379     2535    49
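The reported error rates can be checked against the confusion-matrix counts on this slide. The sketch below assumes that records with an undefined prediction are left out of the rates, which is the reading that matches the reported percentages; the function name `error_rates` is hypothetical.

```python
def error_rates(matrix):
    """Compute total and per-class error from matrix[actual][predicted] counts."""
    classes = ("No", "Yes")
    total = sum(matrix[a][p] for a in classes for p in classes)
    wrong = sum(matrix[a][p] for a in classes for p in classes if a != p)
    per_class = {}
    for a in classes:
        row_total = sum(matrix[a][p] for p in classes)
        row_wrong = sum(matrix[a][p] for p in classes if p != a)
        per_class[a] = row_wrong / row_total
    return wrong / total, per_class


# Training-set counts from the slide, undefined predictions excluded.
training = {
    "No": {"No": 3018, "Yes": 528},
    "Yes": {"No": 379, "Yes": 2535},
}
total_error, class_error = error_rates(training)
print(f"total {total_error:.2%}, "
      f"No {class_error['No']:.2%}, Yes {class_error['Yes']:.2%}")
```

For example, the class No error is 528 / (3018 + 528), which rounds to 14.89%.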
15
Step 8: Assess Model
If we target only the top 40% of the data as ranked by this model, we can capture about 80% of the actual responses.
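A statement of this kind is usually read off a cumulative gains chart: rank the records by model score, take the top fraction, and count how many of the actual responders fall inside it. The sketch below assumes that reading; the function name, the scores, and the outcomes are made up for illustration and are not the project's data.

```python
def cumulative_gain(scores, actual, fraction):
    """Share of all actual responders found in the top `fraction` by score."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    cutoff = int(len(ranked) * fraction)
    found = sum(1 for _, responded in ranked[:cutoff] if responded)
    total = sum(1 for responded in actual if responded)
    return found / total if total else 0.0


# Toy data: 10 prospects, 5 of whom actually responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [True, True, True, True, False, False, True, False, False, False]
gain = cumulative_gain(scores, actual, 0.4)  # 4 of the 5 responders are in the top 4
```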
16
Step 9: Deploy Models
The testing set is a randomly selected 50% of the records from the whole dataset.
- Total classification error: 15.54%
- Classification accuracy: 84.46%
- Classification error for class No: 16.56%
- Classification error for class Yes: 14.19%

Real \ Predicted    No      Yes     Undefined
No                  3074    610     45
Yes                 396     2395    39
17
Step 10: Assess Result
[Figure: first split of the decision tree. Root splits into Local Employee < 23 and Local Employee >= 23; leaf labels Yes and No]
18
Step 10: Assess Result
- Companies with 23 or more local employees are more likely to respond (class label Yes, ratio 75.5%). A bigger company with more employees tends to respond.
- Companies with fewer than 23 local employees are likely not to respond (class label No, ratio 72.9%). A small company does not tend to respond.
19
Step 10: Assess Result
[Figure: decision tree branch. Root; Local Employee < 23; Local Employee >= 23; Industry Category = C, H, F, E, D, A, B, G, I; under Industry Category = E: Employee Ratio < 0.214 and Employee Ratio >= 0.214; leaf labels Yes and No]
20
Step 10: Assess Result
- If the Employee Ratio is smaller than 0.214, the response ratio is low (class label No, ratio 85.7%).
- If the Employee Ratio is 0.214 or bigger, the response ratio is high (class label Yes, ratio 66.2%).
- For bigger companies in Industry Category E, the response therefore depends on the Employee Ratio.
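The findings about company size and Industry Category E can be written as simple rules. This is a deliberately simplified sketch, not the 91-leaf tree itself: it ignores the deeper splits under Local Employee < 23, and for larger companies outside category E it assumes the majority label of that branch (Yes). The function name is hypothetical.

```python
def simplified_prediction(local_employees, industry_category, employee_ratio):
    """Predict "Yes"/"No" from the first split and the category E refinement."""
    if local_employees < 23:
        # Majority label of the small-company branch (deeper splits ignored).
        return "No"
    if industry_category == "E":
        return "Yes" if employee_ratio >= 0.214 else "No"
    # Assumption: other categories fall back to the branch's majority label.
    return "Yes"


small = simplified_prediction(10, "E", 0.5)
large_low_ratio = simplified_prediction(40, "E", 0.1)
large_high_ratio = simplified_prediction(40, "E", 0.3)
```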
21
Step 10: Assess Result
[Figure: decision tree branch. Root; Local Employee < 23; Age < 3; Age < 2; Sales Ratio < 0.0027; Sales Ratio >= 0.0027; Sales Ratio = N/A; Age >= 2; Age >= 3; Local Employee < 10; Local Employee >= 10; Local Employee >= 23; leaf label Yes]
22
Step 10: Assess Result
- If the Sales Ratio is 0.0027 (0.27%) or more, the response ratio is high (class label Yes, ratio 98.2%).
- A newly started company with a good sales ratio is likely to respond.
23
Conclusion
We used a decision tree to approach target marketing. Knowing a company's industry category lets us get more information out of the mining result.
24
Thank you for listening!