Published by Terence Cooper. Modified over 8 years ago.
1
IT Management Case # 8 - A Case on Decision Tree: Customer Churning Forecasting and Strategic Implication in Online Auto Insurance using Decision Tree Algorithms
2
3. Customer churn prediction (Data mining)
Prediction of churn customers in online auto insurance: who is leaving our company?
Data mining
○ ID3 (Iterative Dichotomiser 3)
○ C4.5, C5.0
○ CART (Classification and Regression Tree)
○ CHAID (Chi-square Automatic Interaction Detection) decision tree
Statistics
※ Logistic regression model (LRM)
※ Multivariate discriminant analysis (MDA)
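The tree algorithms listed above differ mainly in how they score a candidate split. As a minimal sketch of the information-gain criterion used by ID3 (C4.5 refines it into a gain ratio), the following computes the entropy reduction from splitting on one attribute. The records and field names (`gender`, `car_type`, `switch`) are hypothetical toy values, not data from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy reduction from splitting rows on attribute (one branch per value)."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical toy records (assumption): switch = 1 means the customer churned.
customers = [
    {"gender": "M", "car_type": "sedan", "switch": 1},
    {"gender": "M", "car_type": "suv", "switch": 0},
    {"gender": "F", "car_type": "sedan", "switch": 1},
    {"gender": "F", "car_type": "suv", "switch": 0},
]
gain_gender = information_gain(customers, "gender", "switch")
gain_car = information_gain(customers, "car_type", "switch")
```

In this toy set, `car_type` separates the two classes completely while `gender` does not separate them at all, so a gain-based tree would split on `car_type` first.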
3
3. Customer churn prediction (Data mining)
Type | C5.0 | CHAID | CART
Target class (output) values | Categorical | Categorical | All
Number of branches in each node | Multi | Multi | Binary
4
4. Advantages and disadvantages of decision tree algorithms
Advantages
○ Generate understandable rules
○ Able to perform in rule-oriented domains
○ Ease of calculation at classification time
○ Able to handle continuous and categorical variables
○ Able to indicate the best fields clearly
Disadvantages
○ Error-prone with too many classes
○ Computationally expensive to train
5
Data Collection
○ Sample data: 13,200
  - 2003 to 2004, auto insurance contracts, insurance agents
Variables
○ 25 candidate variables
  - 15 variables selected by t-test and chi-square test
6
6. Deriving significant variables
Inducing 15 variables, including numeric and categorical variables
Tests: t-test (numeric variables), chi-square (categorical variables)
○ Driver's age
○ Price of car
○ Medical expenses
○ Last premium
○ Date
○ Year of car
○ Comprehensive BI
○ Zip code
○ Type of car
○ Number of air bags
○ Deductible
○ Gender
Legend: Selected / Not selected
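The screening step described above can be sketched with two test statistics: a two-sample t statistic for a numeric variable and a Pearson chi-square statistic for a categorical one. This is only an illustration. The slides do not say which t-test variant was used, so Welch's form is an assumption here, and all sample values are hypothetical.

```python
import math

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means of two samples."""
    na, nb = len(sample_a), len(sample_b)
    mean_a, mean_b = sum(sample_a) / na, sum(sample_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / na + var_b / nb)

def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical driver ages for switching and non-switching customers.
ages_switch = [25, 30, 35, 40, 45]
ages_stay = [35, 40, 45, 50, 55]
t_age = welch_t(ages_switch, ages_stay)

# Hypothetical 2x2 counts: rows = gender, columns = (Switch, No Switch).
gender_by_switch = [[30, 20], [20, 30]]
chi2_gender = chi_square(gender_by_switch)
```

A variable would be kept when its statistic exceeds the critical value at the chosen significance level, for example 3.841 for a chi-square test with one degree of freedom at the 5% level.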
7
7. Summary of significant variable data
Variables
○ Dependent variable: Switch = 1, No Switch = 0
○ Independent variables: numeric (t-test) or categorical (chi-square) variables
8
8. Sample Test
Sample Data Classification
○ Sample data: 13,200
  - Switch: 6,600 / No Switch: 6,600
Test
○ Prediction methods: C5.0, LRM, MDA
○ Training data: 10,560 (80%), Holdout data: 2,640 (20%)
Prediction Accuracy (%)
Data sets | LRM | MDA | C5.0
Training data | 65.3 | 59.7 | 67.39
Holdout data | 60.0 | 59.4 | 68.71
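The 80/20 partition and the accuracy measure on this slide can be sketched as follows. The slides do not state whether the split was stratified by class, so the stratification below is an assumption, and the records are placeholders rather than real contracts.

```python
import random

def stratified_split(records, label_of, train_fraction=0.8, seed=0):
    """Split records into training and holdout sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(label_of(rec), []).append(rec)
    train, holdout = [], []
    for members in by_class.values():
        members = members[:]          # copy so the caller's list is untouched
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        holdout.extend(members[cut:])
    return train, holdout

def accuracy(predicted, actual):
    """Percentage of predictions that match the actual labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)

# Placeholder records (id, label): 6,600 Switch (1) and 6,600 No Switch (0).
records = [(i, 1) for i in range(6600)] + [(i, 0) for i in range(6600, 13200)]
train, holdout = stratified_split(records, label_of=lambda rec: rec[1])
```

With 13,200 balanced records this yields 10,560 training and 2,640 holdout records, matching the counts on the slide, and each set keeps the 50/50 class balance.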
9
9. Comparison of prediction techniques
C5.0
○ Program: Clementine 8.1
○ Accuracy from test
  - Training data: 67.39%, Holdout data: 68.71%
○ Inducing rules
  - Switch: 58 rules
  - No Switch: 65 rules
LRM
○ Program: SPSS 11.1
○ Accuracy from test
  - Training data: 60.0%, Holdout data: 65.3%
MDA
○ Program: SPSS 11.1
○ Accuracy from test
  - Training data: 59.7%, Holdout data: 59.4%
Comparison results
○ C5.0 is superior in predicting churn customers
  - It can be used to analyze online auto insurance and predict churn customers
10
10. Summary and examples of applicable rules
Conclusion
○ Rule-based analysis of the auto insurance market and inducing marketing strategy
  - Reduce the churn rate (keep customers)
Churn Rules
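The rules themselves are not reproduced in this transcript. As a rough sketch of how induced if-then rules could be applied to a customer record, the following uses a first-match rule list. Every threshold and outcome is hypothetical, and first-match evaluation is a simplification that may differ from how C5.0 combines the rules in its rule sets.

```python
# HYPOTHETICAL rules: the variable names follow the slides, but every
# threshold and outcome below is invented for illustration only.
HYPOTHETICAL_RULES = [
    (lambda c: c["driver_age"] < 30 and c["last_premium"] > 800, 1),  # Switch
    (lambda c: c["year_of_car"] <= 1998, 0),                          # No Switch
]

def classify(customer, rules, default=0):
    """Return the class of the first rule whose condition matches."""
    for condition, label in rules:
        if condition(customer):
            return label
    return default  # no rule fired: fall back to the default class

# Hypothetical customer records.
young_high_premium = {"driver_age": 26, "last_premium": 950, "year_of_car": 2002}
older_driver = {"driver_age": 52, "last_premium": 400, "year_of_car": 2001}
prediction_a = classify(young_high_premium, HYPOTHETICAL_RULES)
prediction_b = classify(older_driver, HYPOTHETICAL_RULES)
```

Customers predicted as Switch (1) are the ones a retention campaign would target, which is the marketing use the conclusion points to.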