Introduction to Data Mining
James Guszcza, FCAS, MAAA
CAS 2004 Ratemaking Seminar
Philadelphia, March 11-12, 2004
© Deloitte Consulting, 2004

Themes
- What is Data Mining? How does it relate to statistics?
- Insurance applications
- Data sources
- The data mining process
- Model design
- Modeling techniques (covered in Louise Francis' presentation)

Themes
- How does data mining need actuarial science?
  - Variable creation
  - Model design
  - Model evaluation
- How does actuarial science need data mining?
  - Advances in computing and modeling techniques
  - Ideas from other fields can be applied to insurance problems

Themes
"The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions." -- Ian Hacking
Data mining gives us new ways of approaching the age-old problems of risk selection and pricing...
...and other problems not traditionally considered "actuarial".

What is Data Mining?

What is Data Mining?
- My definition: "Statistics for the Computer Age"
- Many new techniques have come from computer science, marketing, biology... but all can (and should!) be brought under the framework of statistics
- Not a radical break with traditional statistics
  - Complements and builds on traditional statistics
  - Statistics enriched with the brute-force capabilities of modern computing
  - Opens the door to new techniques
- Data mining therefore tends to be associated with industrial-sized data sets

Buzz-words
- Data Mining
- Knowledge Discovery
- Machine Learning
- Statistical Learning
- Predictive Modeling
- Supervised Learning
- Unsupervised Learning
- ...etc

What is Data Mining?
- Supervised learning: predict the value of a target variable based on several predictive variables ("predictive modeling")
  - Credit / non-credit scoring engines
  - Retention and cross-sell models
- Unsupervised learning: describe associations and patterns along many dimensions without any target information (see the sketch below)
  - Customer segmentation
  - Data clustering
  - Market basket analysis ("diapers and beer")
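A minimal sketch contrasting the two modes, assuming scikit-learn; the toy data, the logistic regression, and the k-means clustering are illustrative choices, not prescriptions from the talk:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))                          # predictive variables
    y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # target variable

    # Supervised learning: fit f(X) to a known target y, then score
    clf = LogisticRegression().fit(X, y)
    scores = clf.predict_proba(X)[:, 1]                     # continuous scores

    # Unsupervised learning: find structure in X with no target at all
    segments = KMeans(n_clusters=4, n_init=10).fit_predict(X)
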
So Why Should Actuaries Do This Stuff?
- Any application of statistics requires subject-matter expertise
- Psychometricians, econometricians, bioinformaticians, and marketing scientists are all applied statisticians with a particular subject-matter expertise and area of specialty
- Add actuarial modelers to this list! "Insurometricians"!?
- Actuarial knowledge is critical to the success of insurance data mining projects

Three Concepts
- Scoring engines: a "predictive model" by any other name...
- Lift curves: how much worse than average are the policies with the worst scores?
- Out-of-sample tests: how well will the model work in the real world? An unbiased estimate of predictive power

Classic Application: Scoring Engines
- Scoring engine: a formula that classifies or separates policies (or risks, accounts, agents...) into profitable vs. unprofitable, retaining vs. non-retaining, and so on
- A (non-)linear function f( ) of several predictive variables
- Produces a continuous range of scores: score = f(X1, X2, ..., XN) (see the sketch below)
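A minimal sketch of such an engine as an explicit f( ); the variable names and coefficients are made up for illustration, and the sigmoid is just one convenient way to map a linear predictor to a bounded score:

    import math

    def score(age_of_operator, credit_score, prior_claims):
        # Linear predictor pushed through a sigmoid: a score in (0, 1)
        z = -1.2 + 0.03 * age_of_operator - 0.004 * credit_score + 0.5 * prior_claims
        return 1.0 / (1.0 + math.exp(-z))

    print(score(45, 700, 1))  # one policy's score
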
What "Powers" a Scoring Engine?
- Scoring engine: score = f(X1, X2, ..., XN)
- The X1, X2, ..., XN are at least as important as the f( )!
  - Again, this is why actuarial expertise is necessary
  - Think of the predictive power of credit variables
- A large part of the modeling process consists of variable creation and selection
  - It is usually possible to generate hundreds of variables
  - This is the steepest part of the learning curve

Model Evaluation: Lift Curves
- Sort the data by score and break it into 10 equal pieces (deciles)
  - Best decile: lowest scores, hence lowest loss ratio
  - Worst decile: highest scores, hence highest loss ratio
- The difference between them is "lift"
- Lift = segmentation power
- Lift translates into the ROI of the modeling project (see the sketch below)
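A minimal sketch of a decile lift table, assuming pandas and a hypothetical DataFrame with score, loss, and premium columns:

    import pandas as pd

    def decile_lift(df):
        # Rank-then-qcut avoids qcut failures when many scores are tied
        df = df.copy()
        df["decile"] = pd.qcut(df["score"].rank(method="first"), 10, labels=False) + 1
        by_decile = df.groupby("decile").agg(loss=("loss", "sum"), prem=("premium", "sum"))
        overall_lr = df["loss"].sum() / df["premium"].sum()
        # Loss-ratio relativity per decile: 1.0 = average, below 1 better, above 1 worse
        return (by_decile["loss"] / by_decile["prem"]) / overall_lr
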
Out-of-Sample Testing
- Randomly divide the data into 3 pieces: training data, test data, and validation data
- Use the training data to fit models
- Score the test data to create a lift curve
- Perform the train/test steps iteratively until you have a model you're happy with
- During this iterative phase, the validation data is set aside in a "lock box"
- Once the model has been finalized, score the validation data and produce a lift curve: an unbiased estimate of future performance (see the sketch below)
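A minimal sketch of the three-way split, assuming scikit-learn; the 60/20/20 proportions and the stand-in DataFrame are illustrative:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.DataFrame({"score": range(100)})  # stand-in for the modeling dataset
    train, rest = train_test_split(df, test_size=0.4, random_state=42)
    test, validate = train_test_split(rest, test_size=0.5, random_state=42)
    # Fit on train and iterate against test; leave validate in the
    # "lock box" until the model is final, then score it exactly once.
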
Data Mining: Applications
- The classic: a profitability scoring model (underwriting/pricing applications)
- Credit models
- Retention models
- Elasticity models
- Cross-sell models
- Lifetime value models
- Agent/agency monitoring
- Target marketing
- Fraud detection
- Customer segmentation: no target variable ("unsupervised learning")

Skills Needed
- Statistical: goes beyond college and actuarial exams... a fast-moving field
- Actuarial: the subject-matter expertise
- Programming: scalable software and a computing environment are needed
- IT / systems administration: data extraction, data loading, model implementation
- Project management: absolutely critical given the scope and multidisciplinary nature of data mining projects

Data Sources
- Company's internal data
  - Policy-level records
  - Loss & premium transactions
  - Billing
  - VIN
  - ...
- Externally purchased data
  - Credit
  - CLUE
  - MVR
  - Census
  - ...

The Data Mining Process

Raw Data
- Research and evaluate possible data sources
  - Availability
  - Hit rate
  - Implementability
  - Cost-effectiveness
- Extract or purchase the data
- Check the data for quality (QA; see the sketch below)
- At this stage the data is still in "raw" form; you often start with voluminous transactional data
- Much of the data mining process is "messy"
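A minimal sketch of a basic QA pass, assuming pandas and a hypothetical extract named policies.csv keyed by policy_id:

    import pandas as pd

    raw = pd.read_csv("policies.csv")
    print(raw.shape)                             # row/column counts vs. expectations
    print(raw.isna().mean().sort_values(ascending=False))  # missing-value rates
    print(raw.describe())                        # ranges, to flag impossible values
    print(raw["policy_id"].duplicated().sum())   # duplicate keys
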
Variable Creation
- Create the predictive and target variables
  - Requires good programming skills
  - Requires domain and business expertise
  - The steepest part of the learning curve
- Discuss the specifics of variable creation with company experts: underwriters, actuaries, marketers...
- An opportunity to quantify tribal wisdom (see the sketch below)
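A minimal sketch of rolling transactional data up to policy-level predictive variables; the file and column names are hypothetical:

    import pandas as pd

    txn = pd.read_csv("claim_transactions.csv")
    features = txn.groupby("policy_id").agg(
        claim_count=("claim_id", "nunique"),
        total_incurred=("incurred", "sum"),
        last_accident_year=("accident_year", "max"),
    )
    features["years_since_last_claim"] = 2004 - features["last_accident_year"]
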
Variable Transformation
- Univariate analysis of the predictive variables
  - Exploratory data analysis (EDA)
  - Data visualization
- Use EDA to cap or transform the predictive variables (see the sketch below)
  - Extreme values
  - Missing values
  - ...etc
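A minimal sketch of capping and missing-value handling with pandas; the toy values, the 99th-percentile cap, and the median fill are illustrative choices:

    import numpy as np
    import pandas as pd

    features = pd.DataFrame({
        "total_incurred": [0.0, 1200.0, 55000.0, np.nan, 300.0],
        "years_since_last_claim": [1.0, np.nan, 0.0, 5.0, 2.0],
    })

    cap = features["total_incurred"].quantile(0.99)        # cap extreme values
    features["total_incurred"] = features["total_incurred"].clip(upper=cap)
    med = features["years_since_last_claim"].median()      # fill missing values
    features["years_since_last_claim"] = features["years_since_last_claim"].fillna(med)
    features["log_incurred"] = np.log1p(features["total_incurred"])  # tame skew
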
Multivariate Analysis
- Examine correlations among the variables
- Weed out redundant, weak, or poorly distributed variables (see the sketch below)
- Model design
- Build candidate models
  - Regression / GLM
  - Decision trees / MARS
  - Neural networks
- Select the final model
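A minimal sketch of weeding out highly correlated variables; the synthetic predictors and the 0.9 cutoff are illustrative:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    features = pd.DataFrame(rng.normal(size=(500, 4)), columns=["x1", "x2", "x3", "x4"])
    features["x4"] = 0.99 * features["x1"] + 0.01 * features["x2"]  # nearly redundant

    corr = features.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # upper triangle
    redundant = [c for c in upper.columns if (upper[c] > 0.9).any()]
    reduced = features.drop(columns=redundant)  # drops x4
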
Model Analysis & Implementation
- Perform model analytics: necessary for the client to gain comfort with the model
- Calibrate the models: create a user-friendly "scale" dictated by the client (see the sketch below)
- Implement the models: programming skills are again critical
- Monitor performance: distribution of scores and variables, usage of the models, ...etc
- Plan a model maintenance schedule
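A minimal sketch of rescaling raw model output to a user-friendly scale; mapping ranks onto 1-100 is one illustrative convention, since the client dictates the scale:

    import numpy as np

    def to_scale(raw_scores, lo=1, hi=100):
        ranks = np.argsort(np.argsort(raw_scores))       # 0..n-1 by rank
        return lo + (hi - lo) * ranks / (len(raw_scores) - 1)

    print(to_scale(np.array([0.02, 0.40, 0.11, 0.93])))  # [1. 67. 34. 100.]
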
Model Design: Where Data Mining Needs Actuarial Science

Model Design Issues
- Which target variable to use?
  - Frequency & severity
  - Loss ratio or other profitability measures
  - Binary targets: defection, cross-sell, ...etc
- How to prepare the target variable? (see the sketch below)
  - A 1-year or multi-year period? Losses evaluated as of what date?
  - Cap large losses? What about cat losses?
  - How, or whether, to re-rate and adjust premium?
  - What counts as a "retaining" policy?
  - ...etc
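A minimal sketch of preparing one common target, a capped loss ratio; the file, the column names, and the $250,000 cap are hypothetical choices:

    import pandas as pd

    policies = pd.read_csv("policy_year_data.csv")
    capped = policies["incurred_loss"].clip(upper=250_000)        # cap large losses
    policies["target_lr"] = capped / policies["onlevel_premium"]  # re-rated premium
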
Model Design Issues
- Which data points to include or exclude?
  - Certain classes of business? Certain states? ...etc
- Which variables to consider?
  - Credit, or non-credit only?
  - Include rating variables in the model?
  - Exclude certain variables for regulatory reasons?
  - ...etc
- What is the "level" of the model?
  - Policy-term level, household (HH) level, risk level, ...etc
  - Or should the data be summarized into "cells" à la minimum bias?

Model Design Issues
- How should the model be evaluated? Lift curves, gains charts, ROC curves? How should ROI be measured?
- How should the data be split into train/test/validation? Or should cross-validation be used?
- Is there enough data for the lift curve to be "credible"? Are your "incredible" results credible?
- ...etc
- This is not an exhaustive list: every project raises different actuarial issues!

Reference
My favorite textbook: The Elements of Statistical Learning, by Jerome Friedman, Trevor Hastie, and Robert Tibshirani.