Predictive Modeling CAS Reinsurance Seminar May 7, 2007 Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc.
Why Predictive Modeling? Better use of data than traditional methods; advanced methods for dealing with messy data are now available
Data Mining Goes Prime Time
Becoming a Popular Tool in All Industries
Real-Life Insurance Application – The “Boris Gang”
The Predictive Modeling Family: Classical Linear Models, GLMs, and Data Mining
A Casualty Actuary’s Perspective on Data Modeling
The Stone Age: 1914 – …
–Simple deterministic methods
–Use of blunt instruments: the analytical analog of bows and arrows
–Often ad hoc
–Slice and dice data
–Based on empirical data – little use of parametric models
The Pre-Industrial Age: …
–Fit probability distributions to model tails
–Simulation models and numerical methods for variability and uncertainty analysis
–Focus is on underwriting, not claims
The Industrial Age: 1985 – …
–Begin to use computer catastrophe models
The 20th Century: 1990 – …
–European actuaries begin to use GLMs
The Computer Age: 1996 – …
–Begin to discuss data mining at conferences
–At the end of the 20th century, large consulting firms start to build data mining practices
The Current Era – a mixture of the above
–In personal lines, modeling is the rule rather than the exception – often GLM based, though GLMs are evolving to GAMs
–Commercial lines are beginning to embrace modeling
Data Quality: A Data Mining Problem – An actuary reviewing a database
CHAID: The First CAS Data Mining Paper (1990)
A Problem: Nonlinear Functions – An Insurance Nonlinear Function: Provider Bill vs. Probability of Independent Medical Exam
Classical Statistics: Regression – Estimate parameters by fitting the line that minimizes the deviation between actual and fitted values
Generalized Linear Models (GLMs) – Relax the normality assumption (exponential family of distributions); model some kinds of nonlinearity
Common Links for GLMs
The identity link: h(Y) = Y
The log link: h(Y) = ln(Y)
The inverse link: h(Y) = 1/Y
The logit link: h(Y) = ln(Y / (1 − Y))
The probit link: h(Y) = Φ⁻¹(Y), where Φ is the standard normal CDF
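As a concrete illustration, the links above can be written out directly. This is a minimal sketch in Python (the function names are my own); the probit link uses the standard library's normal inverse CDF.

```python
import math
from statistics import NormalDist

# Common GLM link functions h(Y) (names are illustrative, not a library API).
def identity(y):
    return y

def log_link(y):
    return math.log(y)

def inverse_link(y):
    return 1.0 / y

def logit(y):
    # maps a probability in (0, 1) onto the whole real line
    return math.log(y / (1.0 - y))

def probit(y):
    # inverse CDF of the standard normal distribution
    return NormalDist().inv_cdf(y)

def inv_logit(eta):
    # inverse of the logit link: recovers the probability
    return 1.0 / (1.0 + math.exp(-eta))
```

For a logit-link GLM, predictions on the probability scale come from applying `inv_logit` to the linear predictor.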
Major Kinds of Data Mining
Supervised learning – the most common situation
–There is a dependent variable: frequency, loss ratio, fraud/no fraud
–Some methods: regression, CART, some neural networks
Unsupervised learning – no dependent variable; group like records together
–A group of claims with similar characteristics might be more likely to be fraudulent
–Examples: territory assignment, text mining
–Some methods: association rules, k-means clustering, Kohonen neural networks
Desirable Features of a Data Mining Method: any nonlinear relationship can be approximated; the method works when the form of the nonlinearity is unknown; the effect of interactions can be easily determined and incorporated into the model; the method generalizes well on out-of-sample data
The Fraud Surrogates Used as Dependent Variables: Independent Medical Exam (IME) requested; Special Investigation Unit (SIU) referral; IME successful; SIU successful. Data: Detailed Auto Injury Claim Database for Massachusetts, Accident Years ( )
Predictor Variables Claim file variables –Provider bill, Provider type –Injury Derived from claim file variables –Attorneys per zip code –Docs per zip code Using external data –Average household income –Households per zip
Different Kinds of Decision Trees Single Trees (CART, CHAID) Ensemble Trees, a more recent development (TREENET, RANDOM FOREST) –A composite or weighted average of many trees (perhaps 100 or more)
Non-Tree Methods: MARS – Multivariate Adaptive Regression Splines; Neural Networks; Naïve Bayes (baseline); Logistic Regression (baseline)
Illustrations Dependent variables –Paid auto BI losses –Fraud surrogates: IME Requested; SIU Requested Predictors –Provider 2 Bill and other variables –Known to have nonlinear relationship The following illustrations will show how some of the techniques model nonlinearity using these predictor and dependent variables
Classification and Regression Trees (CART) Tree splits are binary. If the variable is numeric, the split is based on R², or the sum or mean of squared errors –For any variable, choose the two-way split of the data that reduces the MSE the most –Do this for all independent variables –Choose the variable that reduces the squared errors the most. When the dependent variable is categorical, other goodness-of-fit measures (Gini index, deviance) are used
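The split search described above can be sketched in a few lines of Python. The data here are invented toy values, not the Massachusetts claim database:

```python
# Sketch of CART's first split on a numeric predictor: try each two-way
# partition and keep the cutpoint that most reduces the sum of squared errors.

def sse(ys):
    """Sum of squared deviations around the group mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys):
    best_cut, best_sse = None, float("inf")
    for cut in sorted(set(xs))[1:]:          # candidate cutpoints
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_cut, best_sse = cut, total
    return best_cut, best_sse

bills = [100, 200, 300, 5000, 6000, 7000]   # toy provider bills
paid = [1, 2, 1, 10, 11, 12]                # toy paid losses
cut, err = best_split(bills, paid)          # separates low from high bills
```

On the talk's actual database, this kind of search is what selects $5,021 as the first cutpoint on the provider bill.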
CART – Example of the 1st split on Provider 2 Bill, with Paid as the dependent variable. For the entire database, the total squared deviation of paid losses around the predicted value (i.e., the mean) is 4.95×10¹³. The SSE declines to 4.66×10¹³ after the data are partitioned using $5,021 as the cutpoint. Any other partition of the provider bill produces a larger SSE than 4.66×10¹³. For instance, if a cutpoint of $10,000 is selected, the SSE is 4.76×10¹³.
Continue splitting to get more homogeneous groups at the terminal nodes
CART
Ensemble Trees: Fit More Than One Tree – Fit a series of trees; each tree added improves the fit of the model; average or sum the results of the fits. There are many methods to fit the trees and prevent overfitting: boosting (Iminer Ensemble and Treenet); bagging (Random Forest)
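A minimal boosting sketch of the idea above – each one-split "stump" is fit to the residuals of the ensemble so far, and the weighted predictions are summed. This illustrates the general mechanism, not Treenet's exact algorithm; the data are invented:

```python
# Boosting sketch: fit a sequence of one-split regression trees ("stumps"),
# each to the residuals of the current ensemble, and sum weighted predictions.

def fit_stump(xs, ys):
    """One binary split minimizing SSE; returns a prediction function."""
    best = None
    for cut in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, cut, lm, rm)
    _, cut, lm, rm = best
    return lambda x, c=cut, a=lm, b=rm: a if x < c else b

def boost(xs, ys, n_trees=20, rate=0.5):
    pred = [0.0] * len(ys)
    trees = []
    for _ in range(n_trees):
        resid = [y - p for y, p in zip(ys, pred)]   # what is still unexplained
        tree = fit_stump(xs, resid)
        trees.append(tree)
        pred = [p + rate * tree(x) for p, x in zip(pred, xs)]
    return lambda x: sum(rate * t(x) for t in trees)

model = boost([1, 2, 3, 4, 5, 6], [1.0, 1.2, 0.9, 5.0, 5.2, 4.9])
```

The learning rate weights each tree's contribution, which is one of the devices used to prevent overfitting.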
Create a Sequence of Predictions: First Tree Prediction, Second (Weighted) Tree Prediction, …
Treenet Prediction of IME Requested
Random Forest: IME vs. Provider 2 Bill
Neural Networks
Classical Statistics: Regression – One of the most common methods of fitting a function is linear regression, which models the relationship between two variables by fitting a straight line through the points. A lot of infrastructure is available for assessing the appropriateness of the model.
Neural Networks – Also minimize the squared deviation between fitted and actual values; can be viewed as a non-parametric, non-linear regression
Hidden Layer of a Neural Network (Input Transfer Function)
The Activation Function (Transfer Function) – the sigmoid (logistic) function, f(x) = 1 / (1 + e⁻ˣ)
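The sigmoid activation and a one-hidden-node forward pass can be sketched as follows; the weights here are made up for illustration, not fitted values:

```python
import math

# The logistic (sigmoid) activation function used in the hidden layer.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# A single hidden node: squash a weighted input, then weight the hidden
# output (illustrative weights; a real network fits these to data).
def forward(x, w1=0.002, b1=-3.0, w2=1.0, b2=0.0):
    hidden = sigmoid(w1 * x + b1)
    return w2 * hidden + b2
```

The sigmoid is bounded in (0, 1) and equals 0.5 at zero, which is what lets the network bend a straight line into an S-shaped response.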
Neural Network: Provider 2 Bill vs. IME Requested
MARS: Provider 2 Bill vs. IME Requested
How MARS Fits a Nonlinear Function
MARS fits a piecewise regression
–BF1 = max(0, X − 1,401.00)
–BF2 = max(0, 1,401.00 − X)
–BF3 = max(0, X − c), for a second knot c
–Y = b₀ + b₁·BF1 + b₂·BF2 + b₃·BF3, with fitted coefficients on the order of 10⁻³
–BF1, BF2, BF3 are basis functions
MARS uses statistical optimization to find the best basis function(s)
A basis function is similar to a dummy variable in regression – like a combination of a dummy indicator and a linear independent variable
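The hinge pair at the 1,401 knot can be written out directly. The coefficients below are placeholders for illustration, not the fitted values from the talk:

```python
# MARS "hinge" basis functions for the knot at 1,401 on the provider bill.
def bf1(x, knot=1401.0):
    return max(0.0, x - knot)   # active to the right of the knot

def bf2(x, knot=1401.0):
    return max(0.0, knot - x)   # mirror hinge, active to the left

# Piecewise-linear fit: the slope changes at the knot.
# b0, b1, b2 are illustrative placeholders, not the paper's coefficients.
def mars_predict(x, b0=0.1, b1=2e-5, b2=-5e-5):
    return b0 + b1 * bf1(x) + b2 * bf2(x)
```

Because each hinge is zero on one side of the knot, the pair together behaves like a dummy indicator multiplied by a linear term – exactly the "combination" described above.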
The Basis Functions: Category Grouping and Interactions
Injury type 4 (neck sprain) and type 5 (back sprain) increase faster and have higher scores than the other injury types
–BF1 = max(0, X − k), for a knot k
–BF2 = (INJTYPE = 4 OR INJTYPE = 5)
–BF3 = max(0, X − 159) * BF2
–Y = b₀ + b₁·BF1 + b₂·BF2 + b₃·BF3
where X is the provider bill and INJTYPE is the injury type
Interactions: MARS Fit
Baseline Method: Naive Bayes Classifier
Naive Bayes assumes the features (predictor variables) are independent conditional on each category. The probability that an observation X will have a specific set of values for the independent variables is then the product of the conditional probabilities of observing each of the values given the target category c_j, j = 1 to m (m is typically 2).
Naïve Bayes Formula
P(c_j | X) = P(c_j) · Π_i P(x_i | c_j) / P(X), where the denominator P(X) is a constant across categories
Naïve Bayes (cont.) – Handles only classification problems (categorical dependent variables) and only categorical independent variables. Create groupings for numeric variables such as provider bill and age –We split provider bill and age into quintiles (5 groups)
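A minimal naive Bayes sketch on binned (categorical) predictors, matching the setup above; the bill/provider categories, counts, and labels here are invented for illustration:

```python
from collections import defaultdict

# Tiny naive Bayes classifier: prior * product of conditional probabilities,
# with Laplace smoothing so unseen feature values do not zero out the product.
def train(rows, labels):
    prior = defaultdict(int)              # counts per target category
    cond = defaultdict(int)               # counts per (category, slot, value)
    for feats, y in zip(rows, labels):
        prior[y] += 1
        for i, v in enumerate(feats):
            cond[(y, i, v)] += 1
    n = len(labels)

    def predict(feats):
        best, best_p = None, -1.0
        for y, ny in prior.items():
            p = ny / n                    # prior probability of the category
            for i, v in enumerate(feats):
                # +1/+2 Laplace smoothing (assumes ~2 values per feature)
                p *= (cond[(y, i, v)] + 1) / (ny + 2)
            if p > best_p:
                best, best_p = y, p
        return best

    return predict

# Invented training data: (bill quintile, provider type) -> IME surrogate
rows = [("high", "chiro"), ("high", "chiro"), ("low", "md"), ("low", "md")]
labels = ["IME", "IME", "noIME", "noIME"]
predict = train(rows, labels)
```

Because the features are assumed conditionally independent, training reduces to counting, which is why the method is so computationally efficient.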
Naïve Bayes: Probability of Independent Variable Given No IME / IME
Putting it together to make a prediction: plug in specific values for Provider 2 Bill and provider type and get a prediction – chain together the product of the conditional probability of the bill category, the conditional probability of the provider type, and the probability of IME
Advantages/Disadvantages – Computationally efficient; has performed well under many circumstances. The assumption of conditional independence often does not hold; cannot be used for numeric variables without grouping them
Naïve Bayes Predicted IME vs. Provider 2 Bill
True/False Positives and True/False Negatives (Type I and Type II Errors) – The “Confusion” Matrix. Choose a “cut point” in the model score; claims scoring above the cut point are classified “yes”.
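Counting the four cells of the confusion matrix at a chosen cut point is straightforward; a small sketch with invented scores and outcomes:

```python
# Confusion-matrix counts at a chosen cut point: scores above the cut
# point are classified "yes" (positive).
def confusion(scores, actual, cut):
    tp = sum(1 for s, a in zip(scores, actual) if s > cut and a)        # true positives
    fp = sum(1 for s, a in zip(scores, actual) if s > cut and not a)    # false positives (Type I)
    fn = sum(1 for s, a in zip(scores, actual) if s <= cut and a)       # false negatives (Type II)
    tn = sum(1 for s, a in zip(scores, actual) if s <= cut and not a)   # true negatives
    return tp, fp, fn, tn

scores = [0.9, 0.8, 0.4, 0.3, 0.2]            # invented model scores
actual = [True, False, True, False, False]    # invented outcomes
tp, fp, fn, tn = confusion(scores, actual, cut=0.5)
```

Sensitivity is tp / (tp + fn) and specificity is tn / (tn + fp); both shift as the cut point moves.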
ROC Curves and Area Under the ROC Curve – We want good performance on both sensitivity and specificity. Sensitivity and specificity depend on the cut point chosen –Choose a series of different cut points, and compute sensitivity and specificity for each of them –Graph the results, plotting sensitivity vs. 1 − specificity –Compute an overall measure of “lift”: the area under the curve
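The cut-point sweep and the area computation described above can be sketched as follows (trapezoid rule over the ROC points; the scores and outcomes are invented):

```python
# ROC sketch: sweep cut points, collect (1 - specificity, sensitivity)
# pairs, and integrate the area under the curve with the trapezoid rule.
def roc_points(scores, actual):
    pos = sum(actual)
    neg = len(actual) - pos
    pts = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, a in zip(scores, actual) if s >= cut and a)
        fp = sum(1 for s, a in zip(scores, actual) if s >= cut and not a)
        pts.append((fp / neg, tp / pos))   # (1 - specificity, sensitivity)
    pts.append((1.0, 1.0))
    return pts

def auroc(pts):
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

scores = [0.9, 0.8, 0.4, 0.3, 0.2]   # invented model scores
actual = [1, 1, 0, 1, 0]             # invented outcomes
auc = auroc(roc_points(scores, actual))
```

A model with no lift traces the diagonal (AUROC = 0.5); the 0.701 and 0.612 figures on the following slides are computed the same way on the real data.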
TREENET ROC Curve – IME: AUROC = 0.701
Logistic ROC Curve – SIU: AUROC = 0.612
Ranking of Methods/Software – IME Requested
Some Software Packages That Can Be Used: Excel; Access; free software: R, web-based software; S-Plus (a commercial product similar to R); SPSS; CART/MARS; data mining suites (SAS Enterprise Miner, SPSS Clementine)
References
Derrig, R., and Francis, L., “Distinguishing the Forest from the Trees: A Comparison of Tree Based Data Mining Methods”, CAS Winter Forum, March 2006.
Derrig, R., and Francis, L., “A Comparison of Methods for Predicting Fraud”, Risk Theory Seminar, April 2006.
Francis, L., “Taming Text: An Introduction to Text Mining”, CAS Winter Forum, March 2006.
Francis, L.A., “Neural Networks Demystified”, Casualty Actuarial Society Forum, Winter 2001.
Francis, L.A., “Martian Chronicles: Is MARS Better than Neural Networks?”, Casualty Actuarial Society Forum, Winter 2003.
Dhar, V., Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997.
The web site has some tutorials and presentations.