Slide 1: Predictive Modeling
CAS Reinsurance Seminar, May 7, 2007
Louise Francis, FCAS, MAAA
Louise.francis@data-mines.com
Francis Analytics and Actuarial Data Mining, Inc.
www.data-mines.com
Slide 2: Why Predictive Modeling?
- Better use of data than traditional methods
- Advanced methods for dealing with messy data now available
Slide 3: Data Mining Goes Prime Time

Slide 4: Becoming a Popular Tool in All Industries

Slide 5: Real-Life Insurance Application – The "Boris Gang"

Slide 6: The Predictive Modeling Family
- Classical linear models
- GLMs
- Data mining
Slide 7: A Casualty Actuary's Perspective on Data Modeling
- The Stone Age: 1914–
  - Simple deterministic methods
  - Use of blunt instruments: the analytical analog of bows and arrows
  - Often ad hoc
  - Slice-and-dice data
  - Based on empirical data; little use of parametric models
- The Pre-Industrial Age: 1970–
  - Fit probability distributions to model tails
  - Simulation models and numerical methods for variability and uncertainty analysis
  - Focus is on underwriting, not claims
- The Industrial Age: 1985–
  - Begin to use computer catastrophe models
- The 20th Century: 1990–
  - European actuaries begin to use GLMs
- The Computer Age: 1996–
  - Begin to discuss data mining at conferences
  - At the end of the 20th century, large consulting firms start to build data mining practices
- The Current Era: a mixture of the above
  - In personal lines, modeling is the rule rather than the exception; often GLM based, though GLMs are evolving toward GAMs
  - Commercial lines are beginning to embrace modeling
Slide 8: Data Quality: A Data Mining Problem
An actuary reviewing a database

Slide 9: CHAID: The First CAS Data Mining Paper (1990)

Slide 10: A Problem: Nonlinear Functions
An insurance nonlinear function: provider bill vs. probability of an independent medical exam
Slide 11: Classical Statistics: Regression
Estimation of parameters: fit the line that minimizes the deviation between actual and fitted values
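The fit the slide describes can be sketched in a few lines of plain Python: the closed-form least-squares solution for one predictor, minimizing the sum of squared deviations between actual and fitted values. The function name and data are illustrative, not from the presentation.

```python
# Ordinary least squares for a single predictor: choose intercept a and
# slope b that minimize sum((y_i - (a + b*x_i))^2).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution: slope = cov(x, y) / var(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Perfectly linear data recovers the line exactly: y = 1 + 2x
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```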
Slide 12: Generalized Linear Models (GLMs)
- Relax the normality assumption: exponential family of distributions
- Model some kinds of nonlinearity
Slide 13: Generalized Linear Models – Common Links
- The identity link: h(Y) = Y
- The log link: h(Y) = ln(Y)
- The inverse link: h(Y) = 1/Y
- The logit link: h(Y) = ln(Y / (1 − Y))
- The probit link: h(Y) = Φ⁻¹(Y), the inverse standard normal CDF
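The five link functions listed above can be written out directly; this sketch uses only the Python standard library (`statistics.NormalDist` supplies the inverse normal CDF for the probit link).

```python
import math
from statistics import NormalDist

# Common GLM link functions h(Y), mapping the mean response onto the
# scale of the linear predictor.
links = {
    "identity": lambda y: y,
    "log":      lambda y: math.log(y),
    "inverse":  lambda y: 1.0 / y,
    "logit":    lambda y: math.log(y / (1.0 - y)),  # for y in (0, 1)
    "probit":   lambda y: NormalDist().inv_cdf(y),  # for y in (0, 1)
}

# Even odds map to zero under both the logit and probit links
logit_half = links["logit"](0.5)
probit_half = links["probit"](0.5)
```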
Slide 14: Major Kinds of Data Mining
- Supervised learning
  - The most common situation
  - There is a dependent variable: frequency, loss ratio, fraud/no fraud
  - Some methods: regression, CART, some neural networks
- Unsupervised learning
  - No dependent variable; group like records together
  - A group of claims with similar characteristics might be more likely to be fraudulent
  - Examples: territory assignment, text mining
  - Some methods: association rules, k-means clustering, Kohonen neural networks
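As a sketch of the unsupervised side, here is a tiny k-means clusterer for one-dimensional data: no dependent variable, just grouping similar records together. It is a minimal illustration, not the algorithm used in the presentation.

```python
import random

def kmeans_1d(values, k, iters=20, seed=0):
    """Tiny k-means for 1-D data: alternate assigning points to their
    nearest center and moving each center to its cluster mean."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for v in values:
            i = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[i].append(v)
        # Update step: move each center to its cluster mean
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two well-separated claim-size groups are recovered
centers = kmeans_1d([1, 2, 3, 100, 101, 102], k=2)
```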
Slide 15: Desirable Features of a Data Mining Method
- Any nonlinear relationship can be approximated
- The method works when the form of the nonlinearity is unknown
- The effect of interactions can be easily determined and incorporated into the model
- The method generalizes well on out-of-sample data
Slide 16: The Fraud Surrogates Used as Dependent Variables
- Independent Medical Exam (IME) requested (and IME successful)
- Special Investigation Unit (SIU) referral (and SIU successful)
- Data: detailed auto injury claim database for Massachusetts, accident years 1995–1997
Slide 17: Predictor Variables
- Claim file variables: provider bill, provider type, injury
- Derived from claim file variables: attorneys per zip code, doctors per zip code
- Using external data: average household income, households per zip code
Slide 18: Different Kinds of Decision Trees
- Single trees (CART, CHAID)
- Ensemble trees, a more recent development (TreeNet, Random Forest): a composite or weighted average of many trees (perhaps 100 or more)

Slide 19: Non-Tree Methods
- MARS: Multivariate Adaptive Regression Splines
- Neural networks
- Naïve Bayes (baseline)
- Logistic regression (baseline)
Slide 20: Illustrations
- Dependent variables: paid auto BI losses; fraud surrogates (IME requested, SIU requested)
- Predictors: Provider 2 Bill and other variables known to have a nonlinear relationship
- The following illustrations show how some of the techniques model nonlinearity using these predictor and dependent variables
Slide 21: Classification and Regression Trees (CART)
- Tree splits are binary
- If the dependent variable is numeric, the split is based on R², i.e., on the sum or mean of squared errors
  - For any variable, choose the two-way split of the data that reduces the MSE the most
  - Do this for all independent variables
  - Choose the variable that reduces the squared errors the most
- When the dependent variable is categorical, other goodness-of-fit measures (Gini index, deviance) are used
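The split search described above can be sketched in plain Python: try every two-way split on a numeric predictor and keep the cutpoint that leaves the smallest total squared error. The toy data are illustrative, not the Massachusetts claims.

```python
def sse(ys):
    """Sum of squared deviations around the group mean (the node prediction)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Return the cutpoint on a numeric predictor that reduces total
    squared error the most, along with the resulting SSE (CART criterion)."""
    best = (None, sse(ys))  # start from the unsplit node
    for cut in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (cut, total)
    return best

# Losses jump sharply once the (toy) bill reaches 10, so that cut wins
cut, err = best_split([1, 2, 3, 10, 11, 12], [5, 6, 5, 50, 51, 50])
```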
Slide 22: CART – Example of First Split on Provider 2 Bill, with Paid as Dependent
For the entire database, the total squared deviation of paid losses around the predicted value (i.e., the mean) is 4.95×10^13. The SSE declines to 4.66×10^13 after the data are partitioned using $5,021 as the cutpoint. Any other partition of the provider bill produces a larger SSE than 4.66×10^13. For instance, if a cutpoint of $10,000 is selected, the SSE is 4.76×10^13.
Slide 23: Continue splitting to get more homogeneous groups at the terminal nodes

Slide 24: CART
Slide 25: Ensemble Trees: Fit More Than One Tree
- Fit a series of trees; each tree added improves the fit of the model
- Average or sum the results of the fits
- There are many methods to fit the trees and prevent overfitting
  - Boosting: Iminer Ensemble and TreeNet
  - Bagging: Random Forest
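A minimal boosting sketch under the description above: each new one-split "tree" (a stump) is fit to the residuals of the ensemble so far, and the predictions are summed with a learning rate. This is a toy illustration of the idea, not TreeNet's actual algorithm.

```python
def fit_stump(xs, ys):
    """One two-way split: predict the mean of each side (a depth-1 tree)."""
    best = None
    for cut in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - lmean) ** 2 for y in left)
               + sum((y - rmean) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, cut, lmean, rmean)
    _, cut, lmean, rmean = best
    return lambda x: lmean if x < cut else rmean

def boost(xs, ys, rounds=20, lr=0.5):
    """Fit a sequence of stumps, each to the current residuals, and
    return a model that sums their (learning-rate-weighted) predictions."""
    base = sum(ys) / len(ys)
    resid = [y - base for y in ys]
    stumps = []
    for _ in range(rounds):
        s = fit_stump(xs, resid)
        stumps.append(s)
        resid = [r - lr * s(x) for x, r in zip(xs, resid)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

model = boost([1, 2, 3, 10, 11, 12], [5, 6, 5, 50, 51, 50])
```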
Slide 26: Create a Sequence of Predictions
First tree prediction, then second (weighted) tree prediction

Slide 27: TreeNet Prediction of IME Requested

Slide 28: Random Forest: IME vs. Provider 2 Bill
Slide 29: Neural Networks
Slide 30: Classical Statistics: Regression
- One of the most common methods of fitting a function is linear regression
- Models a relationship between two variables by fitting a straight line through the points
- A lot of infrastructure is available for assessing the appropriateness of the model
Slide 31: Neural Networks
- Also minimize the squared deviation between fitted and actual values
- Can be viewed as non-parametric, non-linear regression
Slide 32: Hidden Layer of a Neural Network (Input Transfer Function)

Slide 33: The Activation Function (Transfer Function)
The sigmoid (logistic) function
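The sigmoid activation and a single hidden-layer neuron are simple to write out; this sketch shows the weighted sum of inputs passed through the logistic function (weights and inputs are illustrative).

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One hidden-layer unit: weighted sum of inputs, then the sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

# The curve's midpoint: a zero input maps to 0.5
mid = sigmoid(0.0)
out = neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```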
Slide 34: Neural Network: Provider 2 Bill vs. IME Requested

Slide 35: MARS: Provider 2 Bill vs. IME Requested
Slide 36: How MARS Fits a Nonlinear Function
- MARS fits a piecewise regression:
  - BF1 = max(0, X − 1,401.00)
  - BF2 = max(0, 1,401.00 − X)
  - BF3 = max(0, X − 70.00)
  - Y = 0.336 + .145626E-03 * BF1 − .199072E-03 * BF2 − .145947E-03 * BF3
- BF1, BF2, and BF3 are basis functions
- MARS uses statistical optimization to find the best basis function(s)
- A basis function is similar to a dummy variable in regression: like a combination of a dummy indicator and a linear independent variable
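The fitted MARS equation above can be evaluated directly; this sketch just transcribes the slide's basis functions and coefficients into code, with X as the provider bill.

```python
def mars_predict(x):
    """Evaluate the MARS fit from the slide: hinge basis functions
    with knots at 1,401 and 70, combined linearly."""
    bf1 = max(0.0, x - 1401.00)   # turns on above the 1,401 knot
    bf2 = max(0.0, 1401.00 - x)   # mirror hinge below the 1,401 knot
    bf3 = max(0.0, x - 70.00)
    return (0.336
            + 0.145626e-03 * bf1
            - 0.199072e-03 * bf2
            - 0.145947e-03 * bf3)

# The fit is piecewise linear; evaluate it at the knot and at zero
at_knot = mars_predict(1401.0)
at_zero = mars_predict(0.0)
```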
Slide 37: The Basis Functions: Category Grouping and Interactions
- Injury type 4 (neck sprain) and type 5 (back sprain) increase faster and have higher scores than the other injury types
  - BF1 = max(0, 2185 − X)
  - BF2 = (INJTYPE = 4 OR INJTYPE = 5)
  - BF3 = max(0, X − 159) * BF2
  - Y = 2.815 − 0.001 * BF1 + 0.685 * BF2 + .360E-03 * BF3
- where X is the provider bill and INJTYPE is the injury type
Slide 38: Interactions: MARS Fit

Slide 39: Baseline Method: Naive Bayes Classifier
- Naive Bayes assumes that the features (predictor variables) are independent conditional on each category
- The probability that an observation X has a specific set of values for the independent variables is then the product of the conditional probabilities of observing each value, given target category c_j, j = 1 to m (m typically 2)
Slide 40: Naïve Bayes Formula
P(c_j | X) = P(c_j) × Π_i P(x_i | c_j) / P(X), where the denominator P(X) is a constant across categories
Slide 41: Naïve Bayes (cont.)
- Handles only classification problems (categorical dependent variables)
- Handles only categorical independent variables
- Create groupings for numeric variables such as provider bill and age; here we split provider bill and age into quintiles (5 groups)
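A minimal sketch of the classifier the slides describe: categorical features only, with class priors and per-class conditional probabilities multiplied together. The feature values ("high" bill, provider type) and labels are invented for illustration; no smoothing is applied.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Naive Bayes on categorical features: store P(class) and
    P(feature value | class), assuming features independent given the class."""
    prior = Counter(labels)
    cond = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, c)][v] += 1
    n = len(labels)

    def score(row, c):
        # P(c) times the product of conditional probabilities (no smoothing)
        p = prior[c] / n
        for i, v in enumerate(row):
            p *= cond[(i, c)][v] / prior[c]
        return p

    return lambda row: max(prior, key=lambda c: score(row, c))

# Toy data: high bills from chiropractors tend to draw an IME request
classify = train_nb(
    [("high", "chiro"), ("high", "chiro"), ("low", "md"), ("low", "md")],
    ["ime", "ime", "no_ime", "no_ime"],
)
label = classify(("high", "chiro"))
```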
Slide 42: Naïve Bayes: Probability of Independent Variable Given No IME / IME
Slide 43: Putting It Together to Make a Prediction
- Plug in specific values for Provider 2 Bill and provider type to get a prediction
- Chain together the product: probability of the bill category × probability of the provider type × probability of an IME
Slide 44: Advantages/Disadvantages
- Computationally efficient
- Has performed well under many circumstances
- The assumption of conditional independence often does not hold
- Can't be used directly for numeric variables (they must first be grouped into categories)
Slide 45: Naïve Bayes Predicted IME vs. Provider 2 Bill

Slide 46: True/False Positives and True/False Negatives (Type I and Type II Errors)
- The "confusion" matrix
- Choose a cut point in the model score; claims scoring above the cut point are classified "yes"
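The cut-point rule above can be sketched directly: classify scores above the cut as "yes" and tabulate the four cells of the confusion matrix against the actual labels. The scores and labels below are illustrative.

```python
def confusion(scores, actual, cut):
    """Classify score > cut as 'yes' and count true/false positives
    and true/false negatives against the actual labels."""
    tp = sum(1 for s, a in zip(scores, actual) if s > cut and a)
    fp = sum(1 for s, a in zip(scores, actual) if s > cut and not a)
    fn = sum(1 for s, a in zip(scores, actual) if s <= cut and a)
    tn = sum(1 for s, a in zip(scores, actual) if s <= cut and not a)
    return tp, fp, fn, tn

scores = [0.9, 0.8, 0.4, 0.3, 0.2]
actual = [True, True, True, False, False]
tp, fp, fn, tn = confusion(scores, actual, cut=0.5)
```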
Slide 47: ROC Curves and Area Under the ROC Curve
- We want good performance on both sensitivity and specificity
- Sensitivity and specificity depend on the cut point chosen
  - Choose a series of different cut points and compute sensitivity and specificity for each
  - Graph the results: plot sensitivity vs. 1 − specificity
- Compute an overall measure of "lift": the area under the curve
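The procedure above — sweep the cut point, record (1 − specificity, sensitivity) pairs, then measure the area under the resulting curve — can be sketched as follows, using the trapezoidal rule for the area. The scores and labels are illustrative (a perfectly ranking model, so the area comes out to 1).

```python
def roc_points(scores, actual):
    """Sweep the cut point over every observed score and record
    (1 - specificity, sensitivity) at each cut."""
    pos = sum(actual)
    neg = len(actual) - pos
    pts = []
    for cut in sorted(set(scores)) + [float("inf")]:
        tp = sum(1 for s, a in zip(scores, actual) if s >= cut and a)
        fp = sum(1 for s, a in zip(scores, actual) if s >= cut and not a)
        pts.append((fp / neg, tp / pos))
    return sorted(pts)

def auc(pts):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

pts = roc_points([0.9, 0.8, 0.4, 0.3, 0.2], [True, True, True, False, False])
area = auc(pts)
```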
Slide 48: TreeNet ROC Curve – IME
AUROC = 0.701

Slide 49: Logistic ROC Curve – SIU
AUROC = 0.612
Slide 50: Ranking of Methods/Software – IME Requested

Slide 51: Some Software Packages That Can Be Used
- Excel
- Access
- Free software: R
- Web-based software
- S-Plus (commercial package similar to R)
- SPSS
- CART/MARS
- Data mining suites (SAS Enterprise Miner, SPSS Clementine)
Slide 52: References
- Derrig, R., and Francis, L., "Distinguishing the Forest from the Trees: A Comparison of Tree-Based Data Mining Methods," CAS Winter Forum, March 2006, www.casact.org
- Derrig, R., and Francis, L., "A Comparison of Methods for Predicting Fraud," Risk Theory Seminar, April 2006
- Francis, L., "Taming Text: An Introduction to Text Mining," CAS Winter Forum, March 2006, www.casact.org
- Francis, L., "Neural Networks Demystified," Casualty Actuarial Society Forum, Winter 2001, pp. 254–319
- Francis, L., "Martian Chronicles: Is MARS Better than Neural Networks?" Casualty Actuarial Society Forum, Winter 2003, pp. 253–320
- Dhar, V., Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997
- The website www.data-mines.com has some tutorials and presentations