Data Mining CAS 2004 Ratemaking Seminar Philadelphia, Pa.


1 Data Mining CAS 2004 Ratemaking Seminar Philadelphia, Pa.
Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc.

2 Objectives
Answer the question: why use data mining? Introduce the main data mining methods: decision trees, neural networks, MARS, and clustering.

3 The Data
Simulated data for automobile claim frequency, with three factors. Territory: four territories. Age: continuous function. Mileage: high or low.

4 Data Challenges
Nonlinearities: the relation between the dependent variable and the independent variables is not linear or cannot be transformed to linear. Interactions: the relation between an independent variable and the dependent variable varies by one or more other variables. Correlations: predictor variables are correlated with each other.

5 Simulated Example: Probability of Claim vs. Age by Territory and Mileage Group

6 Claim Frequency Data

7 Independent Probabilities for Each Variable

8 Decision Trees
Recursively partition the data. Often bifurcate the data sequentially, but can split into more than two groups. Apply a goodness-of-fit statistic to select the best partition at each step: the partition that yields the largest improvement in the statistic is selected.
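
The partitioning procedure described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the presentation: it assumes binary splits, uses Gini impurity as the goodness-of-fit statistic, and runs on a made-up toy data set. The names `gain`, `best_split`, `grow`, and the candidate split labels are all hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gain(rows, labels, test):
    """Improvement in impurity from splitting on the predicate `test`."""
    left = [y for x, y in zip(rows, labels) if test(x)]
    right = [y for x, y in zip(rows, labels) if not test(x)]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - weighted

def best_split(rows, labels, candidates):
    """Return (name, improvement) of the candidate split with the largest gain."""
    scored = ((name, gain(rows, labels, test)) for name, test in candidates.items())
    return max(scored, key=lambda pair: pair[1])

def grow(rows, labels, candidates, max_depth=2):
    """Recursively partition the data until no split helps or depth runs out."""
    leaf = {"n": len(labels), "claim_rate": sum(labels) / len(labels)}
    if max_depth == 0:
        return leaf
    name, improvement = best_split(rows, labels, candidates)
    if improvement <= 1e-12:  # no candidate improves the fit
        return leaf
    test = candidates[name]
    yes = [(x, y) for x, y in zip(rows, labels) if test(x)]
    no = [(x, y) for x, y in zip(rows, labels) if not test(x)]
    return {
        "split": name,
        "yes": grow([x for x, _ in yes], [y for _, y in yes], candidates, max_depth - 1),
        "no": grow([x for x, _ in no], [y for _, y in no], candidates, max_depth - 1),
    }

# Made-up toy data: 1 = policyholder had a claim, 0 = no claim.
rows = [
    {"territory": "North", "mileage": "High"}, {"territory": "North", "mileage": "Low"},
    {"territory": "South", "mileage": "High"}, {"territory": "South", "mileage": "Low"},
    {"territory": "East", "mileage": "High"}, {"territory": "East", "mileage": "Low"},
    {"territory": "West", "mileage": "High"}, {"territory": "West", "mileage": "Low"},
]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
candidates = {
    "territory is North or South": lambda r: r["territory"] in ("North", "South"),
    "mileage is High": lambda r: r["mileage"] == "High",
}
tree = grow(rows, labels, candidates)
print(tree)
```

On this toy data the territory split gives the larger improvement, so it is chosen first; the mileage split is then applied within the North/South group.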

9 Goodness of Fit Statistics
Chi-square: used by CHAID (Fish, Gallagher, Monroe, Discussion Paper Program, 1990). Deviance: used by CART.
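
The formulas on this slide did not survive in the transcript. As a hedged illustration, the sketch below uses the standard textbook definitions of the Pearson chi-square statistic for a contingency table and of the binomial deviance of a single node; it is not necessarily the exact form shown in the presentation. The function names and the example counts are hypothetical.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def binomial_deviance(claims, n):
    """Deviance of a node with `claims` claims among `n` policyholders,
    using the node's own claim rate as the fitted probability."""
    p = claims / n
    total = 0.0
    for count, prob in ((claims, p), (n - claims, 1.0 - p)):
        if count > 0:  # treat 0 * log(0) as 0
            total += count * math.log(prob)
    return -2.0 * total

# Made-up counts: rows are the two sides of a candidate split,
# columns are (claims, no claims).
split_table = [[10, 90], [20, 80]]
print(chi_square(split_table))
print(binomial_deviance(30, 200) - binomial_deviance(10, 100) - binomial_deviance(20, 100))
```

The second printed value is the reduction in deviance achieved by the split, which plays the same role as the chi-square statistic when candidate partitions are compared.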

10 Goodness of Fit Statistics
Gini measure: used by CART.

11 Goodness of Fit Statistics
Entropy: used by C4.5.
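
The Gini and entropy formulas were images on the original slides and are missing from the transcript. The sketch below gives their usual definitions in terms of the class proportions in a node, as an assumption about what the slides showed; the function names are hypothetical.

```python
import math

def gini_index(proportions):
    """Gini measure: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in proportions)

def entropy(proportions):
    """Entropy in bits: minus the sum of p * log2(p), skipping empty classes."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# Both measures are 0 for a pure node and largest for an even mix.
for claim_rate in (0.0, 0.1, 0.5):
    mix = [claim_rate, 1.0 - claim_rate]
    print(claim_rate, gini_index(mix), entropy(mix))
```

Either measure can be plugged into a tree-growing procedure: the split that most reduces the weighted impurity of the child nodes is preferred.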

12 First Split
All Policyholders: P = 1.00. Territory = North / South: P = 0.11. Territory = East / West: P = 0.06.

13 Example of Goodness of Fit Calculation

14 Example of Fitted Tree

15 MARS
Multivariate Adaptive Regression Splines: an extension of regression which uses automated search procedures, models nonlinearities, models interactions, and produces a regression-like formula.

16 Nonlinear Relationships
Fits piecewise regression to continuous variables.
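
A piecewise regression of this kind is usually built from pairs of hinge functions, max(0, x - knot) and max(0, knot - x). The sketch below fits one such pair by least squares and searches a small grid of candidate knots. It is a simplified illustration under stated assumptions, not the MARS algorithm itself: the age curve, the knot grid, and all function names are made up for the example.

```python
def hinge_pair(x, knot):
    """The two hinge basis functions for a single knot."""
    return max(0.0, x - knot), max(0.0, knot - x)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_hinges(xs, ys, knot):
    """Least-squares fit of y = b0 + b1*max(0, x-knot) + b2*max(0, knot-x).
    Returns the coefficients [b0, b1, b2] and the sum of squared errors."""
    design = [[1.0, *hinge_pair(x, knot)] for x in xs]
    k = 3
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    coefs = solve(xtx, xty)
    sse = sum((y - sum(c * v for c, v in zip(coefs, row))) ** 2
              for row, y in zip(design, ys))
    return coefs, sse

def best_knot(xs, ys, candidates):
    """Pick the candidate knot whose fit has the smallest squared error."""
    return min(candidates, key=lambda knot: fit_hinges(xs, ys, knot)[1])

# Made-up claim frequency curve with a bend at age 30.
ages = list(range(18, 81))
freq = [0.05 + 0.004 * max(0, 30 - a) + 0.001 * max(0, a - 30) for a in ages]
knot = best_knot(ages, freq, range(20, 80, 5))
print(knot, fit_hinges(ages, freq, knot)[0])
```

Because the toy curve is exactly piecewise linear, the search recovers the bend at age 30 and the fit reproduces the three coefficients used to generate the data.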

17 Interactions
Fits basis functions (which are like dummy variables) to model interactions. An interaction between Territory = East and Mileage can be modeled by a dummy variable which is 1 if Territory = East and Mileage = High, and 0 otherwise.
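
The interaction dummy described here can be written directly, and it equals the product of the two separate dummy variables. A minimal sketch, with hypothetical function names:

```python
def east_dummy(territory):
    """1 if the territory is East, else 0."""
    return 1 if territory == "East" else 0

def high_mileage_dummy(mileage):
    """1 if mileage is High, else 0."""
    return 1 if mileage == "High" else 0

def interaction_dummy(territory, mileage):
    """1 only when Territory = East and Mileage = High, else 0."""
    return east_dummy(territory) * high_mileage_dummy(mileage)

for territory in ("East", "West"):
    for mileage in ("High", "Low"):
        print(territory, mileage, interaction_dummy(territory, mileage))
```

A regression that includes this term alongside the two main-effect dummies can give East high-mileage drivers a claim probability that differs from the sum of the separate East and high-mileage effects.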

18 Goodness of Fit Statistics
Generalized Cross-Validation
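
The formula on this slide is missing from the transcript. The sketch below uses the commonly stated form of the generalized cross-validation criterion, the mean squared error divided by (1 - C/N)^2, where N is the number of observations and C is the effective number of parameters. How C is counted (for example, a charge per basis function plus a penalty per knot) is an assumption left to the caller; the function name and example numbers are hypothetical.

```python
def gcv(actual, fitted, effective_params):
    """Generalized cross-validation score: mean squared error inflated by a
    penalty that grows as the effective parameter count approaches N."""
    n = len(actual)
    if effective_params >= n:
        raise ValueError("effective_params must be smaller than the sample size")
    mse = sum((y - f) ** 2 for y, f in zip(actual, fitted)) / n
    return mse / (1.0 - effective_params / n) ** 2

# Made-up values: the same fit is penalized more when it uses more parameters.
actual = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 2.0, 2.5, 4.0]
print(gcv(actual, fitted, 1), gcv(actual, fitted, 2))
```

A lower score is better, so a more complex model is preferred only if it reduces the squared error by more than the penalty costs.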

19 Fitted MARS Model

20 Neural Networks
Developed by artificial intelligence experts, but now also used by statisticians. Based on how neurons function in the brain.

21 Neural Network Structure

22 Neural Networks
Fit by minimizing the squared deviation between fitted and actual values. Can be viewed as a non-parametric, non-linear regression. Often thought of as a “black box”: because the fitted model is complex, it is difficult to understand the relationship between the dependent and predictor variables.
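
To make "fit by minimizing squared deviation" concrete, here is a minimal sketch of a network with two inputs, two hidden logistic units, and one logistic output, fitted by plain gradient descent. Everything in it is an assumption for illustration: the network size, the flat weight layout, the finite-difference gradients (chosen for brevity, not speed), the starting weights, and the toy data.

```python
import math

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Forward pass: 2 inputs -> 2 hidden logistic units -> 1 logistic output.
    `weights` is a flat list of 9 numbers (a hypothetical layout)."""
    w11, w12, b1, w21, w22, b2, v1, v2, b_out = weights
    h1 = sigmoid(w11 * x[0] + w12 * x[1] + b1)
    h2 = sigmoid(w21 * x[0] + w22 * x[1] + b2)
    return sigmoid(v1 * h1 + v2 * h2 + b_out)

def squared_error(weights, inputs, targets):
    """Sum of squared deviations between actual and fitted values."""
    return sum((y - predict(weights, x)) ** 2 for x, y in zip(inputs, targets))

def fit(weights, inputs, targets, rate=0.1, steps=500, h=1e-5):
    """Gradient descent on the squared error, with gradients approximated
    by central finite differences."""
    weights = list(weights)
    for _ in range(steps):
        grad = []
        for i in range(len(weights)):
            up = weights[:i] + [weights[i] + h] + weights[i + 1:]
            down = weights[:i] + [weights[i] - h] + weights[i + 1:]
            grad.append((squared_error(up, inputs, targets)
                         - squared_error(down, inputs, targets)) / (2 * h))
        weights = [w - rate * g for w, g in zip(weights, grad)]
    return weights

# Made-up data: (age scaled to 0-1, high-mileage flag) -> claim indicator.
inputs = [(0.1, 0), (0.9, 0), (0.2, 1), (0.8, 1)]
targets = [0, 0, 1, 1]
start = [0.1, -0.2, 0.05, 0.2, 0.1, -0.1, 0.3, -0.3, 0.0]
fitted = fit(start, inputs, targets)
print(squared_error(start, inputs, targets), squared_error(fitted, inputs, targets))
```

Even in this tiny network the fitted weights have no direct interpretation, which is the "black box" concern raised on the slide.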

23 Understanding the Model: Variable Importance
Look at the weights to the hidden layer. Compute sensitivities: a measure of how much the error of the predicted value increases when the variables are excluded from the model one at a time.
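
One way to sketch a sensitivity calculation is shown below. It assumes that "excluding" a variable can be approximated by holding it at its average value for every record and measuring how much the mean squared error rises; the presentation may have used a different procedure. The function names, the stand-in model, and the data are hypothetical.

```python
def mean_squared_error(predict, rows, actual):
    """Average squared deviation between actual values and predictions."""
    return sum((y - predict(r)) ** 2 for r, y in zip(rows, actual)) / len(rows)

def sensitivities(predict, rows, actual):
    """Increase in error when each variable, one at a time, is held at its mean."""
    baseline = mean_squared_error(predict, rows, actual)
    result = {}
    for name in rows[0]:
        average = sum(r[name] for r in rows) / len(rows)
        neutral = [{**r, name: average} for r in rows]
        result[name] = mean_squared_error(predict, neutral, actual) - baseline
    return result

# Stand-in for a fitted model: depends on "age" only and ignores "noise".
def model(row):
    return 2.0 * row["age"]

rows = [{"age": 0.0, "noise": 1.0}, {"age": 1.0, "noise": 0.0}, {"age": 2.0, "noise": 5.0}]
actual = [0.0, 2.0, 4.0]
scores = sensitivities(model, rows, actual)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Variables whose removal raises the error most are ranked as most important; a variable the model ignores gets a sensitivity of zero.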

24 Importance Ranking
Neural network and MARS ranked the variables in the same order.

25 Visualizing Fitted Neural Network

26 ROC Curves for the Data Mining Methods
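
The ROC chart itself is not in the transcript. As a reminder of how such a curve is built, the sketch below sweeps a threshold over predicted claim probabilities, records the false positive and true positive rates, and computes the area under the curve by the trapezoid rule. The scores and labels are made up, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, from the strictest
    threshold to the loosest. Tied scores are handled as one step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoid-rule area under a curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up model scores and claim indicators.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels)
print(curve, area_under_curve(curve))
```

An area of 1.0 means the scores separate claims from non-claims perfectly, while 0.5 corresponds to a model no better than chance, which gives a single number for comparing methods.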

27 Correlation
Variable gender added. Its only impact on the probability of a claim is through its correlation with the mileage variable: males had higher mileage. MARS did not use the variable in its model. CART used it in two places to split the tree. The neural network ranked gender as the least important variable.

28 How the Methods Did: Correlation with “True” Claim Frequency

29 Unsupervised Learning
Common method: clustering. There is no dependent variable; records are grouped into classes with similar values on the variables. Start with a measure of similarity or dissimilarity. Maximize dissimilarity between members of different clusters.

30 Dissimilarity (Distance) Measure
Euclidean distance. Manhattan distance.
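
The two distance measures named on this slide have standard definitions, sketched below for records given as equal-length lists of numbers. The function names and example values are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

record_1 = [0.0, 0.0]
record_2 = [3.0, 4.0]
print(euclidean_distance(record_1, record_2), manhattan_distance(record_1, record_2))
```

Both measures depend on the scale of each variable, so variables are usually standardized before clustering.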

31 Binary Variables

32 Binary Variables
Simple matching. Rogers and Tanimoto.
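
The formulas for these two measures are missing from the transcript. The sketch below uses their usual definitions as dissimilarities between two records of 0/1 variables, where a and d count the variables on which the records agree (both 1, both 0) and b and c count the disagreements. The function names and example records are hypothetical.

```python
def match_counts(x, y):
    """Counts of (1,1), (1,0), (0,1) and (0,0) pairs across two binary records."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d

def simple_matching_dissimilarity(x, y):
    """Share of variables on which the two records disagree."""
    a, b, c, d = match_counts(x, y)
    return (b + c) / (a + b + c + d)

def rogers_tanimoto_dissimilarity(x, y):
    """Like simple matching, but disagreements carry double weight."""
    a, b, c, d = match_counts(x, y)
    return 2 * (b + c) / (a + d + 2 * (b + c))

# Made-up claim records with five yes/no indicators each.
claim_1 = [1, 1, 0, 0, 1]
claim_2 = [1, 0, 0, 1, 1]
print(simple_matching_dissimilarity(claim_1, claim_2),
      rogers_tanimoto_dissimilarity(claim_1, claim_2))
```

Both measures are 0 for identical records and 1 for records that disagree on every variable.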

33 Example: Fraud Data
Data from the 1993 closed claim study conducted by the Automobile Insurers Bureau of Massachusetts. Claim files often have variables which may be useful in assessing suspicion of fraud, but a dependent variable is often not available. Variables used for clustering: injury type, provider type, legal representation, prior claim, SIU investigation.

34 Results for 2 Clusters

35 Beginners Library
Berry, Michael J. A., and Linoff, Gordon, Data Mining Techniques, John Wiley and Sons, 1997. Kaufman, Leonard, and Rousseeuw, Peter, Finding Groups in Data, John Wiley and Sons, 1990. Smith, Murray, Neural Networks for Statistical Modeling, International Thomson Computer Press, 1996.


