Dealing with continuous variables and geographical information in non-life insurance ratemaking
Maxime Clijsters
Introduction
Tariff?
– Professional use (Y/N)
– Postal code
– Age of the permit
– Kilowatt of the vehicle
– Age of the vehicle
– Vehicle type (4x4 Y/N)
– Policyholder’s age
Variable types: categorical variable, continuous variable, multi-level factor
Introduction
– GLMs remain a very important statistical regression technique for pricing car insurance products
– GAMs provide interesting insights into the underlying dependency structure, but come at a high computational cost
– GAM as a complementary modelling tool
GLM = Generalized Linear Model
GAM = Generalized Additive Model
AGENDA
Binning continuous variables
– GAM to explore nonlinear effects
– GAM and regression trees for binning
Modelling geographical information
Binning continuous variables
GLM
– A satisfactory modelling tool
– Industry-wide standard
– Only categorical variables
GAM
– Continuous variables
– High computational cost
– No parametric functional form
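The contrast between the two model classes can be written out as a short sketch. The notation is an assumption and does not come from the slides: g is the link function, x_{ij} are categorical (dummy-coded) covariates, z_{ik} are continuous covariates, and f_k are smooth functions estimated from the data rather than specified in parametric form.

```latex
% GLM: purely parametric linear predictor
g(\mu_i) = \beta_0 + \sum_{j} \beta_j x_{ij}

% GAM: the same predictor extended with smooth effects of continuous variables
g(\mu_i) = \beta_0 + \sum_{j} \beta_j x_{ij} + \sum_{k} f_k(z_{ik})
```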
Binning continuous variables: GAM to explore nonlinear effects
It is often not desirable to keep the continuous effect in the tariff
» GAM has a high computational cost (iterative method)
» GAM lacks a parametric functional form
GAMs provide insight for defining risk-homogeneous groupings of variables
Binning continuous variables: GAM for binning
Results of the GAM as a starting point for binning
– Broader categories where the risk is similar
– More categories when the risk varies a lot
Defining boundaries by means of regression trees
Binning continuous variables: Regression tree
– Divide variables into groups based on the GAM estimate
– Find splits that minimize the overall sum of squared errors
– Grow the tree with the desired number of classes
Figure: The black nodes correspond to the regression tree used, the blue nodes are the following splits, and the light blue nodes are the subsequent splits
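The regression-tree steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the function names (`sse`, `best_split`, `tree_bins`), the toy age grid, and the made-up GAM effect are all hypothetical. The tree is grown best-first, each time applying the split that most reduces the overall sum of squared errors, until the desired number of classes is reached.

```python
def sse(values):
    """Sum of squared errors of values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, lo, hi):
    """Best split of the segment x[lo:hi] as (sse_reduction, index), or None."""
    parent = sse(y[lo:hi])
    best = None
    for i in range(lo + 1, hi):
        if x[i] == x[i - 1]:
            continue  # cannot split between identical x values
        gain = parent - sse(y[lo:i]) - sse(y[i:hi])
        if best is None or gain > best[0]:
            best = (gain, i)
    return best


def tree_bins(x, y, n_classes):
    """Return sorted cut points that divide x into at most n_classes bins.

    x: sorted values of the continuous variable (e.g. age)
    y: GAM estimate of the effect at each x (assumed to be available)
    """
    segments = [(0, len(x))]
    while len(segments) < n_classes:
        candidates = []
        for lo, hi in segments:
            split = best_split(x, y, lo, hi)
            if split is not None:
                candidates.append((split[0], split[1], lo, hi))
        if not candidates:
            break
        _, i, lo, hi = max(candidates)  # split with the largest SSE reduction
        segments.remove((lo, hi))
        segments += [(lo, i), (i, hi)]
    # Each cut point is the midpoint between the two x values it separates.
    return sorted((x[lo - 1] + x[lo]) / 2 for lo, _ in segments if lo > 0)


# Toy example: made-up GAM effect that is flat within three age ranges.
ages = list(range(18, 30))
gam_effect = [0.9] * 3 + [0.5] * 4 + [0.1] * 5
cuts = tree_bins(ages, gam_effect, 3)
print(cuts)  # [20.5, 24.5]
```

On real GAM output the effect is smooth rather than stepwise, so the number of classes becomes a trade-off between fit and tariff simplicity.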
Binning continuous variables: Binning results
Figure: Visualization of the classes suggested by the regression tree
AGENDA
Binning continuous variables
Geographical information
– Modelling
  » GLM without geographical information
  » GAM with geographical information
– Visualizing and binning
Geographical information: Introduction
Bree: latitude 51°07'08.8"N, longitude 5°38'32.5"E
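If coordinates such as these enter a model as numeric covariates, they are usually needed in decimal degrees. The conversion below is a small sketch; the helper name `dms_to_decimal` and the accepted string format are assumptions made for illustration.

```python
import re


def dms_to_decimal(dms):
    """Convert a string like 51°07'08.8"N to signed decimal degrees."""
    match = re.fullmatch(r"""(\d+)°(\d+)'([\d.]+)"([NSEW])""", dms.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {dms!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


bree_lat = dms_to_decimal("51°07'08.8\"N")
bree_lon = dms_to_decimal("5°38'32.5\"E")
print(round(bree_lat, 4), round(bree_lon, 4))  # 51.1191 5.6424
```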
Geographical information Step 1: GLM without geographical information
Figure panels: predicted number of claims per district; observed number of claims per district
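A comparison of this kind needs observed and predicted claim counts summed per district. The sketch below shows the aggregation only; the records, district labels, and predicted values are made up, and `totals_per_district` is a hypothetical helper, not part of the original analysis.

```python
from collections import defaultdict

# Hypothetical policy records: (district, observed claims, GLM-predicted claims).
policies = [
    ("A", 0, 0.10),
    ("A", 1, 0.15),
    ("B", 0, 0.20),
    ("B", 2, 0.25),
    ("B", 0, 0.05),
]


def totals_per_district(records):
    """Sum observed and predicted claim counts within each district."""
    totals = defaultdict(lambda: [0, 0.0])
    for district, observed, predicted in records:
        totals[district][0] += observed
        totals[district][1] += predicted
    return {d: (obs, pred) for d, (obs, pred) in totals.items()}


for district, (obs, pred) in sorted(totals_per_district(policies).items()):
    print(f"{district}: observed={obs}, predicted={pred:.2f}")
```

Districts where the observed total sits well above or below the predicted total point to a geographic effect that the model without geographical information does not capture.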
Geographical information Step 2: GAM with geographical information
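One common way to bring geography into a GAM is a bivariate smooth of the coordinates. The slides do not state the exact specification, so the form below is an assumption: a Poisson claim frequency with log link, exposure e_i as offset, covariates x_{ij}, and a smooth surface f over longitude and latitude.

```latex
\log \mathbb{E}[N_i] = \log(e_i) + \beta_0 + \sum_{j} \beta_j x_{ij}
                       + f(\mathrm{long}_i, \mathrm{lat}_i)
```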
Geographical information: Visualizing and binning the geographic effect
Problematic issue
– Different classification methods can yield dissimilar classes
– Maps are very sensitive to the classification method used
– Visualization of the same data can convey different impressions
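The sensitivity described above is easy to reproduce on a toy example. The sketch below compares two widely used classification methods, equal intervals and quantiles; the slides do not say which methods were compared, so this choice, the function names, and the made-up district effects are all assumptions.

```python
def equal_interval_breaks(values, k):
    """Breaks for k classes of equal width between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Breaks for k classes holding roughly the same number of observations."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]


def classify(value, breaks):
    """Index of the class that value falls into (0 = lowest class)."""
    return sum(value >= b for b in breaks)


# Hypothetical spatial effects for eight districts (made-up numbers).
effects = [1, 2, 3, 4, 5, 6, 7, 20]

ei = equal_interval_breaks(effects, 4)
qu = quantile_breaks(effects, 4)
print([classify(v, ei) for v in effects])  # [0, 0, 0, 0, 0, 1, 1, 3]
print([classify(v, qu) for v in effects])  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With equal intervals the single outlier pushes most districts into the lowest class, while quantiles spread the same districts evenly over all four classes, so the two maps would look quite different.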
Conclusion
GLMs remain a very important statistical regression technique for pricing car insurance products.
GAMs provide interesting insights into the underlying dependency structure, but come at a high computational cost.
Care is needed when reading and interpreting choropleth maps
– Different classification techniques produce different results.
– Classification strongly affects the visual impressions readers obtain.