Dealing with continuous variables and geographical information in non-life insurance ratemaking
Maxime Clijsters

Introduction
Tariff?
Rating variables: Professional use (Y/N), Postal code, Age of the permit, Kilowatt of the vehicle, Age of the vehicle, Vehicle type (4x4 Y/N), Policyholder's age
Variable types: Categorical variable, Continuous variable, Multi-level factor

Introduction
GLMs remain a very important statistical regression technique for pricing car insurance products
GAMs provide interesting insights into the underlying dependency structure, but come at a high computational cost
GAM as a complementary modelling tool
GLM = Generalized Linear Model
GAM = Generalized Additive Model
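To make the GLM/GAM distinction concrete, the two model forms can be written side by side. This is only a sketch: the slides do not state the response distribution or link, so a Poisson claim-frequency model with log link and exposure offset is assumed here, and the symbols ($N_i$, $e_i$, $x_{ij}$, $z_{ik}$, $f_k$) are illustrative notation, not the author's.

```latex
% Assumed setting: N_i = claim count of policy i, e_i = exposure,
% x_{ij} = categorical (dummy-coded) covariates, z_{ik} = continuous covariates.
\begin{align*}
\text{GLM:}\quad \log \mathbb{E}[N_i] &= \log e_i + \beta_0 + \sum_{j} \beta_j x_{ij} \\
\text{GAM:}\quad \log \mathbb{E}[N_i] &= \log e_i + \beta_0 + \sum_{j} \beta_j x_{ij} + \sum_{k} f_k(z_{ik})
\end{align*}
```

The smooth functions $f_k$ have no fixed parametric form and are estimated iteratively, which is where the extra computational cost comes from.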

AGENDA
Binning continuous variables
  – GAM to explore nonlinear effects
  – GAM and regression trees for binning
Modelling geographical information

Binning continuous variables
GLM
  – Satisfactory modelling tool
  – Industry-wide standard
  – Only categorical variables
GAM
  – Continuous variables
  – High computational cost
  – No parametric functional form

Binning continuous variables: GAM to explore nonlinear effects

Often not desirable to keep the continuous effect in the tariff
  » GAM has a high computational cost (iterative method)
  » GAM lacks a parametric functional form
GAMs provide insight into defining risk-homogeneous groupings of variables

Binning continuous variables: GAM for binning
Results of the GAM as a starting point for binning
  – Broader categories where the risk is similar
  – More categories where the risk varies a lot
Defining boundaries by means of regression trees

Binning continuous variables: Regression tree
Divide variables into groups based on the GAM estimate
Find splits that minimize the overall sum of squared errors
Grow the tree to the desired number of classes
Figure: The black nodes correspond to the regression tree used, the blue nodes are the following splits, and the light blue nodes are the subsequent splits
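The tree-based binning step can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes the GAM effect has already been evaluated at each value of the continuous variable, and it grows a one-dimensional regression tree greedily, always taking the split with the largest reduction in the sum of squared errors. The function names and the example data are hypothetical.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations of `values` from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(values: List[float]) -> Tuple[float, Optional[int]]:
    """Return (SSE reduction, index where the right part starts) of the best split."""
    total = sse(values)
    best_gain, best_idx = 0.0, None
    for i in range(1, len(values)):
        gain = total - sse(values[:i]) - sse(values[i:])
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_gain, best_idx


def tree_bins(x: List[float], effect: List[float], n_classes: int) -> List[Tuple[float, float]]:
    """Bin `x` into at most `n_classes` contiguous classes based on `effect`.

    Returns the inclusive (lower, upper) bounds of each class.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [effect[i] for i in order]
    segments = [(0, len(xs))]  # half-open index ranges into the sorted data
    while len(segments) < n_classes:
        candidates = []
        for lo, hi in segments:
            gain, idx = best_split(ys[lo:hi])
            if idx is not None:
                candidates.append((gain, lo, hi, lo + idx))
        if not candidates:
            break  # no split reduces the error any further
        _, lo, hi, cut = max(candidates)
        segments.remove((lo, hi))
        segments += [(lo, cut), (cut, hi)]
        segments.sort()
    return [(xs[lo], xs[hi - 1]) for lo, hi in segments]


# Hypothetical GAM effect of policyholder age (made-up numbers for illustration)
ages = list(range(18, 30))
gam_effect = [0.9] * 3 + [0.5] * 3 + [0.1] * 6
bins = tree_bins(ages, gam_effect, n_classes=3)
print(bins)
```

Because splits are chosen where the fitted effect changes most, the resulting classes are narrow where the risk varies a lot and broad where it is flat.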

Binning continuous variables: Binning results
Figure: Visualization of the classes suggested by the regression tree

AGENDA
Binning continuous variables
Geographical information
  – Modelling
      GLM without geographical information
      GAM with geographical information
  – Visualizing and binning

Geographical information: Introduction

Bree: latitude 51°07'08.8"N, longitude 5°38'32.5"E
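To use such coordinates as continuous covariates, the degrees/minutes/seconds notation is usually converted to decimal degrees first. The helper below is a small sketch of that conversion; the function name is an assumption for illustration, and the slides do not say how the coordinates were actually prepared.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and West are returned as negative values.
    """
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


# Coordinates of Bree as given on the slide
bree_lat = dms_to_decimal(51, 7, 8.8, "N")
bree_lon = dms_to_decimal(5, 38, 32.5, "E")
print(round(bree_lat, 5), round(bree_lon, 5))
```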

Geographical information: Step 1: GLM without geographical information

Figure panels: Predicted number of claims per district; Observed number of claims per district
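The comparison behind these two maps can be sketched as a simple aggregation: sum the observed claims and the GLM-predicted claims per district and look at their ratio. The record layout, district labels, and numbers below are hypothetical; the slides do not specify how the comparison was computed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def claims_by_district(
    policies: Iterable[Tuple[str, float, float]]
) -> Dict[str, Dict[str, float]]:
    """Aggregate (district, observed claims, predicted claims) records per district."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for district, observed, predicted in policies:
        totals[district][0] += observed
        totals[district][1] += predicted
    return {
        district: {
            "observed": obs,
            "predicted": pred,
            # A ratio above 1 means more claims were observed than predicted
            "ratio": obs / pred if pred > 0 else float("nan"),
        }
        for district, (obs, pred) in totals.items()
    }


# Hypothetical policies: (district, observed claims, GLM-predicted claims)
policies = [
    ("district_A", 1, 0.5),
    ("district_A", 1, 0.5),
    ("district_B", 0, 0.25),
    ("district_B", 0, 0.25),
    ("district_B", 1, 0.5),
]
summary = claims_by_district(policies)
print(summary)
```

Districts whose ratio differs systematically from 1 point to geographical variation that a model without location cannot capture.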

Geographical information: Step 2: GAM with geographical information
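One common way to bring location into a GAM is a two-dimensional smooth of the coordinates. The form below is a sketch under assumptions not stated on the slides: a Poisson claim-frequency model with log link and exposure offset, with $s$ denoting an unspecified bivariate smooth.

```latex
% Assumed setting: N_i = claim count of policy i, e_i = exposure,
% x_{ij} = the covariates of the Step 1 GLM,
% (long_i, lat_i) = coordinates of the policyholder's district.
\[
\log \mathbb{E}[N_i] = \log e_i + \beta_0 + \sum_{j} \beta_j x_{ij}
  + s(\mathrm{long}_i, \mathrm{lat}_i)
\]
```

The fitted surface $s$ is then the geographic effect that gets visualized and binned.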

Geographical information: Visualizing and binning the geographic effect

Problematic issue
  – Different classification methods can yield dissimilar classes
  – Maps are very sensitive to the classification method used
  – Visualization of the same data can convey different impressions
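A small example shows how much the choice of classification method matters. The sketch below compares two standard schemes, equal-interval and quantile classification, on the same made-up district effects; the slides do not say which methods were compared, so both the methods and the data are assumptions for illustration.

```python
from bisect import bisect_left
from typing import List


def equal_interval_breaks(values: List[float], k: int) -> List[float]:
    """Upper bounds of the first k-1 classes when the range is cut into k equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def quantile_breaks(values: List[float], k: int) -> List[float]:
    """Upper bounds of the first k-1 classes with roughly equal counts (needs len >= k)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k - 1] for i in range(1, k)]


def classify(values: List[float], breaks: List[float]) -> List[int]:
    """Class index per value; a value equal to a break falls in the lower class."""
    return [bisect_left(breaks, v) for v in values]


# Hypothetical geographic effects for eight districts, with one outlier
effects = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 16.0]
by_interval = classify(effects, equal_interval_breaks(effects, 4))
by_quantile = classify(effects, quantile_breaks(effects, 4))
print(by_interval)
print(by_quantile)
```

With the outlier present, equal intervals put most districts in the lowest class, while quantiles spread them evenly across classes, so the two maps would look very different even though the data are identical.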

Geographical information Visualizing and binning the geographic effect

Conclusion
GLMs remain a very important statistical regression technique for pricing car insurance products.
GAMs provide interesting insights into the underlying dependency structure, but come at a high computational cost.
Care is needed when reading and interpreting choropleth maps
  – Different classification techniques produce different results.
  – Classification strongly affects the visual impressions readers obtain.