Severity Distributions for GLMs: Gamma or Lognormal? Presented by Luyang Fu, Grange Mutual Richard Moncher, Bristol West 2004 CAS Spring Meeting Colorado.

Presentation transcript:

Severity Distributions for GLMs: Gamma or Lognormal? Presented by Luyang Fu, Grange Mutual Richard Moncher, Bristol West 2004 CAS Spring Meeting Colorado Springs, Colorado May 18, 2004

2 Session Outline Introduction Distribution Assumptions Simulation Method Simulation Results Conclusions

3 Introduction Common characteristics of loss distributions Typical GLM forms in actuarial practice Lognormal and Gamma are the most widely used distributions in size of loss (severity) analysis Lognormal or Gamma?

4 Distribution Characteristics of Insurance Losses Non-negative Positively skewed Variance is positively correlated with mean. Normal is not appropriate: it allows negative values, is symmetric, and has constant variance

5 Advantages of GLMs Exponential Family Distribution Selections: Poisson, Gamma, Binomial, Inverse Gaussian, Negative Binomial, etc. Lognormal is not in the exponential family. Link Function Selections: Identity, Log, Logit, Power, Probit, etc.

6 Typical GLM Forms in Actuarial Practice Severity: Log link, Gamma Distribution Frequency: Log link, Poisson Distribution Retention (Renewal): Logit link, Binomial Distribution

7 Gamma or Lognormal? Gamma and lognormal are the two most popular selections of loss distributions. On the CAS website, we found 31 papers by searching “Lognormal” and 37 papers by searching “Gamma”.

8 Lognormal Is One of the Most Widely Used Loss Distributions Proceedings of the Casualty Actuarial Society Ratemaking and Reinsurance Wacek, Michael G. (1997) Bear, Robert A.; Nemlick, Kenneth J. (1990) Hayne, Roger M. (1985) Mack, Thomas (1984) Ter Berg, Peter (1980) Benckert, Lars-Gunnar (1962)

9 Lognormal Is One of the Most Widely Used Loss Distributions Proceedings of the Casualty Actuarial Society Reserving and Reinsurance Kreps, Rodney E. (1997) Ramsay, Colin M.; Usabel, Miguel A. (1997) Doray, Louis G. (1996) Levi, Charles; Partrat, Christian (1991) Hertig, Joakim (1985)

10 Lognormal Is One of the Most Widely Used Loss Distributions In actuarial practice Increased Limit Factors Excess of Loss Calculations Weather Load Quantile Loss Reserve Variability

11 Gamma or Lognormal? Desirable Features of Gamma and Lognormal Distributions: 1. Non-negative 2. Positively skewed 3. Variance is proportional to the mean squared (Constant Coefficient of Variation)
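The constant coefficient of variation property on this slide can be checked directly. The sketch below is not from the presentation; it assumes the usual shape/scale parameterization of the gamma and the (mu, sigma) parameterization of the lognormal, and the function names are hypothetical. It converts a target mean and CV into distribution parameters and confirms that variance divided by mean squared stays at CV squared as the mean changes.

```python
import math

def gamma_params(mean, cv):
    """Return (shape, scale) of a gamma with the given mean and CV."""
    shape = 1.0 / cv**2      # the CV depends on the shape only
    scale = mean * cv**2     # mean = shape * scale
    return shape, scale

def lognormal_params(mean, cv):
    """Return (mu, sigma) of the underlying normal for a lognormal
    with the given mean and CV."""
    sigma2 = math.log(1.0 + cv**2)        # CV^2 = exp(sigma^2) - 1
    mu = math.log(mean) - sigma2 / 2.0    # mean = exp(mu + sigma^2 / 2)
    return mu, math.sqrt(sigma2)

def gamma_moments(shape, scale):
    """Theoretical (mean, variance) of a gamma distribution."""
    return shape * scale, shape * scale**2

def lognormal_moments(mu, sigma):
    """Theoretical (mean, variance) of a lognormal distribution."""
    mean = math.exp(mu + sigma**2 / 2.0)
    var = (math.exp(sigma**2) - 1.0) * mean**2
    return mean, var

if __name__ == "__main__":
    cv = 3.0  # the CV used in the data-generation slides
    for mean in (500.0, 1000.0, 2000.0):  # illustrative means, not from the slides
        g_mean, g_var = gamma_moments(*gamma_params(mean, cv))
        l_mean, l_var = lognormal_moments(*lognormal_params(mean, cv))
        # Both ratios equal cv**2 regardless of the mean.
        print(mean, g_var / g_mean**2, l_var / l_mean**2)
```

For both distributions the printed ratio is CV squared, which is the sense in which variance is proportional to the mean squared.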

12 Gamma or Lognormal? Advantages of Lognormal: Easy to understand (related to normal distribution) Consistent with other actuarial procedures, such as increased limits ratemaking Fits data with large skewness well Disadvantage of Lognormal: Not in exponential family, and GLM coefficients need volatility adjustment

13 Gamma or Lognormal? Under what conditions are the severity distribution assumptions important? If severity distribution is unknown, which distribution yields most accurate and stable results (i.e., minimized estimation bias and standard error)?

14 Classical Distribution Assumptions Normal: Constant Variance. Gamma: Constant Coefficient of Variation.

15 Classical Distribution Assumptions Lognormal: Constant Coefficient of Variation.

16 Does Normal Necessarily Imply Constant Variance? Normal with Constant Coefficient of Variation: variance function is like Gamma. Normal with Variance proportional to mean: variance function is like Poisson.

17 Does Gamma Necessarily Imply Constant Coefficient of Variation? Gamma with Variance proportional to mean: variance function is like Poisson.

18 Distribution Assumptions One of two parameters is constant Which one is selected as constant should be based on data Classical assumptions are the most widely used distribution forms, and generally fit data better Can we assume neither of them is constant? Yes, but it will increase the number of parameters and reduce the degrees of freedom
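The formulas on slides 14 to 18 did not survive in this transcript. As a sketch in standard GLM notation (the symbols \mu_i, \phi, V, \alpha and \beta are assumed here, not taken from the slides), the three variance structures discussed, and the way the gamma moves between two of them depending on which parameter is held constant, can be written as:

```latex
% Standard GLM notation, assumed here (not taken from the slides):
% \mu_i = E[y_i], \phi = dispersion parameter, V = variance function.
\operatorname{Var}(y_i) = \phi\, V(\mu_i), \qquad
V(\mu) =
\begin{cases}
1     & \text{constant variance} \\
\mu   & \text{variance proportional to the mean (Poisson-like)} \\
\mu^2 & \text{constant coefficient of variation, } \mathrm{CV} = \sqrt{\phi}
\end{cases}

% Gamma with shape \alpha and scale \beta:
E[y] = \alpha\beta, \qquad \operatorname{Var}(y) = \alpha\beta^2
% \alpha held constant: \operatorname{Var}(y) = \mu^2 / \alpha  (constant CV)
% \beta held constant:  \operatorname{Var}(y) = \beta\,\mu      (proportional to the mean)
```

Holding the shape constant gives the classical constant-CV gamma. Holding the scale constant gives a gamma whose variance function behaves like the Poisson's.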

19 Why Simulation? The distributions of GLM coefficients and predicted values are unknown in the case of small samples Statistical analysis based on asymptotic distributions is not reliable In an individual regression, we don’t know if the difference between predicted value and observed value is from random variation or systematic bias

20 Simulation Assumptions 32 Severity Observations for Two Class Variables 8 Age Groups 4 Vehicle-Use Groups Data Source: Private Passenger Auto Collision used in Mildenhall (1999) and McCullagh and Nelder (1989)

21 Simulation Assumptions Individual Losses Have Constant Coefficient of Variation Multiplicative Relationship Between Severities and Rating Variables Known “True” Base Severities & Relativities Known CVs for the Severity Distribution
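A multiplicative relationship between severities and rating variables is what a log link expresses: the log of the expected severity is a sum of terms, one per rating variable. The sketch below illustrates this for an 8 by 4 grid of cells. The base severity and every relativity in it are hypothetical placeholders, not the "true" values used in the presentation.

```python
import math
from itertools import product

# All numbers below are hypothetical placeholders for illustration only.
BASE_SEVERITY = 300.0
AGE_REL = [1.00 + 0.05 * k for k in range(8)]   # 8 age groups
USE_REL = [1.00, 1.10, 1.20, 1.30]              # 4 vehicle-use groups

def true_severity(age, use):
    """Multiplicative form: base severity times one relativity per variable."""
    return BASE_SEVERITY * AGE_REL[age] * USE_REL[use]

def log_linear_predictor(age, use):
    """The same model on the log scale, as a log-link GLM would express it."""
    return (math.log(BASE_SEVERITY)
            + math.log(AGE_REL[age])
            + math.log(USE_REL[use]))

# One cell per (age group, vehicle-use group) combination: 8 * 4 = 32 cells.
cells = list(product(range(8), range(4)))

if __name__ == "__main__":
    for age, use in cells[:4]:
        print(age, use, true_severity(age, use),
              math.exp(log_linear_predictor(age, use)))
```

Exponentiating the additive predictor recovers the multiplicative severity in every cell, so fitted log-scale coefficients translate to relativities by exponentiation.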

22 Simulation Procedures 1. Generate individual losses based on lognormal and gamma distributions and calculate 32 claim severities 2. Fit three regressions: GLM with Gamma, GLM with Normal, and GLM with log-transformed severity 3. Repeat Steps 1-2 one thousand times, and generate sampling distributions of GLM coefficients and predicted values
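Steps 1 and 3 of this procedure can be sketched for a single cell using only the Python standard library. This is a simplified illustration and not the authors' code: it skips the regression fits of Step 2 and records only the observed cell severity in each simulation. The true mean of 1000 is a hypothetical value; the claim counts of 21 and 970 and the CV of 3.0 are the ones quoted later in the slides.

```python
import math
import random
import statistics

def draw_losses(dist, mean, cv, n, rng):
    """Draw n individual losses with the given mean and CV."""
    if dist == "gamma":
        shape, scale = 1.0 / cv**2, mean * cv**2
        return [rng.gammavariate(shape, scale) for _ in range(n)]
    if dist == "lognormal":
        sigma2 = math.log(1.0 + cv**2)
        mu = math.log(mean) - sigma2 / 2.0
        return [rng.lognormvariate(mu, math.sqrt(sigma2)) for _ in range(n)]
    raise ValueError(f"unknown distribution: {dist}")

def simulate_cell_severity(dist, mean, cv, n_claims, n_sims=1000, seed=0):
    """Repeat the loss generation n_sims times and return the list of
    simulated cell severities (average loss per claim)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw_losses(dist, mean, cv, n_claims, rng))
        for _ in range(n_sims)
    ]

if __name__ == "__main__":
    for n_claims in (21, 970):
        sims = simulate_cell_severity("gamma", 1000.0, 3.0, n_claims)
        print(n_claims, statistics.fmean(sims), statistics.pstdev(sims))
```

The spread of the simulated severities shrinks as the claim count grows, which is why the thin cell (21 claims) is so much noisier than the thick one (970 claims).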

23 Performance Measurements Weighted Absolute Bias (wab), which measures the systematic bias (accuracy). Weighted Standard Error (wse), which measures random variation (stability).
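The formulas for these two measures are not preserved in this transcript. The sketch below shows one plausible form, under assumptions that are not confirmed by the source: cells are weighted by claim count, bias is the absolute difference between the average prediction and the true severity relative to the true severity, and standard error is the sample standard deviation of the predictions relative to the true severity. The authors' actual definitions may differ.

```python
import statistics

def weighted_absolute_bias(predictions, truth, weights):
    """Assumed form: weighted average over cells of |mean prediction - truth| / truth.

    predictions[i] is the list of predicted severities for cell i,
    one value per simulation run.
    """
    total_weight = sum(weights)
    return sum(
        w * abs(statistics.fmean(p) - t) / t
        for p, t, w in zip(predictions, truth, weights)
    ) / total_weight

def weighted_standard_error(predictions, truth, weights):
    """Assumed form: weighted average over cells of stdev(predictions) / truth."""
    total_weight = sum(weights)
    return sum(
        w * statistics.stdev(p) / t
        for p, t, w in zip(predictions, truth, weights)
    ) / total_weight

if __name__ == "__main__":
    # Tiny made-up example: two cells, two simulation runs each.
    preds = [[110.0, 130.0], [200.0, 200.0]]
    truth = [100.0, 200.0]
    weights = [1, 3]  # hypothetical claim counts
    print(weighted_absolute_bias(preds, truth, weights))
    print(weighted_standard_error(preds, truth, weights))
```

Weighting by claim count keeps thin cells, whose predictions are naturally noisy, from dominating either measure.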

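24 Adjustments for Log-Transformed Regressions GLMs with Gamma and Normal: (formula not preserved). Log-transformed Regression: (formula not preserved). The adjustment term in the log-transformed prediction is called the “Volatility Adjustment Factor”.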
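The slide's own formulas are missing here, so the following is a reconstruction from the standard lognormal mean rather than a quotation. It assumes a log-transformed regression with linear predictor x_i^T \beta and constant error variance \sigma^2; the notation is an assumption and may not match the original slide.

```latex
\ln y_i \sim N\!\left(x_i^{\top}\beta,\ \sigma^2\right)
\quad\Longrightarrow\quad
E[y_i] = \exp\!\left(x_i^{\top}\beta + \tfrac{\sigma^2}{2}\right)
       = \exp\!\left(x_i^{\top}\beta\right)\,\exp\!\left(\tfrac{\sigma^2}{2}\right)
```

Under this reading, the factor \exp(\sigma^2/2) is the correction applied when a prediction on the log scale is transformed back to the dollar scale. A log-link GLM with a gamma or normal distribution models E[y_i] directly and needs no such correction.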

25 Simulation Results Data Generated Regression Results Residual Diagnostics

26 Data Generated Reporting on Two Different Classes: Classification I - Age and Pleasure Use, with 21 observations. Classification II - Age and Short Drive to Work, with 970 observations.

27 Data Generated: Gamma Severity for Age and Pleasure Use with Coefficient of Variation 3.0

28 Data Generated: Gamma Severity for Age and DTW Short Use with Coefficient of Variation 3.0

29 Data Generated: Lognormal Severity for Age and Pleasure Use with Coefficient of Variation 3.0

30 Data Generated: Lognormal Severity for Age and DTW Short Use with Coefficient of Variation 3.0

31 Regression Results: Overall Unbiasedness and Stability of Predicted Severities for Gamma Loss. Table columns: CV; wab for G-G, G-L, G-N; wse for G-G, G-L, G-N.

32 Regression Results: Overall Unbiasedness and Stability of Predicted Severities for Lognormal Loss. Table columns: CV; wab for L-G, L-L, L-N; wse for L-G, L-L, L-N.

33 Regression Results: Predicted Severities for Gamma Loss with Coefficient of Variation 3.0 for Age and Pleasure Use

34 Regression Results: Predicted Severities for Gamma Loss with Coefficient of Variation 3.0 for Age and DTW Short Use

35 Regression Results: Predicted Severities for Lognormal Loss with Coefficient of Variation 3.0 for Age and Pleasure Use

36 Regression Results: Predicted Severities for Lognormal Loss with Coefficient of Variation 3.0 for Age and DTW Short Use

37 Residual Diagnostics: Standardized Residuals for Gamma Loss with Coefficient of Variation 3.0

38 Residual Diagnostics: Predicted Severities vs Standardized Residuals for Gamma Loss with Coefficient of Variation 3.0

39 Residual Diagnostics: Standardized Residuals for Lognormal Loss with Coefficient of Variation 3.0

40 Residual Diagnostics: Predicted Severities vs Standardized Residuals for Lognormal Loss with Coefficient of Variation 3.0

41 Residual Diagnostics: Standardized Residuals for Gamma Loss with Coefficient of Variation 1.0 Based on Individual Data

42 Residual Diagnostics: Predicted Severities vs Standardized Residuals for Gamma Loss with Coefficient of Variation 1.0 Based on Individual Data

43 Residual Diagnostics: Standardized Residuals for Lognormal Loss with Coefficient of Variation 1.0 Based on Individual Data

44 Residual Diagnostics: Predicted Severities vs Standardized Residuals for Lognormal Loss with Coefficient of Variation 1.0 Based on Individual Data

45 Conclusions When the gamma distribution is “true”, the G-G model is dominant in both unbiasedness and stability (except the G-L model is slightly more stable in the case of large volatility).

46 Conclusions When the lognormal distribution is “true”, the L-L model is dominant in terms of stability.

47 Conclusions GLMs with a normal distribution never dominate based on any criteria, and they have the worst weighted standard error.

48 Conclusions GLMs with a gamma distribution are dominant in terms of unbiasedness, no matter whether the “true” distribution is gamma or lognormal.

49 Conclusions In general, GLMs with a gamma distribution are recommended because they perform slightly better than the log-transformed model.

50 Conclusions When the data is not volatile, the distribution selection for GLMs may not be as important because all distribution assumptions yield small biases and standard errors.

51 Conclusions When the data is very volatile, the log-transformed regression is recommended because it provides the most stable estimation.

52 Conclusions When the log-transformed model is used, the classification relativities should be adjusted by a volatility-adjustment factor. Without the adjustment, the relativities could be undervalued.
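The direction of this effect can be seen numerically for a single cell. The sketch below is an illustration under stated assumptions, not the authors' procedure: losses are lognormal with a hypothetical true mean of 1000 and a CV of 1.0, the "model" is just the mean of the log losses, and the adjustment factor is taken to be exp(s^2 / 2) with s^2 the sample variance of the logs. It shows the effect on a predicted severity level only and does not reproduce the relativity adjustment itself.

```python
import math
import random
import statistics

def log_transformed_estimates(losses):
    """Return (unadjusted, factor, adjusted) estimates of the mean loss
    from a fit on the log scale."""
    logs = [math.log(x) for x in losses]
    mu_hat = statistics.fmean(logs)
    sigma2_hat = statistics.variance(logs)
    unadjusted = math.exp(mu_hat)            # back-transform without correction
    factor = math.exp(sigma2_hat / 2.0)      # assumed volatility adjustment
    return unadjusted, factor, unadjusted * factor

def simulate_losses(true_mean, cv, n, seed):
    """Draw n lognormal losses with the given mean and CV."""
    rng = random.Random(seed)
    sigma2 = math.log(1.0 + cv**2)
    mu = math.log(true_mean) - sigma2 / 2.0
    return [rng.lognormvariate(mu, math.sqrt(sigma2)) for _ in range(n)]

if __name__ == "__main__":
    losses = simulate_losses(true_mean=1000.0, cv=1.0, n=20000, seed=2004)
    unadjusted, factor, adjusted = log_transformed_estimates(losses)
    print("unadjusted:", unadjusted)
    print("factor:", factor)
    print("adjusted:", adjusted)
```

Without the factor, the back-transformed value estimates the median of the lognormal rather than its mean, so it falls below the true severity. The gap widens as the volatility grows.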

53 Conclusions Residual plots may work well to examine the distribution assumptions on individual data, but not necessarily on summarized/average data.

54 Questions & Answers Questions? Thank You!