Personal Lines Actuarial Research Department
Generalized Linear Models
CAS - Boston
Monday, November 11, 2002
Keith D. Holler Ph.D., FCAS, ASA, ARM, MAAA

High Level
Given Characteristics: e.g. Eye Color, Age, Weight, Coffee Size
Predict Response:
- e.g. Probability someone takes Friday off, given it’s sunny and 70°+
- e.g. Expected amount spent on lunch

Example: Personal Auto
Log (Loss Cost) = Intercept + Driver Age Factor i + Car Size Factor j
Parameters:
- Intercept
- Driver Age: Young, Older
- Car Size: Small, Medium, Large
e.g. Young Driver, Large Car
Loss Cost = exp (Intercept + Young + Large) = $1,408
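The slide's parameter values did not survive, so the arithmetic can only be illustrated with stand-ins. The sketch below uses hypothetical log-scale parameters, chosen only so that the total lands near the slide's $1,408, to show that adding factors on the log scale is the same as multiplying them on the dollar scale.

```r
# Hypothetical log-scale parameters; these are NOT the values from the slide.
intercept  <- 6.50
driver_age <- c(Young = 0.45, Older = 0.00)
car_size   <- c(Small = 0.00, Medium = 0.15, Large = 0.30)

# Additive on the log scale ...
log_loss_cost <- intercept + driver_age[["Young"]] + car_size[["Large"]]
loss_cost <- exp(log_loss_cost)

# ... which is multiplicative on the dollar scale.
loss_cost_mult <- exp(intercept) *
  exp(driver_age[["Young"]]) *
  exp(car_size[["Large"]])

print(round(c(additive = loss_cost, multiplicative = loss_cost_mult), 2))
```

Both routes give the same loss cost, which is why a log link is described as a multiplicative model.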

Technical Bits
- Exponential families: gamma, Poisson, normal, binomial
- Fit parameters via maximum likelihood
- Solve MLE by IRLS or Newton-Raphson
- Link Function (e.g. Log Loss Cost)
  - 1-1 function
  - Maps the range of the predicted variable to (-∞, ∞)
  - LN: multiplicative model; id: additive model; logit: binomial model (yes/no)
- g(E[Y]) = Xβ + offset
- Different means, same scale
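As a rough illustration of "solve MLE by IRLS", here is a minimal sketch of iteratively reweighted least squares for a Poisson model with a log link, checked against R's built-in glm(). The simulated data, the starting values and the variable names are all assumptions made for the example.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))  # simulated counts (illustrative)
X <- cbind(1, x)                            # design matrix with intercept

beta <- c(log(mean(y)), 0)                  # simple starting values
for (iter in 1:25) {
  eta <- drop(X %*% beta)                   # linear predictor
  mu  <- exp(eta)                           # inverse of the log link
  z   <- eta + (y - mu) / mu                # working response
  w   <- mu                                 # working weights for Poisson/log
  # Weighted least squares step: solve (X'WX) beta = X'Wz
  beta_new <- drop(solve(t(X) %*% (w * X), t(X) %*% (w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

fit <- glm(y ~ x, family = poisson(link = "log"))
print(cbind(irls = beta, glm = unname(coef(fit))))
```

The two columns should agree to several decimal places, since glm() uses the same kind of iteration internally.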

Why GLMs?
- Multivariate: adjusts for the presence of other variables, so effects do not overlap.
- For non-normal data, GLMs are better than OLS.
- Preprogrammed: easy to run, flexible model structures.
- Maximum likelihood allows testing the importance of variables.
- Linear structure allows a balance between the amount of data and the number of variables.
- Condense data: mean estimate unchanged, scale estimate changes.
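The "condense data" point can be checked on simulated data. The sketch below, which assumes a Poisson frequency model with made-up rating variables and rates, fits the same model to policy-level records and to records summed by rating cell, and compares the coefficient estimates.

```r
set.seed(42)
n <- 5000
# Simulated policy-level data; variable names and rates are illustrative only.
policies <- data.frame(
  age      = factor(sample(c("Young", "Older"), n, replace = TRUE)),
  car      = factor(sample(c("Small", "Medium", "Large"), n, replace = TRUE)),
  exposure = runif(n, 0.25, 1)
)
rate <- 0.10 * ifelse(policies$age == "Young", 1.5, 1) *
  ifelse(policies$car == "Large", 1.2, 1)
policies$claims <- rpois(n, rate * policies$exposure)

# Condense: one row per rating cell, summing claims and exposure.
cells <- aggregate(cbind(claims, exposure) ~ age + car,
                   data = policies, FUN = sum)

fit_detail <- glm(claims ~ age + car + offset(log(exposure)),
                  family = poisson(link = "log"), data = policies)
fit_cells  <- glm(claims ~ age + car + offset(log(exposure)),
                  family = poisson(link = "log"), data = cells)

print(round(cbind(detail = coef(fit_detail), cells = coef(fit_cells)), 4))
```

The six condensed rows reproduce the coefficients from the 5,000 detailed rows up to numerical tolerance.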

Example: Personal Auto Property Damage Frequency
Model
- N: random number of claims
- λ: average or expected value of N
- Model N ~ Poisson (mean = λ)
- Log (λ) = Intercept + Age + Gender + Marital + Gender * Marital + Credit + SM + Year + β * Accidents + log (exposure)
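The log (exposure) term enters with its coefficient fixed at 1, which is what makes it an offset. Writing λ for the expected claim count (a symbol assumed here) and x'β as shorthand for all the other terms, a short derivation shows what that implies:

```latex
\begin{aligned}
\log \lambda &= x^{\top}\beta + \log(\text{exposure}) \\
\lambda &= \text{exposure} \cdot \exp\left(x^{\top}\beta\right) \\
\frac{\lambda}{\text{exposure}} &= \exp\left(x^{\top}\beta\right)
\end{aligned}
```

The classification variables therefore describe claim frequency per unit of exposure, and expected claims scale in proportion to exposure.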

Model Output (figure)

Example: Personal Auto Property Damage Frequency
How to use it?
- Have N ~ Poisson (λ), where λ depends on classification variables.
- Really want the relative difference to a base class.
- Example base class: 40-59, UM, NOHIT, S, 0 accidents
  - All factors are 0
  - Don’t care about intercept, policy year, or exposure
- Base rate set for base class, e.g. $100
- To rate anyone else: factor x base rate
- E.g. 30, U, F, E06, S, 2 accidents
  - Factor: exp (sum of class parameters + 2 x .28) = 2.14
  - Rate = 2.14 x 100 = $214
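The rating arithmetic can be sketched as follows. Only the .28 per accident, the $100 base rate and the 2.14 result appear on the slide. The individual class parameters below are hypothetical stand-ins chosen to sum to a value consistent with that result, and the variable names are illustrative.

```r
# Hypothetical class parameters on the log scale (NOT from the slide).
class_params <- c(age_30 = 0.15, unmarried_female = -0.05, credit_E06 = 0.10)

beta_accident <- 0.28   # per-accident parameter, from the slide
accidents     <- 2
base_rate     <- 100    # base class rate in dollars, from the slide

# The base class has all parameters equal to 0, so its factor is exp(0) = 1.
linear_predictor <- sum(class_params) + accidents * beta_accident
rel_factor <- round(exp(linear_predictor), 2)
rate <- rel_factor * base_rate

print(c(factor = rel_factor, rate = rate))
```

The intercept, policy year and exposure terms are left out because they cancel when comparing a risk to the base class.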

Diagnostics
1. Actual vs Modeled on Training and Test data.
2. P-values and confidence intervals.
3. Actual vs Modeled on variables NOT used in model.
4. Graphs: Standardized deviance residuals vs linear predictor OR Q-Q Plot.
5. Leverage and influential points.
6. Likelihood ratio tests for entire variables.
7. 50/50 modeling.
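Three of these diagnostics (items 1, 4 and 6) can be sketched in base R. The data are simulated and the variable `noise` is deliberately unrelated to claims. All names and rates are assumptions for the example.

```r
set.seed(7)
n <- 4000
d <- data.frame(
  age   = factor(sample(c("Young", "Older"), n, replace = TRUE)),
  noise = factor(sample(c("A", "B", "C"), n, replace = TRUE))  # unrelated to claims
)
d$claims <- rpois(n, 0.10 * ifelse(d$age == "Young", 1.6, 1))

# Split into training and test halves.
train <- d[seq_len(n) %% 2 == 1, ]
test  <- d[seq_len(n) %% 2 == 0, ]

fit_small <- glm(claims ~ age, family = poisson(link = "log"), data = train)
fit_big   <- glm(claims ~ age + noise, family = poisson(link = "log"), data = train)

# Item 6: likelihood ratio test for the entire 'noise' variable.
lrt <- anova(fit_small, fit_big, test = "Chisq")
print(lrt)

# Item 1: actual vs modeled claim counts on the test data.
test$modeled <- predict(fit_small, newdata = test, type = "response")
print(aggregate(cbind(claims, modeled) ~ age, data = test, FUN = sum))

# Item 4: standardized deviance residuals and the linear predictor.
res <- rstandard(fit_small, type = "deviance")
eta <- predict(fit_small, type = "link")
```

In an interactive session, `plot(eta, res)` or `qqnorm(res)` would draw the graphs mentioned in item 4.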

Personal Auto Class Plan Issues:
- Territories or other many-level variables
- Deductibles and Limits
- Loss Development
- Trend
- Frequency, Severity or Pure Premium
- Exposure
- Model Selection: penalized likelihood an option

Software and References
Software: SAS, GLIM, SPLUS, EMBLEM, Pretium, GENSTAT, MATLAB, STATA, SPSS
References:
- Part 9 paper bibliography
- Greg Taylor (Melbourne 1997)
- Stephen Mildenhall (1999)
- Hosmer and Lemeshow (2000)
- Farrokh Guiahi (June 2000)
- Karl P. Murphy (Winter 2000)
Other: R; Venables and Ripley (SPLUS)

R Code Example
> options(contrasts = c("contr.treatment", "contr.treatment"))
> pd.data <- read.table("c:\\kdh\\temp\\tree1000.dat", header = FALSE)
> pd.data[1:3, ]
  V1 V2  V3 V4 V5  V6 V7 V8 V9
        A39  F  M E02  M
        A39  F  M E02  M
        A39  F  M E02  M
> model1 <- glm(V2 ~ V3 + V4 + V5 + V4*V5 + V6 + V7 +
+               as.factor(V8) + V9 + offset(log(V1)),
+               family = poisson(link = "log"), data = pd.data)
> summary(model1)
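The data file above is not available, so here is a sketch that runs the same model formula on simulated stand-in data. The meaning assigned to each column (V1 exposure, V2 claim count, V3 age, V4 gender, V5 marital status, V6 credit, V7 SM, V8 policy year, V9 accidents) is an assumption inferred from the order of terms in the frequency model, and the levels and rates are invented.

```r
set.seed(2002)
n <- 1000
# Simulated stand-in for the original data; column meanings are assumed.
pd.sim <- data.frame(
  V1 = runif(n, 0.5, 1),                                    # exposure
  V3 = factor(sample(c("A39", "A59"), n, replace = TRUE)),  # age band
  V4 = factor(sample(c("F", "M"), n, replace = TRUE)),      # gender
  V5 = factor(sample(c("M", "U"), n, replace = TRUE)),      # marital status
  V6 = factor(sample(c("E02", "E06"), n, replace = TRUE)),  # credit
  V7 = factor(sample(c("M", "S"), n, replace = TRUE)),      # SM
  V8 = sample(2000:2001, n, replace = TRUE),                # policy year
  V9 = sample(0:2, n, replace = TRUE)                       # accidents
)
# Simulated claim counts: frequency rises with accidents, scaled by exposure.
pd.sim$V2 <- rpois(n, pd.sim$V1 * 0.08 * exp(0.28 * pd.sim$V9))

model.sim <- glm(V2 ~ V3 + V4 * V5 + V6 + V7 + as.factor(V8) + V9 +
                   offset(log(V1)),
                 family = poisson(link = "log"), data = pd.sim)
print(summary(model.sim))
```

In an R formula `V4 * V5` already expands to `V4 + V5 + V4:V5`, so this is the same model as writing the main effects and the interaction separately.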