Personal Lines Actuarial Research Department Generalized Linear Models CAS - Boston Monday, November 11, 2002 Keith D. Holler Ph.D., FCAS, ASA, ARM, MAAA.

Presentation transcript:

Slide 2: High Level
Given characteristics (e.g. eye color, age, weight, coffee size), predict a response:
- e.g. the probability that someone takes Friday off, given it's sunny and 70°+
- e.g. the expected amount spent on lunch

Slide 3: Example: Personal Auto
Log(Loss Cost) = Intercept + Driver Age Factor_i + Car Size Factor_j
Parameters: Intercept; Driver Age (Young, Older); Car Size (Small, Medium, Large). The parameter values are not preserved in this transcript.
e.g. Young driver, large car: Loss Cost = exp( ... ) = $1,408
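The log-link structure of the personal auto example can be sketched in R. The parameter values below are hypothetical: the slide's own table did not survive, so they were chosen only so that the young driver, large car cell comes out near the $1,408 quoted on the slide. The names `params` and `loss_cost` are illustrative.

```r
# Hypothetical parameters on the log scale (NOT the values from the slide).
params <- c(intercept = 6.50,
            young = 0.25, older = 0.00,               # driver age factors
            small = 0.00, medium = 0.20, large = 0.50) # car size factors

# Loss cost = exp(intercept + driver age factor + car size factor)
loss_cost <- function(driver_age, car_size) {
  exp(params[["intercept"]] + params[[driver_age]] + params[[car_size]])
}

young_large <- loss_cost("young", "large")

# With a log link the model is multiplicative on the dollar scale:
as_product <- exp(params[["intercept"]]) *
  exp(params[["young"]]) * exp(params[["large"]])

print(round(young_large))
```

Because the link is the log, each factor acts as a multiplier on the base loss cost, which is why `as_product` agrees with `young_large`.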

Slide 4: Technical Bits
- Exponential families: gamma, Poisson, normal, binomial
- Fit parameters via maximum likelihood
- Solve the MLE by IRLS or Newton-Raphson
- Link function (e.g. log loss cost): a 1-1 function that maps the range of the predicted variable to (-∞, ∞)
- ln → multiplicative model; identity → additive model; logit → binomial model (yes/no)
- g(E[Y]) = Xβ + offset
- Different means, same scale
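As a rough illustration of "solve the MLE by IRLS", the sketch below runs iteratively reweighted least squares by hand for a Poisson model with log link and compares the result with R's built-in `glm()`. The data are simulated and the variable names are assumptions made for the example; this is a minimal sketch, not the fitting code used for the presentation.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
X <- cbind(1, x)                       # design matrix with intercept
y <- rpois(n, exp(0.5 + 0.3 * x))      # simulated Poisson responses

beta <- c(log(mean(y)), 0)             # starting values
for (iter in 1:25) {
  eta <- drop(X %*% beta)              # linear predictor
  mu  <- exp(eta)                      # inverse log link
  z   <- eta + (y - mu) / mu           # working response
  w   <- mu                            # working weights for Poisson/log
  # Weighted least squares step: solve (X'WX) beta = X'Wz
  beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

fit <- glm(y ~ x, family = poisson(link = "log"))
print(cbind(irls = unname(beta), glm = unname(coef(fit))))
```

Each pass is an ordinary weighted regression of the working response on X, which is why GLM fitting is cheap once a least squares routine is available.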

Slide 5: Why GLMs?
- Multivariate: adjusts for the presence of other variables. No overlap.
- For non-normal data, GLMs are better than OLS.
- Preprogrammed: easy to run, flexible model structures.
- Maximum likelihood allows testing the importance of variables.
- Linear structure allows a balance between the amount of data and the number of variables.
- Condense data: the mean estimate is unchanged, the scale estimate changes.
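The "condense data" point can be checked on simulated data. The sketch below, under assumed variable names (`claims`, `exposure`, `age`, `car`), fits the same Poisson model once on individual records and once on records summed by rating cell. The coefficient estimates should agree, while the residual degrees of freedom (and so any scale estimate built on them) differ.

```r
set.seed(7)
n <- 500
d <- data.frame(
  exposure = runif(n, 0.5, 1),
  age = factor(sample(c("young", "older"), n, replace = TRUE)),
  car = factor(sample(c("small", "medium", "large"), n, replace = TRUE))
)
d$claims <- rpois(n, d$exposure * exp(-1 + 0.4 * (d$age == "young")))

# Condense to one row per age/car cell, summing claims and exposure.
cells <- aggregate(cbind(claims, exposure) ~ age + car, data = d, FUN = sum)

ctrl <- glm.control(epsilon = 1e-12)   # tight convergence for a clean comparison
fit_rows  <- glm(claims ~ age + car + offset(log(exposure)),
                 family = poisson, data = d, control = ctrl)
fit_cells <- glm(claims ~ age + car + offset(log(exposure)),
                 family = poisson, data = cells, control = ctrl)

print(cbind(rows = coef(fit_rows), cells = coef(fit_cells)))
print(c(rows = df.residual(fit_rows), cells = df.residual(fit_cells)))
```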

Slide 6: Example: Personal Auto Property Damage Frequency Model
N = random number of claims
λ = average or expected value of N
Model: N ~ Poisson(mean = λ)
log(λ) = Intercept + Age + Gender + Marital + Gender*Marital + Credit + SM + Year + β*Accidents + log(exposure)

Slide 7: Model Output

Slide 8: Example: Personal Auto Property Damage Frequency
How is it used?
- We have N ~ Poisson(λ), where λ depends on the classification variables.
- What we really want is the relative difference to a base class.
- Example base class: 40-59, UM, NOHIT, S, 0 accidents. All factors are 0.
- We don't care about the intercept, policy year, or exposure.
- A base rate is set for the base class, e.g. $100.
- To rate anyone else: factor x base rate.
- E.g. 30, U, F, E06, S, 2 accidents: Factor = exp( ... x .28) = 2.14, so Rate = 2.14 x 100 = $214.
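The arithmetic of the rating example above can be sketched in R. Only the .28 accident coefficient, the 2.14 factor, and the $100 base rate appear in the transcript; the other coefficients below are hypothetical placeholders chosen so that the total reproduces 2.14, and reading the lost term as "2 accidents x .28" is an assumption.

```r
# Hypothetical coefficients relative to the base class (base class = 0).
coefs <- c(age_30 = 0.15,            # hypothetical
           unmarried_female = 0.05,  # hypothetical
           credit_E06 = 0.00,        # hypothetical
           per_accident = 0.28)      # the .28 from the slide

n_accidents <- 2                     # assumed multiplier for the .28 term

linear_predictor <- coefs[["age_30"]] + coefs[["unmarried_female"]] +
  coefs[["credit_E06"]] + n_accidents * coefs[["per_accident"]]

rating_factor <- round(exp(linear_predictor), 2)  # relative to base class
base_rate <- 100                                  # base rate from the slide
rate <- rating_factor * base_rate

print(c(factor = rating_factor, rate = rate))
```

The intercept, policy year, and exposure terms are left out on purpose: they are common to the base class and the rated risk, so they cancel in the ratio.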

Slide 9: Diagnostics
1. Actual vs. modeled on training and test data.
2. P-values and confidence intervals.
3. Actual vs. modeled on variables NOT used in the model.
4. Graphs: standardized deviance residuals vs. linear predictor, or a Q-Q plot.
5. Leverage and influential points.
6. Likelihood ratio tests for entire variables.
7. 50/50 modeling.
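Several of the diagnostics listed above map onto standard R functions. The sketch below uses simulated data with assumed column names (`claims`, `exposure`, `grp`) and covers items 1, 4, 5, and 6; the plotting call is left commented out so the block runs without a graphics device.

```r
set.seed(42)
n <- 1000
d <- data.frame(
  exposure = runif(n, 0.5, 1),
  grp = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
d$claims <- rpois(n, d$exposure * exp(-1 + 0.5 * (d$grp == "b")))

train <- d[seq(1, n, by = 2), ]      # simple alternating train/test split
test  <- d[seq(2, n, by = 2), ]

fit0 <- glm(claims ~ 1 + offset(log(exposure)), family = poisson, data = train)
fit1 <- glm(claims ~ grp + offset(log(exposure)), family = poisson, data = train)

# 1. Actual vs. modeled claim counts on the held-out data, by group.
test$modeled <- predict(fit1, newdata = test, type = "response")
actual_vs_modeled <- aggregate(cbind(claims, modeled) ~ grp, data = test, FUN = sum)

# 4. Standardized deviance residuals against the linear predictor.
std_dev_res <- rstandard(fit1)
lin_pred <- predict(fit1, type = "link")
# plot(lin_pred, std_dev_res)

# 5. Leverage and influence.
leverage <- hatvalues(fit1)
cooks <- cooks.distance(fit1)

# 6. Likelihood ratio test for the whole variable grp.
lrt <- anova(fit0, fit1, test = "Chisq")
p_value <- lrt[["Pr(>Chi)"]][2]

print(actual_vs_modeled)
print(p_value)
```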

Slide 10: Personal Auto Class Plan Issues
- Territories or other many-level variables
- Deductibles and limits
- Loss development
- Trend
- Frequency, severity, or pure premium
- Exposure
- Model selection: penalized likelihood is an option

Slide 11: Software and References
Software: SAS, GLIM, SPLUS, EMBLEM, Pretium, GENSTAT, MATLAB, STATA, SPSS
References: Part 9 paper bibliography; Greg Taylor (Melbourne 1997); Stephen Mildenhall (1999); Hosmer and Lemeshow (2000); Farrokh Guiahi (June 2000); Karl P. Murphy (Winter 2000)
Other: R; Venables and Ripley (SPLUS)

Slide 12: R Code Example
> options(contrasts = c("contr.treatment", "contr.treatment"))
> pd.data <- read.table("c:\\kdh\\temp\\tree1000.dat", header = FALSE)
> pd.data[1:3,]
  V1 V2 V3 V4 V5 V6 V7 V8 V9
  (three rows printed, each containing A39 F M E02 M; the numeric fields are not preserved)
> model1 <- glm(V2 ~ V3 + V4 + V5 + V4*V5 + V6 + V7 + as.factor(V8) + V9 + offset(log(V1)), family = poisson(link = "log"), data = pd.data)
> summary(model1)
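Since the data file from the slide is not available, the following self-contained variant runs the same model formula on simulated data. The column roles (V1 exposure, V2 claim count, V9 accident count), the factor levels, and the true parameters are all assumptions made for illustration.

```r
set.seed(2002)
n <- 2000
pd.data <- data.frame(
  V1 = runif(n, 0.25, 1),                                        # exposure (assumed)
  V3 = factor(sample(c("A39", "A59", "A99"), n, replace = TRUE)),
  V4 = factor(sample(c("F", "M"), n, replace = TRUE)),
  V5 = factor(sample(c("M", "U"), n, replace = TRUE)),
  V6 = factor(sample(c("E02", "E06", "NOHIT"), n, replace = TRUE)),
  V7 = factor(sample(c("M", "S"), n, replace = TRUE)),
  V8 = sample(1999:2001, n, replace = TRUE),                     # year (assumed)
  V9 = sample(0:2, n, replace = TRUE, prob = c(0.80, 0.15, 0.05))
)
# Simulated claim counts: frequency proportional to exposure.
pd.data$V2 <- rpois(n, pd.data$V1 * exp(-1.5 + 0.28 * pd.data$V9))

options(contrasts = c("contr.treatment", "contr.treatment"))
model1 <- glm(V2 ~ V3 + V4 + V5 + V4*V5 + V6 + V7 +
                as.factor(V8) + V9 + offset(log(V1)),
              family = poisson(link = "log"), data = pd.data)

# Exponentiated coefficients are multiplicative relativities to the base level.
relativities <- exp(coef(model1))
print(round(relativities, 3))
```

The `offset(log(V1))` term fixes the exposure coefficient at 1, so the model describes claims per unit of exposure rather than raw counts.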