America CAS Seminar on Ratemaking March 2005 Presented by: Serhat Guven An Introduction to GLM Theory Refinements.

Presentation transcript:

Slide 1: An Introduction to GLM Theory: Refinements
CAS Seminar on Ratemaking, March 2005
Presented by: Serhat Guven

Slide 2: An Introduction to GLM Theory
PURPOSE: To discuss techniques that refine the structural design of the GLM, including proper tools and diagnostics, thereby allowing for a more flexible model.
OUTLINE
- Background
- GLM Building Blocks
  - Link Function
  - Error Distribution
  - Model Structure
- Diagnostics
- Summary

Slide 3: Purpose of Predictive Modeling
To predict a response variable using a series of explanatory variables (or rating factors). Traditional methods focus on the parameters; modeling also requires the analyst to consider the validation of the parameters.
Inputs to the statistical model:
- Dependent/Response: Losses, Claims, Retention
- Weights: Claims, Exposures, Premium
- Independent/Predictors: Age, Accidents, Limit, Convictions, Territory, Credit Score
Model results:
- Parameters
- Validation Statistics

Slide 4: Purpose of Predictive Modeling
To produce a sensible model that explains recent historical experience and is likely to be predictive of future experience.
Model complexity (number of parameters), from simplest to most complex:
- Overall mean: strong predictive power yet very poor explanatory power
- "Best" model
- One parameter for each observation: good predictor of previous experience but poor predictor of future experience
Traditional methods tend to create unnecessarily complex structures that overfit the data.

Slide 5: Generalized Linear Models
GLMs generalize the traditional regression models by introducing nonlinearity through the link function and loosening the normality assumption.
y = h(Linear Combination of Rating Factors) + Error
Response Variable = Systematic Component (signal) + Random Component (noise)
- g = h^-1 is called the LINK function and is chosen to measure the "signal" most accurately.
- The linear combination of rating factors is the model structure.
- The error should reflect the underlying process and can come from the exponential family.

Slide 6: Generalized Linear Models
More formally, Response Variable = Systematic Component + Random Component, built from the following terms:
- Link function (g = h^-1): links the random and systematic components.
- Design matrix: identifies the predictor variables for each observation.
- Parameters: quantities estimated via the log-likelihood function.
- Offset term: allows incorporation of known effects or restrictions.
- Scale parameter
- Variance function
- Prior weights
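The formulas on this slide did not survive in the transcript. As a hedged reconstruction, the terms listed correspond to the usual textbook statement of a GLM. The symbols below (X for the design matrix, β for the parameters, ξ for the offset, φ for the scale parameter, V for the variance function, ω for the prior weights) are conventional choices and are assumptions, not necessarily the presenter's notation.

```latex
\[
  \mathrm{E}[Y_i] = \mu_i = g^{-1}\Big(\sum_j X_{ij}\,\beta_j + \xi_i\Big),
  \qquad
  \mathrm{Var}[Y_i] = \frac{\phi\, V(\mu_i)}{\omega_i}
\]
```

The first equation is the systematic component seen through the link function g; the second ties the random component to the mean through the variance function.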

Slide 7: Generalized Linear Models
The general solution for the GLM parameters (the equations are not preserved in this transcript) is driven by three components:
- Link function
- Error distribution
- Model structure
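Since the slide's solution equations are missing, the following is only an illustrative sketch of how GLM parameters are commonly estimated: iteratively reweighted least squares, shown here for a Poisson error with a log link. It is not claimed to be the presenter's formula. The function name `fit_poisson_log_irls` and its arguments are hypothetical.

```python
import numpy as np

def fit_poisson_log_irls(X, y, prior_weights=None, offset=None,
                         max_iter=50, tol=1e-10):
    """Illustrative IRLS for a Poisson GLM with log link (assumed setup)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    w = np.ones(n) if prior_weights is None else np.asarray(prior_weights, float)
    off = np.zeros(n) if offset is None else np.asarray(offset, float)

    # Start from a strictly positive mean so that log(mu) is defined.
    mu = (y + y.mean()) / 2.0
    eta = np.log(mu)
    beta = np.zeros(p)

    for _ in range(max_iter):
        # Working response: eta + (y - mu) * d(eta)/d(mu); for log link d(eta)/d(mu) = 1/mu.
        z = eta - off + (y - mu) / mu
        # Working weights: prior weight / (V(mu) * g'(mu)^2) = prior weight * mu here.
        W = w * mu
        XtW = X.T * W
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new + off
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Two groups of observations; the second column flags group membership.
X_demo = np.array([[1, 0], [1, 0], [1, 1], [1, 1]])
y_demo = np.array([1.0, 3.0, 4.0, 8.0])
beta_demo = fit_poisson_log_irls(X_demo, y_demo)
```

In this toy data the fitted group means equal the observed group means (2 and 6), so the parameters are log(2) and log(3). The link function enters through the working response, the error distribution through the working weights, and the model structure through the design matrix.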

Slide 8: Components of a GLM
GLMs generalize the traditional regression models by introducing nonlinearity through the link function and loosening the normality assumption.
y = h(Linear Combination of Rating Factors) + Error
Response Variable = Systematic Component (signal) + Random Component (noise)
Basic building blocks:
- Link Function
- Error Structure
- Model Structure

Slide 9: GLM Building Blocks: Link Functions
y = h(Linear Combination of Rating Factors) + Error
The link function (g = h^-1) is chosen based on how the factors are related, so as to produce the best signal:
- Log: variables related multiplicatively (e.g., risk modeling)
- Identity: variables related additively (e.g., risk modeling)
- Logit: retention or risk modeling
- Reciprocal: canonical link for the gamma distribution (e.g., severity modeling)
- Mixed: additive/multiplicative rating algorithms

Slide 10: Example: Log Link
The rating structure is multiplicative, and the signal allows us to populate rating tables:
Premium = Base Premium x Policyholder Age x Rating Area

Base Premium = 1,000 = exp(6.908)

| Policyholder Age (p) | Relativity |
|---|---|
| Youthful | exp(0.531) |
| Adult | exp(0.000) |
| Mature | exp(-0.223) |
| Seniors | exp(0.095) |

| Rating Area (r) | Relativity |
|---|---|
| A | exp(-0.105) |
| B | exp(0.000) |
| C | exp(0.140) |
| D | exp(0.262) |
| E | exp(0.405) |

The premium for a youthful policyholder living in Area C is $1,955:
Premium = exp(6.908) * exp(0.531) * exp(0.140) = exp(6.908 + 0.531 + 0.140) = exp(b + p + r) = h(linear combination of rating factors)
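The worked example can be checked with a few lines of code. This is a minimal sketch that uses the parameter values printed on the slide; the assignment of each value to a level follows the order in which they appear in the transcript, which is an assumption. The names `BASE`, `AGE`, `AREA` and `premium` are hypothetical.

```python
import math

# Linear-predictor parameters from the slide (log scale).
BASE = 6.908
AGE = {"Youthful": 0.531, "Adult": 0.000, "Mature": -0.223, "Seniors": 0.095}
AREA = {"A": -0.105, "B": 0.000, "C": 0.140, "D": 0.262, "E": 0.405}

def premium(age: str, area: str) -> float:
    """Inverse log link: exponentiate the sum b + p + r."""
    return math.exp(BASE + AGE[age] + AREA[area])

youthful_c = premium("Youthful", "C")
# Multiplying the individual relativities gives the same result.
as_product = math.exp(BASE) * math.exp(AGE["Youthful"]) * math.exp(AREA["C"])
```

With unrounded exponentials the result is a little under $1,957; the $1,955 on the slide presumably comes from relativities rounded before multiplying.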

Slide 11: GLM Building Blocks: Link Functions
The link function relates the independent predictors to the response in a non-linear form:
- Pure multiplicative: Log
- Pure additive: Identity
- Logit
- Reciprocal

Slide 12: Mixed Rating Algorithms
Mixed additive-multiplicative rating algorithm:
Base Rate * (Age + Gender + Usage) * Driving Record * Territory * Limit
Mixture model structures do not fit within the framework of garden-variety GLMs.
Solutions:
- Create n-way tables from the pure multiplicative models
- Build a hierarchical GLM to systematically estimate the additive component from the multiplicative factors
- Remove the restriction on the link function to calculate parameters directly

Slide 13: Generalized Nonlinear Models
Examples of models that do not fit into traditional GLMs:
- Mixed additive-multiplicative model
- Alternative mixtures
- Complicating the logit functions
Alternate link functions make it possible to introduce additional nonlinearities into the solution.

Slide 14: Solution for Non-Linear Models
- Use the identity link function.
- Replace the design matrix X in the GLM solution with D, where the ij-th element of D is the derivative of the predictor for the i-th observation with respect to the j-th parameter.
- This is roughly equivalent to solving a GLM with a different design matrix. However, there are two "linear predictors" in use.
"…the method is likely to be most useful for determining if a reasonable fit can be improved, rather than for the somewhat more optimistic goal of correcting a hopeless situation." - Pregibon (1980)

Slide 15: GLM Building Blocks: Error Structure
y = h(Linear Combination of Rating Factors) + Error
The error structure reflects the variability of the underlying process:
- Gamma: consistent with severity modeling; may want to try Inverse Gaussian
- Poisson: consistent with frequency modeling
- Tweedie: consistent with pure premium modeling
- Normal: useful for a variety of applications

Slide 16: Error Structure: Variance Function

| Observed Response | Error Structure | Variance Function V(μ) | Scale Parameter |
|---|---|---|---|
| | Normal | μ^0 | estimated |
| Claim Frequency | Poisson | μ | 1 |
| Claim Severity | Gamma | μ^2 | estimated |
| Risk Premium | Gamma or Tweedie | μ^T | estimated |
| New/Renewal Rate | Binomial | μ(1-μ) | 1 |

The error structure is also used to incorporate assumptions about the uncertainty and the predicted value.

Slide 17: Example: Binomial Distribution
Binomial:
- Basic functional form in decision modeling
- Belongs to the exponential family of distributions
- Can be extended to multinomial distributions
Variance Function = μ(1-μ)
- Extreme probabilities of success/failure are related to low variability
- Higher variability is associated with less certain probability outcomes

Slide 18: Additional Variance Functions

| Observed Response | Error Structure | Variance Function V(μ) | Scale Parameter |
|---|---|---|---|
| | Normal | μ^0 | estimated |
| Claim Frequency | Poisson | μ | 1 |
| Claim Severity | Gamma | μ^2 | estimated |
| Risk Premium | Gamma or Tweedie | μ^T | estimated |
| New/Renewal Rate | Binomial | μ(1-μ) | 1 |
| Claim Frequency | Over-dispersed Poisson | μ | K |
| Claim Severity | Inverse Gaussian | μ^3 | estimated |

The error structure is also used to incorporate assumptions about the uncertainty and the predicted value.
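The variance functions in the table can be written down directly. The sketch below is illustrative only: the dictionary `VARIANCE_FUNCTIONS` and the helper `response_variance` are hypothetical names, and the formula Var[Y] = scale * V(mu) / prior weight is the standard GLM convention rather than something stated on the slide.

```python
# Variance functions V(mu) for the error structures listed in the table.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: mu ** 0,            # constant variance
    "poisson": lambda mu: mu,                # also used for over-dispersed Poisson
    "gamma": lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
    "binomial": lambda mu: mu * (1.0 - mu),
}

def tweedie_variance(mu: float, power: float) -> float:
    """Tweedie variance function mu^T for a chosen power T."""
    return mu ** power

def response_variance(family: str, mu: float,
                      scale: float = 1.0, prior_weight: float = 1.0) -> float:
    """Variance of one observation under the usual GLM convention (assumed)."""
    return scale * VARIANCE_FUNCTIONS[family](mu) / prior_weight

# Binomial variability peaks at mu = 0.5 and shrinks toward the extremes.
v_mid = response_variance("binomial", 0.5)
v_extreme = response_variance("binomial", 0.99)
```

Higher powers of μ make the assumed variance grow faster with the predicted value, which is why the choice of error structure encodes a view on how uncertainty relates to the prediction.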

Slide 19: Error Structure: Scale Parameter
Heterogeneous exposure bases:
- If modeled together, exposures with high variability may mask patterns of less random risks
- If loss trends vary by exposure class, the proportion each represents of the total will change and may mask important trends
- Independent predictors can have different effects on different perils
If the data cannot be split, use joint modeling techniques to improve overall fit.
[Figure: two residual plots. One shows a cluster of negative residuals caused by the combination of two types of risks; the other shows that joint modeling techniques improved the residual plot.]

Slide 20: GLM Building Blocks: Model Structure
y = h(Linear Combination of Rating Factors) + Error
Include variables that are predictive, exclude those that are not:
- Gender may not have a major impact on theft severity
Simplify some rating factors if full inclusion is not necessary:
- Some levels within a particular predictor may be grouped together (e.g., a band of ages)
- A curve may replicate the signal (e.g., amount of insurance)
- Scoring levels can combine rating factors into a single concept, thereby untangling the impacts of various factors
Complicate the model if the relationship between levels of one variable depends on another characteristic:
- The difference between males and females depends on age

Slide 21: Complicating the Model: Interactions
Interactions are required when the combined effect of multiple rating levels of two different independent rating factors is different than the additive effect of the simple parameters.
Interaction topics:
- Interactions versus correlations
- Identifying interactions
- Full and partial interactions
- Simplifying interactions
Sections: Definitions, Parameterization, Identification, Simplification

Slide 22: Interactions versus Correlations
Distributional Correlations: Observed Weights

| | Male | Female | Totals |
|---|---|---|---|
| Youthful | w_YM | w_YF | w_Y. |
| Adult | w_AM | w_AF | w_A. |
| Mature | w_MM | w_MF | w_M. |
| Seniors | w_SM | w_SF | w_S. |
| Totals | w_.M | w_.F | w_.. |

- Assumption of distributional independence
- Testing the assumption: Cramér's V scales the test statistic to the range from 0 to 1
A simple GLM addresses distributional correlations.
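The definitions that followed "Let:" on this slide are missing from the transcript. As an illustration only, the usual way to test distributional independence on a table of weights is to compare each cell with the value expected from the row and column totals, form a chi-squared statistic, and rescale it to Cramér's V. The function name `cramers_v` and the example weights are hypothetical.

```python
import numpy as np

def cramers_v(weights) -> float:
    """Cramér's V for a two-way table of observed weights (illustrative)."""
    w = np.asarray(weights, dtype=float)
    total = w.sum()
    # Expected weights under independence: row total * column total / grand total.
    expected = np.outer(w.sum(axis=1), w.sum(axis=0)) / total
    chi2 = ((w - expected) ** 2 / expected).sum()
    k = min(w.shape) - 1  # smaller table dimension minus one
    return float(np.sqrt(chi2 / (total * k)))

# Age (rows) by gender (columns); the figures are made up for illustration.
independent = [[100, 200], [300, 600], [200, 400], [50, 100]]
skewed = [[180, 120], [300, 600], [200, 400], [50, 100]]
v_independent = cramers_v(independent)
v_skewed = cramers_v(skewed)
```

A value near 0 suggests the two rating factors are distributed independently; values toward 1 indicate that certain combinations of levels are strongly over- or under-represented.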

Slide 23: Interactions versus Correlations
Interactions: Observed Responses

| | Male | Female |
|---|---|---|
| Youthful | y_YM | y_YF |
| Adult | y_AM | y_AF |
| Mature | y_MM | y_MF |
| Seniors | y_SM | y_SF |

- Assumption of simple model adequacy
- Testing the assumption: the chi-squared test scales this statistic to the range (0, 1)
A more complex GLM is required to handle such interactions.

Slide 24: Identifying Interactions
Decomposing the interaction for simplification:
- Graphically
- Chi-squared test
- Sign test

Slide 25: Identifying Interactions
Understanding complex relationships between multiple rating factors:
- Can view interactions on an "absolute" scale …
- … or on a rescaled basis.

Slide 26: Identifying Interactions
Consistency of the interaction over time:
- Three-way interaction with time …

Slide 27: Parameter Notation: Simple Model
The relationship between rating levels of one rating factor is constant for all levels of other rating variables.
Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

Simple Model: Age + Gender (5 parameters)

| | Male | Female |
|---|---|---|
| Youthful | β0 + βY | β0 + βY + βF |
| Adult | β0 | β0 + βF |
| Mature | β0 + βM | β0 + βM + βF |
| Seniors | β0 + βS | β0 + βS + βF |

Slide 28: Simple Factor Model
Simple Model: Age + Gender
The relationship between males and females is a constant exp(βF) at each age.

Slide 29: Parameter Notation: Full Interaction
The relationship between rating levels of one rating factor is different for all levels of another rating variable.
Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

No interaction: Age + Gender (5 parameters)

| | Male | Female |
|---|---|---|
| Youthful | β0 + βY | β0 + βY + βF |
| Adult | β0 | β0 + βF |
| Mature | β0 + βM | β0 + βM + βF |
| Seniors | β0 + βS | β0 + βS + βF |

Interaction: Age + Gender + Age.Gender (8 parameters)

| | Male | Female |
|---|---|---|
| Youthful | β0 + βY | β0 + βY + βF + βYF |
| Adult | β0 | β0 + βF |
| Mature | β0 + βM | β0 + βM + βF + βMF |
| Seniors | β0 + βS | β0 + βS + βF + βSF |

Slide 30: Full Interaction Factor Model
- Simple Model: Age + Gender. The relationship between males and females is a constant exp(βF) at each age.
- Full Interaction Model: Age + Gender + Age.Gender. The relationship between males and females is different at each age.

Slide 31: Parameter Notation: Partial Interaction
The relationship between rating levels of one rating factor is different for all levels of another rating variable, except there is no difference at the base level of the other rating variable.
Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

Interaction: Age + Gender + Age.Gender (8 parameters)

| | Male | Female |
|---|---|---|
| Youthful | β0 + βY | β0 + βY + βF + βYF |
| Adult | β0 | β0 + βF |
| Mature | β0 + βM | β0 + βM + βF + βMF |
| Seniors | β0 + βS | β0 + βS + βF + βSF |

Partial Interaction: Age + Age.Gender (7 parameters)

| | Male | Female |
|---|---|---|
| Youthful | β0 + βY | β0 + βY + βYF |
| Adult | β0 | β0 |
| Mature | β0 + βM | β0 + βM + βMF |
| Seniors | β0 + βS | β0 + βS + βSF |

Slide 32: Partial Interaction Factor Model
- Full Interaction Model: Age + Gender + Age.Gender. The relationship between males and females is different at each age.
- Partial Interaction Model: Age + Age.Gender. The interaction adjusts for removal of the simple gender term, except at the base level (36-40).

Slide 33: Parameter Notation: Partial Interaction
The relationship between rating levels of one rating factor is different for all levels of another rating variable, except there is no difference at the base level of the other rating variable.
Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

Interaction: Age + Gender + Age.Gender (8 parameters)

| | Male | Female |
|---|---|---|
| Youthful | β0 + βY | β0 + βY + βF + βYF |
| Adult | β0 | β0 + βF |
| Mature | β0 + βM | β0 + βM + βF + βMF |
| Seniors | β0 + βS | β0 + βS + βF + βSF |

Partial Interaction: Gender + Age.Gender (5 parameters)

| | Male | Female |
|---|---|---|
| Youthful | β0 | β0 + βF + βYF |
| Adult | β0 | β0 + βF |
| Mature | β0 | β0 + βF + βMF |
| Seniors | β0 | β0 + βF + βSF |

Slide 34: Partial Interaction Factor Model
- Full Interaction Model: Age + Gender + Age.Gender. The relationship between males and females is different at each age.
- Partial Interaction Model: Gender + Age.Gender. The interaction adjusts for removal of the simple age term, except at the base level (male).

Slide 35: Parameter Notation: Interaction Only
The only variation allowed is at non-base levels.
Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

Interaction: Age + Gender + Age.Gender (8 parameters)

| | Male | Female |
|---|---|---|
| Youthful | β0 + βY | β0 + βY + βF + βYF |
| Adult | β0 | β0 + βF |
| Mature | β0 + βM | β0 + βM + βF + βMF |
| Seniors | β0 + βS | β0 + βS + βF + βSF |

Interaction Only: Age.Gender (4 parameters)

| | Male | Female |
|---|---|---|
| Youthful | β0 | β0 + βYF |
| Adult | β0 | β0 |
| Mature | β0 | β0 + βMF |
| Seniors | β0 | β0 + βSF |

Slide 36: Interaction Only Factor Model
- Partial Interaction Model: Gender + Age.Gender. The interaction adjusts for removal of the simple age term, except at the base level (male).
- Interaction Only Model: Age.Gender. The interaction adjusts for removal of the simple terms, except at the base levels (male and 36-40).

Slide 37: Parameter Notation: Summaries

| Cell | Interaction: Age + Gender + Age.Gender | Partial Interaction: Age + Age.Gender | Partial Interaction: Gender + Age.Gender | Interaction Only: Age.Gender |
|---|---|---|---|---|
| Youthful, Male | β0 + βY | β0 + βY | β0 | β0 |
| Youthful, Female | β0 + βY + βF + βYF | β0 + βY + βYF | β0 + βF + βYF | β0 + βYF |
| Adult, Male | β0 | β0 | β0 | β0 |
| Adult, Female | β0 + βF | β0 | β0 + βF | β0 |
| Mature, Male | β0 + βM | β0 + βM | β0 | β0 |
| Mature, Female | β0 + βM + βF + βMF | β0 + βM + βMF | β0 + βF + βMF | β0 + βMF |
| Seniors, Male | β0 + βS | β0 + βS | β0 | β0 |
| Seniors, Female | β0 + βS + βF + βSF | β0 + βS + βSF | β0 + βF + βSF | β0 + βSF |
| # Parameters | 8 | 7 | 5 | 4 |
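The parameter counts quoted for each structure can be checked by building the corresponding design matrix over the eight age-by-gender cells and taking its rank. This is a small sketch under the slide's stated base levels (Adult and Male); the function `design_matrix` and the term labels are hypothetical names chosen for illustration.

```python
import numpy as np
from itertools import product

AGES = ["Youthful", "Adult", "Mature", "Seniors"]   # Adult is the base level
GENDERS = ["Male", "Female"]                        # Male is the base level
CELLS = list(product(AGES, GENDERS))

def design_matrix(terms):
    """Dummy-coded design matrix over the 8 cells for the given model terms."""
    cols = {"Intercept": [1.0 for _ in CELLS]}
    non_base_ages = [a for a in AGES if a != "Adult"]
    if "Age" in terms:
        for a in non_base_ages:
            cols[a] = [float(age == a) for age, _ in CELLS]
    if "Gender" in terms:
        cols["Female"] = [float(g == "Female") for _, g in CELLS]
    if "Age.Gender" in terms:
        for a in non_base_ages:
            cols[a + ".Female"] = [float(age == a and g == "Female")
                                   for age, g in CELLS]
    return np.array(list(cols.values())).T, list(cols)

STRUCTURES = {
    "Age + Gender": ["Age", "Gender"],
    "Age + Gender + Age.Gender": ["Age", "Gender", "Age.Gender"],
    "Age + Age.Gender": ["Age", "Age.Gender"],
    "Gender + Age.Gender": ["Gender", "Age.Gender"],
    "Age.Gender": ["Age.Gender"],
}

# Number of estimable parameters = rank of the design matrix.
param_counts = {name: int(np.linalg.matrix_rank(design_matrix(terms)[0]))
                for name, terms in STRUCTURES.items()}
```

The ranks reproduce the counts on the slides: 5 for the simple model, 8 for the full interaction, 7 and 5 for the two partial interactions, and 4 for the interaction-only model.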

Slide 38: Simplifying Interactions
Complex relationships can be simplified using curves, groups, etc.
- Age curve simplified with several curves and a grouping.
- Relationship between males and females simplified too.
[Figure labels: 3rd Degree Curve; 4th Degree Curve; Ages Grouped; Males same as Females; Male/Female Relativity same for (range not preserved); Male/Female Relativity varies by Age]

Slide 39: Testing Assumptions: Macro Residual Analysis
A plot of all residuals tests the selected error structure/link function:
- Elliptical pattern is ideal
- Asymmetrical appearance suggests the power of the variance function is too low
- Two concentrations suggest two perils: split or use joint modeling
- Use crunched residuals for frequency

Slide 40: Testing Assumptions: Micro Residual Analysis
Examine the largest residuals:
- Standardized deviance gives a measure of "fit" (performance)
- Cook's distance gives a measure of "influence"

| Fit \ Influence | Small | Large |
|---|---|---|
| Good | OK | OK |
| Poor | OK | Problem |

"Problem" points may require further investigation.

Slide 41: Testing Predictiveness: Sampling
[Flow diagram: the data are split into Training Data (80%) and Test Data (20%). The model structure and parameters are built on the training data and tested on the test data. If the test is OK, done; if not OK, return to the build step.]

Slide 42: Testing Predictiveness: Bootstrapping
[Flow diagram: bootstrapped data are split into Training Data (80%) and Test Data (20%). The model structure is built and the model parameters are tested. If the test is OK, done; if not OK, return to the build step.]
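The two sampling schemes can be sketched in terms of row indices. This is an illustrative outline, not the presenter's procedure: the function names, the fixed seed and the use of resampling with replacement for the bootstrap are assumptions.

```python
import numpy as np

def train_test_indices(n_rows: int, train_fraction: float = 0.8, seed: int = 0):
    """Randomly split row indices into training and test sets (80/20 by default)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    cut = int(round(train_fraction * n_rows))
    return shuffled[:cut], shuffled[cut:]

def bootstrap_indices(n_rows: int, n_resamples: int, seed: int = 0):
    """Draw bootstrap samples: each resample picks n_rows rows with replacement."""
    rng = np.random.default_rng(seed)
    return [rng.integers(0, n_rows, size=n_rows) for _ in range(n_resamples)]

train_idx, test_idx = train_test_indices(1000)
resamples = bootstrap_indices(1000, n_resamples=5)
```

With the plain split, both structure and parameters are judged once on the held-out 20%. With bootstrapping, a chosen structure can be refitted on each resample to see how stable its parameters are.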

Slide 43: Testing Predictiveness: Gains Curve
Gains curves:
- Order observations by fitted values (descending).
- Plot cumulative fitted against cumulative weight.
- High fitted values should correspond to high observed values.
- Gini coefficient: a larger coefficient implies greater predictiveness.
[Figure: gains curve, Fitted Number of Claims]
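A gains curve and Gini coefficient can be computed in a few lines. The sketch below follows one common reading of the steps above: observations are sorted by fitted value, and the cumulative share of the observed response is accumulated against the cumulative share of weight. The slide may have plotted a slightly different quantity, and the function names are hypothetical.

```python
import numpy as np

def gains_curve(fitted, observed, weight):
    """Cumulative share of weight (x) and of observed response (y), by descending fitted value."""
    fitted = np.asarray(fitted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    weight = np.asarray(weight, dtype=float)
    order = np.argsort(-fitted, kind="stable")
    x = np.concatenate([[0.0], np.cumsum(weight[order])]) / weight.sum()
    y = np.concatenate([[0.0], np.cumsum(observed[order])]) / observed.sum()
    return x, y

def gini_coefficient(fitted, observed, weight) -> float:
    """Twice the area between the gains curve and the diagonal (trapezoid rule)."""
    x, y = gains_curve(fitted, observed, weight)
    area = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)
    return float(2.0 * area - 1.0)

# Toy data: four risks with equal weight.
observed_claims = [4.0, 3.0, 2.0, 1.0]
weights = [1.0, 1.0, 1.0, 1.0]
gini_good = gini_coefficient([4.0, 3.0, 2.0, 1.0], observed_claims, weights)
gini_bad = gini_coefficient([1.0, 2.0, 3.0, 4.0], observed_claims, weights)
```

A model that ranks the risks correctly gives a positive coefficient, while one that ranks them in reverse gives a negative one.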

Slide 44: Summary
Predictive models correct for methodological flaws associated with traditional approaches:
- Exclude unsystematic effects
- Correct distributional bias
- Identify and model response correlation
GLM building blocks can be refined to find the signals for stochastic processes:
- Link functions can be adjusted to create non-linear solutions
- Error distributions are expanded to understand the relationship between the uncertainty and the prediction
- Model structures are flexible to reflect the underlying process
GLM diagnostics support better decision making.

Questions?