Published by Roderick Mosley. Modified over 9 years ago.
1
America CAS Seminar on Ratemaking March 2005 Presented by: Serhat Guven An Introduction to GLM Theory Refinements
2
An Introduction to GLM Theory

PURPOSE: To discuss techniques that refine the structural design of the GLM, including proper tools and diagnostics, thereby allowing for a more flexible model.

OUTLINE
- Background
- GLM Building Blocks
  - Link Function
  - Error Distribution
  - Model Structure
- Diagnostics
- Summary
3
Purpose of Predictive Modeling

To predict a response variable using a series of explanatory variables (or rating factors).

Traditional methods focus on the parameters; modeling requires the analyst to consider the validation of the parameters.

Inputs to the statistical model:
- Dependent/Response: Losses, Claims, Retention
- Weights: Claims, Exposures, Premium
- Independent/Predictors: Age, Accidents, Limit, Convictions, Territory, Credit Score

Model results:
- Parameters
- Validation Statistics
4
Purpose of Predictive Modeling

To produce a sensible model that explains recent historical experience and is likely to be predictive of future experience.

Model complexity (number of parameters), from least to most complex:
- Overall mean: strong predictive power yet very poor explanatory power
- "Best" model
- 1 parameter for each observation: good predictor of previous experience but poor predictor of future experience

Traditional methods tend to create unnecessarily complex structures that overfit the data.
5
Generalized Linear Models

GLMs generalize the traditional regression models by introducing nonlinearity through the link function and loosening the normality assumption.

y = h(Linear Combination of Rating Factors) + Error

Response Variable = Systematic Component (Signal) + Random Component (Noise)

- g = h^-1 is called the LINK function and is chosen to measure the "signal" most accurately.
- The linear combination of rating factors is the model structure.
- The error should reflect the underlying process and can come from the exponential family.
6
Generalized Linear Models

More formally:

Response Variable = Systematic Component + Random Component

The components are:
- Link function (g = h^-1): links the random and systematic components.
- Design matrix: identifies the predictor variables for each observation.
- Parameters: quantities estimated via the log-likelihood function.
- Offset term: allows incorporation of known effects or restrictions.
- Scale parameter
- Variance function
- Prior weights
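The formulas on this slide did not survive in the transcript. As a hedged sketch, the standard textbook form that uses the same named components is written below; the symbols (X for the design matrix, β for the parameters, ξ for the offset, φ for the scale parameter, V for the variance function, ω for the prior weights) are assumed notation and may differ from the slide.

```latex
\begin{aligned}
\mathrm{E}[Y_i] &= \mu_i = g^{-1}\Big(\sum_j X_{ij}\,\beta_j + \xi_i\Big), \\
\mathrm{Var}[Y_i] &= \frac{\phi\, V(\mu_i)}{\omega_i}.
\end{aligned}
```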
7
Generalized Linear Models

The general solution for the GLM parameters is built from:
- Link function
- Error Distribution
- Model Structure
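The solution formula itself is not preserved in the transcript. One standard way to solve for GLM parameters is iteratively reweighted least squares (IRLS), and the sketch below shows it for a Poisson error with a log link. It assumes no offset and unit prior weights; the function names `solve` and `irls_poisson_log` and the toy data are hypothetical and are not taken from the presentation.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def irls_poisson_log(X, y, n_iter=25):
    """Fit a Poisson GLM with log link by IRLS (column 0 of X is assumed to be the intercept)."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / n)  # start at the overall mean
    for _ in range(n_iter):
        eta = [sum(xij * bj for xij, bj in zip(xi, beta)) for xi in X]
        mu = [math.exp(e) for e in eta]
        # For Poisson + log link: working weight = mu, working response z = eta + (y - mu) / mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(mu[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(mu[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        beta = solve(xtwx, xtwz)  # weighted least squares step
    return beta


# Toy data: intercept plus one binary rating factor
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
y = [1, 3, 4, 8]
beta = irls_poisson_log(X, y)
print([round(b, 4) for b in beta])
```

With a single binary factor the fitted means reproduce the two group averages, which gives a quick sanity check on the iteration.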
8
Components of a GLM

GLMs generalize the traditional regression models by introducing nonlinearity through the link function and loosening the normality assumption.

y = h(Linear Combination of Rating Factors) + Error

Response Variable = Systematic Component (Signal) + Random Component (Noise)

Basic Building Blocks
- Link Function
- Error Structure
- Model Structure
9
GLM Building Blocks: Link Functions

y = h(Linear Combination of Rating Factors) + Error

The link function (g = h^-1) is chosen based on how the factors are related, to produce the best signal:
- Log: variables related multiplicatively (e.g., risk modeling)
- Identity: variables related additively (e.g., risk modeling)
- Logit: retention or risk modeling
- Reciprocal: canonical link for gamma distribution (e.g., severity modeling)
- Mixed: additive/multiplicative rating algorithms
10
Example: Log Link

The signal allows us to populate rating tables:

Premium = Base Premium x Policyholder Age x Rating Area

Base Premium = 1,000 = exp(6.908)

Policyholder Age (p)   Relativity
Youthful               1.700 = exp(0.531)
Adult                  1.000 = exp(0.000)
Mature                 0.800 = exp(-0.223)
Seniors                1.100 = exp(0.095)

Rating Area (r)        Relativity
A                      0.900 = exp(-0.105)
B                      1.000 = exp(0.000)
C                      1.150 = exp(0.140)
D                      1.300 = exp(0.262)
E                      1.500 = exp(0.405)

The rating structure is multiplicative, and the premium for a youthful policyholder living in Area C is:

$1,955 = $1,000 x 1.700 x 1.150
       = exp(6.908) * exp(0.531) * exp(0.140)
       = exp(6.908 + 0.531 + 0.140)
       = exp(b + p + r)
       = h(linear combination of rating factors)
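The arithmetic on this slide can be checked with a short sketch. The relativities are the ones given in the rating tables; the function names `premium` and `premium_via_log_link` are hypothetical.

```python
import math

BASE_PREMIUM = 1000.0
AGE_RELATIVITY = {"Youthful": 1.700, "Adult": 1.000, "Mature": 0.800, "Seniors": 1.100}
AREA_RELATIVITY = {"A": 0.900, "B": 1.000, "C": 1.150, "D": 1.300, "E": 1.500}


def premium(age, area):
    """Multiplicative rating: base premium times the two relativities."""
    return BASE_PREMIUM * AGE_RELATIVITY[age] * AREA_RELATIVITY[area]


def premium_via_log_link(age, area):
    """Same premium, built as exp(b + p + r) from the log-scale parameters."""
    b = math.log(BASE_PREMIUM)
    p = math.log(AGE_RELATIVITY[age])
    r = math.log(AREA_RELATIVITY[area])
    return math.exp(b + p + r)


print(round(premium("Youthful", "C"), 2))
print(round(premium_via_log_link("Youthful", "C"), 2))
```

Both routes give the same premium, which is the point of the log link: a sum on the linear-predictor scale is a product on the premium scale.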
11
GLM Building Blocks: Link Functions

The link function relates the independent predictors to the response in a nonlinear form:
- Pure Multiplicative: Log
- Pure Additive: Identity
- Logit
- Reciprocal
12
Mixed Rating Algorithms

Mixed additive-multiplicative rating algorithm:

Base Rate * (Age + Gender + Usage) * Driving Record * Territory * Limit

Mixture model structures do not fit within the framework of garden-variety GLMs.

Solutions:
- Create n-way tables from the pure multiplicative models
- Build a hierarchical GLM to systematically estimate the additive component from the multiplicative factors
- Remove the restriction on the link function to calculate parameters directly
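To make the structure of the mixed algorithm concrete, here is a minimal sketch that evaluates it for one policy. All factor values below are made-up illustrations, not figures from the presentation, and `mixed_premium` is a hypothetical name.

```python
def mixed_premium(base_rate, additive, multiplicative):
    """Base Rate * (sum of additive factors) * (product of multiplicative factors)."""
    additive_part = sum(additive.values())
    product = 1.0
    for factor in multiplicative.values():
        product *= factor
    return base_rate * additive_part * product


# Hypothetical factor values for a single policy
additive = {"Age": 0.6, "Gender": 0.3, "Usage": 0.2}
multiplicative = {"Driving Record": 1.2, "Territory": 0.9, "Limit": 1.0}

example_premium = mixed_premium(500.0, additive, multiplicative)
print(round(example_premium, 2))
```

Because the additive block sits inside a product, no single link function turns the whole expression into a linear predictor, which is why the slide lists workarounds.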
13
Generalized Nonlinear Models

Examples of models that do not fit into traditional GLMs:
- Mixed additive-multiplicative model
- Alternative mixtures
- Complicating the logit functions

Alternate link functions make it possible to introduce additional nonlinearities into the solution.
14
Solution for Non-Linear Models

- Use the identity link function.
- Replace the design matrix X in the GLM solution with D.
- The ij-th element of D is the derivative of the fitted value with regard to the j-th parameter for the i-th observation.
- This is roughly equivalent to solving a GLM with a different design matrix; however, there are two "linear predictors" in use.

"…the method is likely to be most useful for determining if a reasonable fit can be improved, rather than for the somewhat more optimistic goal of correcting a hopeless situation." -Pregibon (1980)
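The slide's own formulas are not preserved. As a hedged sketch, one common way to write such an iteration is the Fisher scoring update below, with D and W evaluated at the current parameter estimate. The notation (μ for the fitted value, φ for the scale parameter, V for the variance function, ω for the prior weights) is assumed, and this may not be the exact form the presenter used.

```latex
\begin{aligned}
D_{ij} &= \frac{\partial \mu_i}{\partial \beta_j}, \qquad
W = \operatorname{diag}\!\left(\frac{\omega_i}{\phi\,V(\mu_i)}\right), \\
\beta^{(t+1)} &= \beta^{(t)} + \left(D^{\top} W D\right)^{-1} D^{\top} W \,(y - \mu).
\end{aligned}
```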
15
GLM Building Blocks: Error Structure

y = h(Linear Combination of Rating Factors) + Error

The error structure reflects the variability of the underlying process:
- Gamma: consistent with severity modeling; may want to try Inverse Gaussian
- Poisson: consistent with frequency modeling
- Tweedie: consistent with pure premium modeling
- Normal: useful for a variety of applications
16
Error Structure: Variance Function

Observed Response   Error Structure    Variance Function V(µ)   Scale Parameter
                    Normal             µ^0
Claim Frequency     Poisson            µ                        1
Claim Severity      Gamma              µ^2
Risk Premium        Gamma or Tweedie   µ^T
New/Renewal Rate    Binomial           µ(1-µ)                   1

The error structure is also used to incorporate assumptions about the uncertainty and the prediction.
17
Example: Binomial Distribution

Binomial:
- Basic functional form in decision modeling
- Belongs to the exponential family of distributions
- Can be extended to multinomial distributions

Variance Function = µ(1-µ)
- Extreme probabilities of success/failure are related to low variability
- Higher variability is associated with less certain probability outcomes
18
Additional Variance Functions

Observed Response   Error Structure          Variance Function V(µ)   Scale Parameter
                    Normal                   µ^0
Claim Frequency     Poisson                  µ                        1
Claim Severity      Gamma                    µ^2
Risk Premium        Gamma or Tweedie         µ^T
New/Renewal Rate    Binomial                 µ(1-µ)                   1
Claim Frequency     Over-dispersed Poisson   µ                        K
Claim Severity      Inverse Gaussian         µ^3

The error structure is also used to incorporate assumptions about the uncertainty and the predicted value.
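The variance functions in the table can be written out directly. This is a minimal sketch that assumes the standard GLM form Var(y) = scale x V(µ) / prior weight; the names `VARIANCE_FUNCTIONS`, `tweedie`, and `variance` are hypothetical, and the Tweedie power of 1.5 is an arbitrary illustration.

```python
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,                # mu^0
    "poisson": lambda mu: mu,                # mu
    "gamma": lambda mu: mu ** 2,             # mu^2
    "inverse_gaussian": lambda mu: mu ** 3,  # mu^3
    "binomial": lambda mu: mu * (1.0 - mu),  # mu (1 - mu)
}


def tweedie(power):
    """Variance function mu^T for a chosen Tweedie power T."""
    return lambda mu: mu ** power


def variance(family, mu, scale=1.0, weight=1.0):
    """Assumed form: Var(y) = scale * V(mu) / weight."""
    return scale * VARIANCE_FUNCTIONS[family](mu) / weight


# Binomial: variability is highest at mu = 0.5 and falls toward the extremes
print(variance("binomial", 0.5), variance("binomial", 0.99))
# Over-dispersed Poisson: Poisson variance function with scale K (here K = 2)
print(variance("poisson", 3.0, scale=2.0))
# Tweedie with an illustrative power of 1.5
print(tweedie(1.5)(4.0))
```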
19
Error Structure: Scale Parameter

Heterogeneous exposure bases:
- If modeled together, exposures with high variability may mask patterns of less random risks
- If loss trends vary by exposure class, the proportion each represents of the total will change and may mask important trends
- Independent predictors can have different effects on different perils
- If the data cannot be split, use joint modeling techniques to improve overall fit

[Figure: residual plot showing a cluster of negative residuals caused by a combination of two types of risks]
[Figure: residual plot improved by joint modeling techniques]
20
GLM Building Blocks: Model Structure

y = h(Linear Combination of Rating Factors) + Error

Include variables that are predictive, exclude those that are not:
- Gender may not have a major impact on theft severity

Simplify some rating factors, if full inclusion is not necessary:
- Some levels within a particular predictor may be grouped together (e.g., 50-54 year olds)
- A curve may replicate the signal (e.g., amount of insurance)
- Scoring levels can combine rating factors into a single concept, thereby untangling the impacts of various factors

Complicate the model if the relationship between levels of one variable depends on another characteristic:
- The difference between males and females depends on age
21
Complicating the Model: Interactions

Interactions are required when the combined effect of the rating levels of two different independent rating factors differs from the additive effect of the simple parameters.

Interaction topics:
- Interactions versus correlations
- Identifying interactions
- Full and partial interactions
- Simplifying interactions
22
Interactions versus Correlations

Distributional Correlations: Observed Weights

            Male    Female   Totals
Youthful    w_YM    w_YF     w_Y.
Adult       w_AM    w_AF     w_A.
Mature      w_MM    w_MF     w_M.
Seniors     w_SM    w_SF     w_S.
Totals      w_.M    w_.F     w_..

- Assumption: distributional independence.
- Testing the assumption: Cramer's V rescales the test statistic to the range 0 to 1.

A simple GLM addresses distributional correlations.
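The slide's formulas are not preserved, so the sketch below uses the usual textbook definition: under independence the expected weight in a cell is (row total x column total) / grand total, the chi-squared statistic compares observed and expected weights, and Cramer's V is sqrt(chi-squared / (total x min(rows - 1, columns - 1))). The function name `cramers_v` and the example weights are hypothetical.

```python
import math


def cramers_v(table):
    """Cramer's V for a two-way table of observed weights (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi_squared = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # independence assumption
            chi_squared += (observed - expected) ** 2 / expected
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi_squared / (total * k))


# Hypothetical exposure weights by age (rows) and gender (columns)
independent = [[10, 20], [20, 40]]   # gender mix is identical in every age band
dependent = [[10, 0], [0, 10]]       # gender is fully determined by age band

print(cramers_v(independent), cramers_v(dependent))
```

A value near 0 indicates the two rating factors are distributed independently; a value near 1 indicates that one factor's distribution is almost fully determined by the other.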
23
Interactions versus Correlations

Interactions: Observed Responses

            Male    Female
Youthful    y_YM    y_YF
Adult       y_AM    y_AF
Mature      y_MM    y_MF
Seniors     y_SM    y_SF

- Assumption: the simple model is adequate.
- Testing the assumption: the chi-squared test scales the statistic from 0 to 1.

A more complex GLM is required to handle such interactions.
24
Identifying Interactions

Decomposing the interaction for simplification:
- Graphically
- Chi-squared test
- Signs test
25
Identifying Interactions

Understanding complex relationships between multiple rating factors:
- Can view interactions on an "absolute" scale …
- … or on a rescaled basis.
26
Identifying Interactions

Consistency of the interaction over time:
- Three-way interaction with time …
27
Parameter Notation: Simple Model

The relationship between rating levels of one rating factor is constant for all levels of other rating variables.

Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

Simple Model: Age + Gender (5 parameters)

            Male           Female
Youthful    β_0 + β_Y      β_0 + β_Y + β_F
Adult       β_0            β_0 + β_F
Mature      β_0 + β_M      β_0 + β_M + β_F
Seniors     β_0 + β_S      β_0 + β_S + β_F
28
Simple Factor Model

Simple Model: Age + Gender
The relationship between males and females is a constant exp(β_F) at each age.
29
Parameter Notation: Full Interaction

The relationship between rating levels of one rating factor is different for all levels of another rating variable.

Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

No interaction: Age + Gender (5 parameters)

            Male           Female
Youthful    β_0 + β_Y      β_0 + β_Y + β_F
Adult       β_0            β_0 + β_F
Mature      β_0 + β_M      β_0 + β_M + β_F
Seniors     β_0 + β_S      β_0 + β_S + β_F

Interaction: Age + Gender + Age.Gender (8 parameters)

            Male           Female
Youthful    β_0 + β_Y      β_0 + β_Y + β_F + β_YF
Adult       β_0            β_0 + β_F
Mature      β_0 + β_M      β_0 + β_M + β_F + β_MF
Seniors     β_0 + β_S      β_0 + β_S + β_F + β_SF
30
Full Interaction Factor Model

Simple Model: Age + Gender
The relationship between males and females is a constant exp(β_F) at each age.

Full Interaction Model: Age + Gender + Age.Gender
The relationship between males and females is different at each age.
31
Parameter Notation: Partial Interaction

The relationship between rating levels of one rating factor is different for all levels of another rating variable, except there is no difference at the base level of the other rating variable.

Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

Interaction: Age + Gender + Age.Gender (8 parameters)

            Male           Female
Youthful    β_0 + β_Y      β_0 + β_Y + β_F + β_YF
Adult       β_0            β_0 + β_F
Mature      β_0 + β_M      β_0 + β_M + β_F + β_MF
Seniors     β_0 + β_S      β_0 + β_S + β_F + β_SF

Partial Interaction: Age + Age.Gender (7 parameters)

            Male           Female
Youthful    β_0 + β_Y      β_0 + β_Y + β_YF
Adult       β_0            β_0
Mature      β_0 + β_M      β_0 + β_M + β_MF
Seniors     β_0 + β_S      β_0 + β_S + β_SF
32
Partial Interaction Factor Model

Full Interaction Model: Age + Gender + Age.Gender
The relationship between males and females is different at each age.

Partial Interaction Model: Age + Age.Gender
The interaction adjusts for removal of the simple gender term, except at the base level (36-40).
33
Parameter Notation: Partial Interaction

The relationship between rating levels of one rating factor is different for all levels of another rating variable, except there is no difference at the base level of the other rating variable.

Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

Interaction: Age + Gender + Age.Gender (8 parameters)

            Male           Female
Youthful    β_0 + β_Y      β_0 + β_Y + β_F + β_YF
Adult       β_0            β_0 + β_F
Mature      β_0 + β_M      β_0 + β_M + β_F + β_MF
Seniors     β_0 + β_S      β_0 + β_S + β_F + β_SF

Partial Interaction: Gender + Age.Gender (5 parameters)

            Male           Female
Youthful    β_0            β_0 + β_F + β_YF
Adult       β_0            β_0 + β_F
Mature      β_0            β_0 + β_F + β_MF
Seniors     β_0            β_0 + β_F + β_SF
34
Partial Interaction Factor Model

Full Interaction Model: Age + Gender + Age.Gender
The relationship between males and females is different at each age.

Partial Interaction Model: Gender + Age.Gender
The interaction adjusts for removal of the simple age term, except at the base level (male).
35
Parameter Notation: Interaction Only

The only variation allowed is at non-base levels.

Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female

Interaction: Age + Gender + Age.Gender (8 parameters)

            Male           Female
Youthful    β_0 + β_Y      β_0 + β_Y + β_F + β_YF
Adult       β_0            β_0 + β_F
Mature      β_0 + β_M      β_0 + β_M + β_F + β_MF
Seniors     β_0 + β_S      β_0 + β_S + β_F + β_SF

Interaction Only: Age.Gender (4 parameters)

            Male           Female
Youthful    β_0            β_0 + β_YF
Adult       β_0            β_0
Mature      β_0            β_0 + β_MF
Seniors     β_0            β_0 + β_SF
36
Interaction Only Factor Model

Partial Interaction Model: Gender + Age.Gender
The interaction adjusts for removal of the simple age term, except at the base level (male).

Interaction Only Model: Age.Gender
The interaction adjusts for removal of the simple terms, except at the base levels (male and 36-40).
37
Parameter Notation: Summaries

Interaction: Age + Gender + Age.Gender (8 parameters)

            Male           Female
Youthful    β_0 + β_Y      β_0 + β_Y + β_F + β_YF
Adult       β_0            β_0 + β_F
Mature      β_0 + β_M      β_0 + β_M + β_F + β_MF
Seniors     β_0 + β_S      β_0 + β_S + β_F + β_SF

Partial Interaction: Age + Age.Gender (7 parameters)

            Male           Female
Youthful    β_0 + β_Y      β_0 + β_Y + β_YF
Adult       β_0            β_0
Mature      β_0 + β_M      β_0 + β_M + β_MF
Seniors     β_0 + β_S      β_0 + β_S + β_SF

Partial Interaction: Gender + Age.Gender (5 parameters)

            Male           Female
Youthful    β_0            β_0 + β_F + β_YF
Adult       β_0            β_0 + β_F
Mature      β_0            β_0 + β_F + β_MF
Seniors     β_0            β_0 + β_F + β_SF

Interaction Only: Age.Gender (4 parameters)

            Male           Female
Youthful    β_0            β_0 + β_YF
Adult       β_0            β_0
Mature      β_0            β_0 + β_MF
Seniors     β_0            β_0 + β_SF
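The parameter counts in these summaries can be reproduced by listing the parameters each structure keeps. In the sketch below the names `parameters` and `cell` are hypothetical, and parameters are labelled b0, bY, bM, bS, bF, bYF, bMF, bSF to mirror the β notation in the tables.

```python
AGES = ["Youthful", "Adult", "Mature", "Seniors"]
GENDERS = ["Male", "Female"]
BASE_AGE, BASE_GENDER = "Adult", "Male"


def parameters(age=True, gender=True, interaction=True):
    """List the parameters of a model built from the chosen terms."""
    non_base_ages = [a for a in AGES if a != BASE_AGE]
    names = ["b0"]                                          # intercept (base cell)
    if age:
        names += [f"b{a[0]}" for a in non_base_ages]        # simple age terms
    if gender:
        names.append("bF")                                  # simple gender term
    if interaction:
        names += [f"b{a[0]}F" for a in non_base_ages]       # Age.Gender terms
    return names


def cell(age_level, gender_level, names):
    """Parameters that enter the linear predictor for one age/gender cell."""
    active = {"b0"}
    if age_level != BASE_AGE:
        active.add(f"b{age_level[0]}")
    if gender_level != BASE_GENDER:
        active.add("bF")
    if age_level != BASE_AGE and gender_level != BASE_GENDER:
        active.add(f"b{age_level[0]}F")
    return [n for n in names if n in active]


structures = {
    "Age + Gender + Age.Gender": parameters(),
    "Age + Age.Gender": parameters(gender=False),
    "Gender + Age.Gender": parameters(age=False),
    "Age.Gender": parameters(age=False, gender=False),
}
for label, names in structures.items():
    print(label, len(names), cell("Youthful", "Female", names))
```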
38
Simplifying Interactions

Complex relationships can be simplified using curves, groups, etc.
- Age curve simplified with several curves and a grouping.
- Relationship between males and females simplified too.

[Figure: age relativities by gender, annotated: 3rd Degree Curve; 4th Degree Curve; Ages Grouped; Males same as Females; Male/Female Relativity same for 21-24; Male/Female Relativity varies by Age]
39
Testing Assumptions: Macro Residual Analysis

A plot of all residuals tests the selected error structure/link function:
- Elliptical pattern is ideal
- Asymmetrical appearance suggests the power of the variance function is too low
- Two concentrations suggest two perils: split or use joint modeling
- Use crunched residuals for frequency
40
Testing Assumptions: Micro Residual Analysis

Examine the largest residuals:
- Standardized deviance gives a measure of "fit" (performance)
- Cook's deviance gives a measure of "influence"

Fit versus influence:
- Good fit, small or large influence: OK
- Poor fit, small influence: OK
- Poor fit, large influence: Problem

"Problem" points may require further investigation.
41
Testing Predictiveness: Sampling

Training and Testing Data

[Flowchart: Data is split into Training Data (80%) and Test Data (20%); Build the Model Structure and Parameters; Test; if OK, Done; if Not OK, return to Build]
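A random 80/20 split like the one on this slide can be sketched as follows. The function name `split_train_test`, the fixed seed, and the use of record indices as stand-in data are assumptions made for illustration.

```python
import random


def split_train_test(records, train_fraction=0.8, seed=0):
    """Shuffle the records and split them into training and test sets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


policies = list(range(100))        # stand-in for 100 policy records
train, test = split_train_test(policies)
print(len(train), len(test))
```

The model is built on `train` only; `test` is held back to check whether the fitted model remains predictive on data it has not seen.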
42
Testing Predictiveness: Bootstrapping

Bootstrapping Data

[Flowchart: Data is split into Training Data (80%) and Test Data (20%); Model Structure; Build; Test; Model Parameters; if OK, Done; if Not OK, repeat]
43
Testing Predictiveness: Gains Curve

Gains Curves
- Order observations by fitted values (descending).
- Plot cumulative fitted against cumulative weight.
- High fitted values should correspond to high observed values.

Gini Coefficient: a larger coefficient implies greater predictiveness.

[Figure: gains curve for Fitted Number of Claims]
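A minimal sketch of a gains curve and Gini coefficient follows. It assumes the Gini coefficient is twice the area between the curve and the diagonal, computed with the trapezoid rule; the slide does not give its exact formula. The example cumulates the observed values in fitted order, and passing the fitted values as `values` instead gives the "cumulative fitted" version described on the slide. The names `gains_curve` and `gini` and the toy data are hypothetical.

```python
def gains_curve(order_by, values, weights):
    """Cumulative share of values against cumulative share of weight, ordered by order_by descending."""
    order = sorted(range(len(order_by)), key=lambda i: order_by[i], reverse=True)
    total_weight, total_value = sum(weights), sum(values)
    xs, ys = [0.0], [0.0]
    cum_weight = cum_value = 0.0
    for i in order:
        cum_weight += weights[i]
        cum_value += values[i]
        xs.append(cum_weight / total_weight)
        ys.append(cum_value / total_value)
    return xs, ys


def gini(xs, ys):
    """Twice the area between the gains curve and the diagonal (trapezoid rule)."""
    area = sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2 for k in range(1, len(xs)))
    return 2 * (area - 0.5)


# Toy data: four policies with equal weight; the highest fitted value has the only claim
fitted = [0.9, 0.1, 0.2, 0.3]
observed = [1, 0, 0, 0]
weights = [1, 1, 1, 1]

xs, ys = gains_curve(fitted, observed, weights)
print(gini(xs, ys))
```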
44
Summary

- Predictive models correct for methodological flaws associated with traditional approaches:
  - Exclude unsystematic effects
  - Correct distributional bias
  - Identify and model response correlation
- GLM building blocks can be refined to find the signals for stochastic processes:
  - Link functions can be adjusted to create nonlinear solutions
  - Error distributions are expanded to understand the relationship between the uncertainty and the prediction
  - Model structures are flexible to reflect the underlying process
- GLM diagnostics aid better decision making
45
Questions?