© 2012 Towers Watson. All rights reserved. GLM II Basic Modeling Strategy 2012 CAS Ratemaking and Product Management Seminar by Len Llaguno March 20, 2012.

Presentation transcript:

© 2012 Towers Watson. All rights reserved. GLM II Basic Modeling Strategy 2012 CAS Ratemaking and Product Management Seminar by Len Llaguno March 20, 2012

© 2012 Towers Watson. All rights reserved. Proprietary and Confidential. For Towers Watson and Towers Watson client use only. towerswatson.com 1

Building predictive models is a multi-step process
Process steps:
1. Set project goals and review background
2. Gather and prepare data
3. Explore data
4. Build component predictive models
5. Validate component models
6. Combine component models
7. Incorporate constraints
- Ernesto walked us through the first 3 components
- We will now go through an example of the remaining steps:
  - Building component predictive models: we will illustrate how to build a frequency model
  - Validating component models: we will illustrate how to validate your component model
- We will also briefly discuss combining models and incorporating implementation constraints
  - Goal should be to build best predictive models now and incorporate constraints later

Building component predictive models can be separated into two steps
- Initial modeling
  - Selecting error structure and link function
  - Building a simple initial model
  - Testing basic modeling assumptions and methodology
- Iterative modeling
  - Refining your initial models through a series of iterative steps: complicating the model, then simplifying the model, then repeating

Initial modeling
- Initial modeling is done to test basic modeling methodology
  - Is my link function appropriate?
  - Is my error structure appropriate?
  - Is my overall modeling methodology appropriate (e.g., do I need to cap losses? Exclude expense-only claims? Model by peril?)

Examples of error structures
- Error functions reflect the variability of the underlying process and can be any distribution within the exponential family, for example:
  - Gamma: consistent with severity modeling; may want to try Inverse Gaussian
  - Poisson: consistent with frequency modeling
  - Tweedie: consistent with pure premium modeling
  - Normal: useful for a variety of applications

Generally accepted error structure and link functions
- Use generally accepted standards as starting point for link functions and error structures
Observed Response | Most Appropriate Link Function | Most Appropriate Error Structure | Variance Function
--                |                                | Normal                           | µ^0
Claim Frequency   | Log                            | Poisson                          | µ^1
Claim Severity    | Log                            | Gamma                            | µ^2
Claim Severity    | Log                            | Inverse Gaussian                 | µ^3
Pure Premium      | Log                            | Gamma or Tweedie                 | µ^T
Retention Rate    | Logit                          | Binomial                         | µ(1-µ)
Conversion Rate   | Logit                          | Binomial                         | µ(1-µ)
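
The variance functions in the table above can be written out directly. The following is a minimal sketch, not code from the presentation: the function and dictionary names are hypothetical, and the Tweedie row is omitted because the slide gives its power only as T.

```python
# Illustrative only: variance functions for the error structures in the table.
# All names here are hypothetical and chosen for this sketch.

def power_variance(mu, p):
    """Variance function V(mu) = mu**p for the power-family error structures."""
    return mu ** p

def binomial_variance(mu):
    """Variance function V(mu) = mu * (1 - mu) for the binomial error structure."""
    return mu * (1.0 - mu)

# Power p for each error structure listed in the table.
VARIANCE_POWER = {
    "normal": 0,
    "poisson": 1,
    "gamma": 2,
    "inverse_gaussian": 3,
}

mu = 0.2  # an arbitrary mean used only to show how the functions differ
for name, p in VARIANCE_POWER.items():
    print(name, power_variance(mu, p))
print("binomial", binomial_variance(mu))
```

A higher power means the variance grows faster with the mean, which is why the choice of error structure matters most for the largest predicted values.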

Build an initial model
- Reasonable starting points for model structure
  - Prior model
  - Stepwise regression
  - General insurance knowledge
  - CART (Classification and Regression Trees) or similar algorithms

Test model assumptions
- Plot of all residuals tests selected error structure/link function
- Asymmetrical appearance suggests power of variance function is too low
- Elliptical pattern is ideal
- Two concentrations suggest two perils: split or use joint modeling
- Use crunched residuals for frequency

Example: initial frequency model
- Link function: Log
- Error structure: Poisson
- Initial variables selected based on industry knowledge:
  - Gender
  - Driver age
  - Vehicle value
  - Area (territory)
- Variables NOT in initial model:
  - Vehicle body
  - Vehicle age
[Figure: Gender Relativity]

Example: initial frequency model (continued)
[Figure: Area Relativity]

Example: initial frequency model (continued)
[Figure: Driver Age Relativity]

Example: initial frequency model (continued)
[Figure: Vehicle Value Relativity]
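
With a log link, each fitted coefficient becomes a multiplicative relativity once exponentiated, which is what the relativity charts in this example display. The sketch below shows that conversion. The coefficient values and level names are made up for illustration and are not taken from the presentation.

```python
import math

# Hypothetical fitted coefficients on the log scale (made-up numbers).
coefficients = {
    "intercept": math.log(0.10),  # base frequency of 0.10 claims per exposure
    "gender=M": 0.05,
    "area=B": -0.10,
}

def relativity(beta):
    """Convert a log-link coefficient into a multiplicative relativity."""
    return math.exp(beta)

def predicted_frequency(coefs, active_levels):
    """Predicted frequency = exp(intercept + sum of the active level coefficients)."""
    linear_predictor = coefs["intercept"] + sum(coefs[level] for level in active_levels)
    return math.exp(linear_predictor)

freq = predicted_frequency(coefficients, ["gender=M", "area=B"])
print("relativity gender=M:", round(relativity(coefficients["gender=M"]), 4))
print("predicted frequency:", round(freq, 4))
```

A level with coefficient 0 has relativity 1, so it is the base level against which the others are measured.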

Example: initial frequency model - residuals
- Frequency residuals are hard to interpret without 'crunching'
- Two clusters:
  - Data points with claims
  - Data points without claims

Example: initial frequency model - residuals
- Order observations from smallest to largest predicted value
- Group residuals into 500 buckets
- The graph plots the average residual in each bucket
- Crunched residuals look good!

Iterative modeling
- Initial models are refined using an iterative modeling approach
- Iterative modeling involves many decisions to complicate and simplify the models
- Your modeling toolbox can help you make these decisions
  - We will discuss your tools shortly
[Diagram labels: Review Model; Complicate (Include, Interactions); Simplify (Exclude, Group, Curves)]

Ideal model structure
- To produce a sensible model that explains recent historical experience and is likely to be predictive of future experience
- Underfit: predictive, poor explanatory power
- Overfit: poor predictive power, explains history
[Figure: model complexity (number of parameters) on the horizontal axis, running from "Overall mean" to "One parameter per observation", with "Best models" in between]

Your modeling toolbox
- Model decisions include:
  - Simplification: excluding variables, grouping levels, fitting curves
  - Complication: including variables, adding interactions
- Your modeling toolbox will help you make these decisions
- Your tools include:
  - Parameters/standard errors
  - Consistency of patterns over time or random data sets
  - Type III statistical tests (e.g., chi-square tests, F-tests)
  - Balance tests (i.e., actual vs. expected test)
  - Judgment (e.g., do the trends make sense?)

Modeling toolbox: judgment
- The modeler should also ask, 'does this pattern make sense?'
- Patterns may often be counterintuitive, but become reasonable after investigation
- Uses:
  - Inclusion/exclusion
  - Grouping
  - Fitting curves
  - Assessing interactions
[Figure: Modeled Frequency Relativity - Vehicle Value]

Modeling toolbox: balance test
- A balance test is essentially an actual vs. expected comparison
- It can identify variables that are not in the model and across which the model is not in 'balance'
- An imbalance indicates the variable may be explaining something not in the model
- Uses:
  - Inclusion
[Figure: Actual vs. Expected Frequency - Vehicle Age]
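
A balance test can be sketched as a grouped actual vs. expected comparison across the levels of a variable that is not in the model. The function name and the data below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def actual_vs_expected(levels, actual_claims, expected_claims):
    """Sum actual and model-expected claims by level and return their ratio.

    Returns {level: (actual, expected, actual / expected)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for level, actual, expected in zip(levels, actual_claims, expected_claims):
        totals[level][0] += actual
        totals[level][1] += expected
    return {
        level: (a, e, a / e if e else float("nan"))
        for level, (a, e) in totals.items()
    }

# Made-up data: vehicle age is NOT in the model that produced the expected claims.
vehicle_age = ["new", "new", "old"]
actual = [1, 0, 2]
expected = [0.5, 0.5, 1.0]
balance = actual_vs_expected(vehicle_age, actual, expected)
for level, (a, e, ratio) in balance.items():
    print(level, a, e, ratio)
```

Ratios far from 1 for well-populated levels suggest the model is out of balance across that variable, making it a candidate for inclusion.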

Modeling toolbox: parameters/standard errors
- Parameters and standard errors provide confidence in the pattern exhibited by the data
- Uses:
  - Horizontal line test for exclusion
  - Plateaus for grouping
  - A measure of credibility
[Figure: Modeled Frequency Relativities With Standard Errors - Vehicle Body]

Modeling toolbox: consistency of patterns
- Checking for consistency of patterns over time or across random parts of a data set is a good practical test
- Uses: validating modeling decisions
  - Including/excluding factors
  - Grouping levels
  - Fitting curves
  - Adding interactions
[Figure: Modeled Frequency Relativity - Age Category]

Modeling toolbox: type III tests
- A chi-square test and/or F-test is a good statistical test to compare nested models
  - H0: Two models are essentially the same
  - H1: Two models are not the same
- Principle of parsimony: if two models are the same, choose the simpler model
- Uses:
  - Inclusion/exclusion
Chi-Square Percentage | Meaning   | Action*
<5%                   | Reject H0 | Use More Complex Model
5%-15%                | Grey Area | ???
15%-30%               | Grey Area | ???
>30%                  | Accept H0 | Use Simpler Model
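
The decision table above can be expressed as a small helper. This is a sketch with hypothetical function names. The p-value helper covers only the special case of a deviance difference with one degree of freedom (and assumes a known scale parameter, as with the Poisson), where the chi-square tail probability reduces to a complementary error function; the general case needs a statistics library.

```python
import math

def chi_square_p_value_1df(deviance_drop):
    """P-value for a scaled deviance difference with ONE degree of freedom.

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    if deviance_drop < 0:
        raise ValueError("deviance difference between nested models cannot be negative")
    return math.erfc(math.sqrt(deviance_drop / 2.0))

def suggested_action(p_value):
    """Map a chi-square p-value to the action in the slide's decision table."""
    if p_value < 0.05:
        return "use more complex model"   # reject H0
    if p_value > 0.30:
        return "use simpler model"        # accept H0
    return "grey area"                    # 5%-30%: rely on the other tools

p = chi_square_p_value_1df(6.0)  # made-up deviance difference
print(round(p, 4), suggested_action(p))
```

In the grey area the table gives no prescribed action, so the other tools in the toolbox (judgment, consistency, standard errors) carry the decision.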

Example: frequency model iteration 1 - simplification
- Modeling decision: grouping Age Category and Area
- Tools used: judgment, parameter estimates/standard errors, type III test
[Figures: Area Relativity; Age Category Relativity. Chart annotations: Chi Sq P Val = 0.0%; Chi Sq P Val = 6.5%]

Example: frequency model iteration 1 - simplification
- Modeling decision: fitting a curve to vehicle value
- Tools used: judgment, type III test, consistency test
[Figures: Vehicle Value Relativity - Initial Model; Vehicle Value Relativity - Curve Fit. Chart annotation: Chi Sq P Val = 0.0%]

Example: frequency model iteration 2 - complication
- Modeling decision: adding vehicle body type
- Tools used: balance test, parameter estimates/standard errors, type III test
[Figure: Balance Test: Actual vs. Expected Across Vehicle Body Type (Vehicle Body Type Not In Model)]
[Figure: Vehicle Body Type Relativities (Vehicle Body Type Included in Model). Chart annotation: Chi Sq P Val = 1.3%]

Example: iterative modeling continued...
- Iteration 3 - simplification
  - Group vehicle body type
- Iteration 4 - complication
  - Add vehicle age
- Iteration 5 - simplification
  - Group vehicle age levels

Example: frequency model iteration 6 - complication
- Action: adding age x gender interaction
- Tools used: balance test, type III test, consistency test, judgment
[Figure: Balance Test: Two-Way Actual vs. Expected Across Age x Gender (Age x Gender Interaction NOT in model); legend: M, F]
[Figure labels as extracted: Vehicle Body Type Relativities; Vehicle Body Type Included in Model. Chart annotation: Chi Sq P Val = 47.5%]

Predictive models must be validated to have confidence in the predictive power of the models
- Model validation techniques include:
  - Examining residuals
  - Examining gains curves
  - Examining hold out samples
    - Changes in parameter estimates
    - Actual vs. expected on hold out sample
- Component models and combined risk premium model should be validated

Model validation: residual analysis
- Recheck residuals to ensure appropriate shape
  - Crunched residuals are symmetric
  - For severity: does the box-and-whisker plot show symmetry across levels?

Model validation: residual analysis (cont'd)
- Common issues with residual plots
  - Asymmetrical appearance suggests power of variance function is too low
  - Elliptical pattern is ideal
  - Two concentrations suggest two perils: split or use joint modeling
  - Use crunched residuals for frequency

Model validation: gains curves
- Gains curves are good for comparing predictiveness of models
  - Order observations from largest to smallest predicted value on the X axis
  - Cumulative actual claim counts (or losses) on the Y axis
  - As you move from left to right, the better model should accumulate actual losses faster
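
The gains curve construction above can be sketched as follows. The function name is hypothetical, and expressing both axes as shares of the total (rather than raw counts) is a choice made for this sketch so that curves from different models are directly comparable.

```python
def gains_curve(predicted, actual):
    """Cumulative share of actual claims, ordering observations by predicted value.

    Observations are sorted from largest to smallest predicted value.
    Returns (share of observations, cumulative share of actual) pairs.
    """
    n = len(predicted)
    order = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    total = sum(actual)
    points = []
    running = 0.0
    for rank, i in enumerate(order, start=1):
        running += actual[i]
        points.append((rank / n, running / total if total else 0.0))
    return points

# Made-up example: the highest predicted observation also has the most claims.
curve = gains_curve([0.1, 0.9, 0.5], [0, 2, 1])
print(curve)
```

When two models are plotted on the same axes, the one whose curve rises faster on the left is ranking the risks better.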

Model validation: hold out samples
- Holdout samples are effective at validating models
  - Determine estimates based on part of the data set
  - Use the estimates to predict the other part of the data set
- Predictions should be close to actuals for heavily populated cells
- Full test/training for large data sets: Data -> Split Data -> Train Data: Build Models; Test Data: Compare Predictions to Actual
- Partial test/training for smaller data sets: All Data -> Build Models -> Split Data -> Train Data: Refit Parameters; Test Data: Compare Predictions to Actual
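
The data split at the heart of both approaches can be sketched with the standard library. The function name, the 30% test fraction, and the seed are assumptions made for illustration; the presentation does not specify a split ratio.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=2012):
    """Randomly split records into (train, test) lists, reproducibly."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split stable between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_train_test(range(10))
print(len(train), len(test))
```

Under full test/training the model structure is built on the train portion only; under partial test/training the structure is chosen on all data and only the parameters are refit on the train portion before comparing predictions to actuals on the test portion.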

Model validation: lift charts on hold out data
- Actual vs. expected on holdout data is an intuitive validation technique
- Good for communicating model performance to non-technical audiences
- Can also create actual vs. expected across predictor dimensions

Component frequency and severity models can be combined to create pure premium models
- Component models can be constructed in many different ways
- The standard model:
  - Component models: Frequency (Poisson/Negative Binomial), Severity (Gamma)
  - Combine: Frequency x Severity
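
The combination step in the standard model is a product of the two component predictions for each risk. A minimal sketch follows, with a hypothetical function name and made-up modeled values.

```python
def combine_pure_premium(frequency, severity):
    """Modeled pure premium = modeled frequency x modeled severity."""
    return frequency * severity

# Hypothetical modeled values for three risks (made-up numbers).
modeled = [
    {"risk": "A", "frequency": 0.08, "severity": 2500.0},
    {"risk": "B", "frequency": 0.12, "severity": 1800.0},
    {"risk": "C", "frequency": 0.05, "severity": 4000.0},
]

for row in modeled:
    row["pure_premium"] = combine_pure_premium(row["frequency"], row["severity"])
    print(row["risk"], round(row["pure_premium"], 2))
```

Because both component predictions are strictly positive, every modeled pure premium produced this way is positive as well.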

Building a model on modeled pure premium
- When using modeled pure premiums, select the gamma/log link (not the Tweedie)
- Modeled pure premiums will not have a point mass at zero
- Raw pure premiums are bimodal (i.e., have a point mass at zero) and require a distribution such as the Tweedie

Various constraints often need to be applied to the modeled pure premiums
- Goal: convert modeled pure premiums into indications after consideration of internal and external constraints
- Not always possible or desirable to charge the fully indicated rates in the short run
  - Marketing decisions
  - Regulatory constraints
  - Systems constraints
- Need to adjust the indications for known constraints

Constraints to give desired subsidies
- Offsetting one predictor changes parameters of other correlated predictors to make up for the restrictions
  - The stronger the exposure correlation, the more that can be made up through the other variable
  - Consequently, the modeler should not refit models when a desired subsidy is incorporated into the rating plan
- Example
  - Insurer-desired subsidy: Sr. mgmt wants subsidy to attract drivers 65+
  - Regulatory subsidy: regulatory constraint requires subsidy of drivers 65+
- Result of refitting with constraint (both cases)
  - Correlated factors will adjust to partially make up for the difference. For example, territories with retirement communities will increase.
- Potential action
  - Insurer-desired subsidy: do not refit models with constraint
  - Regulatory subsidy: consider implication of refitting and make a business decision