CAS Predictive Modeling Seminar Evaluating Predictive Models Glenn Meyers ISO Innovative Analytics October 5, 2006.

Presentation transcript:

Choosing Models
Predicting losses for individual insurance policies involves:
– Millions of policy records
– Hundreds (or thousands) of variables
There are a number of models that provide good predictions
– GLM, GAM, CART, MARS, Neural Nets, etc.
Business objectives influence choice of model

The Modeling Process
Modeling process involves dimension reduction techniques
– Clustering, Principal Components, Factor Analysis
– Building submodels and using predicted values as input into a higher level model
The modeling cycle
– 1. Build model with training data
– 2. Evaluate model with test data
– 3. Identify improvements in models and data
– 4. Go back to Step 1

Hidden Parameters
Classic model building methods correct for the number of parameters using “degrees of freedom.”
The model exploration process “eats up degrees of freedom” in ways that cannot be captured by formal model adjustments.
In essence the “test” data gets merged into the “training” data.

What Is Significant?
Statistical packages will often identify improvements that are “statistically significant” but not “practically significant.”
This talk is about determining when a model identifies “practically significant” improvements.
The approach is illustrated on a real example.

The Example
A Personal Auto Model Under Development
Preliminary Results
Input – Address of insured vehicle
Output – Address Specific Loss Cost
– 30-year-old, single car with no SDIP points
– $500 deductible or 25/50/25 policy limits
– Symbol 8, model year 2006
– etc.
Model derived from over 1,200 variables reflecting weather, traffic, demographic, topographical and economic conditions.

Difference Between Address Specific and ISO Territory Loss Cost

Differences Abound
Some Questions to Ask
Can the model output be used to improve insurer underwriting results?
Are the results statistically significant?
Define ELI (Expected Loss Index)

Use Expected Loss Index for Risk Selection

Propose a Standard Way of Evaluating Lift – The Gini Index
Originally proposed by Corrado Gini in 1912
Most often used to measure income and/or wealth inequality
– Search for “Gini” in wikipedia.org
In insurance underwriting, we want to evaluate systematic methods of finding “loss” inequality.

Gini Index
Look at the set of policy records below the cutoff point, ELI < 1.
This set of records accounts for 59% of total ISO (full) loss cost.
This set of records accounts for 48% of total loss.
1 − 48/59 → 19% reduction in loss ratio.
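The arithmetic on this slide can be checked with a short sketch. It assumes that the ISO loss cost plays the role of premium, so that the loss ratio of the selected records relative to the whole book is the share of loss divided by the share of loss cost. The function name is hypothetical and not from the talk.

```python
def loss_ratio_reduction(share_of_loss, share_of_loss_cost):
    """Relative drop in loss ratio from keeping only the selected records.

    Assumption: loss cost stands in for premium, so the selected records
    have a loss ratio equal to (share_of_loss / share_of_loss_cost) times
    the loss ratio of the full book.
    """
    return 1.0 - share_of_loss / share_of_loss_cost


# Figures from the slide: records with ELI < 1 carry 59% of the loss cost
# but only 48% of the loss.
reduction = loss_ratio_reduction(0.48, 0.59)
print(f"{reduction:.1%}")  # 18.6%, which the slide rounds to 19%
```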

Gini Index
Do this calculation for other cutoff points.
The results make up what we call the Lorenz curve.

Gini Index
If ELI is random, the Lorenz curve will be on the diagonal line.
The Gini index is the percentage of the area under the “random” line that is above the Lorenz curve.
Higher Gini means better predictive model.
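A minimal sketch of the Lorenz curve and Gini index as described on these slides, assuming records are sorted by ELI from lowest to highest, the horizontal axis is the cumulative share of ISO loss cost, and the vertical axis is the cumulative share of actual loss. The function names and the four-record toy data are illustrative assumptions, not part of the talk.

```python
def lorenz_curve(eli, loss_cost, loss):
    """Return (xs, ys): cumulative shares of loss cost and of loss,
    with records taken in order of increasing ELI."""
    order = sorted(range(len(eli)), key=lambda i: eli[i])
    total_cost = sum(loss_cost)
    total_loss = sum(loss)
    xs, ys = [0.0], [0.0]
    cum_cost = cum_loss = 0.0
    for i in order:
        cum_cost += loss_cost[i]
        cum_loss += loss[i]
        xs.append(cum_cost / total_cost)
        ys.append(cum_loss / total_loss)
    return xs, ys


def gini_index(xs, ys):
    """Share of the area under the diagonal that lies above the Lorenz curve."""
    # Trapezoid rule for the area under the piecewise-linear Lorenz curve.
    area_under_curve = sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2 for k in range(1, len(xs))
    )
    area_under_diagonal = 0.5
    return (area_under_diagonal - area_under_curve) / area_under_diagonal


# Toy data (made up): four policies with equal loss cost, losses rising with ELI.
xs, ys = lorenz_curve(
    eli=[0.5, 0.8, 1.2, 1.5],
    loss_cost=[100, 100, 100, 100],
    loss=[0, 50, 100, 250],
)
print(f"{gini_index(xs, ys):.1%}")  # 50.0%
```

If losses were exactly proportional to loss cost, the curve would coincide with the diagonal and the index would be zero.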

A Gini Index Thought Experiment
If we had the ability to predict who will have losses, what would the Gini index be?
It would be 100% if only one risk had all the losses
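The thought experiment can be made concrete with a small sketch. It assumes every policy has the same loss cost, every claimant has the same loss, and a perfect predictor ranks all claimants last. Under those assumptions (which are illustrative, not from the talk) the Gini index works out to one minus the share of policies with a loss, so it approaches 100% only as the losses concentrate in fewer risks.

```python
def perfect_foresight_gini(n_policies, n_claimants):
    """Gini index when a perfect predictor ranks the claimants last.

    Assumptions: equal loss cost per policy and equal loss per claimant.
    """
    xs, ys = [0.0], [0.0]
    first_claimant_rank = n_policies - n_claimants
    for k in range(1, n_policies + 1):
        xs.append(k / n_policies)
        claimants_so_far = max(0, k - first_claimant_rank)
        ys.append(claimants_so_far / n_claimants)
    # Trapezoid rule for the area under the Lorenz curve.
    area = sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2 for k in range(1, len(xs))
    )
    return 1.0 - 2.0 * area


for n_claimants in (500, 50, 1):
    print(n_claimants, round(perfect_foresight_gini(1000, n_claimants), 3))
# 500 0.5
# 50 0.95
# 1 0.999
```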

Bodily Injury

Property Damage

Collision

Statistical Significance
How much random fluctuation is in the Gini index calculation?
Use bootstrapping to evaluate
– Take a random sample of records, with replacement.
– Calculate Gini index for the sample.
– Repeat 250 times.
Plot a histogram of the results.
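The bootstrap steps above can be sketched as follows. This is a minimal illustration under stated assumptions: each record is a hypothetical (ELI, loss cost, loss) tuple, the Gini index is computed from the Lorenz curve with records sorted by ELI, and the policy data are simulated here purely so the example runs. None of the numbers come from the talk.

```python
import random


def gini(records):
    """Gini index for a list of (eli, loss_cost, loss) tuples."""
    ordered = sorted(records, key=lambda r: r[0])
    total_cost = sum(r[1] for r in ordered)
    total_loss = sum(r[2] for r in ordered)
    if total_cost == 0 or total_loss == 0:
        raise ValueError("sample needs positive total loss cost and loss")
    area = x_prev = y_prev = cum_cost = cum_loss = 0.0
    for _, cost, loss in ordered:
        cum_cost += cost
        cum_loss += loss
        x, y = cum_cost / total_cost, cum_loss / total_loss
        area += (x - x_prev) * (y + y_prev) / 2  # trapezoid under Lorenz curve
        x_prev, y_prev = x, y
    return 1.0 - 2.0 * area


def bootstrap_gini(records, n_reps=250, seed=0):
    """Resample records with replacement and recompute the Gini index."""
    rng = random.Random(seed)
    n = len(records)
    return [gini(rng.choices(records, k=n)) for _ in range(n_reps)]


# Simulated data (an assumption for illustration): claim frequency rises with ELI.
data_rng = random.Random(42)
records = []
for _ in range(2000):
    eli = data_rng.uniform(0.5, 1.5)
    loss = 2000.0 if data_rng.random() < 0.05 * eli else 0.0
    records.append((eli, 100.0, loss))

ginis = sorted(bootstrap_gini(records))
# Rough 95% interval from the 250 sorted bootstrap values.
print(f"Gini on full sample: {gini(records):.3f}")
print(f"Approximate 95% range: {ginis[6]:.3f} to {ginis[243]:.3f}")
```

The sorted list `ginis` is what a histogram would be drawn from. If the interval comfortably excludes zero, the lift is unlikely to be due to random fluctuation alone.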

Bootstrap Results

Summary
Standard tests of statistical significance are suspect.
– Informal model selection process
– Statistical/Practical significance
Propose Gini index as a test of practical significance.
Divide data into three samples
1. Training – Used to fit models
2. Test – Used to evaluate fits
3. Holdout – “Final” evaluation
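One way to set up the three samples is a single random shuffle followed by a split. The sketch below is only an illustration: the 60/20/20 proportions, the function name, and the fixed seed are assumptions, since the talk does not say how the data were divided.

```python
import random


def three_way_split(records, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle once, then cut into training, test, and holdout samples.

    The fractions are illustrative; whatever remains after the training
    and test samples becomes the holdout sample.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    training = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    holdout = shuffled[n_train + n_test:]
    return training, test, holdout


training, test, holdout = three_way_split(range(1000))
print(len(training), len(test), len(holdout))  # 600 200 200
```

The holdout sample is set aside until model exploration is finished, so that repeated trips through the build-and-evaluate cycle cannot leak into the final evaluation.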