Published by Warren Newton. Modified over 9 years ago.
1
Free and Cheap Sources of External Data
CAS 2007 Predictive Modeling Seminar
Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc.
Louise_francis@msn.com
www.data-mines.com
2
Objectives
Information sharing
Introduce some useful sources of data to augment company internal databases
Show examples of applications using external data
3
Why Augment Data?
For small companies and new lines of business, internal data may not be sufficient
Add variables (e.g., demographic and economic) that are not in the internal data
4
Some Kinds of External Data
Demographic
Geographic
Economic
–Unemployment rate, average wage, etc.
–Financial market
Insurance data
Occupational
Weather
5
Zip Code Level Data
The Census Bureau web site, www.census.gov, has a wealth of information
May require some processing effort to put into a useful format for analysis
For a small fee, there are vendors who pre-process some of the useful data
One of them is zip-codes.com
6
Zip-codes.com
7
Some Useful Variables
Average income
Population
Average house value
Number of people per house
Latitude, longitude
–Use to compute distances
City, county
8
Distance formula
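The formula shown on this slide is not preserved in the transcript. As a minimal sketch, assuming the goal is the great-circle distance between two ZIP code centroids given their latitude and longitude, the widely used haversine formula could be applied as below. The function name, the Earth radius constant, and the sample coordinates are illustrative assumptions, not taken from the presentation.

```python
import math

# Mean Earth radius in miles (an approximation; the Earth is not a perfect sphere).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding just above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical check: one degree of latitude is roughly 69 miles.
one_degree = haversine_miles(0.0, 0.0, 1.0, 0.0)
print(round(one_degree, 1))
```

A distance computed this way could serve, for example, as a predictor such as the distance from a claimant's ZIP code to a provider's ZIP code.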
9
The Data
10
California Auto Data by ZIP
BI Exposures
BI Losses
BI Claims
PD Exposures
PD Losses
PD Claims
11
CAARP Data
CAARP data: California Auto Assigned Risk Plan
Collected by the state
Aggregated data
Request from the Statistical Analysis Division of the department
12
California Proposed Changes to Territory Rating
13
Effect of Change by County
14
Effect of Change by Pure Premium Group
15
Effect of Change by Average House Value
16
Effect of Change by Average Income
17
The Data Used for Fraud Model
Described in “Distinguishing the Forest From the Trees”, Derrig and Francis, 2005 CAS Winter Forum
18
The Fraud Surrogates Used as Dependent Variables
Independent Medical Exam (IME) requested
Special Investigation Unit (SIU) referral
–(IME successful)
–(SIU successful)
Data: Detailed Auto Injury Claim Database for Massachusetts
Accident Years (1995-1997)
19
Predictor Variables
Claim file variables
–Provider bill, provider type
–Injury
Derived from claim file variables
–Attorneys per zip code
–Docs per zip code
Using external data
–Average household income
–Households per zip
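The derived and external predictors listed above could be assembled roughly as follows. This is a sketch under stated assumptions, not the authors' actual processing: the field names (zip, attorney_id, avg_household_income, households) and the sample records are hypothetical. ZIP codes are kept as strings so that leading zeros, common in Massachusetts, are not lost.

```python
from collections import defaultdict


def attorneys_per_zip(claims):
    """Count the distinct attorneys appearing on claims in each ZIP code."""
    seen = defaultdict(set)
    for claim in claims:
        if claim.get("attorney_id") is not None:
            seen[claim["zip"]].add(claim["attorney_id"])
    return {zip_code: len(ids) for zip_code, ids in seen.items()}


def add_zip_features(claims, zip_data):
    """Attach the derived count and external ZIP-level variables to each claim."""
    counts = attorneys_per_zip(claims)
    enriched = []
    for claim in claims:
        external = zip_data.get(claim["zip"], {})  # empty if ZIP not in vendor file
        row = dict(claim)
        row["attorneys_in_zip"] = counts.get(claim["zip"], 0)
        row["avg_household_income"] = external.get("avg_household_income")
        row["households"] = external.get("households")
        enriched.append(row)
    return enriched


# Hypothetical claim records and hypothetical external ZIP-level data.
sample_claims = [
    {"claim_id": 1, "zip": "02108", "attorney_id": "A1"},
    {"claim_id": 2, "zip": "02108", "attorney_id": "A2"},
    {"claim_id": 3, "zip": "02108", "attorney_id": "A1"},
    {"claim_id": 4, "zip": "01002", "attorney_id": None},
]
sample_zip_data = {
    "02108": {"avg_household_income": 90000, "households": 2000},
}

enriched_claims = add_zip_features(sample_claims, sample_zip_data)
```

Claims in ZIP codes missing from the external file receive None for the external variables, which leaves the choice of imputation to the modeler.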
20
Neural Network Ranking of Variables
21
Variable Importance for IME Requested for 3 Methods
22
Variable Importance (IME) Based on Average of Methods
23
Trends Using External Information
People still rely on Masterson’s indices and other indices based on the CPI
Shortcomings
–Hedonic adjustment
–Substitution
–Imputed rental cost
–Geometric chaining
–See www.shadowstats.com or Getting Prices Right by Economic Policy Institute and Dean Baker
Insurance inflation has typically been much higher than these indications
Many need reliable trend indications on smaller segments of their data
Trend is another weak link in the modeling process
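One common way to get a trend indication directly from a segment of data, rather than from a CPI-based index, is to fit an exponential trend by least squares on the logarithm of the values. The presentation does not specify a method, so the sketch below is only an illustration; the function name and the sample severities are hypothetical.

```python
import math


def fit_exponential_trend(periods, values):
    """Fit log(value) = a + b * period by least squares; return the annual trend exp(b) - 1."""
    logs = [math.log(v) for v in values]
    n = len(periods)
    mean_t = sum(periods) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in periods)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(periods, logs))
    slope = sxy / sxx
    return math.exp(slope) - 1.0


# Hypothetical average severities by year for one small segment.
years = [2001, 2002, 2003, 2004, 2005]
severities = [4100.0, 4350.0, 4700.0, 4950.0, 5400.0]
annual_trend = fit_exponential_trend(years, severities)
print(f"Fitted annual trend: {annual_trend:.1%}")
```

On small segments a fit like this can be unstable, which is consistent with the point that trend is a weak link in the modeling process.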
24
Questions?