Data Cleansing for Predictive Models: The Next Level Roosevelt C. Mosley, Jr., FCAS, MAAA CAS Ratemaking & Product Management Seminar Philadelphia, PA.

Slides:



Advertisements
Similar presentations
CAS Seminar on Ratemaking Introduction to Ratemaking Relativities March 13-14, 2006 Salt Lake City Marriott Salt Lake City, Utah Presented by: Brian M.
Advertisements

Copyright © 2008 Pearson Addison-Wesley. All rights reserved. Chapter 7 Financial Operations of Insurers.
Determination of Geographical Territories by Michael J. Miller EPIC Consulting, LLC 2004 CAS Ratemaking Seminar.
Statistics for Marketing & Consumer Research Copyright © Mario Mazzocchi 1 Cluster Analysis (from Chapter 12)
Chapter 17 Overview of Multivariate Analysis Methods
Chapter Seventeen Copyright © 2006 McGraw-Hill/Irwin Data Analysis: Multivariate Techniques for the Research Process.
Continuous Audit at Insurance Companies
x – independent variable (input)
Multivariate Data Analysis Chapter 9 - Cluster Analysis
1 Math 479 / 568 Casualty Actuarial Mathematics Fall 2014 University of Illinois at Urbana-Champaign Professor Rick Gorvett Session 9: Risk Classification.
Dr. Michael R. Hyman Cluster Analysis. 2 Introduction Also called classification analysis and numerical taxonomy Goal: assign objects to groups so that.
Chapter 5 Data mining : A Closer Look.
Introduction to Data Mining Data mining is a rapidly growing field of business analytics focused on better understanding of characteristics and.
Beyond Opportunity; Enterprise Miner Ronalda Koster, Data Analyst.
CAS Predictive Modeling Seminar Evaluating Predictive Models Glenn Meyers ISO Innovative Analytics October 5, 2006.
2006 CAS RATEMAKING SEMINAR CONSIDERATIONS FOR SMALL BUSINESSOWNERS POLICIES (COM-3) Beth Fitzgerald, FCAS, MAAA.
Data Mining Techniques
A New Exposure Base for Vehicle Service Contracts – Miles Driven CAS Ratemaking Seminar – Atlanta 2007 March 8, 2007Slide 1 Discussion Paper Presentation.
Proprietary & Confidential 1 Product Development Workshop Part 7: Product Monitoring/Risk Management 2012 CAS Ratemaking and Product Management Seminar.
Incorporating Catastrophe Models in Property Ratemaking Prop-8 Jeffrey F. McCarty, FCAS, MAAA State Farm Fire and Casualty Company 2000 Seminar on Ratemaking.
Slide 1 Estimating Performance Below the National Level Applying Simulation Methods to TIMSS Fourth Annual IES Research Conference Dan Sherman, Ph.D. American.
Learning Objectives Copyright © 2002 South-Western/Thomson Learning Multivariate Data Analysis CHAPTER seventeen.
1 Multivariate Analysis (Source: W.G Zikmund, B.J Babin, J.C Carr and M. Griffin, Business Research Methods, 8th Edition, U.S, South-Western Cengage Learning,
Module 6: Quantifying gaps and measuring coverage ILO, 2013.
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
Businessowners (BOP) Class Plans 2003 CAS Ratemaking Seminar March 27-28, 2003 Robert J. Walling, FCAS, MAAA.
2007 CAS PREDICTIVE MODELING SEMINAR PROJECT MANAGEMENT FOR PREDICTIVE MODELS BETH FITZGERALD, ISO.
2004 CAS RATEMAKING SEMINAR INCORPORATING CATASTROPHE MODELS IN PROPERTY RATEMAKING (PL - 4) ROB CURRY, FCAS.
Rating Plan Design… Where the Rubber meets the Road in Ratemaking… Ellen Berning, FCASJeff Thompson, MBA Segmentation DirectorState Manager Allied Insurance.
SUVs and Automobile Insurance Costs SUV Drivers Have Different Underlying Liability Loss Costs Michael C. Dubin, FCAS, MAAA, MCA 1999 CAS Seminar on Ratemaking.
2007 CAS Predictive Modeling Seminar Estimating Loss Costs at the Address Level Glenn Meyers ISO Innovative Analytics.
5-4-1 Unit 4: Sampling approaches After completing this unit you should be able to: Outline the purpose of sampling Understand key theoretical.
Predictive Modeling for Small Commercial Risks CAS PREDICTIVE MODELING SEMINAR Beth Fitzgerald ISO October 2006.
CAS Seminar on Ratemaking Introduction to Ratemaking Relativities (INT - 3) March 11, 2004 Wyndham Franklin Plaza Hotel Philadelphia, Pennsylvania Presented.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Credit History Impact on Personal Lines Loss Experience Session CPP-49 James E. Monaghan Thurs. March 9, 2000 CAS Ratemaking Seminar.
Dimension Reduction in Workers Compensation CAS predictive Modeling Seminar Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc.
2004 CAS RATEMAKING SEMINAR INCORPORATING CATASTROPHE MODELS IN PROPERTY RATEMAKING (PL - 4) PRICING EARTHQUAKE INSURANCE DAVE BORDER, FCAS, MAAA.
Data Science and Big Data Analytics Chap 4: Advanced Analytical Theory and Methods: Clustering Charles Tappert Seidenberg School of CSIS, Pace University.
HOUSEHOLD AVERAGING CAS Annual Meeting 2007 Alice Gannon November 2007.
1999 CAS RATEMAKING SEMINAR PRODUCT DEVELOPMENT (MIS - 32) BETH FITZGERALD, FCAS, MAAA.
Predictive Modeling Spring 2005 CAMAR meeting Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc
Hurricane Class Plans Michael A. Walters FCAS, MAAA CAS Ratemaking Seminar March 11-12, 1999.
Glenn Meyers ISO Innovative Analytics 2007 CAS Annual Meeting Estimating Loss Cost at the Address Level.
CANE 2007 Spring Meeting Visualizing Predictive Modeling Results Chuck Boucek (312)
Applied Quantitative Analysis and Practices LECTURE#31 By Dr. Osman Sadiq Paracha.
Businessowners (BOP) Class Plans 2004 CAS Ratemaking Seminar March 11, 2004 Robert J. Walling, FCAS, MAAA.
Chapter 16 Social Statistics. Chapter Outline The Origins of the Elaboration Model The Elaboration Paradigm Elaboration and Ex Post Facto Hypothesizing.
Multivariate Analysis and Data Reduction. Multivariate Analysis Multivariate analysis tries to find patterns and relationships among multiple dependent.
Loss Rating Models: Value Proposition? Brian Ingle, FCAS, MAAA WC-4 Perspectives on Pricing Large Accounts 2006 CAS Ratemaking Seminar Salt Lake City,
Practical GLM Analysis of Homeowners David Cummings State Farm Insurance Companies.
© 2005 Towers Perrin Determination of Statistically Optimal Geographical Territory Boundaries Casualty Actuarial Society Seminar on Ratemaking Session.
Multivariate Data Analysis Chapter 3 – Factor Analysis.
Multidimensional Scaling
Chapter Seventeen Copyright © 2004 John Wiley & Sons, Inc. Multivariate Data Analysis.
2005 CAS Ratemaking Seminar Pricing and Market Conditions: Financial Lines Measuring Risk for D&O Liability Ben Fidlow, FCAS, MAAA.
2003 CAS RATEMAKING SEMINAR CONSIDERATIONS FOR SMALL BUSINESSOWNERS POLICIES (COM - 5) BETH FITZGERALD, FCAS, MAAA.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
Methods of multivariate analysis Ing. Jozef Palkovič, PhD.
FACTOR ANALYSIS CLUSTER ANALYSIS Analyzing complex multidimensional patterns.
2006 CAS RATEMAKING SEMINAR PRODUCT DEVELOPMENT (COM -2)
Data Mining CAS 2004 Ratemaking Seminar Philadelphia, Pa.
2000 CAS RATEMAKING SEMINAR
Variable Reduction for Predictive Modeling with Clustering
Geographical Information Systems for Statistics Mar 2007
Machine Learning in Practice Lecture 23
Multidimensional Space,
Machine Learning – a Probabilistic Perspective
Cluster analysis Presented by Dr.Chayada Bhadrakom
MGS 3100 Business Analysis Regression Feb 18, 2016
Presentation transcript:

Data Cleansing for Predictive Models: The Next Level Roosevelt C. Mosley, Jr., FCAS, MAAA CAS Ratemaking & Product Management Seminar Philadelphia, PA March 19 – 21, 2012 Experience the Pinnacle Difference!

Data Cleaning Why simple visualization may not tell the whole story Data cleansing – the next level There are distinct groups in your underlying data Data homogeneity Certain combinations of variables may point to data issues Multivariate data anomalies

Data Cleansing – The Next Level

Data Validation – One and Two Way Summaries

Data Cleansing – the Next Level  One and two way data summarization and visualization is absolutely key in determining that individual factors are valid  In building predictive models, multivariate techniques consider independent variables simultaneously to account for dependencies  Data issues don’t just exist in one and two dimensions, they can exist in n dimensions (where n is the number of individual elements)  Underlying causes: heterogeneity, data anomalies  Multivariate data exploration techniques can be used to address these issues

Data Homogeneity

Clustering/Segmentation  Unsupervised classification technique  Groups data into set of discrete clusters or contiguous groups of cases  Performs disjoint cluster analysis on the basis of Euclidean distances computed from one or more quantitative input variables and cluster seeds  Objects in each cluster tend to be similar, objects in different clusters tend to be dissimilar  Can be used as a dimension reduction technique

Example  Homeowners dataset  Ran clustering analysis using key risk characteristics  Amount of insurance  Age of home  Billing option  Construction  Protection class  Deductible  Multiline  State/territory  Developed predictive model on clusters independently

Cluster Distance Map

Cluster Characteristics

Billing Plan Indications

Deductible Indications

Multi-Line Indications

Multivariate Data Anomalies – Back to Cluster 1 Cluster 1Total Average Amount of Insurance $1,109,048$219,585 Average Age of Home 19.6 years42.7 years Percentage of Deductibles > $ %1.9%  Higher value homes  Segment of the business that is certainly heterogeneous – will behave differently that overall population  Represents 0.2% of the overall exposures  Should we exclude data points such as these?

Outlier Data Points Midpoint of the cluster, represents an average risk for that cluster Risk that is slightly different than average, but still fits well with that cluster Potential anomaly – data point fits best within this cluster but is actually an outlier for the cluster. This generally means it doesn’t fit well anywhere.

Data “Cleanup”  Reflect heterogeneity in final product (rating plan adjustments, underwriting, tiering)  Data verification  Modify data  Exclude data