© Deloitte Consulting, 2004 Introduction to Data Mining James Guszcza, FCAS, MAAA CAS 2004 Ratemaking Seminar Philadelphia March 11-12, 2004.

Presentation transcript:

© Deloitte Consulting, 2004 Introduction to Data Mining James Guszcza, FCAS, MAAA CAS 2004 Ratemaking Seminar Philadelphia March 11-12, 2004

Themes
- What is Data Mining?
- How does it relate to statistics?
- Insurance applications
- Data sources
- The Data Mining Process
- Model Design
- Modeling Techniques
  - Louise Francis’ Presentation

Themes
- How does data mining need actuarial science?
  - Variable creation
  - Model design
  - Model evaluation
- How does actuarial science need data mining?
  - Advances in computing, modeling techniques
  - Ideas from other fields can be applied to insurance problems

Themes
- “The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions.” -- Ian Hacking
- Data mining gives us new ways of approaching the age-old problems of risk selection and pricing… and other problems not traditionally considered ‘actuarial’.

What is Data Mining?

What is Data Mining?
- My definition: “Statistics for the Computer Age”
  - Many new techniques have come from Computer Science, Marketing, Biology… but all can (should!) be brought under the framework of “statistics”
- Not a radical break with traditional statistics
  - Complements, builds on traditional statistics
  - Statistics enriched with brute-force capabilities of modern computing
  - Opens the door to new techniques
- Therefore Data Mining tends to be associated with industrial-sized data sets

Buzz-words
- Data Mining
- Knowledge Discovery
- Machine Learning
- Statistical Learning
- Predictive Modeling
- Supervised Learning
- Unsupervised Learning
- …etc

What is Data Mining?
- Supervised learning: predict the value of a target variable based on several predictive variables
  - “Predictive Modeling”
  - Credit / non-credit scoring engines
  - Retention, cross-sell models
- Unsupervised learning: describe associations and patterns along many dimensions without any target information
  - Customer segmentation
  - Data Clustering
  - Market basket analysis (“diapers and beer”)

So Why Should Actuaries Do This Stuff?
- Any application of statistics requires subject-matter expertise
  - Psychometricians
  - Econometricians
  - Bioinformaticians
  - Marketing scientists
  - …are all applied statisticians with a particular subject-matter expertise & area of specialty
- Add actuarial modelers to this list!
  - “Insurometricians”!?
- Actuarial knowledge is critical to the success of insurance data mining projects

Three Concepts
- Scoring engines
  - A “predictive model” by any other name…
- Lift curves
  - How much worse than average are the policies with the worst scores?
- Out-of-sample tests
  - How well will the model work in the real world?
  - Unbiased estimate of predictive power

Classic Application: Scoring Engines
- Scoring engine: formula that classifies or separates policies (or risks, accounts, agents…) into
  - profitable vs. unprofitable
  - retaining vs. non-retaining…
- (Non-)Linear equation f( ) of several predictive variables
- Produces continuous range of scores
- score = f(X_1, X_2, …, X_N)
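A minimal sketch of what a scoring engine of this form could look like. The variable names, intercept, and weights below are hypothetical and chosen only for illustration; a real engine would take its coefficients from a fitted model. The logistic link is one possible non-linear f( ), not the one the presentation prescribes.

```python
import math

# Hypothetical coefficients for illustration only (not from the presentation).
INTERCEPT = -1.0
WEIGHTS = {"prior_claims": 0.6, "vehicle_age": 0.05, "late_payments": 0.3}


def score(policy):
    """Map a dict of predictive variables X_1..X_N to a continuous score in (0, 1)."""
    linear = INTERCEPT + sum(w * policy[name] for name, w in WEIGHTS.items())
    # Logistic link: one example of a non-linear f( ).
    return 1.0 / (1.0 + math.exp(-linear))


# Two made-up policies: the second has more adverse characteristics.
policies = [
    {"prior_claims": 0, "vehicle_age": 2, "late_payments": 0},
    {"prior_claims": 2, "vehicle_age": 10, "late_payments": 3},
]
scores = [score(p) for p in policies]
```

Under these made-up weights a higher score indicates a worse risk, so the second policy scores higher than the first.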

What “Powers” a Scoring Engine?
- Scoring Engine: score = f(X_1, X_2, …, X_N)
- The X_1, X_2, …, X_N are at least as important as the f( )!
  - Again why actuarial expertise is necessary
  - Think of the predictive power of credit variables
- A large part of the modeling process consists of variable creation and selection
  - Usually possible to generate 100’s of variables
  - Steepest part of the learning curve

Model Evaluation: Lift Curves
- Sort data by score
- Break the dataset into 10 equal pieces
  - Best “decile”: lowest score → lowest LR
  - Worst “decile”: highest score → highest LR
  - Difference: “Lift”
- Lift = segmentation power
- Lift translates into ROI of the modeling project
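The lift calculation described above can be sketched as follows. The record layout `(score, loss, premium)`, the function name, and the synthetic data are assumptions made for this example; the sketch also assumes at least as many records as bins and positive premium in every bin.

```python
def decile_loss_ratios(records, n_bins=10):
    """Sort (score, loss, premium) records by score and return the
    loss ratio of each equal-sized bin, from lowest to highest score."""
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    ratios = []
    for i in range(n_bins):
        chunk = ordered[i * n // n_bins:(i + 1) * n // n_bins]
        losses = sum(r[1] for r in chunk)
        premium = sum(r[2] for r in chunk)
        ratios.append(losses / premium)
    return ratios


# Synthetic data: loss ratio rises linearly with score (illustration only).
records = [(i / 1000, 1000.0 * (0.3 + 0.6 * i / 1000), 1000.0) for i in range(1000)]

ratios = decile_loss_ratios(records)
lift = ratios[-1] - ratios[0]  # worst decile minus best decile
```

On real data the decile loss ratios will not be perfectly monotone; how cleanly they rise from the best to the worst decile is itself part of what the lift curve shows.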

Out-of-Sample Testing
- Randomly divide data into 3 pieces
  - Training data, Test data, Validation data
- Use Training data to fit models
- Score the Test data to create a lift curve
  - Perform the train/test steps iteratively until you have a model you’re happy with
  - During this iterative phase, validation data is set aside in a “lock box”
- Once model has been finalized, score the Validation data and produce a lift curve
  - Unbiased estimate of future performance
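A minimal sketch of the random three-way split. The 60/20/20 proportions, the seed, and the function name are assumptions; the presentation says only that the data is divided into three pieces.

```python
import random


def three_way_split(records, fractions=(0.6, 0.2, 0.2), seed=2004):
    """Randomly partition records into training, test, and validation sets.
    The proportions are an assumption; the remainder goes to validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_test = round(fractions[1] * n)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # the "lock box"
    return train, test, validation


policy_ids = list(range(1000))  # stand-in for real policy records
train, test, validation = three_way_split(policy_ids)
```

Fixing the seed keeps the validation set identical across modeling iterations, which is what lets it stay untouched until the model is final.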

Data Mining: Applications
- The classic: Profitability Scoring Model
  - Underwriting/Pricing applications
- Credit models
- Retention models
- Elasticity models
- Cross-sell models
- Lifetime Value models
- Agent/agency monitoring
- Target marketing
- Fraud detection
- Customer segmentation
  - no target variable (“unsupervised learning”)

Skills needed
- Statistical
  - Beyond college/actuarial exams… fast-moving field
- Actuarial
  - The subject-matter expertise
- Programming!
  - Need scalable software, computing environment
- IT - Systems Administration
  - Data extraction, data load, model implementation
- Project Management
  - Absolutely critical because of the scope & multidisciplinary nature of data mining projects

Data Sources
- Company’s internal data
  - Policy-level records
  - Loss & premium transactions
  - Billing
  - VIN
  - …
- Externally purchased data
  - Credit
  - CLUE
  - MVR
  - Census
  - …

The Data Mining Process

Raw Data
- Research/Evaluate possible data sources
  - Availability
  - Hit rate
  - Implementability
  - Cost-effectiveness
- Extract/purchase data
- Check data for quality (QA)
- At this stage, data is still in a “raw” form
  - Often start with voluminous transactional data
  - Much of the data mining process is “messy”

Variable Creation
- Create predictive and target variables
  - Need good programming skills
  - Need domain and business expertise
- Steepest part of the learning curve
- Discuss specifics of variable creation with company experts
  - Underwriters, Actuaries, Marketers…
- Opportunity to quantify tribal wisdom

Variable Transformation
- Univariate analysis of predictive variables
- Exploratory Data Analysis (EDA)
  - Data Visualization
- Use EDA to cap / transform predictive variables
  - Extreme values
  - Missing values
  - …etc
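One way the capping and missing-value steps might look in code. The 1st/99th percentile caps, the median fill, and the missing-value indicator are assumptions chosen for illustration, not choices stated in the presentation; in practice the EDA would guide them. The sketch assumes at least two non-missing values.

```python
import statistics


def cap_and_impute(values, lower_pct=1, upper_pct=99):
    """Cap a numeric variable at the given percentiles and fill missing
    values (None) with the median; also return a missing-value flag."""
    present = sorted(v for v in values if v is not None)
    # 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(present, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    median = statistics.median(present)
    cleaned = [median if v is None else min(max(v, low), high) for v in values]
    missing_flag = [1 if v is None else 0 for v in values]
    return cleaned, missing_flag


# Made-up variable with one extreme value and one missing value.
raw = [float(i) for i in range(1, 101)] + [10000.0, None]
cleaned, missing_flag = cap_and_impute(raw)
```

Keeping the flag alongside the filled value lets a later model treat "missing" as information in its own right.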

Multivariate Analysis
- Examine correlations among the variables
- Weed out redundant, weak, poorly distributed variables
- Model design
- Build candidate models
  - Regression/GLM
  - Decision Trees/MARS
  - Neural Networks
- Select final model
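A small sketch of the correlation screen for redundant variables. The 0.9 threshold, the function names, and the three example variables are hypothetical; Pearson correlation is used here as one simple choice among several possible measures.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def redundant_pairs(columns, threshold=0.9):
    """Return pairs of variable names whose absolute correlation is at
    least `threshold` (the threshold is an assumption)."""
    names = sorted(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Made-up data: vehicle_age and model_year carry the same information.
columns = {
    "vehicle_age": [1, 2, 3, 4, 5, 6],
    "model_year": [2003, 2002, 2001, 2000, 1999, 1998],
    "prior_claims": [0, 2, 0, 1, 0, 3],
}
flagged = redundant_pairs(columns)
```

A flagged pair is only a candidate for removal; which of the two variables to keep is a judgment call for the modeler.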

Model Analysis & Implementation
- Perform model analytics
  - Necessary for client to gain comfort with the model
- Calibrate Models
  - Create user-friendly “scale” – client dictates
- Implement models
  - Programming skills again are critical
- Monitor performance
  - Distribution of scores/variables, usage of the models, …etc
- Plan model maintenance schedule

Model Design
Where Data Mining Needs Actuarial Science

Model Design Issues
- Which target variable to use?
  - Frequency & severity
  - Loss Ratio, other profitability measures
  - Binary targets: defection, cross-sell
  - …etc
- How to prepare the target variable?
  - Period: 1-year or Multi-year?
  - Losses
    - Cap large losses?
    - Cat losses?
  - How / whether to re-rate, adjust premium?
  - What counts as a “retaining” policy?
  - …etc
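To make the "cap large losses?" question concrete, here is a sketch of one possible answer: a loss-ratio target with each claim capped at a fixed amount. The per-claim cap of 100,000 and the function name are hypothetical; whether and where to cap is exactly the design decision the slide raises.

```python
def capped_loss_ratio(claims, earned_premium, cap=100_000):
    """Loss ratio target with each individual claim limited to `cap`.
    The cap level is a hypothetical choice, not a recommendation."""
    capped_losses = sum(min(claim, cap) for claim in claims)
    return capped_losses / earned_premium


# Made-up policy: one small claim and one large claim.
claims = [5_000, 250_000]
premium = 200_000

uncapped = sum(claims) / premium           # 1.275
capped = capped_loss_ratio(claims, premium)  # 0.525
```

The single large claim dominates the uncapped target, which is why capping (or modeling large losses separately) is often considered.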

Model Design Issues
- Which data points to include/exclude?
  - Certain classes of business?
  - Certain states?
  - …etc
- Which variables to consider?
  - Credit, or non-credit only?
  - Include rating variables in the model?
  - Exclude certain variables for regulatory reasons?
  - …etc
- What is the “level” of the model?
  - Policy-term level, HH-level, Risk-level, …etc
  - Or should data be summarized into “cells” à la minimum bias?

Model Design Issues
- How should model be evaluated?
  - Lift curves, Gains chart, ROC curve?
  - How to measure ROI?
  - How to split data into train/test/validation? Or cross-validation?
  - Is there enough data for lift curve to be “credible”?
    → Are your “incredible” results credible?
  - …etc
- Not an exhaustive list – every project raises different actuarial issues!
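For the cross-validation alternative mentioned above, a minimal sketch of k-fold index generation. The choice of k = 5, the seed, and the function name are assumptions for illustration.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds over n records.
    Every record lands in exactly one test fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


splits = list(k_fold_indices(10, k=5))
```

Each record is used for testing exactly once and for training k - 1 times, which can help when there is too little data for a credible three-way split.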

Reference
- My favorite textbook: The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman