Data Mining CAS 2004 Ratemaking Seminar Philadelphia, Pa.

Slides:



Advertisements
Similar presentations
Kin 304 Regression Linear Regression Least Sum of Squares
Advertisements

1 Statistical Modeling  To develop predictive Models by using sophisticated statistical techniques on large databases.
Supervised Learning Recap
Chapter 7 – Classification and Regression Trees
Chapter 7 – Classification and Regression Trees
Chapter 10 Regression. Defining Regression Simple linear regression features one independent variable and one dependent variable, as in correlation the.
Decision Support Systems
OLS REGRESSION VS. NEURAL NETWORKS VS. MARS A COMPARISON R. J. Lievano E. Kyper University of Minnesota Duluth.
Data Mining Techniques Outline
Distinguishing the Forest from the Trees University of Texas November 11, 2009 Richard Derrig, PhD, Opal Consulting Louise Francis,
Neural Networks. R & G Chapter Feed-Forward Neural Networks otherwise known as The Multi-layer Perceptron or The Back-Propagation Neural Network.
Multivariate Data Analysis Chapter 4 – Multiple Regression.
Additive Models and Trees
Data Mining – Best Practices CAS 2008 Spring Meeting Quebec City, Canada Louise Francis, FCAS, MAAA
Modeling Gene Interactions in Disease CS 686 Bioinformatics.
1 Chapter 17: Introduction to Regression. 2 Introduction to Linear Regression The Pearson correlation measures the degree to which a set of data points.
Ensemble Learning (2), Tree and Forest
Decision Tree Models in Data Mining
1 Data Preparation Part 1: Exploratory Data Analysis & Data Cleaning, Missing Data CAS 2007 Ratemaking Seminar Louise Francis, FCAS Francis Analytics and.
Midterm Review. 1-Intro Data Mining vs. Statistics –Predictive v. experimental; hypotheses vs data-driven Different types of data Data Mining pitfalls.
Assessment of Model Development Techniques and Evaluation Methods for Binary Classification in the Credit Industry DSI Conference Jennifer Lewis Priestley.
Predictive Modeling Project Stephen P. D’Arcy Professor of Finance University of Illinois at Urbana-Champaign ORMIR Presentation October 26, 2005.
Lecture Notes 4 Pruning Zhangxi Lin ISQS
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Using Neural Networks in Database Mining Tino Jimenez CS157B MW 9-10:15 February 19, 2009.
Free and Cheap Sources of External Data CAS 2007 Predictive Modeling Seminar Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc.
Chapter 9 – Classification and Regression Trees
Using Neural Networks to Predict Claim Duration in the Presence of Right Censoring and Covariates David Speights Senior Research Statistician HNC Insurance.
Jeff Howbert Introduction to Machine Learning Winter Regression Linear Regression.
Correlation & Regression
Data Mining: Neural Network Applications by Louise Francis CAS Annual Meeting, Nov 11, 2002 Francis Analytics and Actuarial Data Mining, Inc.
Multiple Regression Petter Mostad Review: Simple linear regression We define a model where are independent (normally distributed) with equal.
Data Mining – Best Practices Part #2 Richard Derrig, PhD, Opal Consulting LLC CAS Spring Meeting June 16-18, 2008.
Software Prediction Models Forecasting the costs of software development.
Jennifer Lewis Priestley Presentation of “Assessment of Evaluation Methods for Prediction and Classification of Consumer Risk in the Credit Industry” co-authored.
Predictive Modeling CAS Reinsurance Seminar May 7, 2007 Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining,
Dimension Reduction in Workers Compensation CAS predictive Modeling Seminar Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc.
Predictive Modeling Spring 2005 CAMAR meeting Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc
Review of Fraud Classification Using Principal Components Analysis of RIDITS By Louise A. Francis Francis Analytics and Actuarial Data Mining, Inc.
Neural Networks Demystified by Louise Francis Francis Analytics and Actuarial Data Mining, Inc.
Dealing with continuous variables and geographical information in non life insurance ratemaking Maxime Clijsters.
Chong Ho Yu.  Data mining (DM) is a cluster of techniques, including decision trees, artificial neural networks, and clustering, which has been employed.
Dancing With Dirty Data: Methods for Exploring and Cleaning Data 2005 CAS Ratemaking Seminar Louise Francis, FCAS, MAAA Francis Analytics and Actuarial.
1 Data Preparation Part 1: Exploratory Data Analysis & Data Cleaning, Missing Data CAS Predictive Modeling Seminar Louise Francis Francis Analytics and.
Artificial Neural Networks for Data Mining. Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall 6-2 Learning Objectives Understand the.
Chapter Seventeen Copyright © 2004 John Wiley & Sons, Inc. Multivariate Data Analysis.
Distinguishing the Forest from the Trees 2006 CAS Ratemaking Seminar Richard Derrig, PhD, Opal Consulting Louise Francis, FCAS,
Data Mining: Neural Network Applications by Louise Francis CAS Convention, Nov 13, 2001 Francis Analytics and Actuarial Data Mining, Inc.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Business Intelligence and Decision Support Systems (9 th Ed., Prentice Hall) Chapter 6: Artificial Neural Networks for Data Mining.
Practice As part of a program to reducing smoking, a national organization ran an advertising campaign to convince people to quit or reduce their smoking.
CHAPTER 13 Data Processing, Basic Data Analysis, and the Statistical Testing Of Differences Copyright © 2000 by John Wiley & Sons, Inc.
Data Transformation: Normalization
Chapter 7. Classification and Prediction
KAIR 2013 Nov 7, 2013 A Data Driven Analytic Strategy for Increasing Yield and Retention at Western Kentucky University Matt Bogard Office of Institutional.
Statistical Quality Control, 7th Edition by Douglas C. Montgomery.
ENME 392 Regression Theory
Dimension Reduction in Workers Compensation
Business Statistics, 4e by Ken Black
Data Mining Lecture 11.
BPK 304W Correlation.
(classification & regression trees)
Prepared by Lee Revere and John Large
כריית נתונים.
Introduction to Predictive Modeling
MG3117 Issues and Controversies in Accounting
Business Statistics, 4e by Ken Black
What is Artificial Intelligence?
Presentation transcript:

Data Mining CAS 2004 Ratemaking Seminar Philadelphia, Pa. Louise Francis, FCAS, MAAA Louise_francis@msn.com Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Objectives Answer the question: Why use data mining? Introduce the main data mining methods Decision Trees Neural Networks MARS Clustering Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. The Data Simulated Data for Automobile Claim Frequency Three Factors Territory Four Territories Age Continuous Function Mileage High, Low Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Data Challenges Nonlinearities Relation between dependent variable and independent variables is not linear or cannot be transformed to linear Interactions Relation between independent and dependent variable varies by one or more other variables Correlations Predictor variables are correlated with each other Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Simulated Example: Probability of Claim vs. Age by Territory and Mileage Group Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Claim Frequency Data Francis Analytics and Actuarial Data Mining, Inc.

Independent Probabilities for Each Variable Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Decision Trees Recursively partitions the data Often sequentially bifurcates the data – but can split into more groups Applies goodness of fit to select best partition at each step Selects the partition which results in largest improvement to goodness of fit statistic Francis Analytics and Actuarial Data Mining, Inc.

Goodness of Fit Statistics Chi Square  CHAID (Fish, Gallagher, Monroe- Discussion Paper Program, 1990) Deviance  CART Francis Analytics and Actuarial Data Mining, Inc.

Goodness of Fit Statistics Gini Measure  CART Francis Analytics and Actuarial Data Mining, Inc.

Goodness of Fit Statistics Entropy  C4.5 Francis Analytics and Actuarial Data Mining, Inc.

Territory = North / South First Split All Policyholders P = 1.00 Territory = North / South P = 0.11 Territory = East / West P = 0.06 Francis Analytics and Actuarial Data Mining, Inc.

Example of Goodness of Fit Calculation Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Example of Fitted Tree Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. MARS Multivariate Adaptive Regression Splines An extension of regression which Uses automated search procedures Models nonlinearities Models interactions Produces a regression-like formula Francis Analytics and Actuarial Data Mining, Inc.

Nonlinear Relationships Fits piecewise regression to continuous variables Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Interactions Fits basis functions (which are like dummy variables) to model interactions An interaction between Territory=East and Mileage can be modeled by a dummy variable which is 1 if the Territory=East and mileage =High and 0 otherwise. Francis Analytics and Actuarial Data Mining, Inc.

Goodness of Fit Statistics Generalized Cross-Validation Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Fitted MARS Model Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Neural Networks Developed by artificial intelligence experts – but now used by statisticians also Based on how neurons function in brain Francis Analytics and Actuarial Data Mining, Inc.

Neural Network Structure Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Neural Networks Fit by minimizing squared deviation between fitted and actual values Can be viewed as a non-parametric, non-linear regression Often thought of as a “black box” Due to complexity of fitted model it is difficult to understand relationship between dependent and predictor variables Francis Analytics and Actuarial Data Mining, Inc.

Understanding the Model: Variable Importance Look at weights to hidden layer Compute sensitivities: a measure of how much the predicted value’s error increases when the variables are excluded from the model one at a time Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Importance Ranking Neural Network and Mars ranked variables in same order Francis Analytics and Actuarial Data Mining, Inc.

Visualizing Fitted Neural Network Francis Analytics and Actuarial Data Mining, Inc.

ROC Curves for the Data Mining Methods Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Correlation Variable gender added Its only impact on probability of a claim: correlation with mileage variable – males had higher mileage MARS did not use the variable in model CART used it in two places to split tree Neural Network ranked gender as least important variable Francis Analytics and Actuarial Data Mining, Inc.

How the Methods Did Correlation with “True” Claim Frequency Francis Analytics and Actuarial Data Mining, Inc.

Unsupervised Learning Common Method: Clustering No dependent variable – records are grouped into classes with similar values on the variable Start with a measure of similarity or dissimilarity Maximize dissimilarity between members of different clusters Francis Analytics and Actuarial Data Mining, Inc.

Dissimilarity (Distance) Measure Euclidian Distance Manhattan Distance Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Binary Variables Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Binary Variables Sample Matching Rogers and Tanimoto Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Example: Fraud Data Data from 1993 closed claim study conducted by Automobile Insurers Bureau of Massachusetts Claim files often have variables which may be useful in assessing suspicion of fraud, but a dependent variable is often not available Variables used for clustering: Injury type Provider type Legal representation Prior Claim SIU Investigation Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Results for 2 Clusters Francis Analytics and Actuarial Data Mining, Inc.

Francis Analytics and Actuarial Data Mining, Inc. Beginners Library Berry, Michael J. A., and Linoff, Gordon, Data Mining Techniques, John Wiley and Sons, 1997 Kaufman, Leonard and Rousseeuw, Peter, Finding Groups in Data, John Wiley and Sons, 1990 Smith, Murry, Neural Networks for Statistical Modeling, International Thompson Computer Press, 1996 Francis Analytics and Actuarial Data Mining, Inc.

Data Mining CAS 2004 Ratemaking Seminar Philadelphia, Pa. Louise Francis, FCAS, MAAA Louise_francis@msn.com Francis Analytics and Actuarial Data Mining, Inc.