Predictive Modeling
Spring 2005 CAMAR Meeting
Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc.
www.data-mines.com


2 Objectives
- Introduce predictive modeling: why use it?
- Describe some methods in depth: trees, neural networks, clustering
- Apply the methods to fraud data

3 The Predictive Modeling Family
Predictive modeling encompasses classical linear models, GLMs, and data mining.

4 Why Predictive Modeling?
- Makes better use of insurance data
- Advanced methods for dealing with messy data are now available

5 Major Kinds of Modeling
Supervised learning (the most common situation)
- There is a dependent variable: frequency, loss ratio, fraud/no fraud
- Some methods: regression, CART, some neural networks
Unsupervised learning
- No dependent variable; like records are grouped together
- A group of claims with similar characteristics might be more likely to be fraudulent
- Some methods: association rules, k-means clustering, Kohonen neural networks

6 Two Big Specialties in Predictive Modeling and Data Mining
- Trees
- Neural networks
- Clustering

7 Modeling Process
Internal data + external data -> data cleaning -> other preprocessing -> build model -> validate model -> test model -> deploy model

8 Complexities Affecting Insurance Data
- Nonlinear functions
- Interactions
- Missing data
- Correlations
- Non-normal data

9 Kinds of Applications Classification Prediction

10 The Fraud Study Data
1993 Automobile Insurers Bureau closed Personal Injury Protection claims
Dependent variables
- Suspicion score: a number from 0 to 10; an expert assessment of the likelihood of fraud or abuse
- 5 categories, used to create a binary indicator
Predictor variables
- Red flag indicators
- Claim file variables

11 Introduction to Two Methods
- Trees, sometimes known as CART (Classification and Regression Trees)
- Neural networks; we will introduce the backpropagation neural network

12 Decision Trees
- Recursively partition the data
- Often sequentially bifurcate the data, but can split into more groups
- Apply a goodness-of-fit statistic to select the best partition at each step
- Select the partition that yields the largest improvement in the goodness-of-fit statistic
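The split search described above can be sketched in a few lines. This is a minimal illustration, not the CART algorithm itself: it evaluates every binary split of a single numeric predictor and keeps the one that most improves a goodness-of-fit statistic (here, sum of squared errors; the data are hypothetical).

```python
# One step of recursive partitioning: try every binary split and keep the
# one with the largest improvement in the goodness-of-fit statistic.

def sse(ys):
    """Sum of squared deviations from the mean (the fit statistic)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Return (threshold, improvement) for the best binary split on xs."""
    parent = sse(ys)
    best = (None, 0.0)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = parent - (sse(left) + sse(right))
        if gain > best[1]:
            best = (t, gain)
    return best

# Illustrative data: the response jumps when x > 3, so the split lands there.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9]
threshold, gain = best_split(xs, ys)
```

A full tree would apply `best_split` recursively to each resulting partition until a stopping rule is met.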

13 Goodness-of-Fit Statistics
- Chi-square: CHAID (Fish, Gallagher, Monroe; Discussion Paper Program, 1990)
- Deviance: CART

14 Goodness-of-Fit Statistics
- Gini measure: CART (i denotes the impurity measure)

15 Goodness-of-Fit Statistics
- Entropy: C4.5
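The two node impurity measures named on these slides (Gini for CART, entropy for C4.5) are easy to compute from the class proportions at a node; a minimal sketch:

```python
# Node impurity measures computed from class proportions at a node.
import math

def gini(ps):
    """Gini impurity: 1 - sum(p_k^2); 0 for a pure node."""
    return 1.0 - sum(p * p for p in ps)

def entropy(ps):
    """Entropy: -sum(p_k * log2(p_k)); also 0 for a pure node."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# A node with p(fraud) = 0.36, as in the fraud-data illustration.
g = gini([0.36, 0.64])
e = entropy([0.36, 0.64])
```

A tree compares the impurity of a parent node with the weighted impurity of its children to score a candidate split.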

16 An Illustration from the Fraud Data: Gini Measure

17 First Split
All claims: p(fraud) = 0.36
Legal rep = yes: p(fraud) =
Legal rep = no: p(fraud) = 0.113

18 Example, continued

19 Example of Nonlinear Function Suspicion Score vs. 1 st Provider Bill

20 An Approach to Nonlinear Functions: Fit A Tree

21 Fitted Curve From Tree

22 Neural Networks
- Developed by artificial intelligence researchers, but now also used by statisticians
- Based on how neurons function in the brain

23 Neural Networks
- Fit by minimizing the squared deviation between fitted and actual values
- Can be viewed as a non-parametric, nonlinear regression
- Often thought of as a "black box": due to the complexity of the fitted model, it is difficult to understand the relationship between the dependent and predictor variables

24 The Backpropagation Neural Network

25 Neural Network Fits a nonlinear function at each node of each layer

26 The Logistic Function: f(x) = 1 / (1 + e^(-x))

27 Universal Function Approximator The backpropagation neural network with one hidden layer is a universal function approximator Theoretically, with a sufficient number of nodes in the hidden layer, any continuous nonlinear function can be approximated
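The architecture described above, a single hidden layer of logistic nodes feeding a linear output, can be sketched as a forward pass. The weights below are illustrative placeholders, not fitted values; backpropagation would adjust them to minimize squared error.

```python
# Forward pass of a one-hidden-layer feed-forward network with logistic
# activations: the "universal function approximator" on this slide.
import math

def logistic(z):
    """The logistic function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, hidden_w, hidden_b, out_w, out_b):
    """Logistic hidden layer followed by a linear output node."""
    hidden = [logistic(w * x + b) for w, b in zip(hidden_w, hidden_b)]
    return out_b + sum(w * h for w, h in zip(out_w, hidden))

# Two hidden nodes; with enough hidden nodes, any continuous function can
# be approximated on a bounded range. These weights are made up.
y = predict(0.5, hidden_w=[1.0, -2.0], hidden_b=[0.0, 1.0],
            out_w=[0.7, -0.3], out_b=0.1)
```

Each hidden node bends the input through a logistic curve; the output layer combines those bends, which is what lets the network trace arbitrary nonlinear shapes.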

28 Nonlinear Function Fit by Neural Network

29 Interactions
An interaction exists when the functional relationship between a predictor variable and the dependent variable depends on the value of one or more other variables.

30 Interactions
- Neural networks: the hidden nodes play a key role in modeling the interactions
- CART: the partitions of the data capture the interactions

31 Simple Tree of Injury and Provider Bill


33 Missing Data
- Occurs frequently in insurance data
- Some sophisticated methods exist for addressing it (e.g., the EM algorithm)
- CART finds surrogates for variables with missing values
- Neural networks have no explicit procedure for missing values

34 More Complex Example
- Dependent variable: expert's assessment of the likelihood that the claim is legitimate; a classification application
- Predictor variables: a combination of claim file variables (age of claimant, legal representation) and red flag variables (injury is strain/sprain only, claimant has a history of previous claims)
- Used an enhancement of CART known as boosting
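The boosting enhancement mentioned above can be sketched in the spirit of AdaBoost: fit a weak learner, upweight the records it misclassified, and refit, combining the sequence by weighted vote. This toy version uses one-variable decision stumps on made-up data, not the study's actual boosted CART model.

```python
# Hedged sketch of boosting: reweight misclassified records and refit a
# sequence of weak learners (decision stumps), AdaBoost-style.
import math

def fit_stump(xs, ys, w):
    """Best weighted stump: predict sign if x > t, else -sign."""
    best = (0.0, 1, 1.0)  # (threshold, sign, weighted error)
    for t in set(xs):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (sign if xi > t else -sign) != yi)
            if err < best[2]:
                best = (t, sign, err)
    return best

def boost(xs, ys, rounds=5):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, sign)
    for _ in range(rounds):
        t, sign, err = fit_stump(xs, ys, w)
        err = max(err, 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, sign))
        # Upweight the records this stump got wrong, then renormalize.
        w = [wi * math.exp(-alpha * yi * (sign if xi > t else -sign))
             for xi, yi, wi in zip(xs, ys, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the boosted stumps."""
    score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if score > 0 else -1

# Toy data: "fraud" (+1) when the predictor exceeds 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
model = boost(xs, ys)
```

Boosted trees work the same way with full CART trees in place of stumps; the reweighting forces later trees to concentrate on the hard-to-classify claims.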

35 Red Flag Predictor Variables

36 Claim File Variables

37 Neural Network Measure of Variable Importance
- Look at the weights into the hidden layer
- Compute sensitivities: a measure of how much the predicted value's error increases when the variables are excluded from the model one at a time
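The sensitivity idea can be sketched without refitting by approximating "excluding" a variable as replacing it with its mean and measuring how much the squared error grows; that stand-in for exclusion, and the toy linear model below, are assumptions for illustration, not the fraud study's actual procedure.

```python
# Sensitivity sketch: neutralize one variable at a time (replace it with its
# mean) and record the increase in the model's squared error.

def model(row):
    # A stand-in "fitted model": linear in two predictors (illustrative).
    return 2.0 * row[0] + 0.1 * row[1]

def mse(rows, ys, predict):
    return sum((predict(r) - y) ** 2 for r, y in zip(rows, ys)) / len(ys)

def sensitivities(rows, ys, predict):
    base = mse(rows, ys, predict)
    out = []
    for j in range(len(rows[0])):
        mean_j = sum(r[j] for r in rows) / len(rows)
        neutered = [r[:j] + (mean_j,) + r[j + 1:] for r in rows]
        out.append(mse(neutered, ys, predict) - base)
    return out

rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]
ys = [model(r) for r in rows]  # perfect fit, so the base error is 0
s = sensitivities(rows, ys, model)  # larger value = more important variable
```

Here the first variable carries most of the predictive weight, so neutralizing it hurts the fit far more than neutralizing the second.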

38 Variable Importance

39 Testing: Hold Out Part of the Sample
- Fit the model on 1/2 to 2/3 of the data
- Test the fit of the model on the remaining data
- Requires a large sample

40 Testing: Cross-Validation
- Hold out 1/n (say 1/10) of the data
- Fit the model to the remaining data
- Test on the portion of the sample held out
- Do this n (say 10) times and average the results
- Used for moderate sample sizes
- Jackknifing is similar to cross-validation
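The cross-validation procedure above can be sketched directly. To keep the sketch self-contained, the "model" is just a mean predictor; any fitting routine could be dropped in its place.

```python
# n-fold cross-validation: hold out 1/n of the data, fit on the rest,
# score the held-out piece, and average the n scores.

def cross_validate(ys, n_folds=5):
    """Average held-out squared error of a mean predictor across folds."""
    folds = [ys[i::n_folds] for i in range(n_folds)]
    scores = []
    for i in range(n_folds):
        train = [y for j, f in enumerate(folds) if j != i for y in f]
        test = folds[i]
        fit = sum(train) / len(train)      # "fit model to remaining data"
        scores.append(sum((y - fit) ** 2 for y in test) / len(test))
    return sum(scores) / n_folds           # average the n results

ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cv_error = cross_validate(ys, n_folds=5)
```

Because every record is held out exactly once, the averaged score uses the whole sample for testing while never scoring a record with a model that saw it.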

41 Results of Classification on Test Data

42 Unsupervised Learning
Common method: clustering. There is no dependent variable; records are grouped into classes with similar values on the variables. Start with a measure of similarity or dissimilarity, then maximize the dissimilarity between members of different clusters.

43 Dissimilarity (Distance) Measures
- Euclidean distance: d(x, y) = sqrt(sum_i (x_i - y_i)^2)
- Manhattan distance: d(x, y) = sum_i |x_i - y_i|
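The two distance measures on this slide, sketched for numeric records:

```python
# Euclidean and Manhattan distance between two numeric records.
import math

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

d_e = euclidean((0, 0), (3, 4))
d_m = manhattan((0, 0), (3, 4))
```

In practice variables are usually scaled first, so that a variable measured in dollars does not swamp one measured in counts.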

44 Binary Variables

45 Binary Variables
- Simple matching coefficient
- Rogers and Tanimoto coefficient
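For records scored on 0/1 variables (such as red flag indicators), both coefficients count agreements and mismatches; Rogers and Tanimoto double-weights the mismatches. A minimal sketch:

```python
# Similarity measures for binary records: a = both 1, d = both 0,
# bc = number of variables on which the records disagree.

def _counts(x, y):
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    bc = sum(1 for xi, yi in zip(x, y) if xi != yi)
    return a, d, bc

def simple_matching(x, y):
    """Proportion of variables on which the two records agree."""
    a, d, bc = _counts(x, y)
    return (a + d) / (a + d + bc)

def rogers_tanimoto(x, y):
    """Like simple matching, but mismatches are double-weighted."""
    a, d, bc = _counts(x, y)
    return (a + d) / (a + d + 2 * bc)

s1 = simple_matching([1, 0, 1, 1], [1, 0, 0, 1])
s2 = rogers_tanimoto([1, 0, 1, 1], [1, 0, 0, 1])
```

Either similarity can be turned into a dissimilarity (for clustering) as 1 minus the coefficient.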

46 Results for 2 Clusters
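A two-cluster grouping like the one reported on this slide can be produced with a minimal k-means sketch (Lloyd's algorithm on one variable; the data and crude initialization below are illustrative assumptions, not the study's actual clustering):

```python
# Minimal 1-D k-means: assign each record to the nearest center,
# recompute the centers as cluster means, repeat.

def kmeans(xs, k=2, iters=20):
    # Crude initialization: spread starting centers across the sorted data.
    centers = sorted(xs)[:: max(1, len(xs) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            j = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[j].append(x)
        # Empty clusters keep their previous center.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

# Illustrative data with two obvious groups.
xs = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
centers, clusters = kmeans(xs, k=2)
```

Minimizing within-cluster distance to the centers is equivalent to maximizing the separation between clusters for a fixed total spread, which is the objective described on the unsupervised learning slide.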

47 Beginner's Library
- Berry, Michael J. A., and Linoff, Gordon, Data Mining Techniques, John Wiley and Sons, 1997
- Kaufman, Leonard, and Rousseeuw, Peter, Finding Groups in Data, John Wiley and Sons, 1990
- Smith, Murray, Neural Networks for Statistical Modeling, International Thomson Computer Press, 1996

Data Mining CAMAR Spring Meeting Louise Francis, FCAS, MAAA