Distinguishing the Forest from the Trees 2006 CAS Ratemaking Seminar Richard Derrig, PhD, Opal Consulting, www.opalconsulting.com Louise Francis, FCAS, MAAA, Francis Analytics and Actuarial Data Mining, Inc.

Data Mining Data Mining, also known as Knowledge-Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns. In order to achieve this, data mining uses computational techniques from statistics, machine learning and pattern recognition.

The Problem: Nonlinear Functions

An Insurance Nonlinear Function: Provider Bill vs. Probability of Independent Medical Exam

Common Links for GLMs
The identity link: h(Y) = Y
The log link: h(Y) = ln(Y)
The inverse link: h(Y) = 1/Y
The logit link: h(Y) = ln(Y/(1−Y))
The probit link: h(Y) = Φ⁻¹(Y), the inverse standard normal CDF
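As a quick check of these definitions, each link can be written as a one-line function. A minimal sketch (the inverse, logit, and probit forms are the standard GLM definitions, filled in where the transcript dropped them; `scipy` supplies the inverse normal CDF):

```python
import numpy as np
from scipy.stats import norm

# h maps the mean response Y onto the linear-predictor scale.
def identity_link(y):
    return y

def log_link(y):
    return np.log(y)

def inverse_link(y):
    return 1.0 / y

def logit_link(y):
    return np.log(y / (1.0 - y))

def probit_link(y):
    # Phi^{-1}: the inverse standard normal CDF
    return norm.ppf(y)

print(logit_link(0.5))   # 0.0
print(probit_link(0.5))  # 0.0
```

At y = 0.5 both the logit and probit links are zero, which is one easy sanity check that the two binary-response links agree at the midpoint.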

Nonlinear Example Data
Provider 2 Bill (Banded) | Avg Provider 2 Bill | Avg Total Paid | Percent IME
Zero | 0 | 9,063 | 6%
1 – 250 | … | …,761 | 8%
251 – 500 | … | …,726 | 9%
501 – 1,000 | … | …,469 | 10%
1,001 – 1,500 | 1,243 | 14,998 | 13%
1,501 – 2,500 | 1,915 | 17,289 | 14%
2,501 – 5,000 | 3,300 | 23,994 | 15%
5,001 – 10,000 | 6,720 | 47,728 | 15%
10,001+ | …,350 | 83,261 | 15%
All Claims | 545 | 11,224 | 8%

Desirable Features of a Data Mining Method:
Any nonlinear relationship can be approximated
A method that works when the form of the nonlinearity is unknown
The effect of interactions can be easily determined and incorporated into the model
The method generalizes well on out-of-sample data

Decision Trees
In decision theory (for example, risk management), a decision tree is a graph of decisions and their possible consequences (including resource costs and risks), used to create a plan to reach a goal. Decision trees are constructed to help with making decisions. A decision tree is a special form of tree structure.

Different Kinds of Decision Trees
Single Trees (CART, CHAID)
Ensemble Trees, a more recent development (TREENET, RANDOM FOREST)
A composite or weighted average of many trees (perhaps 100 or more)
There are many methods to fit the trees and prevent overfitting
Boosting: Iminer Ensemble and Treenet
Bagging: Random Forest
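The single-tree vs. ensemble distinction can be sketched with scikit-learn stand-ins (not the commercial packages evaluated in this study) for CART, bagging, and boosting, on synthetic data:

```python
# Minimal sketch, assuming synthetic data; scikit-learn analogues of
# a single CART tree, a bagged ensemble (Random Forest), and a boosted
# ensemble (TREENET-style gradient boosting).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "single tree (CART-like)": DecisionTreeClassifier(random_state=0),
    "bagging (Random Forest)": RandomForestClassifier(n_estimators=100, random_state=0),
    "boosting (TREENET-like)": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))
```

On most datasets the averaged ensembles outperform the single tree out of sample, which is the motivation for the "composite of many trees" idea above.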

The Methods and Software Evaluated
1) TREENET
2) Iminer Tree
3) SPLUS Tree
4) CART
5) Iminer Ensemble
6) Random Forest
7) Naïve Bayes (Baseline)
8) Logistic (Baseline)

CART Example of Parent and Children Nodes Total Paid as a Function of Provider 2 Bill

CART Example with Seven Nodes IME Proportion as a Function of Provider 2 Bill

CART Example with Seven Step Functions IME Proportions as a Function of Provider 2 Bill
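The step-function shape of the fitted values falls out of how a tree predicts: every observation landing in a terminal node receives that node's average. A small illustration, with a synthetic stand-in for the Provider 2 Bill variable and a tree limited to seven leaves:

```python
# Sketch, assuming synthetic data: a regression tree's prediction is
# constant within each terminal node, so a 7-leaf tree yields at most
# 7 distinct predicted values -- a step function over the predictor.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10000, 500).reshape(-1, 1)          # stand-in for provider bill
y = np.log1p(x.ravel()) + rng.normal(0, 0.1, 500)      # smooth nonlinear target

tree = DecisionTreeRegressor(max_leaf_nodes=7, random_state=0).fit(x, y)
grid = np.linspace(0, 10000, 1000).reshape(-1, 1)
pred = tree.predict(grid)
print(len(np.unique(pred)))  # at most 7 distinct levels
```

Plotting `pred` against `grid` would reproduce the staircase pattern shown on this slide.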

Ensemble Prediction of Total Paid

Ensemble Prediction of IME Requested

Naive Bayes
Naive Bayes assumes conditional independence: the probability that an observation will have a specific set of values for the independent variables is the product of the conditional probabilities of observing each of the values, given category cj.
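Written out, with independent variables x_1, …, x_m and category c_j, the assumption and the class score it yields are:

```latex
P(x_1, \dots, x_m \mid c_j) = \prod_{i=1}^{m} P(x_i \mid c_j)
\qquad\Rightarrow\qquad
P(c_j \mid x_1, \dots, x_m) \propto P(c_j) \prod_{i=1}^{m} P(x_i \mid c_j)
```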

Bayes Predicted Probability IME Requested vs. Quintile of Provider 2 Bill

Naïve Bayes Predicted IME vs. Provider 2 Bill

The Fraud Surrogates Used as Dependent Variables
Independent Medical Exam (IME) requested
Special Investigation Unit (SIU) referral
IME successful
SIU successful
DATA: Detailed Auto Injury Claim Database for Massachusetts, Accident Years ( )

Results for IME Requested

Results for IME Favorable

Results for SIU Referral

Results for SIU Favorable

TREENET ROC Curve – IME AUROC = 0.701

TREENET ROC Curve – SIU AUROC = 0.677

Logistic ROC Curve – IME AUROC = 0.643

Logistic ROC Curve – SIU AUROC = 0.612
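The AUROC figures quoted on these slides can be computed mechanically from any model's scores. A hedged sketch, with synthetic scores standing in for the TREENET or logistic model outputs, using scikit-learn's `roc_auc_score`:

```python
# Sketch, assuming synthetic data: AUROC measures how well the model's
# scores rank positives above negatives (0.5 = random, 1.0 = perfect).
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)        # 0/1 outcomes (e.g. IME requested)
scores = 0.8 * y_true + rng.normal(0, 1, 1000)  # scores weakly separate classes

auc = roc_auc_score(y_true, scores)
fpr, tpr, _ = roc_curve(y_true, scores)
print(round(auc, 3))  # around 0.7 for this degree of separation
```

With this amount of separation the AUROC lands near 0.7, comparable to the TREENET results above; weaker separation would pull it toward the logistic baselines near 0.6.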

Ranking of Methods/Software – 1 st Two Surrogates

Ranking of Methods/Software – Last Two Surrogates

Plot of AUROC for SIU vs. IME Decision

Plot of AUROC for SIU vs. IME Favorable

References
Breiman, L., J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Chapman & Hall, 1984.
Friedman, J., "Greedy Function Approximation: A Gradient Boosting Machine," Annals of Statistics, 2001.
Hastie, T., R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, New York, 2001.
Kantardzic, M., Data Mining, John Wiley and Sons, 2003.