Data Mining – Best Practices Part #2
Richard Derrig, PhD, Opal Consulting LLC
CAS Spring Meeting, June 16-18, 2008

Data Mining
- Data Mining, also known as Knowledge Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns. In order to achieve this, data mining uses computational techniques from statistics, machine learning and pattern recognition.

AGENDA
- Predictive v Explanatory Models
- Discussion of Methods
- Example: Explanatory Models for Decision to Investigate Claims
- The "Importance" of Explanatory and Predictive Variables
- An Eight Step Program for Building a Successful Model

Predictive v Explanatory Models
- Both are of the form: the Target (Dependent) Variable is a Function of Feature (Independent) Variables that are related to the Target Variable
- Explanatory Models assume all Variables are Contemporaneous and Known
- Predictive Models assume all Variables are Contemporaneous and Estimable

Desirable Properties of a Data Mining Method
- Any nonlinear relationship between target and features can be approximated
- The method works when the form of the nonlinearity is unknown
- The effect of interactions can be easily determined and incorporated into the model
- The method generalizes well on out-of-sample data

Major Kinds of Data Mining Methods
- Supervised learning
  - Most common situation
  - Target variable, e.g.:
    - Frequency
    - Loss ratio
    - Fraud/no fraud
  - Some methods:
    - Regression
    - Decision Trees
    - Some neural networks
- Unsupervised learning
  - No target variable
  - Group like records together (clustering)
    - A group of claims with similar characteristics might be more likely to be of similar risk of loss
    - Ex: territory assignment
  - Some methods:
    - PRIDIT
    - K-means clustering
    - Kohonen neural networks
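To make the two modes concrete, here is a minimal Python sketch assuming scikit-learn and synthetic data; the features and target below are illustrative stand-ins, not the study's variables. A decision tree fit to a labeled target shows supervised learning; k-means grouping the same records without a target shows unsupervised learning.

```python
# Minimal sketch of supervised vs. unsupervised learning (scikit-learn, synthetic data).
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)

# Supervised: a target variable (e.g. fraud / no fraud) guides the fit
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print("training accuracy:", tree.score(X, y))

# Unsupervised: no target -- group similar records together (clustering)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print("records per cluster:", [int((clusters == k).sum()) for k in range(4)])
```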

The Supervised Methods and Software Evaluated
1) TREENET
2) Iminer Tree
3) S-PLUS Tree
4) CART
5) S-PLUS Neural
6) Iminer Neural
7) Iminer Ensemble
8) MARS
9) Random Forest
10) Exhaustive CHAID
11) Naïve Bayes (Baseline)
12) Logistic Regression (Baseline)

Decision Trees
- In decision theory (for example, risk management), a decision tree is a graph of decisions and their possible consequences (including resource costs and risks), used to create a plan to reach a goal. Decision trees are constructed in order to help with making decisions. A decision tree is a special form of tree structure.

CART – Example of First Split on Provider 2 Bill, with Paid as Dependent
- For the entire database, the total squared deviation of paid losses around the predicted value (i.e., the mean) is 4.95×10¹³. The SSE declines to 4.66×10¹³ after the data are partitioned using $5,021 as the cutpoint.
- Any other partition of the provider bill produces a larger SSE than 4.66×10¹³. For instance, if a cutpoint of $10,000 is selected, the SSE is 4.76×10¹³.
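The split search behind this example can be sketched in a few lines of numpy. This is illustrative only, with synthetic data standing in for (provider 2 bill, paid loss) rather than the Massachusetts claim database: every candidate cutpoint on the predictor is tried, and the one with the smallest total SSE across the two resulting groups is kept, which is how CART chooses a regression split.

```python
# numpy sketch of CART's regression split search on a single feature (synthetic data).
import numpy as np

def best_split_sse(x, y):
    """Return (cutpoint, SSE) for the single-feature split with the smallest total SSE."""
    best_cut, best_sse = None, np.sum((y - y.mean()) ** 2)   # start from the unsplit SSE
    for cut in np.unique(x)[1:]:                              # candidate cutpoints
        left, right = y[x < cut], y[x >= cut]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_cut, best_sse = cut, sse
    return best_cut, best_sse

rng = np.random.default_rng(0)
bill = rng.uniform(0, 20000, 1000)
paid = np.where(bill > 5000, 30000, 8000) + rng.normal(0, 2000, 1000)
print(best_split_sse(bill, paid))   # the chosen cutpoint lands near 5,000
```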

Different Kinds of Decision Trees
- Single trees (CART, CHAID)
- Ensemble trees, a more recent development (TREENET, RANDOM FOREST)
  - A composite or weighted average of many trees (perhaps 100 or more)
  - There are many methods to fit the trees and prevent overfitting
    - Boosting: Iminer Ensemble and Treenet
    - Bagging: Random Forest
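For illustration only, here is a sketch of boosted and bagged tree ensembles using scikit-learn's open-source analogues rather than the commercial packages named above; the data are synthetic.

```python
# Sketch: scikit-learn stand-ins for boosted (cf. TREENET, Iminer Ensemble)
# and bagged (Random Forest) tree ensembles on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

boosted = GradientBoostingClassifier(n_estimators=100).fit(X_tr, y_tr)  # boosting: trees fit sequentially to residual errors
bagged = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)       # bagging: trees fit to bootstrap resamples, then averaged
print("boosted:", boosted.score(X_te, y_te), "bagged:", bagged.score(X_te, y_te))
```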

Neural Networks

NEURAL NETWORKS
- Self-Organizing Feature Maps
  - T. Kohonen (Cybernetics)
  - Reference vectors of features map to the OUTPUT format in a topologically faithful way. Example: map onto a 40×40 two-dimensional square.
  - An iterative process adjusts all reference vectors in a "neighborhood" of the nearest one. The neighborhood size shrinks over iterations.
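A minimal numpy sketch of that iterative process follows, assuming a 40×40 grid as in the example, a Gaussian neighborhood, and linearly shrinking learning rate and radius; the data are random placeholders, not claim features.

```python
# Minimal sketch of Kohonen self-organizing map training (assumptions: 40x40 grid,
# Gaussian neighborhood, linearly shrinking learning rate and radius; random data).
import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, n_features = 40, 40, 5
weights = rng.random((grid_h, grid_w, n_features))          # reference vectors, one per map unit
rows, cols = np.indices((grid_h, grid_w))

def train_step(x, t, n_steps, lr0=0.5, radius0=20.0):
    lr = lr0 * (1 - t / n_steps)                             # learning rate decays over iterations
    radius = max(radius0 * (1 - t / n_steps), 1.0)           # neighborhood size shrinks over iterations
    bmu = np.unravel_index(np.linalg.norm(weights - x, axis=2).argmin(),
                           (grid_h, grid_w))                 # nearest reference vector (best-matching unit)
    grid_dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    h = np.exp(-grid_dist2 / (2 * radius ** 2))              # "neighborhood" of the nearest unit
    weights[...] += lr * h[..., None] * (x - weights)        # pull nearby reference vectors toward x

data = rng.random((1000, n_features))                        # stand-in for claim feature vectors
for t in range(5000):
    train_step(data[rng.integers(len(data))], t, 5000)
```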

FEATURE MAP SUSPICION LEVELS

FEATURE MAP SIMILARITY OF A CLAIM

DATA MODELING EXAMPLE: CLUSTERING
- Data on 16,000 Medicaid providers analyzed by an unsupervised neural net
- The neural network clustered Medicaid providers based on 100+ features
- Investigators validated a small set of known fraudulent providers
- A visualization tool displays the clustering, showing known fraud and abuse
- A subset of 100 providers with similar patterns was investigated: hit rate > 70%
- Cube size proportional to annual Medicaid revenues
© 1999 Intelligent Technologies Corporation

Multivariate Adaptive Regression Splines (MARS)
- MARS fits a piecewise linear regression, e.g.:
  - BF1 = max(0, X − 1,401.00)
  - BF2 = max(0, 1,401.00 − X)
  - BF3 = max(0, X − …)
  - Y = c1·BF1 + c2·BF2 + c3·BF3, with fitted coefficients on the order of 10⁻³
- BF1, BF2, BF3 are basis functions
- MARS uses statistical optimization to find the best basis function(s)
- A basis function is similar to a dummy variable in regression: like a combination of a dummy indicator and a linear independent variable
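The basis-function idea can be sketched with numpy using a hand-chosen knot at 1,401 for illustration; real MARS software searches over knots and variables automatically, and the data below are synthetic.

```python
# Sketch of MARS-style hinge basis functions with a fixed, hand-chosen knot
# (illustrative only; MARS chooses knots and variables by statistical optimization).
import numpy as np

def hinge(x, knot, direction=+1):
    """Piecewise-linear basis function: max(0, x - knot) or max(0, knot - x)."""
    return np.maximum(0.0, direction * (x - knot))

rng = np.random.default_rng(0)
x = rng.uniform(0, 5000, 500)
y = 10 + 0.002 * np.maximum(0, x - 1401) + rng.normal(0, 1, 500)    # true kink at 1,401

# Design matrix: intercept plus the mirrored pair of hinges BF1 and BF2 at the knot
X = np.column_stack([np.ones_like(x), hinge(x, 1401, +1), hinge(x, 1401, -1)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)                         # least-squares fit
print(coef)   # approximately [10, 0.002, 0]: intercept and the slope change at the knot
```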

Baseline Methods: Naive Bayes Classifier and Logistic Regression
- Naive Bayes assumes the feature (predictor) variables are independent conditional on each target category
- Logistic Regression assumes the log-odds of the target are linear in the feature (predictor) variables
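A short scikit-learn sketch of the two baselines on synthetic stand-in data follows; the Gaussian naive Bayes variant is an assumption, since the slide does not specify which naive Bayes implementation was used.

```python
# Sketch of the two baseline classifiers on synthetic data (GaussianNB is assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

nb = GaussianNB().fit(X_tr, y_tr)                        # assumes features independent given the class
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # linear model for the log-odds of the target
print("naive Bayes:", nb.score(X_te, y_te), "logistic:", lr.score(X_te, y_te))
```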

REAL CLAIM FRAUD DETECTION PROBLEM
- Classify all claims
- Identify valid classes
  - Pay the claim
  - No hassle
  - Visa example
- Identify (possible) fraud
  - Investigation needed
- Identify "gray" classes
  - Minimize with "learning" algorithms

The Fraud Surrogates Used as Target Decision Variables
- Independent Medical Exam (IME) requested
- Special Investigation Unit (SIU) referral
- IME successful
- SIU successful
- DATA: Detailed Auto Injury Closed Claim Database for Massachusetts
- Accident Years ( )

[Diagram] DM Databases → Scoring Functions → Graded Output: Non-Suspicious Claims → Routine Claims; Suspicious Claims → Complicated Claims

ROC Curve: Area Under the ROC Curve
- Want good performance on both sensitivity and specificity
- Sensitivity and specificity depend on the cut point chosen for the binary target (yes/no)
- Choose a series of different cut points and compute sensitivity and specificity for each of them
- Graph the results: plot sensitivity vs. 1 − specificity
- Compute an overall measure of "lift", or area under the curve
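A scikit-learn sketch of this computation follows; the labels and scores are synthetic placeholders, not the IME or SIU results reported later.

```python
# Sketch: sweep cut points, get sensitivity/specificity pairs, and summarize with AUROC.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, 1000)               # binary target, e.g. IME requested yes/no (synthetic)
scores = 0.3 * y_true + 0.7 * rng.random(1000)  # model scores that separate the classes imperfectly

fpr, tpr, cut_points = roc_curve(y_true, scores)   # tpr = sensitivity, fpr = 1 - specificity
print("AUROC:", roc_auc_score(y_true, scores))     # area under the curve: overall "lift"
# Plotting tpr against fpr traces out the ROC curve.
```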

True/False Positives and True/False Negatives: The "Confusion" Matrix
- Choose a "cut point" in the model score
- Claims > cut point are classified "yes"
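The tabulation at a chosen cut point can be sketched as below, again with scikit-learn on synthetic scores; the 0.5 threshold is an arbitrary illustrative choice.

```python
# Sketch: classify "yes" above an arbitrary cut point and tabulate the confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, 1000)               # actual outcome (synthetic placeholder)
scores = 0.3 * y_true + 0.7 * rng.random(1000)  # model scores

y_pred = (scores > 0.5).astype(int)             # claims above the cut point classified "yes"
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```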

TREENET ROC Curve – IME AUROC = 0.701

Logistic ROC Curve – IME AUROC = 0.643

Ranking of Methods/Software – IME Requested

Variable Importance (IME) Based on Average of Methods

Results for IME Requested

Ranking of Methods/Software – First Two Surrogates

Ranking of Methods/Software – Last Two Surrogates

Plot of AUROC for SIU vs. IME Decision

Plot of AUROC for SIU vs IME Favorable

Claim Fraud Detection Plan
- STEP 1: SAMPLE: Systematic benchmark of a random sample of claims.
- STEP 2: FEATURES: Isolate red flags and other sorting characteristics.
- STEP 3: FEATURE SELECTION: Separate features into objective and subjective, early, middle and late arriving, acquisition cost levels, and other practical considerations.
- STEP 4: CLUSTER: Apply unsupervised algorithms (Kohonen, PRIDIT, fuzzy) to cluster claims; examine for needed homogeneity.

Claim Fraud Detection Plan
- STEP 5: ASSESSMENT: Externally classify claims according to objectives for sorting.
- STEP 6: MODEL: Supervised models relating selected features to objectives (logistic regression, Naïve Bayes, neural networks, CART, MARS).
- STEP 7: STATIC TESTING: Model output versus expert assessment, and model output versus cluster homogeneity (PRIDIT scores), on one or more samples.
- STEP 8: DYNAMIC TESTING: Real-time operation of an acceptable model; record outcomes, and repeat steps 1-7 as needed to fine-tune the model and parameters. Use PRIDIT to show gain or loss of feature power and changing data patterns, and tune investigative proportions to optimize detection and deterrence of fraud and abuse.