
If They Cheat, Can We Catch Them With Predictive Modeling?
Richard A. Derrig, PhD, CFE
President, Opal Consulting, LLC
Senior Consultant, Insurance Fraud Bureau of Massachusetts
CAS Predictive Modeling, October 11-12, 2007

Insurance Fraud - The Problem
- ISO/IRC 2001 Study: Auto and Workers Compensation fraud called a big problem by 27% of insurers.
- CAIF: Estimation (too large)
- Mass IFB: 1,500 referrals annually for Auto, WC, and other P-L lines.

Fraud Definition - PRINCIPLES
- Clear and willful act
- Proscribed by law
- Obtaining money or value
- Under false pretenses
Abuse, Unethical: fails one or more of the principles.

HOW MUCH CLAIM FRAUD? (CRIMINAL or CIVIL?)

10% Fraud

REAL PROBLEM - CLAIM FRAUD
- Classify all claims
- Identify valid classes: pay the claim, no hassle (the Visa example)
- Identify (possible) fraud: investigation needed
- Identify "gray" classes: minimize with "learning" algorithms

Company Automation - Data Mining
- Data Mining/Predictive Modeling Automates Record Reviews
- No Data Mining without Good, Clean Data (90% of the solution)
- Insurance Policy and Claim Data; Business and Demographic Data
- Data Warehouse/Data Mart
- Data Manipulation: Simple First; Complex Algorithms When Needed

DATA

Computers advance

FRAUD IDENTIFICATION
- Experience and Judgment
- Artificial Intelligence Systems: Regression & Tree Models, Fuzzy Clusters, Neural Networks, Expert Systems, Genetic Algorithms, All of the Above

MATHEMATICAL MODELS
- Databases: Vector Spaces, Topological Spaces, Stochastic Processes
- Scoring: Mappings to R, Linear Functionals

DM Databases → Scoring Functions → Graded Output:
- Non-Suspicious Claims → Routine Claims
- Suspicious Claims → Complicated Claims

DM Databases → Scoring Functions → Graded Output:
- Non-Suspicious Risks → Routine Underwriting
- Suspicious Risks → Non-Routine Underwriting
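
Both flows above are the same pipeline: a database feeds a scoring function, and the graded output routes each record. A minimal Python sketch of that routing; the thresholds and class labels are illustrative assumptions, not values from the presentation:

```python
def route(score, low=0.3, high=0.7):
    """Map a suspicion score in [0, 1] to a graded output class.
    Thresholds are illustrative assumptions, not from the source."""
    if score < low:
        return "non-suspicious: routine handling"
    if score > high:
        return "suspicious: refer for investigation"
    return "gray: apply learning algorithms / gather more information"

for s in (0.1, 0.5, 0.9):
    print(s, "->", route(s))
```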

POTENTIAL VALUE OF A PREDICTIVE MODELING SCORING SYSTEM
- Screening to Detect Fraud Early
- Auditing of Closed Claims to Measure Fraud, Both Kinds
- Sorting to Select Efficiently among Special Investigative Unit Referrals
- Providing Evidence to Support a Denial
- Protecting against Bad-Faith

PREDICTIVE MODELING: SOME PUBLIC TECHNIQUES
- Fuzzy Logic and Controllers
- Regression Scoring Systems
- Unsupervised Techniques: Kohonen and PRIDIT
- EM Algorithm (Medical Bills)
- Tree-Based Methods

FUZZY SETS COMPARED WITH PROBABILITY
- Probability: measures randomness; measures whether or not an event occurs; randomness dissipates over time or with further knowledge.
- Fuzziness: measures vagueness in language; measures the extent to which an event occurs; vagueness does not dissipate with time or further knowledge.

Fuzzy Clustering & Detection: k-Means Clustering
- Fuzzy Logic: True, False, Uncertain
- Fuzzy Numbers: Membership Value
- Fuzzy Clusters: Partial Membership
- App 1: Suspicion of Fraud
- App 2: Town (Zip) Rating Classes
- REF: Derrig-Ostaszewski, 1995
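
A minimal sketch of fuzzy clustering with partial memberships (fuzzy c-means) in plain NumPy; the data, cluster count, and fuzzifier m are made-up assumptions, not the Derrig-Ostaszewski setup:

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means: every point gets a partial membership in each cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.dirichlet(np.ones(c), size=n)         # memberships; rows sum to 1
    for _ in range(n_iter):
        Um = U ** m                               # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # membership update: inverse-distance rule of fuzzy c-means
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Illustrative use: cluster claims (or towns) on five made-up indices, then
# look at claims with no dominant cluster -- the "border" cases the slides
# discuss, which crisp clustering would force into one class.
X = np.random.rand(200, 5)
centers, U = fuzzy_c_means(X, c=3)
border = np.where(U.max(axis=1) < 0.5)[0]
```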

FUZZY SETS: TOWN RATING CLASSIFICATION
- When is one Town near another for Auto Insurance Rating? Geographic Proximity (Traditional); Overall Cost Index (Massachusetts); Geo-Smoothing (Experimental)
- Geographically close Towns do not have the same Expected Losses.
- Clusters by Cost produce Border Problems: nearby Towns land in different Rating Territories. Fuzzy Clusters acknowledge the Borders.
- Are all Coverage Clusters correct for each Insurance Coverage?
- Fuzzy Clustering on Five Auto Coverage Indices performs better and demonstrates a weakness in Overall Crisp Clustering.

Fuzzy Clustering of Fraud Study Claims by Assessment Data (Membership Value Cut at 0.2)

AIB FRAUD INDICATORS
- Accident Characteristics (19), e.g.: No report by police officer at scene; No witnesses to accident
- Claimant Characteristics (11), e.g.: Retained an attorney very quickly; Had a history of previous claims
- Insured Driver Characteristics (8), e.g.: Had a history of previous claims; Gave address as hotel or P.O. Box

Supervised Models - Regression on Fraud Indicators
- Fraud Indicators Serve as Independent Dummy Variables
- Expert Evaluation Categories Serve as Dependent Target
- Regression Scoring Systems
- REF 1: Weisberg-Derrig, 1998
- REF 2: Viaene et al., 2002
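
A hedged sketch of such a regression scoring system using scikit-learn's logistic regression; the 0/1 indicator matrix and the expert target here are simulated stand-ins, not the AIB data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: rows = claims, columns = red-flag dummy variables,
# target = expert assessment (1 = suspicious). All values are made up.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 9))             # TRT1..TRT9-style dummies
y = (X.sum(axis=1) + rng.normal(0, 1, 500) > 4).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]             # suspicion score per claim
# The fitted coefficients play the role of the scoring system's weights.
for j, w in enumerate(model.coef_[0], start=1):
    print(f"indicator {j}: weight {w:+.2f}")
```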

Unsupervised Models - Kohonen Self-Organizing Feature Maps
- Fraud Indicators Serve as Independent "Features"
- Expert Evaluation Categories Can Serve as Dependent Target in a Second Phase
- Self-Organizing Feature Maps: T. Kohonen (Cybernetics). Reference vectors map to the OUTPUT format in a topologically faithful way. Example: map onto a 40x40 two-dimensional square.
- An iterative process adjusts all reference vectors in a "neighborhood" of the nearest one; the neighborhood size shrinks over iterations.
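
A from-scratch sketch of the Kohonen update loop just described (find the best-matching unit, pull its shrinking neighborhood toward the input); the 40x40 grid follows the slide's example, and everything else (data, iteration count, decay schedule) is an illustrative assumption:

```python
import numpy as np

def train_som(X, grid=(40, 40), n_iter=5000, lr0=0.5, seed=0):
    """Kohonen SOM: reference vectors on a 2-D grid; the neighborhood
    radius and learning rate both shrink over the iterations."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows, cols, X.shape[1]))          # reference vectors
    gy, gx = np.mgrid[0:rows, 0:cols]                 # grid coordinates
    sigma0 = max(rows, cols) / 2
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        # best-matching unit = nearest reference vector to the input
        d = np.linalg.norm(W - x, axis=2)
        by, bx = np.unravel_index(d.argmin(), d.shape)
        frac = t / n_iter
        lr = lr0 * (1 - frac)                         # decaying learning rate
        sigma = sigma0 * (1 - frac) + 1e-3            # shrinking neighborhood
        h = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * sigma ** 2))
        W += lr * h[:, :, None] * (x - W)             # pull neighborhood toward x
    return W

# Illustrative use: binary fraud-indicator vectors mapped onto the grid;
# claims landing in the same region share patterns ("guilt by association").
X = np.random.randint(0, 2, size=(300, 20)).astype(float)
W = train_som(X)
```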

MAPPING: PATTERNS-TO-UNITS

KOHONEN FEATURE MAP SUSPICION LEVELS

FEATURE MAP: SIMILARITY OF A CLAIM

DATA MODELING EXAMPLE: CLUSTERING
- Data on 16,000 Medicaid providers analyzed by an unsupervised neural net
- Neural network clustered Medicaid providers based on 100+ features
- Investigators validated a small set of known fraudulent providers
- Visualization tool displays the clustering, showing known fraud and abuse
- Subset of 100 providers with similar patterns investigated: hit rate > 70%
(Visualization: cube size proportional to annual Medicaid revenues. © 1999 Intelligent Technologies Corporation)

SELF-ORGANIZING MAP
- Binary Features
- d(m, 0) = Suspicion Level
- Given c = {features}: c → m_c ("Guilt by Association")

PRIDIT: Unique Embedded Score
- Data: features have no natural metric scale
- Model: stochastic process has no parametric form
- Classification: inverse image of a one-dimensional scoring function and decision rule
- Feature Value: identify which features are "important"

PRIDIT METHOD OVERVIEW
1. DATA: N Claims, T Features, K_t Responses per Feature, Monotone in "Fraud"
2. RIDIT score each possible response: proportion below minus proportion above; scores are centered at zero.
3. RESPONSE WEIGHTS: Principal Component of the Claims × Features matrix with RIDITs in the cells (SPSS, SAS, or S-Plus software)
4. SCORE: Sum of weights × claim RIDIT scores.
5. PARTITION: above and below zero.
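
A compact sketch of steps 1-5 for binary yes/no features, taking the weights as the first principal component of the RIDIT matrix; the sign convention and the simulated data are assumptions of this sketch, not the published implementation:

```python
import numpy as np

def pridit_scores(F):
    """PRIDIT sketch. F is an (N claims x T features) 0/1 matrix with
    1 = 'Yes', the fraud-indicating response (an assumption of this sketch)."""
    p = F.mean(axis=0)                      # proportion of 'Yes' per feature
    # Step 2: RIDIT score each response (proportion below minus proportion
    # above); with 'Yes' ordered at the bottom, B(Yes) = p - 1 and B(No) = p.
    R = np.where(F == 1, p - 1, p)
    # Step 3: response weights = first principal component of the RIDIT matrix
    _, _, Vt = np.linalg.svd(R - R.mean(axis=0), full_matrices=False)
    w = Vt[0] * (np.sign(Vt[0].sum()) or 1.0)   # fix an arbitrary sign flip
    s = R @ w                               # Step 4: weighted sum per claim
    return s, w, s < 0                      # Step 5: partition at zero

# Made-up data standing in for TRT1..TRT9 responses on 1,000 claims
F = np.random.randint(0, 2, size=(1000, 9))
scores, weights, flagged = pridit_scores(F)
```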

PRIDIT METHOD RESULTS
1. DATA: N Claims Clustered in Varying Degrees of "Fraud"
2. FEATURES: Each Feature Has an Importance Weight
3. CLUSTERS: Size Can Change with Experience
4. SCORE: Can Be Checked on a New Database
5. DECISIONS: Scores Near Zero = Too Little Information.

TABLE 1: Computation of RIDIT Scores

Variable | Label | Proportion "Yes" | B_t1 ("Yes") | B_t2 ("No")
Large # of visits to chiropractor | TRT1 | 44% | -0.56 | 0.44
Chiropractor provided 3 or more modalities on most visits | TRT2 | 12% | -0.88 | 0.12
Large # of visits to a physical therapist | TRT3 | 8% | -0.92 | 0.08
MRI or CT scan but no inpatient hospital charges | TRT4 | 20% | -0.80 | 0.20
Use of "high volume" medical provider | TRT5 | 31% | -0.69 | 0.31
Significant gaps in course of treatment | TRT6 | 9% | -0.91 | 0.09
Treatment was unusually prolonged (> 6 months) | TRT7 | 24% | -0.76 | 0.24
Indep. medical examiner questioned extent of treatment | TRT8 | 11% | -0.89 | 0.11
Medical audit raised questions about charges | TRT9 | 4% | -0.96 | 0.04

(With "Yes" as the fraud-indicating response, step 2 of the method gives B_t1("Yes") = p - 1 and B_t2("No") = p, where p is the proportion of "Yes"; the B columns are computed accordingly.)

TABLE 3: PRIDIT Transformed Indicators, Scores, and Classes (columns: Claim, TRT1-TRT9, Score, Class)

TABLE 2: Weights for Treatment Variables (PRIDIT Weights W(∞) and Regression Weights for TRT1-TRT9)

Regression significance by variable: TRT1 ***, TRT2 ***, TRT3 ***, TRT4 -, TRT5 *, TRT6 -, TRT7 -, TRT8 ***, TRT9 **.
Regression significance shown at the 1% (***), 5% (**), or 10% (*) levels.

TABLE 7: AIB Fraud and Suspicion Score Data - Top 10 Fraud Indicators by Weight

PRIDIT | Adj. Reg. Score | Inv. Reg. Score
ACC3 | ACC1 | ACC11
ACC4 | ACC9 | CLT4
ACC15 | ACC10 | CLT7
CLT11 | ACC19 | CLT11
INJ1 | CLT11 | INJ1
INJ2 | INS6 | INJ3
INJ5 | INJ2 | INJ8
INJ6 | INJ9 | INJ11
INS8 | TRT1 |
LW6 | TRT9 |

EM Algorithm: Hidden Exposures - Overview
- Models hidden risk exposures as additional dimension(s) of the loss severity distribution via the EM (Expectation-Maximization) algorithm
- Treats losses affected by hidden exposures as mixtures of probability distributions, with some parameters of the mixtures considered missing (i.e., unobservable in practice)
- The approach is feasible due to advances in computer-driven methodologies for partially hidden or incomplete data models
- Empirical data imputation has become more sophisticated, and ever-faster computing power has made it increasingly possible to solve these problems via iterative algorithms
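
A minimal EM sketch for a one-dimensional mixture of normals, the setup Figure 2 below depicts; the simulated severities and the component count are illustrative assumptions, not the BI data:

```python
import numpy as np

def em_normal_mixture(x, k=3, n_iter=200, seed=0):
    """EM for a 1-D mixture of k normals. The E-step computes posterior
    component probabilities (the 'hidden exposure' labels); the M-step
    re-estimates the parameters with those probabilities as weights."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, k)
    sd = np.full(k, x.std())
    pi = np.full(k, 1 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of mixing weights, means, std deviations
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-9
    return pi, mu, sd

# Made-up severities standing in for the 348 BI medical bill amounts
x = np.concatenate([np.random.normal(500, 100, 200),
                    np.random.normal(1500, 300, 100),
                    np.random.normal(4000, 800, 48)])
pi, mu, sd = em_normal_mixture(x, k=3)
```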

Figure 1: Overall distribution of the 348 BI medical bill amounts from Appendix B compared with that submitted by provider A. Left panel: frequency histograms (provider A's histogram in filled bars). Right panel: density estimators (provider A's density in dashed line). Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala and Richard A. Derrig, p. 9, 11/18/02

Figure 2: EM Fit. Left panel: mixture of normal distributions fitted via the EM algorithm to the BI data. Right panel: the three normal components of the mixture. Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala and Richard A. Derrig, p. 13, 11/18/02

Decision Trees
In decision theory (for example, risk management), a decision tree is a graph of decisions and their possible consequences (including resource costs and risks) used to create a plan to reach a goal. Decision trees are constructed to help with making decisions; a decision tree is a special form of tree structure.
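
As a concrete illustration, a single classification tree fitted with scikit-learn on made-up data (the same idea as the CART/CHAID software named on the next slide, not those products themselves):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up data standing in for claim red flags and an expert label
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each root-to-leaf path is a readable decision rule
print(export_text(tree, feature_names=[f"flag_{i}" for i in range(6)]))
```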

Different Kinds of Decision Trees
- Single Trees (CART, CHAID)
- Ensemble Trees, a more recent development (TREENET, RANDOM FOREST): a composite or weighted average of many trees (perhaps 100 or more); there are many methods to fit the trees and prevent overfitting
  - Boosting: Iminer Ensemble and Treenet
  - Bagging: Random Forest
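
A sketch of the two ensemble styles using open-source scikit-learn stand-ins: GradientBoostingClassifier for TREENET-style boosting and RandomForestClassifier for bagging, scored by AUROC as in the TREENET slide below. The data here are simulated, not the IME data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Made-up stand-in: fraud indicators -> IME requested (0/1)
rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(2000, 19)).astype(float)
y = (X[:, :5].sum(axis=1) + rng.normal(0, 1, 2000) > 3).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "boosting (TREENET-style)": GradientBoostingClassifier(),        # trees fit sequentially on residuals
    "bagging (Random Forest)": RandomForestClassifier(n_estimators=200),  # trees fit on bootstrap samples
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auc:.3f}")
```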

The Methods and Software Evaluated
1) TREENET
2) Iminer Tree
3) SPLUS Tree
4) CART
5) Iminer Ensemble
6) Random Forest
7) Naïve Bayes (Baseline)
8) Logistic (Baseline)

Ensemble Prediction of IME Requested

TREENET ROC Curve – IME AUROC = 0.701

Ranking of Methods/Software - First Two Surrogates

Implementation Outline Included at End

NON-CRIMINAL FRAUD?

Non-Criminal Fraud
- General Deterrence: Insurance System, Insurance Dept. + DIA, Medical, and Other Government Oversight
- Specific Deterrence: Company SIU, Auditor, Data, Predictive Modeling for Claims and Underwriting

FRAUD INDICATORS: VALIDATION PROCEDURES
- Canadian Coalition Against Insurance Fraud (1997): 305 fraud indicators (45 for vehicle theft)
- "No one indicator by itself is necessarily suspicious."
- Problem: how to validate the systematic use of fraud indicators?
- Solution 1: Kohonen Self-Organizing Feature Map (Brockett et al., 1998)
- Solution 2: Logistic Regression (Viaene et al., 2002)
- Solution 3: PRIDIT Method (Brockett et al., 2002)
- Solution 4: Regression Tree Methods (Derrig and Francis, 2007)

REFERENCES
- Brockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Albert, and Alpert, Mark (2002), Fraud Classification Using Principal Component Analysis of RIDITs, Journal of Risk and Insurance, 69:3.
- Brockett, Patrick L., Xia, Xiaohua, and Derrig, Richard A. (1998), Using Kohonen's Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, Journal of Risk and Insurance, 65.
- Derrig, Richard A. (2002), Insurance Fraud, Journal of Risk and Insurance, 69:3.
- Derrig, Richard A., and Ostaszewski, K. (1995), Fuzzy Techniques of Pattern Recognition in Risk and Claim Classification, Journal of Risk and Insurance, 62:3.
- Francis, Louise, and Derrig, Richard A. (2007), Distinguishing the Forest from the TREES, Working Paper.
- Rempala, G.A., and Derrig, Richard A. (2003), Modeling Hidden Exposures in Claim Severity via the EM Algorithm, North American Actuarial Journal, 9(2).
- Viaene, S., Derrig, Richard A., et al. (2002), A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Fraud Detection, Journal of Risk and Insurance, 69:3.
- Weisberg, H.I., and Derrig, R.A. (1998), Quantitative Methods for Detecting Fraudulent Automobile Bodily Injury Claims, RISQUES, vol. 35, July-September (in French; English version available).

Claim Fraud Detection Plan
- STEP 1 - SAMPLE: Systematic benchmark of a random sample of claims.
- STEP 2 - FEATURES: Isolate red flags and other sorting characteristics.
- STEP 3 - FEATURE SELECTION: Separate features into objective and subjective; early, middle, and late arriving; acquisition cost levels; and other practical considerations.
- STEP 4 - CLUSTER: Apply unsupervised algorithms (Kohonen, PRIDIT, Fuzzy) to cluster claims; examine for needed homogeneity.

Claim Fraud Detection Plan (continued)
- STEP 5 - ASSESSMENT: Externally classify claims according to objectives for sorting.
- STEP 6 - MODEL: Build supervised models relating selected features to objectives (logistic regression, Naïve Bayes, neural networks, CART, MARS).
- STEP 7 - STATIC TESTING: Compare model output with expert assessment, and model output with cluster homogeneity (PRIDIT scores), on one or more samples.
- STEP 8 - DYNAMIC TESTING: Operate an acceptable model in real time, record outcomes, and repeat steps 1-7 as needed to fine-tune the model and parameters. Use PRIDIT to show gains or losses of feature power and changing data patterns; tune investigative proportions to optimize detection and deterrence of fraud and abuse.