If They Cheat, Can We Catch Them With Predictive Modeling?
Richard A. Derrig, PhD, CFE
President, Opal Consulting, LLC
Senior Consultant, Insurance Fraud Bureau of Massachusetts
CAS Predictive Modeling, October 11-12, 2007
Insurance Fraud: The Problem
- ISO/IRC 2001 Study: Auto and Workers Compensation Fraud called a Big Problem by 27% of Insurers.
- CAIF: Estimation (too large)
- Mass IFB: 1,500 referrals annually for Auto, WC, and Other P-L lines.
Fraud Definition: PRINCIPLES
- Clear and willful act
- Proscribed by law
- Obtaining money or value
- Under false pretenses
Abuse, Unethical: fails one or more Principles.
HOW MUCH CLAIM FRAUD? (CRIMINAL or CIVIL?)
10% Fraud
REAL PROBLEM: CLAIM FRAUD
Classify all claims:
- Identify valid classes: pay the claim, no hassle (the Visa example)
- Identify (possible) fraud: investigation needed
- Identify "gray" classes: minimize with "learning" algorithms
Company Automation - Data Mining
- Data Mining/Predictive Modeling Automates Record Reviews
- No Data Mining without Good Clean Data (90% of the solution)
- Insurance Policy and Claim Data; Business and Demographic Data
- Data Warehouse/Data Mart
- Data Manipulation: Simple First; Complex Algorithms When Needed
DATA
Computers advance
FRAUD IDENTIFICATION
- Experience and Judgment
- Artificial Intelligence Systems
- Regression & Tree Models
- Fuzzy Clusters
- Neural Networks
- Expert Systems
- Genetic Algorithms
- All of the Above
MATHEMATICAL MODELS
- Databases: Vector Spaces, Topological Spaces, Stochastic Processes
- Scoring: Mappings to R, Linear Functionals
DM Databases → Scoring Functions → Graded Output:
- Non-Suspicious Claims: Routine Claims
- Suspicious Claims: Complicated Claims
DM Databases → Scoring Functions → Graded Output:
- Non-Suspicious Risks: Routine Underwriting
- Suspicious Risks: Non-Routine Underwriting
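The two slides above share one pattern: a scoring function reduces each record to a number, and thresholds turn the number into a graded output. A minimal sketch of that pattern in Python; the thresholds and class labels are illustrative assumptions, not values from the presentation:

```python
def grade(score, low=0.3, high=0.7):
    """Map a score in [0, 1] to a graded output class. The thresholds are
    illustrative; in practice they are tuned to investigative capacity
    and the cost of a false referral."""
    if score < low:
        return "non-suspicious: routine handling"
    if score < high:
        return "gray: adjuster review"
    return "suspicious: refer to SIU"

for s in (0.12, 0.55, 0.91):
    print(s, "->", grade(s))
```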
POTENTIAL VALUE OF A PREDICTIVE MODELING SCORING SYSTEM
- Screening to Detect Fraud Early
- Auditing of Closed Claims to Measure Fraud, Both Kinds
- Sorting to Select Efficiently among Special Investigative Unit Referrals
- Providing Evidence to Support a Denial
- Protecting against Bad Faith
PREDICTIVE MODELING: SOME PUBLIC TECHNIQUES
- Fuzzy Logic and Controllers
- Regression Scoring Systems
- Unsupervised Techniques: Kohonen and PRIDIT
- EM Algorithm (Medical Bills)
- Tree-based Methods
FUZZY SETS COMPARED WITH PROBABILITY
Probability:
- Measures randomness;
- Measures whether or not an event occurs; and
- Randomness dissipates over time or with further knowledge.
Fuzziness:
- Measures vagueness in language;
- Measures the extent to which an event occurs; and
- Vagueness does not dissipate with time or further knowledge.
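To make the contrast concrete, here is an illustrative fuzzy membership function; the "warm" set, its thresholds, and the code are our illustration, not the presentation's. A 60°F day is warm to degree 0.4, and more information never changes that, whereas a 40% probability of warmth would resolve to 0 or 1 once the event is observed:

```python
def warm_membership(temp_f):
    """Degree of membership in the vague set "warm": the *extent* to
    which the temperature is warm, not the chance that it is."""
    if temp_f <= 50:
        return 0.0
    if temp_f >= 75:
        return 1.0
    return (temp_f - 50) / 25   # linear ramp between 50F and 75F

print(warm_membership(60))      # 0.4: partially warm, and it stays 0.4
```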
Fuzzy Clustering & Detection
- k-Means Clustering
- Fuzzy Logic: True, False, Uncertain
- Fuzzy Numbers: Membership Value
- Fuzzy Clusters: Partial Membership (see the sketch below)
- App 1: Suspicion of Fraud
- App 2: Town (Zip) Rating Classes
REF: Derrig-Ostaszewski, 1995
FUZZY SETS: TOWN RATING CLASSIFICATION
When is one Town near another for Auto Insurance Rating?
- Geographic Proximity (Traditional)
- Overall Cost Index (Massachusetts)
- Geo-Smoothing (Experimental)
Geographically close Towns do not have the same Expected Losses. Clusters by Cost produce Border Problems: nearby Towns fall in different Rating Territories. Fuzzy Clusters acknowledge the Borders. Are all coverage clusters correct for each Insurance Coverage? Fuzzy Clustering on Five Auto Coverage Indices is better and demonstrates a weakness in Overall Crisp Clustering.
Fuzzy Clustering of Fraud Study Claims by Assessment Data (membership value cut at 0.2)
AIB FRAUD INDICATORS
Accident Characteristics (19), e.g.:
- No report by police officer at scene
- No witnesses to accident
Claimant Characteristics (11), e.g.:
- Retained an attorney very quickly
- Had a history of previous claims
Insured Driver Characteristics (8), e.g.:
- Had a history of previous claims
- Gave address as hotel or P.O. Box
Supervised Models: Regression
- Fraud Indicators serve as independent dummy variables
- Expert Evaluation Categories serve as dependent target
- Regression Scoring Systems (see the sketch below)
REF1: Weisberg-Derrig, 1998
REF2: Viaene et al., 2002
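A minimal sketch of that setup: binary red-flag indicators as dummy regressors, an expert suspicion label as the target. The data here are synthetic stand-ins, and logistic regression is one representative choice; Weisberg-Derrig (1998) built regression scoring systems and Viaene et al. (2002) compared several classifiers:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500
# Synthetic 0/1 red-flag dummies (e.g., "no police report at scene",
# "attorney retained very quickly"); real features come from claim files.
X = rng.integers(0, 2, size=(n, 6))
# Synthetic expert assessment: suspicion rises with the number of red flags.
y = (X.sum(axis=1) + rng.normal(0, 1, n) > 3).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]   # suspicion score per claim
print(model.coef_.round(2))             # one fitted weight per indicator
```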
Unsupervised Models: Kohonen Self-Organizing Feature Maps
- Fraud Indicators serve as independent "features"
- Expert Evaluation Categories can serve as dependent target in a second phase
- Self-Organizing Feature Maps: T. Kohonen, 1982-1990 (Cybernetics)
- Reference vectors map to the OUTPUT format in a topologically faithful way. Example: map onto a 40x40 two-dimensional square.
- An iterative process adjusts all reference vectors in a "neighborhood" of the nearest one; the neighborhood size shrinks over iterations. (See the sketch below.)
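A minimal SOM written from the description above: find the best-matching unit, then move every reference vector in a shrinking neighborhood toward the pattern. This is our illustration, not the Brockett et al. implementation, and a 10x10 grid stands in for the deck's 40x40:

```python
import numpy as np

def train_som(X, grid=(10, 10), n_iter=2000, lr0=0.5, seed=0):
    """Minimal Kohonen SOM for 0/1 indicator vectors. Returns the grid of
    reference vectors, shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows, cols, X.shape[1]))
    # grid coordinates, used by the neighborhood function
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        # best-matching unit: the reference vector nearest the pattern
        bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), (rows, cols))
        frac = t / n_iter
        sigma = sigma0 * (1 - frac) + 1e-3   # neighborhood shrinks over time
        lr = lr0 * (1 - frac)                # so does the learning rate
        # every unit in the (Gaussian) neighborhood moves toward the pattern
        h = np.exp(-((coords - np.array(bmu)) ** 2).sum(-1) / (2 * sigma ** 2))
        W += lr * h[..., None] * (x - W)
    return W

# Train on synthetic indicator vectors; claims mapping to nearby units
# share indicator patterns ("guilt by association", next slides).
X = np.random.default_rng(1).integers(0, 2, (200, 12)).astype(float)
W = train_som(X)
```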
MAPPING: PATTERNS-TO-UNITS (figure)
KOHONEN FEATURE MAP: SUSPICION LEVELS (figure)
FEATURE MAP: SIMILARITY OF A CLAIM (figure)
DATA MODELING EXAMPLE: CLUSTERING
- Data on 16,000 Medicaid providers analyzed by an unsupervised neural net
- The neural network clustered Medicaid providers based on 100+ features
- Investigators validated a small set of known fraudulent providers
- A visualization tool displays the clustering, showing known fraud and abuse
- A subset of 100 providers with similar patterns was investigated: hit rate > 70%
- Cube size proportional to annual Medicaid revenues
© 1999 Intelligent Technologies Corporation
SELF-ORGANIZING MAP
- Binary features
- Given c = {features}, c → m_c (the claim's best-matching unit)
- d(m_c, 0) = suspicion level
- "Guilt by Association"
PRIDIT: Unique Embedded Score
- Data: features have no natural metric scale
- Model: stochastic process has no parametric form
- Classification: inverse image of a one-dimensional scoring function and decision rule
- Feature value: identify which features are "important"
PRIDIT METHOD OVERVIEW
1. DATA: N claims, T features, K_t responses per feature, monotone in "fraud".
2. RIDIT: score each possible response as the proportion below minus the proportion above, so scores are centered at zero.
3. RESPONSE WEIGHTS: first principal component of the claims x features matrix with RIDITs in the cells (SPSS, SAS, or S-Plus software).
4. SCORE: sum of weights x claim RIDIT scores.
5. PARTITION: above and below zero.
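A minimal numpy sketch of those five steps for binary indicators. This is our illustration: Brockett et al. (2002) derive the weights from the same principal-component calculation, but details such as sign conventions and multi-level responses are simplified here:

```python
import numpy as np

def pridit(X):
    """PRIDIT sketch for binary 0/1 indicators (1 = "Yes", the more
    fraud-indicative response). Returns (scores, weights, classes)."""
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0)                  # proportion of "Yes" per feature
    # Step 2: RIDIT-score each response; for a binary feature this is
    # B("Yes") = -(1 - p) and B("No") = p, centered at zero.
    F = np.where(X == 1, -(1 - p), p)   # claims x features, RIDITs in cells
    # Step 3: response weights = first principal component of F
    # (here via SVD, standing in for SPSS/SAS/S-Plus).
    _, _, Vt = np.linalg.svd(F, full_matrices=False)
    w = Vt[0]
    if w.sum() < 0:                     # a PC's sign is arbitrary; choose the
        w = -w                          # convention with positive total weight
    # Step 4: score = weighted sum of each claim's RIDIT scores.
    scores = F @ w
    # Step 5: partition above/below zero (near zero = little information).
    classes = np.where(scores < 0, 1, 2)
    return scores, w, classes

# Five synthetic claims over three binary indicators; negative scores
# land in class 1, the more suspicious side, as in Table 3 below.
scores, w, classes = pridit([[1, 1, 1], [1, 1, 0], [0, 0, 0],
                             [0, 1, 0], [0, 0, 1]])
print(scores.round(3), classes)
```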
PRIDIT METHOD RESULTS
1. DATA: N claims clustered in varying degrees of "fraud".
2. FEATURES: each feature has an importance weight.
3. CLUSTERS: size can change with experience.
4. SCORE: can be checked on a new database.
5. DECISIONS: scores near zero = too little information.
TABLE 1: Computation of RIDIT Scores

Variable | Label | Proportion of "Yes" | B_t1 ("Yes") | B_t2 ("No")
Large # of visits to chiropractor | TRT1 | 44% | -.56 | .44
Chiropractor provided 3 or more modalities on most visits | TRT2 | 12% | -.88 | .12
Large # of visits to a physical therapist | TRT3 | 8% | -.92 | .08
MRI or CT scan but no inpatient hospital charges | TRT4 | 20% | -.80 | .20
Use of "high volume" medical provider | TRT5 | 31% | -.69 | .31
Significant gaps in course of treatment | TRT6 | 9% | -.91 | .09
Treatment was unusually prolonged (> 6 months) | TRT7 | 24% | -.76 | .24
Indep. medical examiner questioned extent of treatment | TRT8 | 11% | -.89 | .11
Medical audit raised questions about charges | TRT9 | 4% | -.96 | .04
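A worked check of Step 2 using the first row, reading "Yes" as the more fraud-indicative response: 44% of claims answer "Yes" to TRT1, so a "Yes" has proportion 0 below it and 0.56 above it, giving B_t1 = 0 - 0.56 = -0.56, while a "No" has 0.44 below and 0 above, giving B_t2 = +0.44. Every row follows the same binary pattern: B("Yes") = -(1 - p) and B("No") = p, where p is the proportion answering "Yes".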
TABLE 2: Weights for Treatment Variables

Variable | PRIDIT Weight W(∞) | Regression Weight
TRT1 | .30 | .32***
TRT2 | .19 | .19***
TRT3 | .53 | .22***
TRT4 | .38 | .07
TRT5 | .02 | .08*
TRT6 | .70 | -.01
TRT7 | .82 | .03
TRT8 | .37 | .18***
TRT9 | -.13 | .24**

Regression significance shown at the 1% (***), 5% (**), or 10% (*) level.
TABLE 3: PRIDIT Transformed Indicators, Scores and Classes

Claim | TRT1 | TRT2 | TRT3 | TRT4 | TRT5 | TRT6 | TRT7 | TRT8 | TRT9 | Score | Class
1 | 0.44 | 0.12 | 0.08 | 0.20 | 0.31 | 0.09 | 0.24 | 0.11 | 0.04 | .07 | 2
2 | 0.44 | 0.12 | 0.08 | 0.20 | -0.69 | 0.09 | 0.24 | 0.11 | 0.04 | .07 | 2
3 | 0.44 | -0.88 | -0.92 | 0.20 | 0.31 | -0.91 | -0.76 | 0.11 | 0.04 | -.25 | 1
4 | -0.56 | 0.12 | 0.08 | 0.20 | 0.31 | 0.09 | 0.24 | 0.11 | 0.04 | .04 | 2
5 | -0.56 | -0.88 | 0.08 | 0.20 | 0.31 | 0.09 | 0.24 | 0.11 | 0.04 | .02 | 2
6 | 0.44 | 0.12 | 0.08 | 0.20 | 0.31 | 0.09 | 0.24 | 0.11 | 0.04 | .07 | 2
7 | -0.56 | 0.12 | 0.08 | 0.20 | 0.31 | 0.09 | -0.76 | -0.89 | 0.04 | -.10 | 1
8 | -0.44 | 0.12 | 0.08 | 0.20 | -0.69 | 0.09 | 0.24 | 0.11 | 0.04 | .02 | 2
9 | -0.56 | -0.88 | 0.08 | -0.80 | 0.31 | 0.09 | 0.24 | 0.11 | -0.96 | .05 | 2
10 | -0.56 | 0.12 | 0.08 | 0.20 | 0.31 | 0.09 | 0.24 | 0.11 | 0.04 | .04 | 2
TABLE 7: AIB Fraud and Suspicion Score Data, Top 10 Fraud Indicators by Weight

PRIDIT | Adj. Reg. Score | Inv. Reg. Score
ACC3 | ACC1 | ACC11
ACC4 | ACC9 | CLT4
ACC15 | ACC10 | CLT7
CLT11 | ACC19 | CLT11
INJ1 | CLT11 | INJ1
INJ2 | INS6 | INJ3
INJ5 | INJ2 | INJ8
INJ6 | INJ9 | INJ11
INS8 | | TRT1
LW6 | | TRT9
EM Algorithm: Hidden Exposures (Overview)
- Model hidden risk exposures as additional dimension(s) of the loss severity distribution via the EM (Expectation-Maximization) algorithm.
- Treat losses affected by hidden exposures as mixtures of probability distributions, with some mixture parameters considered missing (i.e., unobservable in practice).
- The approach is feasible thanks to advances in computational methods for partially hidden or incomplete data models.
- More sophisticated data imputation and ever faster computing power make it increasingly practical to solve these problems via iterative algorithms.
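A minimal sketch of the idea: fit a finite mixture of normals to (log) bill amounts by EM, then read off each bill's posterior probability of belonging to a suspect component. Everything here is synthetic and illustrative; scikit-learn's GaussianMixture stands in for the paper's own EM implementation, and the three components only mirror the shape of Figure 2 below:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for 348 log bill amounts: a provider inflating charges
# would show up as an extra mixture component at higher severities.
rng = np.random.default_rng(2)
bills = np.concatenate([
    rng.normal(6.0, 0.5, 250),   # routine treatment
    rng.normal(7.0, 0.3, 80),    # heavier but legitimate claims
    rng.normal(7.8, 0.2, 18),    # possible hidden-exposure component
]).reshape(-1, 1)

# GaussianMixture fits the mixture by EM: the E-step assigns each bill a
# posterior probability of coming from each component, the M-step then
# re-estimates the component means, variances, and weights.
gm = GaussianMixture(n_components=3, random_state=0).fit(bills)
print(gm.means_.ravel().round(2), gm.weights_.round(3))

# Posterior "responsibilities": bills dominated by the highest-mean
# component are candidates for review.
resp = gm.predict_proba(bills)
suspect = np.argmax(gm.means_.ravel())
print((resp[:, suspect] > 0.5).sum(), "bills flagged")
```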
Figure 1: Overall distribution of the 348 BI medical bill amounts from Appendix B compared with that submitted by provider A. Left panel: frequency histograms (provider A's histogram in filled bars). Right panel: density estimators (provider A's density in dashed line).
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala and Richard A. Derrig, p. 9, 11/18/02.
Figure 2: EM fit. Left panel: mixture of normal distributions fitted via the EM algorithm to the BI data. Right panel: the three normal components of the mixture.
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala and Richard A. Derrig, p. 13, 11/18/02.
Decision Trees
In decision theory (for example, risk management), a decision tree is a graph of decisions and their possible consequences (including resource costs and risks) used to create a plan to reach a goal. Decision trees are constructed to help with making decisions. A decision tree is a special form of tree structure. (www.wikipedia.org)
Different Kinds of Decision Trees
- Single Trees (CART, CHAID)
- Ensemble Trees, a more recent development (TREENET, RANDOM FOREST):
  - A composite or weighted average of many trees (perhaps 100 or more)
  - Many methods exist to fit the trees and prevent overfitting
  - Boosting: Iminer Ensemble and Treenet
  - Bagging: Random Forest
A sketch of both ensemble ideas follows.
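Treenet and Iminer are commercial packages, so this sketch uses open-source stand-ins on synthetic data to show the two ensemble ideas named above; it is illustrative, not the study's actual experiment:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for claim features and a fraud/no-fraud label.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: many deep trees on bootstrap samples, predictions averaged.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
# Boosting: many shallow trees, each fit to the ensemble's current errors.
gb = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

for name, model in [("bagging (random forest)", rf), ("boosting", gb)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auc:.3f}")
```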
The Methods and Software Evaluated
1) TREENET
2) Iminer Tree
3) S-PLUS Tree
4) CART
5) Iminer Ensemble
6) Random Forest
7) Naïve Bayes (Baseline)
8) Logistic (Baseline)
Ensemble Prediction of IME Requested
TREENET ROC Curve – IME AUROC = 0.701
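For readers unfamiliar with the metric: AUROC is the probability that a randomly chosen positive case (here, a claim where an independent medical examination was requested) receives a higher model score than a randomly chosen negative case, so the 0.701 above means TREENET ranks a random IME claim over a random non-IME claim about 70% of the time. A small self-contained check of that definition (our illustration, with made-up scores):

```python
import numpy as np

def auroc(scores, labels):
    """AUROC = probability that a randomly chosen positive outscores a
    randomly chosen negative (ties count half); 0.5 is chance level."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

print(auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```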
Ranking of Methods/Software – 1st Two Surrogates
Implementation Outline Included at End
NON-CRIMINAL FRAUD?
Non-Criminal Fraud
- General Deterrence: the insurance system, insurance department + DIA, medical, and other government oversight.
- Specific Deterrence: company SIU, auditors, data, predictive modeling for claims and underwriting.
FRAUD INDICATORS: VALIDATION PROCEDURES
Canadian Coalition Against Insurance Fraud (1997): 305 fraud indicators (45 for vehicle theft). "No one indicator by itself is necessarily suspicious."
Problem: How to validate the systematic use of fraud indicators?
- Solution 1: Kohonen Self-Organizing Feature Map (Brockett et al., 1998)
- Solution 2: Logistic Regression (Viaene et al., 2002)
- Solution 3: PRIDIT Method (Brockett et al., 2002)
- Solution 4: Regression Tree Methods (Derrig and Francis, 2007)
REFERENCES
Brockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Albert and Alpert, Mark, (2002), Fraud Classification Using Principal Component Analysis of RIDITs, Journal of Risk and Insurance, 69:3, 341-373.
Brockett, Patrick L., Xia, Xiaohua and Derrig, Richard A., (1998), Using Kohonen's Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, Journal of Risk and Insurance, 65:245-274.
Derrig, Richard A., (2002), Insurance Fraud, Journal of Risk and Insurance, 69:3, 271-289.
Derrig, Richard A., and Ostaszewski, K., (1995), Fuzzy Techniques of Pattern Recognition in Risk and Claim Classification, Journal of Risk and Insurance, 62:3, 147-182.
Francis, Louise and Derrig, Richard A., (2007), Distinguishing the Forest from the TREES, Working Paper.
Rempala, G.A., and Derrig, Richard A., (2005), Modeling Hidden Exposures in Claim Severity via the EM Algorithm, North American Actuarial Journal, 9(2), 108-128.
Viaene, S., Derrig, Richard A., et al., (2002), A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Fraud Detection, Journal of Risk and Insurance, 69:3, 373-423.
Weisberg, H.I. and Derrig, R.A., (1998), Quantitative Methods for Detecting Fraudulent Automobile Bodily Injury Claims, RISQUES, 35, 75-101, July-September (in French; English version available).
Claim Fraud Detection Plan
STEP 1: SAMPLE: Systematic benchmark of a random sample of claims.
STEP 2: FEATURES: Isolate red flags and other sorting characteristics.
STEP 3: FEATURE SELECTION: Separate features into objective and subjective, early, middle, and late arriving, acquisition cost levels, and other practical considerations.
STEP 4: CLUSTER: Apply unsupervised algorithms (Kohonen, PRIDIT, fuzzy) to cluster claims; examine for needed homogeneity.
Claim Fraud Detection Plan (continued)
STEP 5: ASSESSMENT: Externally classify claims according to objectives for sorting.
STEP 6: MODEL: Build supervised models relating selected features to objectives (logistic regression, Naïve Bayes, neural networks, CART, MARS).
STEP 7: STATIC TESTING: Compare model output with expert assessment and with cluster homogeneity (PRIDIT scores) on one or more samples.
STEP 8: DYNAMIC TESTING: Run the accepted model in real time, record outcomes, and repeat Steps 1-7 as needed to fine-tune the model and parameters. Use PRIDIT to show gain or loss of feature power and changing data patterns; tune investigative proportions to optimize detection and deterrence of fraud and abuse.