Review of Fraud Classification Using Principal Components Analysis of RIDITS
By Louise A. Francis, Francis Analytics and Actuarial Data Mining, Inc.

1 Review of Fraud Classification Using Principal Components Analysis of RIDITS By Louise A. Francis Francis Analytics and Actuarial Data Mining, Inc.

2 Objectives
• Address the question: why use a new method, PRIDIT?
• Introduce other methods used in similar circumstances
• Explain how PRIDIT adds to the methods available
• Explain the limitations of PRIDIT/RIDIT

3 A Key Problem in Fraud Modeling
• Most data mining methods need a target (dependent) variable
• $Y = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
• Fraud (Yes/No or Fraud Score) = f(predictor variables)
• Need a sample of data where claims have been determined to be fraudulent or legitimate

4 Dependent variable hard to get
• In a large sample of automobile insurance claims, perhaps 1/3 may have an element of abuse or fraud
• Scarce resources are not expended on such large volumes of claims to determine their legitimacy
• Only a small percentage are referred to SIU investigators or other investigations
• There are time lags in determining the outcome of investigations

5 Unsupervised learning
• Another approach that does not require a dependent variable
• Two key kinds
  • Cluster Analysis
  • Principal Components/Factor Analysis
• PRIDIT uses this approach
• It is applied to ordered categorical variables

6 Cluster Analysis
• Records are grouped in categories that have similar values on the variables
• Examples
  • Marketing: people with similar values on demographic variables (e.g., age, gender, income) may be grouped together for marketing
  • Text analysis: use words that tend to occur together to classify documents
• Note: no dependent variable is used in the analysis

7 Clustering
• Common methods: k-means, hierarchical
• No dependent variable: records are grouped into classes with similar values on the variables
• Start with a measure of similarity or dissimilarity
• Maximize dissimilarity between members of different clusters

8 Dissimilarity (Distance) Measure – Continuous Variables
• Euclidean distance
• Manhattan distance
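
The formulas on this slide did not survive in the transcript. As a minimal sketch of the two standard definitions, Euclidean distance is the square root of the summed squared differences and Manhattan distance is the sum of the absolute differences. The function names and the two example claims below are hypothetical, chosen only for illustration.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared differences between two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of the absolute differences between two records."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Two hypothetical claims: (number of providers, medical bill in $1000s)
claim_a = (1, 2.0)
claim_b = (4, 6.0)

print(euclidean_distance(claim_a, claim_b))  # 5.0
print(manhattan_distance(claim_a, claim_b))  # 7.0
```

In practice the variables would usually be put on a comparable scale first, since otherwise the variable with the largest units dominates either distance.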

9 Binary Variables

10
• Simple matching
• Rogers and Tanimoto
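
The formulas for these two measures are not in the transcript. A hedged sketch using their usual textbook definitions: simple matching dissimilarity is the share of variables on which two records disagree, and the Rogers and Tanimoto version counts each disagreement twice. The function names and the two example records are hypothetical.

```python
def match_counts(x, y):
    """Return (matches, mismatches) for two equal-length binary records."""
    matches = sum(1 for xi, yi in zip(x, y) if xi == yi)
    return matches, len(x) - matches

def simple_matching_dissimilarity(x, y):
    """Mismatches divided by the number of variables."""
    matches, mismatches = match_counts(x, y)
    return mismatches / (matches + mismatches)

def rogers_tanimoto_dissimilarity(x, y):
    """Like simple matching, but mismatches carry double weight."""
    matches, mismatches = match_counts(x, y)
    return 2 * mismatches / (matches + 2 * mismatches)

# Two hypothetical claims coded on six yes/no indicators (1 = yes, 0 = no)
claim_a = [1, 0, 1, 1, 0, 1]
claim_b = [1, 1, 1, 1, 0, 0]

print(simple_matching_dissimilarity(claim_a, claim_b))  # 2 of 6 differ
print(rogers_tanimoto_dissimilarity(claim_a, claim_b))  # 0.5
```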

11 Example: Fraud Data
• Data from a 1993 closed claim study conducted by the Automobile Insurers Bureau of Massachusetts
• Claim files often have variables that may be useful in assessing suspicion of fraud, but a dependent variable is often not available
• Variables used for clustering:
  • Legal representation
  • Prior claim
  • SIU investigation
  • At fault
  • Police report
  • Number of providers

12 Statistics for Clusters
• Based on descriptive statistics, Cluster 2 appears to have a higher likelihood of fraudulent claims (more about this later)

13 Principal Components Analysis
• A form of dimension (variable) reduction
• Suppose we want to combine all the information related to the “financial” dimension of fraud
  • Medical provider bill (indicative of padding claim)
  • Hospital bill
  • Number of providers
  • Economic losses
  • Claimed wages
  • Incurred losses

14 Principal Components
• These variables are correlated but not perfectly correlated
• We replace many variables with a weighted sum of the variables

15 Correlation Matrix for Variables

16 Finding Factor or Component
• The correlation matrix is used to find the factor that explains the most variance (captures most of the correlation) for the set of variables
• The component or factor extracted will be a weighted average of the variables
• More than one component or factor may result from applying the method

17 Evaluating Importance of Variables
• Use factor loadings
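
The extraction described on the last two slides can be sketched in a few lines. This is an illustrative sketch, not the presenter's implementation: it builds a Pearson correlation matrix, finds its dominant eigenvector by power iteration (a simple stand-in for a full eigen-decomposition), and derives loadings as each weight times the square root of the eigenvalue, the usual convention for components taken from a correlation matrix. The function names and the six-claim data set are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlations between equal-length numeric columns.

    Assumes every column has non-zero variance.
    """
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]

def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        prod = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(p * p for p in prod))
        vec = [p / norm for p in prod]
    # For a unit vector, the Rayleigh quotient v'Mv estimates the eigenvalue
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)
    )
    return eigenvalue, vec

# Hypothetical values for six claims (illustrative only, not the AIB data)
medical_bill = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
hospital_bill = [0.5, 1.5, 1.0, 3.0, 2.5, 4.0]
claimed_wages = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

corr = correlation_matrix([medical_bill, hospital_bill, claimed_wages])
eigenvalue, weights = dominant_eigenpair(corr)
share_of_variance = eigenvalue / len(corr)  # eigenvalues of a correlation matrix sum to k
loadings = [w * math.sqrt(eigenvalue) for w in weights]

print("weights:", weights)
print("share of variance explained:", share_of_variance)
print("loadings:", loadings)
```

Variables with larger absolute loadings contribute more to the component. Further components could be found by removing the first one from the matrix and repeating the iteration.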

18 Problem: Categorical Variables
• It is not clear how best to perform Principal Components/Factor Analysis on categorical variables
• The categories may be coded as a series of binary dummy variables
• If the categories are ordered, you may lose important information
• This is the problem that PRIDIT addresses

19 RIDIT
• Variables are ordered so that the lowest value is associated with the highest probability of fraud
• Use the cumulative distribution of claims at each value, i, to create the RIDIT statistic for claim t, value i
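
The slide's formula is missing from the transcript. The sketch below assumes one commonly cited form of the RIDIT statistic: for each category, the proportion of claims in lower-ranked categories minus the proportion in higher-ranked categories. The function name, the category labels, and the counts are hypothetical and are not the figures from the presentation.

```python
from collections import Counter

def ridit_scores(values, ordered_categories):
    """Map each category to P(lower categories) - P(higher categories).

    ordered_categories lists the categories from lowest to highest rank.
    """
    n = len(values)
    counts = Counter(values)
    scores = {}
    below = 0.0  # cumulative proportion of claims in lower-ranked categories
    for category in ordered_categories:
        p = counts.get(category, 0) / n
        above = 1.0 - below - p
        scores[category] = below - above
        below += p
    return scores

# Hypothetical data: 6 of 10 claims have legal representation.
# "yes" is ranked lowest because it is assumed to be the more suspicious value.
legal_representation = ["yes"] * 6 + ["no"] * 4
scores = ridit_scores(legal_representation, ["yes", "no"])
print(scores)  # "yes" is about -0.4, "no" is about 0.6

# Under this definition the scores average to zero across the claims.
mean_score = sum(scores[v] for v in legal_representation) / len(legal_representation)
print(mean_score)
```

Under this form every score lies between -1 and 1, and the more suspicious categories receive the more negative values.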

20 Example: RIDIT for Legal Representation

21 PRIDIT
• Use RIDIT statistics in Principal Components Analysis
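
A hedged end-to-end sketch of this idea: convert each ordered categorical variable to RIDIT scores, take the first principal component of those scores, and use the weighted sum as a claim score. Several choices here are assumptions rather than the presenter's method: the RIDIT form (proportion below minus proportion above), the use of the cross-product matrix of the RIDIT columns (proportional to their covariance matrix because each column has mean zero; the correlation matrix would be another plausible reading), power iteration for the eigenvector, and all names and data.

```python
import math
from collections import Counter

def ridit_map(values, ordered_categories):
    """Category -> P(lower-ranked categories) - P(higher-ranked categories)."""
    n = len(values)
    counts = Counter(values)
    scores, below = {}, 0.0
    for category in ordered_categories:
        p = counts.get(category, 0) / n
        scores[category] = below - (1.0 - below - p)
        below += p
    return scores

def first_component(matrix, iterations=500):
    """Unit-length dominant eigenvector of a symmetric matrix (power iteration)."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        prod = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(p * p for p in prod))
        vec = [p / norm for p in prod]
    return vec

def pridit_scores(claims, orderings):
    """Return (score per claim, weight per variable).

    claims: list of dicts, one per claim.
    orderings: variable -> categories listed from most to least suspicious.
    """
    names = list(orderings)
    maps = {v: ridit_map([c[v] for c in claims], orderings[v]) for v in names}
    ridits = [[maps[v][c[v]] for v in names] for c in claims]  # claims x variables
    k = len(names)
    cross = [[sum(row[i] * row[j] for row in ridits) for j in range(k)]
             for i in range(k)]
    weights = first_component(cross)
    scores = [sum(w * r for w, r in zip(weights, row)) for row in ridits]
    return scores, dict(zip(names, weights))

# Hypothetical claims and orderings (illustrative only, not the AIB data)
orderings = {
    "legal_rep": ["yes", "no"],
    "prior_claim": ["yes", "no"],
    "police_report": ["no", "yes"],
}
rows = [("yes", "yes", "no"), ("yes", "yes", "no"), ("yes", "no", "no"),
        ("yes", "no", "yes"), ("no", "yes", "yes"), ("no", "no", "yes"),
        ("no", "no", "yes"), ("no", "no", "yes")]
claims = [dict(zip(orderings, row)) for row in rows]

scores, weights = pridit_scores(claims, orderings)
# With positive weights, the most negative scores mark the most suspicious claims
ranking = sorted(range(len(claims)), key=lambda i: scores[i])
print(weights)
print(ranking)
```

Sorting by the score gives a review order for the claims. The direction of the score depends on how the categories are ordered and on the sign of the eigenvector, so it is worth checking the weights before interpreting the ranking.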

22 Scoring
• Assign a score to each claim
• The score can be used to sort claims
• More effort is expended on claims more likely to be fraudulent or abusive
• In the case of the AIB data, we can use additional information to test how well PRIDIT did, using the PRIDIT score
• A suspicion score was assigned to each claim by an expert

23 PRIDIT vs. Suspicion Score

24 Clustering and Suspicion Score

25 Result
• There appears to be a strong relationship between the PRIDIT score and the suspicion that a claim is fraudulent or abusive
• The clusters resulting from the cluster procedure also appeared to be effective in separating legitimate from fraudulent or abusive claims

26 Comparison: PRIDIT and Clustering
• PRIDIT gives a score, which may be very useful for sorting claims. Clustering assigns claims to classes: each claim is either in or out of a given class.
• Clustering ignores information about the order of values for categorical variables
• Clustering can accommodate both categorical and continuous variables

27 Comparison
• Unordered categorical variables with many values (e.g., injury type):
  • Clustering has a procedure for measuring dissimilarity for these variables and can use them in clustering
  • If the values of the variables have no meaningful order, PRIDIT will not help in creating variables to use in Principal Components Analysis


