Adaptive Fraud Detection

From Data Mining and Knowledge Discovery, by Tom Fawcett and Foster Provost. Presented by: Lara Nargozian. Updated from the last three years' presentations by Adam Boyer, Yunfei Zhao and Ahmen Abdeen Hamed.

Why? Solving real-world problems that are important to each and every one of us. Providing a framework that can be adapted to solve similar problems. Using data mining algorithms and techniques learned this semester: rule learning and classification. Fun to learn about.

Outline
Problem Description: the cellular cloning fraud problem; why it is important; current strategies.
Construction of the Fraud Detector Framework: rule learning, monitor construction, evidence combination.
Experiments and Evaluation: data used in this study; data preprocessing; comparative results.
Conclusion.
Exam Questions.
The outline of this presentation is as follows. We first describe the problem of cellular cloning fraud and some existing strategies for detecting it. We then describe the framework in detail, using examples from the implemented system. Finally, we present experimental results and an evaluation of DC-1. We wrap up with a conclusion on DC-1's performance and the exam questions.

The Problem
How can suspicious changes in user behavior be detected in order to identify and prevent cellular fraud? Non-legitimate users, known as bandits, gain illicit access to the account of a legitimate user, the victim. The solution is useful in other contexts, such as identifying and preventing credit card fraud, toll fraud, and computer intrusion.

Cellular Fraud - Cloning
Cloning Fraud
Cloning is a kind of superimposition (parasitic) fraud: fraudulent usage is superimposed upon (added to) the legitimate usage of an account. It causes inconvenience to customers and great expense to cellular service providers. In the United States, according to data from 1997, cellular fraud cost the telecommunications industry hundreds of millions of dollars per year (Walters and Wilkinson 1994; Steward 1997). One kind of cellular fraud, called cloning, was particularly expensive and was spreading fast in major cities throughout the United States. So what is cloning? …

Cellular communications and Cloning Fraud
The Mobile Identification Number (MIN) and Electronic Serial Number (ESN) identify a specific account and are periodically transmitted unencrypted whenever the phone is on. Cloning occurs when a customer's MIN and ESN are programmed into a cellular phone not belonging to the customer. The bandit can then make virtually unlimited, untraceable calls at someone else's expense.
Whenever a cellular phone is on, it periodically transmits two unique identification numbers: its Mobile Identification Number (MIN) and its Electronic Serial Number (ESN). These two numbers together specify the customer's account. They are broadcast unencrypted over the airwaves, and they can be received, decoded and stored using special equipment that is relatively inexpensive. Cloning occurs when a customer's MIN and ESN are programmed into a cellular telephone not belonging to the customer. With the stolen MIN and ESN, a cloned phone user (whom we call a bandit) can make virtually unlimited calls free of charge.

Interest in reducing Cloning Fraud
Fraud is detrimental in several ways: fraudulent usage congests cell sites; fraud incurs land-line usage charges; cellular carriers must pay costs to other carriers for usage outside the home territory; the crediting process is costly to the carrier and inconvenient to the customer.
Cloning fraud is certainly very harmful. Congested cell sites can lead to service being denied to legitimate users. Most fraudulent calls are made to non-cellular destinations, incurring land-line usage charges. Carriers bear the financial burden, and the inconvenience makes customers more likely to switch to another carrier.

Strategies for dealing with cloning fraud
Pre-call Methods: identify and block fraudulent calls as they are made; validate the phone or its user when a call is placed. Post-call Methods: identify fraud that has already occurred on an account so that further fraudulent usage can be blocked; periodically analyze call data on each account to determine whether fraud has occurred.
There are two classes of methods for dealing with cloning fraud: pre-call methods, which try to identify and block fraudulent calls as they are made, and post-call methods, which try to identify fraud that has already occurred on an account so that further fraudulent usage can be blocked.

Pre-call Methods
Personal Identification Number (PIN): PIN cracking is possible with more sophisticated equipment. RF Fingerprinting: a method of identifying phones by their unique transmission characteristics. Authentication: a reliable and secure private-key encryption method. It requires special hardware capability, and an estimated 30 million non-authenticatable phones were in use in the US alone (in 1997).
Though they are slightly harder to decode than MINs and ESNs, PINs are also broadcast unencrypted, so PIN cracking is possible. Read remainder of slide. As long as cloning fraud continues to be a problem, the industry will have to rely on post-call fraud detection methods.

Post-call Methods
Collision Detection: analyze call data for temporally overlapping calls. Velocity Checking: analyze the locations and times of consecutive calls. Disadvantage of both methods: their usefulness depends upon a moderate level of legitimate activity.
Since a MIN-ESN pair is licensed to only one legitimate user, any simultaneous usage is probably fraudulent. A closely related method, velocity checking (Davis and Goyal 1993), involves analyzing the locations and times of consecutive calls to determine whether a single user could have placed them while traveling at reasonable speeds. For example, if a call is made in Los Angeles 20 minutes after a call is made on the same account in New York, two different people are likely using the account. If a customer rarely uses his or her cell phone, fraud is substantially less likely to be detected with these methods. For this reason, profiling is a good complement to collision and velocity checking: it covers cases the others might miss.
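The two checks described above can be sketched in a few lines of Python. This is only an illustration, not the carriers' implementation: the `Call` record, the use of cell-site latitude and longitude, the haversine distance, and the 900 km/h speed limit are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Call:
    """Hypothetical call record; field names are assumptions for this sketch."""
    start: datetime   # time the call began
    end: datetime     # time the call ended
    lat: float        # latitude of the originating cell site, in degrees
    lon: float        # longitude of the originating cell site, in degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points on Earth."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def is_collision(a: Call, b: Call) -> bool:
    """Collision check: True if the two calls overlap in time."""
    return a.start < b.end and b.start < a.end


def is_velocity_violation(a: Call, b: Call, max_kmh: float = 900.0) -> bool:
    """Velocity check: True if one user would need to travel faster than
    max_kmh (an assumed limit) to place both calls."""
    first, second = (a, b) if a.start <= b.start else (b, a)
    hours = (second.start - first.end).total_seconds() / 3600.0
    if hours <= 0:
        return False  # overlapping calls are left to is_collision
    distance = haversine_km(first.lat, first.lon, second.lat, second.lon)
    return distance / hours > max_kmh
```

With a New York call followed 20 minutes later by a Los Angeles call on the same account, `is_velocity_violation` returns True, matching the example in the notes. Both checks only fire when the legitimate user is also active, which is the disadvantage the slide points out.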

Another Post-call Method ( Main focus of this paper )
User Profiling: analyze calling behavior to detect usage anomalies suggestive of fraud. Works well with low-usage customers. A good complement to collision and velocity checking because it covers cases the others might miss.

Sample Frauded Account
Date Time Day Duration Origin Destination Fraud 1/01/95 10:05:01 Mon 13 minutes Brooklyn, NY Stamford, CT 1/05/95 14:53:27 Fri 5 minutes Greenwich, CT 1/08/95 09:42:01 3 minutes Bronx, NY Manhattan, NY 15:01:24 9 minutes 1/09/95 15:06:09 Tue 16:28:50 53 seconds 1/10/95 01:45:36 Wed 35 seconds Boston, MA Chelsea, MA Bandit 01:46:29 34 seconds Yonkers, NY 01:50:54 39 seconds 11:23:28 24 seconds Congers, NY 1/11/95 22:00:28 Thu 37 seconds 22:04:01
This chart shows chronological call data from an example account. The column at the far right indicates whether each call is fraudulent. We can see from this call data that: 1. The legitimate user calls from the metro New York City area, usually during working hours, and typically makes calls lasting a few minutes. 2. The bandit's calls originate from a different area (Boston, Massachusetts, about 200 miles away), are made in the evenings, and last less than a minute.

The Need to be Adaptive
Patterns of fraud are dynamic: bandits constantly change their strategies in response to new detection techniques. Levels of fraud can change dramatically from month to month. The cost of missing fraud or dealing with false alarms changes with inter-carrier contracts.
A key requirement for our detection system is therefore that it be adaptive.

Automatic Construction of Profiling Fraud Detectors
Ideally, a fraud detection system should be able to learn such rules automatically and use them to catch fraud. The following part addresses the automatic design of user profiling methods.

One Approach
Build a fraud detection system by classifying calls as fraudulent or legitimate. However, two problems make simple classification techniques infeasible.
Ask the class if they can think of any reasons why we couldn't use straightforward classification.

Problems with simple classification
Context: a call that would be unusual for one customer may be typical for another. (For example, a call placed from Brooklyn is not unusual for a subscriber who lives there, but might be very strange for a Boston subscriber.) Granularity (overfitting?): at the level of the individual call, the variation in calling behavior is large, even for a particular user.
In general, two problems make simple classification approaches infeasible. First, if we pool all the data and generate rules from it, we lose context information. Context information comprises behavior such as what the phone is used for, what areas it is used in, what areas and numbers it normally calls, and at what times of day. This information is not directly available, so the solution is to derive it from historical data specific to each account. It is necessary to discover indicators corresponding to changes in behavior that are indicative of fraud, rather than absolute indicators of fraud (conditions that would be wrong for every user), and to profile individual customers to characterize their normal behavior. Second, at the level of individual calls the variation in calling behavior is large, and legitimate subscribers occasionally make calls that look suspicious. So far there is no method for classifying individual calls effectively, so it is necessary to aggregate customer behavior, smoothing out the variation. In the experiments, customer behavior is aggregated into account-days.
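Aggregating calls into account-days can be illustrated with a short Python sketch. The record layout (account id, start time, duration in seconds) and the function name are assumptions made for this example, not the format used in the study.

```python
from collections import defaultdict
from datetime import date, datetime


def to_account_days(calls):
    """Group call records by (account, calendar date).

    `calls` is an iterable of (account_id, start_time, duration_seconds)
    tuples; this layout is hypothetical. Returns a dict mapping
    (account_id, date) to the number of calls and total airtime that day.
    """
    days = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for account, start, seconds in calls:
        key = (account, start.date())
        days[key]["calls"] += 1
        days[key]["seconds"] += seconds
    return dict(days)


# Three short calls taken from the sample account, on two different days.
sample = [
    ("acct-1", datetime(1995, 1, 10, 1, 45, 36), 35),
    ("acct-1", datetime(1995, 1, 10, 1, 46, 29), 34),
    ("acct-1", datetime(1995, 1, 11, 22, 0, 28), 37),
]
account_days = to_account_days(sample)
```

Each account-day then becomes one unit of analysis, so a single odd call matters less than a day whose overall usage departs from the account's profile.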

In Summary: The Learning Problem
Which phone call features are important? How should profiles be created? When should alarms be raised?
In sum, the learning problem comprises three questions, each of which corresponds to a component of our framework. Which features or combinations of features are useful for identifying fraud? How do you characterize behavior with respect to a feature so that you can notice changes? How do you effectively combine evidence from the profiled behavior to determine whether fraud has occurred?

Proposed Detector Constructor Framework (DC-1)
Under the framework, a system first learns rules that serve as indicators of fraudulent behavior. It then uses these rules, along with a set of templates, to create profiling monitors (M1 through Mn). These monitors profile the typical behavior of each account with respect to a discovered rule and, in use, describe how far each account is from its typical behavior. Finally, the system learns to weight the monitor outputs so as to maximize the effectiveness of the resulting fraud detector. When the evidence is strong, an alarm is issued indicating that fraud has occurred. In general, then, the framework is composed of three components.

DC-1 Processing Account-Day Example
Data on a call consists of date duration, cell-site,

DC-1 Fraud Detection Stages
Stage 1: Rule Learning Stage 2: Profile Monitoring Stage 3: Combining Evidence

Rule Learning – the 1st stage
Rule Generation: rules are generated locally, based on differences between fraudulent and normal behavior for each account. Rule Selection: the local rules are then combined in a rule selection step.
Now let's discuss the first stage of DC-1, rule learning. With basic techniques, the rule learning algorithm would be applied to a data set combining all legitimate and fraudulent instances. This, however, would result in the loss of context information. For example, if half of the subscribers were in New York and their fraudulent calls occurred in LA, while the other half were in LA and had fraudulent calls in New York, then on the combined data the algorithm would not find location to be an indicator of fraud. Because context information must be maintained, rule learning is performed in two steps. Rules are first generated locally based on differences between fraudulent and normal behavior for each account, then they are combined in a rule selection step. That is one major difference between DC-1 and general classification techniques.

Rule Generation
DC-1 uses the RL program to generate rules with certainty factors above a user-defined threshold. For each account, RL generates a "local" set of rules describing the fraud on that account. Example: (Time-of-Day = Night) AND (Location = Bronx) ==> FRAUD, Certainty Factor = 0.89.
The input to the RL program is call data organized by account, with each call record labeled as fraudulent or legitimate. RL performs a general-to-specific search of the space of conjunctive rules, using beam search to find rules with a certainty factor above the user-defined threshold. Note that this rule holds within a particular account; for other accounts it may not be true. The rule says that a call placed at night from the Bronx (a borough of New York City) is likely to be fraudulent. A certainty factor of 0.89 means that, within this account, a call matching this rule has an 89% probability of being fraudulent.
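To make the rule and its certainty factor concrete, here is a small Python sketch. It assumes that a conjunctive rule can be represented as a dictionary of attribute values and that the certainty factor is estimated as the fraction of matching calls labeled fraudulent within one account, which is consistent with the 89% reading above. How the RL program actually computes its certainty factors is not stated in the slides, and the attribute names are hypothetical.

```python
def matches(rule, call):
    """True if the call satisfies every condition of the conjunctive rule."""
    return all(call.get(attr) == value for attr, value in rule.items())


def certainty_factor(rule, account_calls):
    """Fraction of an account's matching calls that are labeled fraudulent.

    Returns None when no call in the account matches the rule.
    This estimate is an assumption made for illustration.
    """
    matched = [c for c in account_calls if matches(rule, c)]
    if not matched:
        return None
    return sum(1 for c in matched if c["fraud"]) / len(matched)


rule = {"time_of_day": "NIGHT", "location": "BRONX"}

# Hypothetical account: 9 night calls from the Bronx (8 fraudulent),
# plus legitimate daytime calls that the rule does not match.
calls = (
    [{"time_of_day": "NIGHT", "location": "BRONX", "fraud": True}] * 8
    + [{"time_of_day": "NIGHT", "location": "BRONX", "fraud": False}]
    + [{"time_of_day": "MORNING", "location": "BROOKLYN", "fraud": False}] * 20
)
cf = certainty_factor(rule, calls)
```

Here `cf` is 8/9, about 0.89. The same rule evaluated on another account's calls could give a very different value, which is why the rules are called local.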

Rule Selection
The rule generation step typically yields tens of thousands of rules. If a rule is found in (or covers) many accounts, it is probably worth using. The selection algorithm identifies a small set of general rules that cover the accounts. The resulting set of rules is used to construct specific monitors.
After all accounts have been processed, the rule selection step is performed to derive a set of rules. For example, if RL generates 50 rules for each account, rule generation yields up to 50 times as many rules as there are accounts. Because of the multiple levels of granularity, DC-1 selects the set of rules that covers the accounts, as opposed to the instances (calls). For each account, the list of rules generated is sorted by frequency of occurrence in the entire account set.

Rule Selection and Covering Algorithm
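A covering procedure of the kind described for rule selection can be sketched in Python as follows. This is a simplified greedy version written for illustration, not a transcription of the algorithm on the slide: the parameter names `rules_per_account` and `min_accounts`, and the tie-breaking by rule name, are assumptions.

```python
from collections import Counter


def select_rules(rules_by_account, rules_per_account=1, min_accounts=2):
    """Greedily pick general rules until each account is covered.

    rules_by_account maps an account id to the set of rules generated
    locally for it. A rule is eligible only if it was generated in at
    least `min_accounts` accounts. An account counts as covered once
    `rules_per_account` of its rules have been selected.
    """
    # Count in how many accounts each rule occurs, and remember which ones.
    occur = Counter()
    accounts_with = {}
    for account, rules in rules_by_account.items():
        for rule in set(rules):
            occur[rule] += 1
            accounts_with.setdefault(rule, set()).add(account)

    cover = {account: 0 for account in rules_by_account}
    selected = []
    for account, rules in rules_by_account.items():
        # Most widely occurring rules first; ties broken by rule name.
        candidates = sorted(set(rules), key=lambda r: (-occur[r], str(r)))
        for rule in candidates:
            if cover[account] >= rules_per_account:
                break
            if rule in selected or occur[rule] < min_accounts:
                continue
            selected.append(rule)
            for covered in accounts_with[rule]:
                cover[covered] += 1
    return selected


local_rules = {
    "acct-1": {"night_bronx", "payphone"},
    "acct-2": {"night_bronx", "evening"},
    "acct-3": {"night_bronx"},
    "acct-4": {"long_call"},
}
general_rules = select_rules(local_rules)
```

In this toy input only `night_bronx` occurs in more than one account, so it is the only rule selected, and `acct-4` stays uncovered because its single rule is too specific.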

Profiling Monitors – the 2nd stage
Monitors have 2 distinct steps. Profiling step: the monitor is applied to an account's normal usage to measure the account's normal activity, and the resulting statistics are saved with the account. Use step: the monitor processes a single account-day, references the normalcy measure from profiling, and generates a numeric value describing how abnormal the current account-day is.

Most Common Monitor Templates
Threshold and Standard Deviation.
Profiling monitors are created by the monitor constructor, which employs a set of templates. The templates are instantiated by rule conditions. There are two common templates: one is called Threshold and the other Standard Deviation.

Threshold Monitors
The threshold monitor yields a binary feature corresponding to whether the user's behavior was above threshold for the given day. This monitor outputs a 1 if the daily usage exceeds the threshold and a 0 otherwise.
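A threshold monitor can be sketched in Python as below. The slide does not say how the threshold is chosen during profiling; this sketch assumes it is the largest daily usage seen in the profiling period, and the class and method names are made up for the example.

```python
class ThresholdMonitor:
    """Binary monitor for one account and one rule (illustrative sketch)."""

    def __init__(self):
        self.threshold = None

    def profile(self, daily_usage):
        """Profiling step: store the largest daily usage observed during
        the fraud-free profiling period (an assumed choice of threshold)."""
        self.threshold = max(daily_usage)

    def use(self, usage):
        """Use step: 1 if the account-day exceeds the profiled threshold."""
        return 1 if usage > self.threshold else 0


monitor = ThresholdMonitor()
monitor.profile([2, 5, 3])   # e.g. minutes per day matching the rule
quiet_day = monitor.use(4)
busy_day = monitor.use(6)
```

Here `quiet_day` is 0 and `busy_day` is 1: the output says only whether the day exceeded the account's own profiled level, not by how much.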

Standard Deviation Monitors
This monitor outputs the number of standard deviations by which the day's usage exceeds the mean profiled usage. Standard deviation monitors are sensitive both to the expected amount of activity on an account and to the expected daily variation of that activity.

Comparing the same standard deviation monitor on two accounts
These two accounts have the same mean, but their standard deviations differ: one is very large, the other very small. The monitor on the account with the smaller standard deviation (lower variation) produces large values when irregular behavior appears in the use period. The monitor on the other account, which had more variation in the profiling period, produces smaller values when the same variant behavior is observed in the use period.

Example for Standard Deviation
An example of a standard deviation monitor:
Rule: (TIMEOFDAY = NIGHT) AND (LOCATION = BRONX) ==> FRAUD.
Profiling step: the subscriber called from the Bronx an average of 5 minutes per night with a standard deviation of 2 minutes. At the end of the profiling step, the monitor would store the values (5, 2) with that account.
Use step: if the monitor processed a day containing 3 minutes of airtime from the Bronx at night, the monitor would emit a zero; if the monitor saw 15 minutes, it would emit (15 - 5)/2 = 5. This value denotes that the account is five standard deviations above its average (profiled) usage level.
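The worked example above maps directly onto a few lines of Python. The class below is a sketch: the names are hypothetical, the use of the population standard deviation in `from_profile` is an assumption, and returning 0 when the profiled deviation is zero is a simplification chosen to keep the example short.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class StdDevMonitor:
    """Standard deviation monitor for one account and one rule (sketch)."""
    avg: float   # mean daily usage from the profiling period
    std: float   # standard deviation of daily usage from profiling

    @classmethod
    def from_profile(cls, daily_minutes):
        """Profiling step: compute and store (mean, standard deviation)."""
        return cls(avg=mean(daily_minutes), std=pstdev(daily_minutes))

    def use(self, minutes):
        """Use step: standard deviations above the profiled mean,
        or 0 when usage is at or below the mean."""
        if minutes <= self.avg or self.std == 0:
            return 0.0
        return (minutes - self.avg) / self.std


# Stored profile from the example: 5 minutes per night, deviation 2.
bronx_night = StdDevMonitor(avg=5, std=2)
low = bronx_night.use(3)     # below the profiled mean
high = bronx_night.use(15)   # (15 - 5) / 2
```

`low` is 0.0 and `high` is 5.0, as in the slide. An account with the same mean but a larger profiled deviation would emit a smaller value for the same 15 minutes.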

Combining Evidence from the Monitors – the 3rd stage
This stage weights the monitor outputs and learns a threshold on the sum to produce high-confidence alarms. DC-1 uses a Linear Threshold Unit (LTU): it is simple and fast, and it enables good first-order judgment. A feature selection process is used to choose a small set of useful monitors for the final detector, because some rules don't perform well when used in monitors and some overlap. A forward selection process chooses the set of useful monitors.
The third stage of detector construction learns how to combine evidence from the monitors generated by the previous stage. For this stage, the outputs of the monitors are used as features for a standard learning program. Training is done on account data, and monitors evaluate one entire account-day at a time. In training, the monitors' outputs are presented along with the desired output (the account-day's correct class: fraud or non-fraud). The evidence combination weights the monitor outputs and learns a threshold on the sum so that alarms may be issued with high confidence.
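As an illustration of a linear threshold unit over monitor outputs, the sketch below trains weights and a bias with the classic perceptron update rule. The slides do not say which training procedure DC-1 uses for its LTU, so the perceptron rule is a stand-in, and the training data is invented for the example. The bias plays the role of the negated threshold on the weighted sum.

```python
def ltu_output(weights, bias, features):
    """Alarm (1) if the weighted sum of monitor outputs exceeds the
    threshold, expressed here as weighted sum + bias > 0."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def train_ltu(examples, epochs=100, learning_rate=1.0):
    """Perceptron training (a stand-in, not necessarily DC-1's method).

    `examples` is a list of (monitor_outputs, label) pairs, where label
    is 1 for a fraud account-day and 0 for a non-fraud account-day.
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            error = label - ltu_output(weights, bias, features)
            if error:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:
            break  # every training account-day is classified correctly
    return weights, bias


# Invented account-days: [threshold monitor output, std-dev monitor output].
training = [
    ([1, 5.0], 1), ([0, 0.0], 0), ([1, 3.0], 1),
    ([0, 0.5], 0), ([0, 4.0], 1), ([1, 0.0], 0),
]
weights, bias = train_ltu(training)
```

After training, `ltu_output(weights, bias, [1, 5.0])` raises an alarm and `ltu_output(weights, bias, [0, 0.5])` does not. Feature selection would sit around this step, keeping only the monitors whose inclusion improves the detector.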

Final Output of DC-1
A detector that profiles each user's behavior based on several indicators and raises an alarm when there is sufficient evidence of fraudulent activity.

Data used in the study
Here we describe the data used in the experiments.

Data Information
Four months of phone call records from the New York City area. Each call is described by 31 original attributes. Some derived attributes are added: Time-Of-Day (MORNING, AFTERNOON, TWILIGHT, EVENING, NIGHT) and To-Payphone. Each call is given a class label of fraudulent or legitimate.
The call data used for this study are records of cellular calls placed over four months by users in the New York City area, an area with high levels of fraud. The class label is assigned by cross-referencing a database of calls that were credited as fraudulent during the same time period.

Data Cleaning
Eliminated credited calls made to destination numbers that were also called outside the credited block, that is, numbers that the legitimate user also called. Days with 1-4 minutes of fraudulent usage were discarded, since these calls may have been credited for other reasons, such as wrong numbers. Call times were normalized to Greenwich Mean Time for chronological sorting.
Cellular call data contains errors and noise from various sources. The process of marking fraudulent calls is called block crediting: the customer and a carrier representative establish a range of dates during which fraud occurred. One source of noise in the block is obviously non-fraudulent calls. These are removed from the block, but the heuristics for doing so are fallible. Also, disagreement about the fraud span is usually resolved in the customer's favor, with a wider date span. Any erroneously credited calls are additional noise in the data.

Data Description
Once the monitors are created and the accounts profiled, the system transforms raw call data into a series of account-days, using the monitor outputs as features. Selected for profiling, training and testing: 3600 accounts that have at least 30 fraud-free days of usage before any fraudulent usage. The initial 30 days of each account were used for profiling. The remaining days were used to generate 96,000 account-days. Distinct training and testing accounts: 10,000 account-days for training; for testing, 20% fraud days and 80% non-fraud days.
The call data were separated carefully into several partitions for rule learning, account profiling, and detector training and testing.

Experiments and Evaluation

Output of DC-1 components
Rule learning: 3630 rules, each covering at least two accounts. Rule selection: 99 rules. 2 monitor templates, yielding 198 monitors. Final feature selection: 11 monitors.

The Importance Of Error Cost
Classification accuracy is not sufficient to evaluate performance; misclassification costs should be taken into account. Estimated error costs: a false positive (false alarm) costs $5; a false negative (letting a fraudulent account-day go undetected) costs $0.40 per minute of fraudulent air-time. Factoring in error costs requires a second training pass by the LTU.
To evaluate DC-1's performance, classification accuracy is not enough, since different errors have different costs. A realistic evaluation must take misclassification costs into account. The LTU minimizes error but not cost, so the LTU's threshold is adjusted to yield minimum error cost on the training set.
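The cost model and the threshold adjustment can be sketched in Python using the two cost estimates above. Everything else in this sketch is an assumption for illustration: the detector scores, the function names, and the simple search over candidate thresholds.

```python
FALSE_ALARM_COST = 5.00              # dollars per false alarm
MISSED_FRAUD_COST_PER_MINUTE = 0.40  # dollars per minute of undetected fraud


def total_cost(scores, fraud_minutes, threshold):
    """Cost of alarming on every account-day whose score >= threshold.

    scores[i] is the detector's output for account-day i, and
    fraud_minutes[i] is its fraudulent airtime (0 for a legitimate day).
    """
    cost = 0.0
    for score, minutes in zip(scores, fraud_minutes):
        alarm = score >= threshold
        if alarm and minutes == 0:
            cost += FALSE_ALARM_COST
        elif not alarm and minutes > 0:
            cost += MISSED_FRAUD_COST_PER_MINUTE * minutes
    return cost


def best_threshold(scores, fraud_minutes):
    """Try each observed score, plus one value above all of them
    (alarm on none), and keep the cheapest threshold."""
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    return min(candidates, key=lambda t: total_cost(scores, fraud_minutes, t))


# Invented training account-days.
scores = [0.1, 0.5, 0.6, 0.9]
fraud_minutes = [0, 3, 0, 30]
threshold = best_threshold(scores, fraud_minutes)
cost = total_cost(scores, fraud_minutes, threshold)
```

On this toy data the cheapest threshold is 0.9, at a cost of $1.20: missing three minutes of fraud is cheaper than paying $5 for the false alarm that a lower threshold would trigger. Re-running `best_threshold` when the fraud mix or the costs change is one simple way to picture re-tuning the detector.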

Alternative Detection Methods
Collisions + Velocities: errors are almost entirely due to false negatives. High Usage: detects a sudden large jump in account usage. Best Individual DC-1 Monitor: (Time-of-day = Evening) ==> Fraud. SOTA (State Of The Art): incorporates 13 hand-crafted profiling methods, the best detectors identified in a previous study.
High Usage is a standard deviation monitor whose rule conditions are always satisfied. Cost in high-usage cases is even higher than $0.40/min. The best individual monitor is a valuable fraud indicator on its own, and comparing against it shows the additional benefit of combining monitors.

DC-1 Vs. Alternatives
Detector                   Accuracy (%)   Cost ($)        Accuracy at Cost
Alarm on all               20             20000
Alarm on none              80             … +/- 961
Collisions + Velocities    82 +/- 0.3     … +/- 749       82 +/- 0.4
High Usage                 88 +/- 0.7     6938 +/- 470    85 +/- 1.7
Best DC-1 monitor          89 +/- 0.5     7940 +/- 313    85 +/- 0.8
State of the art (SOTA)    90 +/- 0.4     6557 +/- 541    88 +/- 0.9
DC-1 detector              92 +/- 0.5     5403 +/- 507    91 +/- 0.8
SOTA plus DC-1             92 +/- 0.4     5078 +/- 319
Alarm on all represents the policy of alarming on every account every day. The opposite strategy, Alarm on none, represents the policy of allowing fraud to go completely unchecked. We can treat these two as rough cost bounds for the experiment. Interesting things to note: DC-1 beats the "State of the art" (SOTA) detector on all three measurements. Combining SOTA with DC-1 leaves accuracy at 92% and lowers the cost from $5403 to $5078.

Shifting Fraud Distributions
A fraud detection system should adapt to shifting fraud distributions. To illustrate this point, two different DC-1 detectors were constructed. One non-adaptive DC-1 detector was trained on a fixed distribution (80% non-fraud) and tested against a range of 75-99% non-fraud. Another DC-1 detector was allowed to adapt (re-train its LTU threshold) for each fraud distribution. The second detector was more cost effective than the first. Only an adaptive system will keep from losing its edge.

The results are shown in Figure 7
The results are shown in Figure 7. The x-axis is the percentage of non-fraud account-days, and the y-axis is the cost per account-day. This figure shows that the second detector, which is allowed to adjust itself to each new distribution, is consistently more cost effective than the fixed detector. This difference increases as the testing distribution becomes more skewed from the distribution upon which the fixed detector was trained.

Conclusion
DC-1 uses a rule-learning program to uncover indicators of fraudulent behavior from a large database of customer transactions. Then the indicators are used to create a set of monitors, which profile legitimate customer behavior and indicate anomalies. Finally, the outputs of the monitors are used as features in a system that learns to combine evidence to generate high-confidence alarms.

Conclusion
Adaptability to dynamic patterns of fraud can be achieved by generating fraud detection systems automatically from data, using data mining techniques. DC-1 can adapt to the changing conditions typical of fraud detection environments. Experiments indicate that DC-1 performs better than other methods for detecting fraud.

Exam Questions

Question 1
What are the two major categories of fraud detection methods, how do they differ, and which one does DC-1 fall under?
Pre-call methods: validate the phone or its user when a call is placed. Post-call methods (DC-1 falls here): analyze call data on each account to determine whether cloning fraud has occurred.

Question 2
Why do fraud detection methods need to be adaptive?
Bandits change their behavior, so patterns of fraud are dynamic. Levels of fraud vary from month to month. The cost of missing fraud or handling false alarms changes with inter-carrier contracts.

Question 3
What are the two steps of profiling monitors, and what are the two main monitor templates?
Profiling step: measure an account's normal activity and save the statistics. Use step: process usage for an account-day to produce a numerical output describing how abnormal activity was on that account-day. The two main templates are Threshold and Standard Deviation monitors.

The End Questions?
