CS490D: Introduction to Data Mining Prof. Chris Clifton April 14, 2004 Fraud and Misuse Detection.

What is Fraud Detection?
Identify wrongful actions
  – Is right and wrong universal?
  – If so, why not just prevent wrong actions?
Identify actions by the wrong people
Identify suspect actions
  – Legal
  – But probably not right

In Data Mining terms…
Classification?
  – Classify into fraudulent and non-fraudulent behavior
  – What do we need to do this?
Outlier Detection
  – Assume non-fraudulent behavior is normal
  – Find the exceptions
Problems?
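The outlier-detection view above can be sketched in a few lines. This is a minimal illustration, not a method from the slides: it assumes behavior is summarized by a single numeric value (hypothetical transaction amounts) and uses a median-based robust z-score, with a threshold of 3.5 chosen only for the example.

```python
from statistics import median

def robust_z_scores(values):
    """Score each value by its distance from the median, in MAD units."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the values equal the median; anything else is extreme.
        return [0.0 if v == center else float("inf") for v in values]
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return [abs(v - center) / (1.4826 * mad) for v in values]

def find_exceptions(values, threshold=3.5):
    """Return the indices of values whose robust z-score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [i for i, s in enumerate(scores) if s > threshold]

# Hypothetical amounts: most are "normal", one is far outside the usual range.
amounts = [42.0, 38.5, 45.0, 40.0, 41.5, 39.0, 980.0, 43.0]
suspects = find_exceptions(amounts)
```

The sketch also hints at the "Problems?" bullet: a single global notion of normal treats every unusual value as suspect, whether or not it is unusual for the person involved.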

Solution: Differential Profiling
Determine individual behavior
  – What is normal for the individual
  – What separates one individual from another
Gives profile of individual behavior
How do we do this?
[Diagram: positive (+) and negative (–) examples; labels: Profile, Classification Mining, Profile]
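One way to picture differential profiling is to keep a separate profile per individual and judge each action against that individual's own history. The sketch below is an assumption-laden illustration: the users, the amounts, and the choice of mean and standard deviation as the "profile" are all hypothetical, and the slides do not prescribe this representation.

```python
from statistics import mean, pstdev

def build_profiles(history):
    """Map each individual to the (mean, std) of their own past amounts."""
    return {user: (mean(vals), pstdev(vals)) for user, vals in history.items()}

def deviation(profile, amount):
    """Distance of an amount from the individual's norm, in their own std units."""
    mu, sigma = profile
    if sigma == 0:
        return 0.0 if amount == mu else float("inf")
    return abs(amount - mu) / sigma

def best_match(profiles, amount):
    """Which individual's profile is the action most consistent with?"""
    return min(profiles, key=lambda user: deviation(profiles[user], amount))

# Hypothetical spending histories for two individuals.
history = {
    "alice": [20, 25, 22, 18, 24],
    "bob": [480, 520, 510, 495, 505],
}
profiles = build_profiles(history)

# The same 500-unit purchase is routine for bob and far from normal for alice.
alice_dev = deviation(profiles["alice"], 500)
bob_dev = deviation(profiles["bob"], 500)
```

`deviation` answers "what is normal for the individual", while `best_match` is a crude stand-in for "what separates one individual from another".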

Has this been done?
Intrusion Detection (Lane & Brodley)
Profiled computer users based on command sequences
  – Command
  – Some (but not all) argument information
  – Sequence information
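To make the idea of a command-sequence profile concrete, here is a small sketch. It is not the similarity measure used by Lane and Brodley; it swaps in a simpler technique, storing the fixed-length command subsequences (n-grams) seen in a user's history and scoring a new session by the fraction of its n-grams that the profile already contains. The command lists are hypothetical.

```python
def ngrams(commands, n=3):
    """Set of all length-n command subsequences in a history."""
    return {tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)}

def session_score(profile, session, n=3):
    """Fraction of the session's n-grams that appear in the user's profile."""
    grams = [tuple(session[i:i + n]) for i in range(len(session) - n + 1)]
    if not grams:
        return 0.0  # session too short to compare
    return sum(g in profile for g in grams) / len(grams)

# Hypothetical command history for one user (arguments omitted for brevity).
history = ["cd", "ls", "vi", "make", "ls", "cd", "ls", "vi", "make", "gdb"]
profile = ngrams(history)

familiar = session_score(profile, ["cd", "ls", "vi", "make"])
unfamiliar = session_score(profile, ["wget", "chmod", "./a.out", "rm"])
```

A low score over a sliding window of recent commands would raise an alarm; tokens could also carry selected argument information, as the slide notes.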

Results
[Figure: Accuracy; Time to Alarm]

Scaling Issues
What happens with millions of users?
  – Credit card
  – Cell phone
What about new users?
Ideas?

Multi-user profiles
Cluster users
Develop profiles for clusters
  – E.g., differential profiling
Old customers: Do they match profile for their cluster?
  – Allows wider range of acceptable behavior
New customer: Do they match any profile?
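The cluster-then-profile idea can be sketched as follows. Everything here is an assumption made for illustration: users are reduced to two hypothetical features (average spend, transactions per week), clustering is a tiny deterministic k-means, and a cluster's profile is just its centroid plus the radius of its members. Real features would need scaling before distances mean much.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Tiny k-means with deterministic, evenly spaced initial centroids."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster went empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def cluster_profiles(points, centroids):
    """Profile each cluster as (centroid, radius of its farthest member)."""
    radii = [0.0] * len(centroids)
    for p in points:
        c = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        radii[c] = max(radii[c], dist(p, centroids[c]))
    return list(zip(centroids, radii))

def matches(profile, behaviour, slack=1.5):
    """Does the behaviour fall within the cluster's (slightly widened) radius?"""
    centroid, radius = profile
    return dist(behaviour, centroid) <= slack * radius

# Hypothetical users: (average spend, transactions per week).
users = [(20, 3), (25, 4), (22, 2), (500, 30), (520, 28), (480, 33)]
profiles = cluster_profiles(users, kmeans(users, k=2))

old_customer_ok = matches(profiles[0], (24, 3))        # checked against own cluster
new_customer_ok = any(matches(p, (510, 29)) for p in profiles)
new_customer_odd = any(matches(p, (200, 15)) for p in profiles)
```

The `slack` factor is one way to express the "wider range of acceptable behavior" that a cluster profile allows compared with a single-user profile.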

Data mining for detection and prevention

Data mining defined:
“The process of discovering meaningful new relationships, patterns and trends by sifting through data using pattern recognition technologies as well as statistical and mathematical techniques.” - The Gartner Group

Matching known fraud/non-compliance
Which new cases are similar to known cases?
How can we define similarity?
How can we rate or score similarity?
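As one possible answer to "how can we define and score similarity", the sketch below describes each case as a set of observed indicators and scores a new case by its best Jaccard overlap with any known fraud case. The indicator names and the choice of Jaccard similarity are assumptions for illustration only.

```python
def jaccard(a, b):
    """Shared indicators divided by all indicators seen in either case."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def score_case(case, known_cases):
    """Score a case by its closest match among known fraud cases (0.0 to 1.0)."""
    return max((jaccard(case, known) for known in known_cases), default=0.0)

# Hypothetical indicator sets taken from previously confirmed fraud cases.
known_fraud = [
    {"duplicate_invoice", "round_amount", "new_vendor"},
    {"weekend_submission", "po_box_address", "round_amount"},
]

suspicious = score_case({"round_amount", "new_vendor", "weekend_submission"}, known_fraud)
clean = score_case({"approved_vendor"}, known_fraud)
```

Cases can then be ranked by score, with the highest-scoring ones reviewed first.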

Anomalies and irregularities
How can we detect anomalous or unusual behavior?
What do we mean by usual?
Can we rate or score cases on their degree of anomaly?

Data mining is not
“Blind” application of analysis/modeling algorithms
Brute-force crunching of bulk data
Black box technology
Magic

How do you mine data?
Use the Cross Industry Standard Process for Data Mining (CRISP-DM)
Based on real-world lessons:
  – Focus on business issues
  – User-centric & interactive
  – Full process
  – Results are used

Techniques used to identify fraud
Predict and Classify
  – Regression algorithms (predict numeric outcome): neural networks, CART, Regression, GLM
  – Classification algorithms (predict symbolic outcome): CART, C5.0, logistic regression
Group and Find Associations
  – Clustering/Grouping algorithms: K-means, Kohonen, 2Step, Factor analysis
  – Association algorithms: apriori, GRI, Capri, Sequence

Techniques for finding fraud:
Predict the expected value for a claim and compare it with the actual value of the claim.
Cases that fall far outside the expected range should be evaluated more closely.
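The expected-versus-actual comparison can be sketched with an ordinary least-squares line. This is a simplified stand-in for the regression models named earlier: it assumes a single hypothetical predictor (length of stay in days), made-up claim amounts, and a cutoff of two residual standard deviations chosen only for the example.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def flag_claims(xs, ys, k=2.0):
    """Indices of claims whose actual value is far from the predicted value."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [i for i, r in enumerate(residuals) if spread > 0 and abs(r) > k * spread]

# Hypothetical claims: days of stay and billed amount.
days = [1, 2, 3, 4, 5, 6, 7, 8]
billed = [1000, 2100, 2900, 4000, 5100, 15000, 7000, 8100]
flagged = flag_claims(days, billed)
```

Because the line is fitted on data that includes the unusual claim, the residual spread is inflated; fitting on audited, known-good claims would give a tighter expected range.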

Techniques for finding fraud: Decision Trees and Rules
Build a profile of the characteristics of fraudulent behavior.
Pull out the cases that meet the historical characteristics of fraud.
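A very reduced version of rule-based profiling is shown below. It is not CART or C5.0; it is a single-condition rule learner that keeps any attribute value seen often enough and associated with fraud often enough in historical cases. The attribute names, the sample cases, and the support and confidence thresholds are all hypothetical.

```python
from collections import Counter

def learn_rules(cases, labels, min_support=2, min_confidence=0.8):
    """Keep (attribute, value) pairs that are frequent and mostly fraudulent."""
    seen, fraud = Counter(), Counter()
    for case, is_fraud in zip(cases, labels):
        for item in case.items():
            seen[item] += 1
            if is_fraud:
                fraud[item] += 1
    return {
        item for item, n in seen.items()
        if n >= min_support and fraud[item] / n >= min_confidence
    }

def matches_profile(case, rules):
    """Pull out a case if any of its attribute values matches a learned rule."""
    return any(item in rules for item in case.items())

# Hypothetical historical cases with known outcomes (True = fraud).
cases = [
    {"provider": "clinic", "weekend": "yes"},
    {"provider": "clinic", "weekend": "yes"},
    {"provider": "hospital", "weekend": "yes"},
    {"provider": "hospital", "weekend": "no"},
    {"provider": "clinic", "weekend": "no"},
    {"provider": "hospital", "weekend": "no"},
]
labels = [True, True, True, False, False, False]

rules = learn_rules(cases, labels)
```

A decision tree generalizes this by combining several conditions along each path, but the workflow is the same: learn from labeled history, then pull out new cases that fit.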

Techniques for finding fraud: Clustering and Associations
Group behavior using a clustering algorithm
Find groups of events using the association algorithms
Identify outliers and investigate
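Finding groups of events that occur together can be illustrated with the first step of an association analysis: counting how often pairs of events co-occur. The sketch below is a simplification of what apriori-style algorithms do (it stops at pairs), and the billing events and the support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(events, min_support):
    """Return event pairs whose share of records meets the support threshold."""
    counts = Counter()
    for record in events:
        # Sort so that each pair is counted under one consistent ordering.
        for pair in combinations(sorted(set(record)), 2):
            counts[pair] += 1
    total = len(events)
    return {pair: c / total for pair, c in counts.items() if c / total >= min_support}

# Hypothetical billing records, each listing the procedures billed together.
records = [
    ["xray", "cast", "followup"],
    ["xray", "cast"],
    ["xray", "followup"],
    ["cast", "xray", "mri"],
]
pairs = frequent_pairs(records, min_support=0.5)
```

Records that contain none of the common combinations, or that contain rare ones, are candidates for the "identify outliers and investigate" step.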

Fraud detection using CRISP-DM
Provides a systematic way to detect fraud and abuse
Ensures auditing and investigative efforts are maximized
Continually assesses and updates models to identify new emerging fraud patterns
Leads to higher recoupments

Data mining in action: Fraud, waste and abuse case studies

How can data mining help?
Payment error prevention
Billing and payment fraud
Audit selection

Payment Error Prevention
The US Health Care Financing Administration needed to isolate the likely causes of payment error by developing a profile of acceptable billing practices, and used this information to focus its auditing effort.

Payment error prevention solution
Clementine™
Using audited discharge records, built profiles of appropriate decisions such as diagnosis coding and admission
Matched new cases
Cases not matching are audited

Payment error prevention results
Detected 50% of past incorrect payments, resulting in significant recovery of funding lost to payment errors
PRO analysts able to use resultant Clementine models to prevent future error

Billing and payment fraud
The US Defense Finance and Accounting Service needed to find fraud in millions of Dept of Defense transactions, and identified suspicious cases to focus investigations.

Billing and payment fraud solution
Clementine
Detection models based on known fraud patterns
Analyzed all transactions, scored based on similarity to these known patterns
High scoring transactions flagged for investigation

Billing and payment fraud results
Identified over 1,200 payments for further investigation
Integrated the detection process
Anomaly detection methods (e.g., clustering) will serve as ‘sentinel’ systems for previously undetected fraud patterns

Audit selection
The Washington State Department of Revenue needed to detect erroneous tax returns, and focused audit investigations on cases with the highest likely adjustments.

Audit selection solution
Clementine
Using previously audited returns
Model adjustment (recovery) per auditor hour based on return information
Models will then score future returns showing highest potential adjustment
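The scoring step can be sketched as follows. This is not the Clementine model from the case study; it assumes a single hypothetical piece of return information (a business category) and estimates adjustment per auditor hour as a simple group average over previously audited returns, then ranks new returns by that estimate.

```python
from collections import defaultdict

def fit_yield_model(audited):
    """Average adjustment per auditor hour for each return category."""
    totals = defaultdict(lambda: [0.0, 0.0])  # category -> [adjustment, hours]
    for category, adjustment, hours in audited:
        totals[category][0] += adjustment
        totals[category][1] += hours
    return {c: adj / hrs for c, (adj, hrs) in totals.items() if hrs > 0}

def rank_returns(returns, model):
    """Order return ids by expected adjustment per hour, highest first."""
    scored = [(model.get(category, 0.0), return_id) for return_id, category in returns]
    return [return_id for _, return_id in sorted(scored, key=lambda s: -s[0])]

# Hypothetical audited returns: (category, adjustment recovered, auditor hours).
audited = [
    ("retail", 1200.0, 10.0),
    ("retail", 800.0, 10.0),
    ("construction", 9000.0, 20.0),
    ("services", 100.0, 5.0),
]
model = fit_yield_model(audited)

# Hypothetical new returns awaiting selection: (return id, category).
incoming = [("R1", "services"), ("R2", "construction"), ("R3", "retail"), ("R4", "farming")]
ranking = rank_returns(incoming, model)
```

Categories never audited before score zero here, which is a deliberate simplification; in practice they would need a default estimate or occasional sampling.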

Audit selection results
Maximizes auditors’ time by focusing on cases likely to yield the highest return
Closes the ‘tax gap’

Data mining - key to detecting and preventing fraud, waste and abuse
Learn from the past
  – High quality, evidence based decisions
Predict
  – Prevent future instances
React to changing circumstances
  – Models kept current, from latest data