Data Mining: Concepts and Techniques (3rd ed.) — Chapter 12 —

Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign & Simon Fraser University
©2009 Han, Kamber & Pei. All rights reserved.
4/16/2017 Data Mining: Concepts and Techniques

Chapter 12. Outlier Analysis
- Why outlier analysis?
- Identifying and handling outliers
- Distribution-Based Outlier Detection: A Statistics-Based Approach
- Classification-Based Outlier Detection
- Clustering-Based Outlier Detection
- Distance-Based Outlier Detection
- Local Outlier Analysis: A Density-Based Approach
- Deviation-Based Outlier Detection
- Isolation-Based Method: From Isolation Tree to Isolation Forest
- Outlier Detection in High-Dimensional Data
- Intrusion Detection
- Summary

What Is Outlier Discovery?
- What are outliers? Objects that are considerably dissimilar from the remainder of the data.
  - Example (sports): Michael Jordan, Wayne Gretzky, ...
- Problem: define and find outliers in large data sets.
- Applications:
  - Credit card fraud detection
  - Telecom fraud detection
  - Customer segmentation
  - Medical analysis

Outlier Discovery: Statistical Approaches
- Assume a model: an underlying distribution that generates the data set (e.g., a normal distribution).
- Use discordancy tests, which depend on:
  - the data distribution
  - the distribution parameters (e.g., mean, variance)
  - the number of expected outliers
- Drawbacks:
  - Most tests are for a single attribute.
  - In many cases, the data distribution may not be known.
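As a minimal sketch of a discordancy test on a single attribute, assuming the data are roughly normal, the Python below flags values lying more than `threshold` sample standard deviations from the mean. The function name `zscore_outliers` and the default cut-off of 3.0 are illustrative choices, not taken from the slides; formal discordancy tests (e.g., Grubbs' test) derive the cut-off from the sample size and a significance level instead.

```python
from statistics import mean, stdev
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[float]:
    """Flag values whose |x - mean| / std exceeds `threshold`.

    Hypothetical helper; assumes a single, roughly normally distributed attribute.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical: nothing is discordant
    return [x for x in values if abs(x - mu) / sigma > threshold]


data = [9, 10, 11] * 6 + [30]
print(zscore_outliers(data))  # [30]
```

Because the mean and standard deviation are themselves inflated by the outliers, this simple test can miss outliers in small samples (masking), which is one reason the number of expected outliers matters.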

Outlier Discovery: Distance-Based Approach
- Introduced to counter the main limitations imposed by statistical methods: we need multidimensional analysis without knowing the data distribution.
- Distance-based outlier: a DB(p, D)-outlier is an object O in a dataset T such that at least a fraction p of the objects in T lies at a distance greater than D from O.
- Algorithms for mining distance-based outliers [Knorr & Ng, VLDB’98]:
  - Index-based algorithm
  - Nested-loop algorithm
  - Cell-based algorithm
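The DB(p, D) definition can be checked directly with a naive nested loop, sketched below in Python. This is a simplified in-memory version under stated assumptions: points are numeric tuples, distance is Euclidean via `math.dist`, and `db_outliers` is a hypothetical name. Knorr and Ng's nested-loop algorithm additionally organizes the scan around memory-sized blocks to limit I/O, which is omitted here. The sketch keeps the early-exit idea: O is a DB(p, D)-outlier exactly when at most n - p*n objects (O itself included) lie within distance D, so the inner loop can stop as soon as that count is exceeded.

```python
import math
from typing import List, Sequence


def db_outliers(points: Sequence[Sequence[float]], p: float, D: float) -> List[int]:
    """Return indices of DB(p, D)-outliers: objects O for which at least a
    fraction p of all objects lies farther than D from O."""
    n = len(points)
    # O is an outlier iff #objects within distance D (O itself included) <= n - p*n
    max_close = n - p * n
    outliers = []
    for i, o in enumerate(points):
        close = 0
        for q in points:
            if math.dist(o, q) <= D:
                close += 1
                if close > max_close:
                    break  # too many near neighbours: O cannot be an outlier
        if close <= max_close:
            outliers.append(i)
    return outliers


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(db_outliers(pts, p=0.8, D=2.0))  # [5]
```

Raising D to 20 makes every object close to all the others, so no object qualifies; the result depends entirely on the user-chosen p and D.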

Density-Based Local Outlier Detection
M. M. Breunig, H.-P. Kriegel, R. Ng, J. Sander. LOF: Identifying Density-Based Local Outliers. SIGMOD 2000.
- Distance-based outlier detection is based on the global distance distribution, so it has difficulty identifying outliers when the data are not uniformly distributed.
- Ex.: C1 contains 400 loosely distributed points, C2 has 100 tightly condensed points, and there are 2 outlier points, o1 and o2.
  - A distance-based method cannot identify o2 as an outlier: a distance D small enough to isolate o2 from the dense cluster C2 would also flag many of the loosely spaced points in C1.
  - This calls for the concept of a local outlier.
- Local outlier factor (LOF):
  - Assume that being an outlier is not crisp (not a yes/no property).
  - Each point has an LOF, a degree of outlierness.
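To make the LOF idea concrete, here is a compact Python sketch of the standard computation from the Breunig et al. paper: the k-distance of each point; the reachability distance reach-dist_k(p, o) = max(k-distance(o), d(p, o)); the local reachability density lrd(p), the inverse of the mean reachability distance from p to its k nearest neighbours; and LOF(p), the mean ratio lrd(o)/lrd(p) over those neighbours. Simplifying assumptions: exactly k neighbours are used (ties broken by index, whereas the paper includes every point tied at the k-distance), points are distinct, distances are Euclidean, and `lof_scores` is a hypothetical name.

```python
import math
from typing import List, Sequence


def lof_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Local outlier factor of every point (assumes distinct points, 1 <= k < len(points))."""
    n = len(points)
    neighbours: List[List[int]] = []
    k_dist: List[float] = []
    for i, p in enumerate(points):
        # sort the other points by distance; ties are broken by index
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        knn = others[:k]
        neighbours.append([j for _, j in knn])
        k_dist.append(knn[-1][0])  # k-distance of p
    # local reachability density: inverse of the mean reachability distance
    lrd: List[float] = []
    for i, p in enumerate(points):
        reach = [max(k_dist[j], math.dist(p, points[j])) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: average ratio of the neighbours' density to the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
print([round(s, 2) for s in lof_scores(pts, k=2)])  # [1.0, 1.0, 1.0, 1.0, 6.03]
```

A score near 1 means a point is about as dense as its neighbourhood; a score well above 1 marks a local outlier. Because each point is compared only with its own neighbours, a sparse cluster like C1 does not drag down the scores of points in a dense cluster like C2.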

Outlier Discovery: Deviation-Based Approach
- Identifies outliers by examining the main characteristics of objects in a group.
- Objects that "deviate" from this description are considered outliers.
- Sequential exception technique: simulates the way in which humans can distinguish unusual objects from among a series of supposedly similar objects.
- OLAP data cube technique: uses data cubes to identify regions of anomalies in large multidimensional data.
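The smoothing-factor idea behind the sequential exception technique can be sketched in Python as follows. Since the slide gives no formulas, the following are assumptions: the dissimilarity function is the population variance, the cardinality function is the size of the remaining set, and the smoothing factor of a candidate exception set I drawn from S is |S - I| * (D(S) - D(S - I)); the candidate with the largest smoothing factor is reported as the exception set. For brevity the candidates are enumerated by brute force up to `max_size` items rather than built up along a sequence, and both function names are hypothetical.

```python
from itertools import combinations
from statistics import pvariance
from typing import List, Sequence, Set, Tuple


def smoothing_factor(data: Sequence[float], exception: Set[int]) -> float:
    """Drop in dissimilarity (variance) when the items at `exception` are removed,
    weighted by the number of remaining items."""
    rest = [x for i, x in enumerate(data) if i not in exception]
    return len(rest) * (pvariance(data) - pvariance(rest))


def exception_set(data: Sequence[float], max_size: int = 2) -> List[float]:
    """Brute-force search for the candidate exception set with the largest smoothing factor."""
    best: Tuple[int, ...] = ()
    best_sf = 0.0
    # keep at least one item outside the candidate set so pvariance(rest) is defined
    for size in range(1, min(max_size, len(data) - 1) + 1):
        for idx in combinations(range(len(data)), size):
            sf = smoothing_factor(data, set(idx))
            if sf > best_sf:
                best, best_sf = idx, sf
    return [data[i] for i in best]


print(exception_set([10, 11, 9, 10, 12, 45]))  # [45]
```

The cardinality weighting penalizes large exception sets: removing 45 together with a normal value such as 12 reduces the variance slightly more, but leaves fewer items, so its smoothing factor is lower than that of removing 45 alone.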

Summary
- Cluster analysis groups objects based on their similarity and has wide applications.
- Measures of similarity can be computed for various types of data.
- Clustering algorithms can be categorized into partitioning, hierarchical, density-based, grid-based, and model-based methods.
- Outlier detection and analysis are very useful for fraud detection and similar applications, and can be performed by statistical, distance-based, or deviation-based approaches.
- There are still many open research issues in cluster analysis.

References
- M. M. Breunig, H.-P. Kriegel, R. Ng, J. Sander. LOF: Identifying Density-Based Local Outliers. SIGMOD’00.
- E. Knorr and R. Ng. Algorithms for Mining Distance-Based Outliers in Large Datasets. VLDB’98.