Mining Top-n Local Outliers in Large Databases Author: Wen Jin, Anthony K. H. Tung, Jiawei Han Advisor: Dr. Hsu Graduate: Chia-Hsien Wu



Outline
- Motivation
- Objective
- Introduction
- Definition of Local Outlier
- Algorithm for Finding Top-n Local Outliers
- Micro-Cluster-Based Algorithm
- Experimentation
- Conclusion
- Opinion

Motivation
Using the LOF (local outlier factor) notion for outlier detection has some problems:
- It requires computing the LOF value of every object, most of which is unnecessary
- Handling overlapping data

Objective
To propose an efficient method for finding the most outstanding local outliers.
We use a meaningful cut-plane to keep our algorithms useful on overlapping data.

Introduction
- There are five general categories of outlier detection methods, e.g., density-based
- We use a novel method for top-n local outlier mining that avoids computing LOF for most objects
- The method relies on the notions of "micro-clusters" and bounds

Definition of Local Outlier DEFINITION 1. (k-distance of p)

Definition of Local Outlier (cont.) DEFINITION 2. (k-distance neighborhood of p)
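The formulas for Definitions 1 and 2 did not survive in this transcript. As a minimal sketch, assuming the standard LOF formulation (the k-distance of p is the distance to its k-th nearest other object, and the k-distance neighborhood is every other object within that distance, ties included), the two notions could look like this. The function names, the Euclidean metric, and the sample `points` are illustrative assumptions, not taken from the slides.

```python
import math


def k_distance(points, p, k):
    """Distance from points[p] to its k-th nearest other point (assumed Euclidean)."""
    dists = sorted(math.dist(points[p], q) for i, q in enumerate(points) if i != p)
    return dists[k - 1]


def k_neighborhood(points, p, k):
    """Indices of all other points whose distance to points[p] is <= k-distance(p)."""
    kd = k_distance(points, p, k)
    return [i for i, q in enumerate(points)
            if i != p and math.dist(points[p], q) <= kd]


# Hypothetical data: a unit square plus one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
print(k_distance(points, 0, 2), k_neighborhood(points, 0, 2))
```

Because ties are included, the neighborhood may hold more than k objects.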

Definition of Local Outlier (cont.) DEFINITION 3. (reachability distance of p w.r.t. object o)
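The formula for Definition 3 is also missing here. In the standard LOF formulation, which this sketch assumes, the reachability distance of p with respect to o is the larger of the k-distance of o and the actual distance between p and o. The helper names and sample data below are hypothetical.

```python
import math


def k_distance(points, o, k):
    """Distance from points[o] to its k-th nearest other point (assumed Euclidean)."""
    dists = sorted(math.dist(points[o], q) for i, q in enumerate(points) if i != o)
    return dists[k - 1]


def reach_dist(points, p, o, k):
    """Reachability distance of p w.r.t. o: max(k-distance(o), d(p, o))."""
    return max(k_distance(points, o, k), math.dist(points[p], points[o]))


# Hypothetical data: a unit square plus one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
print(reach_dist(points, 3, 4, 2))  # uses the k-distance of the far point
print(reach_dist(points, 4, 3, 2))  # uses the actual distance
```

The measure is not symmetric: when o lies in a sparse region, its own k-distance dominates, which smooths out small fluctuations in d(p, o).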

Definition of Local Outlier (cont.)

DEFINITION 4. (local reachability density of p) Here MinPts is equal to k. DEFINITION 5. (local outlier factor of p)
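Definitions 4 and 5 can be sketched as a brute-force computation. This assumes the standard LOF formulation with MinPts = k: the local reachability density of p is the inverse of the mean reachability distance from p to its neighbors, and LOF is the mean ratio of the neighbors' densities to p's own. The function names are illustrative, the data is hypothetical, and the sketch assumes no duplicate points (otherwise the density can divide by zero).

```python
import math


def knn(points, p, k):
    """Return (k-distance of p, indices in its k-distance neighborhood)."""
    d = sorted((math.dist(points[p], q), i) for i, q in enumerate(points) if i != p)
    kd = d[k - 1][0]
    return kd, [i for dist, i in d if dist <= kd]


def lof_scores(points, k):
    """Brute-force LOF for every point; O(n^2 log n), for illustration only."""
    kd, nbrs = zip(*(knn(points, p, k) for p in range(len(points))))

    def lrd(p):
        # Inverse of the mean reachability distance from p to its neighbors.
        total = sum(max(kd[o], math.dist(points[p], points[o])) for o in nbrs[p])
        return len(nbrs[p]) / total

    dens = [lrd(p) for p in range(len(points))]
    # LOF(p): mean of lrd(o) / lrd(p) over the neighbors o of p.
    return [sum(dens[o] for o in nbrs[p]) / (len(nbrs[p]) * dens[p])
            for p in range(len(points))]


# Hypothetical data: a unit square plus one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(points, k=2)
print(scores)
```

Points inside a uniform cluster score close to 1, while the isolated point scores well above 1. Computing this for every object is exactly the cost the top-n approach tries to avoid.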

Algorithm for Finding Top-n Local Outliers

Algorithm for Finding Top-n Local Outliers (cont.)

THEOREM 3.2

Algorithm for Finding Top-n Local Outliers (cont.)

DEFINITION 7.

Algorithm for Finding Top-n Local Outliers (cont.) DEFINITION 8.

Algorithm for Finding Top-n Local Outliers (cont.) DEFINITION 9.

Algorithm for Finding Top-n Local Outliers (cont.)

COROLLARY 3.2.

Algorithm for Finding Top-n Local Outliers (cont.) DEFINITION 10. (INTERNAL REACHABILITY BOUND OF A MICRO-CLUSTER)

Algorithm for Finding Top-n Local Outliers (cont.) DEFINITION 11. (EXTERNAL REACHABILITY BOUND OF TWO MICRO-CLUSTERS)

Micro-Cluster-Based Algorithm
- Preprocessing
- Computing LOF bounds for micro-clusters
- Ranking top-n local outliers

Micro-Cluster-Based Algorithm (cont.)
Preprocessing
- Load data into CF tree
- Fix CF nodes and generate micro-clusters
- Insert micro-clusters into X-tree
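The preprocessing step summarizes groups of points compactly. As a rough sketch, assuming a BIRCH-style clustering feature (count, linear sum, sum of squares), a micro-cluster summary might be maintained as below. The class and method names are hypothetical, and the radius used here (root-mean-square distance to the centroid) is an assumption; the paper's own micro-cluster definition is not shown in this transcript and may differ.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClusteringFeature:
    """Additive summary of a group of points: (n, LS, SS)."""
    n: int = 0
    ls: list = field(default_factory=list)  # per-dimension linear sum
    ss: float = 0.0                         # sum of squared norms

    def add(self, point):
        """Absorb one point without storing it."""
        if not self.ls:
            self.ls = [0.0] * len(point)
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """RMS distance of the absorbed points from the centroid."""
        c2 = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


cf = ClusteringFeature()
for pt in [(0.0, 0.0), (2.0, 0.0)]:
    cf.add(pt)
print(cf.n, cf.centroid(), cf.radius())
```

Because the summary is additive, points can be absorbed in a single pass over the data, and later steps can work with counts, centers, and radii instead of raw objects.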

Micro-Cluster-Based Algorithm (cont.)
Computing LOF bounds for micro-clusters
- Algorithm 1

Micro-Cluster-Based Algorithm (cont.) Algorithm 2.

Micro-Cluster-Based Algorithm (cont.)
Ranking top-n local outliers
- Algorithm 3
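Algorithm 3 itself is not reproduced in this transcript. The sketch below only illustrates the general idea of pruning with bounds, under the assumption that each micro-cluster already carries a lower and an upper bound on the LOF of its objects. Once at least n objects are known to have LOF no smaller than some threshold, any micro-cluster whose upper bound falls below it cannot contain a top-n outlier. The function name, tuple layout, and numbers are hypothetical.

```python
def prune_by_bounds(clusters, n):
    """Keep micro-clusters that may still contain a top-n local outlier.

    clusters: list of (name, object_count, lof_lower, lof_upper) tuples.
    """
    # Find the threshold: walk clusters by decreasing lower bound until
    # at least n objects are covered; their LOF is >= that lower bound.
    remaining = n
    threshold = float("-inf")  # fewer than n objects overall: prune nothing
    for _, count, lower, _ in sorted(clusters, key=lambda c: c[2], reverse=True):
        remaining -= count
        if remaining <= 0:
            threshold = lower
            break
    # A cluster survives only if its upper bound can reach the threshold.
    return [name for name, _, _, upper in clusters if upper >= threshold]


# Hypothetical bounds for four micro-clusters.
clusters = [
    ("A", 3, 2.5, 4.0),
    ("B", 50, 0.9, 1.2),
    ("C", 2, 1.8, 3.0),
    ("D", 40, 1.0, 1.9),
]
survivors = prune_by_bounds(clusters, n=4)
print(survivors)
```

Exact LOF values then only need to be computed for objects in the surviving micro-clusters, which is where the savings over computing LOF for every object would come from.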

Experimentation

Experimentation (cont.)

Conclusion
- We have proposed a novel and efficient method for mining top-n local outliers
- Finding strong and nested local outliers at multiple levels of granularity

Opinion
How to combine this novel notion with other outlier detection methods may be a research topic.