Mining Top-n Local Outliers in Large Databases Authors: Wen Jin, Anthony K. H. Tung, Jiawei Han Advisor: Dr. Hsu Graduate: Chia-Hsien Wu
Outline Motivation, Objective, Introduction, Definition of Local Outlier, Algorithm for Finding Top-n Local Outliers, Micro-Cluster-Based Algorithm, Experimentation, Conclusion, Opinion
Motivation The notion of LOF (local outlier factor) for outlier detection has some problems. Computing the LOF value of every object results in unnecessary computation. Overlapping data must be handled.
Objective To propose a method with good performance for finding the most outstanding local outliers. A meaningful cut-plane is used to make the algorithms practical.
Introduction Outlier detection approaches fall into five general categories, e.g., density-based. A novel method for top-n local outlier mining avoids computing the LOF for most objects. It uses the notions of “micro-clusters” and bounds.
Definition of Local Outlier DEFINITION 1.(k-distance of p)
Definition of Local Outlier (cont.) DEFINITION 2.(k-distance neighborhood of p)
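The formulas for Definitions 1 and 2 are not reproduced in this text. As a rough illustration, the sketch below follows the standard LOF formulation: the k-distance of p is the distance to its k-th nearest other object, and the k-distance neighborhood is every other object within that distance. The function names, the Euclidean metric, and the sample points are assumptions made for the example.

```python
import math

def k_distance(points, p, k):
    """Distance from points[p] to its k-th nearest other object."""
    dists = sorted(math.dist(points[p], q)
                   for i, q in enumerate(points) if i != p)
    return dists[k - 1]

def k_neighborhood(points, p, k):
    """Indices of all other objects within the k-distance of points[p]."""
    kd = k_distance(points, p, k)
    return [i for i, q in enumerate(points)
            if i != p and math.dist(points[p], q) <= kd]

# Hypothetical data: a unit square plus one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
print(k_distance(points, 0, 1))
print(k_neighborhood(points, 0, 1))
```

Because of ties, the neighborhood can hold more than k objects: with k = 1, the point (0, 0) has two neighbors at the same distance.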
Definition of Local Outlier (cont.) DEFINITION 3. (reachability distance of p w.r.t object o)
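The formula for Definition 3 is not reproduced here. In the standard LOF formulation, the reachability distance of p with respect to o is the larger of o's k-distance and the actual distance between p and o. A minimal sketch, assuming Euclidean distance and a precomputed k-distance passed in as `k_distance_o` (a hypothetical parameter name):

```python
import math

def reach_dist(p, o, k_distance_o):
    """Reachability distance of p w.r.t. o: max(k-distance(o), d(p, o))."""
    return max(k_distance_o, math.dist(p, o))

# An object inside o's neighborhood is "smoothed" up to o's k-distance;
# a far object keeps its true distance.
print(reach_dist((0.0, 0.0), (0.5, 0.0), 1.0))
print(reach_dist((0.0, 0.0), (3.0, 4.0), 1.0))
```

The smoothing reduces statistical fluctuation in the distances of objects that lie close to o.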
Definition of Local Outlier (cont.)
DEFINITION 4. (local reachability density of p) Here MinPts is equal to k. DEFINITION 5. (local outlier factor of p)
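The formulas for Definitions 4 and 5 are missing from this text. The sketch below follows the standard LOF formulation with MinPts = k: the local reachability density (lrd) of p is the inverse of the mean reachability distance from p to its neighbors, and the LOF of p is the mean ratio of its neighbors' lrd to its own. It is a brute-force illustration, not the deck's algorithm; the helper names and sample data are assumptions, and it assumes no duplicate points (which would give a zero denominator).

```python
import math

def knn(points, p, k):
    """Return (k-distance of p, indices in its k-distance neighborhood)."""
    d = sorted((math.dist(points[p], q), i)
               for i, q in enumerate(points) if i != p)
    kd = d[k - 1][0]
    return kd, [i for dist, i in d if dist <= kd]

def lrd(points, p, k):
    """Local reachability density: |N_k(p)| / sum of reach-dist_k(p, o)."""
    _, nbrs = knn(points, p, k)
    total = sum(max(knn(points, o, k)[0], math.dist(points[p], points[o]))
                for o in nbrs)
    return len(nbrs) / total

def lof(points, p, k):
    """Local outlier factor: mean of lrd(o) / lrd(p) over the neighbors o."""
    _, nbrs = knn(points, p, k)
    return sum(lrd(points, o, k) for o in nbrs) / (len(nbrs) * lrd(points, p, k))

# Hypothetical data: a unit square plus one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
for i in range(len(points)):
    print(i, round(lof(points, i, 2), 3))
```

Objects deep inside a cluster get an LOF close to 1, while the isolated point scores well above 1. Every call recomputes neighborhoods from scratch, which is the per-object cost that the top-n approach tries to avoid.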
Algorithm for Finding Top-n Local Outliers
Algorithm for Finding Top-n Local Outliers (cont.)
THEOREM 3.2
Algorithm for Finding Top-n Local Outliers (cont.)
DEFINITION 7.
Algorithm for Finding Top-n Local Outliers (cont.) DEFINITION 8.
Algorithm for Finding Top-n Local Outliers (cont.) DEFINITION 9.
Algorithm for Finding Top-n Local Outliers (cont.)
COROLLARY 3.2.
Algorithm for Finding Top-n Local Outliers (cont.) DEFINITION 10.(INTERNAL REACHABILITY BOUND OF A MICRO-CLUSTER)
Algorithm for Finding Top-n Local Outliers (cont.) DEFINITION 11.(EXTERNAL REACHABILITY BOUND OF TWO MICRO-CLUSTERS)
Micro-Cluster-Based Algorithm Three steps: preprocessing, computing LOF bounds for micro-clusters, and ranking the top-n local outliers
Micro-Cluster-Based Algorithm (cont.) Preprocessing: load data into a CF tree; fix the CF nodes and generate micro-clusters; insert the micro-clusters into an X-tree
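To make the CF-tree step more concrete, here is a minimal sketch of a BIRCH-style clustering feature (CF): a triple of count, linear sum, and sum of squares that can be merged additively and still yield a centroid and radius. The class and method names are hypothetical, and the radius here is the BIRCH root-mean-square radius, which may differ from the radius used in the deck's micro-cluster definition (not reproduced in this text).

```python
import math
from dataclasses import dataclass

@dataclass
class CF:
    n: int        # number of points summarized
    ls: tuple     # per-dimension linear sum of the points
    ss: float     # sum of squared norms of the points

    @classmethod
    def of(cls, point):
        """Build the CF of a single point."""
        return cls(1, tuple(point), sum(x * x for x in point))

    def merge(self, other):
        """CFs are additive: merging needs no access to the raw points."""
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        """Root-mean-square distance of members to the centroid."""
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

cf = CF.of((0.0, 0.0)).merge(CF.of((2.0, 0.0)))
print(cf.centroid(), cf.radius())
```

The additive property is what lets a CF tree absorb points in one scan of the data and later hand compact summaries on as micro-clusters.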
Micro-Cluster-Based Algorithm (cont.) Computing LOF Bound for Micro-clusters Algorithm 1
Micro-Cluster-Based Algorithm (cont.) Algorithm 2.
Micro-Cluster-Based Algorithm (cont.) Rank Top-n local outliers Algorithm 3
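Algorithm 3 itself is not reproduced in this text. The sketch below shows only the general idea of bound-based pruning that such a ranking step can rely on, assuming each micro-cluster already carries a lower and an upper LOF bound valid for all of its members. The function name, tuple layout, and numbers are hypothetical.

```python
def prune_by_bounds(clusters, n):
    """Keep micro-clusters that may still contain a top-n outlier.

    clusters: list of (name, size, lof_lower, lof_upper) tuples.
    """
    covered = 0
    threshold = float("-inf")  # stays -inf if there are fewer than n objects
    # Walk clusters from the highest lower bound down until n objects are covered.
    for _, size, lower, _ in sorted(clusters, key=lambda c: c[2], reverse=True):
        covered += size
        if covered >= n:
            threshold = lower  # at least n objects have LOF >= threshold
            break
    # A cluster whose upper bound is below the threshold cannot reach the top n.
    return [name for name, _, _, upper in clusters if upper >= threshold]

clusters = [("A", 3, 1.0, 1.2), ("B", 1, 4.0, 6.0),
            ("C", 2, 1.5, 2.5), ("D", 5, 0.9, 1.1)]
print(prune_by_bounds(clusters, 2))
```

Only the objects in the surviving micro-clusters need an exact LOF computation; the rest are discarded without ever computing their LOF.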
Experimentation
Experimentation (cont.)
Conclusion We have proposed a novel and efficient method for mining top-n local outliers. To find strong and nested local outliers at multiple levels of granularity.
Opinion How to combine this novel notion with other outlier-detection methods may be a research topic.