2015/7/2
Incremental Clustering for Mining in a Data Warehousing Environment
Martin Ester, Hans-Peter Kriegel, J. Sander, Michael Wimmer, Xiaowei Xu
Proceedings of the 24th VLDB Conference, New York, USA, 1998
Modified Version
Outline
- Introduction
- Related Work
- The algorithm DBSCAN
- Incremental DBSCAN
- Performance Evaluation
- Conclusions
- Comment
Introduction
Data Warehouse: a collection of data from multiple sources, with two characteristics:
(1) Derived information is present for the purpose of analysis
(2) The environment is dynamic
Data Mining: the application of data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data
Introduction (Cont.)
The task in this paper is clustering
- Grouping the objects of a database into meaningful subclasses.
Data warehouse updating
- The database is updated periodically in batch mode (e.g., at night).
- Due to the very large size of the database, it is highly desirable to perform these updates incrementally.
Based on DBSCAN [EKSX 96]
- The incremental algorithm yields the same clustering result as DBSCAN and speeds up the daily updates in a data warehouse.
Related Work
Partitioning algorithms: construct various partitions and then evaluate them by some criterion
Hierarchical algorithms: create a hierarchical decomposition of the set of data (or objects) using some criterion
Density-based algorithms: based on connectivity and density functions
Why choose DBSCAN
The reasons:
- It is one of the most efficient algorithms on large databases
- It can be applied to any database containing data from a metric space (assuming a distance function)
The algorithm DBSCAN
- Background
- Definition
- DBSCAN Algorithm
- DBSCAN Demo
Background
D: the dataset (i.e., a set of points)
Eps: maximum radius of the neighborhood
MinPts: minimum number of points in the Eps-neighborhood of a point
N_Eps(p): the subset of D contained in the Eps-neighborhood of p
  N_Eps(p) = {q ∈ D | dist(p, q) ≤ Eps}
core object: |N_Eps(p)| ≥ MinPts
[Figure: point p with its Eps-neighborhood]
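The two definitions on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: points are coordinate tuples, dist is the Euclidean distance, and the function names eps_neighborhood and is_core as well as the sample data are hypothetical, not taken from the paper.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def eps_neighborhood(D, p, eps):
    """Return N_Eps(p): every q in D with dist(p, q) <= eps (p itself included)."""
    return [q for q in D if dist(p, q) <= eps]


def is_core(D, p, eps, min_pts):
    """p is a core object if its Eps-neighborhood holds at least MinPts points."""
    return len(eps_neighborhood(D, p, eps)) >= min_pts


# Hypothetical sample data: a small group of points plus one far-away point.
D = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]

print(eps_neighborhood(D, (0.0, 0.0), 1.0))
print(is_core(D, (0.0, 0.0), 1.0, 3))
print(is_core(D, (5.0, 5.0), 1.0, 3))
```

The linear scan over D is chosen for readability only; it makes no claim about how neighborhood queries are answered efficiently on a large database.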
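Definition
directly density-reachable: p is directly density-reachable from q if
  p ∈ N_Eps(q) and |N_Eps(q)| ≥ MinPts (q is a core object)
density-reachable: p >_D q (p is density-reachable from q), because p is directly density-reachable from r and r is directly density-reachable from q
[Figure: points q, p and r with their Eps-neighborhoods; MinPts: 4]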
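A minimal sketch of the two reachability relations, assuming tuple points and Euclidean distance. The function names and the sample points are hypothetical. Density-reachability is checked here with a breadth-first search over chains of direct steps, which is one straightforward way to test the definition, not necessarily how the authors implement it.

```python
from collections import deque
from math import dist


def neighborhood(D, p, eps):
    """N_Eps(p): all points of D within distance eps of p."""
    return [q for q in D if dist(p, q) <= eps]


def directly_reachable(D, p, q, eps, min_pts):
    """True if p is directly density-reachable from q (q must be a core object)."""
    n_q = neighborhood(D, q, eps)
    return p in n_q and len(n_q) >= min_pts


def density_reachable(D, p, q, eps, min_pts):
    """True if a chain of directly density-reachable steps leads from q to p."""
    seen, queue = {q}, deque([q])
    while queue:
        cur = queue.popleft()
        n_cur = neighborhood(D, cur, eps)
        if len(n_cur) < min_pts:
            continue  # only core objects can extend the chain
        for nxt in n_cur:
            if nxt == p:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical data: four points on a line plus an isolated point.
D = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
print(density_reachable(D, (3, 0), (1, 0), 1.0, 3))  # border point reached from a core point
print(density_reachable(D, (1, 0), (3, 0), 1.0, 3))  # the reverse direction
```

With Eps = 1 and MinPts = 3, the point (3, 0) is not a core object, so the relation holds in one direction only; density-reachability is not symmetric in general.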
Definition (Cont.)
density-connected: p and q are density-connected because p >_D o and q >_D o, i.e., both are density-reachable from o
cluster: a non-empty subset C of D satisfying
- Maximality: ∀ p, q ∈ D: if p ∈ C and q >_D p, then q ∈ C
- Connectivity: ∀ p, q ∈ C: p is density-connected to q
noise: {p ∈ D | ∀ i: p ∉ C_i}
[Figure: points o, p and q; core objects, border objects and noise; MinPts: 4]
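The definitions of density-connectivity and noise can be checked by brute force on a small dataset. The sketch below assumes tuple points and Euclidean distance; reachable_from, density_connected and noise are hypothetical helper names, and the approach is deliberately naive (it recomputes reachable sets for every candidate o).

```python
from math import dist


def neighborhood(D, p, eps):
    """N_Eps(p): all points of D within distance eps of p."""
    return [q for q in D if dist(p, q) <= eps]


def reachable_from(D, o, eps, min_pts):
    """Set of points density-reachable from o (empty if o is not a core object)."""
    reached, expanded, stack = set(), set(), [o]
    while stack:
        cur = stack.pop()
        if cur in expanded:
            continue
        expanded.add(cur)
        n_cur = neighborhood(D, cur, eps)
        if len(n_cur) < min_pts:
            continue  # non-core points do not extend the chain
        for q in n_cur:
            reached.add(q)
            if q not in expanded:
                stack.append(q)
    return reached


def density_connected(D, p, q, eps, min_pts):
    """True if some o in D has both p and q density-reachable from it."""
    return any(
        p in r and q in r
        for r in (reachable_from(D, o, eps, min_pts) for o in D)
    )


def noise(D, eps, min_pts):
    """Points that are density-reachable from no point, i.e. belong to no cluster."""
    clustered = set().union(*(reachable_from(D, o, eps, min_pts) for o in D))
    return [p for p in D if p not in clustered]


# Hypothetical data: four points on a line plus an isolated point.
D = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
print(density_connected(D, (0, 0), (3, 0), 1.0, 3))
print(noise(D, 1.0, 3))
```

In this example the two end points (0, 0) and (3, 0) are border objects: neither is density-reachable from the other, yet they are density-connected through the core objects between them.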
DBSCAN Algorithm
[Figure: points o, p and q]
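The slide text itself carries no pseudocode, so the following is only a sketch of the commonly described DBSCAN procedure, assuming tuple points, Euclidean distance and a linear-scan region query. The names dbscan, region_query and NOISE, the label encoding and the sample data are all illustrative assumptions.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster (assumed encoding)


def region_query(D, i, eps):
    """Indices of all points within eps of D[i], including i itself."""
    return [j for j in range(len(D)) if dist(D[i], D[j]) <= eps]


def dbscan(D, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(D)  # None means "not yet classified"
    cluster_id = 0
    for i in range(len(D)):
        if labels[i] is not None:
            continue
        seeds = region_query(D, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        stack = [j for j in seeds if j != i]
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            n_j = region_query(D, j, eps)
            if len(n_j) >= min_pts:
                stack.extend(n_j)  # j is a core object, keep expanding
        cluster_id += 1
    return labels


# Hypothetical data: two groups on a line and one isolated point.
D = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (20, 0), (21, 0), (22, 0)]
print(dbscan(D, eps=1.0, min_pts=3))
```

A point first marked as noise is relabeled when a later core object reaches it, which is why (0, 0) and (20, 0) end up in clusters even though they are visited before any core object.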
DBSCAN Demo
[Figure: demo run showing the stack; parameters Eps (value not shown) and MinPts: 3; noise points marked]
