DBSCAN Data Mining algorithm Dr Veljko Milutinović Milan Micić


1 DBSCAN Data Mining algorithm
Professor: Dr Veljko Milutinović
Student: Milan Micić, 2011/3323
School of Electrical Engineering, University of Belgrade
Department of Computer Engineering

2 Content
Introduction
The DBSCAN basic idea
Algorithm
DBSCAN on R
Example
Advantages
Disadvantages
References

3 Introduction
Data clustering algorithms
  Used in machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics
  Hierarchical, centroid-based, distribution-based, density-based, etc.

4 DBSCAN basic idea
Density-Based Spatial Clustering of Applications with Noise
Munich, 1996
Derived from the natural human approach to clustering
Input parameters
  ε: the radius of the epsilon neighborhood
  MinPts: the minimum number of points required in a neighborhood
The neighborhood of radius ε around a point has to contain at least MinPts points
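The density condition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original presentation: the function names `region_query` and `is_dense` and the sample points are assumptions, and the neighborhood is taken to include the point itself.

```python
import math

def region_query(points, p, eps):
    """Return all points within distance eps of p (p itself included)."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_dense(points, p, eps, min_pts):
    """True when the eps-neighborhood of p holds at least min_pts points."""
    return len(region_query(points, p, eps)) >= min_pts

# Hypothetical sample data: four close points and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(is_dense(points, (0.0, 0.0), eps=0.2, min_pts=4))  # True
print(is_dense(points, (5.0, 5.0), eps=0.2, min_pts=4))  # False
```

The scan over all points makes each query linear in the dataset size; an indexing structure would replace the list comprehension.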

5 DBSCAN basic idea
Directly density-reachable, p1 from p2
  p1 belongs to the ε neighborhood of p2
  The ε neighborhood of p2 contains at least MinPts points
Density-reachable, pn from p0
  There exists a chain of points p1, ..., pn-1 such that each pi+1 is directly density-reachable from pi
Core, border and noise points
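A rough sketch of how the three point types follow from these definitions: a core point has at least MinPts points in its ε neighborhood, a border point is not core but lies in the neighborhood of some core point, and every other point is noise. The function name `classify` and the sample coordinates are assumptions made for illustration.

```python
import math

def classify(points, eps, min_pts):
    """Label each point as 'core', 'border' or 'noise'."""
    n = len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (i itself included).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    core = {i for i in range(n) if len(neighbors(i)) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbors(i)):
            # Directly density-reachable from a core point, but not core itself.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Hypothetical sample: a unit square, one attached point, one isolated point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (9, 9)]
print(classify(points, eps=1.0, min_pts=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```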

6 Algorithm
Complexity with indexing structure: O(n*log(n))

DBSCAN(D, eps, MinPts)
    C = 0
    for each unvisited point P in dataset D
        mark P as visited
        N = regionQuery(P, eps)
        if sizeof(N) < MinPts
            mark P as NOISE
        else
            C = next cluster
            expandCluster(P, N, C, eps, MinPts)

expandCluster(P, N, C, eps, MinPts)
    add P to cluster C
    for each point P' in N
        if P' is not visited
            mark P' as visited
            N' = regionQuery(P', eps)
            if sizeof(N') >= MinPts
                N = N joined with N'
        if P' is not yet member of any cluster
            add P' to cluster C
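The pseudocode above can be turned into a short runnable Python sketch. Two assumptions are worth flagging: the region query here is a brute-force scan, so this version runs in O(n^2) rather than the O(n*log(n)) quoted for an indexing structure, and the label conventions (`NOISE = -1`, clusters numbered from 0) are choices made for this example only.

```python
import math

NOISE = -1  # assumed label for noise points

def region_query(points, i, eps):
    """Indices of all points within eps of point i (brute force, O(n))."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)      # None = not yet in any cluster
    visited = [False] * len(points)
    cluster = -1
    for i in range(len(points)):
        if visited[i]:
            continue
        visited[i] = True
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # C = next cluster
        labels[i] = cluster
        # expandCluster: grow the seed list while scanning it.
        seeds, seen, k = list(neighbors), set(neighbors), 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if not visited[j]:
                visited[j] = True
                nj = region_query(points, j, eps)
                if len(nj) >= min_pts:  # j is a core point: join its neighborhood
                    for m in nj:
                        if m not in seen:
                            seen.add(m)
                            seeds.append(m)
            if labels[j] is None or labels[j] == NOISE:
                labels[j] = cluster
    return labels

# Hypothetical sample: two square clusters and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The `seen` set only prevents duplicate entries in the seed list; it does not change which points end up in the cluster.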

7 DBSCAN on R
FPC - Flexible Procedures for Clustering
GNU General Public License
Various methods for clustering and cluster validation
Interface functions for many methods implemented in the R language
DBSCAN: O(n^2)
dbscan(x,0.2,showplot=2)
dbscan Pts=600 MinPts=5 eps=0.2
seed border total

8 Example
Astronomy task
Identifying celestial objects by capturing the radiation they emit
Captured images contain noise (from sensors and from diffuse emission by the atmosphere and space itself)
Elimination method: constrain the relevant intensity by a known threshold
In this case, only pixels whose intensity is less than 50 (and consequently darker) are considered
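The thresholding step might look like the following sketch, which keeps only the coordinates of pixels whose intensity is below 50. The nested-list image format, the function name `select_pixels` and the sample values are assumptions; the slides do not say how the image data was stored.

```python
def select_pixels(image, threshold=50):
    """Return (row, col) coordinates of pixels with intensity below threshold."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value < threshold
    ]

# Hypothetical 3x3 grayscale image.
image = [
    [200, 40, 45],
    [210, 220, 30],
    [10, 230, 240],
]
print(select_pixels(image))  # [(0, 1), (0, 2), (1, 2), (2, 0)]
```

The resulting coordinate list is the kind of point set a clustering algorithm would then take as input.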

9 Example
DBSCAN algorithm applied to individual pixels
Links together a complete emission area
Each generated cluster defines a celestial entity
ε = 5, MinPts = 5: 64 clusters and 224 outliers found

10 Disadvantages
Appropriate parameters ε and MinPts have to be chosen
  Numerous experiments indicate that MinPts = 4 works best
Difficulty clustering datasets with large differences in density
"Curse of dimensionality"
  Affects every algorithm based on the Euclidean distance on high-dimensional data sets
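One common aid for choosing ε, described in the original DBSCAN paper, is the sorted k-distance list: compute each point's distance to its k-th nearest neighbor, sort the values in descending order, and look for the point where the curve flattens. The sketch below only computes the list; the function name `k_distances` and the sample data are assumptions, and picking the actual threshold remains a judgment call.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)

# Hypothetical sample: a unit square plus one distant outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
kd = k_distances(points, k=2)
print(kd)  # one large value for the outlier, then 1.0 for each square corner
```

In this toy data the jump between the first value and the rest suggests that an ε slightly above 1.0 would separate the square from the outlier.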

11 Advantages
Does not require the number of clusters in the data a priori
Can find arbitrarily shaped clusters
  Even clusters completely surrounded by a different cluster
Mostly insensitive to the ordering of the points in the database
  Only border points might swap cluster membership
Has a notion of noise
Requires just two parameters

12 References
Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu: “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”, Institute for Computer Science, University of Munich, 1996
Mehmed Kantardzic: “Data Mining: Concepts, Models, Methods, and Algorithms”, 2011
Wikibooks:
Wiki:

13 Thank you for your attention!
Questions
Milan Micić

