Published by Ida Jayadi. Modified over 5 years ago.
1
Dimensionally distributed density estimation
Pasi Fränti and Sami Sieranoja
P. Fränti and S. Sieranoja, "Dimensionally distributed density estimation", Int. Conf. Artificial Intelligence and Soft Computing (ICAISC), Zakopane, Poland, June 2018.
2
Density in clustering
3
Density in outlier detection
4
Definitions
5
Definition of density
Density = mass / volume
[Figure, built up over slides 5 to 8: labels r and N]
9
Two ways to estimate density
Input: point. Output: density around the point.
Distance-based: fix the neighborhood (R), count the points (N).
Neighbor-based: fix the number of points (N), measure the size of the neighborhood (R).
10
Two ways to estimate density
Distance-based: input is the radius R, output is the point count.
Neighbor-based: input is the number of neighbors k, output is the mean distance.
[Figure: example values 1.9, 2, 1.1]
11
Two ways to estimate density
[Figure: distance-based and neighbor-based density values]
12
Summary
Distance-based: R is a fixed constant; measure N.
Neighbor-based: N is a fixed constant; measure R.
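As a minimal sketch of the two estimators summarized above, the following brute-force Python computes both by full search. The function names, the use of Euclidean distance, and the choice to exclude the query point itself are assumptions made for illustration, not details taken from the slides.

```python
import math

def distance_based_density(points, i, radius):
    """Fixed R, measured N: count the other points within `radius` of points[i]."""
    p = points[i]
    return sum(
        1
        for j, q in enumerate(points)
        if j != i and math.dist(p, q) <= radius
    )

def neighbor_based_density(points, i, k):
    """Fixed N, measured R: mean distance from points[i] to its k nearest neighbors."""
    p = points[i]
    dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

# Toy data (hypothetical): three nearby points and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
counts = [distance_based_density(points, i, 1.5) for i in range(len(points))]
means = [neighbor_based_density(points, i, 2) for i in range(len(points))]
```

With a fixed radius, a larger count means a denser neighborhood; with a fixed k, a smaller mean distance means a denser neighborhood. Each call scans all other points, so estimating the density of every point this way takes at least quadratic time in N.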
13
Choice of the parameters
Distance-based:
R = % * average distance to data center [2]
R = average pairwise distance of all data points [28]
R = 90% * first peak in the pairwise distance histogram [17]
R = 0.07 [26]
Neighbor-based:
k = 10 [18]
k = 30 [12]
k = [27]
k = [5]
k = N [19]
k = min{50, N/(2K)}, where K is the number of clusters
14
Bottleneck: finding neighbors
O(N²)
Distance-based: d(x,y) > R
Neighbor-based: k-nearest
15
Dimensionally distributed density estimation (DDDE)
16
Density in categorical data
Estimate popularity of individual attributes.
Cao, Liang, Bai, Expert Systems with Applications, 2009.
Example records A and B: [Zhang, Farmer, Mandarin], [Malinen, Scientist, Finnish]
17
Sorting in each dimension
Sorting by x-values, scanned with a sliding window
Sorting by y-values, scanned with a sliding window
18
Independent density estimates
[Figure: per-point values along the x-projection and the y-projection, each computed with a sliding window. Example density value = 1.2]
19
Density estimates
[Figure: DDDE vs. 2-NN]
20
Sliding window technique
Sorted values: 17 21 26 29 44 47 67 75 77 88 95
First window position: m- = 33, m+ = 73 on either side of y[i] (figure labels: 15, 25)
Next window position: m- = 40, m+ = 80 (updates: -26 +47, -67 +88)
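The numbers on this slide are consistent with m- and m+ being the means of the three values immediately before and after y[i]: (26 + 29 + 44) / 3 = 33 and (67 + 75 + 77) / 3 = 73, then (29 + 44 + 47) / 3 = 40 and (75 + 77 + 88) / 3 = 80 after one step. Under that reading, a sketch of the incremental update in Python could look like this. The function name, the parameter k, and skipping positions closer than k to either end are assumptions.

```python
def sliding_window_means(sorted_values, k):
    """Return (value, m_minus, m_plus) for every position that has k values
    on both sides. m_minus / m_plus are the means of the k values before /
    after the position. Positions near the ends are skipped (an assumption).
    """
    n = len(sorted_values)
    results = []
    if n < 2 * k + 1:
        return results
    # Initial sums for the first valid position, index k.
    left_sum = sum(sorted_values[0:k])
    right_sum = sum(sorted_values[k + 1:2 * k + 1])
    for i in range(k, n - k):
        results.append((sorted_values[i], left_sum / k, right_sum / k))
        if i + 1 < n - k:
            # Slide by one: each sum drops one value and gains one value.
            left_sum += sorted_values[i] - sorted_values[i - k]
            right_sum += sorted_values[i + k + 1] - sorted_values[i + 1]
    return results

values = [17, 21, 26, 29, 44, 47, 67, 75, 77, 88, 95]
rows = {v: (m_minus, m_plus) for v, m_minus, m_plus in sliding_window_means(values, 3)}
```

Because the values are sorted, the mean distance from y[i] to these 2k neighbors equals (m+ - m-) / 2, so each step costs O(1) once the initial sums are known.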
21
DDDE algorithm
Total: O(DN log N + DN)
Steps within the algorithm: O(N log N) and O(N)
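One way to put the pieces together, sketched in Python with NumPy under several assumptions: each dimension is sorted independently, every point gets the mean one-dimensional distance to up to k sorted-order neighbors on each side, and the per-dimension values are averaged. The averaging rule, the clipped windows at the ends, and the name ddde_sketch are illustrative choices, and prefix sums stand in for the incremental window update; the slides do not spell out these details.

```python
import numpy as np

def ddde_sketch(data, k):
    """Per-point score: mean over dimensions of the mean 1-D distance to up
    to k sorted-order neighbors on each side. Lower score = denser region."""
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[0] < 2 or data.shape[1] < 1 or k < 1:
        raise ValueError("need an N x D array with N >= 2, D >= 1 and k >= 1")
    n, d = data.shape
    idx = np.arange(n)
    lo = np.maximum(idx - k, 0)        # first index inside each window
    hi = np.minimum(idx + k, n - 1)    # last index inside each window
    count = hi - lo                    # window size, excluding the point itself
    scores = np.zeros(n)
    for dim in range(d):
        order = np.argsort(data[:, dim], kind="stable")   # O(N log N)
        v = data[order, dim]
        prefix = np.concatenate(([0.0], np.cumsum(v)))    # prefix[j] = sum(v[:j])
        # Sum of distances to the window neighbors on each side, O(N) overall.
        left = (idx - lo) * v - (prefix[idx] - prefix[lo])
        right = (prefix[hi + 1] - prefix[idx + 1]) - (hi - idx) * v
        scores[order] += (left + right) / count
    return scores / d

# One-dimensional check against the sorted values from the sliding window slide.
values = [17, 21, 26, 29, 44, 47, 67, 75, 77, 88, 95]
column = np.array(values, dtype=float).reshape(-1, 1)
scores = ddde_sketch(column, k=3)
```

Sorting dominates at O(N log N) per dimension and the window pass is O(N), which matches the O(DN log N + DN) total on the slide.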
22
Potential false detections
23
Experiments
24
Methods compared
Clustering algorithms:
Density-based sorting + k-means
Density peaks [26]
Repeated k-means (as reference point)
Density estimations:
Full search O(N²)
Using subsample (s=2%) O(sN²)
Using DDDE O(N log N)
25
Datasets
S1, S2, S3, S4, Unbalance, Birch1, Birch2, DIM32, A1, A2, A3
26
Centroid index (CI)
[Fränti, Rezaei, Zhao, Pattern Recognition 2014]
15 prototypes (pigeons), 15 real clusters (pigeon holes)
[Figure: four pigeon holes are marked empty, giving CI = 4]
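As a rough sketch of how the centroid index can be computed, following the pigeonhole picture on this slide: each centroid of one solution is mapped to its nearest centroid in the other, centroids that receive no mapping are counted as orphans (the empty pigeon holes), and the larger count of the two directions is reported. The function names and the toy coordinates are made up for illustration.

```python
import math

def orphan_count(source, target):
    """Map every source centroid to its nearest target centroid and count
    the target centroids that receive no mapping (orphans)."""
    mapped = set()
    for c in source:
        nearest = min(range(len(target)), key=lambda j: math.dist(c, target[j]))
        mapped.add(nearest)
    return len(target) - len(mapped)

def centroid_index(a, b):
    """Symmetric centroid index: the larger orphan count of the two directions."""
    return max(orphan_count(a, b), orphan_count(b, a))

# Toy example (hypothetical): two prototypes share one real cluster,
# so one real cluster is left without a prototype.
truth = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
found = [(0.1, 0.0), (-0.1, 0.0), (10.0, 0.1)]
ci = centroid_index(found, truth)
```

A value of 0 means every real cluster has a matching prototype; each unit above 0 corresponds to one wrongly allocated prototype.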
27
Quality comparison (centroid index)
28
Effect for density peaks
[Figure: full search vs. DDDE]
29
Speed comparison (seconds)
30
Speed vs. quality
31
Time profiling
32
Time profiling
33
Conclusions
Rapid O(DN log N) time algorithm.
Remarkable 160:1 speed-up.
Density estimation is no longer the bottleneck.