Dimensionally distributed density estimation
Pasi Fränti and Sami Sieranoja


1 Dimensionally distributed density estimation
Pasi Fränti and Sami Sieranoja
P. Fränti and S. Sieranoja, "Dimensionally distributed density estimation", Int. Conf. Artificial Intelligence and Soft Computing (ICAISC), Zakopane, Poland, June 2018.

2 Density in clustering

3 Density in outlier detection

4 Definitions

5 Definition of density
Density = mass / volume

6-8 Definition of density
Density = mass / volume [figure build adding the labels r and N]

9 Two ways to estimate density
Input: Point
Output: Density around the point
Distance-based: Fix the neighborhood (R), count the points (N)
Neighbor-based: Fix the number of points (N), measure the size of the neighborhood (R)

10 Two ways to estimate density
Distance-based: Input: radius R. Output: point count.
Neighbor-based: Input: k neighbors. Output: mean distance.
[figure labels: 1.9, 2, 1.1]

11 Two ways to estimate density
Distance-based vs. neighbor-based [figure: numeric example values for each method]

12 Summary
Distance-based: R is a fixed constant. Measure: N
Neighbor-based: N is a fixed constant. Measure: R

13 Choice of the parameters
Distance-based:
R = % * average distance to data center [2]
R = Average pairwise distance of all data points [28]
R = 90% * first peak in the pairwise distance histogram [17]
R = 0.07 [26]
Neighbor-based:
k = 10 [18]
k = 30 [12]
k = [27]
k = [5]
k = N [19]
k = min{50, N/(2K)} where K is the number of clusters
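
Two of the rules above are fully specified on the slide and can be sketched directly. This is only an illustration: the function names are hypothetical, and rounding N/(2K) down to an integer is an assumption, since the slide does not say how the quotient is rounded.

```python
import math
from itertools import combinations

def average_pairwise_distance(points):
    """R = average pairwise distance of all data points (needs >= 2 points)."""
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("at least two points are required")
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def neighbor_count(n_points, n_clusters):
    """k = min{50, N/(2K)}; flooring the quotient is an assumption."""
    return min(50, n_points // (2 * n_clusters))

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
radius = average_pairwise_distance(points)   # distances are 5, 10 and 5
k_large = neighbor_count(5000, 15)           # N/(2K) exceeds the cap of 50
k_small = neighbor_count(600, 15)            # N/(2K) = 20 is below the cap
```

Note that computing the average pairwise distance is itself an O(N^2) operation.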

14 Bottleneck: finding neighbors
O(N^2)
Distance-based: d(x,y) > R
Neighbor-based: k-nearest
[figure labels: 1, 2, 3, 4]
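
To make the bottleneck concrete, here is a minimal brute-force sketch of both estimators. It is not taken from the slides: the function names are hypothetical, Euclidean distance is assumed, and a point is assumed not to count as its own neighbor. Both functions build the full distance matrix, which is where the O(N^2) cost comes from.

```python
import math

def pairwise_distances(points):
    """Full N x N Euclidean distance matrix: O(N^2) distance calculations."""
    return [[math.dist(p, q) for q in points] for p in points]

def distance_based_density(points, radius):
    """Fix the radius R and count the other points that fall inside it."""
    dist = pairwise_distances(points)
    return [sum(1 for j, d in enumerate(row) if j != i and d <= radius)
            for i, row in enumerate(dist)]

def neighbor_based_density(points, k):
    """Fix k and return the mean distance to the k nearest other points."""
    dist = pairwise_distances(points)
    result = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        result.append(sum(others[:k]) / k)
    return result

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
counts = distance_based_density(points, radius=1.5)
mean_dists = neighbor_based_density(points, k=2)
```

In the distance-based variant a larger count means a denser region; in the neighbor-based variant a smaller mean distance does.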

15 Dimensionally distributed density estimation (DDDE)

16 Density in categorical data
Estimate popularity of individual attributes [Cao, Liang, Bai, Expert Systems with Applications, 2009].
Example records A and B: [Zhang, Farmer, Mandarin], [Malinen, Scientist, Finnish]

17 Sorting in each dimension
Sorting by x-values, then a sliding window
Sorting by y-values, then a sliding window

18 Independent density estimates
x-projection and y-projection, each processed with its own sliding window [figure: per-point values in each projection]
Density value = 1.2 (example)

19 Density estimates: DDDE vs. 2-NN

20 Sliding window technique
Sorted values: 17 21 26 29 44 47 67 75 77 88 95
Window at y[i]: m- = 33, m+ = 73 [figure labels: 15, 25]
Window after sliding by one: m- = 40, m+ = 80 (updates: -26 +47 and -67 +88)
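
The numbers on this slide are consistent with m- and m+ being the means of the three values on each side of y[i]: (26+29+44)/3 = 33, (67+75+77)/3 = 73, and after one step (29+44+47)/3 = 40, (75+77+88)/3 = 80. Under that reading, which is an inference rather than something the slide states, the incremental update can be sketched as follows. The function name and the choice to skip positions without k values on both sides are assumptions.

```python
def sliding_window_means(values, k):
    """Return {position: (m_minus, m_plus)} for a sorted list of values.

    Only positions with k values on both sides are reported. Each step
    drops one value and adds one value per side, so the pass is O(N).
    """
    n = len(values)
    means = {}
    if n < 2 * k + 1:
        return means
    left_sum = sum(values[0:k])               # k values before position k
    right_sum = sum(values[k + 1:2 * k + 1])  # k values after position k
    for i in range(k, n - k):
        means[i] = (left_sum / k, right_sum / k)
        if i + 1 < n - k:
            # Left side: values[i - k] leaves, values[i] enters.
            left_sum += values[i] - values[i - k]
            # Right side: values[i + 1] leaves, values[i + 1 + k] enters.
            right_sum += values[i + 1 + k] - values[i + 1]
    return means

y = [17, 21, 26, 29, 44, 47, 67, 75, 77, 88, 95]
means = sliding_window_means(y, k=3)
at_47 = means[5]   # window around the value 47
at_67 = means[6]   # window after sliding by one, around the value 67
```

Moving from position 5 to position 6 applies exactly the updates shown on the slide: 26 leaves and 47 enters the left side, 67 leaves and 88 enters the right side.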

21 DDDE algorithm
Total: O(DN log N + DN)
Per dimension: sorting O(N log N), sliding window O(N)
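
The overall structure (sort each dimension, run a linear pass, accumulate) can be sketched as below. This is not the authors' implementation. It assumes that the per-dimension estimate is the mean absolute distance to up to k neighbors on each side in sorted order, and that the per-dimension values are summed into the final estimate. It also uses prefix sums instead of the incremental window update, which is a different technique with the same O(N) cost per dimension. The name ddde_sketch is hypothetical.

```python
from itertools import accumulate

def ddde_sketch(data, k):
    """Sum of per-dimension window estimates; a lower value means denser."""
    n = len(data)
    dims = len(data[0]) if n else 0
    total = [0.0] * n
    for d in range(dims):
        # Sort point indices by their value in dimension d: O(N log N).
        order = sorted(range(n), key=lambda i: data[i][d])
        vals = [data[i][d] for i in order]
        prefix = [0.0] + list(accumulate(vals))   # prefix[j] = sum(vals[:j])
        for pos, point in enumerate(order):       # linear pass: O(N)
            lo = max(pos - k, 0)                  # window is truncated at the ends
            hi = min(pos + k, n - 1)
            n_left, n_right = pos - lo, hi - pos
            left_sum = prefix[pos] - prefix[lo]
            right_sum = prefix[hi + 1] - prefix[pos + 1]
            # Sum of 1-D distances to the neighbors inside the window.
            dist_sum = (n_left * vals[pos] - left_sum) + (right_sum - n_right * vals[pos])
            total[point] += dist_sum / max(n_left + n_right, 1)
    return total

data = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [10.0, 10.0]]
scores = ddde_sketch(data, k=1)
```

In this toy input the isolated point at (10, 10) receives the largest value, and points that sit next to it in sorted order are also inflated.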

22 Potential false detections

23 Experiments

24 Methods compared
Clustering algorithms:
Density-based sorting + k-means
Density peaks [26]
Repeated k-means (as reference point)
Density estimations:
Full search O(N^2)
Using subsample (s=2%) O(sN^2)
Using DDDE O(N log N)

25 Datasets
S1, S2, S3, S4, Unbalance, Birch1, Birch2, DIM32, A1, A2, A3

26 Centroid index (CI)
[Fränti, Rezaei, Zhao, Pattern Recognition 2014]
Example: 15 prototypes (pigeons) and 15 real clusters (pigeon holes); four holes are empty, so CI = 4
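
The pigeonhole picture can be sketched in code: map every prototype to its nearest real cluster centroid and count the centroids that receive no prototype. The sketch below takes the maximum of the counts in both directions, which is how the centroid index is commonly defined; treat that symmetric form, the Euclidean distance, and the function names as assumptions, since the slide only shows the one-directional count.

```python
import math

def orphan_count(source, target):
    """Map each source centroid to its nearest target; count unmapped targets."""
    hit = [False] * len(target)
    for c in source:
        nearest = min(range(len(target)), key=lambda j: math.dist(c, target[j]))
        hit[nearest] = True
    return hit.count(False)

def centroid_index(a, b):
    """Larger of the two orphan counts; 0 means a one-to-one match."""
    return max(orphan_count(a, b), orphan_count(b, a))

truth = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
found = [(0.1, 0.0), (-0.1, 0.0), (10.0, 0.1)]   # two prototypes share one cluster
ci = centroid_index(found, truth)
```

Here the cluster at (0, 10) receives no prototype, so one pigeon hole is empty.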

27 Quality comparison (centroid index)

28 Effect on density peaks: full search vs. DDDE

29 Speed comparison (seconds)

30 Speed vs. quality

31 Time profiling

32 Time profiling

33 Conclusions
Rapid O(DN log N) time algorithm.
Remarkable 160:1 speed-up.
Density estimation is no longer the bottleneck.

