Mean-shift outlier detection


1 Mean-shift outlier detection
Jiawei Yang Susanto Rahardja Pasi Fränti

2 Clustering

3 Clustering with noisy data

4 How to deal with outliers in clustering
Approach 1: Cluster
Approach 2: Outlier removal, Cluster
Approach 3: Outlier removal, Cluster
Mean-shift approach: Mean shift, Cluster

5 Mean-shift

6 K-nearest neighbors
K-neighborhood, Point, k=4

7 Expected result
Mean of neighbors, Point

8 Result with noise point
Mean of neighbors, Noise point, K-neighborhood

9 Part I: Noise removal Fränti and Yang, "Medoid-shift noise removal to improve clustering", Int. Conf. Artificial Intelligence and Soft Computing (CAISC), June 2018.

10 Mean-shift process
Move points to the means of their neighbors
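As a rough illustration of one mean-shift step, the sketch below replaces every point by the mean of its k nearest neighbors. The function name `mean_shift_step`, the toy data, and the choice to leave a point out of its own neighborhood are assumptions made for this example, not details taken from the slides.

```python
import math

def mean_shift_step(points, k):
    """Return a new list where each point is the mean of its k nearest neighbors."""
    shifted = []
    for i, x in enumerate(points):
        # Assumption: a point is not counted as its own neighbor.
        others = [p for j, p in enumerate(points) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        # Average the neighbors coordinate by coordinate.
        shifted.append(tuple(sum(c) / len(neighbors) for c in zip(*neighbors)))
    return shifted

# Four clustered points plus one far-away point standing in for noise.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
shifted = mean_shift_step(points, k=3)
print(shifted[4])
```

In this toy case the far-away point is pulled to roughly (0.67, 0.67), the mean of its three nearest cluster points, while the cluster points themselves move only slightly.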

11 Mean or Medoid? Mean-shift Medoid-shift

12 Medoid-shift algorithm
REPEAT 3 TIMES:
  Calculate kNN(x)
  Calculate medoid M of the neighbors
  Replace point x by the medoid M
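The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `medoid` and `medoid_shift`, the Euclidean distance, the brute-force neighbor search, and leaving a point out of its own neighborhood are all assumptions.

```python
import math

def medoid(group):
    """Member of `group` with the smallest total distance to the other members."""
    return min(group, key=lambda p: sum(math.dist(p, q) for q in group))

def medoid_shift(points, k, iterations=3):
    """Repeat: replace every point by the medoid of its k nearest neighbors."""
    pts = list(points)
    for _ in range(iterations):
        shifted = []
        for i, x in enumerate(pts):
            # Assumption: a point is not counted as its own neighbor.
            others = [p for j, p in enumerate(pts) if j != i]
            neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
            shifted.append(medoid(neighbors))
        pts = shifted  # all points move together, then the next round starts
    return pts

# Four clustered points plus one far-away point standing in for noise.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
cleaned = medoid_shift(points, k=3)
print(cleaned)
```

Because a medoid is always one of the existing points, the output contains only locations that were already present in the data; in this toy case the far-away point is replaced by a cluster member after the first round.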

13 Iterative processes

14 Effect on clustering result
Noisy original: CI=4
Iteration 1: CI=4
Iteration 2: CI=3
Iteration 3: CI=0

15 Experiments

16 Datasets
P. Fränti and S. Sieranoja, "K-means properties on six clustering benchmark datasets", Applied Intelligence, 2018.
k=20 k=35 k=50 N=5000 N=5000 k=15 k=15 N=5000 N=5000 k=15 k=15

17 Noise types S1 S3 S3 S1 Random noise Random noise Data-dependent noise

18 Results for noise type 1
Centroid Index (CI): CI reduces from 5.1 to 1.4
KM: K-means with random initialization
RS: P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, 2018.

19 Results for noise type 2
Centroid Index (CI): CI reduces from 5.0 to 1.1
KM: K-means with random initialization
RS: P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, 2018.

20 Part II: Outlier detection
Yang, Rahardja, Fränti, "Mean-shift outlier detection", Int. Conf. Fuzzy Systems and Data Mining (FSDM), November 2018.

21 Outlier scores
203, 12, D=|x-y|, k=4

22 Mean-shift for Outlier Detection
Algorithm: MOD
  Calculate Mean-shift y for every point x
  Outlier score Di = |yi - xi|
  SD = Standard deviation of all Di
  IF Di > SD THEN xi is OUTLIER
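The MOD steps can be sketched as follows. This is a hedged illustration only: the function names, the brute-force neighbor search, leaving a point out of its own neighborhood, the use of three mean-shift iterations (as in the medoid-shift variant and the "three iterations" slide), and the population form of the standard deviation are assumptions, since the slide only says "standard deviation".

```python
import math
import statistics

def mean_shift(points, k, iterations=3):
    """Repeatedly replace every point by the mean of its k nearest neighbors."""
    pts = [tuple(p) for p in points]
    for _ in range(iterations):
        shifted = []
        for i, x in enumerate(pts):
            # Assumption: a point is not counted as its own neighbor.
            others = [p for j, p in enumerate(pts) if j != i]
            neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
            shifted.append(tuple(sum(c) / len(neighbors) for c in zip(*neighbors)))
        pts = shifted
    return pts

def mod_outliers(points, k, iterations=3):
    """Return (scores, sd, flags) where flags[i] is True if point i is an outlier."""
    shifted = mean_shift(points, k, iterations)
    # Outlier score Di = |yi - xi|: how far the point moved from its original place.
    scores = [math.dist(x, y) for x, y in zip(points, shifted)]
    sd = statistics.pstdev(scores)  # assumption: population standard deviation
    flags = [d > sd for d in scores]
    return scores, sd, flags

# Four clustered points plus one far-away point standing in for an outlier.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
scores, sd, flags = mod_outliers(points, k=3)
print(flags)
```

In this toy data set only the far-away point moves by more than SD, so it is the only one flagged. No threshold is passed in; the cut-off comes from the scores themselves.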

23 Result after three iterations
x, y1, y2, y3, D=|x-y3|

24 Outlier scores
D=|x-y|, SD=52
Scores: 69 203 65 36 13 64 26 44 28 14 40 30 120 187 12 4 46 27 16 58 118 83 95
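As a small worked check of the thresholding rule, the snippet below applies "score > SD" to the numbers shown on this slide. It assumes that the 23 values are the complete set of scores in the figure (which point each score belongs to is not recoverable from the transcript) and that SD is the population standard deviation.

```python
import statistics

# Scores as they appear on the slide (order in the transcript is arbitrary).
scores = [69, 203, 65, 36, 13, 64, 26, 44, 28, 14, 40, 30, 120,
          187, 12, 4, 46, 27, 16, 58, 118, 83, 95]

sd = statistics.pstdev(scores)            # assumption: population SD
outliers = [d for d in scores if d > sd]  # IF Di > SD THEN outlier

print(round(sd), len(outliers))  # prints: 52 10
```

Under these assumptions the computed value rounds to the SD=52 given on the slide, and ten of the listed scores exceed it.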

25 Distribution of the scores
Outliers Normal points

26 Experimental comparisons

27 Results for noise type 1
A priori noise level 7%
Automatically estimated (SD)

28 Results for noise type 2
A priori noise level 7%
Automatically estimated (SD)

29 Conclusions
Why to use: Simple but effective!
How it performs: Outperforms existing methods!
Usefulness: No threshold parameter needed!
98-99%

