Published by Jerome Willis. Modified over 9 years ago.
1
Clustering of Uncertain Data Objects by a Voronoi-Diagram-Based Approach
Speaker: Chan Kai Fong, Paul, Dept. of CS, HKU
2
Presentation Outline
- Introduction: concept of clustering, clustering of uncertain objects
- Example: application of clustering on uncertain data
- UK-means algorithm
- Motivation
- Voronoi-diagram-based (VD) clustering
- MinMax-based (MM) clustering
- VD is strictly better than MinMax
- Clustering algorithms: VDBi, VDBiP, VD-based methods with Cluster Shift
- When are VD-based methods better than MM-based methods?
- Experiments
- Conclusion
3
Introduction
4
Clustering
Group similar data objects together to form clusters.
Partition-based clustering:
- Input: number of clusters (k), number of objects (n)
- Iterative method
- In each iteration, divide the n data objects into k groups to minimize an objective function, e.g., the sum of squared distances
- Stop when the results converge
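As a minimal sketch of the partition-based scheme above, the following Python implements plain K-means (Lloyd's algorithm) on 2D points with Euclidean distance: assign every point to its nearest representative, move each representative to the mean of its members, and stop when no assignment changes. The function name `kmeans`, its parameters and the random initialisation are illustrative choices, not part of the original slides.

```python
import math
import random

def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Lloyd's K-means on 2D points; returns (centers, assignment, sum of squared distances)."""
    rng = random.Random(seed)
    # Initial representatives: caller-supplied, or k distinct input points.
    centers = list(init) if init is not None else rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest representative.
        new_assignment = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                          for p in points]
        if new_assignment == assignment:  # converged: no point changed cluster
            break
        assignment = new_assignment
        # Update step: move each representative to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    # Objective minimised by K-means: sum of squared distances to representatives.
    sse = sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assignment))
    return centers, assignment, sse
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2, init=[(0, 0), (10, 10)])` separates the two groups after one update and stops on the next iteration.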
5
Introduction
To cluster data points in 2D space:
- Data objects: n data points
- Apply any partition-based clustering algorithm (e.g., K-means)
- Distance measure: Euclidean distance, Manhattan distance, etc.
6
Introduction
To cluster uncertain objects in 2D space:
- Uncertain objects: objects with uncertainty (e.g., location uncertainty)
- They have no fixed coordinates in 2D space
- An object's location is modelled by a probability density function (pdf) over an uncertainty region
- We assume the pdf of each object can be obtained
- Uncertainty region (ur): the region in which the object may appear, with a certain probability distribution; the probability that the object appears outside its uncertainty region is zero
- Each object may have an irregular uncertainty region, and its pdf may be arbitrary
[Figure: object o_1, its uncertainty region o_1.ur, and the MBR of o_1.ur]
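One common way to make an arbitrary pdf over an irregular uncertainty region computable is to approximate it by probability masses at sample points; the experiments later in this talk also use sample points (20 x 20 per object). The sketch below assumes that representation. The class `UncertainObject` and its methods are hypothetical names, and the MBR is taken as the bounding box of the sample points.

```python
import math
from dataclasses import dataclass

@dataclass
class UncertainObject:
    """An uncertain object whose pdf is approximated by point masses.

    samples: (x, y) points inside the uncertainty region o.ur
    probs:   probability mass at each sample; mass outside o.ur is zero
    """
    samples: list
    probs: list

    def __post_init__(self):
        if not self.samples or len(self.samples) != len(self.probs):
            raise ValueError("need one probability per sample point")
        if any(p < 0 for p in self.probs) or not math.isclose(sum(self.probs), 1.0):
            raise ValueError("probabilities must be non-negative and sum to 1")

    def mbr(self):
        """Bounding box of the sample points: (xmin, ymin, xmax, ymax)."""
        xs = [x for x, _ in self.samples]
        ys = [y for _, y in self.samples]
        return (min(xs), min(ys), max(xs), max(ys))

    def expected_location(self):
        """Centre of mass of the (discretised) pdf."""
        return (sum(p * x for (x, _), p in zip(self.samples, self.probs)),
                sum(p * y for (_, y), p in zip(self.samples, self.probs)))
```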
7
Expected distance computation
The expected distance (ED) measures the distance between an uncertain object and a cluster representative:
$ED(o_i, p_j) = \int_{o_i.ur} d(x, p_j)\, f_i(x)\, dx$
where d is the Euclidean distance function, x is any point inside o_i's uncertainty region, f_i is the pdf of uncertain object o_i, and p_j is any cluster representative.
ED computations are very expensive: each iteration of K-means requires nk ED computations.
[Figure: uncertain object o_i, cluster representative p_j, and ED(o_i, p_j)]
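Assuming the pdf f_i is approximated by probability masses at a finite set of sample points in o_i.ur, the ED integral becomes a weighted sum of Euclidean distances. The sketch below (hypothetical helper names) computes it and performs one brute-force assignment step, counting ED computations so that the nk cost per iteration is visible.

```python
import math

def expected_distance(samples, probs, rep):
    """Discrete approximation of ED(o_i, p_j): sum over samples x of f_i(x) * d(x, p_j)."""
    return sum(p * math.dist(x, rep) for x, p in zip(samples, probs))

def assign_all_by_ed(objects, reps):
    """Brute-force assignment step: every object is compared with every representative.

    objects: list of (samples, probs) pairs; returns (labels, number of ED computations).
    """
    ed_calls = 0
    labels = []
    for samples, probs in objects:
        eds = []
        for rep in reps:
            eds.append(expected_distance(samples, probs, rep))
            ed_calls += 1
        labels.append(min(range(len(reps)), key=eds.__getitem__))
    return labels, ed_calls  # ed_calls == n * k
```

Note that ED is not the distance from the object's mean: for two equally likely samples at (0, 0) and (2, 0), the ED to (1, 0) is 1.0 even though the mean lies exactly at (1, 0).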
8
Application: clustering vehicles
- Objective: obtain traffic patterns by clustering the vehicles in a city
- Data objects: vehicles on a 2D map
- Uncertainty: location uncertainty of the vehicles; each pdf, defined over the object's uncertainty region, represents the probability distribution of a vehicle's possible location over a certain period of time
9
[Figure: uncertainty region of vehicle o_i]
The degree of uncertainty is affected by the following factors:
1. Time
2. Traffic on the roads
3. Shape of the roads
4. Speed of the vehicles
10
Results
[Figure: clustering results for the vehicle example]
11
UK-means
- UK-means: the first extension of the K-means algorithm to handle uncertain objects
- Distance measure: expected distance (ED)
- Disadvantage: slow and inefficient
- Shows that K-means can be used to cluster uncertain objects
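A compact sketch of UK-means under two assumptions not stated on the slides: each object is a list of sample points with probabilities, and each representative is recomputed as the mean of its member objects' expected locations. `uk_means` is a hypothetical name. Its brute-force assignment step, with k ED computations per object, is the cost that the MinMax-based and VD-based pruning methods aim to avoid.

```python
import math

def expected_distance(samples, probs, rep):
    # Discrete ED: sum over samples x of f(x) * d(x, rep)
    return sum(p * math.dist(x, rep) for x, p in zip(samples, probs))

def uk_means(objects, init_reps, max_iter=100):
    """K-means with ED as the distance measure.

    objects: list of (samples, probs) pairs, one per uncertain object.
    Assumption: a representative becomes the mean of its members' expected locations.
    """
    reps = list(init_reps)
    k = len(reps)
    # Expected location (centre of mass) of each object, computed once.
    means = [(sum(p * x for (x, _), p in zip(s, f)),
              sum(p * y for (_, y), p in zip(s, f))) for s, f in objects]
    labels = None
    for _ in range(max_iter):
        # k ED computations per object, i.e. n * k per iteration.
        new_labels = [min(range(k), key=lambda j: expected_distance(s, f, reps[j]))
                      for s, f in objects]
        if new_labels == labels:  # no object changed cluster
            break
        labels = new_labels
        for j in range(k):
            members = [m for m, lab in zip(means, labels) if lab == j]
            if members:
                reps[j] = (sum(x for x, _ in members) / len(members),
                           sum(y for _, y in members) / len(members))
    return reps, labels
```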
12
Two approaches to solving the clustering problem with UK-means
1. MinMax-based approach (Jacky)
2. Voronoi-diagram-based approach (Paul)
13
Motivation
14
Two approaches to solving the clustering problem with UK-means
1. MinMax-based approach (Jacky)
   - Basic MinMax distance pruning (MinMax)
   - MinMax with pre-computation of ED
   - MinMax with Cluster Shift (MinMax-Shift)
2. Voronoi-diagram-based approach (Paul)
   - Voronoi diagram with Bisector Pruning (VDBi)
   - Voronoi diagram with Bisector Pruning and Partial expected distance computations (VDBiP)
   - Voronoi diagram with Bisector Pruning and Cluster Shift (VDBi-Shift)
   - Voronoi diagram with Bisector Pruning, Partial expected distance computations and Cluster Shift (VDBiP-Shift)
15
MinMax-based approach
UK-means with MinMax distance pruning
- Objective: avoid expected distance computations by using the mindist and maxdist between the object's MBR and the cluster representatives as bounds on ED(c_j, o_i) and ED(c_m, o_i)
- E.g., given an object o_i and cluster representatives c_j and c_m, if mindist(c_j, o_i) > maxdist(c_m, o_i), then ED(c_j, o_i) ≥ mindist(c_j, o_i) > maxdist(c_m, o_i) ≥ ED(c_m, o_i), so c_j can be pruned and ED(c_j, o_i) need not be calculated
[Figure: object o_i with representatives c_j and c_m, showing mindist(c_j, o_i) and maxdist(c_m, o_i)]
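The pruning rule can be sketched as follows, assuming rectangles are given as (x1, y1, x2, y2) tuples and distances are Euclidean. Because mindist(c, o) ≤ ED(c, o) ≤ maxdist(c, o) for any pdf inside the MBR, a representative whose mindist exceeds the smallest maxdist over all representatives cannot be the nearest one. Function names are illustrative.

```python
import math

def mindist(rect, p):
    """Smallest distance from point p to any point of rect = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    dx = max(x1 - p[0], 0.0, p[0] - x2)
    dy = max(y1 - p[1], 0.0, p[1] - y2)
    return math.hypot(dx, dy)

def maxdist(rect, p):
    """Largest distance from point p to any point of rect (attained at a corner)."""
    x1, y1, x2, y2 = rect
    dx = max(abs(p[0] - x1), abs(p[0] - x2))
    dy = max(abs(p[1] - y1), abs(p[1] - y2))
    return math.hypot(dx, dy)

def minmax_prune(rect, reps):
    """Indices of representatives that survive MinMax pruning for one object's MBR."""
    best_upper = min(maxdist(rect, c) for c in reps)
    # c_j is pruned when mindist(c_j) > maxdist(c_m) for the best c_m.
    return [j for j, c in enumerate(reps) if mindist(rect, c) <= best_upper]
```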
16
MinMax-based approach
- The upper and lower bounds can be made tighter by the Cluster Shift (CS) and ED pre-computation (PC) methods
- These replace the loose mindist and maxdist estimates with tighter distance bounds
- For details, refer to Jacky's work
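The slides defer Cluster Shift details to Jacky's work. One bound that a shift-based method can use, and which holds for any pdf, follows from the triangle inequality: if ED(o, c) was computed in the previous iteration and c has since moved by a distance δ, the new ED differs from the old one by at most δ. The sketch below assumes this form with hypothetical function names; it is not presented as the exact MinMax-Shift formulation.

```python
import math

def shifted_bounds(prev_ed, old_rep, new_rep):
    """Bounds on ED(o, new_rep) from an ED previously computed against old_rep.

    For every x, |d(x, new) - d(x, old)| <= d(old, new) (triangle inequality);
    integrating against the pdf gives |ED(o, new) - ED(o, old)| <= d(old, new).
    """
    shift = math.dist(old_rep, new_rep)
    return max(prev_ed - shift, 0.0), prev_ed + shift

def tightest_bounds(rect_mindist, rect_maxdist, prev_ed, old_rep, new_rep):
    """Keep whichever of the MBR-based and shift-based bounds is tighter."""
    lo, hi = shifted_bounds(prev_ed, old_rep, new_rep)
    return max(lo, rect_mindist), min(hi, rect_maxdist)
```

When representatives move only slightly between iterations, the shift-based interval is narrow, which is why it can prune far more than mindist/maxdist for large objects.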
17
Voronoi-diagram-based approach
- Each object's uncertainty region is bounded by its minimum bounding rectangle (MBR)
- The objects' MBRs are indexed by an R-tree
- A Voronoi diagram is constructed for the cluster representatives in each iteration
[Figure: Voronoi diagram for 5 cluster representatives; uncertain object o_1 indexed by the R-tree]
18
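Voronoi-diagram-based approach
[Figure: object o_1, representatives p_1 and p_2, and the bisector of p_1 and p_2]
If the bisector of two cluster representatives p_1 and p_2 does not cut an object's MBR, and the MBR lies on p_2's side of the bisector, then every point of the MBR is closer to p_2 than to p_1, so ED(p_1, o_1) > ED(p_2, o_1).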
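The bisector test reduces to a corner check: the set of points strictly closer to p_1 than to p_2 is an open half-plane, which is convex, so a rectangle lies inside it exactly when its four corners do. A minimal sketch with a hypothetical function name, comparing squared distances to avoid square roots:

```python
def side_of_bisector(rect, p1, p2):
    """Which side of the perpendicular bisector of p1 and p2 rect = (x1, y1, x2, y2) lies on.

    Returns 1 if every point of rect is strictly closer to p1, 2 if every point
    is strictly closer to p2, and 0 if the bisector cuts (or touches) the rect.
    """
    x1, y1, x2, y2 = rect
    corners = [(x1, y1), (x1, y2), (x2, y1), (x2, y2)]
    # Negative difference: corner closer to p1; positive: closer to p2.
    diffs = [((cx - p1[0]) ** 2 + (cy - p1[1]) ** 2)
             - ((cx - p2[0]) ** 2 + (cy - p2[1]) ** 2) for cx, cy in corners]
    if all(d < 0 for d in diffs):
        return 1
    if all(d > 0 for d in diffs):
        return 2
    return 0
```

A result of 2 means ED(o, p_2) < ED(o, p_1) for any pdf inside the MBR, so p_1 can be discarded without computing either ED.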
19
Voronoi-diagram-based approach (cluster assignment)
[Figure: object o_1 and representatives p_1, p_2, p_3]
ED(o_1, p_2) < ED(o_1, p_1) and ED(o_1, p_2) < ED(o_1, p_3), so o_1 is assigned to cluster p_2.
20
Voronoi-diagram-based approach
In each iteration, for each Voronoi cell (approximated by an MBR):
- Issue a range query to the objects' R-tree to retrieve the candidate objects for the cluster
- If a candidate's MBR is completely enclosed in the Voronoi cell, assign the object to the cluster
- If a candidate's MBR intersects more than one Voronoi cell, special handling is required to prune away the unqualified clusters for that object
[Figure: candidate objects for a cluster; an object enclosed entirely in a Voronoi cell; an object that intersects more than one Voronoi cell]
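A sketch of this classification step, with two substitutions: the R-tree range query is replaced by a linear scan over all MBRs, and the Voronoi cells are never built explicitly. Instead, an MBR is tested against the cell of c_j viewed as the intersection of the half-planes closer to c_j than to each other representative, which is exact rather than an MBR approximation of the cell. Function names are hypothetical.

```python
def rect_in_cell(rect, reps, j):
    """True if rect = (x1, y1, x2, y2) lies entirely inside the Voronoi cell of reps[j].

    The cell is an intersection of convex half-planes, so it suffices that all
    four corners are strictly closer to reps[j] than to every other representative.
    """
    x1, y1, x2, y2 = rect
    corners = [(x1, y1), (x1, y2), (x2, y1), (x2, y2)]

    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    return all(d2(c, reps[j]) < d2(c, reps[m])
               for c in corners for m in range(len(reps)) if m != j)

def voronoi_assign(mbrs, reps):
    """Split objects into directly assigned ones and ones needing special handling.

    Stand-in for the per-cell R-tree range query: a linear scan over all MBRs.
    """
    assigned, straddling = {}, []
    for i, rect in enumerate(mbrs):
        owner = next((j for j in range(len(reps)) if rect_in_cell(rect, reps, j)), None)
        if owner is None:
            straddling.append(i)  # MBR overlaps two or more Voronoi cells
        else:
            assigned[i] = owner   # no ED computation and no pdf access needed
    return assigned, straddling
```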
21
Advantages of Voronoi-diagram-based clustering
Avoiding expected distance computations:
1. If an object is completely enclosed in a Voronoi cell, then the object must belong to that cluster
2. In the best case, no expensive expected distance calculations are needed at all, and the objects' pdfs need not be retrieved during clustering
22
- The Voronoi diagram construction cost is independent of the number of objects
- Computing the 2D Voronoi diagram takes only O(k log k) time per iteration, where k is the number of clusters and does not depend on the number of objects
- n is much larger than k
23
Difficulties of Voronoi-based clustering
1. Handling uncertain objects that intersect more than one Voronoi cell: we cannot determine the nearest cluster just by looking at the Voronoi diagram
[Figure: object o_1 and representatives c_1, c_2, c_3]
24
Is VD better than basic MinMax?
Theorem: VD is strictly better than basic MinMax.
Given an object o_i that is assigned to cluster c_1, in any iteration of UK-means, if VD calculates ED(o_i, c_p) for some c_p, then MM must calculate ED(o_i, c_p) as well. Conversely, there are cases where VD does not calculate ED(o_i, c_p) but MM must.
25
In some situations, VD-based methods are better
- VD-based methods are always better than basic MinMax, but they may not beat MinMax-Shift
- In some situations, VD-based methods outperform all MM-based methods: when object uncertainty is very small, VD-based methods are preferred
26
Clustering algorithms
27
Clustering methods
Voronoi-diagram-based approach:
1. Voronoi diagram with bisector pruning (VDBi)
2. Voronoi diagram with bisector pruning and partial expected distance computation (VDBiP)
28
MinMax-based methods
For each object:
- Find the upper and lower bounds of its ED values
  - If the Cluster Shift (CS) method is not enabled, the upper and lower bounds are estimated by maxdist and mindist respectively (MinMax)
  - If CS is enabled, the upper and lower bounds become tighter (MinMax-Shift)
- Prune unwanted clusters using the upper and lower bounds
- For all unpruned clusters, compute the ED values to determine the object's cluster assignment
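The per-object procedure can be sketched with the bound source passed in as a function, so that MinMax and MinMax-Shift differ only in the bounds they supply. The discrete ED and the `bounds` callback interface are assumptions for illustration, not the authors' code.

```python
import math

def expected_distance(samples, probs, rep):
    # Discrete ED: sum over samples x of f(x) * d(x, rep)
    return sum(p * math.dist(x, rep) for x, p in zip(samples, probs))

def assign_with_bounds(samples, probs, reps, bounds):
    """MinMax-style assignment of one object.

    bounds(j) -> (lower, upper) for ED(o, reps[j]); plain MinMax would return
    (mindist, maxdist) of the MBR, MinMax-Shift tighter values.
    Returns (chosen cluster index, number of ED computations performed).
    """
    lu = [bounds(j) for j in range(len(reps))]
    best_upper = min(u for _, u in lu)
    # Prune every cluster whose lower bound exceeds the smallest upper bound.
    survivors = [j for j, (lo, _) in enumerate(lu) if lo <= best_upper]
    if len(survivors) == 1:
        return survivors[0], 0  # decided by the bounds alone
    eds = {j: expected_distance(samples, probs, reps[j]) for j in survivors}
    return min(eds, key=eds.get), len(eds)
```

The cluster attaining the smallest upper bound always survives, so when only one survivor remains it is the nearest cluster and no ED is needed.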
29
Voronoi-diagram-based methods
- Before each iteration, a Voronoi diagram is constructed for all cluster representatives
- For each cluster representative, find the objects that are completely enclosed in the cluster's Voronoi cell
- For the remaining objects, apply bisector pruning to prune unrelated clusters
30
Voronoi diagram with Bisector Pruning (VDBi)
[Figure: object o_1 and representatives c_1, c_2, c_3]
Comparing c_1 and c_3: o_1 falls on c_1's side of bisector(c_1, c_3), so c_3 can be pruned.
Since the bisector of c_1 and c_2 cuts o_1's MBR, o_1 may be assigned to either c_1 or c_2, so ED(o_1, c_1) and ED(o_1, c_2) must still be computed.
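Bisector pruning for an object that straddles several cells can be sketched as a pairwise corner test over all representative pairs. Pruning is safe because a cluster is removed only when some other cluster is strictly closer at every point of the MBR, so the true nearest cluster always survives. The function name is hypothetical.

```python
def bisector_prune(rect, reps):
    """Indices of clusters that survive bisector pruning for one MBR (x1, y1, x2, y2).

    For each ordered pair (c_a, c_b): if all four corners, and hence the whole
    MBR, lie strictly on c_a's side of bisector(c_a, c_b), then
    ED(o, c_a) < ED(o, c_b) and c_b is pruned.
    """
    x1, y1, x2, y2 = rect
    corners = [(x1, y1), (x1, y2), (x2, y1), (x2, y2)]

    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    k = len(reps)
    pruned = set()
    for a in range(k):
        for b in range(k):
            if a != b and all(d2(c, reps[a]) < d2(c, reps[b]) for c in corners):
                pruned.add(b)
    return [j for j in range(k) if j not in pruned]
```

For example, with representatives (0, 0), (4, 0), (0, 10) and an MBR (1, 0, 3, 1), the bisector x = 2 cuts the MBR, so the first two clusters survive while the third is pruned, mirroring the c_1, c_2, c_3 situation on the slide.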
31
Voronoi diagram with bisector pruning and partial expected distance computation (VDBiP)
Cut the object's MBR into two equal halves, (a) and (b).
[Figure: o_1's MBR split into halves (a) and (b)]
32
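VDBiP
If the MBR of o_1(b) is completely enclosed in the Voronoi cell of c_2, then ED(o_1(b), c_2) < ED(o_1(b), c_1) without computing either value; only ED(o_1(a), c_1) and ED(o_1(a), c_2) are computed.
If ED(o_1(a), c_2) < ED(o_1(a), c_1), then
ED(o_1(a), c_2) + ED(o_1(b), c_2) < ED(o_1(a), c_1) + ED(o_1(b), c_1), i.e., ED(o_1, c_2) < ED(o_1, c_1) => prune c_1.
Here ED(o_1(a), c) denotes the part of the ED integral taken over half (a), so the two partial values add up to ED(o_1, c).
[Figure: c_1, c_2, and o_1 split into (a) and (b), with ED(o_1(a), c_1) and ED(o_1(a), c_2)]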
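A sketch of the VDBiP test for one pair (c_1, c_2), assuming a sample-point pdf and an MBR given as (x1, y1, x2, y2). It uses the weaker condition that half (b) lies on c_2's side of bisector(c_1, c_2), which is implied by (b) being enclosed in c_2's Voronoi cell. Samples on the shared edge are counted in (b) only, so the two partial sums add up to the full ED. All names are illustrative.

```python
import math

def halves(rect):
    """Cut an MBR (x1, y1, x2, y2) into two equal halves across its longer side."""
    x1, y1, x2, y2 = rect
    if x2 - x1 >= y2 - y1:
        xm = (x1 + x2) / 2
        return (x1, y1, xm, y2), (xm, y1, x2, y2)
    ym = (y1 + y2) / 2
    return (x1, y1, x2, ym), (x1, ym, x2, y2)

def inside(pt, rect):
    """Closed point-in-rectangle test."""
    return rect[0] <= pt[0] <= rect[2] and rect[1] <= pt[1] <= rect[3]

def closer_everywhere(rect, p, q):
    """True if every point of rect is strictly closer to p than to q (corner test)."""
    x1, y1, x2, y2 = rect
    return all(math.dist(c, p) < math.dist(c, q)
               for c in [(x1, y1), (x1, y2), (x2, y1), (x2, y2)])

def vdbip_can_prune(samples, probs, rect, c1, c2):
    """Can c1 be pruned in favour of c2 by computing partial ED over one half only?"""
    for b_half in halves(rect):
        if closer_everywhere(b_half, c2, c1):
            # o(a): samples outside half (b); each sample is counted exactly once.
            part = [(x, f) for x, f in zip(samples, probs) if not inside(x, b_half)]
            ed_a_c1 = sum(f * math.dist(x, c1) for x, f in part)
            ed_a_c2 = sum(f * math.dist(x, c2) for x, f in part)
            return ed_a_c2 < ed_a_c1
    return False  # no half is enclosed: fall back to full ED computations
```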
33
Experiments
34
Measures
- Efficiency (number of expected distance computations required)
Comparison of:
- Basic MinMax distance pruning (MinMax)
- Voronoi diagram with Bisector Pruning (VDBi)
- Voronoi diagram with Bisector Pruning and Partial expected distance computation (VDBiP)
- MM-based with Cluster Shift (MinMax-Shift)
- VD-based with Cluster Shift (VDBi-Shift, VDBiP-Shift)
35
Experimental settings
- Data set: randomly generated synthetic data set
- Probability density function: random
- Domain: 100 x 100 2D space
- Number of objects: 10000
- Number of clusters: varied
- Maximum length of an MBR's side: 10%, 1%, 0.1%
- Number of sample points: 20 x 20
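To set up a comparable experiment, the following hypothetical generator follows the listed settings: a 100 x 100 domain, MBR sides up to a given fraction of the domain, and a random pdf on a 20 x 20 grid of sample points. How the original data set was generated beyond these settings is not stated, so the uniform placement and uniformly random weights here are assumptions.

```python
import random

def make_uncertain_objects(n, max_side_frac, domain=100.0, grid=20, seed=0):
    """Return n tuples (mbr, samples, probs) of synthetic uncertain objects.

    Each MBR lies inside [0, domain]^2 with sides of at most max_side_frac * domain;
    the pdf is a random discrete distribution on a grid x grid lattice in the MBR.
    """
    rng = random.Random(seed)
    max_side = max_side_frac * domain
    objects = []
    for _ in range(n):
        w, h = rng.uniform(0, max_side), rng.uniform(0, max_side)
        x1, y1 = rng.uniform(0, domain - w), rng.uniform(0, domain - h)
        # Cell-centred sample points covering the MBR.
        samples = [(x1 + w * (i + 0.5) / grid, y1 + h * (j + 0.5) / grid)
                   for i in range(grid) for j in range(grid)]
        weights = [rng.random() for _ in samples]
        total = sum(weights)
        probs = [v / total for v in weights]  # normalise so the masses sum to 1
        objects.append(((x1, y1, x1 + w, y1 + h), samples, probs))
    return objects
```

For example, `make_uncertain_objects(10000, 0.01)` would produce objects matching the 1% setting.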
36
Degree of uncertainty is large (MBR width = 10%)
1. VDBi performs only slightly better than basic MinMax
2. The Cluster Shift method greatly improves the performance of basic MinMax and VDBi
37
Degree of uncertainty is small (MBR width = 1%)
1. The Cluster Shift method cannot greatly improve the performance of MinMax
2. The VD-based approach outperforms the MM-based approach
3. The VD-based approach is still better than the MM-based approach, but only slightly better when there are fewer clusters
38
Degree of uncertainty is very small (MBR width = 0.1%)
39
Performance analysis
Algorithm: description
- MinMax: the worst one
- MinMax-Shift: good when objects are large
- VDBi: good when objects are small
- VDBi-Shift: good in all cases; outperforms the MinMax-based methods
- VDBiP: better than VDBi; performs well when the MBR width is small
- VDBiP-Shift: further improvement over VDBiP
40
Performance analysis
- Basic MinMax performs poorly because maxdist and mindist give loose upper and lower bound estimates.
- When the degree of uncertainty of objects is small, MinMax with Cluster Shift (improved distance bounds) cannot greatly tighten the distance bounds, since mindist and maxdist are already accurate enough; MinMax-Shift's performance is then similar to that of basic MinMax.
- Because objects are smaller, fewer objects intersect multiple Voronoi cells; we have also proved that VD is better than basic MinMax.
- VD is good for small objects, and a hybrid of Cluster Shift (PC) and VD performs well in all cases.
[Figure: for a large object o_1, maxdist(o_1, c_j) is a very loose upper bound, so Cluster Shift can improve it a lot; for a small object o_2, maxdist(o_2, c_j) is not a loose upper bound, so Cluster Shift cannot improve it much]
41
Conclusion
- Clustering of uncertain data
- Voronoi-diagram-based approach and MinMax-based approach
- VDBi is strictly better than basic MinMax
- The Voronoi-diagram-based approach beats the MinMax-based approach when object uncertainty is small
- The hybrid approach is good in all cases
42
Thank you Questions?