1
A Framework for Projected Clustering of High Dimensional Data Streams
Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004
2
Motivation and Underlying Concepts
- Not all dimensions should be considered when clustering in a high-dimensional setting.
- The fading cluster structure: a fading function f(t) weights each point by its age.
- The half-life t0 of a point is defined as the time at which f(t0) = (1/2) f(0).
- A fading cluster structure at time t is defined for a set of d-dimensional points.
- The fading cluster structure has two properties: additivity and temporal multiplicity.
- The clustering process requires simultaneous maintenance of the clusters and of the set of dimensions associated with each cluster.
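The half-life definition and the two properties can be illustrated with a small sketch. It assumes an exponential fading function f(t) = 2^(-lam * t), which the slide does not state, so that the half-life is 1/lam. The class name `FadingCluster` and its fields (`fc1`, `fc2`, `w`) are hypothetical names chosen for this illustration, not taken from the slides.

```python
def fading(t, lam):
    """Assumed fading function f(t) = 2^(-lam * t); f(1/lam) = f(0) / 2."""
    return 2.0 ** (-lam * t)


class FadingCluster:
    """Hypothetical holder for the faded per-dimension statistics of one cluster."""

    def __init__(self, d, lam):
        self.lam = lam
        self.fc1 = [0.0] * d   # faded sum of values per dimension
        self.fc2 = [0.0] * d   # faded sum of squared values per dimension
        self.w = 0.0           # faded number of points
        self.t = 0.0           # time of the last update

    def decay_to(self, now):
        # Temporal multiplicity: if no point arrives in an interval of
        # length dt, every statistic is simply multiplied by f(dt).
        factor = fading(now - self.t, self.lam)
        self.fc1 = [v * factor for v in self.fc1]
        self.fc2 = [v * factor for v in self.fc2]
        self.w *= factor
        self.t = now

    def add(self, x, now):
        # Fade the old statistics, then add the new point with weight 1.
        self.decay_to(now)
        for j, v in enumerate(x):
            self.fc1[j] += v
            self.fc2[j] += v * v
        self.w += 1.0

    def merge(self, other, now):
        # Additivity: statistics of two clusters brought to the same
        # time instant can be summed component by component.
        self.decay_to(now)
        other.decay_to(now)
        self.fc1 = [a + b for a, b in zip(self.fc1, other.fc1)]
        self.fc2 = [a + b for a, b in zip(self.fc2, other.fc2)]
        self.w += other.w


# Usage: with lam = 0.5 the half-life is 2 time units.
c = FadingCluster(d=2, lam=0.5)
c.add([1.0, 2.0], now=0.0)
c.add([3.0, 4.0], now=2.0)   # the first point now counts with weight 0.5
```

Because only running sums are stored, the cluster statistics can be kept up to date in constant time per arriving point, regardless of how many points the cluster has absorbed.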
3
HPStream: High-Dimensional Projected Stream Clustering Method
4
HPStream Algorithm: Brief Explanation
- Set the parameters.
- Normalization process.
- Initial clustering using k-means on the first InitNumber points.
- ComputeDimensions: this procedure chooses the dimensions of each cluster so that the spread along the chosen dimensions is as small as possible.
- FindProjectedDist: this procedure is used to determine the cluster closest to the incoming data point.
- FindLimitingRadius: this procedure determines the limiting radius of a cluster.
- Finally, decide which cluster the point is added to, or which cluster is deleted.
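The three named procedures can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each cluster is represented as a hypothetical tuple `(fc1, fc2, w)` of per-dimension sums, sums of squares, and point count; the distance is assumed to be an average Manhattan distance over the chosen dimensions; and the limiting radius is assumed to be the spread radius factor `tau` times the root-mean-square spread over those dimensions. The helper names `stats` and `radii_sq` are introduced only for this example.

```python
import math


def stats(points):
    """Hypothetical helper: build (fc1, fc2, w) from a list of points."""
    d = len(points[0])
    fc1 = [sum(p[j] for p in points) for j in range(d)]
    fc2 = [sum(p[j] ** 2 for p in points) for j in range(d)]
    return fc1, fc2, float(len(points))


def radii_sq(cluster):
    """Per-dimension variance of a cluster, derived from its sums."""
    fc1, fc2, w = cluster
    return [max(s2 / w - (s1 / w) ** 2, 0.0) for s1, s2 in zip(fc1, fc2)]


def compute_dimensions(clusters, l):
    """Pick len(clusters) * l (cluster, dimension) pairs with the least spread."""
    pairs = [(r, i, j)
             for i, c in enumerate(clusters)
             for j, r in enumerate(radii_sq(c))]
    pairs.sort()
    chosen = [set() for _ in clusters]
    for _, i, j in pairs[: len(clusters) * l]:
        chosen[i].add(j)
    return chosen


def find_projected_dist(x, cluster, dims):
    """Assumed form: average Manhattan distance to the centroid over dims."""
    if not dims:
        return math.inf
    fc1, _, w = cluster
    return sum(abs(x[j] - fc1[j] / w) for j in dims) / len(dims)


def find_limiting_radius(cluster, dims, tau):
    """Assumed form: tau times the RMS spread over the chosen dimensions."""
    if not dims:
        return 0.0
    r2 = radii_sq(cluster)
    return tau * math.sqrt(sum(r2[j] for j in dims) / len(dims))


# Usage: two small clusters in two dimensions, one projected dimension each.
A = stats([[0.0, 0.0], [0.0, 4.0]])   # tight along dimension 0
B = stats([[1.0, 5.0], [3.0, 5.0]])   # tight along dimension 1
dims = compute_dimensions([A, B], l=1)
x = [0.5, 9.0]
dists = [find_projected_dist(x, c, d) for c, d in zip([A, B], dims)]
closest = dists.index(min(dists))
# The point joins the closest cluster only if it lies within the limiting radius.
joins = dists[closest] <= find_limiting_radius([A, B][closest], dims[closest], tau=2.0)
```

In this sketch a point that falls outside the limiting radius of its closest cluster would start a new cluster of its own, which corresponds to the final add-or-delete decision in the list above.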
8
Experimental Setup
HPStream is compared with CluStream; both are implemented in Microsoft Visual C++.
Data: one synthetic data set and two real-world data sets (Network Intrusion and Forest Cover Type).
Comparison criteria for judging the two algorithms:
- accuracy: clustering quality
- efficiency: stream processing rate
- sensitivity: varying decay rate, l, and radius threshold
- scalability: varying number of dimensions and clusters
Parameters are initialized as follows: decay rate = 0.5, spread radius factor = 2, InitNumber = 2000, average projected dimensionality l > d/2.
9
Comparing Accuracy: using clustering quality and cluster purity
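Cluster purity can be computed in a few lines. The sketch below assumes the common definition, which the slide does not spell out: for each cluster, take the fraction of its points that carry the dominant class label, then average over clusters. The function name `cluster_purity` and its arguments are hypothetical.

```python
from collections import Counter


def cluster_purity(assignments, labels):
    """Average, over clusters, of the share of the dominant class label."""
    by_cluster = {}
    for c, y in zip(assignments, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    fractions = [max(cnt.values()) / sum(cnt.values())
                 for cnt in by_cluster.values()]
    return sum(fractions) / len(fractions)


# Usage: cluster 0 is 2/3 pure, cluster 1 is fully pure.
purity = cluster_purity([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"])
```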
10
Accuracy comparison continued:
12
Efficiency comparison using Stream Processing Rate:
13
Sensitivity: varying l
14
Sensitivity: Varying radius threshold and decay rate
15
Scalability: varying dimensionality and number of clusters