Published by Lillian Franklin. Modified over 9 years ago.
Extensions of vector quantization for incremental clustering
  Edwin Lughofer
  Pattern Recognition (PR), Vol. 41, 2008, pp. 995–1011
  Presenter: Wei-Shen Tai, 2011/1/19
  Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology)
Outline
  Introduction
  Vector quantization
  Extensions of vector quantization
  Evaluation
  Conclusion and outlook
  Comments
Motivation
  Incremental clustering processes
    Online measurements are often recorded continuously, producing data streams in various applications.
    Online processing must guarantee that query results are up to date and can be delivered with a small time delay.
Objective
  An incremental and evolving vector quantization
    Processes data streams in an online clustering scheme.
    Needs no pre-defined number of clusters and improves the quality of the cluster partition with several strategies.
Vector quantization
  1. Choose initial values for the C cluster centers.
  2. Fetch the next data sample from the data set.
  3. Calculate the distance of the selected data point to all cluster centers.
  4. Select the cluster center that is closest to the data point (the winning cluster).
  5. Update the p components of the winning cluster center by moving it towards the selected point.
  6. If the data set contains data points that have not yet been processed through steps 2–5, go to step 2.
  7. If any cluster center moved significantly in the last iteration, say by more than a given threshold, reset the pointer to the beginning of the data buffer and go to step 2; otherwise stop.
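The seven steps above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function name, the fixed learning rate `eta`, the threshold `tol`, the epoch cap `max_epochs`, and the choice of random samples as initial centers are all assumptions made for the example.

```python
import numpy as np

def vector_quantization(data, n_clusters, eta=0.1, tol=1e-3, max_epochs=100, seed=0):
    """Batch VQ sketch; eta, tol, max_epochs and the initialization are assumed."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Step 1: initial centers (assumption: randomly chosen distinct samples).
    centers = data[rng.choice(len(data), size=n_clusters, replace=False)].copy()
    for _ in range(max_epochs):
        before = centers.copy()
        for x in data:                                   # steps 2 and 6: one pass over the data
            dists = np.linalg.norm(centers - x, axis=1)  # step 3: distance to every center
            win = int(np.argmin(dists))                  # step 4: winning (closest) center
            centers[win] += eta * (x - centers[win])     # step 5: move winner towards x
        # Step 7: repeat the pass only if some center moved by more than tol.
        if np.max(np.linalg.norm(centers - before, axis=1)) <= tol:
            break
    return centers
```

Because the learning rate is constant in this sketch, the centers settle only approximately; `max_epochs` guards against endless passes on noisy data.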
Vector quantization in incremental mode
  Stability/plasticity dilemma in ART-2
    A vigilance parameter ρ controls the tradeoff between adapting already learned clusters (stability) and generating new clusters (plasticity).
  Differences between VQ and VQ-INC
    The starting number of clusters is zero.
    If the distance between the incoming input x and the closest cluster center c_win is larger than ρ and x is not faulty, a new cluster is created. Otherwise, c_win is updated to move towards x.
    The ranges of all p variables are updated if x is not faulty.
    In addition, the learning gain η decreases monotonically with the number of data points belonging to each cluster.
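A minimal sketch of the incremental variant described above, assuming plain Euclidean distance and a learning gain of the form `eta0 / k_win` (one possible monotonically decreasing schedule, chosen here for illustration). The class name and `eta0` are hypothetical; the fault check and the update of the variable ranges mentioned on the slide are left out.

```python
import numpy as np

class IncrementalVQ:
    """Sketch of VQ-INC: clusters are created on demand, one sample at a time."""

    def __init__(self, rho, eta0=0.5):
        self.rho = rho      # vigilance parameter
        self.eta0 = eta0    # assumed initial learning gain
        self.centers = []   # starts with zero clusters
        self.counts = []    # number of samples assigned to each cluster

    def partial_fit(self, x):
        x = np.asarray(x, dtype=float)
        if self.centers:
            dists = [np.linalg.norm(x - c) for c in self.centers]
            win = int(np.argmin(dists))
            if dists[win] <= self.rho:
                # Stability: adapt the winning center with a decreasing gain.
                self.counts[win] += 1
                eta = self.eta0 / self.counts[win]
                self.centers[win] += eta * (x - self.centers[win])
                return win
        # Plasticity: no cluster is close enough, so x founds a new one.
        self.centers.append(x.copy())
        self.counts.append(1)
        return len(self.centers) - 1
```

Feeding the stream sample by sample, for example `for x in stream: model.partial_fit(x)`, never revisits earlier data, which is what makes the scheme suitable for online use.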
An alternative distance strategy
  Both over-clustering and an incorrect partition of the input space can occur in VQ-INC.
  Instead of the classic Euclidean distance to the cluster center, VQ-INC-EXT uses the ranges of influence of the clusters: the distance is measured to the cluster surface along the direction towards the cluster center.
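One way to picture the surface-based distance is sketched below. It assumes that a cluster's range of influence is an axis-parallel ellipsoid with half-axes `sigma`, and that points inside the ellipsoid get distance zero; both are assumptions of this example, and the function name is hypothetical. The ray from the center through x crosses the surface at `center + (x - center) / r`, where `r` is the normalized radius of x, so the remaining distance is the Euclidean distance scaled by `1 - 1/r`.

```python
import numpy as np

def distance_to_surface(x, center, sigma):
    """Distance from x to an axis-parallel ellipsoid, measured along the ray to its center."""
    x, center, sigma = (np.asarray(a, dtype=float) for a in (x, center, sigma))
    diff = x - center
    euclid = np.linalg.norm(diff)
    if euclid == 0.0:
        return 0.0                            # x coincides with the center
    r = np.sqrt(np.sum((diff / sigma) ** 2))  # r == 1 exactly on the surface
    return max(0.0, euclid * (1.0 - 1.0 / r))
```

With a wide cluster at (0, 0) with half-axes (4, 1) and a small one at (6, 0) with half-axes (0.5, 0.5), the point (3.9, 0) is closer to the small cluster's center in Euclidean terms, yet it lies inside the wide cluster's ellipsoid, so the surface distance assigns it to the wide cluster.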
Satellite deletion
  Cluster satellites
    Undesirable tiny clusters that lie very close to significantly bigger ones.
  Identify outliers and satellites
    If k_i/N < 1%, cluster i is regarded as an outlier cluster.
    If k_i/N < low_mass and c_i lies inside the range of influence of any other cluster, select the closest center c_win.
    Calculate the distance of c_i to the surface of all other clusters.
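The identification rules above could be sketched as follows. The slide does not state the value of low_mass or how the range of influence is represented, so this example assumes `low_mass=0.05` and axis-parallel ellipsoids with half-axes `sigmas`; the function name and the rule for picking the host cluster (smallest normalized radius) are likewise assumptions. The sketch only flags clusters and leaves open what is done with them afterwards.

```python
import numpy as np

def find_outliers_and_satellites(centers, sigmas, counts,
                                 outlier_mass=0.01, low_mass=0.05):
    """Flag outlier clusters and satellites; thresholds and ellipsoid model are assumed."""
    centers = np.asarray(centers, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    counts = np.asarray(counts, dtype=float)
    mass = counts / counts.sum()          # k_i / N for every cluster
    outliers, satellites = [], {}
    for i in range(len(centers)):
        if mass[i] < outlier_mass:        # k_i / N < 1%: outlier cluster
            outliers.append(i)
            continue
        if mass[i] >= low_mass:           # big enough, not a satellite candidate
            continue
        host, host_r = None, np.inf
        for j in range(len(centers)):
            if j == i:
                continue
            # Normalized radius of c_i in cluster j's ellipsoid (<= 1 means inside).
            r = np.sqrt(np.sum(((centers[i] - centers[j]) / sigmas[j]) ** 2))
            if r <= 1.0 and r < host_r:
                host, host_r = j, r
        if host is not None:
            satellites[i] = host          # satellite index -> enclosing cluster index
    return outliers, satellites
```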
A split-and-merge strategy
  Parameter ρ
    Cannot be known in advance, and a bad setting may cause an incorrect cluster structure.
  Non-optimal clustering
    Prevented by merging clusters that have grown together or by splitting big clusters that contain more than one distinct data cloud.
    The quality of the cluster partition is calculated in three phases: before the split, after the split (p results), and after the merge. The best cluster partition is then picked to replace the existing one.
Evaluation
Conclusion and outlook
  A new extended vector quantization (VQ-INC-EXT)
    Can be applied to data streams in fast online applications or to huge databases.
    Provides an incremental learning scheme and incorporates a new distance measure, satellite deletion, and an online split-and-merge strategy.
  Outlook
    The split-and-merge strategy may slow down computation.
    Reacting to drifts or shifts in the data: a drift changes the distribution of the underlying data smoothly over time, whereas a shift triggers an abrupt change of the data characteristics.
Comments
  Advantage
    The proposed method extends VQ to an incremental learning VQ and adds several strategies that improve the quality of the cluster partition at the same time.
    Data streams can be processed effectively by this online learning VQ.
  Drawback
    In Algorithm 3, the vector of the winning cluster is updated by Eq. (1) according to the Manhattan distance between the winning cluster and the input whenever the new distance strategy is applied.
  Application
    Online learning on data streams.