Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology): Adaptive Clustering for Multiple Evolving Streams


1 Adaptive Clustering for Multiple Evolving Streams. Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology). Graduate: Chen, Shao-Pei. Authors: Bi-Ru Dai, Jen-Wei Huang, Mi-Yen Yeh, and Ming-Syan Chen (Fellow, IEEE). Source: IEEE TKDE, 2006.

2 Outline: Motivation; Objective; Methodology; Experimental Results; Conclusion; Appendix.

3 Motivation: Can we design a scheme for modeling both fast and slow evolving patterns adaptively? Can we provide a system to support various clustering requirements at the same time?

4 Objective: To deal with various types of multiple data streams and to support flexible mining requirements.

5 Methodology: Online maintenance phase. Two model types are maintained: wavelet-based and regression-based. At most the 5 latest models are kept in each level of the hierarchy. A level-0 model is generated from 2 arrivals, and every 2 new models accumulated in level L are aggregated into a new model of level L+1.
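The level-by-level bookkeeping described on this slide can be sketched roughly as follows. This is only an illustration of the aggregation schedule (2 arrivals per level-0 model, 2 models per merge, at most 5 models per level); the class name `Hierarchy` and the use of a plain average as the "model" are assumptions made for the sketch, not the wavelet-based or regression-based summaries of the paper.

```python
from collections import deque

MAX_MODELS_PER_LEVEL = 5  # at most 5 latest models per level (from the slide)
ARRIVALS_PER_MODEL = 2    # a level-0 model is generated from 2 arrivals
MODELS_PER_MERGE = 2      # 2 new models in level L form one model in level L+1


class Hierarchy:
    """Hypothetical multi-level summary of one stream.

    Assumption: each "model" is just the average of the data it covers.
    """

    def __init__(self):
        self.levels = []   # levels[L]: the latest models kept in level L
        self.pending = []  # pending[L]: new level-L models not yet aggregated
        self.buffer = []   # raw arrivals waiting to form a level-0 model

    def _add_model(self, level, model):
        # Create the level lazily the first time a model reaches it.
        while len(self.levels) <= level:
            self.levels.append(deque(maxlen=MAX_MODELS_PER_LEVEL))
            self.pending.append([])
        self.levels[level].append(model)  # deque drops the oldest beyond 5
        self.pending[level].append(model)
        if len(self.pending[level]) == MODELS_PER_MERGE:
            merged = sum(self.pending[level]) / MODELS_PER_MERGE
            self.pending[level] = []
            self._add_model(level + 1, merged)

    def append(self, value):
        # Buffer raw arrivals until there are enough for a level-0 model.
        self.buffer.append(value)
        if len(self.buffer) == ARRIVALS_PER_MODEL:
            model = sum(self.buffer) / ARRIVALS_PER_MODEL
            self.buffer = []
            self._add_model(0, model)


h = Hierarchy()
for v in range(1, 17):
    h.append(v)
for level, models in enumerate(h.levels):
    print(level, list(models))
```

Because each level keeps only a bounded number of models and each higher level covers twice the time span of the one below, older data is retained only at coarser resolution.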

6 Methodology: Offline clustering phase. Model types: wavelet-based and regression-based.

7 Methodology: Selecting between a wavelet-based model and a regression-based model.
Criterion 1. If the slope of a regression-based model does not exceed a threshold, start to apply a wavelet-based model.
Criterion 2. If the variation of a wavelet-based model is larger than a threshold, start to employ a regression-based model.
Criterion 3. If the slope of the aggregated model is larger than a threshold, a regression-based model is maintained.
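A minimal sketch of how the three criteria could be wired together is given below. The function names, the threshold parameters, the use of the absolute value of the slope, and the fallback to a wavelet-based model in Criterion 3 are all assumptions for illustration; the transcript does not preserve the exact formulas or the definition of "variation".

```python
def slope(values):
    """Least-squares slope of values against their indices 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def select_model(current, model_slope, variation,
                 slope_threshold, variation_threshold):
    """Return "wavelet" or "regression" for the next model (hypothetical API)."""
    # Criterion 1: a flat regression-based model switches to wavelet-based.
    if current == "regression" and abs(model_slope) <= slope_threshold:
        return "wavelet"
    # Criterion 2: a strongly varying wavelet-based model switches to regression.
    if current == "wavelet" and variation > variation_threshold:
        return "regression"
    return current


def select_aggregated_model(aggregated_slope, slope_threshold):
    # Criterion 3: a steep aggregated model is kept as regression-based.
    # Assumption: otherwise the aggregated model is wavelet-based.
    if abs(aggregated_slope) > slope_threshold:
        return "regression"
    return "wavelet"


trend = slope([1, 3, 5, 7])
print(select_model("regression", trend, variation=0.0,
                   slope_threshold=0.1, variation_threshold=1.0))
```

In the usage line the slope is well above the assumed threshold, so the stream stays with the regression-based model.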

8 Experimental Results: COD (Clustering on Demand) on a real data set. The regression-based COD is better than the wavelet-based COD, and the adaptive COD achieves quality similar to that of the regression-based COD.

9 Conclusion: Advantages: one data scan for online statistics collection; compact multi-resolution approximations. The COD framework performed very efficiently in the data stream environment while producing clustering results of very high quality.

10 Appendix: Sensitivity analyses; adaptivity analyses; scalability. The time complexity of the online maintenance phase is O(n ( m + )). The storage space is O(log m) in the number of points per stream and O(n) in the number of streams, where m is the number of points in each stream and n is the number of streams.
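As a rough check on the O(log m) and O(n) storage claim, the sketch below counts how many models would be stored under the parameters quoted on the methodology slide (2 arrivals per level-0 model, 2 models per merge, at most 5 models per level). The function names are hypothetical, and the count ignores the size of each individual model.

```python
def models_per_stream(m, per_level=5, arrivals=2, fanout=2):
    """Number of models kept for one stream after m points (illustrative)."""
    total = 0
    count = m // arrivals          # level-0 models produced so far
    while count >= 1:
        total += min(count, per_level)  # each level keeps at most per_level
        count //= fanout                # the next level has half as many
    return total


def models_total(n, m):
    """Storage across n streams: linear in n, logarithmic in m."""
    return n * models_per_stream(m)


for m in (16, 1_000, 1_000_000):
    print(m, models_per_stream(m))
```

Since the number of levels grows by one each time m doubles and each level holds a bounded number of models, the per-stream count grows logarithmically in m, and the total scales linearly with the number of streams n.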

