CDS-Tree: An Effective Index for Clustering Arbitrary Shapes in Data Streams
Huanliang Sun, Ge Yu, Yubin Bao, Faxin Zhao, Daling Wang
RIDE-SDMA’05
Advisor: Jia-Ling Koh
Speaker: Tsui-Feng Yen

Introduction
- Partitioning: k-means and k-medians algorithms are not designed to find arbitrarily shaped clusters in data streams.
- Density-based: DBSCAN can find arbitrarily shaped clusters, but it needs to scan the database more than once.
- Cell-based (grid-based): CLIQUE has three problems: high complexity, high memory usage, and poor accuracy under limited memory for changing data streams.

Problem Definition
- Domain: A = {A1, A2, …, Ak}, and S = A1 × A2 × … × Ak is a k-dimensional numerical space. A1, A2, …, Ak are the dimensions (attributes) of S.
- A k-dimensional data stream X = {x1, x2, …, xn} is a set of ordered objects at time point t, where xi = (xi1, xi2, …, xik) and xij, the j-th component of xi, is drawn from domain Aj.

Definition
Sliding window model on data stream X
- B1 is the most recent bucket, and Bu is the oldest.
- The window slides by creating a new bucket and discarding the oldest one.
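
The bucket-based sliding window above can be sketched as follows. This is a minimal illustration only, assuming a fixed maximum of u buckets; the class name `SlidingWindow` and the method `slide` are hypothetical and do not come from the paper.

```python
from collections import deque


class SlidingWindow:
    """Window of at most `u` buckets (assumed fixed); index 0 plays the role of B1."""

    def __init__(self, u):
        self.u = u
        # Leftmost element is B1 (most recent), rightmost is Bu (oldest).
        self.buckets = deque()

    def slide(self, new_bucket):
        """Insert a new bucket as B1 and return the discarded oldest bucket, if any."""
        expired = None
        if len(self.buckets) == self.u:
            expired = self.buckets.pop()  # discard the oldest bucket
        self.buckets.appendleft(list(new_bucket))
        return expired
```

Returning the expired bucket lets a caller subtract its points from any per-cell counts it maintains, which is one plausible way to keep cell statistics in step with the window.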

Definition cont.
Partition P of data stream X
- P is a set of non-overlapping rectangular cells, obtained by partitioning every dimension of X into intervals of equal length.
- Each cell C is the intersection of one interval from each dimension and is represented in the form {c1, c2, …, ck}.
- A cell can also be denoted by (cNO1, cNO2, …, cNOk), called the coordinate of the cell, where cNOi is the interval number of the cell on the i-th dimension.
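
As a rough sketch of how a point could be mapped to its cell coordinate under an equal-length partition: the function name `cell_coordinate`, the explicit per-dimension bounds, and the 0-based interval numbering are all assumptions made for illustration, since the slides do not fix them.

```python
def cell_coordinate(point, lows, highs, intervals):
    """Map a k-dimensional point to its cell coordinate (cNO1, ..., cNOk).

    Assumptions (not from the slides): dimension i spans [lows[i], highs[i]],
    is split into intervals[i] equal-length intervals, and interval numbers
    start at 0. Points outside the bounds are clamped to the nearest cell.
    """
    coord = []
    for x, lo, hi, m in zip(point, lows, highs, intervals):
        width = (hi - lo) / m
        idx = int((x - lo) // width)
        idx = min(max(idx, 0), m - 1)  # clamp so that x == hi lands in the last interval
        coord.append(idx)
    return tuple(coord)
```

For instance, with both dimensions spanning [0, 8] and split into 4 intervals, the point (5, 4) falls into the cell with coordinate (2, 2).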

Definition cont.
Selectivity pc of cell C
- The selectivity pc of a cell C is the number of points that belong to C.
Clustering data stream X in a sliding window based on cells
- If the selectivity of a cell is larger than a threshold τ, the cell is called dense.
- A cluster is a maximal set of cells that are adjacent and dense.
- Two cells C1 and C2 are connective when they are neighboring, or when there exists a cell C3 such that C1 and C3 are neighboring and C2 and C3 are neighboring.
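
The definitions above can be illustrated with a small sketch that counts selectivity per cell, keeps the dense cells, and groups them into clusters by breadth-first search. This is not the CDS-Tree clustering algorithm itself. It assumes that two cells are neighboring when their coordinates differ by exactly 1 in exactly one dimension, and it follows chains of neighboring dense cells of any length; both choices, and the name `cluster_dense_cells`, are assumptions.

```python
from collections import Counter, deque


def cluster_dense_cells(cell_coords, tau):
    """Group dense cells into clusters of mutually reachable neighbors.

    cell_coords: iterable of cell coordinates (tuples), one per data point.
    tau: density threshold; a cell is dense if its selectivity is > tau.
    """
    selectivity = Counter(cell_coords)  # number of points per cell
    dense = {c for c, n in selectivity.items() if n > tau}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), set()
        while queue:
            cell = queue.popleft()
            cluster.add(cell)
            # Assumed neighborhood: +/-1 along a single dimension.
            for d in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(cluster)
    return clusters
```

Because clusters are built from cells rather than from distances to a centroid, a chain of neighboring dense cells can trace out an arbitrary shape.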

CDS-Tree
[Figure: example CDS-Tree built for the incoming data stream (2,3), (5,4), (6,5); labeled parts: root-node, mid, leaf, total-num-list]

Related Algorithms of CDS-Tree CDS-Tree building algorithm

Related Algorithms of CDS-Tree Clustering algorithm based on CDS-Tree.

Granularity Adjustment
- The finer the partition, the higher the accuracy, but the more cells are created.
- If the current memory cost Mp is far less than Mmax, a finer-granularity partition can be used for higher accuracy.
- If the current memory cost Mp is close to Mmax, a coarser partition should be used to avoid memory overflow.

Granularity Adjustment cont.
Safety factors (in case memory is exhausted)
- λ: used to keep the required memory from exceeding the memory limit Mmax when the granularity becomes finer; it is set larger than 1.
- η: decides the time point at which to adjust the granularity, where η is less than 1. For example, if η is set to 0.1, then when the remaining memory is less than 10% of Mmax, the algorithm makes the granularity coarser to save memory.
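
One possible reading of how the two safety factors drive the decision is sketched below. The slides do not give the exact rule, so several things here are assumptions: the function name `adjust_granularity`, the parameter `est_finer` (a caller-supplied estimate of the memory a finer partition would need), the default λ of 1.5, and the test `lam * est_finer <= m_max` for going finer.

```python
def adjust_granularity(m_p, m_max, est_finer, lam=1.5, eta=0.1):
    """Decide how to change the partition granularity.

    m_p: current memory cost Mp.
    m_max: memory limit Mmax.
    est_finer: hypothetical estimate of memory needed after refining.
    lam: safety factor lambda (> 1) applied to that estimate.
    eta: fraction (< 1) of Mmax; below this much free memory, go coarser.
    """
    if m_max - m_p < eta * m_max:
        return "coarser"  # remaining memory is below eta * Mmax
    if lam * est_finer <= m_max:
        return "finer"    # refined partition fits even with the safety margin
    return "keep"
```

With η = 0.1 and Mmax = 100, a current cost of 95 leaves only 5 units free, so the sketch returns "coarser", matching the 10% example above.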

Granularity Adjustment Algorithm

Experimental Results
- OS: Microsoft Windows 2000
- CPU: 2.5GHz
- RAM: 512MB
- Two datasets: the KDD-CUP-99 Network Intrusion Detection stream dataset and the Image Fourier Coefficient dataset

Experimental Results
