1 Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology)
An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data
Presenter: Cheng-Han Tsai
Authors: Liang Bai, Jiye Liang, Chuangyin Dang
KBS, 2011

2 Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

3 Motivation
• The k-modes algorithm is sensitive to the initial cluster centers and requires the number of clusters to be specified in advance.
• There is no guarantee that the number of clusters we select is the best one.

4 Objectives
• To propose an initialization method that finds both the initial cluster centers and the number of clusters.
• The method can efficiently handle large categorical data sets in linear time.

5 Methodology
Data set → Construct a potential exemplar set S → Set the estimated number of clusters → k-modes-type algorithm → Clustering result
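The last stage of this pipeline, a k-modes-type algorithm started from given centers, might look roughly like the following Python sketch. It is the standard k-modes loop, not the authors' initialization method: the initial centers are simply passed in, and the names `dissimilarity`, `column_modes`, `k_modes`, the `max_iter` parameter, and the toy data are all illustrative assumptions.

```python
from collections import Counter


def dissimilarity(x, y):
    """Count the attributes on which two categorical records differ."""
    return sum(a != b for a, b in zip(x, y))


def column_modes(records):
    """Most frequent category in each attribute (ties go to the first seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(data, initial_centers, max_iter=100):
    """Standard k-modes iteration from the given initial centers (a sketch)."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: dissimilarity(x, centers[j]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center becomes the attribute-wise mode of its cluster.
        for j in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = column_modes(members)
    return centers, labels


# Toy categorical data with two hand-picked initial centers (hypothetical).
data = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "z"), ("b", "z"), ("c", "z")]
centers, labels = k_modes(data, [("a", "x"), ("b", "z")])
print(centers)  # [('a', 'x'), ('b', 'z')]
print(labels)   # [0, 0, 0, 1, 1, 1]
```

Because the result depends on the centers passed in, a different choice of `initial_centers` can lead to a different clustering, which is the sensitivity the slides describe.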

6 Methodology
• The k-modes algorithm
• Hamming distance: the number of positions at which two codes differ (computed with XOR)
  Example:
        10001001
    XOR 10110001
    ------------
        00111000 → Hamming distance = 3
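The XOR example above can be reproduced in a few lines of Python. This is a minimal sketch: `hamming_bits` and `matching_dissimilarity` are hypothetical helper names, and the second function is the analogous mismatch count for categorical records (the usual dissimilarity in k-modes), shown on made-up attribute values.

```python
def hamming_bits(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings, via XOR."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    # XOR leaves a 1 bit wherever the two codes differ; count those bits.
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def matching_dissimilarity(x, y) -> int:
    """Number of attributes on which two categorical records disagree."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


print(hamming_bits("10001001", "10110001"))  # 3, matching the slide's example
print(matching_dissimilarity(("red", "round", "small"),
                             ("red", "oval", "large")))  # 2
```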

7 Methodology
• New cluster centers initialization method
• Finding the number of clusters

8 Methodology
• New cluster centers initialization method

9 Methodology

10 Methodology

11 Methodology

12 Methodology
• Finding the number of clusters
  ─ We need to input a value k’, which is an estimated number of clusters
  ─ If k’ can’t be determined, we set k’ = |S|

13 Methodology

14 Methodology

15 Methodology
• The function P(k) may have more than one knee point
• The function C(k) may have more than one peak

16 Experiments
• Performance analysis
  ─ Soybean data (4 diseases)
  ─ Lung cancer data (3 classes)
  ─ Zoo data (7 classes: 3 large clusters and 4 small clusters)
  ─ Mushroom data (2 classes)
• Scalability analysis

17 Experiments
• Performance analysis

18 Experiments

19 Experiments
• Scalability analysis
  ─ 67557 data points and 42 categorical attributes

20 Conclusions
• The proposed method is effective and efficient at obtaining good initial cluster centers and the number of clusters
• Its time complexity has been shown to be linear

21 Comments
• Advantages
  ─ Improves on earlier ways of setting the two parameters (the initial cluster centers and the number of clusters)
• Applications
  ─ Data clustering

