
Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology

An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data

Presenter: Cheng-Han Tsai
Authors: Liang Bai, Jiye Liang, Chuangyin Dang
Knowledge-Based Systems (KBS), 2011

Outline
- Motivation
- Objectives
- Methodology
- Experiments
- Conclusions
- Comments

Motivation
- The k-modes algorithm is sensitive to the choice of initial cluster centers, and it requires the number of clusters to be specified in advance.
- There is no guarantee that the number of clusters we choose is the best one.

Objectives
- To propose an initialization method that finds both the initial cluster centers and the number of clusters.
- The method handles large categorical data sets efficiently, in linear time.

Methodology
Flowchart: Data set → Construct a potential exemplar set S → Set the estimated number of clusters → k-modes-type algorithm → Clustering result

Intelligent Database Systems Lab N.Y.U.S.T. I. M. Methodology  The k-modes algorithm 6  Hamming distance: Differences between two codes(using XOR) ex:10001001 XOR 10110001 ------------------------ 00111000 → Hamming distance = 3

Intelligent Database Systems Lab N.Y.U.S.T. I. M. Methodology  New cluster centers initialization method  Finding the number of clusters 7

Intelligent Database Systems Lab N.Y.U.S.T. I. M. Methodology  New cluster centers initialization method. 8


Intelligent Database Systems Lab N.Y.U.S.T. I. M. Methodology  Finding the number of clusters ─ We need to input a value k’ which is a estimated number of clusters ─ If k’ can’t be determined, we set k’ = |S| 12

Intelligent Database Systems Lab N.Y.U.S.T. I. M. Methodology  More than 1 knee point of the function P(k)  More than 1 peak of the function C(k) 15

Intelligent Database Systems Lab N.Y.U.S.T. I. M. Experiments  Performance analysis ─ Soybean dada (4 diseases) ─ Lung cancer data (3 classes) ─ Zoo data (7 classes which has 3 big clusters and 4 small clusters) ─ Mushroom data (2 classes)  Scalability analysis 16

Intelligent Database Systems Lab N.Y.U.S.T. I. M. Experiments  Performance analysis 17


Intelligent Database Systems Lab N.Y.U.S.T. I. M. Experiments  Scalability analysis ─ 67557 data points and 42 categorical attribute 19

Conclusions
- The proposed method is effective and efficient at obtaining good initial cluster centers and the number of clusters.
- The time complexity analysis shows that the method runs in linear time.

Comments
- Advantages
  - Improves on the old way of setting the two parameters (the initial cluster centers and the number of clusters).
- Applications
  - Data clustering
