Hierarchical Model-Based Clustering of Large Datasets Through Fractionation and Refractionation. Advisor: Dr. Hsu. Graduate: You-Cheng Chen. Authors: Jeremy Tantrum, Alejandro Murua, Werner Stuetzle.

Outline: Motivation, Objective, Introduction, Model-based Fractionation, Model-based ReFractionation, Example, Conclusions, Personal Opinion

Motivation: Propose an extended method that improves the performance of model-based clustering and apply it to large datasets.

Objective: Apply Fractionation and Refractionation to model-based clustering.

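Introduction: Model-based clustering in a nutshell. The sample $x_1, \dots, x_n$ is assumed to be drawn from a mixture density $p(x) = \sum_{g=1}^{G} \pi_g \, p_g(x)$, where $p_g$ is the density modeling group $g$ and $\pi_g$ is the prior probability that a randomly chosen observation belongs to group $g$.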
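As a small illustration of the mixture view, the sketch below evaluates a mixture density (the sum over groups of prior probability times group density) and the posterior probability that an observation belongs to each group. It assumes one-dimensional Gaussian group densities, which the slide does not state; the function names and the two-group parameters are hypothetical and chosen only for the example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution (assumed component family)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, priors, means, variances):
    """Mixture density: sum over groups of prior * group density."""
    return sum(p * normal_pdf(x, m, v) for p, m, v in zip(priors, means, variances))

def posterior(x, priors, means, variances):
    """Posterior probability of each group given x (Bayes' rule)."""
    weighted = [p * normal_pdf(x, m, v) for p, m, v in zip(priors, means, variances)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical two-group mixture, for illustration only.
priors = [0.5, 0.5]
means = [0.0, 4.0]
variances = [1.0, 1.0]

print(mixture_density(2.0, priors, means, variances))
print(posterior(2.0, priors, means, variances))  # point halfway between the means
print(posterior(0.0, priors, means, variances))  # point at the first group's mean
```

Assigning each observation to the group with the largest posterior probability turns a fitted mixture into a clustering.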

Introduction: Model-based clustering in a nutshell. The Approximate Weight of Evidence (AWE) can be used to estimate the number of groups.

Introduction: Previous work on model-based clustering for large datasets. The Scalable EM (SEM) algorithm can be used to fit mixture models to large datasets, but it cannot estimate the number of groups. The simplest and potentially fastest approach is to draw a sample of the data.

2. Fractionation: the original Fractionation algorithm
1. Split the data into fractions of size M.
2. Cluster each fraction into a fixed number αM of clusters, where α < 1. Summarize each cluster by its mean. We refer to these cluster means as meta-observations.
3. If the total number of meta-observations is greater than M, return to Step 1, treating the meta-observations as the data.
4. Cluster the meta-observations into G clusters.
5. Assign each individual observation to the cluster with the closest mean.
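The five steps of the original algorithm can be sketched as follows. This is a minimal sketch, not the authors' implementation: the slide does not say which clustering method is used inside each fraction, so a naive centroid-linkage agglomerative routine (`agglomerate`, a hypothetical helper) stands in for it, fractions are taken as consecutive slices of the data, and the names `fractionate`, `M`, `alpha`, and `G` simply mirror the symbols on the slide.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomerate(points, k):
    """Naive centroid-linkage agglomerative clustering (assumed stand-in).

    Repeatedly merges the two clusters with the closest means until k remain,
    and returns the list of cluster means."""
    clusters = [(list(p), 1) for p in points]  # (mean, size)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (mi, ni), (mj, nj) = clusters[i], clusters[j]
        merged = ([(ni * a + nj * b) / (ni + nj) for a, b in zip(mi, mj)], ni + nj)
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return [m for m, _ in clusters]

def fractionate(data, M, alpha, G):
    """Return (cluster means, label of each observation)."""
    if M < 2 or not 0.0 < alpha < 1.0:
        raise ValueError("need M >= 2 and 0 < alpha < 1")
    current = list(data)
    while len(current) > M:                                   # step 3
        fractions = [current[s:s + M] for s in range(0, len(current), M)]  # step 1
        current = [m for f in fractions                        # step 2
                   for m in agglomerate(f, max(1, int(alpha * len(f))))]
    centers = agglomerate(current, G)                          # step 4
    labels = [min(range(len(centers)), key=lambda g: sq_dist(x, centers[g]))
              for x in data]                                   # step 5
    return centers, labels

# Hypothetical data: three well-separated 2-D groups of 40 points each.
random.seed(0)
data = [[random.gauss(cx, 0.5), random.gauss(cy, 0.5)]
        for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(40)]
random.shuffle(data)
centers, labels = fractionate(data, M=20, alpha=0.5, G=3)
print(len(centers), sorted(set(labels)))
```

With these settings the loop runs three passes (120 observations become 60, then 30, then 15 meta-observations) before Step 4 is reached. The inner routine is cubic in the fraction size, which is tolerable only because each fraction is small; that is the point of splitting the data.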

2-1. Model-based Fractionation. Main differences: (1) each cluster is represented by its sufficient statistics (the mean, the covariance, and the number of observations) rather than by its mean alone; (2) AWE is used to determine the number of clusters in Step 4.
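One reason the mean, covariance, and count make a convenient cluster summary is that two summaries can be merged exactly, without revisiting the underlying observations. The sketch below shows that identity. It stores the scatter matrix (the count times the maximum-likelihood covariance) instead of the covariance itself, which is a choice made here for convenience and not something stated on the slide; `summarize`, `merge`, and `covariance` are hypothetical helper names.

```python
def summarize(points):
    """Sufficient statistics of a cluster: (count, mean, scatter matrix)."""
    n = len(points)
    d = len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    scatter = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points)
                for b in range(d)] for a in range(d)]
    return n, mean, scatter

def merge(s1, s2):
    """Sufficient statistics of the union of two clusters.

    Uses W = W1 + W2 + (n1 * n2 / n) * (m1 - m2)(m1 - m2)^T."""
    n1, m1, w1 = s1
    n2, m2, w2 = s2
    n = n1 + n2
    mean = [(n1 * a + n2 * b) / n for a, b in zip(m1, m2)]
    diff = [a - b for a, b in zip(m1, m2)]
    c = n1 * n2 / n
    d = len(m1)
    scatter = [[w1[a][b] + w2[a][b] + c * diff[a] * diff[b] for b in range(d)]
               for a in range(d)]
    return n, mean, scatter

def covariance(stats):
    """Maximum-likelihood covariance (scatter divided by the count)."""
    n, _, w = stats
    return [[v / n for v in row] for row in w]

# Hypothetical clusters, for illustration only.
a = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]]
b = [[5.0, 5.0], [6.0, 7.0]]
print(merge(summarize(a), summarize(b)))
print(summarize(a + b))  # same statistics, computed from the raw points
```

Because merging is exact, a meta-observation that carries these three quantities loses nothing a Gaussian model would need from the observations it replaces.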

3. Model-based ReFractionation. Step 4 of the Fractionation algorithm is replaced by Steps 4a and 4b. 4a: Cluster the meta-observations into G clusters, where G is determined by the AWE criterion. 4b: Define the fractions for the i-th pass.

3.1 Illustration: M = 100, 4 fractions, 40 meta-observations

3.1 Illustration: Step 4a uses AWE to find G = 25, followed by Step 4b

3.1 Illustration: second pass

3.1 Illustration: 2nd pass, 3rd pass

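3.2 Scope of (Re)Fractionation. Let n_g be the number of groups in the data, n_f the number of fractions, and n_c the number of clusters generated from each fraction. In Step 2, n_g > n_c will lead to impure clusters: a fraction that contains observations from more than n_c groups cannot give each group its own cluster.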

4. Example. 4.1 Measuring the agreement between groups and clusters: the Fowlkes-Mallows index.
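In its standard form, the Fowlkes-Mallows index counts pairs of observations: if TP is the number of pairs placed together both by the true groups and by the clusters, FP the number placed together only by the clusters, and FN the number placed together only by the groups, the index is TP / sqrt((TP + FP)(TP + FN)). It equals 1 for perfect agreement. The sketch below computes it from two label lists; the function name is hypothetical, and the slide's own notation for the index did not survive in the transcript.

```python
import math
from collections import Counter

def pairs(count):
    """Number of unordered pairs that can be drawn from `count` items."""
    return count * (count - 1) // 2

def fowlkes_mallows(groups, clusters):
    """Fowlkes-Mallows index between true group labels and cluster labels."""
    if len(groups) != len(clusters):
        raise ValueError("label lists must have the same length")
    # Pairs together in both partitions (TP), counted cell by cell
    # in the group-by-cluster contingency table.
    tp = sum(pairs(c) for c in Counter(zip(groups, clusters)).values())
    # Pairs together in the true groups (TP + FN).
    same_group = sum(pairs(c) for c in Counter(groups).values())
    # Pairs together in the clusters (TP + FP).
    same_cluster = sum(pairs(c) for c in Counter(clusters).values())
    if same_group == 0 or same_cluster == 0:
        return 0.0
    return tp / math.sqrt(same_group * same_cluster)

# Hypothetical labels, for illustration only.
print(fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, labels renamed
print(fowlkes_mallows([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

The index depends only on which observations are grouped together, so renaming the cluster labels does not change it.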

4.3 Example 1: groups = 19, n = 22000, M = 1000, clusters = 100

4.3 Example 3: groups = 361, n = 20900, M = 1045, clusters = 100

Conclusions: We can study the performance of the AWE criterion for estimating the number of groups in a mixture of factor analyzers model.

Personal Opinion: We can borrow the strengths of other clustering methods to remedy the weaknesses of our own.
