
Hierarchical Model-Based Clustering of Large Datasets Through Fractionation and Refractionation Advisor : Dr. Hsu Graduate : You-Cheng Chen Author : Jeremy.


1 Hierarchical Model-Based Clustering of Large Datasets Through Fractionation and Refractionation. Advisor: Dr. Hsu. Graduate: You-Cheng Chen. Authors: Jeremy Tantrum, Alejandro Murua, Werner Stuetzle

2 Outline: Motivation, Objective, Introduction, Model-based Fractionation, Model-based Refractionation, Example, Conclusions, Personal Opinion

3 Motivation Propose an extended method that improves the performance of model-based clustering and makes it applicable to large datasets.

4 Objective Apply Fractionation and Refractionation to model-based clustering.

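5 Introduction: Model-based clustering in a nutshell. Sample: $x_1, \dots, x_n$ with mixture density $f(x) = \sum_{g=1}^{G} \pi_g f_g(x)$, where $f_g$ is the density modeling group $g$ and $\pi_g$ is the prior probability that a randomly chosen observation belongs to group $g$.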
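To make the mixture model concrete, here is a minimal sketch that evaluates $f(x) = \sum_g \pi_g f_g(x)$ and, by Bayes' rule, the posterior probability that $x$ belongs to group $g$. It assumes univariate Gaussian component densities purely for illustration; the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of the univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """f(x) = sum_g pi_g * f_g(x), with (assumed) Gaussian component densities f_g."""
    return sum(w * normal_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


def posterior_membership(x, weights, means, variances):
    """P(group g | x) = pi_g * f_g(x) / f(x), by Bayes' rule."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    pis, mus, sigma2s = [0.3, 0.7], [0.0, 5.0], [1.0, 1.0]
    print(mixture_density(0.5, pis, mus, sigma2s))
    print(posterior_membership(0.5, pis, mus, sigma2s))
```

The posterior probabilities are what a hard clustering rounds off: each observation goes to the group with the largest $\pi_g f_g(x)$.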

6 Introduction: Model-based clustering in a nutshell. We can use the Approximate Weight of Evidence (AWE) criterion to estimate the number of groups.
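The slide does not reproduce the AWE formula. The sketch below assumes the form commonly quoted for Banfield and Raftery's criterion, $\mathrm{AWE} = 2 l_c - 2m(3/2 + \log n)$, where $l_c$ is the classification log-likelihood of a hard partition and $m$ is the number of free parameters. Treat this form, the larger-is-better sign convention, and the 1-D Gaussian clusters as assumptions to check against the paper; the function names are hypothetical.

```python
import math


def classification_loglik(clusters):
    """Classification log-likelihood l_c of a hard partition of 1-D data.

    Each cluster is fitted with its own Gaussian (MLE mean and variance) and
    mixing proportion pi_g = n_g / n. Clusters need >= 2 distinct values so
    that the variance is positive."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        n_g = len(c)
        mean = sum(c) / n_g
        var = sum((x - mean) ** 2 for x in c) / n_g
        # At the MLE, sum_i log N(x_i; mean, var) = -n_g/2 * (log(2*pi*var) + 1).
        total += -0.5 * n_g * (math.log(2 * math.pi * var) + 1.0)
        total += n_g * math.log(n_g / n)  # sum over members of log pi_g
    return total


def awe(clusters):
    """ASSUMED AWE form: 2*l_c - 2*m*(3/2 + log n); larger is better here."""
    n = sum(len(c) for c in clusters)
    G = len(clusters)
    m = 3 * G - 1  # (G - 1) proportions + G means + G variances
    return 2 * classification_loglik(clusters) - 2 * m * (1.5 + math.log(n))


if __name__ == "__main__":
    left = [0.0, 0.1, 0.2, 0.3, 0.4]
    right = [10.0, 10.1, 10.2, 10.3, 10.4]
    # Compare one merged cluster (G = 1) against the natural split (G = 2).
    print(awe([left + right]), awe([left, right]))
```

The penalty term grows with the number of parameters, so extra groups are accepted only when they raise the classification likelihood enough to pay for themselves.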

7 Introduction: Previous work on model-based clustering for large datasets. The Scalable EM (SEM) algorithm can fit mixture models to large datasets, but it cannot estimate the number of groups. The simplest and potentially fastest approach is to draw a sample of the data.

8 2. Fractionation: the original Fractionation algorithm
1. Split the data into fractions of size M.
2. Cluster each fraction into a fixed number αM of clusters, where α < 1. Summarize each cluster by its mean; we refer to these cluster means as meta-observations.
3. If the total number of meta-observations is greater than M, return to step 1 with the meta-observations taking the place of the data.
4. Cluster the meta-observations into G clusters.
5. Assign each individual observation to the cluster with the closest mean.
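The five steps can be sketched on 1-D data as follows. This is a toy illustration, not the authors' implementation: the clustering routine used in Steps 2 and 4 is a hypothetical centroid agglomeration (repeatedly merge the two clusters with the closest means), chosen only because it is short and deterministic, and all names are hypothetical.

```python
def agglomerate(points, k):
    """Toy 1-D centroid agglomeration (hypothetical stand-in for the real
    clustering step): repeatedly merge the two clusters whose means are
    closest until k clusters remain, then return the cluster means."""
    clusters = [[p, 1] for p in sorted(points)]  # [sum of values, count]
    while len(clusters) > k:
        means = [s / c for s, c in clusters]
        # Clusters stay sorted by mean (a merged mean lies between its parts),
        # so in 1-D the closest pair of means is always an adjacent pair.
        i = min(range(len(means) - 1), key=lambda j: means[j + 1] - means[j])
        s, c = clusters.pop(i + 1)
        clusters[i][0] += s
        clusters[i][1] += c
    return [s / c for s, c in clusters]


def fractionation(data, M, alpha, G):
    """Sketch of the original Fractionation loop (Steps 1-5) on 1-D data."""
    if M < 2 or not 0 < alpha < 1:
        raise ValueError("need M >= 2 and 0 < alpha < 1")
    k = max(1, int(alpha * M))  # clusters per fraction; always < M
    metas = list(data)
    while True:
        # Step 1: split the current items into fractions of size M.
        fractions = [metas[i:i + M] for i in range(0, len(metas), M)]
        # Step 2: cluster each fraction into (at most) alpha*M clusters and
        # keep only the cluster means (the meta-observations).
        metas = [m for f in fractions for m in agglomerate(f, min(k, len(f)))]
        # Step 3: stop once there are at most M meta-observations.
        if len(metas) <= M:
            break
    # Step 4: cluster the meta-observations into G clusters.
    centers = agglomerate(metas, G)
    # Step 5: assign every original observation to the closest cluster mean.
    labels = [min(range(len(centers)), key=lambda g: abs(x - centers[g]))
              for x in data]
    return centers, labels


if __name__ == "__main__":
    low = [i * 0.01 for i in range(50)]
    high = [100 + i * 0.01 for i in range(50)]
    data = [x for pair in zip(low, high) for x in pair]  # interleave the groups
    centers, labels = fractionation(data, M=20, alpha=0.25, G=2)
    print(centers)
```

With interleaved data from two well-separated groups, every fraction mixes both groups, yet no meta-observation straddles the gap, so the final assignment recovers the two groups.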

9 2.1 Model-based Fractionation. Main differences from the original algorithm:
- Each cluster is represented by its sufficient statistics: the mean, the covariance, and the number of observations.
- AWE is used to determine the number of clusters in Step 4.
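Because the mean, covariance, and count are sufficient statistics for a Gaussian cluster, two clusters can be merged exactly without revisiting their observations: the counts add, the mean is the count-weighted average, and the scatter matrix gains a between-cluster term $\frac{n_1 n_2}{n}(\mu_1 - \mu_2)(\mu_1 - \mu_2)^\top$. Below is a minimal stdlib sketch; the class and function names are hypothetical, and whether the paper combines statistics in exactly this way is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterStats:
    """Sufficient statistics of a cluster: count, mean vector, and scatter
    matrix (sum of outer products of deviations from the mean)."""
    n: int
    mean: List[float]
    scatter: List[List[float]]

    @classmethod
    def from_points(cls, points):
        n, d = len(points), len(points[0])
        mean = [sum(p[k] for p in points) / n for k in range(d)]
        scatter = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points)
                    for b in range(d)] for a in range(d)]
        return cls(n, mean, scatter)

    def covariance(self):
        """Maximum-likelihood covariance estimate: scatter / n."""
        return [[s / self.n for s in row] for row in self.scatter]


def merge(c1, c2):
    """Combine two clusters' statistics without touching their observations."""
    n = c1.n + c2.n
    d = len(c1.mean)
    delta = [c2.mean[k] - c1.mean[k] for k in range(d)]
    mean = [c1.mean[k] + delta[k] * c2.n / n for k in range(d)]
    w = c1.n * c2.n / n  # weight of the between-cluster term
    scatter = [[c1.scatter[a][b] + c2.scatter[a][b] + w * delta[a] * delta[b]
                for b in range(d)] for a in range(d)]
    return ClusterStats(n, mean, scatter)


if __name__ == "__main__":
    a = ClusterStats.from_points([[0.0, 1.0], [2.0, 3.0], [1.0, 0.0]])
    b = ClusterStats.from_points([[5.0, 5.0], [6.0, 7.0]])
    print(merge(a, b).covariance())
```

Merging this way costs $O(d^2)$ per pair regardless of cluster size, which is what lets meta-observations carry full Gaussian summaries through several passes.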

10 3. Model-based Refractionation Step 4 of the Fractionation algorithm is replaced by steps 4a and 4b:
4a. Cluster the meta-observations into G clusters, where G is determined by the AWE criterion.
4b. Define the fractions for the i-th pass.
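The slide leaves Step 4b abstract. One plausible reading, stated here purely as an assumption, is that observations whose meta-observations land in the same Step 4a cluster are placed in the same fraction, so a group that was split across fractions in the previous pass is brought back together. The sketch packs whole meta-clusters greedily into fractions of at most M observations and splits any meta-cluster larger than M; the packing rule and all names are hypothetical.

```python
from collections import defaultdict


def refraction_fractions(obs_meta, meta_cluster, M):
    """Hypothetical Step 4b: build next-pass fractions from meta-cluster labels.

    obs_meta[i]     -- index of the meta-observation observation i fell into
    meta_cluster[j] -- cluster (from Step 4a) of meta-observation j
    Returns a list of fractions, each a list of observation indices."""
    groups = defaultdict(list)
    for i, j in enumerate(obs_meta):
        groups[meta_cluster[j]].append(i)
    fractions, current = [], []
    for g in sorted(groups):
        members = groups[g]
        # A meta-cluster bigger than M cannot fit in one fraction: split it.
        for k in range(0, len(members), M):
            chunk = members[k:k + M]
            # Start a new fraction when this chunk would overflow the current one.
            if len(current) + len(chunk) > M:
                fractions.append(current)
                current = []
            current.extend(chunk)
    if current:
        fractions.append(current)
    return fractions


if __name__ == "__main__":
    # Observations 0-7 fell into meta-observations 0-3; Step 4a put
    # meta-observations 0 and 2 together, and 1 and 3 together.
    print(refraction_fractions([0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1], M=4))
```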

11 3.1 Illustration: M = 100, 4 fractions, 40 meta-observations

12 3.1 Illustration Step 4a: AWE finds G = 25. Step 4b: define the fractions for the next pass.

13 3.1 Illustration Second pass

14 3.1 Illustration 2nd pass, 3rd pass

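15 3.2 Scope of (Re)Fractionation Let $n_g$ be the number of groups in the data, $n_f$ the number of fractions, and $n_c$ the number of clusters generated from each fraction. If $n_g > n_c$, Step 2 will lead to impure clusters: a fraction can then contain observations from more groups than it has clusters, so some cluster must mix observations from different groups.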
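The impurity condition is a pigeonhole argument, which a one-line check captures. The function name is hypothetical, and the usage line assumes a fraction of M = 1045 observations drawn round-robin from 361 groups and split into 100 clusters, a layout chosen only to mirror the sizes of Example 3.

```python
def impurity_forced(fraction_labels, n_c):
    """Pigeonhole check: if a fraction holds more distinct groups than the
    n_c clusters it is split into, at least one cluster must mix two groups."""
    return len(set(fraction_labels)) > n_c


if __name__ == "__main__":
    # Assumed layout: 1045 observations taken round-robin from 361 groups.
    labels = [i % 361 for i in range(1045)]
    print(impurity_forced(labels, 100))  # True: 361 groups > 100 clusters
```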

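16 4. Example 4.1 Measuring the agreement between groups and clusters: the Fowlkes-Mallows index $B = T / \sqrt{P Q}$, where $T = \sum_{i,j} n_{ij}^2 - n$, $P = \sum_i n_{i\cdot}^2 - n$, $Q = \sum_j n_{\cdot j}^2 - n$, and $n_{ij}$ is the number of observations that belong to group $i$ and cluster $j$. $B = 1$ means perfect agreement.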
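The index can be computed directly from the contingency counts $n_{ij}$. A short sketch with a hypothetical function name; it assumes both labelings have at least one group and one cluster with two or more observations, since otherwise $P$ or $Q$ is zero and the index is undefined.

```python
from collections import Counter
from math import sqrt


def fowlkes_mallows(groups, clusters):
    """Fowlkes-Mallows index B = T / sqrt(P * Q) from two labelings."""
    n = len(groups)
    pairs = Counter(zip(groups, clusters))  # contingency counts n_ij
    T = sum(c * c for c in pairs.values()) - n
    P = sum(c * c for c in Counter(groups).values()) - n    # group sizes n_i.
    Q = sum(c * c for c in Counter(clusters).values()) - n  # cluster sizes n_.j
    return T / sqrt(P * Q)


if __name__ == "__main__":
    # Relabeling clusters does not change the index.
    print(fowlkes_mallows([0, 0, 1, 1, 2, 2], [5, 5, 7, 7, 9, 9]))
    print(fowlkes_mallows([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

Each of $T$, $P$, $Q$ counts ordered pairs of distinct observations (twice the unordered count), so $B$ is the geometric mean of the fraction of same-group pairs that share a cluster and the fraction of same-cluster pairs that share a group.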

17 4.3 Example 1: 19 groups, n = 22000, M = 1000, 100 clusters

18 4.3 Example 3: 361 groups, n = 20900, M = 1045, 100 clusters
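As a worked check of the Fractionation loop (Steps 1 to 3), the sketch below counts the meta-observations left after each pass. It assumes that "100 clusters" means 100 clusters per fraction, and that a smaller final fraction yields at most as many clusters as it has observations; the function name is hypothetical.

```python
def fractionation_passes(n, M, c):
    """Meta-observation counts after each Fractionation pass (Steps 1-3).

    Assumes each full fraction of M items yields c clusters and a smaller last
    fraction yields min(c, its size) clusters; requires 0 < c < M."""
    if not 0 < c < M:
        raise ValueError("need 0 < c < M")
    counts, m = [], n
    while True:
        full, rest = divmod(m, M)
        m = full * c + min(c, rest)
        counts.append(m)
        if m <= M:  # Step 3: few enough meta-observations to stop
            return counts


if __name__ == "__main__":
    print(fractionation_passes(22000, 1000, 100))  # Example 1: [2200, 300]
    print(fractionation_passes(20900, 1045, 100))  # Example 3: [2000, 200]
    print(fractionation_passes(400, 100, 10))      # 3.1 illustration: [40]
```

Under these assumptions both examples need two passes. For the 3.1 illustration, taking n = 400 (four full fractions of M = 100, an assumed value) at 10 clusters per fraction reproduces the 40 meta-observations shown.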

19 Conclusions We can study the performance of the AWE criterion for estimating the number of groups in a mixture of factor analyzers model.

20 Personal Opinion We can borrow the strengths of other clustering methods to remedy the weaknesses of our own.

