Published by Kerrie Walker. Modified over 9 years ago.
1
Hierarchical Model-Based Clustering of Large Datasets Through Fractionation and Refractionation
Authors: Jeremy Tantrum, Alejandro Murua, Werner Stuetzle
Advisor: Dr. Hsu
Graduate: You-Cheng Chen
2
Outline
Motivation
Objective
Introduction
Model-based Fractionation
Model-based ReFractionation
Example
Conclusions
Personal Opinion
3
Motivation
Propose an extended method that improves the performance of model-based clustering and apply it to large datasets.
4
Objective
Apply Fractionation and Refractionation to model-based clustering.
5
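Introduction
Model-based clustering in a nutshell
Sample: the observations $x_1, \dots, x_n$ are drawn from a mixture density $p(x) = \sum_{g=1}^{G} \pi_g \, p_g(x)$, where
$p_g$ is the density modeling group $g$, and
$\pi_g$ is the prior probability that a randomly chosen observation belongs to group $g$.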
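As a small illustration of the mixture model on this slide, the sketch below evaluates a mixture density and the probability that an observation belongs to each group. It assumes one-dimensional Gaussian group densities; the function names and the example priors, means, and standard deviations are hypothetical and do not come from the presentation.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution (assumed group density)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(x, priors, means, sds):
    """Mixture density: sum over groups of prior * group density."""
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(priors, means, sds))

def posterior_membership(x, priors, means, sds):
    """Probability that x belongs to each group, by Bayes' rule."""
    weighted = [p * normal_pdf(x, m, s) for p, m, s in zip(priors, means, sds)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical two-group example.
priors = [0.3, 0.7]
means = [0.0, 5.0]
sds = [1.0, 1.0]
post = posterior_membership(0.5, priors, means, sds)
print(post)  # the first group dominates, since 0.5 lies close to its mean
```

Assigning each observation to the group with the largest posterior probability turns the fitted mixture into a clustering.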
6
Introduction
Model-based clustering in a nutshell
The Approximate Weight of Evidence (AWE) can be used to estimate the number of groups.
7
Introduction
Previous work on model-based clustering for large datasets
The Scalable EM (SEM) algorithm can fit mixture models to large datasets, but it cannot estimate the number of groups.
The simplest and potentially fastest approach is to draw a sample of the data.
8
2. Fractionation
Original Fractionation algorithm
1. Split the data into fractions of size M.
2. Cluster each fraction into a fixed number αM of clusters, where α < 1. Summarize each cluster by its mean. We refer to these cluster means as meta-observations.
3. If the total number of meta-observations is greater than M, return to Step 1, with the meta-observations taking the place of the data.
4. Cluster the meta-observations into G clusters.
5. Assign each individual observation to the cluster with the closest mean.
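The five steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: fractions are formed by random shuffling, the clustering inside each fraction is a simple centroid-merging agglomeration chosen for brevity, and the names `fractionation`, `agglomerate`, and `alpha` (the factor less than 1 in Step 2) are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomerate(points, weights, k):
    """Merge the two closest centroids until k remain (stand-in clusterer, k >= 1)."""
    cents = [list(p) for p in points]
    wts = list(weights)
    while len(cents) > k:
        best = None
        for i in range(len(cents)):
            for j in range(i + 1, len(cents)):
                d = sq_dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        total = wts[i] + wts[j]
        # Weighted mean of the two merged clusters.
        cents[i] = [(wts[i] * a + wts[j] * b) / total
                    for a, b in zip(cents[i], cents[j])]
        wts[i] = total
        del cents[j], wts[j]
    return cents, wts

def fractionation(data, M, alpha, G, seed=0):
    rng = random.Random(seed)
    points = [list(p) for p in data]
    weights = [1] * len(points)
    per_fraction = max(1, int(alpha * M))
    while len(points) > M:                       # Step 3: repeat while too many
        order = list(range(len(points)))
        rng.shuffle(order)                       # assumption: random fractions
        new_points, new_weights = [], []
        for start in range(0, len(order), M):    # Step 1: fractions of size M
            idx = order[start:start + M]
            k = min(per_fraction, len(idx))
            c, w = agglomerate([points[i] for i in idx],
                               [weights[i] for i in idx], k)  # Step 2
            new_points += c                      # cluster means = meta-observations
            new_weights += w
        points, weights = new_points, new_weights
    centers, _ = agglomerate(points, weights, G)  # Step 4
    labels = [min(range(len(centers)), key=lambda g: sq_dist(x, centers[g]))
              for x in data]                      # Step 5
    return centers, labels

# Hypothetical data: three well-separated groups of 50 points each.
gen = random.Random(1)
data = [(cx + gen.gauss(0, 0.5), cy + gen.gauss(0, 0.5))
        for cx, cy in [(0, 0), (10, 10), (20, 0)] for _ in range(50)]
centers, labels = fractionation(data, M=30, alpha=0.2, G=3)
```

Because only one fraction of at most M points is clustered at a time, the cost of each clustering step stays bounded regardless of the size of the full dataset.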
9
2.1 Model-based Fractionation
Main differences:
Each cluster is represented by its sufficient statistics (the mean, the covariance, and the number of observations) rather than by the mean alone.
AWE is used to determine the number of clusters in Step 4.
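One reason the mean, covariance, and count are a convenient summary is that two such summaries can be combined exactly, without revisiting the underlying observations. The sketch below shows this with plain Python lists; it assumes maximum-likelihood covariances (divided by n), and the names `summarize` and `merge` are hypothetical.

```python
def summarize(points):
    """Return (count, mean, covariance) of a cluster; covariance divides by n."""
    n = len(points)
    d = len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / n
            for b in range(d)] for a in range(d)]
    return n, mean, cov

def merge(s1, s2):
    """Combine two cluster summaries into the summary of their union."""
    n1, m1, c1 = s1
    n2, m2, c2 = s2
    n = n1 + n2
    d = len(m1)
    mean = [(n1 * m1[k] + n2 * m2[k]) / n for k in range(d)]
    diff = [m1[k] - m2[k] for k in range(d)]
    # Pooled scatter = both within-cluster scatters + between-cluster term.
    cov = [[(n1 * c1[a][b] + n2 * c2[a][b] + n1 * n2 / n * diff[a] * diff[b]) / n
            for b in range(d)] for a in range(d)]
    return n, mean, cov

# Hypothetical clusters.
a = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]
b = [(5.0, 5.0), (6.0, 7.0)]
merged = merge(summarize(a), summarize(b))
direct = summarize(a + b)  # same result, computed from the raw points
```

A meta-observation stored this way carries enough information to evaluate Gaussian likelihoods for merged clusters, which a mean alone does not.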
10
3. Model-based ReFractionation
Step 4 of the Fractionation algorithm is replaced by Steps 4a and 4b:
4a. Cluster the meta-observations into G clusters, where G is determined by the AWE criterion.
4b. Define the fractions for the i-th pass.
11
3.1 Illustration
M = 100, 4 fractions, 40 meta-observations
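The numbers on this slide are consistent with each fraction of 100 observations being reduced to 10 clusters. The short sketch below reproduces that arithmetic; the reduction factor 0.1 and the total of 400 observations are inferred from the slide rather than stated on it, and the function name is hypothetical.

```python
def meta_observation_counts(n, M, alpha):
    """Number of meta-observations after each pass, until at most M remain."""
    counts = []
    per_fraction = int(round(alpha * M))
    while n > M:
        full, rest = divmod(n, M)
        # Assumption: a short final fraction yields at most per_fraction clusters.
        n = full * per_fraction + min(rest, per_fraction)
        counts.append(n)
    return counts

print(meta_observation_counts(400, 100, 0.1))  # → [40]
```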
12
3.1 Illustration
Step 4a: use AWE to find G = 25.
Step 4b
13
3.1 Illustration Second pass
14
3.1 Illustration
2nd pass, 3rd pass
15
3.2 Scope of (Re)Fractionation
Let
$n_g$ be the number of groups in the data,
$n_f$ be the number of fractions,
$n_c$ be the number of clusters generated from each fraction.
If $n_g > n_c$, Step 2 will lead to impure clusters.
16
4. Example
4.1 Measuring the agreement between groups and clusters
Fowlkes-Mallows index
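For reference, the Fowlkes-Mallows index in its standard pair-counting form can be computed as below. This is a sketch of the standard definition, which may differ in notation from the formula on the original slide; the function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def pairs(k):
    """Number of unordered pairs among k items."""
    return k * (k - 1) // 2

def fowlkes_mallows(groups, clusters):
    """Pairs together in both partitions, divided by the geometric mean of
    the pair counts of the two partitions. Returns a value in [0, 1]."""
    both = sum(pairs(c) for c in Counter(zip(groups, clusters)).values())
    same_group = sum(pairs(c) for c in Counter(groups).values())
    same_cluster = sum(pairs(c) for c in Counter(clusters).values())
    if same_group == 0 or same_cluster == 0:
        return 0.0
    return both / sqrt(same_group * same_cluster)

# Identical partitions up to relabeling agree perfectly.
print(fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

The index does not depend on how clusters are labeled, so it can compare a clustering against known groups even when the number of clusters differs from the number of groups.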
17
4.3 Example 1
Groups = 19, n = 22000, M = 1000, clusters = 100
18
4.3 Example 3
Groups = 361, n = 20900, M = 1045, clusters = 100
19
Conclusions
We can study the performance of the AWE criterion for estimating the number of groups in a mixture of factor analyzers model.
20
Personal Opinion
We can draw on the strengths of other clustering methods to remedy the weaknesses of our own.