EM Algorithm: Expectation Maximization Clustering Algorithm. Book: "Data Mining", Morgan Kaufmann (Witten & Frank), pp. 218-227. Mining Lab, Wanseop Kim, October 27, 2004.
Contents: Clustering / K-Means via EM / Mixture Model / EM Algorithm / Simple Examples of EM / EM Application: WEKA / References
Clustering (1/2) What is clustering? Clustering algorithms divide a data set into natural groups (clusters). Instances in the same cluster are similar to each other; they share certain properties. E.g., customer segmentation. Clustering vs. classification: classification is supervised learning, clustering is unsupervised learning with no target variable to be predicted.
Clustering (2/2) Categorization of clustering methods. Partitioning methods: K-Means / K-Medoids / PAM / CLARA / CLARANS. Hierarchical methods: CURE / CHAMELEON / BIRCH. Density-based methods: DBSCAN / OPTICS. Grid-based methods: STING / CLIQUE / WaveCluster. Model-based methods: EM / COBWEB / Bayesian / neural. Model-based clustering is also called probability-based or statistical clustering.
K-Means (1) Algorithm. Step 0: (Initialization) Select K objects as initial centroids. Step 1: (Assignment) For each object, compute the distances to the K centroids and assign the object to the cluster whose centroid is closest. Step 2: (New Centroids) Compute a new centroid for each cluster. Step 3: (Convergence) Stop if the change in the centroids is less than the selected convergence criterion; otherwise repeat from Step 1.
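The steps above can be sketched in a few lines (a minimal illustration, not the book's implementation; points are tuples of numbers):

```python
import random

def kmeans(points, k, tol=1e-6, max_iter=100):
    # Step 0: select k objects as initial centroids.
    centroids = random.sample(points, k)
    for _ in range(max_iter):
        # Step 1 (Assignment): each object joins its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Step 2 (New Centroids): mean of each cluster (keep the old
        # centroid if a cluster happens to be empty).
        new = [tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        # Step 3 (Convergence): stop when the centroids barely move.
        shift = max(sum((a - b) ** 2 for a, b in zip(c0, c1))
                    for c0, c1 in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids, clusters
```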
K-Means (2) Simple example: input data → random centroids → assignment → new centroids & (check) assignment, repeated until the assignment no longer changes.
K-Means (3) Weakness with outliers (noise).
K-Means (4) Calculation
Without the outlier:
0. Initial split: {(4,4), (3,4)} and {(4,2), (0,2), (1,1), (1,0)}
1. Centroids <3.5, 4> and <1.5, 1.25>; assignment: <3.5, 4> ← (3,4), (4,4), (4,2); <1.5, 1.25> ← (0,2), (1,1), (1,0)
2. Centroids <3.67, 3.33> and <0.67, 1>; the assignment no longer changes, so the algorithm stops.
With the outlier (100, 0) added to the second group:
1. Centroids <3.5, 4> and <21.2, 1>; assignment: <3.5, 4> ← (0,2), (1,1), (1,0), (3,4), (4,4), (4,2); <21.2, 1> ← (100,0)
2. Centroids <2.17, 2.17> and <100, 0>; the assignment no longer changes. The outlier captures a cluster of its own and drags the other centroid away from both natural groups.
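The centroid arithmetic in this example is easy to check (a small sketch over the slide's points):

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

# Without the outlier the two clusters settle near <3.67, 3.33> and <0.67, 1>.
print(mean([(3, 4), (4, 4), (4, 2)]))   # approx (3.67, 3.33)
print(mean([(0, 2), (1, 1), (1, 0)]))   # approx (0.67, 1.0)

# With the outlier (100, 0) in the second group, its centroid is dragged
# far away from the data: (21.2, 1.0).
print(mean([(4, 2), (0, 2), (1, 1), (1, 0), (100, 0)]))
```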
K-Means (5) Comparison with EM. K-Means: hard clustering; an instance belongs to only one cluster; based on Euclidean distance; not robust to outliers or differing value ranges. EM: soft clustering; an instance belongs to several clusters with membership probabilities (e.g., 0.7 to C1 and 0.3 to C2); based on probability density; can handle both numeric and nominal attributes.
Mixture Model (1) A mixture is a set of k probability distributions, representing k clusters. Each distribution has its own mean and variance. The mixture model combines several normal distributions.
Mixture Model (2) With only one numeric attribute and two clusters, the model has five parameters: the mean and standard deviation of each cluster (μA, σA, μB, σB) and the mixing probability pA (pB = 1 − pA).
Mixture Model (3) Simple example. The probability that an instance x belongs to cluster A is P(A|x) = f(x; μA, σA) · pA / P(x), where f is the probability density function of cluster A.
Mixture Model (4) Probability density function. The normal (Gaussian) density function is the usual choice for numeric attributes; other distributions, such as the Poisson distribution, can also serve as mixture components.
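The Gaussian density and the membership probability from the previous slide can be sketched as follows (a minimal one-attribute, two-cluster illustration; the function names are mine):

```python
import math

def gauss_pdf(x, mu, sigma):
    # Gaussian (normal) density function f(x; mu, sigma).
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def membership(x, mu_a, sig_a, p_a, mu_b, sig_b, p_b):
    # P(A | x) via Bayes' rule: density of cluster A at x, weighted by
    # its mixing probability, normalized over both clusters.
    fa = p_a * gauss_pdf(x, mu_a, sig_a)
    fb = p_b * gauss_pdf(x, mu_b, sig_b)
    return fa / (fa + fb)
```

An instance exactly halfway between two equally weighted, equal-variance clusters gets membership 0.5 in each, as expected.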
Mixture Model (5) Probability density function (figure: the component densities re-fitted at successive iterations).
EM Algorithm (1) Step 1. (Initialization) Assign a random probability (weight) to each record. Step 2. (Maximization Step) Re-create the cluster model: re-compute the parameters Θ (mean, variance) of each normal distribution from the weighted records (parameter adjustment). Step 3. (Expectation Step) Update each record's weights (weight adjustment). Step 4. Calculate the log-likelihood; if the value saturates, exit; if not, go to Step 2.
EM Algorithm (2) Initialization: random probability. M-Step example:

Num  Math  English  Cluster1  Cluster2
1    80    90       0.25      0.75
2    50    75       0.80      0.20
3    85    100      0.43      0.57
4    30    70       0.70      0.30
5    95             0.15      0.85
6    60             0.60      0.40
Sum                 2.93      3.07
EM Algorithm (3) M-Step: parameters (mean, deviation) are estimated from the weighted instances; the parameters are the per-cluster means and standard deviations.
EM Algorithm (3) M-Step: parameters (mean, deviation):

Num  Math  English  Cluster-A  Cluster-B
1    80    90       0.25       0.75
2    50    75       0.80       0.20
3    85    100      0.43       0.57
4    30    70       0.70       0.30
5    95             0.15       0.85
6    60             0.60       0.40
Sum                 2.93       3.07
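With responsibilities w_i, the M-step replaces plain averages by weighted ones: μ = Σ w_i·x_i / Σ w_i and σ² = Σ w_i·(x_i − μ)² / Σ w_i. A minimal sketch (the helper name is mine; the weight list is the Cluster-A column from the table, whose total matches the 2.93 shown):

```python
def weighted_params(xs, w):
    # M-step estimate: weighted mean and deviation from soft weights.
    sw = sum(w)
    mu = sum(wi * x for wi, x in zip(w, xs)) / sw
    var = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / sw
    return mu, var ** 0.5

# Cluster-A weight column from the table; its total is the 2.93 shown.
w = [0.25, 0.80, 0.43, 0.70, 0.15, 0.60]
print(round(sum(w), 2))  # 2.93
```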
EM Algorithm (4) E-Step: compute each record's weight for cluster A as wA(x) = pA·f(x; μA, σA) / (pA·f(x; μA, σA) + pB·f(x; μB, σB)).
EM Algorithm (5) E-Step: weight computation for record 1 (Math 80, English 90).
EM Algorithm (6) Objective function (check): the log-likelihood function. For all instances, sum the log of the probability that the mixture generated the instance: log L = Σi log(pA·f(xi; θA) + pB·f(xi; θB)) for 1-dimensional data with 2 clusters A, B. The log is used for analytical and numerical convenience. For N-dimensional data with K clusters, each component has a mean vector and a covariance matrix.
EM Algorithm (7) Objective function (check): the covariance matrix and mean vector of each component.
EM Algorithm (8) Termination: the procedure stops when the log-likelihood saturates (figure: log-likelihood values Q0…Q4 flattening as the number of iterations grows).
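The saturation test amounts to checking that successive log-likelihood gains have become negligible (a tiny sketch; the function name and threshold are illustrative):

```python
def saturated(ll_history, eps=1e-4):
    # Stop when the latest log-likelihood gain falls below eps.
    # Each EM iteration never decreases the log-likelihood, so the
    # gains shrink toward zero as the procedure converges.
    return len(ll_history) >= 2 and ll_history[-1] - ll_history[-2] < eps
```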
EM Algorithm (1) Simple data EM example: 6 data points (3 samples per class), 2 classes (circle, rectangle).
EM Algorithm (2) Likelihood function of the two component means Θ1, Θ2.
EM Algorithm (3)
EM Example (1) Example dataset: 2 columns (Math, English), 6 records:

Num  Math  English
1    80    90
2    60    75
3
4    30
5    100
6    15
EM Example (2) Distribution of Math: mean 56.67, variance 776.73. Distribution of English: mean 82.5, variance 197.50.
EM Example (3) Random cluster weights:

Num  Math  English  Cluster1  Cluster2
1    80    90       0.25      0.75
2    50    75       0.80      0.20
3    85    100      0.43      0.57
4    30    70       0.70      0.30
5    95             0.15      0.85
6    60             0.60      0.40
Sum                 2.93      3.07
EM Example (4) Iteration 1: Maximization Step (parameter adjustment).
EM Example (4)
EM Example (5) Iteration 2: Expectation Step (weight adjustment), then Maximization Step (parameter adjustment).
EM Example (6) Iteration 3: Expectation Step (weight adjustment), then Maximization Step (parameter adjustment).
EM Application (1) Weka: developed at the University of Waikato in New Zealand; open-source mining tool; http://www.cs.waikato.ac.nz/ml/weka. Experiment data: Iris data; real data (department customer data, modified customer data).
EM Application (2) IRIS data. Attribute information: sepal length in cm / sepal width in cm / petal length in cm / petal width in cm; class: Iris Setosa / Iris Versicolour / Iris Virginica.
EM Application (3) IRIS Data
EM Application (4) Weka usage. Clustering package: weka.clusterers. Command-line execution: java weka.clusterers.EM -t iris.arff -N 2, or java weka.clusterers.EM -t iris.arff -N 2 -V. GUI execution: java -jar weka.jar.
EM Application (4) Weka usage. Options for clustering in Weka:
-t <training file>         Specify training file
-T <test file>             Specify test file
-x <number of folds>       Specify number of folds for cross-validation
-s <random number seed>    Specify random number seed
-l <input file>            Specify input file for model
-d <output file>           Specify output file for model
-p                         Only output predictions for test instances
EM Application (5) Weka usage
EM Application (5) Weka usage – input file format % Summary Statistics: % Min Max Mean SD Class Correlation % sepal length: 4.3 7.9 5.84 0.83 0.7826 % sepal width: 2.0 4.4 3.05 0.43 -0.4194 % petal length: 1.0 6.9 3.76 1.76 0.9490 (high!) % petal width: 0.1 2.5 1.20 0.76 0.9565 (high!) @RELATION iris @ATTRIBUTE sepallength REAL @ATTRIBUTE sepalwidth REAL @ATTRIBUTE petallength REAL @ATTRIBUTE petalwidth REAL @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica} @DATA 5.1,3.5,1.4,0.2,Iris-setosa 4.9,3.0,1.4,0.2,Iris-setosa 4.7,3.2,1.3,0.2,Iris-setosa
EM Application (6) Weka usage - output format:
Number of clusters: 3
Cluster: 0  Prior probability: 0.3333
  Attribute: sepallength  Normal Distribution. Mean = 5.006  StdDev = 0.3489
  Attribute: sepalwidth   Normal Distribution. Mean = 3.418  StdDev = 0.3772
  Attribute: petallength  Normal Distribution. Mean = 1.464  StdDev = 0.1718
  Attribute: petalwidth   Normal Distribution. Mean = 0.244  StdDev = 0.1061
  Attribute: class        Discrete Estimator. Counts = 51 1 1 (Total = 53)
Clustered instances: 0: 50 (33%)  1: 48 (32%)  2: 52 (35%)
Log likelihood: -2.21138
EM Application (6) Result Visualization
References
- Witten, I. H., & Frank, E. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, pp. 218-255.
- Han, J. Data Mining: Concepts and Techniques. Morgan Kaufmann, Chapter 8.
- Dellaert, F. The Expectation Maximization Algorithm. February 2002.
- Bilmes, J. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models.