EM Algorithm: Expectation Maximization Clustering Algorithm
Book: Data Mining: Practical Machine Learning Tools and Techniques (Witten & Frank), Morgan Kaufmann, pp. 218-227.
Mining Lab. 김완섭, October 27, 2004

Contents
- Clustering
- K-Means (and comparison with EM)
- Mixture Model
- EM Algorithm
- Simple examples of EM
- EM Application: WEKA
- References

Clustering (1/2)
What is clustering? Clustering algorithms divide a data set into natural groups (clusters). Instances in the same cluster are similar to each other; they share certain properties. Example: customer segmentation.
Clustering vs. Classification: classification is supervised learning; clustering is unsupervised learning, with no target variable to be predicted.

Clustering (2/2)
Categorization of clustering methods:
- Partitioning methods: K-Means, K-Medoids, PAM, CLARA, CLARANS
- Hierarchical methods: CURE, CHAMELEON, BIRCH
- Density-based methods: DBSCAN, OPTICS
- Grid-based methods: STING, CLIQUE, WaveCluster
- Model-based methods: EM, COBWEB, Bayesian, neural networks
Model-based clustering is also called probability-based clustering or statistical clustering.

K-Means (1) Algorithm
Step 0: Select K objects as the initial centroids.
Step 1 (Assignment): For each object, compute its distance to each of the K centroids and assign it to the cluster whose centroid is closest.
Step 2 (New centroids): Compute a new centroid for each cluster.
Step 3 (Convergence): Stop if the change in the centroids is less than the selected convergence criterion; otherwise repeat from Step 1.
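A minimal NumPy sketch of the steps above (not from the book or the slides; the function name, the sample points, and the tolerance are illustrative choices):

import numpy as np

def k_means(X, k, tol=1e-4, max_iter=100, seed=0):
    """Plain k-means following Steps 0-3 above."""
    rng = np.random.default_rng(seed)
    # Step 0: select k objects as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 1 (Assignment): distance from every object to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2 (New centroids): mean of the objects assigned to each cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Step 3 (Convergence): stop when the centroids barely move
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return centroids, labels

# Example run on the 2-D points used in the later slides
X = np.array([(4, 4), (3, 4), (4, 2), (0, 2), (1, 1), (1, 0)], dtype=float)
centers, labels = k_means(X, k=2)
print(centers, labels)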

K-Means (2) Simple example
[Figures: input data; random initial centroids; repeated rounds of new centroids and (re-)assignment until the assignment no longer changes]

K-Means (3) Weakness with outliers (noise)

K-Means (4) Calculation
Data: (3,4), (4,4), (4,2), (0,2), (1,1), (1,0); in the left-hand run the outlier (100, 0) is also included. Initial centroids: (4,4) and (3,4).
With the outlier:
- Iteration 1: centroids <3.5, 4> and <21, 1>; assignment: <3.5, 4> gets (0,2), (1,1), (1,0), (3,4), (4,4), (4,2); <21, 1> gets (100, 0).
- Iteration 2: centroids <2.1, 2.1> and <100, 0>; assignment: <2.1, 2.1> gets (0,2), (1,1), (1,0), (3,4), (4,4), (4,2); <100, 0> gets (100, 0).
Without the outlier:
- Iteration 1: centroids <3.5, 4> and <1.5, 1.25>; assignment: <3.5, 4> gets (3,4), (4,4), (4,2); <1.5, 1.25> gets (0,2), (1,1), (1,0).
- Iteration 2: centroids <3.67, 3.3> and <0.67, 1>, with the same assignment.
The single outlier pulls one centroid far away and leaves all the remaining points in one cluster, so the two natural groups are no longer separated.
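A small sketch reproducing the point of this slide (assuming scikit-learn is available; the variable names are illustrative): adding the single outlier (100, 0) changes the clustering of the remaining points.

import numpy as np
from sklearn.cluster import KMeans

points = np.array([(4, 4), (3, 4), (4, 2), (0, 2), (1, 1), (1, 0)], dtype=float)
outlier = np.array([(100, 0)], dtype=float)

for name, X in [("without outlier", points),
                ("with outlier", np.vstack([points, outlier]))]:
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(name, np.round(km.cluster_centers_, 2))
# Without the outlier the two centroids separate the two natural groups;
# with (100, 0) the outlier ends up in its own cluster and all remaining
# points collapse into a single cluster.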

K-Means (5) Comparison with EM
K-Means: hard clustering; an instance belongs to exactly one cluster; based on Euclidean distance; not robust to outliers or to differing value ranges.
EM: soft clustering; an instance belongs to every cluster with a membership probability (e.g. 0.7 for cluster C1 and 0.3 for cluster C2); based on probability densities; can handle both numeric and nominal attributes.

Mixture Model (1)
A mixture is a set of k probability distributions representing k clusters. Each distribution has a mean and a variance. The mixture model combines several normal distributions.

Mixture Model (2)
Simplest case: only one numeric attribute and two clusters, so there are five parameters: the mean and standard deviation of cluster A (μA, σA), the mean and standard deviation of cluster B (μB, σB), and the probability of cluster A, pA (with pB = 1 - pA).

Mixture Model (3) Simple example
The quantity of interest is the probability that an instance x belongs to cluster A, computed from each cluster's probability density function.
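The two formulas behind this slide, written out in LaTeX since the slide shows them as images (standard mixture-model expressions):

$$ f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

$$ \Pr[A \mid x] = \frac{f(x;\mu_A,\sigma_A)\,p_A}{\Pr[x]}, \qquad \Pr[x] = f(x;\mu_A,\sigma_A)\,p_A + f(x;\mu_B,\sigma_B)\,p_B $$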

Mixture Model (4) Probability density functions
The normal (Gaussian) density function is used for numeric attributes; other densities, such as the Poisson distribution, can be used where appropriate.

Mixture Model (5) Probability density function
[Figure: the fitted density functions change from iteration to iteration]

EM Algorithm (1)
Step 1 (Initialization): assign random probabilities (weights) to the records.
Step 2 (Maximization step): re-create the cluster model; re-compute the parameters Θ (mean, variance) of each normal distribution from the weighted records (parameter adjustment).
Step 3 (Expectation step): update each record's weight, i.e. its cluster-membership probabilities (weight adjustment).
Step 4: Calculate the log-likelihood. If the value saturates, exit; if not, go to Step 2.
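A minimal NumPy sketch of these four steps for the simplest case in the slides (one numeric attribute, two clusters A and B). This is not the book's or Weka's code; the function name, the made-up 1-D scores, and the stopping threshold are illustrative choices.

import numpy as np

def gaussian(x, mu, sigma):
    # Normal probability density function
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def em_two_gaussians(x, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1 (Initialization): random weights w[i] = Pr[A | x_i]
    w = rng.uniform(size=len(x))
    log_lik_old = -np.inf
    for _ in range(max_iter):
        # Step 2 (Maximization): re-estimate parameters from weighted instances
        mu_a = np.sum(w * x) / np.sum(w)
        mu_b = np.sum((1 - w) * x) / np.sum(1 - w)
        sd_a = np.sqrt(np.sum(w * (x - mu_a) ** 2) / np.sum(w))
        sd_b = np.sqrt(np.sum((1 - w) * (x - mu_b) ** 2) / np.sum(1 - w))
        p_a = np.mean(w)
        # Step 3 (Expectation): update each record's weight
        num_a = p_a * gaussian(x, mu_a, sd_a)
        num_b = (1 - p_a) * gaussian(x, mu_b, sd_b)
        w = num_a / (num_a + num_b)
        # Step 4: log-likelihood; stop when it saturates
        log_lik = np.sum(np.log(num_a + num_b))
        if log_lik - log_lik_old < tol:
            break
        log_lik_old = log_lik
    return (mu_a, sd_a), (mu_b, sd_b), p_a, log_lik

x = np.array([80.0, 50.0, 85.0, 30.0, 95.0, 60.0])  # made-up 1-D scores for illustration
print(em_two_gaussians(x))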

EM Algorithm (2) Initialization: random probabilities (M-Step example)
Num  Math  English  Cluster1  Cluster2
1    80    90       0.25      0.75
2    50    75       0.8       0.2
3    85    100      0.43      0.57
4    30    70       0.7       0.3
5    95             0.15      0.85
6    60             0.6       0.40
Sum of weights: 2.93 (Cluster1), 3.07 (Cluster2)

EM Algorithm (3) M-Step: parameters (mean, deviation)
Estimate the parameters of each cluster (means and standard deviations) from the weighted instances.
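The standard weighted estimates, written out here because the slide's formula images are not in the transcript; $w_i$ denotes record i's weight for cluster A:

$$ \mu_A = \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad \sigma_A^2 = \frac{\sum_i w_i (x_i - \mu_A)^2}{\sum_i w_i}, \qquad p_A = \frac{1}{n}\sum_i w_i $$

with the cluster-B parameters computed the same way using the weights $1 - w_i$.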

EM Algorithm (3) M-Step: parameters (mean, deviation)
Num  Math  English  Cluster-A  Cluster-B
1    80    90       0.25       0.75
2    50    75       0.8        0.2
3    85    100      0.43       0.57
4    30    70       0.7        0.3
5    95             0.15       0.85
6    60             0.6        0.40
Sum of weights: 2.93 (Cluster-A), 3.07 (Cluster-B)

EM Algorithm (4) E-Step: weights
Compute each record's cluster-membership weight from the current cluster parameters.
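Stated as a formula (the slide's own equation is an image not captured in the transcript; as in the book, the usual simplification is to treat attributes as independent within a cluster so their densities multiply), the weight of record i with values (math_i, eng_i) is:

$$ w_i = \Pr[A \mid \mathbf{x}_i] = \frac{p_A\, f(math_i;\mu_A^{M},\sigma_A^{M})\, f(eng_i;\mu_A^{E},\sigma_A^{E})}{\sum_{C \in \{A,B\}} p_C\, f(math_i;\mu_C^{M},\sigma_C^{M})\, f(eng_i;\mu_C^{E},\sigma_C^{E})} $$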

EM Algorithm (5) E-Step: weights
Example: compute the weight for record 1 (Math = 80, English = 90) using the current cluster parameters.

EM Algorithm (6) Objective function (check)
Log-likelihood function: for all instances, combine each instance's probability of belonging to the clusters; the logarithm is used to make the product easier to analyse.
- 1-dimensional data, 2 clusters A and B: parameters are the two means, two standard deviations, and the cluster probabilities.
- N-dimensional data, K clusters: each cluster has a mean vector and a covariance matrix.
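For the 1-dimensional, two-cluster case, the quantity monitored is the standard mixture log-likelihood (stated here since the slide's equation image is not in the transcript):

$$ \log L(\Theta) = \sum_{i=1}^{n} \log\bigl(p_A \Pr[x_i \mid A] + p_B \Pr[x_i \mid B]\bigr) $$

EM never decreases this value, so it can be tracked across iterations until it stops improving.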

EM Algorithm (7) Objective function (check)
In the N-dimensional case, each cluster k is described by a mean vector and a covariance matrix.
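In that case each component is a multivariate normal with mean vector $\boldsymbol\mu_k$ and covariance matrix $\Sigma_k$ (standard form, stated here since the slide images are not in the transcript):

$$ f(\mathbf{x};\boldsymbol\mu_k,\Sigma_k) = \frac{1}{(2\pi)^{N/2}\,|\Sigma_k|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol\mu_k)^{\top}\Sigma_k^{-1}(\mathbf{x}-\boldsymbol\mu_k)\right), \qquad \log L(\Theta) = \sum_i \log \sum_{k=1}^{K} p_k\, f(\mathbf{x}_i;\boldsymbol\mu_k,\Sigma_k) $$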

EM Algorithm (8) Termination
The procedure stops when the log-likelihood saturates.
[Figure: log-likelihood values Q0, Q1, Q2, Q3, Q4 plotted against the number of iterations]

EM Algorithm (1) Simple data
EM example: 6 data points (3 samples per class), 2 classes (circle, rectangle).

EM Algorithm (2)
Likelihood function of the two component means θ1, θ2.

EM Algorithm (3)

EM Example (1) Example dataset
2 columns (Math, English), 6 records:
Num  Math  English
1    80    90
2    60    75
3
4    30
5    100
6    15

EM Example (2)
Distribution of Math: mean = 56.67, variance = 776.73.
Distribution of English: mean = 82.5, variance = 197.50.
[Figure: the two distributions plotted over the 0-100 score range]

EM Example (3) Random cluster weights
Num  Math  English  Cluster1  Cluster2
1    80    90       0.25      0.75
2    50    75       0.8       0.2
3    85    100      0.43      0.57
4    30    70       0.7       0.3
5    95             0.15      0.85
6    60             0.6       0.40
Sum of weights: 2.93 (Cluster1), 3.07 (Cluster2)

EM Example (4) Iteration 1: Maximization step (parameter adjustment)

EM Example (4)

EM Example (5) Iteration 2: Expectation step (weight adjustment), then Maximization step (parameter adjustment)

EM Example (6) Iteration 3: Expectation step (weight adjustment), then Maximization step (parameter adjustment)

EM Application (1) Weka
Weka: an open-source mining tool from Waikato University in New Zealand; http://www.cs.waikato.ac.nz/ml/weka
Experiment data: the Iris data, and real data (department customer data and a modified customer data set).

EM Application (2) Iris data
Attribute information: sepal length (cm), sepal width, petal length, petal width (cm); class: Iris Setosa, Iris Versicolour, Iris Virginica.

EM Application (3) IRIS Data

EM Application (4) Weka usage
Weka clustering package: weka.clusterers
Command-line execution:
java weka.clusterers.EM -t iris.arff -N 2
java weka.clusterers.EM -t iris.arff -N 2 -V
GUI execution:
java -jar weka.jar

EM Application (4) Weka usage
Options for clustering in Weka:
-t <training file>        specify training file
-T <test file>            specify test file
-x <number of folds>      specify number of folds for cross-validation
-s <random number seed>   specify random number seed
-l <input file>           specify input file for model
-d <output file>          specify output file for model
-p                        only output predictions for test instances

EM Application (5) Weka usage

EM Application (5) Weka usage: input file format (ARFF)
% Summary Statistics:
%                Min  Max  Mean  SD    Class Correlation
% sepal length:  4.3  7.9  5.84  0.83   0.7826
% sepal width:   2.0  4.4  3.05  0.43  -0.4194
% petal length:  1.0  6.9  3.76  1.76   0.9490 (high!)
% petal width:   0.1  2.5  1.20  0.76   0.9565 (high!)
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa

EM Application (6) Weka usage: output format
Number of clusters: 3
Cluster: 0  Prior probability: 0.3333
  Attribute: sepallength   Normal Distribution. Mean = 5.006  StdDev = 0.3489
  Attribute: sepalwidth    Normal Distribution. Mean = 3.418  StdDev = 0.3772
  Attribute: petallength   Normal Distribution. Mean = 1.464  StdDev = 0.1718
  Attribute: petalwidth    Normal Distribution. Mean = 0.244  StdDev = 0.1061
  Attribute: class         Discrete Estimator. Counts = 51 1 1 (Total = 53)
0  50 (33%)
1  48 (32%)
2  52 (35%)
Log likelihood: -2.21138

EM Application (6) Result Visualization

References
- Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, pp. 218-255.
- Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, Chapter 8.
- Frank Dellaert. The Expectation Maximization Algorithm. February 2002.
- Jeff A. Bilmes. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models.