K-Means Clustering Algorithm. Mining Lab, 2004-10-27.

DataMining, Morgan Kaufmann, p Mining Lab. Kim Wan-seop, October 27, 2004

Content
- Clustering
- K-Means via EM

Clustering (1/2)
What is clustering?
- Clustering algorithms divide a data set into natural groups (clusters).
- Instances in the same cluster are similar to each other; they share certain properties.
- Example: customer segmentation.
Clustering vs. classification
- Classification is supervised learning.
- Clustering is unsupervised learning: there is no target variable to be predicted.

Clustering (2/2)
Categorization of clustering methods
- Partitioning methods: K-Means / K-medoids / PAM / CLARA / CLARANS
- Hierarchical methods: CURE / CHAMELEON / BIRCH
- Density-based methods: DBSCAN / OPTICS
- Grid-based methods: STING / CLIQUE / WaveCluster
- Model-based methods: EM / COBWEB / Bayesian / Neural
Model-based clustering; statistical clustering; probability-based clustering

K-Means (1) Algorithm
Step 0 : Select K objects as initial centroids.
Step 1 : (Assignment) For each object, compute the distances to the K centroids. Assign each object to the cluster whose centroid is closest.
Step 2 : (New centroids) Compute a new centroid for each cluster.
Step 3 : (Convergence) Stop if the change in the centroids is less than the selected convergence criterion. Otherwise repeat from Step 1.
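The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's implementation: the function name `kmeans`, the parameters `tol`, `max_iter`, `seed`, and `init`, and the choice to keep the old centroid when a cluster becomes empty are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, tol=1e-6, max_iter=100, seed=0, init=None):
    """Plain K-Means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 0: select K objects as initial centroids (or use the ones supplied).
    centroids = list(init) if init is not None else rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 1 (assignment): put each object in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Step 2 (new centroids): coordinate-wise mean of each cluster.
        # Assumption: an empty cluster simply keeps its previous centroid.
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        # Step 3 (convergence): stop once no centroid moved more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, clusters

if __name__ == "__main__":
    data = [(4, 4), (3, 4), (4, 2), (0, 2), (1, 1), (1, 0)]
    cents, groups = kmeans(data, 2, init=[(4, 4), (0, 2)])
    print(cents)
    print(groups)
```

The result depends on the initial centroids chosen in Step 0, so different seeds can give different partitions; passing `init` explicitly makes a run reproducible.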

K-Means (2) Simple example
[Figure: input data; random centroids; then alternating panels of assignment and new centroids (with convergence check).]

K-Means (3) Weakness: sensitivity to outliers (noise)

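K-Means (4) Calculation
Without an outlier
0. Data: (4,4), (3,4), (4,2), (0,2), (1,1), (1,0)
1. Centroids 1) and 2) [values not preserved in the transcript]
 - (3, 4), (4, 4), (4, 2)
 - (0, 2), (1, 1), (1, 0)
2. Centroids [values not preserved in the transcript]
 - (3, 4), (4, 4), (4, 2)
 - (0, 2), (1, 1), (1, 0)
With an outlier
0. Data: (4,4), (3,4), (4,2), (0,2), (1,1), (1,0), (100, 0)
1. Centroids 1) and 2) [values not preserved in the transcript]
 - (0,2), (1,1), (1,0), (3,4), (4,4), (4,2)
 - (100,1)
2. Centroids 1) and 2) [values not preserved in the transcript]
 - (0,2), (1,1), (1,0), (3,4), (4,4), (4,2)
 - (100,1)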
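The centroid values on this slide did not survive, but the arithmetic behind the two outcomes can be reproduced from the listed points. The sketch below computes cluster centroids and the within-cluster sum of squared errors (SSE) for two ways of splitting the outlier data set into two clusters. The helper names `centroid` and `sse` are hypothetical, and the outlier is taken to be (100, 0) as in the data row, which is an assumption because the cluster rows show (100, 1).

```python
def centroid(points):
    """Coordinate-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def sse(clusters):
    """Sum of squared Euclidean distances from each point to its own cluster centroid."""
    total = 0.0
    for members in clusters:
        cx, cy = centroid(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total

upper = [(3, 4), (4, 4), (4, 2)]
lower = [(0, 2), (1, 1), (1, 0)]
outlier = (100, 0)  # assumption: taken from the data row of the slide

# Partition reported on the slide: the outlier sits alone in its own cluster.
isolated = [upper + lower, [outlier]]
# Alternative that keeps the two natural groups, with the outlier attached to one.
attached = [upper + [outlier], lower]

if __name__ == "__main__":
    print("centroids without outlier:", centroid(upper), centroid(lower))
    print("centroid of the merged six points:", centroid(upper + lower))
    print("SSE, outlier isolated:", sse(isolated))
    print("SSE, outlier attached:", sse(attached))
```

Isolating the outlier gives an SSE of about 27.7, against about 6974.4 when it is attached to a natural group. With only two clusters available, one centroid is therefore captured by the single outlier and the two natural groups are merged, which is the weakness the previous slide names.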

K-Means (5) Comparison with EM
K-Means
- Hard clustering: an instance belongs to only one cluster.
- Based on Euclidean distance.
- Not robust to outliers or to differences in attribute value ranges.
EM
- Soft clustering: an instance belongs to several clusters, each with a membership probability.
- Based on probability density.
- Can handle both numeric and nominal attributes.
[Table residue: columns I, C1, C2 (twice); cell values not preserved.]
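The difference between hard and soft membership can be illustrated with a small sketch. It is not a full EM implementation: it only shows the membership computation for one instance, assuming equally weighted spherical Gaussian clusters with a shared standard deviation `sigma`. The function names and the `sigma` parameter are hypothetical.

```python
import math

def hard_assignment(point, centroids):
    """K-Means style: index of the single nearest centroid (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def soft_membership(point, centroids, sigma=1.0):
    """EM style: membership probability for every cluster.

    Assumption: each cluster is a spherical Gaussian with the same sigma and
    the same prior weight, so only the squared distances matter.
    """
    sq = [math.dist(point, c) ** 2 for c in centroids]
    smallest = min(sq)  # subtract the minimum before exponentiating to avoid underflow
    weights = [math.exp(-(d - smallest) / (2 * sigma ** 2)) for d in sq]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    centers = [(0.0, 0.0), (2.0, 0.0)]
    for instance in [(0.0, 0.0), (0.9, 0.0), (1.0, 0.0)]:
        print(instance, hard_assignment(instance, centers), soft_membership(instance, centers))
```

An instance halfway between two centroids gets membership 0.5 in each cluster under the soft scheme, while the hard scheme still has to place it entirely in one of them.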