Unsupervised learning introduction

Presentation transcript:

Clustering: Unsupervised learning introduction (Machine Learning)

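Supervised learning. Training set (labeled): $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$

Unsupervised learning. Training set (no labels): $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$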

Applications of clustering: market segmentation, social network analysis, organizing computing clusters, astronomical data analysis. Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)

Clustering variants

Clustering categories. Based on the clustering algorithm, methods fall into four major categories. Partitional (centroid-based): try to partition the data into k clusters. Examples: K-Means, K-Means++, Fuzzy C-Means. Hierarchical: either agglomerative (start with each data point as its own cluster and merge) or divisive (start with the entire data set as a single cluster and split).

Distribution-based: the clustering model most closely related to statistics is based on distribution models. Example: EM clustering. Less popular because such models tend to overfit. Density-based: clusters are defined as areas of higher density than the remainder of the data set.
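The agglomerative strategy mentioned above (start with every data point as its own cluster) can be sketched in a few lines. This is only an illustration: the slides do not say how the distance between two clusters is measured, so single linkage (distance between the closest pair of points) is an assumption here, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def agglomerative(points, k):
    """Merge the two closest clusters until only k clusters remain.

    Assumption: single linkage, i.e. the distance between two clusters
    is the distance between their closest pair of points.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster

    def linkage(a, b):
        return min(sq_dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6)]
clusters = agglomerative(points, k=2)
print(clusters)
```

A divisive method would run in the opposite direction, starting from one cluster and splitting it.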

Based on the data, clustering is categorized into numerical data clustering and categorical data clustering.

Clustering: K-means algorithm

K-means algorithm. Input: $K$ (number of clusters) and a training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, with $x^{(i)} \in \mathbb{R}^n$ (drop the $x_0 = 1$ convention).

K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
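The loop above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the `max_iters` cap, the stop-when-assignments-repeat test, and the choice to leave a centroid unchanged when no points are assigned to it are all assumptions added here. Cluster indices run from 0 to K-1 rather than 1 to K.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Initialize the centroids to k distinct training examples.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Cluster assignment step: index of the closest centroid per point.
        new_assignments = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the centroids will not move either
        assignments = new_assignments
        # Move centroid step: mean of the points assigned to each cluster.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # assumption: keep the old centroid for an empty cluster
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
print(centroids)
```

On this small data set the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initialization.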

K-means for non-separated clusters. [Figure: T-shirt sizing; axes: Height, Weight]

Clustering: Optimization objective (Machine Learning)

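K-means optimization objective
$c^{(i)}$ = index of cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of cluster to which example $x^{(i)}$ has been assigned
Optimization objective:
$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2$
$\min_{c^{(1)}, \ldots, c^{(m)},\ \mu_1, \ldots, \mu_K} J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$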
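As a small worked example, the cost can be computed directly from the assignments and centroids. The sketch assumes the usual K-means distortion, the average squared distance between each example and the centroid of the cluster it is assigned to; the function name `distortion` and the toy data are made up for illustration, and cluster indices start at 0.

```python
def distortion(points, assignments, centroids):
    """Average squared distance from each point to its assigned centroid."""
    total = 0.0
    for p, c in zip(points, assignments):
        total += sum((x - mu) ** 2 for x, mu in zip(p, centroids[c]))
    return total / len(points)


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
centroids = [(1.0, 0.0), (11.0, 0.0)]

good = distortion(points, [0, 0, 1, 1], centroids)
bad = distortion(points, [0, 1, 0, 1], centroids)
print(good)  # every point is 1 unit from its centroid, so this is 1.0
print(bad)   # mixing up the assignments gives a much larger cost
```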

K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
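A short derivation may help connect the two steps of the loop to the cost. It assumes the cost $J$ is the average squared distance between each example and its assigned centroid; the set $C_k$ (the examples currently assigned to cluster $k$) is notation introduced here, not taken from the slides.

```latex
% Assignment step: centroids held fixed. J is a sum of independent terms,
% one per example, so each term is minimized separately.
c^{(i)} := \arg\min_{k \in \{1, \ldots, K\}} \lVert x^{(i)} - \mu_k \rVert^2

% Move-centroid step: assignments held fixed. Set the gradient with respect
% to \mu_k to zero, where C_k = \{\, i : c^{(i)} = k \,\}.
\frac{\partial J}{\partial \mu_k}
  = -\frac{2}{m} \sum_{i \in C_k} \bigl( x^{(i)} - \mu_k \bigr) = 0
\quad\Longrightarrow\quad
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{i \in C_k} x^{(i)}
```

Under these assumptions, neither step can increase $J$: the first minimizes it over the assignments, the second over the centroids.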

Clustering: Random initialization (Machine Learning)

Random initialization. Should have $K < m$. Randomly pick $K$ training examples. Set $\mu_1, \ldots, \mu_K$ equal to these $K$ examples.
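A minimal sketch of this initialization, assuming the training set is a list of tuples. The helper name `init_centroids` is hypothetical; `random.Random.sample` picks distinct positions in the list, so the same training example is never chosen twice.

```python
import random


def init_centroids(points, k, rng):
    """Pick k distinct training examples and use copies of them as centroids."""
    if not 0 < k < len(points):
        raise ValueError("need 0 < K < m")
    return [list(p) for p in rng.sample(points, k)]


points = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)]
centroids = init_centroids(points, 2, random.Random(0))
print(centroids)
```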

Local optima

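Random initialization
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$
}
Pick clustering that gave lowest cost $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$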
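The restart procedure can be sketched as follows. This is an illustration under stated assumptions: the cost is taken to be the average squared distance to the assigned centroid, and the helper names (`run_kmeans`, `distortion`, `best_of_restarts`), the iteration cap, and the toy data are all invented for the example.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def run_kmeans(points, k, rng, max_iters=100):
    """One K-means run starting from k randomly chosen training examples."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        new = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new == assignments:
            break
        assignments = new
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


def distortion(points, assignments, centroids):
    return sum(sq_dist(p, centroids[c]) for p, c in zip(points, assignments)) / len(points)


def best_of_restarts(points, k, n_restarts=100, seed=0):
    """Run K-means n_restarts times and keep the lowest-cost clustering."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        assignments, centroids = run_kmeans(points, k, rng)
        cost = distortion(points, assignments, centroids)
        if best is None or cost < best[0]:
            best = (cost, assignments, centroids)
    return best


points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
    (10.0, 10.0), (11.0, 10.0), (10.0, 11.0),
    (20.0, 0.0), (21.0, 0.0), (20.0, 1.0),
]
best_cost, best_assignments, best_centroids = best_of_restarts(points, k=3)
print(best_cost, best_assignments)
```

A single run on this data can get stuck with two centroids inside one group of points; keeping the lowest-cost run out of many makes that outcome unlikely to be the one reported.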

Clustering: Choosing the number of clusters (Machine Learning)

What is the right value of K?

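Choosing the value of K. Elbow method: plot the cost function $J$ against $K$ (no. of clusters) and look for the "elbow" where the cost stops dropping quickly. [Figure: two plots of cost function $J$ vs. $K$ (no. of clusters)]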
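The numbers behind an elbow plot can be produced with a short script. This sketch assumes the cost is the average squared distance to the assigned centroid and takes the best of several random restarts for each K; the function names, the restart count, and the toy data (three well-separated groups) are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost(points, k, rng, max_iters=100):
    """Run K-means once and return the final cost (distortion)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        new = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new == assignments:
            break
        assignments = new
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return sum(sq_dist(p, centroids[c]) for p, c in zip(points, assignments)) / len(points)


def elbow_curve(points, k_values, n_restarts=50, seed=0):
    """Lowest cost found for each K, keyed by K."""
    rng = random.Random(seed)
    return {
        k: min(kmeans_cost(points, k, rng) for _ in range(n_restarts))
        for k in k_values
    }


points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
    (10.0, 10.0), (11.0, 10.0), (10.0, 11.0),
    (20.0, 0.0), (21.0, 0.0), (20.0, 1.0),
]
costs = elbow_curve(points, k_values=range(1, 6))
for k, cost in costs.items():
    print(k, round(cost, 3))
```

With this data the cost falls sharply up to K = 3 and only slightly afterwards, which is the elbow. Real data often gives a smoother curve with no clear bend.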

Choosing the value of K. Sometimes you're running K-means to get clusters to use for some later/downstream purpose. Evaluate K-means based on a metric for how well it performs for that later purpose. E.g. T-shirt sizing. [Figure: two T-shirt sizing plots; axes: Height, Weight]