UNSUPERVISED LEARNING David Kauchak CS 451 – Fall 2013


Administrative Final project Schedule for the rest of the semester

Unsupervised learning Unsupervised learning: given data, i.e. examples, but no labels

K-means Start with some initial cluster centers Iterate:  Assign/cluster each example to closest center  Recalculate centers as the mean of the points in a cluster
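The two steps of this loop translate directly into code. Below is a minimal sketch in Python with NumPy; the seeding strategy and the empty-cluster handling are illustrative choices, not details specified in the slides.

import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Cluster the rows of X into k clusters; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    # Start with some initial cluster centers: here, k random examples
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assign each example to the closest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Recalculate each center as the mean of the points in its cluster
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # centers unchanged: converged
            break
        centers = new_centers
    return centers, assign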

K-means: an example

K-means: Initialize centers randomly

K-means: assign points to nearest center

K-means: readjust centers

K-means: assign points to nearest center

K-means: readjust centers

K-means: assign points to nearest center

K-means: readjust centers

K-means: assign points to nearest center No changes: Done

K-means variations/parameters Initial (seed) cluster centers Convergence criteria:  a fixed number of iterations  partitions unchanged  cluster centers don’t change The number of clusters, K!

How Many Clusters? The number of clusters K must be provided. How should we determine the number of clusters? How did we deal with models becoming too complicated previously? (The original slides show example clusterings with too many and too few clusters.)

Many approaches Regularization!!! Statistical test

k-means loss revisited K-means is trying to minimize $\sum_{i=1}^{n} \lVert x_i - \mu_{c(i)} \rVert^2$, the sum of squared distances from each point $x_i$ to its assigned cluster center $\mu_{c(i)}$. What happens when k increases?

k-means loss revisited As k increases, the loss goes down! Making the model more complicated allows us more flexibility, but can “overfit” to the data.

k-means loss revisited Two regularization options add a penalty on k to the loss (where N = number of points). What effect will this have? Which will tend to produce smaller k?
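The two penalized objectives appeared as images in the original slides. A standard reconstruction (the exact constants in the course slides may differ) is:

$$\text{BIC: } \hat{k} = \operatorname*{argmin}_k \Big[\, \sum_{i=1}^{N} \lVert x_i - \mu_{c(i)} \rVert^2 + k \log N \,\Big] \qquad \text{AIC: } \hat{k} = \operatorname*{argmin}_k \Big[\, \sum_{i=1}^{N} \lVert x_i - \mu_{c(i)} \rVert^2 + k\,m \,\Big]$$

where $m$ is the number of features. Under these forms the AIC penalty grows with the dimensionality, which is why it punishes extra clusters more harshly on high-dimensional data such as text.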

k-means loss revisited Of the two regularization options, AIC penalizes increases in K more harshly. Both require a change to the K-means algorithm, and both tend to work reasonably well in practice if you don’t know K.
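A sketch of the resulting model-selection loop, reusing the kmeans function from the earlier sketch (the penalty forms are the assumed reconstructions above, not necessarily the course's exact ones):

import numpy as np

def kmeans_loss(X, centers, assign):
    # Sum of squared distances from each point to its assigned center
    return float(((X - centers[assign]) ** 2).sum())

def choose_k(X, k_max=10, penalty="bic"):
    """Pick k by minimizing the penalized k-means loss."""
    n, m = X.shape
    best_k, best_score = None, np.inf
    for k in range(1, k_max + 1):
        centers, assign = kmeans(X, k)
        score = kmeans_loss(X, centers, assign) + (k * np.log(n) if penalty == "bic" else k * m)
        if score < best_score:
            best_k, best_score = k, score
    return best_k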

Statistical approach Assume the data in each cluster is Gaussian (i.e. spherical) and test for this.  Testing in high dimensions doesn’t work well  Testing in lower dimensions does work well Ideas?

Project to one dimension and check For each cluster, project down to one dimension  Use a statistical test to see if the data is Gaussian

Project to one dimension and check For each cluster, project down to one dimension  Use a statistical test to see if the data is Gaussian What will this look like projected to 1-D?

Project to one dimension and check For each cluster, project down to one dimension  Use a statistical test to see if the data is Gaussian Solution?

Project to one dimension and check For each cluster, project down to one dimension  Use a statistical test to see if the data is Gaussian Choose the projection direction to be the direction of highest variance
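A sketch of this per-cluster check (the slides do not name a specific statistical test; this uses SciPy's Anderson-Darling normality test, and projects onto the first principal component, the usual reading of "direction of highest variance"):

import numpy as np
from scipy.stats import anderson

def cluster_looks_gaussian(points):
    """Project a cluster onto its highest-variance direction and test
    the 1-D projection for normality (Anderson-Darling)."""
    centered = points - points.mean(axis=0)
    # First right singular vector = direction of highest variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[0]
    result = anderson(projected, dist="norm")
    # Index 2 of critical_values corresponds to the 5% significance level
    return result.statistic < result.critical_values[2]

If a cluster fails the test, it is a candidate for splitting, as in the G-means algorithm, which this procedure resembles.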

On synthetic data (results figure in the original slides; one case is annotated “Split too far”)

Compared to other approaches (comparison figure in the original slides)

K-Means time complexity Variables: K clusters, n data points, m features/dimensions, I iterations What is the runtime complexity?  Computing distance between two points (e.g. Euclidean)  Reassigning clusters  Computing new centers  Iterate…

K-Means time complexity Variables: K clusters, n data points, m features/dimensions, I iterations What is the runtime complexity?  Computing distance between two points is O(m), where m is the dimensionality of the vectors/number of features.  Reassigning clusters: O(Kn) distance computations, or O(Knm)  Computing centroids: each point gets added once to some centroid: O(nm)  Assume these two steps are each done once for I iterations: O(IKnm) In practice, K-means converges quickly and is fairly fast
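For intuition, a worked instance of the bound (the numbers are purely illustrative): with K = 10, n = 100,000, m = 50, and I = 20,

$$I \cdot K \cdot n \cdot m = 20 \cdot 10 \cdot 100{,}000 \cdot 50 = 10^{9}$$

distance-coordinate operations for the assignment steps alone.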

What Is A Good Clustering? Internal criterion: A good clustering will produce high quality clusters in which:  the intra-class (that is, intra-cluster) similarity is high  the inter-class similarity is low How would you evaluate clustering?

Common approach: use labeled data Use data with known classes  For example, document classification data, where each example has a data vector and a label If we clustered this data (ignoring the labels), what would we like to see? A clustering that reproduces the class partitions. How can we quantify this?

Common approach: use labeled data Purity: the proportion of the dominant class in the cluster. Example with three clusters (class counts in parentheses):
 Cluster I (3, 1, 0): purity = max(3, 1, 0)/4 = 3/4
 Cluster II (1, 4, 1): purity = max(1, 4, 1)/6 = 4/6
 Cluster III (2, 0, 3): purity = max(2, 0, 3)/5 = 3/5
Overall purity?

Overall purity Cluster average: (3/4 + 4/6 + 3/5) / 3 ≈ 0.672 Weighted (by cluster size) average: (4·3/4 + 6·4/6 + 5·3/5) / (4 + 6 + 5) = (3 + 4 + 3) / 15 = 10/15 ≈ 0.667
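A small sketch of the weighted-average purity computation (the function name and conventions are illustrative, not from the slides):

import numpy as np

def purity(clusters, labels):
    """Weighted-average purity: the fraction of all points that fall in
    the dominant class of their cluster."""
    clusters, labels = np.asarray(clusters), np.asarray(labels)
    dominant_total = 0
    for c in np.unique(clusters):
        in_cluster = labels[clusters == c]
        _, counts = np.unique(in_cluster, return_counts=True)
        dominant_total += counts.max()  # size of the dominant class
    return dominant_total / len(labels)

On the example above (cluster sizes 4, 6, 5 with dominant-class counts 3, 4, 3), this returns 10/15 ≈ 0.67.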

Purity issues… Purity, the proportion of the dominant class in the cluster, is good for comparing two algorithms, but not for understanding how well a single algorithm is doing. Why?  Increasing the number of clusters increases purity (in the limit, one cluster per point gives purity 1).

Purity isn’t perfect (the original slides show two example clusterings) Which is better based on purity? Which do you think is better? Ideas?

Common approach: use labeled data Average entropy of classes in clusters: $\text{entropy}(\text{cluster}) = -\sum_i p(\text{class}_i) \log p(\text{class}_i)$, where $p(\text{class}_i)$ is the proportion of class $i$ in the cluster; the overall score averages this over clusters. Unlike purity, entropy reflects the full class distribution within each cluster, and lower is better.
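A matching sketch for the entropy measure (same illustrative conventions as the purity function above; the log base is a free choice as long as it is consistent):

import numpy as np

def average_entropy(clusters, labels):
    """Cluster-size-weighted average entropy of class proportions;
    lower is better (0 means every cluster is pure)."""
    clusters, labels = np.asarray(clusters), np.asarray(labels)
    total = 0.0
    for c in np.unique(clusters):
        in_cluster = labels[clusters == c]
        _, counts = np.unique(in_cluster, return_counts=True)
        p = counts / counts.sum()  # class proportions within this cluster
        total += len(in_cluster) * -(p * np.log2(p)).sum()
    return total / len(labels)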