Clustering Usman Roshan CS 675

Clustering
Suppose we want to cluster n vectors in R^d into two groups. Define C_1 and C_2 as the two groups. Our objective is to find the C_1 and C_2 that minimize

Σ_{j=1,2} Σ_{x_i ∈ C_j} ||x_i − m_j||^2

where m_j is the mean of cluster C_j.

Clustering
Clustering is NP-hard even for 2-means, and even for points in the plane. The k-means heuristic, introduced in the 1950s and 1960s, is popular and hard to beat.

K-means algorithm for two clusters
Input: n vectors x_1, …, x_n in R^d and a tolerance ε.
Algorithm:
1. Initialize: assign each x_i to C_1 or C_2 with equal probability and compute the means m_1 and m_2.
2. Recompute clusters: assign x_i to C_1 if ||x_i − m_1|| < ||x_i − m_2||, otherwise assign it to C_2.
3. Recompute the means m_1 and m_2.
4. Compute the objective of the new clustering. If the decrease from the previous objective is smaller than ε, stop; otherwise go to step 2.
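A minimal NumPy sketch of this loop (the tolerance eps, the seed, and the random initialization details are illustrative assumptions; empty clusters are not handled):

```python
import numpy as np

def kmeans_two_clusters(X, eps=1e-6, seed=0):
    # X: (n, d) array of n vectors in R^d; eps: stopping tolerance.
    rng = np.random.default_rng(seed)
    # Step 1: assign each x_i to C1 or C2 with equal probability.
    labels = rng.integers(0, 2, size=len(X))
    prev_obj = np.inf
    while True:
        # Compute (or recompute, step 3) the means m1 and m2.
        means = np.array([X[labels == j].mean(axis=0) for j in (0, 1)])
        # Step 2: assign x_i to C1 if ||x_i - m1|| < ||x_i - m2||, else C2.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: objective = sum of squared distances to assigned means.
        obj = (dists.min(axis=1) ** 2).sum()
        if prev_obj - obj < eps:  # decrease smaller than eps: stop
            return labels, means
        prev_obj = obj
```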

K-means example

K-means
Is k-means guaranteed to find the clustering that optimizes the objective? No, but it is guaranteed to find a local optimum: we can prove that the objective decreases (or stays the same) with successive iterations.

Proof sketch of convergence of k-means
Let J(C, m) denote the objective for clusters C and means m. One iteration takes (C^t, m^t) to (C^{t+1}, m^t) by reassignment and then to (C^{t+1}, m^{t+1}) by recomputing the means, and

J(C^t, m^t) ≥ J(C^{t+1}, m^t) ≥ J(C^{t+1}, m^{t+1}).

Justification of the first inequality: by assigning each x_i to the closest mean, the objective decreases or stays the same. Justification of the second inequality: for a given cluster, its mean minimizes the squared error loss.

K-means clustering
K-means is the hard-assignment Expectation-Maximization (EM) solution if we assume the data is generated by a mixture of spherical Gaussians.
- EM: find the clustering of the data that maximizes the likelihood.
- Unsupervised means no labels are given; thus we iterate between estimating the expected values of the hidden cluster assignments and the actual parameter values.
PCA gives a relaxed solution to k-means clustering. Sketch of proof: cast k-means clustering as a maximization problem, relax the cluster indicator variables to be continuous, and the solution is given by PCA.
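A hedged scikit-learn sketch of this correspondence (the data X is a random placeholder; covariance_type='spherical' makes the Gaussian mixture closest to the k-means setting):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(200, 5))  # placeholder data

# Hard cluster assignments from k-means.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Soft EM counterpart: a mixture of spherical Gaussians, fit by EM.
gmm = GaussianMixture(n_components=2, covariance_type='spherical',
                      random_state=0).fit(X)
posteriors = gmm.predict_proba(X)  # expected ("soft") cluster memberships
```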

K-means variants
- K-medians: the cluster center is the coordinate-wise median, which has the effect of minimizing the L1 error.
- K-medoid: the cluster center is an actual datapoint (not the same as k-medians).
Both have algorithms similar to k-means and similar local-minima problems.
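A small sketch of the medoid computation for one cluster (the PAM-style swap search and the outer loop over clusters are omitted):

```python
import numpy as np

def medoid(points):
    # The medoid is the actual datapoint that minimizes the total
    # distance to all other points in the cluster.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return points[dists.sum(axis=1).argmin()]
```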

Other clustering algorithms
Spherical k-means:
- Normalize each datapoint (set its length to 1).
- Cluster by finding the center with minimum cosine angle to the cluster's points.
- The iterative algorithm is similar to Euclidean k-means and converges with similar proofs.
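A minimal sketch of spherical k-means (the fixed iteration count, seed, and initialization by sampling datapoints are assumptions; empty clusters are not handled):

```python
import numpy as np

def spherical_kmeans(X, k=2, iters=50, seed=0):
    # Normalize each datapoint to unit length.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to the center with the largest cosine
        # similarity, i.e. the minimum cosine angle.
        labels = (X @ centers.T).argmax(axis=1)
        # New center: the normalized mean direction of each cluster.
        for j in range(k):
            m = X[labels == j].mean(axis=0)
            centers[j] = m / np.linalg.norm(m)
    return labels, centers
```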

Other clustering algorithms
Hierarchical clustering:
1. Initialize n clusters, one per datapoint.
2. Merge the two nearest clusters into one.
3. Update the distances from the new cluster to the existing ones.
4. Repeat from step 2 until k clusters remain.
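A short SciPy sketch of this agglomerative procedure (the placeholder data, single-linkage distance, and k = 3 are assumptions for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(50, 4))  # placeholder data

# Agglomerative clustering: start from n singleton clusters and
# repeatedly merge the two nearest (single-linkage distance here).
Z = linkage(X, method='single')
labels = fcluster(Z, t=3, criterion='maxclust')  # cut into k = 3 clusters
```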

Graph clustering
Graph Laplacians are widely used in spectral clustering (see the tutorial on the course website). The weights C_ij may be obtained via:
- an epsilon-neighborhood graph,
- a k-nearest-neighbor graph, or
- a fully connected graph.
This allows semi-supervised analysis (where test data is available but not its labels).
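A scikit-learn sketch of the three graph constructions (the placeholder data and the parameter values radius=1.0, n_neighbors=10, gamma=0.5 are assumptions):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.default_rng(0).normal(size=(100, 3))  # placeholder data

# Epsilon-neighborhood graph: connect points within a given distance.
C_eps = radius_neighbors_graph(X, radius=1.0, mode='connectivity')

# K-nearest-neighbor graph.
C_knn = kneighbors_graph(X, n_neighbors=10, mode='connectivity')

# Fully connected graph with Gaussian similarity weights.
C_full = rbf_kernel(X, gamma=0.5)
```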

Graph clustering
We can cluster data by solving the mincut problem; the balanced version is NP-hard. We can rewrite the balanced mincut problem with graph Laplacians, but it remains NP-hard because the solution is allowed to take only discrete values. By relaxing it to allow real values we obtain spectral clustering.

Graph Laplacians
We can perform clustering with the Laplacian L = D − C, where D_ii = Σ_j C_ij.
Basic algorithm for k clusters:
1. Compute the first k eigenvectors v_i of the Laplacian matrix.
2. Let V = [v_1, v_2, …, v_k].
3. Cluster the rows of V (using k-means).
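A minimal sketch of this basic algorithm with the unnormalized Laplacian (the Gaussian similarity weights and gamma=0.5 are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def spectral_clustering(X, k=2, gamma=0.5):
    # Weight matrix C and degree matrix D with D_ii = sum_j C_ij.
    C = rbf_kernel(X, gamma=gamma)
    D = np.diag(C.sum(axis=1))
    L = D - C  # unnormalized graph Laplacian
    # First k eigenvectors of L (smallest eigenvalues).
    _, V = eigh(L, subset_by_index=[0, k - 1])
    # Cluster the rows of V = [v1, ..., vk] with k-means.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(V)
```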

Graph clustering
Why does it work? It is a relaxation of an NP-hard clustering problem. What is relaxation: changing the hard objective into an easier one that yields real (as opposed to discrete) solutions and whose value is at most the true objective. Can a relaxed solution give the optimal discrete solution? There are no guarantees.

Graph clustering
Cut problems are NP-hard. We obtain relaxed solutions via eigenvectors of the original weight matrix and of its Laplacian. [Figure from Shi and Malik, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000]

Graph Laplacians and cuts
Recall from earlier the WMV (weighted maximum variance) dimensionality reduction criterion:

maximize Σ_ij C_ij (w^T x_i − w^T x_j)^2.

WMV is also given by 2 w^T X L X^T w, where X = [x_1, x_2, …, x_n] contains each x_i as a column and L is the graph Laplacian. Thus maximizing WMV amounts to maximizing w^T X L X^T w.
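The identity can be checked directly (a sketch, assuming the pairwise WMV form above and a symmetric weight matrix C with D_ii = Σ_j C_ij):

Σ_ij C_ij (w^T x_i − w^T x_j)^2
 = Σ_ij C_ij (w^T x_i)^2 − 2 Σ_ij C_ij (w^T x_i)(w^T x_j) + Σ_ij C_ij (w^T x_j)^2
 = 2 Σ_i D_ii (w^T x_i)^2 − 2 w^T X C X^T w
 = 2 w^T X D X^T w − 2 w^T X C X^T w
 = 2 w^T X (D − C) X^T w = 2 w^T X L X^T w.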

Graph Laplacians and cuts
For a graph with arbitrary edge weights, if we define the input vectors x in a certain way, then the WMV criterion gives us the ratio cut. For the proof see page 9 of the tutorial (also given on the next slide).

Graph Laplacians and cuts
[Proof from the spectral clustering tutorial by Ulrike von Luxburg]

Graph Laplacians and cuts
[Proof continued, from the spectral clustering tutorial by Ulrike von Luxburg]

Application to population structure data
The data are vectors x_i where each feature takes the value 0, 1, or 2, denoting the number of copies of a particular single nucleotide polymorphism (SNP) allele. The output y_i is an integer indicating the population group a vector belongs to.

Publicly available real data
Datasets (Noah Rosenberg's lab):
- East Asian admixture: 10 individuals from Cambodia, 15 from Siberia, 49 from China, and 16 from Japan; 459,188 SNPs.
- African admixture: 32 Biaka Pygmy individuals, 15 Mbuti Pygmy, 24 Mandenka, 25 Yoruba, 7 San from Namibia, 8 Bantu of South Africa, and 12 Bantu of Kenya; 454,732 SNPs.
- Middle Eastern admixture: 43 Druze from Israel-Carmel, 47 Bedouins from Israel-Negev, 26 Palestinians from Israel-Central, and 30 Mozabite from Algeria-Mzab; 438,596 SNPs.

East Asian populations

African populations

Middle Eastern populations

K-means applied to PCA data
- Dimensionality reduction: PCA and kernel PCA (polynomial kernel of degree 2).
- Clustering: k-means and spherical k-means.
- The number of clusters is set to the true value.
A sketch of this pipeline follows.
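A hedged scikit-learn sketch of the experimental setup (the random SNP-like matrix, the number of components, and k = 4 are placeholders, not the study's actual values):

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.cluster import KMeans

# Placeholder SNP matrix: features take values 0, 1, or 2.
X = np.random.default_rng(0).integers(0, 3, size=(100, 1000)).astype(float)
k = 4  # number of clusters set to the true number of populations

# Project onto the top principal components, linear and poly-degree-2.
X_pca = PCA(n_components=2).fit_transform(X)
X_kpca = KernelPCA(n_components=2, kernel='poly', degree=2).fit_transform(X)

# Cluster the projected data with k-means.
labels_pca = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_pca)
labels_kpca = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_kpca)
```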