Clustering methods Course code: 175314 Pasi Fränti 10.3.2014 Speech & Image Processing Unit School of Computing University of Eastern Finland Joensuu,


Clustering methods. Part 1: Introduction. Course code: 175314. Pasi Fränti, Speech & Image Processing Unit, School of Computing, University of Eastern Finland, Joensuu, FINLAND

Sample data Sources of RGB vectors Red-Green plot of the vectors

Sample data Employment statistics:

Application example 1 Color reconstruction Image with compression artifacts Image with original colors

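Application example 2: Speaker modeling for voice biometrics. Training: training data from each speaker (Matti, Mikko, Tomi) goes through feature extraction and clustering to produce speaker models. Recognition: an unknown sample (Tomi? Matti? Mikko?) goes through feature extraction and is compared against the speaker models. Best match: Matti!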

Speaker modeling: Speech data. Result of clustering.

Application example 3: Image segmentation. Image with 4 color clusters. Normalized color plots according to red and green components (axes: red, green).

Application example 4: Quantization. Approximation of a continuous range of values (or a very large set of possible discrete values) by a small set of discrete symbols or integer values. Figure: original signal and quantized signal.
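The definition above can be illustrated with a minimal sketch. The function name `quantize`, the sample signal, and the five evenly spaced levels are assumptions chosen for illustration, not taken from the slides; each value is simply replaced by the nearest allowed level.

```python
def quantize(values, levels):
    """Replace each value by the nearest entry of `levels` (a small set of allowed values)."""
    return [min(levels, key=lambda q: abs(v - q)) for v in values]


# Hypothetical "original signal" with values in the range [0, 1].
signal = [0.12, 0.48, 0.91, 0.33, 0.77]

# Hypothetical set of five discrete levels.
levels = [0.0, 0.25, 0.5, 0.75, 1.0]

quantized = quantize(signal, levels)
print(quantized)  # [0.0, 0.5, 1.0, 0.25, 0.75]
```

In clustering-based quantization the levels would not be fixed in advance but chosen as cluster prototypes of the data.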

Color quantization of images: Color image. RGB samples. Clustering.

Application example 5 Clustering of spatial data

Clustered locations of users

Clustering of photos Timeline clustering

Clustering GPS trajectories Mobile users, taxi routes, fleet management

Conclusions from clusters Cluster 1: Office Cluster 2: Home

Part I: Clustering problem

Subproblems of clustering Where are the clusters? (Algorithmic problem) How many clusters? (Methodological problem: which criterion?) Selection of attributes (Application related problem) Preprocessing the data (Practical problems: normalization, outliers)

Clustering result as partition Illustrated by Voronoi diagram Illustrated by Convex hulls Cluster prototypes Partition of data

Cluster prototypes Partition of data Centroids as prototypes Partition by nearest prototype mapping Duality of partition and centroids

Challenges in clustering. Incorrect cluster allocation: cluster missing. Incorrect number of clusters: clusters missing, or too many clusters.

How to solve?
Solve the clustering (algorithmic problem):
  Given input data (X) of N data vectors, and number of clusters (M), find the clusters.
  Result given as a set of prototypes, or partition.
Solve the number of clusters (mathematical problem):
  Define appropriate cluster validity function f.
  Repeat the clustering algorithm for several M.
  Select the best result according to f.
Solve the problem efficiently (computer science problem).
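The "repeat for several M, select the best according to f" idea can be sketched as follows. The slides do not say which validity function or clustering algorithm to use, so everything here is an assumption for illustration: a plain k-means with a deterministic farthest-point initialization, the Calinski-Harabasz index as f (higher is better), and a toy data set of three tight groups.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise average of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def kmeans(X, M, iters=100):
    """Plain k-means; farthest-point initialization is an illustrative choice."""
    C = [X[0]]
    while len(C) < M:
        C.append(max(X, key=lambda x: min(sq_dist(x, c) for c in C)))
    P = None
    for _ in range(iters):
        new_P = [min(range(M), key=lambda j: sq_dist(x, C[j])) for x in X]
        if new_P == P:  # partition unchanged: converged
            break
        P = new_P
        for j in range(M):
            members = [x for x, p in zip(X, P) if p == j]
            if members:  # keep the old centroid if a cluster became empty
                C[j] = mean(members)
    return C, P


def calinski_harabasz(X, C, P):
    """Assumed validity function f: between-cluster vs. within-cluster variance."""
    N, M = len(X), len(C)
    g = mean(X)
    ssw = sum(sq_dist(x, C[p]) for x, p in zip(X, P))
    ssb = sum(P.count(j) * sq_dist(C[j], g) for j in range(M))
    return (ssb / (M - 1)) / (ssw / (N - M))  # undefined if ssw == 0


def select_number_of_clusters(X, candidates, f):
    """Cluster once per candidate M and return the M with the highest f."""
    scores = {M: f(X, *kmeans(X, M)) for M in candidates}
    return max(scores, key=scores.get), scores


# Toy data: three tight groups around (0, 0), (10, 0) and (0, 10).
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
X = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]

best_M, scores = select_number_of_clusters(X, range(2, 6), calinski_harabasz)
print(best_M)  # 3
```

Any other validity function with the same signature could be passed as `f`; for an index where lower is better, the `max` would have to become a `min`.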

Taxonomy of clustering [Jain, Murty, Flynn, Data clustering: A review, ACM Computing Surveys, 1999.] One possible classification based on cost function. MSE is well defined and most popular.

Definitions and data. Set of N data points: $X = \{x_1, x_2, \ldots, x_N\}$. Set of M cluster prototypes (centroids): $C = \{c_1, c_2, \ldots, c_M\}$. Partition of the data: $P = \{p_1, p_2, \ldots, p_M\}$.

Distance and cost function. Distance between data vectors: Euclidean distance. Cost function: mean square error (MSE).
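The formulas themselves did not survive in the transcript. One standard way to write them, consistent with the definitions of X, C and P above, is sketched below. The notation is an assumption: K denotes the dimensionality of the data vectors, $x_i^{(k)}$ the k-th component of $x_i$, and $p_j$ the set of data vectors assigned to cluster j. The normalization by N is also one common convention among several.

```latex
% Euclidean distance between two data vectors (K = number of attributes, assumed symbol)
d(x_i, x_j) = \sqrt{\sum_{k=1}^{K} \bigl( x_i^{(k)} - x_j^{(k)} \bigr)^2}

% Mean square error of a clustering given by centroids C and partition P
\mathrm{MSE}(C, P) = \frac{1}{N} \sum_{j=1}^{M} \sum_{x_i \in p_j} d(x_i, c_j)^2
```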

Dependency of data structures
Centroid condition: for a given partition (P), the optimal cluster centroids (C) for minimizing MSE are the average vectors of the clusters.
Optimal partition: for given centroids (C), the optimal partition is the one that maps each data vector to its nearest centroid.
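The two conditions can be written out as equations. This is a sketch under an assumed notation in which $p_j$ is the set of data vectors in cluster j and $|p_j|$ is its size; ties in the second condition can be broken arbitrarily.

```latex
% Centroid condition: each centroid is the average of the vectors in its cluster
c_j = \frac{1}{|p_j|} \sum_{x_i \in p_j} x_i, \qquad j = 1, \ldots, M

% Optimal partition: each vector belongs to the cluster of its nearest centroid
p_j = \bigl\{\, x_i \in X : \lVert x_i - c_j \rVert^2 \le \lVert x_i - c_k \rVert^2
      \ \text{for all } k = 1, \ldots, M \,\bigr\}
```

Either structure determines the other: a partition yields centroids through the first equation, and centroids yield a partition through the second.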

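Complexity of clustering. The clustering problem is NP-complete [Garey et al., 1982]. Optimal solution by branch-and-bound in exponential time. Practical solutions by heuristic algorithms. Number of possible clusterings (partitions of N points into M non-empty clusters), the Stirling number of the second kind: $S(N, M) = \frac{1}{M!} \sum_{j=0}^{M} (-1)^{M-j} \binom{M}{j} j^{N}$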
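To get a feel for how quickly the search space grows, the count of partitions of N points into M non-empty clusters (the Stirling number of the second kind) can be evaluated with exact integer arithmetic. The function name `number_of_clusterings` is a hypothetical choice for this sketch.

```python
from math import comb, factorial


def number_of_clusterings(N, M):
    """Number of ways to partition N points into M non-empty clusters."""
    total = sum((-1) ** (M - j) * comb(M, j) * j ** N for j in range(M + 1))
    return total // factorial(M)  # the sum is always divisible by M!


print(number_of_clusterings(4, 2))   # 7
print(number_of_clusterings(10, 3))  # 9330
```

Already for ten points and three clusters there are thousands of candidate partitions, which is why exhaustive search is impractical for realistic data sizes.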

Cluster software. Main area: working space for data. Input area: inputs to be processed. Output area: obtained results. Menu Process: selection of operation.

Clustering image: data set, codebook, partition.
Procedure to simulate k-means:
Open data set (file *.ts), move it into Input area
Process – Random codebook, select number of clusters
REPEAT
  Move obtained codebook from Output area into Input area
  Process – Optimal partition, select Error function
  Move codebook into Main area, partition into Input area
  Process – Optimal codebook
UNTIL DESIRED CLUSTERING
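The same loop can be sketched in code. The function names mirror the menu operations (random codebook, optimal partition, optimal codebook) but are hypothetical and not the API of the Cluster software. The stopping rule "until the partition no longer changes" and the tiny one-dimensional data set are likewise assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def random_codebook(X, M, rng):
    """Pick M distinct data vectors as the initial codebook."""
    return rng.sample(X, M)


def optimal_partition(X, C):
    """Map every data vector to the index of its nearest codevector."""
    return [min(range(len(C)), key=lambda j: sq_dist(x, C[j])) for x in X]


def optimal_codebook(X, P, C):
    """Replace every codevector by the average of the vectors mapped to it."""
    new_C = []
    for j, old in enumerate(C):
        members = [x for x, p in zip(X, P) if p == j]
        if members:
            new_C.append(tuple(sum(m[k] for m in members) / len(members)
                               for k in range(len(old))))
        else:
            new_C.append(old)  # empty cluster: keep the previous codevector
    return new_C


def kmeans(X, M, seed=0, max_iters=100):
    rng = random.Random(seed)
    C = random_codebook(X, M, rng)
    P = optimal_partition(X, C)
    for _ in range(max_iters):  # REPEAT
        C = optimal_codebook(X, P, C)
        new_P = optimal_partition(X, C)
        if new_P == P:  # UNTIL the partition stops changing (assumed stopping rule)
            break
        P = new_P
    return C, P


# Toy data: two well-separated groups on a line; vectors of any dimension work.
X = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
C, P = kmeans(X, M=2)
print(sorted(C))  # [(1.0,), (11.0,)]
```

On less well-separated data the result depends on the random initial codebook, so different seeds may end in different local optima.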

XLMiner software

Example of data in XLMiner

Distance matrix & dendrogram

Conclusions
  Clustering is a fundamental tool needed in speech and image processing.
  Failing to do clustering properly may distort the application analysis.
  A good clustering tool is needed so that researchers can focus on application requirements.

Literature
1. S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, 3rd edition.
2. C. Bishop, Pattern Recognition and Machine Learning, Springer.
3. A.K. Jain, M.N. Murty and P.J. Flynn, Data clustering: A review, ACM Computing Surveys, 31(3), September 1999.
4. M.R. Garey, D.S. Johnson and H.S. Witsenhausen, The complexity of the generalized Lloyd-Max problem, IEEE Transactions on Information Theory, 28(2), March 1982.
5. F. Aurenhammer, Voronoi diagrams - a survey of a fundamental geometric data structure, ACM Computing Surveys, 23(3), September.