
Dimensionality
Clustering Methods: Part 6
Ilja Sidoroff, Pasi Fränti
Speech and Image Processing Unit, Department of Computer Science, University of Joensuu, FINLAND

Dimensionality of data
Dimensionality of a data set = the minimum number of free variables needed to represent the data without information loss.
A d-attribute data set has an intrinsic dimensionality (ID) of M if its elements lie entirely within an M-dimensional subspace of R^d (M < d).

Dimensionality of data
Using more dimensions than necessary leads to problems:
– greater storage requirements
– slower algorithms
– finding clusters and building good classifiers is more difficult (curse of dimensionality)

Curse of dimensionality
When the dimensionality of the space increases, distance measures become less useful:
– all points are more or less equidistant
– most of the volume of a sphere is concentrated in a thin layer near its surface (see next slide)

V(r) – volume of a sphere with radius r, D – dimension of the sphere. Since V(r) ∝ r^D, the fraction of the volume that lies within distance ε of the surface is 1 − ((r − ε)/r)^D, which approaches 1 as D grows.
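A quick numeric check of this concentration effect (a minimal sketch; the unit radius and the shell thickness ε = 0.1 are arbitrary choices):

```python
# Fraction of a unit ball's volume in the outer shell of thickness eps,
# using V(r) proportional to r**D.
eps = 0.1
for D in (1, 2, 3, 10, 100, 1000):
    shell_fraction = 1.0 - (1.0 - eps) ** D
    print(f"D = {D:4d}: {shell_fraction:.5f}")
```

Already at D = 100 practically all of the volume lies within 10% of the radius from the surface.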

Two approaches
Estimation of dimensionality:
– knowing the ID of a data set can help in tuning classification or clustering performance
Dimensionality reduction:
– projecting the data onto some subspace
– e.g. 2D/3D visualisation of a multi-dimensional data set
– may result in information loss if the subspace dimension is smaller than the ID

Goodness of the projection
Can be estimated by two measures [11]:
– Trustworthiness: data points that are not neighbours in the input space are not mapped as neighbours in the output space.
– Continuity: data points that are close in the input space are not mapped far apart in the output space.

Trustworthiness
T(k) = 1 − A(k) Σ_{i=1..N} Σ_{j ∈ U_k(i)} ( r(i,j) − k )
N – number of feature vectors
r(i,j) – the rank of data sample j in the ordering according to the distance from i in the original data space
U_k(i) – set of feature vectors that are in the size-k neighbourhood of sample i in the projection space but not in the original space
A(k) = 2 / (N k (2N − 3k − 1)) – scales the measure between 0 and 1

Continuity
C(k) = 1 − A(k) Σ_{i=1..N} Σ_{j ∈ V_k(i)} ( r'(i,j) − k )
r'(i,j) – the rank of data sample j in the ordering according to the distance from i in the projection space
V_k(i) – set of feature vectors that are in the size-k neighbourhood of sample i in the original space but not in the projection space
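A minimal numpy sketch of both measures, assuming Euclidean distances and the normalisation A(k) = 2/(Nk(2N − 3k − 1)); the function and variable names are illustrative:

```python
import numpy as np

def pairwise_dist(X):
    # Euclidean distance matrix
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

def rank_matrix(D):
    # ranks[i, j] = rank of j when points are ordered by distance from i (self gets rank 0)
    order = np.argsort(D, axis=1)
    ranks = np.empty_like(order)
    ranks[np.arange(D.shape[0])[:, None], order] = np.arange(D.shape[0])[None, :]
    return ranks

def trustworthiness_continuity(X_orig, X_proj, k):
    n = X_orig.shape[0]
    R_orig = rank_matrix(pairwise_dist(X_orig))
    R_proj = rank_matrix(pairwise_dist(X_proj))
    nn_orig = (R_orig >= 1) & (R_orig <= k)    # k-neighbourhood in the original space
    nn_proj = (R_proj >= 1) & (R_proj <= k)    # k-neighbourhood in the projection space
    A = 2.0 / (n * k * (2 * n - 3 * k - 1))
    U = nn_proj & ~nn_orig                     # new neighbours in the projection -> hurt trustworthiness
    V = nn_orig & ~nn_proj                     # neighbours lost in the projection -> hurt continuity
    trust = 1.0 - A * np.sum(R_orig[U] - k)
    cont = 1.0 - A * np.sum(R_proj[V] - k)
    return trust, cont
```

Both values are 1 for a projection that preserves all k-neighbourhoods and decrease as neighbourhood relations are violated.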

Example data sets
Swiss roll: points sampled from a 2D manifold embedded in 3D space

Example data sets
64 × 64 pixel images of hands in different positions
Each image can be considered a 4096-dimensional data element
The same images could also be interpreted in terms of finger extension and wrist rotation (2D)

Example data sets (figure)

Synthetic data sets [11]
– S-shaped manifold
– Sphere
– Six clusters

Principal component analysis (PCA)
Idea: find the directions of maximal variance and align the coordinate axes with them. If the variance along a direction is zero, that dimension is not needed.
Drawback: works well only with linear data [1]

PCA method (1/2)
– Centre the data so that its mean is zero
– Calculate the covariance matrix of the data
– Calculate the eigenvalues and eigenvectors of the covariance matrix
– Arrange the eigenvectors in order of decreasing eigenvalue
– For dimensionality reduction, choose the desired number of leading eigenvectors (2 or 3 for visualization)

PCA method (2/2)
Intrinsic dimensionality = number of non-zero eigenvalues
Dimensionality reduction by projection: y_i = A x_i, where x_i is the input vector, y_i the output vector, and A is the matrix whose rows are the eigenvectors corresponding to the largest eigenvalues.
For visualization, typically 2 or 3 eigenvectors are preserved.
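A compact numpy sketch of these steps (the function name and the tolerance used to decide which eigenvalues count as non-zero are illustrative choices):

```python
import numpy as np

def pca_project(X, out_dim=2):
    # X: (n_samples, d) data matrix
    Xc = X - X.mean(axis=0)                   # centre the data (zero mean)
    C = np.cov(Xc, rowvar=False)              # covariance matrix (d x d)
    eigval, eigvec = np.linalg.eigh(C)        # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]          # arrange by decreasing eigenvalue
    eigval, eigvec = eigval[order], eigvec[:, order]
    A = eigvec[:, :out_dim].T                 # rows = leading eigenvectors
    Y = Xc @ A.T                              # y_i = A x_i for every (centred) sample
    id_estimate = int(np.sum(eigval > 1e-10 * eigval[0]))  # count of "non-zero" eigenvalues
    return Y, eigval, id_estimate
```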

Example of PCA
The distances between points are different in the projections.
Test set (c):
– the two clusters are projected onto one cluster
– the S-shaped cluster is projected nicely

Another example of PCA [10]
Data set: points lying on the circle x^2 + y^2 = 1, whose intrinsic dimensionality is 1
PCA yields two non-null eigenvalues, i.e. it estimates ID = 2
u, v – principal components

Limitations of PCA
Since the principal directions are orthogonal, PCA works well only with linear data
Tends to overestimate ID
Kernel PCA uses the so-called kernel trick to apply PCA also to non-linear data:
– make a non-linear projection into a higher-dimensional space and perform the PCA analysis in that space
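A minimal kernel-PCA sketch with an RBF kernel (the kernel choice and the gamma parameter are assumptions for illustration, not part of the slides):

```python
import numpy as np

def kernel_pca(X, out_dim=2, gamma=1.0):
    # RBF (Gaussian) kernel matrix: an implicit non-linear mapping to a high-dimensional space
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    # centre the kernel matrix (equivalent to centring the data in the feature space)
    n = X.shape[0]
    J = np.ones((n, n)) / n
    Kc = K - J @ K - K @ J + J @ K @ J
    eigval, eigvec = np.linalg.eigh(Kc)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    # projections of the training points onto the leading kernel principal components
    return eigvec[:, :out_dim] * np.sqrt(np.maximum(eigval[:out_dim], 0.0))
```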

Multidimensional scaling (MDS)
Project the data into a new space while trying to preserve the distances between data points
Define a stress function E (the difference between pairwise distances in the original and projection spaces)
E is minimized using some optimization algorithm
With certain stress functions (e.g. Kruskal's), a perfect projection exists exactly when E = 0
The ID of the data is the smallest projection dimension for which a perfect projection exists

Metric MDS
The simplest stress function, raw stress [2]:
E = Σ_{i<j} ( d(x_i, x_j) − d(y_i, y_j) )^2
d(x_i, x_j) – distance in the original space
d(y_i, y_j) – distance in the projection space
y_i, y_j – representations of x_i, x_j in the output space
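A small helper for evaluating the raw stress of a given projection (a sketch assuming Euclidean distances in both spaces):

```python
import numpy as np

def raw_stress(X, Y):
    # pairwise Euclidean distances in the original (X) and projection (Y) spaces
    dX = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    dY = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
    i, j = np.triu_indices(len(X), k=1)   # each pair counted once
    return np.sum((dX[i, j] - dY[i, j]) ** 2)
```

An MDS projection is obtained by minimizing this value with respect to the output coordinates Y, e.g. by gradient descent.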

Sammon's mapping
Sammon's mapping gives small distances a larger weight [5]:
E = (1 / Σ_{i<j} d(x_i, x_j)) Σ_{i<j} ( d(x_i, x_j) − d(y_i, y_j) )^2 / d(x_i, x_j)

Kruskal's stress
Ranking the point distances compensates for the overall shrinking of distances in lower-dimensional projections:
E = sqrt( Σ_{i<j} ( d(y_i, y_j) − d̂(i, j) )^2 / Σ_{i<j} d(y_i, y_j)^2 )
where the disparities d̂(i, j) are obtained from the rank ordering of the original distances by monotone regression.

MDS example
Separates clusters better than PCA
Local structures are not always preserved (leftmost test set)

Other MDS approaches
– ISOMAP [12]
– Curvilinear component analysis (CCA) [13]

Local methods
The previous methods are global in the sense that all input data is considered at once.
Local methods consider only some neighbourhood of the data points => may be computationally less demanding
They try to estimate the topological dimension of the data manifold

Fukunaga–Olsen algorithm [6]
Assume that the data can be divided into small regions, i.e. clustered
Each cluster (Voronoi set) of the data lies on an approximately linear surface => the PCA method can be applied to each cluster separately
The eigenvalues are normalized by dividing them by the largest eigenvalue

Fukunaga–Olsen algorithm
ID is defined as the number of normalized eigenvalues that are larger than a threshold T
Defining a good threshold is a problem in itself
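A rough sketch of the per-cluster step (the clustering into Voronoi sets, e.g. by k-means, is assumed to be done beforehand; the threshold value is illustrative):

```python
import numpy as np

def fukunaga_olsen_id(clusters, threshold=0.05):
    # clusters: list of (n_i, d) arrays, one per local region; returns an ID estimate per cluster
    ids = []
    for C in clusters:
        Xc = C - C.mean(axis=0)
        eigval = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
        normalized = eigval / eigval[0]              # divide by the largest eigenvalue
        ids.append(int(np.sum(normalized > threshold)))
    return ids
```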

Near neighbour algorithm
Trunk's method [7]:
– An initial value for an integer parameter k is chosen (usually k = 1).
– The k nearest neighbours of each data vector are identified.
– For each data vector i, the subspace spanned by the vectors from i to each of its k neighbours is constructed.

Near neighbour algorithm
– The angle between the vector to the (k+1)-th near neighbour and its projection onto the subspace is calculated for each data vector.
– If the average of these angles is below a threshold, the ID is k; otherwise k is increased and the process is repeated.

Pseudocode
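A Python sketch of the procedure described above (the 30-degree angle threshold and the upper limit k_max are illustrative parameters, not given on the slides):

```python
import numpy as np

def trunk_id_estimate(X, threshold_deg=30.0, k_max=10):
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1)                # neighbours of each point, nearest first
    for k in range(1, k_max + 1):
        angles = []
        for i in range(n):
            nbrs = order[i, :k + 1]
            V = X[nbrs[:k]] - X[i]               # vectors spanning the k-neighbour subspace
            w = X[nbrs[k]] - X[i]                # vector to the (k+1)-th neighbour
            Q, _ = np.linalg.qr(V.T)             # orthonormal basis of the subspace
            w_proj = Q @ (Q.T @ w)               # projection of w onto the subspace
            cos = np.linalg.norm(w_proj) / (np.linalg.norm(w) + 1e-12)
            angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
        if np.mean(angles) < threshold_deg:      # small angles: k neighbours already span the data locally
            return k
    return k_max
```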

Near neighbour algorithm
It is not clear how to select a suitable value for the threshold
Improvements to Trunk's method:
– Pettis et al. [8]
– Verveer–Duin [9]

Fractal methods
Global methods, but with a different definition of dimensionality
Basic idea:
– count the observations f(r) inside a ball of radius r
– analyse the growth rate of f(r)
– if f grows as r^k, the dimensionality of the data can be considered to be k

Fractal methods
The dimensionality can be fractional, e.g. 1.5
Hence fractal methods do not provide projections into a lower-dimensional space (what would R^1.5 be, anyway?)
Fractal dimensionality estimates can be used in time-series analysis etc. [10]

Fractal methods
Different definitions of fractal dimension [10]:
– Hausdorff dimension
– Box-counting dimension
– Correlation dimension
In order to get an accurate estimate of a dimension D, the data set cardinality must be at least 10^(D/2)

Hausdorff dimension
The data set is covered by cells s_i with variable diameters r_i, all r_i < r; in other words, we look for the collection of covering sets s_i with diameters less than or equal to r that minimizes the sum Σ_i r_i^d.
The d-dimensional Hausdorff measure is the limit of this minimized sum as r → 0:
Γ_H^d = lim_{r→0} inf { Σ_i r_i^d : diameter(s_i) = r_i ≤ r }

Hausdorff dimension
For every data set, Γ_H^d is infinite if d is less than some critical value D_H, and 0 if d is greater than D_H.
The critical value D_H is the Hausdorff dimension of the data set.

Box-counting dimension
The Hausdorff dimension is not easy to calculate.
The box-counting dimension D_B is an upper bound of the Hausdorff dimension and does not usually differ from it:
D_B = lim_{r→0} ln v(r) / ln(1/r)
v(r) – the number of boxes of size r needed to cover the data set
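A naive box-counting sketch (the set of box sizes and the grid anchored at the data minimum are illustrative assumptions):

```python
import numpy as np

def box_counting_dimension(X, r_values):
    # Estimate D_B as the slope of ln v(r) versus ln(1/r)
    counts = []
    for r in r_values:
        boxes = np.floor((X - X.min(axis=0)) / r)        # grid-cell index of every point
        counts.append(len(np.unique(boxes, axis=0)))     # v(r): number of occupied boxes
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(r_values)), np.log(counts), 1)
    return slope
```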

Box-counting dimension
Although the box-counting dimension is easier to calculate than the Hausdorff dimension, the algorithmic complexity grows exponentially with the set dimensionality => it can be used only for low-dimensional data sets.
The correlation dimension is a computationally more feasible fractal dimension measure.
The correlation dimension is a lower bound of the box-counting dimension.

Correlation dimension
Let x_1, x_2, x_3, ..., x_N be the data points. The correlation integral can be defined as:
C(r) = (2 / (N(N−1))) Σ_{i<j} I( ||x_j − x_i|| ≤ r )
I(x) is the indicator function: I(x) = 1 iff x is true, I(x) = 0 otherwise.

Correlation dimension
The correlation dimension is the growth rate of the correlation integral:
D_C = lim_{r→0} ln C(r) / ln r
In practice it is estimated as the slope of ln C(r) versus ln r over a suitable range of radii (the Grassberger–Procaccia procedure).
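A small sketch of this estimate (the set of radii r_values is a user choice; the names are illustrative):

```python
import numpy as np

def correlation_dimension(X, r_values):
    # Slope of ln C(r) versus ln r, where C(r) is the fraction of point pairs within distance r
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    i, j = np.triu_indices(X.shape[0], k=1)
    pair_d = D[i, j]
    C = np.array([(pair_d <= r).mean() for r in r_values])   # correlation integral C(r)
    mask = C > 0                                              # avoid log(0) for very small radii
    slope, _ = np.polyfit(np.log(np.asarray(r_values)[mask]), np.log(C[mask]), 1)
    return slope
```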

Literature
1. M. Kirby, Geometric Data Analysis: An Empirical Approach to Dimensionality Reduction and the Study of Patterns, John Wiley and Sons, 2001.
2. J. B. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika 29 (1964) 1–27.
3. R. N. Shepard, The analysis of proximities: Multidimensional scaling with an unknown distance function, Psychometrika 27 (1962) 125–140.
4. R. S. Bennett, The intrinsic dimensionality of signal collections, IEEE Transactions on Information Theory 15 (1969) 517–525.
5. J. W. Sammon Jr., A nonlinear mapping for data structure analysis, IEEE Transactions on Computers C-18 (1969) 401–409.
6. K. Fukunaga, D. R. Olsen, An algorithm for finding intrinsic dimensionality of data, IEEE Transactions on Computers C-20 (2) (1971) 176–183.
7. G. V. Trunk, Statistical estimation of the intrinsic dimensionality of a noisy signal collection, IEEE Transactions on Computers 25 (1976) 165–171.

Literature
8. K. Pettis, T. Bailey, A. Jain, R. Dubes, An intrinsic dimensionality estimator from near-neighbor information, IEEE Transactions on Pattern Analysis and Machine Intelligence 1 (1) (1979) 25–37.
9. P. J. Verveer, R. Duin, An evaluation of intrinsic dimensionality estimators, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (1) (1995) 81–86.
10. F. Camastra, Data dimensionality estimation methods: a survey, Pattern Recognition 36 (2003) 2945–2954.
11. J. Venna, Dimensionality reduction for visual exploration of similarity structures, PhD thesis manuscript (submitted), 2007.
12. J. B. Tenenbaum, V. de Silva, J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323.
13. P. Demartines, J. Herault, Curvilinear component analysis: A self-organizing neural network for nonlinear mapping in cluster analysis, IEEE Transactions on Neural Networks 8 (1) (1997) 148–154.