
Dimensionality reduction

Outline
- From distances to points:
  - MultiDimensional Scaling (MDS)
  - FastMap
- Dimensionality reductions or data projections:
  - Random projections
  - Principal Component Analysis (PCA)

Multi-Dimensional Scaling (MDS)
- So far we assumed that we know both the data points X and the distance matrix D between these points
- What if the original points X are not known, but only the distance matrix D is?
- Can we reconstruct X, or some approximation of X?

Problem
- Given: distance matrix D between n points
- Find: a k-dimensional representation x_i of every point i, so that d(x_i, x_j) is as close as possible to D(i,j)
- Why do we want to do that?

How can we do that? (Algorithm)

High-level view of the MDS algorithm
- Randomly initialize the positions of the n points in a k-dimensional space
- Compute the pairwise distances D' for this placement
- Compare D' to D
- Move the points to better adjust their pairwise distances (make D' closer to D)
- Repeat until D' is close to D

The MDS algorithm
Input: n x n distance matrix D
- Start with n random points in the k-dimensional space (x_1, ..., x_n)
- stop = false
- while not stop:
  - totalerror = 0.0
  - for every pair i, j:
    - compute D'(i,j) = d(x_i, x_j)
    - error = (D(i,j) - D'(i,j)) / D(i,j)
    - totalerror += |error|
    - for every dimension m: x_im = x_im + (x_im - x_jm) / D'(i,j) * error
  - if totalerror is small enough, stop = true
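
A minimal NumPy sketch of this iterative scheme, assuming D is supplied as a full n x n array; the step size lr, the use of the absolute relative error in the stopping test, and all names are illustrative choices, not part of the slides.

```python
import numpy as np

def iterative_mds(D, k, n_iter=500, lr=0.05, tol=1e-3, seed=0):
    """Place n points in k dimensions so that pairwise distances approximate D."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, k))                       # random initial placement
    for _ in range(n_iter):
        totalerror = 0.0
        for i in range(n):
            for j in range(n):
                if i == j or D[i, j] == 0:
                    continue
                d_ij = np.linalg.norm(X[i] - X[j])        # D'(i,j)
                error = (D[i, j] - d_ij) / D[i, j]        # relative error
                totalerror += abs(error)
                # move x_i along (x_i - x_j): away from x_j if too close, toward it if too far
                X[i] += lr * (X[i] - X[j]) / max(d_ij, 1e-12) * error
        if totalerror < tol:                              # D' is close enough to D
            break
    return X

# usage: recover 2-D coordinates from the distance matrix of 10 random points
pts = np.random.default_rng(1).random((10, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
Y = iterative_mds(D, k=2)
```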

Questions about MDS
- Running time of the MDS algorithm: O(n^2 I), where I is the number of iterations of the algorithm
- MDS does not guarantee that the metric property is maintained in D'
- Can we do it faster? Can we guarantee the metric property?

Problem (revisited)
- Given: distance matrix D between n points
- Find: a k-dimensional representation x_i of every point i, so that:
  - d(x_i, x_j) is as close as possible to D(i,j)
  - d(x_i, x_j) is a metric
  - the algorithm works in time linear in n

FastMap
- Select two pivot points x_a and x_b that are far apart.
- Compute a pseudo-projection of the remaining points along the "line" x_a x_b.
- "Project" the points onto a subspace orthogonal to the "line" x_a x_b and recurse.

Selecting the Pivot Points
The pivot points should lie along the principal axes, and hence should be far apart.
- Select any point x_0
- Let x_1 be the point furthest from x_0
- Let x_2 be the point furthest from x_1
- Return (x_1, x_2)

Pseudo-Projections
Given pivots (x_a, x_b), for any third point y, we use the law of cosines to determine the position c_y of y along the line x_a x_b.
The pseudo-projection for y is
  c_y = (d_{a,y}^2 + d_{a,b}^2 - d_{b,y}^2) / (2 d_{a,b})
This is the first coordinate.
(Figure: triangle with vertices x_a, x_b, y; side lengths d_{a,y}, d_{b,y}, d_{a,b}; c_y marks the projection of y onto the segment x_a x_b.)
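
The pseudo-projection formula follows from the law of cosines in the triangle (x_a, x_b, y), where c_y is the signed length of the projection of y onto the line x_a x_b:

```latex
d_{b,y}^2 = d_{a,y}^2 + d_{a,b}^2 - 2\, c_y\, d_{a,b}
\quad\Longrightarrow\quad
c_y = \frac{d_{a,y}^2 + d_{a,b}^2 - d_{b,y}^2}{2\, d_{a,b}}
```

Note that only pairwise distances appear on the right-hand side, so c_y can be computed without knowing the original coordinates.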

“Project to orthogonal plane”
Given the coordinates along x_a x_b, compute the distances within the "orthogonal hyperplane":
  d'(y', z')^2 = d(y, z)^2 - (c_z - c_y)^2
Recurse using d'(.,.) until k features have been chosen.
(Figure: points y, z and their images y', z' in the hyperplane orthogonal to the line x_a x_b.)

The FastMap algorithm
D: distance function, Y: n x k data points
f = 0  // global variable
FastMap(k, D):
- if k <= 0, return
- (x_a, x_b) <- chooseDistantObjects(D)
- if D(x_a, x_b) == 0, set Y[i,f] = 0 for every i and return
- for every i: Y[i,f] = [D(a,i)^2 + D(a,b)^2 - D(b,i)^2] / (2 D(a,b))
- D'(i,j)^2 = D(i,j)^2 - (Y[i,f] - Y[j,f])^2  // new distance function on the projection
- f++
- FastMap(k-1, D')
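
A compact NumPy sketch of FastMap, assuming the distances are available as a full n x n matrix rather than as a distance function, and using the pivot heuristic from the earlier slide; the function names are illustrative.

```python
import numpy as np

def choose_distant_objects(D2):
    """Pivot heuristic: start at point 0, jump to its furthest point, then to that point's furthest."""
    a = 0
    b = int(np.argmax(D2[a]))                     # argmax over squared distances = argmax over distances
    a = int(np.argmax(D2[b]))
    return a, b

def fastmap(D, k):
    n = D.shape[0]
    Y = np.zeros((n, k))
    D2 = D.astype(float) ** 2                     # work with squared distances
    for f in range(k):
        a, b = choose_distant_objects(D2)
        d_ab2 = D2[a, b]
        if d_ab2 == 0:                            # all remaining distances are zero
            break
        # pseudo-projection of every point onto the line through the pivots a, b
        Y[:, f] = (D2[a, :] + d_ab2 - D2[b, :]) / (2.0 * np.sqrt(d_ab2))
        # distances in the hyperplane orthogonal to the a-b line:
        # D'(i,j)^2 = D(i,j)^2 - (Y[i,f] - Y[j,f])^2
        diff = Y[:, f][:, None] - Y[:, f][None, :]
        D2 = np.maximum(D2 - diff ** 2, 0.0)      # clip small negatives due to rounding
    return Y

# usage: embed 20 points from R^5 into R^2 using only their distance matrix
pts = np.random.default_rng(0).random((20, 5))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
Y = fastmap(D, k=2)
```

Each recursion level touches every point a constant number of times, which is where the linear number of distance computations on the next slide comes from.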

FastMap algorithm
- Running time: linear number of distance computations

The Curse of Dimensionality
- Data in only one dimension is relatively packed
- Adding a dimension "stretches" the points across that dimension, making them further apart
- Adding more dimensions makes the points even further apart: high-dimensional data is extremely sparse
- The distance measure becomes meaningless
(graphs from Parsons et al., KDD Explorations 2004)

The curse of dimensionality
- The efficiency of many algorithms depends on the number of dimensions d:
  - Distance/similarity computations are at least linear in the number of dimensions
  - Index structures fail as the dimensionality of the data increases

Goals
- Reduce the dimensionality of the data
- Maintain the meaningfulness of the data

Dimensionality reduction
- Dataset X consists of n points in a d-dimensional space
- Data point x_i ∈ R^d (a d-dimensional real vector): x_i = [x_i1, x_i2, ..., x_id]
- Dimensionality reduction methods:
  - Feature selection: choose a subset of the features
  - Feature extraction: create new features by combining the existing ones

Dimensionality reduction
- Dimensionality reduction methods:
  - Feature selection: choose a subset of the features
  - Feature extraction: create new features by combining the existing ones
- Both methods map a vector x_i ∈ R^d to a vector y_i ∈ R^k (k << d):
  F : R^d → R^k
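
A small, hypothetical illustration of the two families of methods (the column indices and the random linear map below are placeholders for whatever selection criterion or extraction method is actually used):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 50, 5
X = rng.standard_normal((n, d))        # n points in R^d

# Feature selection: keep a subset of k of the original features
selected = [0, 3, 7, 12, 40]           # indices chosen by some criterion (hypothetical)
X_sel = X[:, selected]                 # shape (n, k)

# Feature extraction: build k new features as combinations of all d original ones
A = rng.standard_normal((k, d))        # a linear map F(x) = A x (e.g., PCA or a random projection)
X_ext = X @ A.T                        # shape (n, k)
```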

Linear dimensionality reduction
- The function F is a linear projection:
  y_i = A x_i
  Y = A X
- Goal: Y is as close to X as possible

Closeness: Pairwise distances
Johnson-Lindenstrauss lemma: Given ε > 0 and an integer n, let k be a positive integer such that k ≥ k_0 = O(ε^(-2) log n). For every set X of n points in R^d there exists F : R^d → R^k such that for all x_i, x_j ∈ X:
  (1 - ε) ||x_i - x_j||^2 ≤ ||F(x_i) - F(x_j)||^2 ≤ (1 + ε) ||x_i - x_j||^2
What is the intuitive interpretation of this statement?

JL Lemma: Intuition
- Vectors x_i ∈ R^d are projected onto a k-dimensional space (k << d): y_i = R x_i
- If ||x_i|| = 1 for all i, then ||x_i - x_j||^2 is approximated by (d/k) ||y_i - y_j||^2
- Intuition:
  - The expected squared norm of the projection of a unit vector onto a random k-dimensional subspace through the origin is k/d
  - The probability that it deviates from this expectation is very small
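
A quick numerical check of this intuition, here with a Gaussian matrix R (a common way to realize the lemma); the sizes and the 1/sqrt(k) rescaling, which plays the role of the sqrt(d/k) factor for a projection onto a random subspace, are arbitrary choices for the demo.

```python
import numpy as np

def pairwise_dists(Z):
    """All pairwise Euclidean distances between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0))

rng = np.random.default_rng(0)
n, d, k = 100, 1000, 200
X = rng.standard_normal((n, d))                 # n points in R^d

R = rng.standard_normal((k, d))                 # random Gaussian projection matrix
Y = X @ R.T / np.sqrt(k)                        # rescale so squared distances are preserved in expectation

orig = pairwise_dists(X)
proj = pairwise_dists(Y)
iu = np.triu_indices(n, 1)                      # all pairs i < j
ratios = (proj[iu] / orig[iu]) ** 2
print(ratios.min(), ratios.max())               # typically within roughly 1 +/- 0.3 for these sizes
```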

JL Lemma: More intuition
- x = (x_1, ..., x_d), with the x_i independent Gaussian N(0,1) random variables; y = (1/|x|)(x_1, ..., x_d)
- z: the projection of y onto its first k coordinates
  - L = |z|^2, μ = E[L] = k/d
- Pr(L ≥ (1+ε)μ) ≤ 1/n^2 and Pr(L ≤ (1-ε)μ) ≤ 1/n^2
- f(y) = sqrt(d/k) z
- What is the probability that, for a pair (y, y'), the ratio |f(y) - f(y')|^2 / |y - y'|^2 does not lie in the range [(1-ε), (1+ε)]?
- What is the probability that some pair suffers?
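
A Monte Carlo check of this concentration (the sizes are arbitrary): draw random unit vectors in R^d, keep their first k coordinates, and look at the squared norm L.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, trials = 1000, 100, 10000
x = rng.standard_normal((trials, d))
y = x / np.linalg.norm(x, axis=1, keepdims=True)    # random unit vectors in R^d
L = np.sum(y[:, :k] ** 2, axis=1)                   # |z|^2 over the first k coordinates
print(L.mean(), L.std())                            # mean close to k/d = 0.1, with a small spread
```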

Finding random projections
- Vectors x_i ∈ R^d are projected onto a k-dimensional space (k << d)
- Random projections can be represented by a linear transformation matrix R:
  y_i = R x_i
- What is the matrix R?


Finding matrix R
- The elements R(i,j) can be Gaussian distributed
- Achlioptas has shown that the Gaussian distribution can be replaced by much simpler ones, e.g., R(i,j) ∈ {+1, -1} each with probability 1/2, or R(i,j) ∈ {+√3, 0, -√3} with probabilities 1/6, 2/3, 1/6
- All zero-mean, unit-variance distributions for R(i,j) give a mapping that satisfies the JL lemma
- Why is Achlioptas' result useful?
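
A sketch of drawing R from the sparse variant above and applying it; the helper name and the 1/sqrt(k) rescaling are my choices. Roughly 2/3 of the entries are zero, and the nonzero ones are a common factor √3 times ±1, so the projection can be computed with mostly additions and subtractions, which is what makes the result practical.

```python
import numpy as np

def achlioptas_matrix(k, d, seed=0):
    """Entries +sqrt(3), 0, -sqrt(3) with probabilities 1/6, 2/3, 1/6 (zero mean, unit variance)."""
    rng = np.random.default_rng(seed)
    return rng.choice([np.sqrt(3.0), 0.0, -np.sqrt(3.0)], size=(k, d), p=[1/6, 2/3, 1/6])

d, k = 1000, 100
R = achlioptas_matrix(k, d)
x = np.random.default_rng(1).standard_normal(d)
y = R @ x / np.sqrt(k)                 # squared norm of y approximates squared norm of x
print(np.dot(y, y), np.dot(x, x))
```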