CS 2750: Machine Learning
Dimensionality Reduction
Prof. Adriana Kovashka, University of Pittsburgh, January 19, 2017
Plan for today
- Dimensionality reduction: motivation
- Principal Component Analysis (PCA)
- Applications of PCA
- Other methods for dimensionality reduction
Why reduce dimensionality?
- Data may intrinsically live in a lower-dimensional space
- Too many features and too few data points
- Lower computational expense (memory, train/test time)
- Want to visualize the data in a lower-dimensional space
- Want to use data of different dimensionality
Goal
Input: data in a high-dimensional feature space
Output: a projection of the same data into a lower-dimensional space, i.e., a mapping F: high-dim X → low-dim X
Goal (figure)
Slide credit: Erik Sudderth
Some criteria for success
Find a projection where the data has:
- Low reconstruction error
- High variance of the projected data
See the hand-written notes for how we find the optimal projection; a concrete formulation of these criteria follows below.
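Concretely, the two criteria can be written as follows (a standard formulation, not spelled out on this slide; it assumes the data points x_n have been mean-centered and that we project onto a single unit-norm direction u_1):

\max_{\|u_1\|=1} \; \frac{1}{N}\sum_{n=1}^{N} (u_1^\top x_n)^2 \;=\; \max_{\|u_1\|=1} \; u_1^\top S\, u_1,
\qquad S = \frac{1}{N}\sum_{n=1}^{N} x_n x_n^\top

\min_{\|u_1\|=1} \; \frac{1}{N}\sum_{n=1}^{N} \big\| x_n - (u_1^\top x_n)\, u_1 \big\|^2

Maximizing the projected variance and minimizing the reconstruction error are equivalent for centered data, and both are solved by taking u_1 to be the eigenvector of S with the largest eigenvalue.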
Principal Components Analysis
Slide credit: Subhransu Maji
Demo: http://www.cs.pitt.edu/~kovashka/cs2750_sp17/PCA_demo.m
Demo with eigenfaces:
Implementation issue
The covariance matrix is huge: D x D (D² entries) for an image with D pixels, but typically the number of examples N << D.
Simple trick:
- X is the N x D matrix of normalized (mean-subtracted) training data
- Solve for the eigenvectors u of the small N x N matrix XXᵀ instead of the D x D matrix XᵀX
- Then Xᵀu is an eigenvector of the covariance XᵀX
- Normalize each such vector Xᵀu to unit length (sketched below)
Adapted from Derek Hoiem
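A minimal MATLAB sketch of this trick (illustrative, not the course demo; it assumes X is the N x D mean-subtracted data matrix, and all variable names are hypothetical):

G = X * X';                                   % small N x N Gram matrix instead of the D x D X'*X
[U, L] = eig(G);                              % eigenvectors/eigenvalues of X*X'
[lambda, idx] = sort(diag(L), 'descend');     % sort eigenvalues, largest first
U = U(:, idx);
V = X' * U;                                   % columns of V are eigenvectors of X'*X
V = bsxfun(@rdivide, V, sqrt(sum(V.^2, 1)));  % normalize each column to unit length
% in practice, keep only the columns corresponding to nonzero eigenvalues

The nonzero entries of lambda are, up to a factor of 1/N, the eigenvalues of the covariance, which is what the next slide uses to choose K.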
How to pick K?
One goal can be to pick K such that P% of the variance of the data is preserved, e.g. 90% (P = 0.9 as a fraction).
Let lambda be a vector containing the eigenvalues of the covariance matrix, sorted in descending order.
The total variance is the sum of its entries:
total_variance = sum(lambda);
Take as many of the leading entries as needed:
K = find(cumsum(lambda) / total_variance >= P, 1);
Variance preserved at the i-th eigenvalue (figure)
Figure 12.4 (a) from Bishop
Application: Face Recognition
Image from cnet.com
Face recognition: once you've detected and cropped a face, try to recognize it
(Figure: Detection → Recognition → "Sally")
Slide credit: Lana Lazebnik
Typical face recognition scenarios
- Verification: a person is claiming a particular identity; verify whether that is true (e.g., security)
- Closed-world identification: assign a face to one person from among a known set
- General identification: assign a face to a known person or to "unknown"
Slide credit: Derek Hoiem
The space of all face images
- When viewed as vectors of pixel values, face images are extremely high-dimensional: a 24x24 image already has 576 dimensions
- That means slow processing and lots of storage
- But very few 576-dimensional vectors are valid face images
- We want to effectively model the subspace of face images
Adapted from Derek Hoiem
Representation and reconstruction
Face x in "face space" coordinates: w = (u1ᵀ(x - µ), …, ukᵀ(x - µ))
Reconstruction: x̂ = µ + w1u1 + w2u2 + w3u3 + w4u4 + …
Slide credit: Derek Hoiem
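A minimal MATLAB sketch of this representation (hypothetical variables, not from the slides: mu is the D x 1 mean face, U is a D x k matrix whose columns are the eigenfaces u1..uk, and x is a D x 1 face vector):

w     = U' * (x - mu);    % face-space coordinates (w1, ..., wk)
x_hat = mu + U * w;       % reconstruction of x from its k coefficients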
Recognition w/ eigenfaces
Process labeled training images:
- Find the mean µ and covariance matrix Σ
- Find the k principal components (eigenvectors of Σ): u1, …, uk
- Project each training image xi onto the subspace spanned by the principal components: (wi1, …, wik) = (u1ᵀxi, …, ukᵀxi)
Given a novel image x:
- Project onto the subspace: (w1, …, wk) = (u1ᵀx, …, ukᵀx)
- Classify as the closest training face in the k-dimensional subspace (sketched in code below)
M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991
Adapted from Derek Hoiem
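A minimal MATLAB sketch of this pipeline (hypothetical names, not Turk and Pentland's code: Xtrain is N x D with one training face per row, labels is N x 1, U is the D x k eigenface basis, mu is the mean face, and x is a novel D x 1 face; the mean is subtracted before projecting, as on the reconstruction slide):

Wtrain = bsxfun(@minus, Xtrain, mu') * U;         % N x k coordinates of the training faces
w      = U' * (x - mu);                           % k x 1 coordinates of the novel face
dists  = sum(bsxfun(@minus, Wtrain, w').^2, 2);   % squared distances in the k-dim subspace
[~, nn] = min(dists);                             % nearest training face
predicted_label = labels(nn);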
[Ten image-only slides working through a PCA example]
Slide credit: Alexander Ihler
Plan for today
- Dimensionality reduction: motivation
- Principal Component Analysis (PCA)
- Applications of PCA
- Other methods for dimensionality reduction
PCA
- General dimensionality reduction technique
- Preserves most of the variance with a much more compact representation
- Lower storage requirements (eigenvectors + a few numbers per face)
- Faster matching
What are some problems?
Slide credit: Derek Hoiem
PCA limitations
The direction of maximum variance is not always good for classification.
Slide credit: Derek Hoiem
PCA limitations
PCA preserves maximum variance.
A more discriminative subspace: Fisher Linear Discriminants (FLD)
- FLD preserves discrimination
- Find a projection that maximizes scatter between classes and minimizes scatter within classes (see the criterion below)
Adapted from Derek Hoiem
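Concretely, for the two-class case the standard Fisher criterion (not written out on the slide) is as follows, with class means m_1, m_2, between-class scatter S_B, and within-class scatter S_W:

S_B = (m_2 - m_1)(m_2 - m_1)^\top, \qquad
S_W = \sum_{c=1}^{2} \sum_{n \in \mathcal{C}_c} (x_n - m_c)(x_n - m_c)^\top

J(w) = \frac{w^\top S_B\, w}{w^\top S_W\, w},
\qquad \text{maximized by } w \propto S_W^{-1}(m_2 - m_1).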
Fisher's Linear Discriminant
Using two classes as an example (figure: two panels with x1/x2 axes contrasting a poor projection with a good one)
Slide credit: Derek Hoiem
Comparison with PCA (figure)
Slide credit: Derek Hoiem
Other dimensionality reduction methods
- Non-linear: Kernel PCA (Schölkopf et al., Neural Computation 1998)
- Independent component analysis (ICA): Comon, Signal Processing 1994
- LLE (locally linear embedding): Roweis and Saul, Science 2000
- ISOMAP (isometric feature mapping): Tenenbaum et al., Science 2000
- t-SNE (t-distributed stochastic neighbor embedding): van der Maaten and Hinton, JMLR 2008
ISOMAP example
Figure from Carlotta Domeniconi
ISOMAP example
Figure from Carlotta Domeniconi
t-SNE example
Figure from Genevieve Patterson, IJCV 2014
t-SNE example
Thomas and Kovashka, CVPR 2016
t-SNE example
Thomas and Kovashka, CVPR 2016