
1 CS 2750: Machine Learning Dimensionality Reduction Prof. Adriana Kovashka University of Pittsburgh January 27, 2016

2 Complexity comparison of clustering methods
In number of data points n:
– K-means: O(n) … O(n^(kd)) [1]
– Mean shift: O(n^2) [2]
– Hierarchical agglomerative clustering: O(n^2) / O(n^2 log n) [3]
– Normalized cuts: O(n^3) / O(n) [4]
[1] http://theory.stanford.edu/~sergei/slides/kMeans-hour.pdf
[2] http://lear.inrialpes.fr/people/triggs/events/iccv03/cdrom/iccv03/0456_georgescu.pdf
[3] http://nlp.stanford.edu/IR-book/completelink.html
[4] https://webdocs.cs.ualberta.ca/~dale/papers/cvpr09.pdf

3 Mean shift vs. K-means
Statement last class: "Mean shift can be made equivalent to K-means"
If you have a proof of that, email it to me for extra credit

4 Plan for today
– Dimensionality reduction: motivation
– Principal Component Analysis (PCA)
– Applications of PCA
– Other methods for dimensionality reduction

5 Why reduce dimensionality?
– Data may intrinsically live in a lower-dimensional space
– Too many features and too few data points
– Lower computational expense (memory, train/test time)
– Want to visualize the data in a lower-dimensional space
– Want to use data of different dimensionality

6 Goal
Input: data in a high-dimensional feature space
Output: projection of the same data into a lower-dimensional space
F: high-dim X → low-dim X

7 Goal Slide credit: Erik Sudderth

8 Some criteria for success
– Low reconstruction error
– High variance of the projected data
(derivation on the board)
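
Note (a standard identity, matching the board derivation): for centered data and an orthonormal K-dimensional projection, total variance = variance of the projected data + mean squared reconstruction error. So the two criteria coincide: minimizing reconstruction error is equivalent to maximizing the variance retained by the projection.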

9 Principal Components Analysis Slide credit: Subhransu Maji

10 Principal Components Analysis

11 Lagrange Multipliers
Goal: maximize f(x) subject to g(x) = 0
Formulate the Lagrangian L(x, λ) = f(x) + λ g(x) and set its derivative with respect to x to zero, together with the constraint g(x) = 0
Additional info: Bishop Appendix E; David Barber's textbook: http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage
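
As a concrete example of how this is used in PCA (the standard first-principal-component derivation): maximize u^T S u subject to u^T u = 1, where S is the data covariance. The Lagrangian is L(u, λ) = u^T S u + λ (1 − u^T u); setting its derivative with respect to u to zero gives S u = λ u, so u must be an eigenvector of S, and the projected variance u^T S u = λ is largest for the eigenvector with the largest eigenvalue.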

12 Slide credit: Subhransu Maji Principal Components Analysis

13 Slide credit: Subhransu Maji Principal Components Analysis

14 Slide credit: Subhransu Maji Principal Components Analysis

15 Demo: http://www.cs.pitt.edu/~kovashka/cs2750/PCA_demo.m
Demo with eigenfaces: http://www.cs.ait.ac.th/~mdailey/matlab/

16 Implementation issue
– The covariance matrix is huge (D x D for D pixels)
– But typically the number of examples N << D
– Simple trick:
  – Let X be the D x N matrix of normalized (mean-subtracted) training data
  – Solve for the eigenvectors u of X^T X (which is only N x N) instead of X X^T
  – Then Xu is an eigenvector of the covariance X X^T
  – Normalize each vector Xu to unit length
Adapted from Derek Hoiem
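
A minimal MATLAB sketch of this trick, assuming X is already the D x N matrix of mean-subtracted training vectors (variable names are illustrative):

    G = X' * X;                                   % N x N Gram matrix (small, since N << D)
    [V, L] = eig(G);                              % eigenvectors/eigenvalues of X'X
    [evals, order] = sort(diag(L), 'descend');    % sort by decreasing eigenvalue
    V = V(:, order);
    U = X * V;                                    % each column X*v is an eigenvector of X*X'
    U = bsxfun(@rdivide, U, sqrt(sum(U.^2, 1)));  % normalize each column to unit length
                                                  % (columns with near-zero eigenvalues can be discarded)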

17 Slide credit: Alexander Ihler

18 How to pick K?
One goal can be to pick K such that P% of the variance of the data is preserved, e.g. 90%
Total variance can be obtained from the entries of S (the singular values of the centered data):
– total_variance = sum(S.^2)
Take as many of these entries as needed:
– K = find(cumsum(S.^2) / total_variance >= P, 1);
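
A minimal end-to-end sketch in MATLAB, assuming X is the D x N mean-subtracted data matrix (P and the variable names are illustrative):

    [U, Sigma, ~] = svd(X, 'econ');               % thin SVD of the centered data
    S = diag(Sigma);                              % singular values; S.^2 is proportional to variance
    total_variance = sum(S.^2);
    P = 0.90;                                     % keep 90% of the variance
    K = find(cumsum(S.^2) / total_variance >= P, 1);
    U_K = U(:, 1:K);                              % basis of the K-dimensional subspace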

19 Variance preserved at i-th eigenvalue Figure 12.4 (a) from Bishop

20 Plan for today
– Dimensionality reduction: motivation
– Principal Component Analysis (PCA)
– Applications of PCA
– Other methods for dimensionality reduction

21 Application: Face Recognition Image from cnet.com

22 Face recognition: once you've detected and cropped a face, try to recognize it
[Figure: Detection → Recognition → "Sally"]
Slide credit: Lana Lazebnik

23 Typical face recognition scenarios
– Verification: a person is claiming a particular identity; verify whether that is true (e.g., security)
– Closed-world identification: assign a face to one person from among a known set
– General identification: assign a face to a known person or to "unknown"
Slide credit: Derek Hoiem

24 Simple idea for face recognition
1. Treat the face image as a vector of intensities
2. Recognize the face by nearest neighbor in the database (copy the identity of the training image with minimum distance)
Slide credit: Derek Hoiem
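
A minimal MATLAB sketch of this baseline, assuming train is a D x N matrix of vectorized training faces with labels trainLabels, and x is the query image reshaped to D x 1 (all names illustrative):

    diffs = bsxfun(@minus, train, x);      % subtract the query from every training vector
    dists = sum(diffs.^2, 1);              % squared Euclidean distance to each training face
    [~, nn] = min(dists);                  % index of the nearest neighbor
    predicted = trainLabels(nn);           % copy its identity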

25 The space of all face images
– When viewed as vectors of pixel values, face images are extremely high-dimensional: a 24x24 image already has 576 dimensions
– This is slow and requires lots of storage
– But very few 576-dimensional vectors are valid face images
– We want to effectively model the subspace of face images
Adapted from Derek Hoiem

26 Slide credit: Alexander Ihler

27

28

29 Eigenfaces (PCA on face images)
1. Compute the principal components ("eigenfaces") of the covariance matrix
2. Keep the K eigenvectors with the largest eigenvalues
3. Represent all face images in the dataset as linear combinations of eigenfaces
M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991
Adapted from D. Hoiem

30 Representation and reconstruction
Face x in "face space" coordinates: w = (w_1, …, w_K), with w_k = u_k^T (x − µ)
Reconstruction: x̂ = µ + w_1 u_1 + w_2 u_2 + w_3 u_3 + w_4 u_4 + …
Slide credit: Derek Hoiem
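
In MATLAB, a minimal sketch of both steps, assuming mu is the mean face (D x 1) and U is the D x K matrix whose columns are the top eigenfaces (names illustrative):

    w = U' * (x - mu);        % coordinates of face x in "face space"
    x_hat = mu + U * w;       % reconstruction of x from its K coefficients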

31 Recognition w/ eigenfaces
Process labeled training images:
– Find the mean µ and covariance matrix Σ
– Find the k principal components (eigenvectors of Σ) u_1, …, u_k
– Project each training image x_i onto the subspace spanned by the principal components: (w_i1, …, w_ik) = (u_1^T x_i, …, u_k^T x_i)
Given a novel image x:
– Project onto the subspace: (w_1, …, w_k) = (u_1^T x, …, u_k^T x)
– Classify as the closest training face in the k-dimensional subspace
M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991
Adapted from Derek Hoiem
WARNING: SUPERVISED
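
A minimal MATLAB sketch of the recognition step, assuming Xtrain (D x N), trainLabels, mu, and U (D x k) come from the training stage above, and using the usual eigenfaces convention of subtracting the mean before projecting (names illustrative):

    Wtrain = U' * bsxfun(@minus, Xtrain, mu);     % k x N coefficients of the training faces
    w = U' * (x - mu);                            % project the novel image onto the subspace
    dists = sum(bsxfun(@minus, Wtrain, w).^2, 1); % distances in the k-dimensional subspace
    [~, nn] = min(dists);
    predicted = trainLabels(nn);                  % closest training face in the subspace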

32 Face recognition by humans: 20 results (2005), slides by Jianchao Yang

33 Face recognition by humans: 20 results (2005), slides by Jianchao Yang

34 Face recognition by humans: 20 results (2005), slides by Jianchao Yang

35 Digits example Figure 12.5 from Bishop

36 Slide credit: Alexander Ihler

37

38

39

40

41

42

43 Plan for today
– Dimensionality reduction: motivation
– Principal Component Analysis (PCA)
– Applications of PCA
– Other methods for dimensionality reduction

44 PCA
– General dimensionality reduction technique
– Preserves most of the variance with a much more compact representation
  – Lower storage requirements (eigenvectors + a few numbers per face)
  – Faster matching
– What are some problems?
Slide credit: Derek Hoiem

45 Limitations The direction of maximum variance is not always good for classification Slide credit: Derek Hoiem WARNING: SUPERVISED

46 Limitations
PCA preserves maximum variance
A more discriminative subspace: Fisher Linear Discriminants → "Fisherfaces"
FLD preserves discrimination:
– Find the projection that maximizes scatter between classes and minimizes scatter within classes
Reference: Eigenfaces vs. Fisherfaces, Belhumeur et al., PAMI 1997
Adapted from Derek Hoiem
WARNING: SUPERVISED
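
For two classes, the Fisher direction has a simple closed form; a minimal MATLAB sketch, assuming X1 (D x N1) and X2 (D x N2) hold the examples of the two classes and x is a new example (names illustrative):

    mu1 = mean(X1, 2);   mu2 = mean(X2, 2);
    C1 = bsxfun(@minus, X1, mu1);   C2 = bsxfun(@minus, X2, mu2);
    Sw = C1 * C1' + C2 * C2';            % within-class scatter
    w = Sw \ (mu1 - mu2);                % direction maximizing between/within scatter ratio
    w = w / norm(w);
    y = w' * x;                          % 1-D discriminative projection of the new example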

47 Illustration of the projection, using two classes as an example
[Figure: two scatter plots over axes x1, x2 — a poor projection direction vs. a good, discriminative one]
Slide credit: Derek Hoiem
WARNING: SUPERVISED

48 Comparing with PCA Slide credit: Derek Hoiem WARNING: SUPERVISED

49 Other dimensionality reduction methods
Non-linear:
– Kernel PCA (Schölkopf et al., Neural Computation 1998)
– Independent component analysis (Comon, Signal Processing 1994)
– LLE (locally linear embedding) (Roweis and Saul, Science 2000)
– ISOMAP (isometric feature mapping) (Tenenbaum et al., Science 2000)
– t-SNE (t-distributed stochastic neighbor embedding) (van der Maaten and Hinton, JMLR 2008)

50 Kernel PCA
Assume zero-mean data. Data is transformed via φ(x), so the covariance matrix C becomes:
C = (1/N) Σ_n φ(x_n) φ(x_n)^T
The eigenvalue problem C v_i = λ_i v_i, with v_i = Σ_n a_in φ(x_n), becomes:
K a_i = λ_i N a_i, where K is the N x N kernel matrix with K_nm = φ(x_n)^T φ(x_m) = k(x_n, x_m)
The projection onto component i becomes:
y_i(x) = φ(x)^T v_i = Σ_n a_in k(x, x_n)
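
A minimal MATLAB sketch with a Gaussian (RBF) kernel, assuming X is D x N and that sigma and numComponents are placeholder choices (all names illustrative; in practice the kernel matrix must be centered, since the data is only zero-mean in the original space):

    sq = sum(X.^2, 1);
    D2 = bsxfun(@plus, sq', sq) - 2 * (X' * X);      % pairwise squared distances
    K = exp(-D2 / (2 * sigma^2));                    % N x N kernel matrix
    N = size(K, 1);
    J = ones(N) / N;
    Kc = K - J*K - K*J + J*K*J;                      % center the kernel in feature space
    [A, L] = eig(Kc);
    [evals, order] = sort(diag(L), 'descend');
    A = A(:, order);
    A = bsxfun(@rdivide, A, sqrt(max(evals', eps))); % scale so feature-space eigenvectors have unit norm
    Y = Kc * A(:, 1:numComponents);                  % rows of Y: kernel-PCA coordinates of the training points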

51 Kernel PCA Figure 12.16 from Bishop

52 ISOMAP Example Figure from Carlotta Domeniconi

53 ISOMAP Example Figure from Carlotta Domeniconi

54 t-SNE Example Figure from Genevieve Patterson, IJCV 2014

55 t-SNE Example Thomas and Kovashka, in submission

56 t-SNE Example Thomas and Kovashka, in submission

57 t-SNE Example Thomas and Kovashka, in submission

58 Feature selection (task-dependent)
– Filtering approaches: pick features that on their own can classify well, e.g. by how well they separate the classes
– Wrapper approaches: greedily add the features that most increase classification accuracy (see the sketch below)
– Embedded methods: joint learning and selection (e.g. in SVMs)
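
A minimal MATLAB sketch of the wrapper idea (greedy forward selection), assuming X is N x D here (one row per example), y holds the labels, and evalAccuracy is a hypothetical function handle that trains and cross-validates some classifier on the given columns:

    selected = [];
    remaining = 1:size(X, 2);
    bestAcc = -inf;
    while ~isempty(remaining)
        accs = zeros(1, numel(remaining));
        for j = 1:numel(remaining)
            accs(j) = evalAccuracy(X(:, [selected, remaining(j)]), y);   % try adding feature j
        end
        [acc, j] = max(accs);
        if acc <= bestAcc, break; end        % stop when no remaining feature helps
        bestAcc = acc;
        selected = [selected, remaining(j)]; % greedily keep the most helpful feature
        remaining(j) = [];
    end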

