Dimensionality Reduction: Lecture Slides for Introduction to Machine Learning 2e, ETHEM ALPAYDIN © The MIT Press, 2010

Why Reduce Dimensionality?
Reduces time complexity: less computation.
Reduces space complexity: fewer parameters.
Saves the cost of observing the feature.
Simpler models are more robust on small datasets.
More interpretable; simpler explanation.
Data visualization (structure, groups, outliers, etc.) when plotted in 2 or 3 dimensions.

Feature Selection vs Extraction
Feature selection: choose the k < d most important features and ignore the remaining d - k; subset selection algorithms.
Feature extraction: project the original dimensions x_i, i = 1,...,d, to new k < d dimensions z_j, j = 1,...,k; principal components analysis (PCA), linear discriminant analysis (LDA), factor analysis (FA).

Subset Selection
There are 2^d subsets of d features.
Forward search: add the best feature at each step. The set of features F is initially empty; at each iteration, find the best new feature j = argmin_i E(F ∪ x_i) and add x_j to F if E(F ∪ x_j) < E(F). This is a hill-climbing O(d^2) algorithm.
Backward search: start with all features and remove one at a time, if possible.
Floating search: add k, remove l.
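As an illustration of the forward-search procedure above, here is a minimal Python sketch. It assumes a user-supplied error function E(feature_subset), e.g. the validation error of a model trained on those features; the function name and signature are illustrative, not from the slides.

```python
# Greedy forward subset selection (hill climbing), assuming E(subset) returns
# the validation error of a model trained on that subset of feature indices.
def forward_selection(d, E):
    F = set()                  # current feature subset, initially empty
    best_err = E(F)            # baseline error with no features
    while len(F) < d:
        # evaluate every feature not yet selected
        candidates = [(E(F | {j}), j) for j in range(d) if j not in F]
        err, j = min(candidates)
        if err < best_err:     # add x_j only if E(F ∪ x_j) < E(F)
            F.add(j)
            best_err = err
        else:
            break              # no single additional feature helps: stop
    return F
```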

Principal Components Analysis (PCA)
Find a low-dimensional space such that when x is projected there, information loss is minimized.
The projection of x on the direction of w is z = w^T x.
Find w such that Var(z) is maximized:
Var(z) = Var(w^T x) = E[(w^T x - w^T μ)^2]
= E[(w^T x - w^T μ)(w^T x - w^T μ)]
= E[w^T (x - μ)(x - μ)^T w]
= w^T E[(x - μ)(x - μ)^T] w = w^T Σ w
where Var(x) = E[(x - μ)(x - μ)^T] = Σ.

Maximize Var(z) subject to ||w|| = 1. The solution satisfies Σ w_1 = α w_1, that is, w_1 is an eigenvector of Σ; choose the one with the largest eigenvalue for Var(z) to be maximum.
Second principal component: maximize Var(z_2) subject to ||w_2|| = 1 and w_2 orthogonal to w_1. Then Σ w_2 = α w_2, that is, w_2 is another eigenvector of Σ, and so on.

What PCA does
z = W^T (x - m), where the columns of W are the eigenvectors of Σ and m is the sample mean.
PCA centers the data at the origin and rotates the axes.
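A minimal NumPy sketch of this transform (not part of the original slides): estimate the sample mean and covariance, take the top-k eigenvectors as the columns of W, and project z = W^T (x - m). Function and variable names are illustrative.

```python
import numpy as np

def pca(X, k):
    """X: (N, d) data matrix. Returns projected data Z, projection matrix W, sorted eigenvalues."""
    m = X.mean(axis=0)                     # sample mean m
    S = np.cov(X - m, rowvar=False)        # d x d sample covariance (estimate of Σ)
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # re-sort in descending order
    eigvals = eigvals[order]
    W = eigvecs[:, order[:k]]              # d x k: eigenvectors with the k largest eigenvalues
    Z = (X - m) @ W                        # N x k projections z = W^T (x - m)
    return Z, W, eigvals
```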

How to choose k?
Proportion of Variance (PoV) explained by the first k components, when the λ_i are sorted in descending order:
PoV = (λ_1 + λ_2 + ... + λ_k) / (λ_1 + λ_2 + ... + λ_d)
Typically, stop at PoV > 0.9.
A scree graph plots PoV versus k; stop at the "elbow".
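A short sketch of this rule, assuming the sorted eigenvalues returned by the hypothetical pca() helper above:

```python
import numpy as np

def choose_k(eigvals, threshold=0.9):
    """eigvals: eigenvalues sorted in descending order. Returns the smallest k with PoV > threshold."""
    pov = np.cumsum(eigvals) / np.sum(eigvals)   # PoV for k = 1, ..., d
    return int(np.argmax(pov > threshold)) + 1   # first index exceeding the threshold (1-based k)
```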

Factor Analysis
Find a small number of factors z which, when combined, generate x:
x_i - μ_i = v_i1 z_1 + v_i2 z_2 + ... + v_ik z_k + ε_i
where the z_j, j = 1,...,k, are the latent factors with E[z_j] = 0, Var(z_j) = 1, Cov(z_i, z_j) = 0 for i ≠ j; the ε_i are zero-mean noise sources with Var(ε_i) = ψ_i, Cov(ε_i, ε_j) = 0 for i ≠ j, and Cov(ε_i, z_j) = 0; and the v_ij are the factor loadings.
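For illustration (not from the slides), the factor model x - μ = Vz + ε can be fitted with scikit-learn's FactorAnalysis; the data below is a random placeholder.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis  # assumes scikit-learn is installed

X = np.random.randn(200, 10)      # placeholder data; replace with your own (N x d)
fa = FactorAnalysis(n_components=3)
Z = fa.fit_transform(X)           # N x k estimated factor scores z
V = fa.components_.T              # d x k factor loadings v_ij
psi = fa.noise_variance_          # per-feature noise variances psi_i
```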

PCA vs FA
PCA: from x to z, z = W^T (x - μ)
FA: from z to x, x - μ = V z + ε

Factor Analysis
In FA, the factors z_j are stretched, rotated, and translated to generate x.

Multidimensional Scaling (MDS)
Given the pairwise distances between N points, d_ij, i, j = 1,...,N, place the points on a low-dimensional map such that the distances are preserved.
With the mapping z = g(x | θ), find θ that minimizes the Sammon stress
E(θ | X) = Σ_{i,j} (||z^i - z^j|| - ||x^i - x^j||)^2 / ||x^i - x^j||^2
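As a sketch of the idea (classical metric MDS rather than the Sammon-stress variant on the slide), the coordinates can be recovered from the distance matrix by double-centering and eigendecomposition:

```python
import numpy as np

def classical_mds(D, k):
    """D: (N, N) matrix of pairwise distances. Returns an (N, k) embedding."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered (Gram) matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k] # keep the k largest eigenvalues
    L = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * L          # scale eigenvectors by sqrt(eigenvalue)
```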

Map of Europe by MDS
(Map from CIA: The World Factbook.)

Linear Discriminant Analysis (LDA)
Find a low-dimensional space such that when x is projected onto it, the classes are well separated.
For two classes with mean vectors m_1 and m_2, find w that maximizes
J(w) = (w^T m_1 - w^T m_2)^2 / (s_1^2 + s_2^2)
where s_1^2 and s_2^2 are the scatters of the projected samples of the two classes.

Between-class scatter: (w^T m_1 - w^T m_2)^2 = w^T (m_1 - m_2)(m_1 - m_2)^T w = w^T S_B w, where S_B = (m_1 - m_2)(m_1 - m_2)^T.
Within-class scatter: s_1^2 + s_2^2 = w^T S_W w, where S_W = S_1 + S_2 and S_i = Σ_t r_i^t (x^t - m_i)(x^t - m_i)^T is the scatter matrix of class i (r_i^t = 1 if x^t belongs to class i, else 0).

Fisher's Linear Discriminant
Find w that maximizes
J(w) = (w^T S_B w) / (w^T S_W w)
LDA solution: w = c S_W^{-1} (m_1 - m_2), for any constant c.
Parametric solution (assuming Gaussian classes with a shared covariance Σ): w = Σ^{-1} (μ_1 - μ_2).
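A minimal NumPy sketch of the two-class LDA solution above; names are illustrative, and the returned direction is normalized to unit length.

```python
import numpy as np

def fisher_lda(X1, X2):
    """X1, X2: (N1, d) and (N2, d) samples of the two classes. Returns the unit LDA direction w."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)        # scatter matrix of class 1
    S2 = (X2 - m2).T @ (X2 - m2)        # scatter matrix of class 2
    Sw = S1 + S2                        # within-class scatter S_W
    w = np.linalg.solve(Sw, m1 - m2)    # w proportional to S_W^{-1} (m_1 - m_2)
    return w / np.linalg.norm(w)
```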

K > 2 Classes
Within-class scatter: S_W = Σ_{i=1}^{K} S_i, with S_i = Σ_t r_i^t (x^t - m_i)(x^t - m_i)^T.
Between-class scatter: S_B = Σ_{i=1}^{K} N_i (m_i - m)(m_i - m)^T, where m is the overall mean and N_i the number of instances in class i.
Find W that maximizes J(W) = |W^T S_B W| / |W^T S_W W|.
The solution is given by the largest eigenvectors of S_W^{-1} S_B; S_B has a maximum rank of K - 1, so at most K - 1 useful dimensions.
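A sketch of the multi-class case (an assumed helper, not from the slides): form S_W and S_B from labeled data and take the leading eigenvectors of S_W^{-1} S_B as the columns of W.

```python
import numpy as np

def multiclass_lda(X, y, k):
    """X: (N, d) data, y: (N,) integer labels, k <= K - 1. Returns a (d, k) projection W."""
    N, d = X.shape
    m = X.mean(axis=0)                                   # overall mean
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                             # class mean m_i
        Sw += (Xc - mc).T @ (Xc - mc)                    # within-class scatter S_W
        Sb += len(Xc) * np.outer(mc - m, mc - m)         # between-class scatter S_B
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:k]           # k largest eigenvalues
    return eigvecs[:, order].real
```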

Isomap
Geodesic distance is the distance along the manifold on which the data lie, as opposed to the Euclidean distance in the input space.

Isomap
Instances r and s are connected in the graph if ||x^r - x^s|| < ε or if x^s is one of the k nearest neighbors of x^r; the edge length is ||x^r - x^s||.
For two nodes r and s that are not directly connected, the distance is the length of the shortest path between them in the graph.
Once the N x N geodesic distance matrix is formed in this way, use MDS to find a lower-dimensional mapping.
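A compact sketch of this pipeline using SciPy and scikit-learn for the neighbor graph and shortest paths, and reusing the classical_mds() sketch from the MDS slide above; it assumes the k-NN graph is connected (otherwise infinite distances appear).

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap(X, n_neighbors=10, k=2):
    """X: (N, d) data. Returns an (N, k) embedding via geodesic distances and classical MDS."""
    G = kneighbors_graph(X, n_neighbors, mode='distance')   # sparse k-NN graph, Euclidean edge lengths
    D = shortest_path(G, method='D', directed=False)        # Dijkstra shortest paths = geodesic estimates
    return classical_mds(D, k)                               # embed with the MDS sketch defined earlier
```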

Locally Linear Embedding (LLE)
1. Given x^r, find its neighbors x^s_(r).
2. Find the weights W_rs that minimize the reconstruction error E(W | X) = Σ_r || x^r - Σ_s W_rs x^s_(r) ||^2, subject to Σ_s W_rs = 1.
3. Find the new coordinates z^r that minimize E(z | W) = Σ_r || z^r - Σ_s W_rs z^s ||^2, using the same weights.
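For a quick experiment, the three steps above can be run with scikit-learn's LocallyLinearEmbedding (an external library, not part of the slides); the data here is a random placeholder.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding  # assumes scikit-learn is installed

X = np.random.randn(500, 10)                              # placeholder data; replace with your own
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Z = lle.fit_transform(X)                                  # N x 2 coordinates z^r
```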

LLE on Optdigits