Principal Component Analysis Machine Learning

Last Time: Expectation Maximization in Graphical Models (Baum-Welch)

Now: Unsupervised Dimensionality Reduction

Curse of Dimensionality: In (nearly) all modeling approaches, more features (dimensions) require (a lot) more data, typically exponential in the number of features. This is clearly seen when filling a probability table over all feature combinations. Topological arguments can also be made: compare the volume of an inscribed hypersphere to the volume of the enclosing hypercube; as the dimension grows, the sphere occupies a vanishing fraction of the cube.
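To make the hypersphere/hypercube comparison concrete, here is a minimal numerical sketch (not from the original slides; the function name is mine): the fraction of the hypercube's volume occupied by its inscribed hypersphere collapses toward zero as the dimension grows.

```python
# Illustrative only: ratio of the volume of the unit-radius d-ball to the
# volume of the enclosing hypercube of side length 2.
from math import pi, gamma

def sphere_to_cube_ratio(d):
    """V_ball(d) / V_cube(d) = pi^(d/2) / (Gamma(d/2 + 1) * 2^d)."""
    return pi ** (d / 2) / (gamma(d / 2 + 1) * 2 ** d)

for d in (1, 2, 3, 5, 10, 20):
    print(f"d={d:2d}  inscribed-sphere fraction = {sphere_to_cube_ratio(d):.6f}")
```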

Dimensionality Reduction: We've already seen some of this; regularization attempts to reduce the number of effective features used in linear and logistic regression classifiers.

Linear Models: When we regularize, we optimize an objective that encourages ignoring as many features as possible, so the "effective" number of dimensions is much smaller than D.
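A minimal sketch of this effect, assuming NumPy and scikit-learn are available (the synthetic data and the alpha value are mine): an L1-regularized linear model drives most coefficients exactly to zero, leaving far fewer effective dimensions than D.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                                    # D = 20 features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)   # only 2 matter

model = Lasso(alpha=0.1).fit(X, y)
print("non-zero coefficients:", np.sum(model.coef_ != 0), "of", X.shape[1])
```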

Support Vector Machines: In exemplar approaches (SVM, k-NN), each data point can be considered to describe a dimension. By keeping only the instances that define the maximum margin (all other α's are set to zero), SVMs use only a subset of the available dimensions (instances) in their decision making.
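A small sketch, assuming scikit-learn (the toy data is mine): after fitting, only the support vectors, i.e. the training points with non-zero α, participate in the decision function.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# clf.support_ holds the indices of the training points with non-zero alpha.
print("support vectors used:", len(clf.support_), "of", len(X), "training points")
```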

Decision Trees: Decision trees explicitly select split points on features that improve information gain or accuracy. Features that do not contribute sufficiently to the classification are never used. [Example tree from the slide: split on weight < 165, then on height < 68, with leaves labeled 5M, 5F, and 1F / 1M.]
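For concreteness, a minimal sketch of the information-gain criterion mentioned above (NumPy only; the function names and the toy weight/gender data are mine, loosely mirroring the slide's example tree).

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, left_mask):
    """Reduction in label entropy after splitting on a boolean mask."""
    n = len(labels)
    left, right = labels[left_mask], labels[~left_mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Toy example: split gender labels on weight < 165.
weight = np.array([150, 160, 170, 180, 155, 190])
gender = np.array(["F", "F", "M", "M", "F", "M"])
print("information gain of split:", information_gain(gender, weight < 165))
```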

Feature Spaces: Even though a data point is described in terms of N features, this may not be the most compact representation of the feature space. Even classifiers that try to use a smaller effective feature space can suffer from the curse of dimensionality: if a feature has some discriminative power, its dimension may remain in the effective set.

1-d data in a 2-d world

Dimensions of high variance

Identifying dimensions of variance: The assumption is that directions of high variance are the appropriate (useful) dimensions with which to represent the feature set.

Aside: Normalization. Assume two features, percentile GPA and height in cm. Which dimension shows greater variability?

Aside: Normalization. Now assume the same two features, but with height in m. Which dimension shows greater variability? The answer changes with the choice of units, which is why features are typically standardized before PCA.
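A minimal sketch of this aside (NumPy only; the numbers are made up): the raw variance of height depends entirely on whether it is measured in cm or m, while z-scoring makes the comparison unit-free.

```python
import numpy as np

gpa_percentile = np.array([55.0, 70.0, 82.0, 91.0, 63.0])
height_cm = np.array([158.0, 171.0, 180.0, 165.0, 175.0])
height_m = height_cm / 100.0

print("var(GPA percentile):", gpa_percentile.var())
print("var(height in cm):  ", height_cm.var())
print("var(height in m):   ", height_m.var())   # 10,000x smaller than in cm

def zscore(x):
    """Standardize to zero mean and unit variance."""
    return (x - x.mean()) / x.std()

# After standardization both unit choices give the same (unit) variance.
print("standardized variances:", zscore(height_cm).var(), zscore(height_m).var())
```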

Principal Component Analysis: PCA identifies the dimensions of greatest variance in a set of data.

Eigenvectors: Eigenvectors are orthogonal vectors that define a space, the eigenspace (orthogonality holds for the symmetric matrices used here); any data point can be described as a linear combination of eigenvectors. An eigenvector v of a square matrix A satisfies A v = λ v, where the associated λ is the eigenvalue.
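A quick numerical check of the property A v = λ v, using an arbitrary symmetric matrix (the relevant case here, since covariance matrices are symmetric).

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# eigh is for symmetric matrices; the columns of `eigenvectors` are the v's.
eigenvalues, eigenvectors = np.linalg.eigh(A)

v = eigenvectors[:, 0]
lam = eigenvalues[0]
print("A v == lambda v:", np.allclose(A @ v, lam * v))   # True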

PCA: Write each data point in this new space, x = μ + Σ_{i=1..D} c_i v_i. To do the dimensionality reduction, keep only C < D dimensions. Each data point is then represented by its vector of coefficients (c_1, ..., c_C).

Identifying Eigenvectors: PCA is easy once we have the eigenvectors and the mean, and identifying the mean is easy. The eigenvectors of the covariance matrix represent a set of directions of variance; the eigenvalues represent the magnitude of the variance along each direction.

Eigenvectors of the Covariance Matrix: The eigenvectors are orthonormal, so in the eigenspace the Gaussian is diagonal (zero covariance). All eigenvalues are non-negative. The eigenvalues are sorted in decreasing order: larger eigenvalues correspond to directions of higher variance.
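A sketch of these facts in code (NumPy only; the synthetic data is mine): eigendecomposition of the sample covariance matrix, with the eigenvalues sorted into decreasing order and the eigenvectors checked for orthonormality.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # correlated data

cov = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending order by default

order = np.argsort(eigenvalues)[::-1]             # sort descending
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

print("eigenvalues (variance along each direction):", eigenvalues)
print("eigenvectors are orthonormal:",
      np.allclose(eigenvectors.T @ eigenvectors, np.eye(2)))
```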

Dimensionality Reduction with PCA: To convert an original data point x into PCA coefficients, c_i = v_i^T (x − μ). To reconstruct a point from its first C coefficients, x̂ = μ + Σ_{i=1..C} c_i v_i.
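A self-contained sketch of both maps (NumPy only; all function names are mine, not from the lecture): fit the mean and eigenvectors, project a point onto the top C components, and reconstruct it.

```python
import numpy as np

def pca_fit(X):
    """Return the mean and the eigenvectors of the covariance, sorted by decreasing eigenvalue."""
    mu = X.mean(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    return mu, eigenvectors[:, order]

def pca_project(x, mu, eigenvectors, C):
    """Coefficients c_i = v_i^T (x - mu) for the top C eigenvectors."""
    return eigenvectors[:, :C].T @ (x - mu)

def pca_reconstruct(c, mu, eigenvectors):
    """Reconstruction x_hat = mu + sum_i c_i v_i."""
    return mu + eigenvectors[:, :len(c)] @ c

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3))   # 3-D synthetic data

mu, V = pca_fit(X)
c = pca_project(X[0], mu, V, C=2)          # compress 3 dimensions to 2 coefficients
x_hat = pca_reconstruct(c, mu, V)
print("squared reconstruction error:", np.sum((X[0] - x_hat) ** 2))
```

Keeping all D components makes the reconstruction exact; dropping the low-variance ones trades a small reconstruction error for a more compact representation.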

Eigenfaces: Face images are encoded (projected onto the top eigenvectors) and then decoded (reconstructed). Efficiency can be evaluated with the absolute or squared reconstruction error.

Some other (unsupervised) dimensionality reduction techniques: Kernel PCA, Distance-Preserving Dimension Reduction, Maximum Variance Unfolding, Multidimensional Scaling (MDS), Isomap.

Next Time: Model Adaptation and Semi-supervised Techniques. Work on your projects.