DIMENSIONALITY REDUCTION: FEATURE EXTRACTION & FEATURE SELECTION Principal Component Analysis

Presentation transcript:

DIMENSIONALITY REDUCTION: FEATURE EXTRACTION & FEATURE SELECTION Principal Component Analysis

Why Dimensionality Reduction? It becomes more difficult to extract meaningful conclusions from a data set as its dimensionality increases (D. L. Donoho).
- Curse of dimensionality: the number of training samples needed grows exponentially with the number of features.
- Peaking phenomenon: classifier performance degrades when the ratio of sample size to number of features is small, and the classification error cannot be reliably estimated in that regime.

How high dimensionality breaks the k-Nearest Neighbors algorithm. Assume 5000 points uniformly distributed in the unit hypercube, and suppose we want the 5 nearest neighbors of a query point.
- 1 dimension: the neighborhood only needs to cover 5/5000 = 0.001 of the range.
- 2 dimensions: a square neighborhood must span √0.001 ≈ 0.03 of each axis to capture 5 neighbors.
- 100 dimensions: the neighborhood must span (0.001)^(1/100) ≈ 0.93 ≈ 1 of each axis, i.e., nearly the entire range. The points effectively spread to the surface of the high-dimensional volume, so a genuinely local nearest neighbor does not exist.
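As a quick check on these numbers, here is a minimal Python sketch (not from the original slides) that computes the edge length a hypercubical neighborhood must have in order to be expected to contain 5 of 5000 uniformly distributed points, as the dimension grows:

```python
# Edge length of a hypercubical neighborhood expected to contain a fraction
# `frac` of points uniformly distributed in the unit hypercube: edge = frac**(1/d).
# With 5000 points and 5 desired neighbors, frac = 5/5000 = 0.001.
frac = 5 / 5000
for d in (1, 2, 3, 10, 100):
    edge = frac ** (1.0 / d)
    print(f"d = {d:3d}: required edge length ≈ {edge:.3f}")

# Approximate output:
# d =   1: required edge length ≈ 0.001
# d =   2: required edge length ≈ 0.032
# d =   3: required edge length ≈ 0.100
# d =  10: required edge length ≈ 0.501
# d = 100: required edge length ≈ 0.933
```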

Advantages vs. Disadvantages
Advantages:
- Simplifies the pattern representation and the classifiers
- Yields a faster classifier with lower memory consumption
- Alleviates the curse of dimensionality when the data sample is limited
Disadvantages:
- Loss of information
- Possible increase in the error of the resulting recognition system

Feature Selection & Extraction
- Transform the data from a high-dimensional space to a lower-dimensional space.
- A data set {x^(1), x^(2), …, x^(m)} with x^(i) ∈ R^n is mapped to {z^(1), z^(2), …, z^(m)} with z^(i) ∈ R^k, where k ≤ n.

Solution: Dimensionality Reduction
- Feature extraction: determine an appropriate subspace of dimensionality k within the original d-dimensional space, where k ≤ d.
- Feature selection: given a set of d features, select a subset of size k that minimizes the classification error.
The sketch below illustrates the difference.
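To make the distinction concrete, here is a minimal NumPy sketch (not from the slides; the chosen feature indices and the projection matrix are arbitrary illustrations rather than an optimal choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # 100 samples, d = 5 features

# Feature selection: keep a subset of the original columns (here columns 0 and 3);
# the retained features keep their original meaning.
selected = [0, 3]
X_sel = X[:, selected]             # shape (100, 2)

# Feature extraction: build k new features as linear combinations of all d features;
# PCA chooses this projection matrix optimally, here it is random for illustration.
W = rng.normal(size=(5, 2))
X_ext = X @ W                      # shape (100, 2)

print(X_sel.shape, X_ext.shape)    # (100, 2) (100, 2)
```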

Feature Extraction Methods (figure by Anil K. Jain et al.)

Feature Selection Methods (figure by Anil K. Jain et al.)

Principal Component Analysis (PCA)
- What is PCA? A statistical method for finding patterns in data.
- What are its advantages? It highlights similarities and differences in the data, and it reduces the dimensionality without much loss of information.
- How does it work? It reduces the data from n dimensions to k dimensions, with k < n.
- Example in R^2
- Example in R^3

Data Redundancy Example. When the correlation between x1 and x2 equals 1, the two features carry the same information: for any pair of highly correlated features x1 and x2, the information is redundant and can be captured by a single projected feature z1. (Original figure by Andrew Ng.)

Method for PCA using a 2-D example
- Step 1. Collect the data set (an m×n matrix). (Example from Lindsay I. Smith; a small illustrative data set is constructed in the sketch below.)
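The following NumPy sketch sets up a small 2-D data set in the spirit of Smith's tutorial example (the values are illustrative); the later steps reuse this array:

```python
import numpy as np

# Illustrative 2-D data set: one row per sample, one column per feature.
X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
print(X.shape)   # (10, 2): m = 10 samples, n = 2 features
```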

Find the vector that best fits the data. The data were originally represented in the x-y frame; they can be transformed into the frame defined by the eigenvectors. (Figure from Lindsay I. Smith.)

PCA 2-dimensional example
- Goal: find a direction vector u in R^2 onto which to project all the data so as to minimize the projection error, i.e., the distances from the data points to the chosen line. (Andrew Ng's machine learning course lectures.)
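Written out explicitly (a standard formulation of this objective in the notation of Ng's course, not copied from the slides), with x^(i) the mean-centered data points, the goal is:

```latex
\min_{u \in \mathbb{R}^2,\; \|u\| = 1} \;
\frac{1}{m} \sum_{i=1}^{m}
\left\| x^{(i)} - \bigl(u^{\top} x^{(i)}\bigr)\, u \right\|^{2}
```

Each term is the squared orthogonal distance from x^(i) to the line spanned by u.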

PCA vs. Linear Regression. PCA minimizes the orthogonal (shortest) distance from each point to the fitted line and treats the features symmetrically, whereas linear regression minimizes the vertical distance between the predicted and observed values of y, the variable being predicted. (Figure by Andrew Ng.)
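For comparison with the PCA objective above, the linear regression objective (a standard formulation, not from the slides) penalizes only vertical errors in y:

```latex
\min_{\theta_{0},\, \theta_{1}} \;
\frac{1}{m} \sum_{i=1}^{m}
\bigl( y^{(i)} - \theta_{0} - \theta_{1} x^{(i)} \bigr)^{2}
```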

Method for PCA using a 2-D example
- Step 2. Subtract the mean of each dimension from the data, so that every feature has zero mean. (Lindsay I. Smith.)
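A minimal NumPy sketch of this step, continuing the illustrative data set from Step 1 (the variable names are my own):

```python
import numpy as np

X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

# Step 2: subtract the per-feature mean so that every column has zero mean.
mean = X.mean(axis=0)
X_adj = X - mean
print(mean)                                  # per-feature means
print(np.allclose(X_adj.mean(axis=0), 0.0))  # True
```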

Method for PCA using a 2-D example
- Step 3. Calculate the covariance matrix of the mean-adjusted data.
- Step 4. Compute the eigenvalues and unit eigenvectors of the covariance matrix; in Matlab, [U,S,V]=svd(sigma) or eig(sigma).
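The slides give the Matlab calls; an equivalent NumPy sketch, continuing the same example, might look like this:

```python
import numpy as np

X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
X_adj = X - X.mean(axis=0)

# Step 3: covariance matrix of the mean-adjusted data (features in columns).
sigma = np.cov(X_adj, rowvar=False)      # 2 x 2 symmetric matrix

# Step 4: eigenvalues and unit eigenvectors.  For a symmetric matrix,
# np.linalg.eigh returns ascending eigenvalues and orthonormal eigenvectors
# in the columns of U; np.linalg.svd(sigma) would work as well.
eigvals, U = np.linalg.eigh(sigma)
print(eigvals)
print(U)
```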

Method for PCA using a 2-D example
- Step 5. Choose components and form the feature vector.
- Order the eigenvectors by their eigenvalues, from highest to lowest; the most significant component is the one with the highest eigenvalue.
- Keep the first k of the n eigenvectors: the dimension is reduced while losing some, but not much, information.
- In Matlab (with the eigenvectors sorted into the columns of U): Ureduce=U(:,1:k) extracts the first k vectors.
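In NumPy terms (a sketch continuing the same example; eigh returns the eigenvalues in ascending order, so they are first sorted into descending order):

```python
import numpy as np

X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
X_adj = X - X.mean(axis=0)
eigvals, U = np.linalg.eigh(np.cov(X_adj, rowvar=False))

# Step 5: sort eigenvectors by eigenvalue, highest first, and keep the top k.
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

k = 1                                # reduce from n = 2 to k = 1 dimensions
U_reduce = U[:, :k]                  # analogue of Matlab's Ureduce = U(:,1:k)
print(eigvals)                       # largest eigenvalue first
print(U_reduce)                      # the most significant eigenvector
```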

Step 6. Derive the new data set: FinalData = RowFeatureVector × RowDataAdjust, where RowFeatureVector is the transposed feature vector (k by n, with the chosen eigenvectors eig1 … eigk as rows, most significant first) and RowDataAdjust is the mean-adjusted data with one column per sample (n by m, columns col1 … colm). The product is the k by m matrix of transformed data.
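The same step in NumPy (a sketch continuing the example; with samples in rows the projection can equivalently be written X_adj @ U_reduce):

```python
import numpy as np

X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
X_adj = X - X.mean(axis=0)
eigvals, U = np.linalg.eigh(np.cov(X_adj, rowvar=False))
order = np.argsort(eigvals)[::-1]
U_reduce = U[:, order][:, :1]              # k = 1 most significant eigenvector

# Step 6: RowFeatureVector (k x n) times RowDataAdjust (n x m) gives k x m data.
row_feature_vector = U_reduce.T            # k x n
row_data_adjust = X_adj.T                  # n x m
final_data = row_feature_vector @ row_data_adjust
print(final_data.shape)                    # (1, 10): each sample is now 1-D
```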

Transformed Data Visualization I: the data plotted in the coordinate frame of eigenvector 1 and eigenvector 2. (Figure from Lindsay I. Smith.)

Transformed Data Visualization II: the data shown in the original x-y frame together with eigenvector 1. (Figure from Lindsay I. Smith.)

3-D Example (figure by Andrew Ng)
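As a self-contained sketch of the full pipeline in three dimensions (my own synthetic data, assuming NumPy; in Ng's lecture the 3-D example is likewise projected down to two dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 3-D data that lies close to a 2-D plane (illustrative only).
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.2], [0.4, 1.0], [0.7, 0.7]])    # 3 x 2
X = latent @ mixing.T + 0.05 * rng.normal(size=(200, 3))

# PCA: center, covariance, eigendecomposition, keep the top k = 2 components.
X_adj = X - X.mean(axis=0)
eigvals, U = np.linalg.eigh(np.cov(X_adj, rowvar=False))
order = np.argsort(eigvals)[::-1]
U_reduce = U[:, order][:, :2]

Z = X_adj @ U_reduce                        # 200 x 2: the data in the reduced space
explained = eigvals[order][:2].sum() / eigvals.sum()
print(Z.shape, f"variance retained ≈ {explained:.3f}")
```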

Sources
- Jain, Anil K.; Duin, Robert P. W.; Mao, Jianchang (2000). "Statistical pattern recognition: a review." IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (1): 4-37.
- Ng, Andrew. "Machine Learning" online course. Page.php?course=MachineLearning
- Smith, Lindsay I. (2002). "A tutorial on principal components analysis." Cornell University, USA 51: 52.

Lydia Song Thank you!