Foundation of High-Dimensional Data Visualization

Presentation transcript:

Foundation of High-Dimensional Data Visualization (Clustering, Classification, and their Applications)
Chaur-Chin Chen (陳朝欽)
Institute of Information Systems & Applications (Department of Computer Science)
National Tsing Hua University, HsinChu (新竹), Taiwan (台灣)
cchen@cs.nthu.edu.tw
October 16, 2013

Outline
Motivation by Examples
Data Description and Representation
8OX and iris Data Sets
Supervised vs. Unsupervised Learning
Dendrograms of Hierarchical Clustering
PCA vs. LDA
A Comparison of PCA and LDA
Distribution of Volumes of Unit Spheres

Apple, Pineapple, Sugar Apple, Waxapple

Distinguish Starfruits (carambolas) from Bellfruits (waxapples)
1. Features (characteristics): colors, shapes, size, tree leaves, and other quantitative measurements
2. Decision rules: classifiers
3. Performance evaluation
4. Classification / clustering

IRIS: Setosa, Virginica, Versicolor

Data Description
1. 8OX data set: derived from Munson’s handprinted character set. It includes 15 patterns from each of the characters ‘8’, ‘O’, ‘X’, and each pattern consists of 8 feature measurements. Example patterns: 8: [11, 3, 2, 3, 10, 3, 2, 4], O: [4, 5, 2, 3, 4, 6, 3, 6], X: [11, 2, 10, 3, 11, 4, 11, 3].
2. IRIS data set: contains measurements of three species of iris flowers. It consists of 50 patterns from each species on 4 features (sepal length, sepal width, petal length, petal width). Example patterns: Setosa: [5.1, 3.5, 1.4, 0.2], Versicolor: [7.0, 3.2, 4.7, 1.4], Virginica: [6.3, 3.3, 6.0, 2.5].
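
A minimal MATLAB sketch for inspecting the IRIS data, assuming the Statistics Toolbox copy of Fisher's iris data (fisheriris) is available; it is an illustration, not part of the original slides:
load fisheriris                  % provides meas (150x4 feature matrix) and species (labels)
size(meas)                       % 150 patterns, 4 features each
unique(species)                  % 'setosa', 'versicolor', 'virginica'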

Supervised and Unsupervised Learning Problems
☺ The problem of supervised learning is to design a function that takes the training data xi(k), i = 1, 2, ..., nk, k = 1, 2, ..., C, as input vectors and outputs either a single category (classification) or a regression curve.
☺ Unsupervised learning (cluster analysis) is formulated in the same way as supervised learning (pattern recognition), except that the categories of the training data are unknown.
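
A minimal MATLAB sketch contrasting the two settings, assuming the Statistics Toolbox and a hypothetical data matrix X with known labels y for the training patterns (illustrative only):
mdl  = fitcdiscr(X, y);          % supervised: labels y are used to train a discriminant classifier
yhat = predict(mdl, X);          % predicted categories
idx  = kmeans(X, 3);             % unsupervised: only X is used; the number of clusters (3) is an assumption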

Dendrograms of 8OX (30 patterns) and IRIS (30 patterns)
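
A minimal MATLAB sketch of how such dendrograms can be drawn, assuming the Statistics Toolbox and a matrix X whose rows are the selected patterns; the linkage rule is an assumption, not taken from the slides:
D = pdist(X, 'euclidean');       % pairwise distances between patterns
Z = linkage(D, 'complete');      % agglomerative hierarchical clustering (complete linkage assumed)
dendrogram(Z, 0);                % 0 = show all leaves
title('Dendrogram');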

Problem Statement for PCA
• Let X be an m-dimensional random vector with covariance matrix C. The problem is to consecutively find unit vectors a1, a2, ..., am such that the projections Yi = Xt ai (with realizations yi = xt ai) satisfy:
1. var(Y1) is maximum.
2. var(Y2) is maximum subject to cov(Y2, Y1) = 0.
3. var(Yk) is maximum subject to cov(Yk, Yi) = 0 for all i < k, where k = 3, 4, ..., m.
• Yi is called the i-th principal component.
• Feature extraction by PCA is called PCP (principal component projection).

The Solutions
Let (λi, ui) be the eigenvalue/eigenvector pairs of the covariance matrix C such that λ1 ≥ λ2 ≥ ... ≥ λm (≥ 0) and ∥ui∥2 = 1 for all 1 ≤ i ≤ m. Then ai = ui and var(Yi) = λi for 1 ≤ i ≤ m.
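
A minimal numerical check of this statement in MATLAB, assuming a data matrix X with patterns in rows and using the sample covariance in place of C (an illustrative sketch, not from the slides):
C = cov(X);                                 % sample covariance matrix
[U, D] = eig(C);
[lambda, idx] = sort(diag(D), 'descend');   % eigenvalues in decreasing order
V = U(:, idx);                              % columns are the unit eigenvectors ui = ai
Y = X * V;                                  % Y(:,i) is the i-th principal component
disp([var(Y)' lambda])                      % the two columns agree: var(Yi) = lambda_i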

First and Second PCP for data8OX

First and Second PCP for datairis

Fundamentals of LDA
Given the training patterns x1, x2, ..., xn, which are m-dimensional column vectors drawn from K categories with n1 + n2 + ... + nK = n, the between-class scatter matrix B, the within-class scatter matrix W, and the total scatter matrix T are defined as follows.
1. The sample mean vector u = (x1 + x2 + ... + xn)/n
2. The mean vector of category i is denoted ui
3. The between-class scatter matrix B = Σi=1..K ni (ui − u)(ui − u)t
4. The within-class scatter matrix W = Σi=1..K Σx∈ωi (x − ui)(x − ui)t
5. The total scatter matrix T = Σi=1..n (xi − u)(xi − u)t
Then T = B + W.
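
A minimal MATLAB sketch computing these scatter matrices and checking that T = B + W; it assumes an n-by-m data matrix X and a numeric label vector y (values 1..K), which are illustrative names not taken from the slides:
[n, m] = size(X);
u = mean(X, 1)';                              % overall sample mean vector
B = zeros(m); W = zeros(m);
labels = unique(y); K = numel(labels);
for i = 1:K
    Xi = X(y == labels(i), :);                % patterns of category i
    ni = size(Xi, 1);
    ui = mean(Xi, 1)';                        % mean vector of category i
    B = B + ni * (ui - u) * (ui - u)';        % between-class scatter
    Di = Xi - repmat(ui', ni, 1);             % deviations from the class mean
    W = W + Di' * Di;                         % within-class scatter
end
D0 = X - repmat(u', n, 1);
T = D0' * D0;                                 % total scatter
norm(T - (B + W), 'fro')                      % approximately 0, confirming T = B + W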

Fisher’s Discriminant Ratio
Linear discriminant analysis for a dichotomous (two-class) problem attempts to find an optimal projection direction w that maximizes Fisher’s discriminant ratio J(w) = (wt B w) / (wt W w). The optimization problem reduces to the generalized eigenvalue/eigenvector problem Bw = λWw, where for two classes the between-class scatter simplifies to B = (n1 n2 / n)(u1 − u2)(u1 − u2)t.
Similarly, for multiclass problems (more than 2 classes), the objective is to find the first few projection vectors that discriminate points in different categories, again by optimizing J(w), i.e., by solving Bw = λWw for the eigenvectors associated with the few largest eigenvalues.
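
A minimal MATLAB sketch of the multiclass LDA projection, reusing B, W, K, and X from the previous sketch and assuming W is nonsingular (illustrative, not code from the slides):
[V, D] = eig(B, W);                           % generalized eigenproblem B*w = lambda*W*w
[~, order] = sort(real(diag(D)), 'descend');  % real() guards against tiny imaginary round-off
Wlda = real(V(:, order(1:K-1)));              % at most K-1 useful discriminant directions
Z = X * Wlda;                                 % LDA-projected patterns (plot e.g. the first two columns)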

LDA and PCA on data8OX (figure panels: LDA on data8OX, PCA on data8OX)

LDA and PCA on datairis (figure panels: LDA on datairis, PCA on datairis)

Projection of First 3 Principal Components for data8OX

pca8OX.m
fin=fopen('data8OX.txt','r');
d=8+1; N=45;                          % d columns per pattern (8 features + label), N patterns
fgetl(fin); fgetl(fin); fgetl(fin);   % skip 3 header lines
A=fscanf(fin,'%f',[d N]); A=A';       % read data
X=A(:,1:d-1);                         % remove the last column (class label)
k=3; Y=PCA(X,k);                      % first k principal components (see PCA.m)
X1=Y(1:15,1);  Y1=Y(1:15,2);  Z1=Y(1:15,3);    % patterns of character '8'
X2=Y(16:30,1); Y2=Y(16:30,2); Z2=Y(16:30,3);   % patterns of character 'O'
X3=Y(31:45,1); Y3=Y(31:45,2); Z3=Y(31:45,3);   % patterns of character 'X'
plot3(X1,Y1,Z1,'d',X2,Y2,Z2,'O',X3,Y3,Z3,'X','markersize',12); grid
axis([4 24 -2 18 -10 25]); legend('8','O','X')
title('First Three Principal Component Projection for 8OX Data')

PCA.m
% Function file: PCA.m
% Find the first K principal components of data X
% X contains n pattern vectors with d features
function Y=PCA(X,K)
[n,d]=size(X);
C=cov(X);                           % d-by-d sample covariance matrix
[U,D]=eig(C);                       % eigenvectors in U, eigenvalues on the diagonal of D
L=diag(D);
[sorted,index]=sort(L,'descend');   % order eigenvalues from largest to smallest
Xproj=zeros(d,K);                   % initialize the projection matrix
for j=1:K
    Xproj(:,j)=U(:,index(j));       % eigenvector of the j-th largest eigenvalue
end
Y=X*Xproj;                          % first K principal components
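
A minimal usage sketch applying PCA.m to the iris measurements and plotting the first two principal components, in the spirit of the "First and Second PCP for datairis" figure; it assumes the Statistics Toolbox (fisheriris, gscatter) and is an illustration rather than the original plotting script:
load fisheriris                    % meas: 150x4 features, species: labels
Y = PCA(meas, 2);                  % first two principal components
gscatter(Y(:,1), Y(:,2), species); % scatter plot colored by species
xlabel('First principal component'); ylabel('Second principal component');
title('First and Second PCP for datairis');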