Effective Dimension Reduction with Prior Knowledge Haesun Park Division of Computational Science and Eng. College of Computing Georgia Institute of Technology.


Effective Dimension Reduction with Prior Knowledge. Haesun Park, Division of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA. Joint work with Barry Drake, Peg Howland, Hyunsoo Kim, and Cheonghee Park. DIMACS, May 2007.

Dimension Reduction
Dimension reduction for clustered data: Linear Discriminant Analysis (LDA), generalized LDA (LDA/GSVD, regularized LDA), Orthogonal Centroid Method (OCM)
Dimension reduction for nonnegative data: Nonnegative Matrix Factorization (NMF)
Applications: text classification, face recognition, fingerprint classification, gene clustering in microarray analysis, ...

2D Representation: Utilize Cluster Structure if Known. 2D representation of 150 x 1000 data with 7 clusters: LDA vs. SVD.

Dimension Reduction for Clustered Data: Measure for Cluster Quality
A = [a_1, ..., a_n] ∈ R^{m×n}, clustered data
N_i = set of items in class i, |N_i| = n_i, r classes in total
c_i = centroid of class i, c = global centroid
S_b = ∑_{1≤i≤r} ∑_{j∈N_i} (c_i − c)(c_i − c)^T
S_w = ∑_{1≤i≤r} ∑_{j∈N_i} (a_j − c_i)(a_j − c_i)^T
S_t = ∑_{1≤i≤n} (a_i − c)(a_i − c)^T
S_w + S_b = S_t
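As a concrete illustration of these definitions, here is a minimal NumPy sketch (the function name and conventions are mine, not from the talk) that forms S_w, S_b, and S_t from a data matrix whose columns are clustered items; it can be used to check numerically that S_w + S_b = S_t.

```python
import numpy as np

def scatter_matrices(A, labels):
    """A: m x n data matrix (columns are items); labels: length-n array of class ids.
    Returns (S_w, S_b, S_t) as defined on the slide."""
    m, n = A.shape
    c = A.mean(axis=1, keepdims=True)                # global centroid
    S_w = np.zeros((m, m))
    S_b = np.zeros((m, m))
    for cls in np.unique(labels):
        Ai = A[:, labels == cls]                     # items in class i
        ci = Ai.mean(axis=1, keepdims=True)          # class centroid c_i
        S_w += (Ai - ci) @ (Ai - ci).T               # within-class scatter
        S_b += Ai.shape[1] * (ci - c) @ (ci - c).T   # n_i * (c_i - c)(c_i - c)^T
    S_t = (A - c) @ (A - c).T                        # total scatter
    return S_w, S_b, S_t
```

With these in hand, np.allclose(S_w + S_b, S_t) should hold up to rounding error.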

Optimal Dimension Reducing Transformation
G^T : q × m with q << m maps y ∈ R^m to G^T y ∈ R^q.
High-quality clusters have small trace(S_w) and large trace(S_b).
Want G such that trace(G^T S_w G) is minimized and trace(G^T S_b G) is maximized:
max trace((G^T S_w G)^{-1} (G^T S_b G)) → LDA (Fisher '36, Rao '48)
max trace(G^T S_b G), G^T G = I → Orthogonal Centroid (Park et al. '03)
max trace(G^T (S_w + S_b) G), G^T G = I → PCA (Pearson 1901, Hotelling '33)
max trace(G^T A A^T G), G^T G = I → LSI (Deerwester et al. '90)
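For example, the Orthogonal Centroid criterion above (max trace(G^T S_b G) with G^T G = I) is attained by an orthonormal basis for the span of the class centroids; below is a hedged sketch of that construction via a thin QR decomposition of the centroid matrix (helper names are mine).

```python
import numpy as np

def orthogonal_centroid(A, labels):
    """Orthogonal Centroid Method sketch: G spans the r class centroids.
    Returns G (m x r) with orthonormal columns; reduce data via G.T @ A."""
    centroids = np.column_stack(
        [A[:, labels == cls].mean(axis=1) for cls in np.unique(labels)])
    G, _ = np.linalg.qr(centroids)   # thin QR gives an orthonormal basis
    return G
```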

Classical LDA (Fisher '36, Rao '48)
max trace((G^T S_w G)^{-1} (G^T S_b G))
G : leading (r−1) eigenvectors of S_w^{-1} S_b
S_b = H_b H_b^T, H_b = [√n_1 (c_1 − c), ..., √n_r (c_r − c)] : m × r
S_w = H_w H_w^T, H_w = [a_1 − c_1, a_2 − c_1, ..., a_n − c_r] : m × n
Fails when m > n (undersampled): S_w is singular.
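A compact sketch of classical LDA as stated above, assuming S_w is nonsingular so that SciPy's symmetric generalized eigensolver applies (names are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def classical_lda(S_w, S_b, r):
    """Columns of G are the leading r-1 generalized eigenvectors of
    S_b x = lambda * S_w x; as the slide notes, this breaks down when
    S_w is singular (undersampled case)."""
    evals, evecs = eigh(S_b, S_w)        # eigenvalues in ascending order
    G = evecs[:, ::-1][:, :r - 1]        # keep the top r-1 directions
    return G
```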

LDA based on GSVD (LDA/GSVD) (Howland, Jeon, Park, SIMAX '03; Howland and Park, IEEE TPAMI '04)
S_w^{-1} S_b x = λ x ⇔ S_b x = λ S_w x ⇔ β² H_b H_b^T x = α² H_w H_w^T x
GSVD of the pair (H_b^T, H_w^T): U^T H_b^T X = (Σ_b 0), V^T H_w^T X = (Σ_w 0)
X^T S_b X = X^T H_b H_b^T X = diag(I, D_b, 0, 0) and X^T S_w X = X^T H_w H_w^T X = diag(0, D_w, I, 0), with D_b + D_w = I
Classical LDA is a special case of LDA/GSVD.
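The following is a hedged NumPy sketch of one SVD-based route to LDA/GSVD suggested by the construction above: stack H_b^T and H_w^T, take the SVD of the stacked matrix, and recover the transformation from the top block of the left singular vectors. It is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def lda_gsvd(A, labels):
    """Sketch: build H_b, H_w, stack K = [H_b^T; H_w^T], SVD K, then an SVD of
    the top r x t block of the left factor yields the mixing matrix W and the
    dimension-reducing transformation G (first r-1 columns of X)."""
    m, n = A.shape
    classes = np.unique(labels)
    r = len(classes)
    c = A.mean(axis=1, keepdims=True)
    Hb_cols, Hw_cols = [], []
    for cls in classes:
        Ai = A[:, labels == cls]
        ci = Ai.mean(axis=1, keepdims=True)
        Hb_cols.append(np.sqrt(Ai.shape[1]) * (ci - c))
        Hw_cols.append(Ai - ci)
    Hb = np.hstack(Hb_cols)                    # m x r
    Hw = np.hstack(Hw_cols)                    # m x n
    K = np.vstack([Hb.T, Hw.T])                # (r + n) x m
    P, s, Qt = np.linalg.svd(K, full_matrices=False)
    t = int(np.sum(s > s[0] * max(K.shape) * np.finfo(float).eps))  # rank(K)
    U, _, Wt = np.linalg.svd(P[:r, :t])        # SVD of the top r x t block
    X_t = Qt[:t].T @ np.diag(1.0 / s[:t]) @ Wt.T
    return X_t[:, :r - 1]                      # G : m x (r-1)
```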

Generalization of LDA for Undersampled Problems
- Regularized LDA (Friedman '89, Zhao et al. '99, ...)
- LDA/GSVD: solution G = [X_1 X_2] (Howland, Jeon, Park '03)
- Solutions based on null(S_w) and range(S_b) (Chen et al. '00, Yu & Yang '01, Park & Park '03, ...)
- Two-stage methods: face recognition: PCA + LDA (Swets & Weng '96, Zhao et al. '99); information retrieval: LSI + LDA (Torkkola '01)
Mathematical equivalence (Howland and Park '03): PCA + LDA/GSVD = LDA/GSVD; LSI + LDA/GSVD = LDA/GSVD; more efficient: QRD + LDA/GSVD.
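As one concrete option from this list, a minimal sketch of regularized LDA: shrink S_w toward the identity so that the generalized eigenproblem is well posed even when S_w is singular (the parameter lam is an illustrative choice, not a value from the talk).

```python
import numpy as np
from scipy.linalg import eigh

def regularized_lda(S_w, S_b, r, lam=1e-3):
    """Regularized LDA sketch: replace S_w by S_w + lam*I (nonsingular),
    then take the leading r-1 generalized eigenvectors as in classical LDA."""
    m = S_w.shape[0]
    evals, evecs = eigh(S_b, S_w + lam * np.eye(m))
    return evecs[:, ::-1][:, :r - 1]
```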

QRD Preprocessing in Dimension Reduction (Distance-Preserving Dimension Reduction)
For undersampled data A : m × n with m >> n, compute the QR decomposition A = [Q_1 Q_2] [R; 0] = Q_1 R, where Q_1 : m × n is an orthonormal basis for span(A).
Dimension reduction of A by Q_1^T: Q_1^T A = R : n × n.
Q_1^T preserves distances in the L_2 norm: ||a_i||_2 = ||Q_1^T a_i||_2 and ||a_i − a_j||_2 = ||Q_1^T (a_i − a_j)||_2, and in the cosine distance: cos(a_i, a_j) = cos(Q_1^T a_i, Q_1^T a_j).
Applicable to PCA, LDA, LDA/GSVD, Isomap, LTSA, LLE, ...
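A small runnable sketch of this preprocessing step (random data standing in for A; the assertions mirror the distance-preservation identities above):

```python
import numpy as np

# Undersampled data: m >> n (many features, few items)
rng = np.random.default_rng(0)
A = rng.random((5000, 100))

Q1, R = np.linalg.qr(A, mode='reduced')   # A = Q1 R, Q1 : m x n orthonormal
A_reduced = Q1.T @ A                      # equals R, now n x n

# Q1.T preserves pairwise L2 distances and cosine similarities
i, j = 3, 7
assert np.isclose(np.linalg.norm(A[:, i] - A[:, j]),
                  np.linalg.norm(A_reduced[:, i] - A_reduced[:, j]))
cos = lambda x, y: x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
assert np.isclose(cos(A[:, i], A[:, j]), cos(A_reduced[:, i], A_reduced[:, j]))
```

Any of the downstream methods listed above can then be run on the n × n matrix R instead of the m × n matrix A.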

Speed Up with QRD Preprocessing (computation time): comparison of LDA/GSVD, regularized LDA, QR + LDA/GSVD, and QR + regularized LDA/GSVD on the Text, Yale, AT&T, Feret, OptDigit, and Isolet data sets [timing table not recovered].

Text Classification with Dimension Reduction (Kim, Howland, Park, JMLR '03): classification accuracy (%) on the Medline data (1250 items, 5 clusters) and the Reuters data (9579 items, 90 clusters), comparing the full space, OCM, and LDA/GSVD with centroid-based and k-NN classifiers (L_2 norm and cosine similarity) and SVM [accuracy table not recovered].

Face Recognition on Yale Data (C. Park and H. Park, ICDM '04)
Yale Face Database: 243 x 320 pixels per image, 15 people, 165 images in total; after preprocessing (3x3 averaging): 8586 x 165.
Prediction accuracy (%) with kNN (k = 1, 5, 9), leave-one-out and average of 100 random splits, comparing the full space, LDA/GSVD, regularized LDA, projection to null(S_w) (Chen et al. '00), and transformation to range(S_b) (Yu & Yang '01) [accuracy table not recovered].

Fingerprint Classification (C. Park and H. Park, Pattern Recognition, 2005)
Results on the NIST Fingerprint Database (fingerprint images of size 512 x 512).
KDA/GSVD: nonlinear extension of LDA/GSVD based on kernel functions; by KDA/GSVD, the dimension is reduced from 105 x 105 to 4.
Comparison of accuracy vs. rejection rate (%) for KDA/GSVD with kNN & NN against Jain et al. and the SVM of Yao et al. [results table not recovered].

Nonnegativity Preserving Dimension Reduction: Nonnegative Matrix Factorization (NMF)
(Paatero & Tapper '94, Lee & Seung, Nature '99, Pauca et al., SIAM DM '04, Hoyer '04, Lin '05, Berry '06, Kim and Park '06, ...)
Given A : m × n with A ≥ 0 and k << min(m, n), find W : m × k and H : k × n with W ≥ 0 and H ≥ 0 such that A ≈ W H, i.e., min_{W ≥ 0, H ≥ 0} ||A − W H||_F.
NMF/ANLS: two-block coordinate descent for this bound-constrained optimization (Kim and Park, Bioinformatics, to appear). Iterate:
  fixing W, solve min_{H ≥ 0} ||W H − A||_F
  fixing H, solve min_{W ≥ 0} ||H^T W^T − A^T||_F
Any limit point is a stationary point (Grippo and Sciandrone '00).
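A minimal sketch of the NMF/ANLS iteration described above. For simplicity it solves each nonnegative least squares subproblem column by column with SciPy's nnls; the authors' algorithm uses a much faster block NLS solver, so treat this only as a didactic illustration.

```python
import numpy as np
from scipy.optimize import nnls

def nls(B, C):
    """Solve min_{X >= 0} ||B X - C||_F column by column with scipy's nnls."""
    return np.column_stack([nnls(B, C[:, j])[0] for j in range(C.shape[1])])

def nmf_anls(A, k, n_iter=50, seed=0):
    """Two-block coordinate descent: alternate the two NNLS subproblems."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        H = nls(W, A)            # fix W, solve min_{H>=0} ||W H - A||_F
        W = nls(H.T, A.T).T      # fix H, solve min_{W>=0} ||H^T W^T - A^T||_F
    return W, H
```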

Nonnegativity Constraints? Better Approximation vs. Better Representation/Interpretation
Given A : m × n and k < min(m, n):
  SVD: best approximation. min ||A − W H||_F; A = U Σ V^T, A ≈ U_k Σ_k V_k^T
  NMF: better representation/interpretation? min ||A − W H||_F, W ≥ 0, H ≥ 0
Nonnegativity constraints are physically meaningful: pixels in digital images, molecule concentrations in bioinformatics, signal intensities, visualization, ...
Interpretation of analysis results: non-subtractive combinations of nonnegative basis vectors.
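For contrast with NMF, the SVD side of this comparison takes only a few lines: the truncated SVD gives the best rank-k approximation in the Frobenius norm regardless of sign, which is the "better approximation" half of the trade-off.

```python
import numpy as np

def truncated_svd_approx(A, k):
    """Best rank-k approximation of A in the Frobenius norm (Eckart-Young);
    factors may contain negative entries, unlike NMF."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
```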

Performance of NMF Algorithms: relative residual vs. number of iterations for NMF/ANLS, NMF/MUR, and NMF/ALS on a zero-residual artificial problem, A : 200 x 50.

Recovery of Factors by SVD and NMF: A : 2500 x 28 with A = W*H, W : 2500 x 3, H : 3 x 28; recovery of the factors W and H by SVD and NMF/ANLS.

Summary
Effective algorithms for dimension reduction and matrix decompositions that exploit prior knowledge.
Design of new algorithms, e.g., for undersampled data.
Take advantage of prior knowledge for physically more meaningful modeling.
Storage and efficiency issues for massive-scale data; adaptive algorithms.
Applicable to a wide range of problems: text classification, facial recognition, fingerprint classification, gene class discovery in microarray data, protein secondary structure prediction, ...
Thank you!