Effective Dimension Reduction with Prior Knowledge
Haesun Park
Division of Computational Science and Engineering, College of Computing
Georgia Institute of Technology, Atlanta, GA
Joint work with Barry Drake, Peg Howland, Hyunsoo Kim, and Cheonghee Park
DIMACS, May 2007
Dimension Reduction
Dimension reduction for clustered data:
- Linear Discriminant Analysis (LDA)
- Generalized LDA (LDA/GSVD, regularized LDA)
- Orthogonal Centroid Method (OCM)
Dimension reduction for nonnegative data:
- Nonnegative Matrix Factorization (NMF)
Applications: text classification, face recognition, fingerprint classification, gene clustering in microarray analysis, ...
2D Representation: Utilize Cluster Structure if Known
2D representation of 150 x 1000 data with 7 clusters: LDA vs. SVD
Dimension Reduction for Clustered Data: Measure for Cluster Quality
A = [a_1, ..., a_n] : m x n, clustered data with r classes
N_i = set of items in class i, |N_i| = n_i
c_i = centroid of class i, c = global centroid
Between-class scatter:  S_b = sum_{i=1..r} sum_{j in N_i} (c_i - c)(c_i - c)^T
Within-class scatter:   S_w = sum_{i=1..r} sum_{j in N_i} (a_j - c_i)(a_j - c_i)^T
Total scatter:          S_t = sum_{i=1..n} (a_i - c)(a_i - c)^T
S_w + S_b = S_t
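A minimal numpy sketch of these definitions (the function and variable names are illustrative, not from the talk); it also verifies the identity S_w + S_b = S_t numerically:

```python
import numpy as np

def scatter_matrices(A, labels):
    """Within-class (Sw), between-class (Sb), and total (St) scatter matrices
    for the columns of A (m x n), grouped by the length-n label vector."""
    m, n = A.shape
    c = A.mean(axis=1, keepdims=True)              # global centroid
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for i in np.unique(labels):
        Ai = A[:, labels == i]                     # columns in class i
        ci = Ai.mean(axis=1, keepdims=True)        # class centroid c_i
        Sw += (Ai - ci) @ (Ai - ci).T              # sum_j (a_j - c_i)(a_j - c_i)^T
        Sb += Ai.shape[1] * (ci - c) @ (ci - c).T  # n_i (c_i - c)(c_i - c)^T
    St = (A - c) @ (A - c).T                       # sum_i (a_i - c)(a_i - c)^T
    return Sw, Sb, St

# numerical check of Sw + Sb = St on random data
A = np.random.rand(5, 30)
labels = np.random.randint(0, 3, size=30)
Sw, Sb, St = scatter_matrices(A, labels)
print(np.allclose(Sw + Sb, St))                    # True
```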
Optimal Dimension Reducing Transformation
Reduce y (m x 1) to G^T y (q x 1), q << m, where G^T is q x m.
High-quality clusters have small trace(S_w) and large trace(S_b).
Want G such that trace(G^T S_w G) is minimized and trace(G^T S_b G) is maximized:
- max trace((G^T S_w G)^{-1} (G^T S_b G))          LDA (Fisher '36, Rao '48)
- max trace(G^T S_b G), G^T G = I                  Orthogonal Centroid (Park et al. '03)
- max trace(G^T (S_w + S_b) G), G^T G = I          PCA (Pearson 1901, Hotelling '33)
- max trace(G^T A A^T G), G^T G = I                LSI (Deerwester et al. '90)
Classical LDA (Fisher '36, Rao '48)
max_G trace((G^T S_w G)^{-1} (G^T S_b G))
G: leading (r - 1) eigenvectors of S_w^{-1} S_b
S_b = H_b H_b^T,  H_b = [sqrt(n_1)(c_1 - c), ..., sqrt(n_r)(c_r - c)] : m x r
S_w = H_w H_w^T,  H_w = [a_1 - c_1, a_2 - c_1, ..., a_n - c_r] : m x n
Fails when m > n (undersampled problem), since S_w is singular.
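Assuming S_w is nonsingular (the classical setting), a small sketch that computes G from the symmetric-definite generalized eigenproblem S_b x = lambda S_w x; it reuses the scatter_matrices helper from the sketch above, and the data and names are illustrative only:

```python
import numpy as np
from scipy.linalg import eigh

def classical_lda(A, labels, q):
    """G (m x q): leading eigenvectors of Sw^{-1} Sb, computed via the
    symmetric-definite generalized eigenproblem Sb x = lambda Sw x.
    Requires Sw to be nonsingular (the classical, not undersampled, setting)."""
    Sw, Sb, _ = scatter_matrices(A, labels)   # helper from the earlier sketch
    evals, evecs = eigh(Sb, Sw)               # eigenvalues returned in ascending order
    order = np.argsort(evals)[::-1]           # largest generalized eigenvalues first
    return evecs[:, order[:q]]                # at most r - 1 directions are useful

# project 50-dimensional data with r = 3 classes down to q = 2 dimensions
A = np.random.rand(50, 300)
labels = np.random.randint(0, 3, size=300)
G = classical_lda(A, labels, q=2)
Y = G.T @ A                                   # 2 x 300 reduced representation
```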
LDA Based on the GSVD (LDA/GSVD)
(Howland, Jeon, Park, SIMAX '03; Howland and Park, IEEE TPAMI '04)
The eigenproblem S_w^{-1} S_b x = lambda x, i.e., S_b x = lambda S_w x, becomes
  beta^2 H_b H_b^T x = alpha^2 H_w H_w^T x.
GSVD of the pair (H_b^T, H_w^T): there exist orthogonal U, V and nonsingular X with
  U^T H_b^T X = (Sigma_b  0),   V^T H_w^T X = (Sigma_w  0),
so that
  X^T S_b X = diag(I, D_b, 0, 0)   and   X^T S_w X = diag(0, D_w, I, 0),
using X^T H_b H_b^T X = X^T S_b X and X^T H_w H_w^T X = X^T S_w X.
Classical LDA is a special case of LDA/GSVD.
Generalization of LDA for Undersampled Problems
- Regularized LDA (Friedman '89, Zhao et al. '99, ...) (a sketch of this variant follows below)
- LDA/GSVD: solution G = [X_1 X_2] (Howland, Jeon, Park '03)
- Solutions based on Null(S_w) and Range(S_b) (Chen et al. '00, Yu & Yang '01, Park & Park '03, ...)
- Two-stage methods:
  - Face recognition: PCA + LDA (Swets & Weng '96, Zhao et al. '99)
  - Information retrieval: LSI + LDA (Torkkola '01)
- Mathematical equivalence (Howland and Park '03):
  PCA + LDA/GSVD = LDA/GSVD
  LSI + LDA/GSVD = LDA/GSVD
  More efficient: QRD + LDA/GSVD = LDA/GSVD
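To make the regularized-LDA bullet concrete: adding lambda*I to S_w makes the within-class scatter nonsingular even when m >> n. A hedged sketch, again reusing the scatter_matrices helper from above (lambda = 1e-3 and all names are illustrative choices, not values from the talk):

```python
import numpy as np
from scipy.linalg import eigh

def regularized_lda(A, labels, q, lam=1e-3):
    """Regularized LDA: replace Sw by Sw + lam*I so the within-class scatter
    is nonsingular even for undersampled data (m >> n). lam is illustrative."""
    Sw, Sb, _ = scatter_matrices(A, labels)        # helper from the earlier sketch
    m = A.shape[0]
    evals, evecs = eigh(Sb, Sw + lam * np.eye(m))  # Sb x = mu (Sw + lam I) x
    order = np.argsort(evals)[::-1]                # keep the q largest
    return evecs[:, order[:q]]

# undersampled example: 500 features, 60 samples, 4 classes, reduce to r - 1 = 3
A = np.random.rand(500, 60)
labels = np.repeat(np.arange(4), 15)
G = regularized_lda(A, labels, q=3)                # 500 x 3
Y = G.T @ A                                        # 3 x 60 reduced data
```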
QRD Preprocessing in Dim. Reduction (Distance-Preserving Dim. Reduction)
For undersampled data A: m x n with m >> n, compute the QR decomposition
  A = [Q_1 Q_2] [R; 0] = Q_1 R,
where Q_1 is an orthonormal basis for span(A).
Dimension reduction of A by Q_1^T: Q_1^T A = R, which is n x n.
Q_1^T preserves L_2 distances:
  || a_i ||_2 = || Q_1^T a_i ||_2,   || a_i - a_j ||_2 = || Q_1^T (a_i - a_j) ||_2
and cosine distances:
  cos(a_i, a_j) = cos(Q_1^T a_i, Q_1^T a_j)
Applicable to PCA, LDA, LDA/GSVD, Isomap, LTSA, LLE, ...
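A small numpy sketch of this preprocessing step (data sizes are illustrative): the thin QR factorization gives Q_1, and projecting by Q_1^T leaves pairwise Euclidean and cosine distances unchanged:

```python
import numpy as np

# undersampled data: 10000-dimensional, 80 samples
m, n = 10000, 80
A = np.random.rand(m, n)

Q1, R = np.linalg.qr(A, mode='reduced')   # Q1: m x n, R: n x n, A = Q1 @ R
B = Q1.T @ A                              # reduced data, n x n (equals R)

# Euclidean distance between any two columns is preserved
i, j = 3, 17
print(np.isclose(np.linalg.norm(A[:, i] - A[:, j]),
                 np.linalg.norm(B[:, i] - B[:, j])))             # True

# cosine similarity is preserved as well
cos = lambda x, y: x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.isclose(cos(A[:, i], A[:, j]), cos(B[:, i], B[:, j])))  # True
```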
Speed Up with QRD Preprocessing (computation time)

Data       Dim x #        r    LDA/GSVD   regLDA (LDA)   QR+LDA/GSVD   QR+LDA/regGSVD
Text       5896 x 210     7    48.8       42.2           0.14          0.03
Yale       77760 x 165    15   --         --             0.96          0.22
AT&T       10304 x 400    40   --         --             0.07          0.02
Feret      3000 x 130     10   10.9       9.3            0.03          0.01
OptDigit   64 x 5610      10   8.97       9.60           0.02
Isolet     617 x 7797     26   98.1       99.3           36.70
Text Classification with Dim. Reduction
Classification accuracy (%); similarity measures: L_2 norm and cosine (Kim, Howland, Park, JMLR '03)

Medline data (1250 items, 5 clusters):

Method             Full    OCM    LDA/GSVD
Dim                22095   5      4
centroid (L_2)     84.8    84.8   88.9
centroid (Cosine)  88.0    88.0   83.9
15nn (L_2)         83.4    88.2   89.0
15nn (Cosine)      82.3    88.3   83.9
30nn (L_2)         83.9    88.6   89.0
30nn (Cosine)      83.5    88.4   83.9
SVM                88.9    88.7   87.2

Reuters data (9579 items, 90 clusters):

Method             Full    OCM
Dim                11941   90
centroid (L_2)     78.89   78.00
centroid (Cosine)  80.45   80.46
kNN (L_2)          78.65   85.51
kNN (Cosine)       80.21   86.19
SVM                87.11   87.03
Face Recognition on Yale Data (C. Park and H. Park, ICDM '04)
Yale Face Database: 243 x 320 pixels = full dimension of 77760
11 images/person x 15 people = 165 images
After preprocessing (3x3 averaging): 8586 x 165
Prediction accuracy in %, leave-one-out; numbers in parentheses are averages over 100 random splits.

Dim. Red. Method                         Dim    kNN k=1     k=5    k=9
Full Space                               8586   79.4        76.4   72.1
LDA/GSVD                                 14     98.8 (90)   98.8   98.8
Regularized LDA                          14     97.6 (85)   97.6   97.6
Proj. to null(S_w) (Chen et al., '00)    14     97.6 (84)   97.6   97.6
Transf. to range(S_b) (Yu & Yang, '01)   14     89.7 (82)   94.6   91.5
Fingerprint Classification (C. Park and H. Park, Pattern Recognition, 2005)
Results on NIST Fingerprint Database 4: 4000 fingerprint images of size 512 x 512.
KDA/GSVD: nonlinear extension of LDA/GSVD based on kernel functions.
By KDA/GSVD, dimension reduced from 105 x 105 to 4.

Accuracy (%) at rejection rate (%):   0      1.8    8.5
KDA/GSVD                              90.7   91.3   92.8
kNN & NN (Jain et al., '99)           --     90.0   91.2
SVM (Yao et al., '03)                 --     90.0   92.2
Nonnegativity Preserving Dim. Reduction
Nonnegative Matrix Factorization (NMF)
(Paatero & Tapper '94, Lee & Seung, NATURE '99, Pauca et al., SIAM DM '04, Hoyer '04, Lin '05, Berry '06, Kim and Park '06, ...)
Given A: m x n with A >= 0 and k << min(m, n), find W: m x k and H: k x n with W >= 0 and H >= 0 such that A ~= WH:
  min_{W>=0, H>=0} || A - WH ||_F
NMF/ANLS: two-block coordinate descent method in bound-constrained optimization (Kim and Park, Bioinformatics, to appear). Iterate the following ANLS steps (sketched below):
- fixing W, solve min_{H>=0} || WH - A ||_F
- fixing H, solve min_{W>=0} || H^T W^T - A^T ||_F
Any limit point is a stationary point (Grippo and Sciandrone '00).
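A minimal sketch of the NMF/ANLS iteration, solving each nonnegative least squares subproblem column by column with scipy.optimize.nnls; this only illustrates the two-block coordinate descent scheme and is not the optimized solver referenced on the slide:

```python
import numpy as np
from scipy.optimize import nnls

def nnls_columns(B, M):
    """Solve min_{X >= 0} ||B X - M||_F one column of M at a time."""
    X = np.zeros((B.shape[1], M.shape[1]))
    for j in range(M.shape[1]):
        X[:, j], _ = nnls(B, M[:, j])
    return X

def nmf_anls(A, k, n_iter=30, seed=0):
    """NMF by two-block coordinate descent (ANLS): alternate NNLS solves."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))
    for _ in range(n_iter):
        H = nnls_columns(W, A)           # fix W: min_{H>=0} ||W H - A||_F
        W = nnls_columns(H.T, A.T).T     # fix H: min_{W>=0} ||H^T W^T - A^T||_F
    return W, H

# usage on a small random nonnegative matrix
A = np.random.rand(200, 50)
W, H = nmf_anls(A, k=5)
print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))   # relative residual
```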
Nonnegativity Constraints?
Better approximation vs. better representation/interpretation
Given A: m x n and k < min(m, n):
- SVD: best approximation. min ||A - WH||_F is attained by the truncated SVD: A = U Sigma V^T, A ~= U_k Sigma_k V_k^T
- NMF: better representation/interpretation? min ||A - WH||_F subject to W >= 0, H >= 0
Nonnegativity constraints are physically meaningful: pixels in a digital image, molecule concentrations in bioinformatics, signal intensities, visualization, ...
Interpretation of analysis results: nonsubtractive combinations of nonnegative basis vectors.
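To make the approximation-vs-interpretation trade-off concrete, a small comparison on synthetic data, reusing the nmf_anls sketch above: the truncated SVD attains the smallest possible rank-k residual (Eckart-Young), while NMF accepts a somewhat larger residual in exchange for nonnegative, nonsubtractive factors:

```python
import numpy as np

A = np.random.rand(200, 50)
k = 5

# best rank-k approximation by truncated SVD (Eckart-Young); factors may be negative
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_svd = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# nonnegative factors from the nmf_anls sketch above; its residual can only be larger
W, H = nmf_anls(A, k)
print(np.linalg.norm(A - A_svd, 'fro'), np.linalg.norm(A - W @ H, 'fro'))
```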
Performance of NMF Algorithms
Relative residual vs. number of iterations for NMF/ANLS, NMF/MUR, and NMF/ALS on a zero-residual artificial problem, A: 200 x 50.
Recovery of Factors by SVD and NMF
A: 2500 x 28 constructed as A = W*H with W: 2500 x 3 and H: 3 x 28.
Recovery of the factors W and H by SVD and by NMF/ANLS.
Summary
- Effective algorithms for dimension reduction and matrix decompositions that exploit prior knowledge
- Design of new algorithms, e.g., for undersampled data
- Taking advantage of prior knowledge for physically more meaningful modeling
- Storage and efficiency issues for massive-scale data
- Adaptive algorithms
- Applicable to a wide range of problems (text classification, face recognition, fingerprint classification, gene class discovery in microarray data, protein secondary structure prediction, ...)
Thank you!