June 25-29, 2006, ICML 2006, Pittsburgh, USA
Local Fisher Discriminant Analysis for Supervised Dimensionality Reduction
Masashi Sugiyama
Tokyo Institute of Technology, Japan

2 Dimensionality Reduction
High-dimensional data is not easy to handle: we need to reduce dimensionality.
We focus on:
Linear dimensionality reduction: samples are embedded by a linear transformation (see the sketch below).
Supervised dimensionality reduction: class labels of the training samples are available.
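As a compact restatement of the setting (the symbols $d$, $r$, $T$, and $z$ are introduced here for concreteness): given labeled training samples, we seek a linear transformation into a lower-dimensional space.

```latex
\[
  \{(x_i, y_i)\}_{i=1}^{n}, \qquad x_i \in \mathbb{R}^d, \qquad y_i \in \{1, \dots, C\}
\]
\[
  z_i = T^\top x_i, \qquad T \in \mathbb{R}^{d \times r}, \qquad r < d
\]
```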

3 Within-Class Multimodality
One of the classes has several modes, for example:
Medical checkup: hormone imbalance (high/low) vs. normal
Digit recognition: even (0, 2, 4, 6, 8) vs. odd (1, 3, 5, 7, 9)
Multi-class classification: one vs. rest
[Figure: two-class example, Class 1 (blue) and Class 2 (red)]

4 Goal of This Research
We want to embed multimodal data so that:
Between-class separability is maximized
Within-class multimodality is preserved
[Figure: three embeddings compared: separable but within-class multimodality lost; within-class multimodality preserved but non-separable; separable and within-class multimodality preserved]

5 Fisher Discriminant Analysis (FDA) [Fisher (1936)]
Within-class scatter matrix, between-class scatter matrix, and FDA criterion: see the formulas below.
Within-class scatter is made small
Between-class scatter is made large
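Written out in standard notation ($n_c$: number of samples in class $c$, $\mu_c$: mean of class $c$, $\mu$: overall mean), the scatter matrices and the FDA criterion are:

```latex
\[
  S^{(w)} = \sum_{c=1}^{C} \sum_{i : y_i = c} (x_i - \mu_c)(x_i - \mu_c)^\top,
  \qquad
  S^{(b)} = \sum_{c=1}^{C} n_c\, (\mu_c - \mu)(\mu_c - \mu)^\top
\]
\[
  T_{\mathrm{FDA}} = \operatorname*{arg\,max}_{T \in \mathbb{R}^{d \times r}}
  \; \operatorname{tr}\!\left[ \left(T^\top S^{(w)} T\right)^{-1} T^\top S^{(b)} T \right]
\]
```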

6 Interpretation of FDA
Notation: n_c is the number of samples in class c; n is the total number of samples.
Pairwise expressions of the scatter matrices (see below):
Samples in the same class are made close
Samples in different classes are made apart
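A sketch of the pairwise rewriting, following the formulation in the LFDA paper (the weight symbols $W^{(w)}$ and $W^{(b)}$ are named here for convenience):

```latex
\[
  S^{(w)} = \tfrac{1}{2} \sum_{i,j=1}^{n} W^{(w)}_{ij} (x_i - x_j)(x_i - x_j)^\top,
  \qquad
  S^{(b)} = \tfrac{1}{2} \sum_{i,j=1}^{n} W^{(b)}_{ij} (x_i - x_j)(x_i - x_j)^\top
\]
\[
  W^{(w)}_{ij} =
  \begin{cases}
    1/n_c & \text{if } y_i = y_j = c, \\
    0     & \text{if } y_i \neq y_j,
  \end{cases}
  \qquad
  W^{(b)}_{ij} =
  \begin{cases}
    1/n - 1/n_c & \text{if } y_i = y_j = c, \\
    1/n         & \text{if } y_i \neq y_j.
  \end{cases}
\]
```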

7 Examples of FDA
[Figure: FDA embeddings of three toy data sets: Simple, Label-mixed cluster, Multimodal]
FDA does not take within-class multimodality into account.
NOTE: FDA can extract only C-1 features, since the between-class scatter matrix has rank at most C-1 (C: number of classes).

8 Locality Preserving Projection (LPP) [He & Niyogi (NIPS2003)]
Locality matrix, affinity matrix, and LPP criterion: see the formulas below.
Nearby samples in the original space are made close.
The constraint is to avoid a trivial (degenerate) solution.
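A sketch of LPP in its usual form (with $X = (x_1, \dots, x_n)$; the Gaussian affinity shown is just one common choice):

```latex
\[
  A_{ij} = \exp\!\left( -\frac{\|x_i - x_j\|^2}{\sigma^2} \right)
  \;\; \text{(e.g.; a nearest-neighbor affinity is also common)}, \qquad
  D_{ii} = \sum_{j=1}^{n} A_{ij}, \qquad L = D - A
\]
\[
  T_{\mathrm{LPP}} = \operatorname*{arg\,min}_{T}
  \; \tfrac{1}{2} \sum_{i,j=1}^{n} A_{ij} \left\| T^\top x_i - T^\top x_j \right\|^2
  \quad \text{subject to} \quad T^\top X D X^\top T = I
\]
```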

9 Examples of LPP
[Figure: LPP embeddings of the three toy data sets: Simple, Label-mixed cluster, Multimodal]
LPP does not take between-class separability into account (it is unsupervised).

10 Our Approach
We combine FDA and LPP:
Nearby samples in the same class are made close
Far-apart samples in the same class are not made close ("don't care")
Samples in different classes are made apart

11 Local Fisher Discriminant Analysis (LFDA)
Local within-class scatter matrix and local between-class scatter matrix: see the formulas below.
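A sketch of the local scatter matrices, combining the pairwise FDA weights of slide 6 with the affinity $A_{ij}$ of LPP, following the LFDA paper (the weight symbols $W^{(lw)}$ and $W^{(lb)}$ are named here for convenience):

```latex
\[
  S^{(lw)} = \tfrac{1}{2} \sum_{i,j=1}^{n} W^{(lw)}_{ij} (x_i - x_j)(x_i - x_j)^\top,
  \qquad
  S^{(lb)} = \tfrac{1}{2} \sum_{i,j=1}^{n} W^{(lb)}_{ij} (x_i - x_j)(x_i - x_j)^\top
\]
\[
  W^{(lw)}_{ij} =
  \begin{cases}
    A_{ij}/n_c & \text{if } y_i = y_j = c, \\
    0          & \text{if } y_i \neq y_j,
  \end{cases}
  \qquad
  W^{(lb)}_{ij} =
  \begin{cases}
    A_{ij}\,(1/n - 1/n_c) & \text{if } y_i = y_j = c, \\
    1/n                   & \text{if } y_i \neq y_j.
  \end{cases}
\]
```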

12 How to Obtain the Solution
Since LFDA has a similar form to FDA, the solution can be obtained just by solving a generalized eigenvalue problem (a sketch follows below).
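A minimal numerical sketch, assuming the local scatter matrices `Slw` and `Slb` from the previous slide have already been built (the function and variable names here are illustrative, not taken from the original code):

```python
import numpy as np
from scipy.linalg import eigh

def lfda_transform(Slb, Slw, r):
    """Solve the generalized eigenvalue problem  S_lb * v = lambda * S_lw * v
    and return the d x r transformation matrix formed from the eigenvectors
    with the largest generalized eigenvalues."""
    # eigh solves the symmetric-definite generalized problem; eigenvalues ascend.
    eigvals, eigvecs = eigh(Slb, Slw)
    # Keep the r eigenvectors associated with the largest eigenvalues.
    T = eigvecs[:, np.argsort(eigvals)[::-1][:r]]
    return T

# Usage sketch: embed samples X (n x d) into r dimensions.
# T = lfda_transform(Slb, Slw, r=2)
# Z = X @ T
```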

13 Examples of LFDA
[Figure: LFDA embeddings of the three toy data sets: Simple, Label-mixed cluster, Multimodal]
LFDA works well for all three cases!
Note: the local between-class scatter matrix usually has rank higher than C-1, so LFDA can extract more than C-1 features (cf. FDA).

14 Neighborhood Component Analysis (NCA) [Goldberger, Roweis, Hinton & Salakhutdinov (NIPS2004)]
Minimizes the leave-one-out error of a stochastic k-nearest-neighbor classifier; the obtained embedding is separable.
However, NCA involves non-convex optimization:
There are local optima
No analytic solution is available
Slow iterative algorithm
(In contrast, LFDA has an analytic form of the global solution.)

15 Maximally Collapsing Metric Learning (MCML) [Globerson & Roweis (NIPS2005)]
The idea is similar to FDA:
Samples in the same class are made close (collapsed to "one point")
Samples in different classes are made apart
MCML involves non-convex optimization:
There exists a nice convex approximation, but it yields a non-global solution
No analytic solution is available
Slow iterative algorithm

16 Simulations
Visualization of UCI data sets:
Letter recognition (D=16)
Segment (D=18)
Thyroid disease (D=5)
Iris (D=4)
We extract 3 classes from the original data and merge 2 of them, yielding Class 1 (blue) and Class 2 (red).

17 Summary of Simulation Results
Legend: separable and multimodality preserved / separable but no multimodality / multimodality preserved but no separability
[Table: methods (FDA, LPP, LFDA, NCA, MCML) vs. data sets (Letter, Segment, Thyroid, Iris), plus comments]
Comments: FDA: no multi-modality; LPP: no label-separability; NCA: slow, local optima; MCML: slow, no multi-modality.

18 Letter Recognition
[Figure: embeddings (blue vs. red) obtained by FDA, LPP, LFDA, MCML, and NCA]

19 Segment
[Figure: embeddings (blue vs. red) obtained by FDA, LPP, LFDA, MCML, and NCA]

20 Thyroid Disease
[Figure: embeddings (blue vs. red) obtained by FDA, LPP, LFDA, MCML, and NCA]

21 Iris
[Figure: embeddings (blue vs. red) obtained by FDA, LPP, LFDA, MCML, and NCA]

22 Kernelization
LFDA can be non-linearized by the kernel trick. Kernel variants of the related methods:
FDA: Kernel FDA [Mika et al. (NNSP1999)]
LPP: Laplacian eigenmap [Belkin & Niyogi (NIPS2001)]
MCML: Kernel MCML [Globerson & Roweis (NIPS2005)]
NCA: not available yet?
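A brief sketch of how the kernel trick applies (this derivation outline is not from the slide; the symbols $\phi$, $K$, $\alpha$, and the Laplacian matrices are introduced here): since the solution can be searched for within the span of the training samples, each eigenvector is written as a linear combination of the mapped samples, and the generalized eigenvalue problem is then expressed through the kernel matrix alone.

```latex
\[
  \varphi = \sum_{i=1}^{n} \alpha_i\, \phi(x_i), \qquad
  K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle
\]
% With the graph Laplacians of the pairwise weights,
%   L^{(lw)} = D^{(lw)} - W^{(lw)},  L^{(lb)} = D^{(lb)} - W^{(lb)},
% the generalized eigenvalue problem becomes
\[
  K L^{(lb)} K \alpha \;=\; \lambda\, K L^{(lw)} K \alpha .
\]
```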

23 Conclusions
LFDA effectively combines FDA and LPP, and is suitable for embedding multimodal data.
Like FDA, LFDA has an analytic optimal solution and is thus computationally efficient.
Like LPP, LFDA needs a pre-specified affinity matrix. We used the local scaling method [Zelnik-Manor & Perona (NIPS2004)] for computing the affinity, which does not include any tuning parameter.
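A minimal sketch of the local scaling affinity, assuming the common choice of the 7th nearest neighbor for the local scale (the function name and the value k=7 are assumptions here, not taken from the slides):

```python
import numpy as np

def local_scaling_affinity(X, k=7):
    """Affinity A[i, j] = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)),
    where sigma_i is the distance from x_i to its k-th nearest neighbor
    (local scaling in the sense of Zelnik-Manor & Perona)."""
    # Pairwise squared Euclidean distances between the n samples (X is n x d).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    dists = np.sqrt(sq_dists)
    # sigma_i: distance to the k-th nearest neighbor (index 0 is the point itself).
    sigma = np.sort(dists, axis=1)[:, k]
    # Local-scaling affinity matrix; no global tuning parameter is required.
    A = np.exp(-sq_dists / (sigma[:, None] * sigma[None, :]))
    return A
```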