Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University
Problem: High-Dimensional Data Text documents, images, video, gene expression, financial data, sensor data, …
Solution: Feature Selection Reduce the dimensionality by finding a relevant feature subset
Feature Selection Techniques Supervised Fisher score Information gain Unsupervised (discussed here) Max variance Laplacian Score, NIPS 2005 Q-alpha, JMLR 2005 MCFS, KDD 2010 (Our Algorithm) …
Outline Problem setting Multi-Cluster Feature Selection (MCFS) Algorithm Experimental Validation Conclusion
Problem setting Unsupervised feature selection for data with multiple clusters/classes. How traditional score-ranking methods fail: they evaluate each feature independently, so they cannot select a subset that jointly covers all clusters (a feature that separates one pair of clusters may be useless for another)
Multi-Cluster Feature Selection (MCFS) Algorithm Objective Select those features such that the multi-cluster structure of the data can be well preserved Implementation Spectral analysis to explore the intrinsic structure, then L1-regularized least squares to select the best features (see the formulation below)
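In symbols (a sketch of the formulation using the standard Laplacian-eigenmaps and Lasso forms; X is the feature-by-sample data matrix and β an L1 penalty parameter):

$$
L\mathbf{y} = \lambda D\mathbf{y}, \qquad \min_{\mathbf{a}_k} \bigl\| \mathbf{y}_k - X^{\top}\mathbf{a}_k \bigr\|^2 + \beta \|\mathbf{a}_k\|_1, \quad k = 1, \dots, K,
$$

where W is the p-nearest-neighbor affinity matrix, D its degree matrix, L = D - W the graph Laplacian, y_1, …, y_K the eigenvectors with the smallest eigenvalues (the spectral embedding), and a_k a sparse coefficient vector over the M features.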
Spectral Embedding for Cluster Analysis
Laplacian Eigenmaps Can unfold the data manifold and provide a flat embedding for the data points Can reflect the data distribution on each of the data clusters Thoroughly studied and well understood (a sketch of this step follows below)
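A minimal Python sketch of the embedding step, assuming dense data and a heat-kernel weighting; the function name, the defaults for p and K, and the kernel-width heuristic are illustrative choices, not the authors' reference code:

```python
import numpy as np
from scipy.linalg import eigh

def spectral_embedding(X, p=5, K=10):
    """X: (N, M) data matrix. Returns Y: (N, K) spectral embedding."""
    N = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # p-nearest-neighbor graph with heat-kernel weights (width: a heuristic).
    W = np.zeros((N, N))
    sigma2 = sq.mean()
    for i in range(N):
        nbrs = np.argsort(sq[i])[1:p + 1]   # skip the point itself
        W[i, nbrs] = np.exp(-sq[i, nbrs] / sigma2)
    W = np.maximum(W, W.T)                  # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                               # graph Laplacian
    # Generalized eigen-problem L y = lambda D y; drop the trivial
    # constant eigenvector, keep the next K smallest.
    vals, vecs = eigh(L, D)
    return vecs[:, 1:K + 1]
```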
Learning Sparse Coefficient Vectors
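A sketch of this step, using scikit-learn's LassoLars as a stand-in for the LARs-based L1 solver; the regularization strength alpha is an illustrative choice (one could instead control the number of nonzero coefficients along the LARs path):

```python
import numpy as np
from sklearn.linear_model import LassoLars

def sparse_coefficients(X, Y, alpha=0.01):
    """X: (N, M) data, Y: (N, K) spectral embedding.
    Returns A: (K, M), one sparse coefficient vector per eigenvector."""
    K = Y.shape[1]
    A = np.empty((K, X.shape[1]))
    for k in range(K):
        # L1-regularized least squares: min ||y_k - X a||^2 + alpha * ||a||_1
        A[k] = LassoLars(alpha=alpha).fit(X, Y[:, k]).coef_
    return A
```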
Feature Selection on Sparse Coefficient Vectors
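The MCFS score of feature j is the largest absolute coefficient it receives across the K sparse vectors, max_k |a_{k,j}|; a minimal sketch:

```python
import numpy as np

def select_features(A, d):
    """A: (K, M) sparse coefficient matrix.
    Returns the indices of the d features with the highest MCFS score."""
    scores = np.abs(A).max(axis=0)      # MCFS score per feature
    return np.argsort(scores)[::-1][:d]
```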
Algorithm Summary (an end-to-end sketch follows below)
1. Construct the p-nearest neighbor graph W
2. Solve the generalized eigen-problem to get the K eigenvectors corresponding to the smallest eigenvalues
3. Solve K L1-regularized regression problems to get K sparse coefficient vectors
4. Compute the MCFS score for each feature
5. Select d features according to the MCFS score
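Putting the steps together, using the sketch functions defined above (the data matrix X and the choices K = 10, d = 50 are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))          # placeholder data, (N, M)

Y = spectral_embedding(X, p=5, K=10)         # steps 1-2
A = sparse_coefficients(X, Y, alpha=0.01)    # step 3
selected = select_features(A, d=50)          # steps 4-5
print(selected)
```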
Complexity Analysis
Experiments Unsupervised feature selection evaluated on clustering and nearest-neighbor classification Compared algorithms: MCFS, Q-alpha, Laplacian score, maximum variance
Experiments (USPS Clustering) USPS handwritten digits: 9298 samples, 10 classes, one 16x16 gray-scale image each
Experiments (COIL20 Clustering) COIL20 image dataset: 1440 samples, 20 classes, one 32x32 gray-scale image each
Experiments (ORL Clustering) ORL face dataset: 400 images of 40 subjects, 32x32 gray-scale images; results reported for 10, 20, 30, and 40 classes
Experiments (Isolet Clustering) Isolet spoken letter recognition data: 1560 samples, 26 classes, 617 features per sample
Experiments (Nearest Neighbor Classification)
Experiments (Parameter Selection) Number of nearest neighbors p: performance is stable over a wide range Number of eigenvectors K: works best when equal to the number of classes
Conclusion MCFS handles multi-cluster data well, outperforms state-of-the-art algorithms, and performs especially well when the number of selected features is small (< 50)
Questions?