Sparse Learning Based on L2,1-norm


Sparse Learning Based on L2,1-norm Xiaohong Chen 04-13-2012

Outline Review of sparse learning Efficient and robust feature selection via joint l2,1-norm minimization Exploiting the entire feature space with sparsity for automatic image annotation Further works

Outline Review of sparse learning Efficient and robust feature selection via joint l2,1-norm minimization Exploiting the entire feature space with sparsity for automatic image annotation Further works

Review of Sparse Learning

Some examples: LeastR, LeastC, GlLeastR
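The formulations behind these examples are not captured in the transcript; assuming they refer to the usual SLEP-style sparse least-squares solvers, representative forms would be:

```latex
% Assumed formulations for the named examples (illustrative, not transcribed from the slides):
% LeastR: \ell_1-regularized least squares
\min_{x}\; \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1
% LeastC: \ell_1-constrained least squares
\min_{x}\; \tfrac{1}{2}\|Ax - y\|_2^2 \quad \text{s.t.}\quad \|x\|_1 \le t
% GlLeastR: group-lasso (grouped \ell_2) regularized least squares
\min_{x}\; \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \sum_{g} \|x_g\|_2
```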

Shortcoming of Sparse Learning The columns of the projection matrix W are optimized one by one, and their sparsity patterns are independent, so the result cannot reflect the joint sparsity of the original features, e.g.,

Matrix norm
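The formula on this slide is an image that did not survive the transcript; the standard definition of the L2,1-norm of a matrix W in R^{d x c} with rows w^i is:

```latex
\|W\|_{2,1} \;=\; \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{c} W_{ij}^{2}} \;=\; \sum_{i=1}^{d} \|w^{i}\|_{2}
```

It is the l1-norm of the vector of row l2-norms, which is why it drives entire rows of W to zero.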

Outline Review of sparse learning Efficient and robust feature selection via joint l2,1-norm minimization Exploiting the entire feature space with sparsity for automatic image annotation Further works

Efficient and robust feature selection via joint l2,1-norm minimization

Robust Feature Selection Based on l21-norm Given training data {x1, x2, …, xn} and the associated class labels {y1, y2, …, yn}, least squares regression solves the following optimization problem to obtain the projection matrix W. A regularization term R(W) is then added to the robust version of LS.
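The equations on this slide are images; assuming the paper's notation with data matrix X = [x1, ..., xn] in R^{d x n} and label matrix Y in R^{n x c}, the two problems are plausibly:

```latex
% Ordinary least squares regression for the projection matrix W:
\min_{W}\; \|X^{\top}W - Y\|_{F}^{2}
% Robust version with an \ell_{2,1} loss on the residual plus a regularizer R(W):
\min_{W}\; \|X^{\top}W - Y\|_{2,1} + \gamma\, R(W)
```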

Robust Feature Selection Based on l21-norm Possible regularizations: ridge regularization, lasso regularization, or an l2,1-norm penalty that penalizes all c regression coefficients corresponding to a single feature as a whole.
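Written out under the same assumed notation, the three candidate regularizers are:

```latex
R(W) = \|W\|_{F}^{2} \quad\text{(ridge)},\qquad
R(W) = \|W\|_{1} \quad\text{(lasso, element-wise)},\qquad
R(W) = \|W\|_{2,1} = \sum_{i=1}^{d}\|w^{i}\|_{2}
```

Only the last one couples the c coefficients of each feature, so whole rows of W become zero and the corresponding features can be discarded.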

Robust Feature Selection Based on l21-norm

Robust Feature Selection Based on l21-norm Denote (14)

Robust Feature Selection Based on l21-norm Then we have (19)

The iterative algorithm to solve problem (14) Theorem 1: The algorithm will monotonically decrease the objective of the problem in Eq. (14) in each iteration, and converge to the global optimum of the problem.
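The update rules themselves are equation images in the slides. Before the proof, here is a minimal NumPy sketch of an iteratively reweighted solver for the assumed objective min_W ||XW - Y||_{2,1} + gamma ||W||_{2,1}; here rows of X are samples, and the function name rfs_l21 and the small eps smoothing are illustrative choices, not taken from the paper.

```python
import numpy as np

def rfs_l21(X, Y, gamma=1.0, n_iter=50, eps=1e-8):
    """Iteratively reweighted solver for
        min_W ||X W - Y||_{2,1} + gamma * ||W||_{2,1}
    X: (n, d) data with samples as rows, Y: (n, c) label/indicator matrix.
    Returns W: (d, c); features are ranked by the row norms of W."""
    n, d = X.shape
    # Warm start from a ridge solution so the row norms of W are nonzero.
    W = np.linalg.solve(X.T @ X + gamma * np.eye(d), X.T @ Y)
    for _ in range(n_iter):
        # Sample weights from the row norms of the residual (l2,1 loss).
        res_norms = np.linalg.norm(X @ W - Y, axis=1)
        d1 = 1.0 / (2.0 * res_norms + eps)            # shape (n,)
        # Feature weights from the row norms of W (l2,1 regularizer).
        w_norms = np.linalg.norm(W, axis=1)
        d2 = 1.0 / (2.0 * w_norms + eps)              # shape (d,)
        # Closed-form reweighted least-squares update:
        #   W = (X^T D1 X + gamma * D2)^{-1} X^T D1 Y
        A = X.T @ (d1[:, None] * X) + gamma * np.diag(d2)
        W = np.linalg.solve(A, X.T @ (d1[:, None] * Y))
    return W

# Example (synthetic): rank 20 features for a 3-class problem.
# X = np.random.randn(100, 20); Y = np.eye(3)[np.random.randint(0, 3, 100)]
# W = rfs_l21(X, Y, gamma=0.5); ranking = np.argsort(-np.linalg.norm(W, axis=1))
```

Each iteration solves a weighted ridge problem, which is why the objective decreases monotonically, mirroring Theorem 1.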

Proof of Theorem 1

Proof of Theorem 1

Combining inequalities (1) and (2) gives the decrease of the objective in each iteration.
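Inequalities of this kind typically come from a standard lemma used throughout these l2,1-norm proofs; a likely form (stated from the general technique, not transcribed from the slides) is: for any nonzero vectors u and u_t,

```latex
\|u\|_{2} - \frac{\|u\|_{2}^{2}}{2\|u_{t}\|_{2}}
\;\le\;
\|u_{t}\|_{2} - \frac{\|u_{t}\|_{2}^{2}}{2\|u_{t}\|_{2}}
```

Summing this inequality over the rows of the residual and of W yields the monotone decrease of the l2,1 objective.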

Experimental results-1

Experimental results-2

Experimental results-3

Outline Review of sparse learning Efficient and robust feature selection via joint l2,1-norm minimzation Exploiting the entire feature space with sparsity for automatic image annotation Further works

Exploiting the entire feature space with sparsity for automatic image annotation

The illustration of image annotation

The illustration of image annotation

The illustration of image annotation

Formulation The algorithm can be generalized as the following problem. Applying manifold learning and semi-supervised learning to define the loss function, we then obtain the optimization problem.

Formulation The definitions of A and B are given in the paper. With the Lagrange technique, we have
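The explicit update is an equation image in the slides. As an illustration only, assuming the reduced problem has the quadratic-plus-l2,1 form common to this line of work, setting the derivative to zero with the Lagrange/reweighting technique gives a fixed-point update of the shape:

```latex
\min_{W}\; \operatorname{Tr}(W^{\top} A W) - 2\operatorname{Tr}(W^{\top} B) + \gamma \|W\|_{2,1}
\;\;\Longrightarrow\;\;
W \;=\; (A + \gamma D)^{-1} B,
\qquad D_{ii} = \frac{1}{2\,\|w^{i}\|_{2}}
```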

The SFSS Algorithm

Because W_{t+1} is the minimizer of the reweighted objective at iteration t,

Owing to this fact and the above inequality, incorporating (19) into (18) we obtain the monotone decrease of the objective. The objective of the framework is convex, so the proposed approach converges to the global optimum.

Experimental results-1 MAP: Mean Average Precision

Experimental results-2

Experimental results-3

Experimental results-4

Outline Review of sparse learning Efficient and robust feature selection via joint l2,1-norm minimization Exploiting the entire feature space with sparsity for automatic image annotation Further works

Future works-1 Incorporate sparse learning based on the L21-norm into multi-view dimensionality reduction, e.g., A risk: a degenerate solution! How to avoid it?

Future works-2 (2) Incorporate the spatial structural information of the features to preserve the continuity of the features

References [1] F. Nie, D. Xu, X. Cai, and C. Ding. Efficient and robust feature selection via joint l2,1-norm minimization. NIPS 2010. [2] Z. Ma, Y. Yang, F. Nie, J. Uijlings, and N. Sebe. Exploiting the entire feature space with sparsity for automatic image annotation. Proceedings of the 19th ACM International Conference on Multimedia, 2011: 283-292. [3] Y. Yang, H. Shen, Z. Ma, Z. Huang, and X. Zhou. L2,1-norm regularized discriminative feature selection for unsupervised learning. [4] DBLP: Feiping Nie, http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/n/Nie:Feiping.html

Thanks! Q&A