Presentation transcript:

An Efficient Projection for ℓ1,∞ Regularization
Ariadna Quattoni, Xavier Carreras, Michael Collins, Trevor Darrell
MIT CSAIL

Goal:  Efficient training of jointly sparse models in high-dimensional spaces.
Why joint sparsity?
 Learn from fewer examples.
 Build more efficient classifiers.
 Interpretability.

[Figure slides: example image classification tasks — Church, Airport, Grocery Store, Flower Shop]

ℓ1,∞ Regularization
 How do we promote joint (i.e. row) sparsity?
 An ℓ1 norm on the maximum absolute values of the coefficients across tasks promotes row sparsity: use few features.
 The ℓ∞ norm on each row promotes non-sparsity within the row: share parameters across tasks.
[Figure: coefficient matrix with rows indexed by features and columns by tasks]
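For reference, with W the coefficient matrix (rows indexed by features r, columns by tasks c), the norm described above is the standard ℓ1,∞ norm:

  \|W\|_{1,\infty} \;=\; \sum_{r} \max_{c} |W_{rc}|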

Contributions
 An efficient projected gradient method for ℓ1,∞ regularization.
 Our projection runs in O(n log n) time, the same cost as the ℓ1 projection.
 Experiments on multitask image classification problems:
 we can discover jointly sparse solutions;
 ℓ1,∞ regularization leads to better performance than ℓ2 and ℓ1 regularization.

Multitask Application: Joint Sparse Approximation
[Figure: a collection of tasks sharing a jointly sparse set of features]

ℓ1,∞ Regularization: Constrained Convex Optimization Formulation
We minimize a convex loss function subject to convex ℓ1,∞ ball constraints.
 We use a projected subgradient method. Main advantages: simple, scalable, guaranteed convergence rates.
 Projected subgradient methods have recently been proposed for:
 ℓ2 regularization, i.e. SVM [Shalev-Shwartz et al. 2007];
 ℓ1 regularization [Duchi et al. 2008].
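In standard notation, this constrained formulation and the projected subgradient step take roughly the following form, with W the feature-by-task coefficient matrix, f the total convex training loss over all tasks, C the ball radius, η_t a step size, and Π the Euclidean projection; this is a sketch of the generic method rather than a verbatim copy of the slide:

  \min_{W} \; f(W) \quad \text{s.t.} \quad \|W\|_{1,\infty} \le C

  W^{t+1} \;=\; \Pi_{\{\|W\|_{1,\infty} \le C\}}\!\big(W^{t} - \eta_t\, g^{t}\big), \qquad g^{t} \in \partial f(W^{t})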

Euclidean Projection onto the ℓ1,∞ Ball
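In this notation, the projection step solves the following problem for a given matrix A (the current iterate after the subgradient step):

  \min_{B} \; \tfrac{1}{2}\,\|A - B\|_F^2 \quad \text{s.t.} \quad \sum_{r} \max_{c} |B_{rc}| \le C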

Characterization of the solution
[Figure: per-feature coefficient plots (Feature I–IV) illustrating the structure of the projected solution]
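Following the paper, the projected solution can be characterized by per-row maxima μ_r that truncate each row of A (working with absolute values and restoring signs afterwards); stated as a formula under that convention:

  B_{rc} \;=\; \operatorname{sign}(A_{rc})\, \min\big(|A_{rc}|,\; \mu_r\big)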

Mapping to a simpler problem
 We can map the projection problem to the following problem, which finds the optimal row maxima μ.
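Assuming nonnegative entries (signs restored afterwards) and the truncation characterization above, the reduced problem over the row maxima has roughly this form; this is a reconstruction from the surrounding slides rather than a verbatim copy of the slide's equation:

  \min_{\mu \ge 0} \; \tfrac{1}{2} \sum_{r} \sum_{c} \max\big(A_{rc} - \mu_r,\, 0\big)^2
  \quad \text{s.t.} \quad \sum_{r} \mu_r \le C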

Efficient Algorithm
[Slides: the sorting-based projection algorithm; details in the paper]

Complexity
 The total cost of the algorithm is dominated by sorting the entries of A.
 The total cost is on the order of O(n log n), where n is the number of entries of A (the same cost as the ℓ1 projection, as stated in the contributions).
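The paper's exact sorting-based algorithm is not reproduced in this transcript. As a rough illustration of the same projection, here is a minimal sketch that uses the truncation characterization above and solves for the Lagrange multiplier by bisection; it is simpler but slower than the O(n log n) method, and all names and tolerances are illustrative:

import numpy as np

def project_l1_inf(A, C, tol=1e-8):
    # Euclidean projection of A onto {B : sum_r max_c |B_rc| <= C}.
    # Bisection sketch based on B_rc = sign(A_rc) * min(|A_rc|, mu_r);
    # NOT the paper's sorting-based O(n log n) algorithm.
    S = np.sign(A)
    X = np.abs(A)
    if X.max(axis=1).sum() <= C:          # already inside the ball
        return A.copy()

    Xs = -np.sort(-X, axis=1)             # each row sorted in decreasing order
    cums = np.cumsum(Xs, axis=1)          # per-row prefix sums
    n, m = X.shape
    ks = np.arange(1, m + 1)

    def row_maxima(lam):
        # For each row r, solve sum_c (X_rc - mu_r)_+ = lam, clipped at mu_r >= 0.
        mu = np.zeros(n)
        for r in range(n):
            if cums[r, -1] <= lam:        # mu_r = 0 already satisfies the row equation
                continue
            cand = (cums[r] - lam) / ks   # candidate mu_r on each linear piece
            ok = (cand <= Xs[r]) & np.append(cand[:-1] >= Xs[r, 1:], True)
            mu[r] = cand[ok][0]
        return mu

    lo, hi = 0.0, cums[:, -1].max()
    while hi - lo > tol:                  # bisection on the multiplier lam
        lam = 0.5 * (lo + hi)
        if row_maxima(lam).sum() > C:
            lo = lam                      # total row-max mass too large: truncate more
        else:
            hi = lam
    mu = row_maxima(hi)
    return S * np.minimum(X, mu[:, None])

In a projected subgradient loop this would be used as W = project_l1_inf(W - eta * g, C) after each subgradient step.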

Synthetic Experiments
 Generate a jointly sparse parameter matrix W.
 For every task, generate example pairs (x, y) using the corresponding parameters in W.
 We compared three different types of regularization:
 ℓ1,∞ projection
 ℓ1 projection
 ℓ2 projection
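The slide's generating equations are not reproduced here; the following is an illustrative sketch of one way to create a jointly sparse W and per-task data of this kind. All dimensions, the noise level, and the label model are assumptions for illustration only:

import numpy as np

rng = np.random.default_rng(0)
d, T, n, s = 200, 10, 100, 10            # features, tasks, examples per task, shared support size
support = rng.choice(d, size=s, replace=False)
W = np.zeros((d, T))
W[support, :] = rng.normal(size=(s, T))  # the same few rows are active in every task

tasks = []
for t in range(T):
    X = rng.normal(size=(n, d))
    y = np.sign(X @ W[:, t] + 0.1 * rng.normal(size=n))   # noisy linear labels for task t
    tasks.append((X, y))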

Synthetic Experiments
[Figure: results of the synthetic experiments]

Dataset: Image Annotation
 40 top content words (e.g. president, actress, team).
 Raw image representation: high-dimensional Vocabulary Tree features (Nister and Stewenius 2006).
[Figure: example images for the content words president, actress, team]

Results
[Plot: image annotation results]
Most of the differences are statistically significant.

Results
[Plot: additional results]

Dataset: Indoor Scene Recognition
 67 indoor scenes (e.g. bakery, bar, train station).
 Raw image representation: similarities to a set of unlabeled images.
 2000 dimensions.

Results

Conclusions
 We proposed an efficient global optimization algorithm for ℓ1,∞ regularization.
 We presented experiments on image classification tasks and showed that our method can recover jointly sparse solutions.
 Our method is a simple and efficient tool for implementing an ℓ1,∞ penalty, analogous to standard ℓ1 and ℓ2 penalties.