1
An Efficient Projection for ℓ1,∞ Regularization
Ariadna Quattoni, Xavier Carreras, Michael Collins, Trevor Darrell
MIT CSAIL
2
Goal: Efficient training of jointly sparse models in high-dimensional spaces.
Why joint sparsity?
- Learn from fewer examples.
- Build more efficient classifiers.
- Interpretability.
3–5
[Figure: example images for four classification tasks — Church, Airport, Grocery Store, Flower-Shop]
6
ℓ1,∞ Regularization
How do we promote joint (i.e., row) sparsity?
[Figure: parameter matrix W — rows hold the coefficients for each feature, columns the coefficients for each task]
- An ℓ1 norm on the maximum absolute values of the coefficients across tasks promotes sparsity: use few features.
- The ℓ∞ norm on each row promotes non-sparsity within a row: share parameters across tasks.
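Putting the two together, for a parameter matrix W with d features (rows) and m tasks (columns), the regularizer described above can be written as follows (a reconstruction from the slide's description; the notation W_ij for the coefficient of feature i in task j is ours):

$$\|W\|_{1,\infty} \;=\; \sum_{i=1}^{d} \max_{1 \le j \le m} |W_{ij}|$$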
7
Contributions
- An efficient projected gradient method for ℓ1,∞ regularization.
- Our projection runs in O(n log n) time, the same cost as the ℓ1 projection.
- Experiments on multitask image classification problems:
  - We can discover jointly sparse solutions.
  - ℓ1,∞ regularization leads to better performance than ℓ2 and ℓ1 regularization.
8
Multitask Application: Joint Sparse Approximation
[Figure: a collection of related tasks sharing a jointly sparse parameter matrix]
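One standard way to write the joint sparse approximation problem for this setting (a sketch of the setup, not copied from the slide): each task k has its own weight vector W_k (a column of W) and training set D_k, and the m tasks are coupled only through the joint penalty:

$$\min_{W} \;\; \sum_{k=1}^{m} \frac{1}{|D_k|} \sum_{(x,y)\in D_k} L\big(y,\; W_k \cdot x\big) \;+\; \lambda\,\|W\|_{1,\infty}$$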
9
ℓ1,∞ Regularization: Constrained Convex Optimization Formulation
- A convex objective with convex constraints.
- We use a projected subgradient method. Main advantages: simple, scalable, guaranteed convergence rates.
- Projected subgradient methods have recently been proposed for:
  - ℓ2 regularization, i.e., SVM [Shalev-Shwartz et al. 2007]
  - ℓ1 regularization [Duchi et al. 2008]
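In the constrained form, the penalty becomes a ball constraint of radius C, and each iteration takes a subgradient step on the loss followed by a Euclidean projection back onto the ball (reconstructed from the talk's setup; η_t denotes the step size):

$$\min_{W} \;\; \sum_{k=1}^{m} \sum_{(x,y)\in D_k} L\big(y,\; W_k \cdot x\big) \quad \text{s.t.} \quad \|W\|_{1,\infty} \le C$$

$$W^{t+1} \;=\; P_{\|\cdot\|_{1,\infty}\le C}\big(W^{t} - \eta_t\, g^{t}\big), \qquad g^{t} \in \partial\,\mathrm{Loss}(W^{t})$$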
10
Euclidean Projection onto the ℓ1,∞ Ball
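The update above requires solving the following projection, where A is the matrix produced by the gradient step (a reconstruction consistent with the algorithm described on the next slides):

$$P(A) \;=\; \arg\min_{W} \; \frac{1}{2}\sum_{i,j}\big(W_{ij} - A_{ij}\big)^2 \quad \text{s.t.} \quad \sum_{i} \max_{j} |W_{ij}| \le C$$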
11
Characterization of the Solution
12
Characterization of the Solution
[Figure: the projected solution for four example rows — Feature I, Feature II, Feature III, Feature IV]
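The figures on these slides are lost, but the characterization they illustrate can be stated as follows (a reconstruction, with μ_i denoting the maximum absolute value of row i in the solution): each entry of A is simply truncated at its row's optimal maximum,

$$W_{ij} \;=\; \mathrm{sign}(A_{ij})\,\min\big(|A_{ij}|,\; \mu_i\big)$$

so the entire projection is determined by the vector μ of row maxima.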
13
Mapping to a Simpler Problem
We can map the projection problem to a lower-dimensional problem that finds the optimal row maxima μ.
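Substituting the truncated form of W into the projection objective gives the reduced problem over μ alone (reconstructed from the characterization above; the slide's own equation did not survive extraction):

$$\min_{\mu \ge 0} \; \frac{1}{2}\sum_{i}\sum_{j} \max\big(|A_{ij}| - \mu_i,\; 0\big)^2 \quad \text{s.t.} \quad \sum_{i} \mu_i = C$$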
14–15
Efficient Algorithm
[Figures: illustration of the algorithm's steps]
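As a concrete illustration, here is a minimal NumPy sketch of the projection. It is not the paper's exact O(n log n) sorting-based algorithm: instead it bisects on the Lagrange multiplier θ of the reduced problem, using the fact that each row's optimal maximum μ_i(θ) solves Σ_{j:|A_ij|>μ_i}(|A_ij| − μ_i) = θ and that Σ_i μ_i(θ) is decreasing in θ. All function and variable names are ours:

```python
import numpy as np

def project_l1_inf(A, C, tol=1e-8):
    """Euclidean projection of A onto the l1,inf ball of radius C.

    Bisection-based sketch: find the multiplier theta such that the
    optimal row maxima mu_i(theta) sum to C, then clip each row at mu_i.
    """
    S = np.abs(A)
    if S.max(axis=1).sum() <= C:
        return A.copy()                      # already inside the ball

    d, m = S.shape
    S_sorted = -np.sort(-S, axis=1)          # each row sorted descending
    prefix = np.cumsum(S_sorted, axis=1)     # per-row prefix sums

    def row_maxima(theta):
        # For each row, mu solves sum_{j: s_j > mu} (s_j - mu) = theta,
        # i.e. mu = (prefix_k - theta) / k on the segment where exactly
        # the top-k entries exceed mu; rows with sum(s) <= theta get mu = 0.
        mu = np.zeros(d)
        for i in range(d):
            if prefix[i, -1] <= theta:
                continue
            for k in range(1, m + 1):
                cand = (prefix[i, k - 1] - theta) / k
                nxt = S_sorted[i, k] if k < m else 0.0
                if nxt <= cand <= S_sorted[i, k - 1]:
                    mu[i] = cand
                    break
        return mu

    # sum_i mu_i(theta) is continuous and decreasing in theta: bisect.
    lo, hi = 0.0, prefix[:, -1].max()
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        if row_maxima(theta).sum() > C:
            lo = theta                       # maxima still too large
        else:
            hi = theta
    mu = row_maxima(0.5 * (lo + hi))
    return np.sign(A) * np.minimum(S, mu[:, None])
```

For example, `project_l1_inf(np.random.randn(5, 3), C=1.0)` returns a matrix whose row maxima sum to approximately 1. The bisection variant trades the paper's one-shot sort-and-scan for a few extra passes over the sorted data, but it is easier to verify.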
16
Complexity
The total cost of the algorithm is dominated by sorting the entries of A; as stated in the contributions, this is O(n log n) for the n entries of A, the same cost as the standard ℓ1 projection.
17
Synthetic Experiments
- Generate a jointly sparse parameter matrix W.
- For every task, generate training pairs (x, y) from a linear model with parameters W.
- We compared three types of regularization: ℓ1,∞ projection, ℓ1 projection, ℓ2 projection.
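The slide's generating equations did not survive extraction; the snippet below shows one plausible setup of this kind (the Gaussian features, sign labels, noise level, and dimensions are our assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, r, n = 200, 10, 20, 100   # features, tasks, relevant features, examples/task

# Jointly sparse W: only the first r rows (features) are nonzero; all tasks
# share the same relevant features but with task-specific coefficients.
W = np.zeros((d, m))
W[:r, :] = rng.standard_normal((r, m))

# Per-task training pairs (x, y) from a noisy linear model.
X = rng.standard_normal((m, n, d))
Y = np.sign(np.einsum('knd,dk->kn', X, W) + 0.1 * rng.standard_normal((m, n)))
```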
18
Synthetic Experiments
[Results figure]
19
Dataset: Image Annotation
- 40 top content words as labels.
- Raw image representation: Vocabulary Tree (Nister and Stewenius 2006), 11,000 dimensions.
[Figure: example annotated images — president, actress, team]
20
Results
[Results figure]
Most of the differences are statistically significant.
21
Results
[Results figure]
22
Dataset: Indoor Scene Recognition
- 67 indoor scene categories.
- Raw image representation: similarities to a set of unlabeled images, 2,000 dimensions.
[Figure: example scenes — bakery, bar, train station]
23
Results
[Results figure]
24
Conclusions
- We proposed an efficient global optimization algorithm for ℓ1,∞ regularization.
- We presented experiments on image classification tasks and showed that our method can recover jointly sparse solutions.
- It provides a simple and efficient tool to implement an ℓ1,∞ penalty, analogous to the standard ℓ1 and ℓ2 penalties.