Published by Rolando Boulden. Modified over 9 years ago.
1
Automatic Image Annotation Using Group Sparsity
Shaoting Zhang (1), Junzhou Huang (1), Yuchi Huang (1), Yang Yu (1), Hongsheng Li (2), Dimitris Metaxas (1)
(1) CBIM, Rutgers University, NJ; (2) IDEA Lab, Lehigh University, PA
2
Introduction
Goal: image annotation automatically assigns relevant text keywords to a given image, reflecting its content.
Previous methods:
Topic models [Barnard et al., J. Mach. Learn. Res. '03; Putthividhya et al., CVPR '10]
Mixture models [Carneiro et al., TPAMI '07; Feng et al., CVPR '04]
Discriminative models [Grangier et al., TPAMI '08; Hertz et al., CVPR '04]
Nearest neighbor based methods [Makadia et al., ECCV '08; Guillaumin et al., ICCV '09]
3
Introduction
Limitations: Features are often preselected, yet the properties of different features and feature combinations are not well investigated in the image annotation task.
Our method and contributions: Use feature selection to solve the annotation problem. Use a clustering prior and a sparsity prior to guide the selection.
4
Outline
Regularization based Feature Selection: annotation framework, L2 norm regularization, L1 norm regularization, group sparsity based regularization
Obtain Image Pairs
Experiments
5
Regularization based Feature Selection
Given a list of similar/dissimilar image pairs (P1, P2), each row of X is the absolute value of the difference between the two feature vectors F_P1 and F_P2: X = |F_P1 - F_P2|.
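The construction of X can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `pair_difference_matrix`, the toy feature values, and the pair list are all hypothetical.

```python
import numpy as np

def pair_difference_matrix(features, pairs):
    """Stack |F[p1] - F[p2]| for every pair (p1, p2) into the rows of X."""
    features = np.asarray(features, dtype=float)
    idx = np.asarray(pairs, dtype=int)
    return np.abs(features[idx[:, 0]] - features[idx[:, 1]])

# Hypothetical toy data: 3 images, 2 feature dimensions each.
features = np.array([[0.0, 1.0], [0.5, 0.0], [2.0, 1.0]])
pairs = [(0, 1), (0, 2)]  # image index pairs (P1, P2)
X = pair_difference_matrix(features, pairs)
```

Each row of `X` then describes one pair, and each column corresponds to one feature dimension, so a weight per column says how much that feature matters for similarity.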
6
Regularization based Feature Selection
[Figure: the linear system X w = Y, where each entry of Y is a pair label (1 or -1).]
7
Regularization based Feature Selection
Annotation framework
[Figure: annotation framework, with the labels Weights, Similarity, Testing input, High similarity, and Training data.]
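One way to read this framework is as weighted nearest-neighbor keyword transfer. The sketch below rests on several assumptions that the slides do not spell out: similar pairs are labeled +1, so a larger score `|f_test - f_train| . w` means higher similarity; keywords are transferred by frequency among the most similar training images; and the names `annotate`, `n_neighbors`, and `n_keywords` are hypothetical.

```python
import numpy as np
from collections import Counter

def annotate(test_feature, train_features, train_keywords, w,
             n_neighbors=3, n_keywords=5):
    """Transfer keywords from the training images most similar to the test image."""
    diffs = np.abs(np.asarray(train_features, dtype=float)
                   - np.asarray(test_feature, dtype=float))
    # Assumption: similar pairs were labeled +1, so a larger score = more similar.
    scores = diffs @ np.asarray(w, dtype=float)
    nearest = np.argsort(-scores)[:n_neighbors]
    # Assumption: rank candidate keywords by how often they occur among the neighbors.
    counts = Counter(k for i in nearest for k in train_keywords[i])
    return [k for k, _ in counts.most_common(n_keywords)]

# Hypothetical toy data.
train_features = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_keywords = [["sky", "sea"], ["sky", "tree"], ["car"]]
w = [-1.0, -1.0]  # negative weights: small feature differences give high scores
result = annotate([0.1, 0.1], train_features, train_keywords, w, n_neighbors=2)
```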
8
Regularization based Feature Selection
L2 regularization
Robust and solvable in closed form: $w = (X^\top X + \lambda I)^{-1} X^\top Y$
No sparsity
The L2 norm tries to produce small weights, but it usually cannot push them to zero. Intuitively, the slope of a quadratic penalty shrinks linearly to zero as a weight approaches zero, so the penalty for changing a weight shrinks as well. A large weight incurs a large penalty and is therefore discouraged, whereas a small weight is penalized almost the same as a zero weight, so there is little pressure to remove it.
[Figure: histogram of the weights w.]
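The closed-form L2 solution can be computed directly. The following is a minimal sketch, assuming the objective is ||Xw - Y||^2 + λ||w||^2; the function name `ridge_weights` and the random toy data are hypothetical.

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y without forming an explicit inverse."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Hypothetical toy data: 20 pairs, 5 feature dimensions, labels in {1, -1}.
rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(20, 5)))
y = rng.choice([1.0, -1.0], size=20)
w_small = ridge_weights(X, y, lam=0.1)
w_large = ridge_weights(X, y, lam=10.0)
```

A larger `lam` shrinks the weight vector as a whole, but individual weights generally stay nonzero, which matches the "no sparsity" point above.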
9
Regularization based Feature Selection
L1 regularization
Convex optimization: basis pursuit, grafting, shooting, etc.
Sparsity prior
Here the magnitude of the slope is constant (except at 0, where the penalty is not differentiable), so the weights are pushed towards zero at a constant rate. The L1 norm is also less sensitive to large weights than the L2 norm.
[Figure: histogram of the weights w.]
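Of the solvers listed, shooting is the simplest to sketch: it is coordinate descent with a soft-thresholding update. The code below is an illustration under the assumption that the objective is 0.5 ||Xw - Y||^2 + λ||w||_1; the function names and the toy data are hypothetical, and the slides do not say which solver the authors actually used.

```python
import numpy as np

def soft_threshold(a, t):
    """Shrink a towards zero by t, returning exactly 0 when |a| <= t."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_shooting(X, y, lam, n_iter=200):
    """Coordinate descent for 0.5 * ||Xw - y||^2 + lam * ||w||_1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = X.shape[1]
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            residual += X[:, j] * w[j]      # remove feature j's contribution
            rho = X[:, j] @ residual        # correlation of feature j with the rest
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * w[j]      # add the updated contribution back
    return w

# Hypothetical toy problem with an orthonormal design, so the answer is known.
w = lasso_shooting(np.eye(3), [3.0, 0.5, -2.0], lam=1.0)
```

In this toy problem the middle weight becomes exactly zero, which is the sparsity effect that L2 regularization does not produce.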
10
Regularization based Feature Selection
Group sparsity [1]
L2 inside the same group, L1 across different groups
Benefit: removal of whole feature groups
Solved with projected gradient [2]
First the groups must be defined manually. Here each feature type (RGB, HSV, etc.) naturally forms its own group. Within a group we use the L2 norm; across groups we use the L1 norm. The intuition is that a whole group is either pushed to zero or kept small but nonzero.
[Figure: feature groups such as RGB and HSV, with whole groups either = 0 or ≠ 0.]
[1] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68:49–67, 2006.
[2] E. Berg, M. Schmidt, M. Friedlander, and K. Murphy. Group sparsity via linear-time projection. In Technical report, TR ,
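The "L2 inside a group, L1 across groups" penalty can be illustrated with a small solver. Note that this sketch uses proximal gradient with block soft-thresholding, not the projected-gradient method of [2]; it assumes the objective 0.5 ||Xw - Y||^2 + λ Σ_g ||w_g||_2 and a nonzero X, and the function names and group indices are hypothetical.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Shrink the whole block v; return exact zeros when its L2 norm is <= t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for 0.5 * ||Xw - y||^2 + lam * sum_g ||w_g||_2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y))   # gradient step on the squared loss
        for g in groups:                      # proximal step, one group at a time
            w[g] = group_soft_threshold(z[g], step * lam)
    return w

# Hypothetical grouping: columns 0-1 form one feature type, columns 2-3 another.
groups = [[0, 1], [2, 3]]
w = group_lasso(np.eye(4), [3.0, 4.0, 0.3, 0.4], groups, lam=1.0)
```

In the toy problem the second group is removed as a whole, while the first group is shrunk but stays nonzero in both of its entries.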
11
Outline
Regularization based Feature Selection
Obtain Image Pairs: only rely on keyword similarity; also rely on feedback information
Experiments
12
Obtain Image Pairs
The previous method [1] relies solely on keyword similarity, which introduces a lot of noise. It assumes that images sharing more than 3 keywords are similar and that images with no common keyword are dissimilar. However, shared keywords do not necessarily mean that the images are close in feature space. As the distance histogram of similar pairs shows, most similar pairs have a small distance in feature space, but there are still many exceptions. When similar and dissimilar pairs are combined, it is difficult to separate them linearly by distance. Furthermore, this method yields far more dissimilar pairs than similar pairs, which biases the training.
[Figures: distance histogram of similar pairs; distance histogram of all pairs.]
[1] A. Makadia, V. Pavlovic, and S. Kumar. A new baseline for image annotation. In ECCV, pages 316–329, 2008.
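The keyword-only rule described above can be sketched as follows. The function name `keyword_pairs` and the toy keyword lists are hypothetical; "more than 3 shared keywords" is implemented as at least 4.

```python
from itertools import combinations

def keyword_pairs(keywords, min_shared=4):
    """Label image pairs as similar or dissimilar from keyword overlap alone."""
    similar, dissimilar = [], []
    for i, j in combinations(range(len(keywords)), 2):
        shared = len(set(keywords[i]) & set(keywords[j]))
        if shared >= min_shared:      # more than 3 keywords in common
            similar.append((i, j))
        elif shared == 0:             # no keyword in common
            dissimilar.append((i, j))
        # pairs with 1 to 3 shared keywords are left unlabeled
    return similar, dissimilar

# Hypothetical toy annotations.
keywords = [
    ["sky", "sea", "sun", "beach", "sand"],
    ["sky", "sea", "sun", "beach"],
    ["car", "road"],
    ["sky", "car"],
]
similar, dissimilar = keyword_pairs(keywords)
```

Even in this tiny example there are more dissimilar pairs than similar ones, which hints at the imbalance noted above.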
13
Obtain Image Pairs
Inspired by relevance feedback and the expectation-maximization method.
The k1 nearest images are candidates for similar pairs; the k2 farthest images are candidates for dissimilar pairs.
With our method, the similar image pairs (positive samples) contain much less noise.
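The candidate step can be sketched as below. This covers only candidate generation; how the candidates are then confirmed or refined is not described on the slide and is not modeled here. The use of Euclidean distance, the function name `candidate_pairs`, and the toy features are assumptions.

```python
import numpy as np

def candidate_pairs(features, query, k1, k2):
    """Return (similar, dissimilar) candidate pairs for one query image."""
    features = np.asarray(features, dtype=float)
    # Assumption: plain Euclidean distance in feature space.
    dist = np.linalg.norm(features - features[query], axis=1)
    order = np.argsort(dist)
    order = order[order != query]     # never pair an image with itself
    similar = [(query, int(j)) for j in order[:k1]]
    dissimilar = [(query, int(j)) for j in order[-k2:]] if k2 > 0 else []
    return similar, dissimilar

# Hypothetical toy data: five images with one feature dimension each.
features = [[0.0], [1.0], [2.0], [10.0], [11.0]]
similar, dissimilar = candidate_pairs(features, query=0, k1=2, k2=2)
```

Because k1 and k2 are chosen explicitly, the numbers of similar and dissimilar candidates can be kept balanced, unlike the keyword-only rule.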
14
Outline
Regularization based Feature Selection
Obtain Image Pairs
Experiments: experimental settings, evaluation of regularization methods, evaluation of generality, some annotation results
15
Experimental Settings
Data protocols: Corel5K (5k images), IAPR TC12 [1] (20k images)
Evaluation: average precision, average recall, number of keywords recalled (N+)
[1] M. Grubinger, P. D. Clough, H. Muller, and T. Deselaers. The IAPR TC-12 benchmark: a new evaluation resource for visual information systems.
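The three measures can be sketched as follows, assuming the convention common in the annotation literature: precision and recall are computed per keyword and then averaged over the vocabulary, and N+ counts the keywords with nonzero recall. The function name `annotation_metrics` and the toy predictions are hypothetical.

```python
def annotation_metrics(predicted, ground_truth, vocabulary):
    """Return (mean per-keyword precision, mean per-keyword recall, N+)."""
    precisions, recalls, n_plus = [], [], 0
    for word in vocabulary:
        n_pred = sum(word in p for p in predicted)        # images predicted with word
        n_true = sum(word in g for g in ground_truth)     # images truly labeled with word
        n_correct = sum(word in p and word in g
                        for p, g in zip(predicted, ground_truth))
        precisions.append(n_correct / n_pred if n_pred else 0.0)
        recalls.append(n_correct / n_true if n_true else 0.0)
        n_plus += n_correct > 0                           # keyword recalled at least once
    n = len(vocabulary)
    return sum(precisions) / n, sum(recalls) / n, n_plus

# Hypothetical toy example with two test images.
predicted = [{"sky", "sea"}, {"sky", "car"}]
ground_truth = [{"sky"}, {"car", "road"}]
vocabulary = ["sky", "sea", "car", "road"]
precision, recall, n_plus = annotation_metrics(predicted, ground_truth, vocabulary)
```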
16
Experimental Settings
Features:
RGB, HSV, LAB
Opponent, rg histogram, transformed color distribution
Color from saliency [1]
Haar, Gabor [2]
SIFT [3], HOG [4]
[1] X. Hou and L. Zhang. Saliency detection: A spectral residual approach. In CVPR, 2007.
[2] A. Makadia, V. Pavlovic, and S. Kumar. A new baseline for image annotation. In ECCV, pages 316–329, 2008.
[3] K. van de Sande, T. Gevers, and C. Snoek. Evaluating color descriptors for object and scene recognition. PAMI, 99(1), 2010.
[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages 886–893, 2005.
17
Evaluation of Regularization Methods
[Figure: precision, recall, and N+ of the regularization methods on Corel5K and IAPR TC12.]
18
Evaluation of Generality
Weights computed from Corel5K, then applied to IAPR TC12.
[Figure: precision, recall, and N+, each plotted against λ.]
19
Some Annotation Results
Because we always transfer 5 keywords while the ground truth may contain only 2-4, our precision is adversely affected, and the predicted keywords may be redundant. However, some predicted keywords that are not in the ground truth actually describe the image well; in that sense they improve on the human annotation.
20
Conclusions and Future Work
Proposed a feature selection framework that uses both sparsity and clustering priors to annotate images.
The sparse solution improves scalability.
Image pairs obtained from relevance feedback perform much better.
Future work:
Different grouping methods.
Automatically finding groups (dynamic group sparsity).
More priors (combining with other methods).
Extending this framework to object recognition.
21
Thanks for listening