Automatic Image Annotation Using Group Sparsity


Automatic Image Annotation Using Group Sparsity Shaoting Zhang1, Junzhou Huang1, Yuchi Huang1, Yang Yu1, Hongsheng Li2, Dimitris Metaxas1 1CBIM, Rutgers University, NJ 2IDEA Lab, Lehigh University, PA

Introduction
Goal: image annotation automatically assigns relevant text keywords to a given image, reflecting its content.
Previous methods:
Topic models [Barnard et al., J. Mach. Learn. Res. '03; Putthividhya et al., CVPR '10]
Mixture models [Carneiro et al., TPAMI '07; Feng et al., CVPR '04]
Discriminative models [Grangier et al., TPAMI '08; Hertz et al., CVPR '04]
Nearest-neighbor based methods [Makadia et al., ECCV '08; Guillaumin et al., ICCV '09]

Introduction
Limitations:
Features are often preselected, yet the properties of different features and feature combinations are not well investigated in the image annotation task.
Feature selection is not well investigated in this application.
Our method and contributions:
Use feature selection to solve the annotation problem.
Use a clustering prior and a sparsity prior to guide the selection.

Outline
Regularization based Feature Selection
  Annotation framework
  L2 norm regularization
  L1 norm regularization
  Group sparsity based regularization
Obtain Image Pairs
Experiments

Regularization based Feature Selection
Given a list of similar/dissimilar image pairs (P1, P2), each row of the design matrix X is the element-wise absolute difference of the pair's feature vectors. Note that we use the absolute value of the difference.
[Figure: feature vectors FP1 and FP2 combined into the design matrix X]

Regularization based Feature Selection
Solve the linear system X w ≈ Y, where the target Y is +1 for similar pairs and -1 for dissimilar pairs; the learned weight vector w scores feature importance.
[Figure: design matrix X, weight vector w, and target vector Y with entries 1 and -1]
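As a sketch of this setup (hypothetical helper name, assuming each image is represented by a fixed-length feature vector), the design matrix X and target vector Y can be built like this:

```python
import numpy as np

def build_pair_design(features, similar_pairs, dissimilar_pairs):
    """Stack absolute feature differences |F_P1 - F_P2| into a design
    matrix X, with targets Y = +1 for similar and -1 for dissimilar pairs."""
    rows, targets = [], []
    for i, j in similar_pairs:
        rows.append(np.abs(features[i] - features[j]))
        targets.append(1.0)
    for i, j in dissimilar_pairs:
        rows.append(np.abs(features[i] - features[j]))
        targets.append(-1.0)
    return np.vstack(rows), np.array(targets)
```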

Regularization based Feature Selection
Annotation framework: given a testing input, compute its weighted similarity to the training data using the learned weights, and transfer keywords from the training images with high similarity.
[Figure: pipeline from testing input through weighted similarity to training data]
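A minimal sketch of this nearest-neighbor transfer step (function names and the keyword-voting rule are assumptions, not the paper's exact procedure):

```python
import numpy as np
from collections import Counter

def annotate(test_feat, train_feats, train_keywords, w, k=5, n_keywords=5):
    """Score each training image by a weighted L1 distance using the
    learned feature weights w, then transfer the most frequent keywords
    from the k most similar training images."""
    dists = np.abs(train_feats - test_feat) @ w
    nearest = np.argsort(dists)[:k]
    votes = Counter(kw for i in nearest for kw in train_keywords[i])
    return [kw for kw, _ in votes.most_common(n_keywords)]
```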

Regularization based Feature Selection
L2 regularization
Robust, with a closed-form solution: w = (XᵀX + λI)⁻¹XᵀY
No sparsity: the L2 norm produces small weights but usually cannot push them exactly to zero. Intuitively, the magnitude of the slope of the quadratic penalty decreases linearly toward zero as a weight approaches zero, so the penalty on further weight changes also decreases. Large weights incur large penalties and are not preferred, but a small weight costs almost nothing more than a zero weight, so there is little pressure to reach exactly zero.
[Figure: histogram of weights w under L2 regularization]
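The closed-form solution above takes only a few lines (a NumPy sketch, solving the linear system rather than forming the inverse explicitly):

```python
import numpy as np

def ridge_weights(X, Y, lam):
    """L2-regularized (ridge) solution w = (X^T X + lam*I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
```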

Regularization based Feature Selection
L1 regularization
Convex optimization: solvable by basis pursuit, grafting, shooting, etc.
Sparsity prior: the magnitude of the slope of the L1 penalty is constant (except at 0, where it is not differentiable), so weights are pushed toward zero at a constant rate. It is also less sensitive to large weights than the L2 norm.
[Figure: histogram of weights w under L1 regularization]
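One standard solver for the L1-regularized problem is iterative soft-thresholding (ISTA); this is an illustrative sketch for min_w 0.5·||Xw − Y||² + λ·||w||₁, not necessarily the solver used in the paper:

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise shrinkage: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, Y, lam, n_iter=500):
    """ISTA for min_w 0.5*||Xw - Y||^2 + lam*||w||_1."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # gradient step on the smooth term, then shrinkage
        w = soft_threshold(w - X.T @ (X @ w - Y) / L, lam / L)
    return w
```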

Regularization based Feature Selection
Group sparsity [1]: L2 norm inside the same group, L1 norm across different groups.
Benefit: removal of whole feature groups.
Solved by projected gradient [2].
Groups are first divided manually; here each feature type (RGB, HSV, etc.) naturally defines a group. Within the same group we use L2; across groups we use L1. The intuition is that each group is either pushed entirely to zero (= 0) or kept whole with small nonzero weights (≠ 0).
[1] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68:49–67, 2006.
[2] E. van den Berg, M. Schmidt, M. Friedlander, and K. Murphy. Group sparsity via linear-time projection. Technical report TR-2008-09, 2008. http://www.cs.ubc.ca/~murphyk/Software/L1CRF/index.html
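The paper uses the projected-gradient solver of [2]; as an illustration of the same group penalty, here is a proximal-gradient sketch for min_w 0.5·||Xw − Y||² + λ·Σ_g ||w_g||₂, where the proximal step zeros or shrinks whole groups at once:

```python
import numpy as np

def group_soft_threshold(w, groups, t):
    """Prox of t * sum_g ||w_g||_2: a group is zeroed if its L2 norm is
    at most t, otherwise shrunk toward zero as a whole."""
    out = w.copy()
    for idx in groups:
        nrm = np.linalg.norm(w[idx])
        out[idx] = 0.0 if nrm <= t else (1.0 - t / nrm) * w[idx]
    return out

def group_lasso(X, Y, groups, lam, n_iter=500):
    """Proximal gradient for min_w 0.5*||Xw - Y||^2 + lam*sum_g ||w_g||_2."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = group_soft_threshold(w - X.T @ (X @ w - Y) / L, groups, lam / L)
    return w
```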

Outline
Regularization based Feature Selection
Obtain Image Pairs
  Only rely on keyword similarity
  Also rely on feedback information
Experiments

Obtain Image Pairs
The previous method [1] relies solely on keyword similarity, which introduces a lot of noise. The traditional assumption is that images sharing more than 3 keywords are similar, and images with no common keyword are dissimilar. However, similar keywords do not necessarily mean that feature distances are small: although most similar pairs have small distances in feature space (left figure), there are still many exceptions. When similar and dissimilar pairs are combined, they are difficult to separate linearly by a distance measurement. Furthermore, with this method the number of dissimilar pairs is much larger than that of similar pairs, which biases the training.
[Figures: distance histogram of similar pairs; distance histogram of all pairs]
[1] A. Makadia, V. Pavlovic, and S. Kumar. A new baseline for image annotation. In ECCV, pages 316–329, 2008.

Obtain Image Pairs
Inspired by relevance feedback and the expectation-maximization method: the k1 nearest images become candidates for similar pairs, and the k2 farthest images become candidates for dissimilar pairs.
With our method, the similar image pairs (positive samples) contain much less noise.
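A sketch of this refinement for one query image (function and variable names are hypothetical): among the keyword-based candidates, keep the k1 nearest in feature space as similar-pair candidates and the k2 farthest as dissimilar-pair candidates.

```python
import numpy as np

def refine_pairs(features, query, candidates, k1, k2):
    """Rank candidate images by feature distance to the query image; the
    k1 nearest become similar pairs, the k2 farthest dissimilar pairs."""
    dists = np.linalg.norm(features[candidates] - features[query], axis=1)
    order = np.argsort(dists)
    similar = [(query, int(candidates[i])) for i in order[:k1]]
    dissimilar = [(query, int(candidates[i])) for i in order[-k2:]]
    return similar, dissimilar
```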

Outline
Regularization based Feature Selection
Obtain Image Pairs
Experiments
  Experimental settings
  Evaluation of regularization methods
  Evaluation of generality
  Some annotation results

Experimental Settings
Data protocols: Corel5K (5k images); IAPR TC-12 [1] (20k images)
Evaluation: average precision, average recall, number of keywords recalled (N+)
[1] M. Grubinger, P. D. Clough, H. Müller, and T. Deselaers. The IAPR TC-12 benchmark: a new evaluation resource for visual information systems. 2006.

Experimental Settings
Features: RGB, HSV, LAB, opponent, rg histogram, transformed color distribution, color from saliency [1], Haar, Gabor [2], SIFT [3], HOG [4]
[1] X. Hou and L. Zhang. Saliency detection: a spectral residual approach. In CVPR, 2007.
[2] A. Makadia, V. Pavlovic, and S. Kumar. A new baseline for image annotation. In ECCV, pages 316–329, 2008.
[3] K. van de Sande, T. Gevers, and C. Snoek. Evaluating color descriptors for object and scene recognition. PAMI, 99(1), 2010.
[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages 886–893, 2005.

Evaluation of Regularization Methods
[Figures: precision, recall, and N+ on Corel5K and IAPR TC-12]

Evaluation of Generality
Weights are computed on Corel5K, then applied to IAPR TC-12.
[Figures: precision, recall, and N+ as functions of λ]

Some Annotation Results
Since we transfer 5 keywords each time (while the ground truth may contain only 2–4 keywords), our precision is adversely affected, and the predicted keywords may be redundant. However, some keywords not in the ground truth actually describe the image well; in that sense they are better than the human annotation.

Conclusions and Future Work
Proposed a feature selection framework that annotates images using both sparsity and clustering priors.
The sparse solution improves scalability.
Image pairs obtained from relevance feedback perform much better.
Future work:
Different grouping methods.
Automatically finding groups (dynamic group sparsity).
More priors (combining with other methods).
Extending this framework to object recognition.

Thanks for listening