IIIT Hyderabad Multimodal Semantic Indexing for Image Retrieval P. L. Chandrika Advisors: Dr. C. V. Jawahar Centre for Visual Information Technology, IIIT-

Presentation transcript:

Multimodal Semantic Indexing for Image Retrieval
P. L. Chandrika
Advisor: Dr. C. V. Jawahar
Centre for Visual Information Technology, IIIT-Hyderabad

Problem Setting
Words: Rose, Petals, Red, Green, Bud, Gift, Love, Flower.
Semantics not captured.
*J. Sivic & Zisserman, 2003; Nister & Stewenius, 2006; Philbin, Sivic, Zisserman et al., 2008.

Contribution
– Latent Semantic Indexing (LSI) is extended to multimodal LSI.
– pLSA (probabilistic Latent Semantic Analysis) is extended to multimodal pLSA.
– The bipartite graph model is extended to a tripartite graph model.
– A graph partitioning algorithm is refined for retrieving relevant images from a tripartite graph model.
– Verification on data sets and comparisons.

Background
– In Latent Semantic Indexing, the term-document matrix is decomposed using singular value decomposition.
– In Probabilistic Latent Semantic Indexing, P(d), P(z|d) and P(w|z) are computed using the EM algorithm.
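
As an illustration of the LSI step described above, here is a minimal NumPy sketch, not the presenter's implementation. The function names `lsi_fit` and `lsi_query`, the toy term-document matrix and the choice k=2 are all assumptions made for the example.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a (terms x documents) count matrix, keeping k concepts."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    # Column j of diag(s_k) @ Vt_k is document j in concept space (equals U_k.T @ a_j).
    doc_vecs = (s[:k, None] * Vt[:k, :]).T
    return U_k, doc_vecs

def lsi_query(query_counts, U_k, doc_vecs):
    """Cosine similarity between a folded-in query and every document."""
    q = U_k.T @ query_counts
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q)
    return (doc_vecs @ q) / np.where(denom == 0, 1.0, denom)

# Hypothetical toy data: rows = rose, tulip, daffodil, whippet, doberman
term_doc = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
], dtype=float)
U_k, doc_vecs = lsi_fit(term_doc, k=2)
sims = lsi_query(np.array([1.0, 0, 0, 0, 0]), U_k, doc_vecs)  # query: "rose"
```

In this toy matrix the second document (tulip, daffodil) shares no term with the query "rose", yet it scores highly because both fall on the same latent concept, while the two animal documents score near zero.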

Semantic Indexing (LSI, pLSA, LDA)
[Figure: words w and documents d related through P(w|d). A mixed word list (whippet, daffodil, tulip, GSD, doberman, rose) is grouped into the concepts Animal (whippet, doberman, GSD) and Flower (daffodil, tulip, rose).]
*Hofmann, 1999; Blei, Ng & Jordan, 2004; R. Lienhart and M. Slaney, 2007.

Literature
– LSI.
– pLSA.
– Incremental pLSA.
– Multilayer multimodal pLSA.
Drawbacks:
– High space complexity due to large matrix operations.
– Slow, resource-intensive offline processing.
*R. Lienhart and M. Slaney, “pLSA on large scale image databases,” in ECCV.
*H. Wu, Y. Wang, and X. Cheng, “Incremental probabilistic latent semantic analysis for automatic question recommendation,” in ACM on RSRS.
*R. Lienhart, S. Romberg, and E. Hörster, “Multilayer pLSA for multimodal image retrieval,” in CIVR, 2009.

Tensor
– Most current image representations rely solely on either visual features or the surrounding text.
– Multimodal LSI: we represent the multimodal data using a third-order tensor.
– Vector: order-1 tensor. Matrix: order-2 tensor. Order-3 tensor.

Multimodal LSI
– Higher-Order SVD (HOSVD) is used to capture the latent semantics.
– It finds correlations within the same mode and across different modes.
– HOSVD is an extension of SVD and is represented as: [equation not preserved in the transcript]

HOSVD Algorithm
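
The body of this slide is not in the transcript. As a hedged stand-in, the sketch below follows the standard HOSVD procedure: take the SVD of each mode unfolding to get one factor matrix per mode, then project the tensor onto those factors to get the core. The helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) are hypothetical, and reading the three modes as images, text words and visual words is an assumption.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Mode-n product: multiply matrix M into axis `mode` of tensor T."""
    out = np.tensordot(M, T, axes=(1, mode))  # contracted axis is replaced at the front
    return np.moveaxis(out, 0, mode)

def hosvd(T, ranks=None):
    """Return (core, factors); `ranks` optionally truncates each mode."""
    factors = []
    for n in range(T.ndim):
        U, _, _ = np.linalg.svd(unfold(T, n), full_matrices=False)
        r = U.shape[1] if ranks is None else ranks[n]
        factors.append(U[:, :r])
    core = T
    for n, U in enumerate(factors):
        core = mode_product(core, U.T, n)  # project each mode onto its basis
    return core, factors

def reconstruct(core, factors):
    """Multiply the core back by every factor matrix."""
    T = core
    for n, U in enumerate(factors):
        T = mode_product(T, U, n)
    return T
```

Without truncation the reconstruction is exact. Passing smaller `ranks` keeps only the leading directions in each mode, which is the multimodal analogue of keeping k concepts in LSI.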

Multimodal pLSA
– An unobserved latent variable z is associated with the text words w_t, the visual words w_v and the documents d.
– The joint probability for text words, images and visual words is: [equation not preserved in the transcript]
– Assumption: [not preserved]. Thus: [not preserved]

Multimodal pLSA
– The joint probabilistic model for the above generative model is given by the following: [equation not preserved in the transcript]
– Here we capture the patterns between images, text words and visual words by using the EM algorithm to determine the hidden layers connecting them.

Multimodal pLSA
– E-step: [equation not preserved in the transcript]
– M-step: [equation not preserved in the transcript]
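
Since the E-step and M-step formulas are missing, the sketch below shows what EM looks like for one plausible three-way aspect model. It assumes, and this is an assumption rather than the presenter's stated parameterization, that documents, text words and visual words are conditionally independent given z, so that P(d, w_t, w_v) = Σ_z P(z) P(d|z) P(w_t|z) P(w_v|z). The function name `mm_plsa` and the dense count tensor are hypothetical and only practical for toy sizes.

```python
import numpy as np

def mm_plsa(counts, n_topics, n_iter=50, seed=0):
    """EM for an assumed symmetric three-way aspect model.

    counts[d, t, v] = co-occurrence count of document d, text word t, visual word v.
    """
    rng = np.random.default_rng(seed)
    n_d, n_t, n_v = counts.shape

    def random_conditional(n_rows):
        m = rng.random((n_rows, n_topics)) + 1e-3
        return m / m.sum(axis=0, keepdims=True)   # each column is P(. | z)

    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d, p_t, p_v = (random_conditional(n) for n in (n_d, n_t, n_v))
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior P(z | d, t, v) for every cell of the tensor.
        joint = np.einsum("z,dz,tz,vz->dtvz", p_z, p_d, p_t, p_v)
        mix = np.maximum(joint.sum(axis=-1), 1e-300)   # model P(d, t, v)
        log_liks.append(float((counts * np.log(mix)).sum()))
        post = joint / mix[..., None]
        # M-step: re-estimate each distribution from expected counts.
        weighted = counts[..., None] * post
        p_d = weighted.sum(axis=(1, 2))
        p_t = weighted.sum(axis=(0, 2))
        p_v = weighted.sum(axis=(0, 1))
        p_z = p_d.sum(axis=0) / counts.sum()
        p_d, p_t, p_v = (m / m.sum(axis=0, keepdims=True) for m in (p_d, p_t, p_v))
    return p_z, p_d, p_t, p_v, log_liks
```

The returned `log_liks` list should be non-decreasing across iterations, which is a convenient sanity check on any EM implementation.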

Bipartite Graph Model
[Figure: a bipartite graph with words (w1 to w6) on one side and documents on the other, annotated with TF and IDF.]

BGM
[Figure: a query image, word nodes w1 to w8, and the results; retrieval uses Cash Flow.]
*Suman Karthik, Chandrika Pulla & C. V. Jawahar, “Incremental On-line Semantic Indexing for Image Retrieval in Dynamic Databases”, Workshop on Semantic Learning and Applications, CVPR, 2008.
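
To make the word-document graph concrete, here is a small sketch that builds a bipartite graph with TF-IDF edge weights and ranks documents by a one-hop lookup. How the original BGM places TF and IDF on the graph is not recoverable from the transcript, so using the product tf·idf as the edge weight is an assumption, and the one-hop scoring is a simplification that does not implement the Cash Flow algorithm. The names `build_bipartite_graph` and `retrieve` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_bipartite_graph(docs):
    """docs: {doc_id: list of (visual) words}. Returns word -> {doc_id: tf * idf}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for words in docs.values():
        doc_freq.update(set(words))          # count each word once per document
    graph = defaultdict(dict)
    for doc_id, words in docs.items():
        for word, count in Counter(words).items():
            tf = count / len(words)
            idf = math.log(n_docs / doc_freq[word])
            graph[word][doc_id] = tf * idf   # assumed edge weight
    return dict(graph)

def retrieve(graph, query_words):
    """Rank documents by the summed weight of edges reached from the query words."""
    scores = defaultdict(float)
    for word in query_words:
        for doc_id, weight in graph.get(word, {}).items():
            scores[doc_id] += weight
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy collection
docs = {"img1": ["w1", "w2", "w2"], "img2": ["w2", "w3"], "img3": ["w1", "w4"]}
graph = build_bipartite_graph(docs)
ranking = retrieve(graph, ["w2"])
```

Because the graph is a plain adjacency structure, adding a new image only adds its edges, which fits the incremental setting named in the cited workshop paper.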

Tripartite Graph Model
The tensor is represented as a tripartite graph of text words, visual words and images.

Tripartite Graph Model
– The edge weights between text words and visual words are computed as: [equation not preserved in the transcript]
– Learning edge weights to improve performance:
–Sum-of-squares error and log loss.
–L-BFGS for fast convergence and local minima.
*Wen-tau Yih, “Learning term-weighting functions for similarity measures,” in EMNLP, 2009.

Offline Indexing
– The bipartite graph model is a special case of TGM.
– Offline indexing reduces the computational time for retrieval.
– Similarity matrix for graphs G_a and G_b: [equation not preserved in the transcript]
– A and B are the adjacency matrices of G_a and G_b.
– A special case is G_a = G_b = G′.
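
The similarity-matrix formula is missing from the transcript. The wording matches the vertex-similarity measure of Blondel et al., in which S is updated as B S Aᵀ + Bᵀ S A and renormalized, so the sketch below assumes that definition; it may differ from what the slide showed. The function name `graph_similarity` is hypothetical.

```python
import numpy as np

def graph_similarity(A, B, n_iter=20):
    """Iterative vertex similarity between graphs with adjacency matrices A and B.

    S[i, j] scores vertex i of G_b against vertex j of G_a. The iteration only
    converges along even steps, so an odd n_iter is rounded up.
    """
    if n_iter % 2:
        n_iter += 1
    S = np.ones((B.shape[0], A.shape[0]))
    for _ in range(n_iter):
        S = B @ S @ A.T + B.T @ S @ A
        norm = np.linalg.norm(S)   # Frobenius norm
        if norm == 0:              # graphs without edges: nothing to propagate
            return S
        S = S / norm
    return S

# Special case G_a = G_b: self-similarity of a directed path 0 -> 1 -> 2
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
S = graph_similarity(A, A)
```

Computing S once offline and storing it is one way to realize the reduced retrieval time mentioned above, since queries then only read precomputed similarities.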

Datasets
University of Washington (UW)
–1109 images.
–Manually annotated keywords.
Multi-label Image
–139 urban scene images.
–Overlapping labels: Buildings, Flora, People and Sky.
–Manually created ground truth data for 50 images.
IAPR TC12
–20,000 images of natural scenes (sports and actions, landscapes, cities etc.).
–Vocabulary size of 291 and 17,825 images for training.
–1,980 images for testing.
Corel
–5000 images.
–4500 for training and 500 for testing.
–260 unique words.
Holiday dataset
–1491 images.
–500 categories.

Experimental Settings
Pre-processing
–SIFT feature extraction.
–Quantization using k-means.
Performance measures
–The mean Average Precision (mAP).
–Time taken for semantic indexing.
–Memory space used for semantic indexing.
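
For reference, mean Average Precision can be computed as in the short sketch below. The function names and the input layout (one ranked list plus one set of relevant ids per query) are assumptions for illustration; the transcript does not say how the evaluation was implemented.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(results):
    """results: list of (ranked_ids, relevant_ids) pairs, one pair per query."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in results]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, a ranking `["a", "b", "c", "d"]` with relevant items `{"a", "c"}` has precision 1/1 at the first hit and 2/3 at the second, giving an average precision of 5/6.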

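BGM vs pLSA, IpLSA (on the Holiday dataset)
Model | mAP | Time | Space
Probabilistic LSI | [not preserved] | [not preserved] s | 3267 MB
Incremental pLSA | [not preserved] | [not preserved] s | 3356 MB
BGM | [not preserved] | [not preserved] s | 57 MB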

BGM vs pLSA, IpLSA
pLSA
– Cannot scale to large databases.
– Cannot update incrementally.
– Latent topic initialization is difficult.
– High space complexity.
IpLSA
– Cannot scale to large databases.
– Cannot update new latent topics.
– Latent topic initialization is difficult.
– High space complexity.
BGM + Cash Flow
– Efficient.
– Low space complexity.

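Results
LSI vs MMLSI (cell values not preserved in the transcript)
Columns: Datasets | Visual-based | Tag-based | Pseudo single mode | MMLSI
Rows: UW, Multilabel, IAPR, Corel
pLSA vs MMpLSA (cell values not preserved in the transcript)
Columns: Datasets | Visual-based | Tag-based | Pseudo single mode | mm-pLSA | Our MM-pLSA
Rows: UW, Multilabel, IAPR, Corel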

TGM vs MMLSI, MMpLSA, mm-pLSA
MMLSI and MMpLSA
– Cannot scale to large databases.
– Cannot update incrementally.
– Latent topic initialization is difficult.
– High space complexity.
mm-pLSA
– Merges dictionaries of different modes.
– No interaction between different modes.
TGM + Cash Flow
– Efficient.
– Low space complexity.
Comparison table (cell values not preserved in the transcript)
Columns: Datasets | MMLSI | MMpLSA | mm-pLSA | TGM-TFIDF | TGM-learning
Rows: UW, Multilabel, IAPR, Corel

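TGM vs MMLSI, MMpLSA, mm-pLSA
Model | mAP | Time | Space
MMLSI | [not preserved] | [not preserved] s | 4856 MB
MMpLSA | [not preserved] | [not preserved] s | 4267 MB
mm-pLSA | [not preserved] | [not preserved] s | 3812 MB
TGM | 0.6755s [mAP and time run together in the transcript] | 168 MB
TGM
– Takes a few milliseconds for semantic indexing.
– Low space complexity.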

Conclusion
MMLSI and MMpLSA
–Outperform single-mode LSI and pLSA and existing multimodal techniques.
–Memory and computation intensive.
TGM
–Fast and effective retrieval.
–Scalable.
–Computationally light.
–Less resource intensive.

Future work
– A learning approach to determine the size of the concept space.
– Various methods can be explored to determine the weights in TGM.
– Extending the designed algorithms to video retrieval.

Related Publications
– Suman Karthik, Chandrika Pulla, C. V. Jawahar, “Incremental On-line Semantic Indexing for Image Retrieval in Dynamic Databases”, 4th International Workshop on Semantic Learning and Applications, CVPR, 2008.
– Chandrika Pulla, C. V. Jawahar, “Multi Modal Semantic Indexing for Image Retrieval”, in Proceedings of Conference on Image and Video Retrieval (CIVR).
– Chandrika Pulla, Suman Karthik, C. V. Jawahar, “Effective Semantic Indexing for Image Retrieval”, in Proceedings of International Conference on Pattern Recognition (ICPR).
– Chandrika Pulla, C. V. Jawahar, “Tripartite Graph Models for Multi Modal Image Retrieval”, in Proceedings of British Machine Vision Conference (BMVC), 2010.

Thank you