IIIT Hyderabad Efficient Image Retrieval Methods For Large Scale Dynamic Image Databases Suman Karthik 200407013 Advisor: Dr. C.V.Jawahar.

Cheap imaging hardware, plummeting storage costs, and user-generated content all lead to more images.

Image Databases
- Large scale: millions to billions of images
- Dynamic: highly dynamic in nature
[Chart: number of images on Flickr from December 2005 to November 2007, in millions]

CBIR
Content-based IR
- Uses image content: shape, color, texture
Pros
- Good quality
- Annotation agnostic
Cons
- Inefficient
- Not scalable

Bag of Words*
- Feature extraction: compute SIFT descriptors [Lowe '99]
- Vector quantization: descriptors become words
- Semantic indexing: pLSA (Hofmann, 2001), with documents d, topics z, and words w
- Index: inverted index from each word W to documents D1, D2, D3
* Sivic & Zisserman, 2003; Nistér & Stewénius, 2006; Philbin, Sivic, Zisserman et al., 2008
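The inverted index at the end of the bag-of-words pipeline can be illustrated with a minimal sketch. This is not the implementation from the thesis: the function names, the toy documents, and the vote-count ranking are assumptions made for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each visual word to the set of image ids that contain it."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for w in words:
            index[w].add(doc_id)
    return index

def query(index, words):
    """Rank images by the number of query words they share (simple voting)."""
    votes = defaultdict(int)
    for w in words:
        for doc_id in index.get(w, ()):
            votes[doc_id] += 1
    # Highest vote count first; ties broken by image id for determinism.
    return sorted(votes, key=lambda d: (-votes[d], d))

# Hypothetical toy collection: three images described by quantized words.
docs = {"D1": ["w1", "w2"], "D2": ["w2", "w3"], "D3": ["w3", "w4"]}
index = build_inverted_index(docs)
ranking = query(index, ["w2", "w3"])
```

Only images that share at least one word with the query are touched, which is what makes the index cheap to search.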

Dynamic Databases
- Large scale
- New images added continuously
- High rate of change
- Nature of data not known a priori
Sources: Internet, videos, images

Text vs Images in Dynamic Databases
- Text: vocabulary known; rate of change of vocabulary low; stable vocabulary
- Images: vocabulary unknown; rate of change of vocabulary high; unstable vocabulary

Quantization and Semantic Indexing in Dynamic Databases
Quantization: as the DB changes, the vocabulary becomes outmoded
- Updating the vocabulary is too costly
- Not incremental
- Cannot keep up with the rate of change
Semantic indexing: as the DB changes, the semantic index becomes invalid
- Updating the semantic index is resource intensive
- Not incremental
- Cannot keep up with the rate of change or scale

Dynamic Databases
[Diagram: Internet, videos, images -> dynamic database -> feature extraction -> vector quantization -> semantic indexing -> index]
Quantization and semantic indexing methods are a bottleneck.

Objective 1
A. Motivation: CBIR is inefficient and not scalable.
B. Objective: develop methods to improve the efficiency and scalability of CBIR.
C. Contributions
- C 1.1: Virtual Textual Representation
- C 1.2: A new efficient indexing structure
- C 1.3: Relevance feedback methods that improve performance

Objective 2
A. Motivation: quantization is a bottleneck for BoW when dealing with dynamic image databases.
B. Objective: develop an incremental quantization method for the BoW model that can deal with dynamic image databases.
C. Contributions
- C 2.1: Incremental Vector Quantization
- C 2.2: Comparison of retrieval performance with existing methods
- C 2.3: Comparison of incremental quantization with existing methods

Objective 3
A. Motivation: semantic indexing is not scalable for BoW when dealing with dynamic image databases.
B. Objective: develop an incremental semantic indexing method for the BoW model that can deal with dynamic image databases.
C. Contributions
- C 3.1: Bipartite Graph Model
- C 3.2: An algorithm for semantic indexing on BGM
- C 3.3: Search engines for images

CBIR

Literature
- Global image retrieval
- Region-based image retrieval
- Region-based relevance feedback
- Costly nearest-neighbor-based retrieval
- Spatial indexing
- Relevance feedback heavily used
* Image Retrieval: Past, Present, and Future, Yong Rui, Thomas S. Huang, Shih F. Chang, in International Symposium on Multimedia Information Processing, 1997
* Blobworld: A System for Region-Based Image Indexing and Retrieval, Chad Carson, Megan Thomas, Serge Belongie, Joseph M. Hellerstein, Jitendra Malik, in Third International Conference on Visual Information Systems, 1999
* Region-Based Relevance Feedback in Image Retrieval, Feng Jing, Mingjing Li, Hong-Jiang Zhang, Bo Zhang, Proc. IEEE International Symposium on Circuits and Systems, 2002

Search

Transformation
[Diagram: feature space (color, compactness, position) -> quantization -> bins represented by strings or words]

Virtual Textual Representation
Quantization
- Uniform quantization (grid)
- Density-based quantization (k-means)
Each cell is a string.
Transformation: an image plays the role of a document and its segments the role of words.
[Diagram labels: Document, Image, Words, Segments, Text, Segmentation]
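The uniform (grid) variant of this transformation can be sketched as follows. The cell size, the assumption that features are normalized to [0, 1), and the underscore-joined word format are illustrative choices, not details taken from the slides.

```python
def grid_word(feature, cell_size=0.25):
    """Name the grid cell that a feature vector falls into.

    Assumes every component lies in [0, 1); each cell becomes one 'word'.
    """
    cells = [int(x // cell_size) for x in feature]
    return "_".join(str(c) for c in cells)

def image_to_document(segment_features, cell_size=0.25):
    """Turn the per-segment features of one image into a list of words."""
    return [grid_word(f, cell_size) for f in segment_features]

# Hypothetical segment described by (color, compactness, position) values.
word = grid_word((0.1, 0.6, 0.9))
document = image_to_document([(0.1, 0.6, 0.9), (0.3, 0.3, 0.3)])
```

Because the cell name depends only on the feature itself, no training pass over the data is needed, which is the appeal of the grid over density-based quantization.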

CBIR Indexing
- Spatial databases
- Relevance feedback skews the feature space, rendering spatial databases inefficient.*
* Indexing for Relevance Feedback Image Retrieval, Jing Peng, Douglas R. Heisterkamp, in Proceedings of the IEEE International Conference on Image Processing (ICIP'03)

Elastic Bucket Trie
[Diagram: a trie with a null root and nodes labeled A, B, C; leaf buckets hold strings such as CAB and CBA. Insert: a bucket that overflows is split. Query: the string BBC is followed through the nodes to the retrieved bucket.]
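The slide gives only a diagram, so the following is a loose sketch of a trie whose leaf buckets split on overflow, not the published Elastic Bucket Trie. The class name, the bucket capacity, and the assumption that all words have the same length are hypothetical.

```python
class EBTNode:
    """Trie node that stays a leaf bucket until it overflows."""

    def __init__(self, depth=0, capacity=2):
        self.depth = depth
        self.capacity = capacity
        self.children = None  # None while this node is still a leaf bucket
        self.bucket = []      # (word, item) pairs stored at a leaf

    def _child(self, word):
        key = word[self.depth]
        if key not in self.children:
            self.children[key] = EBTNode(self.depth + 1, self.capacity)
        return self.children[key]

    def insert(self, word, item):
        if self.children is not None:
            self._child(word).insert(word, item)
            return
        self.bucket.append((word, item))
        # Split only if every stored word still has a character to branch on.
        can_split = all(len(w) > self.depth for w, _ in self.bucket)
        if len(self.bucket) > self.capacity and can_split:
            pending, self.bucket, self.children = self.bucket, [], {}
            for w, it in pending:
                self._child(w).insert(w, it)

    def lookup(self, word):
        """Follow the word through the nodes and return the bucket reached."""
        node = self
        while node.children is not None:
            key = word[node.depth]
            if key not in node.children:
                return []
            node = node.children[key]
        return [item for _, item in node.bucket]

root = EBTNode(capacity=2)
for w, img in [("CAB", "img1"), ("CBA", "img2"), ("BBC", "img3")]:
    root.insert(w, img)
```

A lookup returns every item in the bucket it reaches, so strings that share a prefix (here CAB and CBA) are retrieved together.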

Relevance Feedback
[Diagram: query -> retrieved images -> relevance feedback]

Region-importance-based relevance feedback
[Diagram: relevant images -> extracted words -> keyword selection -> keywords -> pseudo image for next iteration. Outcome: errors in retrieval.]

Discriminative Relevance Feedback
- Classification is given precedence over clustering.
- Discriminative segments become the keywords.
- Non-discriminative segments are ignored.
Example keywords: SURFERS, WAVES, ROSES, FLOWERS
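One way to read "discriminative segments become the keywords" is sketched below. The scoring rule (fraction of relevant images containing a word minus the fraction of irrelevant images containing it) and the function name are assumptions for illustration; the slides do not specify the criterion.

```python
def discriminative_keywords(relevant, irrelevant, top_k=2):
    """Pick words that separate relevant images from irrelevant ones.

    `relevant` and `irrelevant` are lists of word sets, one set per image.
    """
    vocab = set().union(*relevant) if relevant else set()

    def score(w):
        pos = sum(w in img for img in relevant) / len(relevant)
        neg = (sum(w in img for img in irrelevant) / len(irrelevant)
               if irrelevant else 0.0)
        return pos - neg

    ranked = sorted(vocab, key=lambda w: (-score(w), w))
    # Words that are as common in irrelevant images are ignored.
    return [w for w in ranked if score(w) > 0][:top_k]

# Hypothetical feedback round: the user marks two images as relevant.
relevant = [{"surfer", "wave", "sky"}, {"surfer", "wave", "sand"}]
irrelevant = [{"rose", "sky"}, {"flower", "sky", "sand"}]
keywords = discriminative_keywords(relevant, irrelevant)
```

Here "sky" and "sand" occur in relevant images but do not discriminate, so they are dropped from the next query.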

Discriminative Relevance Feedback
[Diagram: relevant and irrelevant images -> extracted words -> keyword selection -> keywords -> pseudo image for next iteration. Outcome: no errors in retrieval.]

Performance
Discriminative Relevance Feedback consistently outperforms the region-importance-based method.
[Plot labels: high F-score, low F-score]

[Comparison chart. Axes: global vs. local image retrieval; spatial vs. non-spatial indexing; global or no relevance feedback vs. region-based relevance feedback. Systems placed: early CBIR; Blobworld (no indexing); SIMPLIcity (no indexing); our work.]

Analysis
- Relevance feedback algorithms need to be modified to work with text.
- Keywords emerge with relevance feedback, signifying association between key segments.
- EBT can be used without any modifications with discriminative relevance feedback.
- Advent of the Bag of Words model for image retrieval

Quantization

Literature
- K-means
- Hierarchical k-means
- K-means with soft assignment
- Time-consuming offline quantization
- Representative data must be available a priori
- Quantization is not incremental
* Video Google: A Text Retrieval Approach to Object Matching in Videos, Josef Sivic, Andrew Zisserman, ICCV 2003
* Scalable Recognition with a Vocabulary Tree, D. Nistér and H. Stewénius, CVPR 2006
* Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases, James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, Andrew Zisserman, CVPR 2008

Losses in Quantization
Perceptual loss
- Under-quantization
- Synonymy
- Poor precision
Binning loss
- Over-quantization
- Polysemy
- Poor recall

Incremental Vector Quantization
- Control perceptual loss
- Minimize binning loss
- Create quality codebooks
- Data dependent
- Incremental in nature

Algorithm
[Diagram labels: r, L = 2]
- L: minimum cardinality of a cell
- Puts an upper bound on perceptual loss
- Builds quality codebooks by ignoring noise
- Soft bin assignment: minimizes binning loss
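The slide does not spell out the procedure, so the sketch below is one possible reading rather than the IVQ algorithm itself. It assumes that r is a cell radius, that a point with no cell within r opens a new cell, that a cell counts as a codeword only once it holds at least L points, and that soft assignment means a point joins every cell within r. The class and method names are hypothetical.

```python
import math

class IncrementalQuantizer:
    """Radius-bounded cells that are created as data arrives."""

    def __init__(self, r, min_card=2):
        self.r = r                # assumed meaning of r: cell radius
        self.min_card = min_card  # L: minimum cardinality of a cell
        self.centers = []
        self.counts = []

    def add(self, point):
        """Assign a point to all cells within r, opening a new cell if none."""
        hits = [i for i, c in enumerate(self.centers)
                if math.dist(c, point) <= self.r]
        if not hits:
            self.centers.append(tuple(point))
            self.counts.append(0)
            hits = [len(self.centers) - 1]
        for i in hits:  # soft assignment: every nearby cell is credited
            self.counts[i] += 1
        return hits

    def codebook(self):
        """Cells with fewer than L points are treated as noise and skipped."""
        return [c for c, n in zip(self.centers, self.counts)
                if n >= self.min_card]

q = IncrementalQuantizer(r=1.0, min_card=2)
assignments = [q.add(p) for p in
               [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (1.5, 0.0), (0.75, 0.0)]]
```

Each insertion only touches cells near the new point, so existing cells never move. The linear scan over centers is for clarity; the deck later mentions LSH as a way to accelerate this lookup.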

An Experiment
Given
- All possible feature points in a feature space that could be generated by natural processes
Quantize
- K-means with a priori knowledge of the entire data
- IVQ with no a priori information
Performance
- F-score
- Time taken for incremental quantization

F-score
- IVQ: 1115 bins
- K-means: 1000 bins
- IVQ outperforms K-means

Time
- IVQ outperforms K-means
- IVQ quantizes in 0.1 seconds; its time complexity is linear
- K-means takes 1000 seconds; its time complexity is exponential

Holiday Dataset
Datasets
- Holiday dataset: 1491 images, 500 categories
Pre-processing
- SIFT feature extraction
- Quantization using k-means
- Quantization using IVQ

Incremental Quantization
[Table legend: S = seconds, D = days. Batch = 100 images of the 100,000-image ALOI dataset, added sequentially.]
Datasets
- ALOI dataset: 100,000 images in 1000 batches of 100 images each
Pre-processing
- SIFT feature extraction
- Quantization using k-means/online k-means
- Quantization using IVQ

Analysis
- IVQ uses more bins than K-means (constant perceptual loss)
- IVQ is efficient due to local changes
- LSH is used to accelerate IVQ
- Semantic indexing can improve mAP

[Comparison chart. Axes: offline vs. online quantization; non-density-based vs. density-based; non-incremental vs. incremental. Methods placed: K-means, online K-means, regular lattice, IVQ (local), adaptive vocabulary tree (global).]

Semantic Indexing

Semantic Indexing*
- LSI, pLSA, LDA
- Words clustered around latent topics
- Visual words clustered around latent topics
[Diagram: words w, documents d, P(w|d). Example: the words whippet, GSD, doberman, daffodil, tulip, and rose cluster into the topics Animal (whippet, GSD, doberman) and Flower (daffodil, tulip, rose).]
* Hofmann 1999; Blei, Ng & Jordan, 2004; R. Lienhart and M. Slaney, 2007

Literature
- Visual pLSA
- Visual LDA
- Spatial semantic indexing
- High space complexity due to large matrix operations
- Slow, resource-intensive offline processing
* Discovering Objects and Their Location in Images, Josef Sivic, Bryan Russell, Alexei A. Efros, Andrew Zisserman, and Bill Freeman, ICCV 2005
* Image Retrieval on Large-Scale Image Databases, Eva Horster, Rainer Lienhart, Malcolm Slaney, CIVR 2007
* Spatial Latent Dirichlet Allocation, X. Wang and E. Grimson, in Proceedings of Neural Information Processing Systems Conference (NIPS) 2007

Bipartite Graph Model
- The vector space model is encoded as a bipartite graph of words and documents.
- TF values are retained as edge weights.
- IDF values are retained as term weights.
- Cash Flow Algorithm
[Diagram: documents d1 to d5 (Saddam Captured, Iraq Pullout, Obama Elected, Bush Popularity, Financial Crisis) linked to words w1 to w6 (subprime, reforms, war, Iraq, elections, democrats); TF on the edges, IDF on the word nodes.]
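The graph structure and a cash-propagation search over it can be sketched as below. The data structure follows the slide (TF on edges, IDF on words), but the propagation rule is an assumption: the smoothed IDF formula, the 0.5 share that documents pass on, the fixed number of hops, and all names are hypothetical and may differ from the Cash Flow algorithm in the thesis.

```python
import math
from collections import defaultdict

class BipartiteGraph:
    def __init__(self):
        self.doc_words = {}                 # document -> {word: TF}
        self.word_docs = defaultdict(dict)  # word -> {document: TF}

    def insert(self, doc, words):
        """Add one document incrementally; existing nodes are not recomputed."""
        tf = defaultdict(int)
        for w in words:
            tf[w] += 1
        self.doc_words[doc] = dict(tf)
        for w, n in tf.items():
            self.word_docs[w][doc] = n

    def idf(self, word):
        # Assumed smoothing: log(1 + N/df) stays positive for every word.
        return math.log(1 + len(self.doc_words) / len(self.word_docs[word]))

    def cash_flow(self, query_words, cash=1.0, cutoff=0.01, hops=2):
        """Spread cash from query words to documents and back, hop by hop."""
        known = [w for w in set(query_words) if w in self.word_docs]
        word_cash = {w: cash / len(known) for w in known}
        scores = defaultdict(float)
        for _ in range(hops):
            doc_cash = defaultdict(float)
            for w, c in word_cash.items():
                if c < cutoff:  # too little cash left to propagate
                    continue
                edges = self.word_docs[w]
                total = sum(edges.values())
                for d, tf in edges.items():
                    doc_cash[d] += c * tf / total
            for d, c in doc_cash.items():
                scores[d] += c
            # Documents pass an assumed half of their cash on to their words.
            word_cash = defaultdict(float)
            for d, c in doc_cash.items():
                edges = self.doc_words[d]
                total = sum(tf * self.idf(w) for w, tf in edges.items())
                for w, tf in edges.items():
                    word_cash[w] += 0.5 * c * tf * self.idf(w) / total
        return sorted(scores, key=lambda d: (-scores[d], d))

g = BipartiteGraph()
g.insert("d1", ["w1", "w2", "w3"])
g.insert("d2", ["w3", "w4"])
g.insert("d3", ["w5"])
ranking = g.cash_flow(["w1"])
```

With a single hop the search behaves like a plain inverted index and returns only d1. With two hops, d2 is also reached through the shared word w3, although it contains none of the query words.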

BGM with BoW
- Feature extraction: local detectors, SIFT
- Vector quantization: k-means
- BGM insertion: words, documents, TF, IDF

Why is BGM Superior?
[Diagram: a query image with words w1 and w2; the result from the inverted index is compared with the result from Cash Flow over words w1 to w5.]

Naïve vs BGM
Datasets
- 9000 images from Flickr
- 9 sports categories
- 5 animal categories
Pre-processing
- SIFT feature extraction
- Quantization using k-means
F-score = 2*(p*r)/(p+r)
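The F-score formula above, with p as precision and r as recall, can be computed as in this small sketch. The helper that derives p and r from sets of retrieved and relevant image ids is an illustrative assumption about how the inputs would be obtained.

```python
def f_score(p, r):
    """Harmonic mean of precision p and recall r: 2*(p*r)/(p+r)."""
    if p + r == 0:
        return 0.0
    return 2 * (p * r) / (p + r)

def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given sets of image ids."""
    hits = len(set(retrieved) & set(relevant))
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    return p, r

# Hypothetical query: 4 images retrieved, 2 of them among 4 relevant ones.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e", "f"})
score = f_score(p, r)
```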

BGM vs pLSA, IpLSA
pLSA
- Cannot scale for large databases
- Cannot update incrementally
- Latent topic initialization difficult
- Space complexity high
IpLSA
- Cannot scale for large databases
- Cannot update new latent topics
- Latent topic initialization difficult
- Space complexity high
BGM + Cash Flow
- Efficient
- Low space complexity
[Two tables of mAP, time (s), and space for pLSA, IpLSA, and BGM: one with the number of concepts known, one with the number of concepts unknown. Only the space column survives in this transcript, identical in both tables: pLSA 3267 Mb, IpLSA 3356 Mb, BGM 57 Mb.]
Datasets
- Holiday dataset: 1491 images, 500 categories
Pre-processing
- SIFT feature extraction
- Quantization using k-means

Near Duplicate Retrieval
Dataset: 500,000 movie frames
- SIFT vectors
- K-means quantization
Indexed using the text search library Ferret
- Efficient indexing and retrieval
- Effectively scalable to large data
Query frame given as a query to the Ferret index.
Cash propagated to every node until cut-off.

Sample Retrieval
[Images: query and retrieved frames from Fastest Indian, Fight Club, and Harry Potter]

Analysis
- Low index insert time for new images: less than 200 seconds to insert 1000 images into a million-image index
- Marginally higher retrieval time, due to multiple levels of graph traversal
- Memory usage minimal
- Works without knowing the number of concepts a priori
- BGM is a hybrid model: generative and discriminative

[Comparison chart. Axes: offline vs. online semantic indexing; generative vs. discriminative; non-incremental vs. incremental. Methods placed: pLSA, LDA, IpLSA, BGM TF, BGM IDF, BGM (generative + discriminative).]

Conclusion
- Efficient methods for retrieval in large scale dynamic image databases
- Scalability and adaptability have been addressed
- A step closer to real world image retrieval
- Features and their mixture: a long way to go

Future Work
- Quality and quantity of features
- Automatic feature modeling
- Text search engines for image search
- GPU-based quantization methods
- Multiple vocabularies for image retrieval
- Multimodal semantic indexing with BGM

List of publications
- Suman Karthik, C.V. Jawahar, "Incremental On-line Semantic Indexing for Image Retrieval in Dynamic Databases", 4th International Workshop on Semantic Learning and Applications, CVPR, 2008, Florida.
- Suman Karthik, C.V. Jawahar, "Analysis of Relevance Feedback in Content Based Image Retrieval", Proceedings of the 9th International Conference on Control, Automation, Robotics and Vision (ICARCV), 2006, Singapore.
- Suman Karthik, C.V. Jawahar, "Virtual Textual Representation for Efficient Image Retrieval", Proceedings of the 3rd International Conference on Visual Information Engineering (VIE), September 2006, Bangalore, India.
- Suman Karthik, C.V. Jawahar, "Efficient Region Based Indexing and Retrieval for Images with Elastic Bucket Tries", Proceedings of the International Conference on Pattern Recognition (ICPR), 2006.

The End

Intuitive way of learning content
- Over-segmentation and subsequent deduction of content through relevance feedback.
- Discriminative Relevance Feedback leverages this advantage to achieve better performance than standard techniques.
[Diagram labels: Transformation, Document, Image, Words, Segments, Text, Segmentation]

K-means
Pros
- Simple
- Efficient
Cons
- Computationally expensive
- Needs a representative training set
- Sensitive to the parameter K

A naive quantization scheme
[Diagram: quantization of a feature space with axes F1, F2, F3]
Advantages:
- High speed, no quantization overhead
- As dataset size grows, precision increases
Disadvantages:
- Not data dependent, no idea of visual concept
- Information loss due to hard assignment
* Suman Karthik, C. V. Jawahar, Virtual Textual Representation for Efficient Image Retrieval, VIE 2006
* Tuytelaars, T. and Schmid, C., Vector Quantizing Feature Space with a Regular Lattice, ICCV 2007

C 2.1 Methodology
Data
- 1000 random feature vectors generated from each of 1000 normal distributions in a 2-d feature space, for a total of 1 million feature points in the space.
- 100,000 virtual images falling into 100 categories, where each category image is generated by drawing random numbers from 10 normal distributions from the above data.
Algorithms
- K-means (quantized with the entire data and ideal K = 1000)
- IVQ
- K-means with soft assignment
Measures
- F-score for retrieval performance
- Time estimates for incremental quantization
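A scaled-down version of this synthetic setup can be generated with the standard library alone. The standard deviation, the unit-square range for the means, the seed, and the rule of one feature point per distribution in a virtual image are assumptions; the slide does not state them.

```python
import random

def make_clusters(n_clusters, points_per_cluster, sigma=0.01, seed=0):
    """Draw 2-d points from n_clusters isotropic normal distributions."""
    rng = random.Random(seed)
    means = [(rng.random(), rng.random()) for _ in range(n_clusters)]
    points = [[(rng.gauss(mx, sigma), rng.gauss(my, sigma))
               for _ in range(points_per_cluster)]
              for mx, my in means]
    return means, points

def make_virtual_image(points, cluster_ids, rng):
    """One virtual image: one point from each of its category's clusters."""
    return [rng.choice(points[c]) for c in cluster_ids]

# Small stand-in for the 1000 x 1000 setup described above.
means, points = make_clusters(n_clusters=20, points_per_cluster=50)
image = make_virtual_image(points, range(10), random.Random(1))
```

Calling `make_clusters(1000, 1000)` would produce the 1 million points mentioned above, at a correspondingly higher memory cost.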

Performance
[Plot]

Performance
[Plot]

Image Retrieval
Contemporary approach
- Uses textual cues
Pros
- Simple
- Efficient
Cons
- Images are subjective
- Text cues unscalable
- Quality suffers
Example text cues: Rose, Petals, Red, Green, Bud, Gift, Love, Flower

Losses
[Diagram: high perceptual loss, high binning loss, optimal quantization]

Image retrieval as Text retrieval
Can an image be indexed, queried for, and retrieved as a text document?
[Diagram: can this (an image) become this (a text document)?]

Relevance Feedback
Statistical
- Delta mean algorithm
- Query point movement
- Inverse variance
- Membership criterion
Kernel based
- Parzen windows
- SVM
- Kernel BDA
Entropy based
- KL divergence

Semantic Indexing for Images
- Objects and their location in images: Sivic, J., Russell, B.C., Efros, A.A., Zisserman, A., Freeman
- Large scale image databases: Lienhart, R., Slaney, M.
- Web image selection: Keiji Yanai
- Spatial Latent Dirichlet Allocation: Xiaogang Wang, Eric Grimson
- Image auto-annotation: Monay, Florent and Gatica-Perez, Daniel
High space complexity due to large matrix operations. Slow, resource-intensive offline processing.