Image Retrieval Discussion
Presentation transcript:

1 Image Retrieval Discussion
Andrew Chi Brian Cristante COMP : January 27, 2015

2 Image Retrieval
[Concept map] Image retrieval is both an AI / vision problem and a systems design / software engineering problem.
AI / vision side:
- Sensory Gap: "What features should we use?" Candidates include text, attributes, SIFT, and visual keywords; should the choice be query-dependent?
- Semantic Gap: "How should we index the images and retrieve them?" Options include semantic hierarchies, centrality measures, and PageRank.
- Intention Gap: depends on the type of search and user: broad vs. narrow domain; a specific image, a set of images, or exploratory search.
Systems side: design issues such as complex architecture, scalability, efficiency, and integrity of results.

3 What Is "Image Retrieval"?
Several scenarios fall under this name:
- We have a specific image stored somewhere and want to find it again.
- We dig around in a collection for some image that suits our needs.
- We are just browsing and want guidance, helpful hints, and logical organization.
The query can be text, another image, or both.

4 First Attempts
Web searches: use text associated with the image, in context, on its webpage:
- Captions
- Surrounding text
- Other metadata
IBM QBIC (1995), Query By Image Content:
- Use low-level features to tag images with "visual keywords"
- Search for "red," "square-shaped," "metallic"

5 The Landscape of the Problem
Around the year 2000, researchers began to break down the problem:
- Sensory Gap: the difference between a real-world object and how it looks in an image
- Semantic Gap: the difference between low-level features and the actual content of the image
- Intention Gap: what does the user even want?

6 The Landscape of the Problem
This leads to our central questions:
- How should we represent our images?
- How should we index and organize our images?
- How should we interpret a natural-language query by the user?
- What are some algorithms we can use to actually retrieve an image in response to a query?

7 Image Retrieval
(Concept map from slide 2, repeated as a section divider.)

8 Sensory Gap Addressing the sensory gap means choosing appropriate features that give us the information we need from an image. Some information is naturally lost in creating an image. This can’t be helped. (Or can it?)

9 What’s a Good Feature? So we have billions of images that don’t necessarily share visual characteristics. How should we represent them to highlight their similarities and differences? There’s no clear-cut answer to this …

10 What's a Good Feature? Two examples: SIFT (gradient-based local features) and bag of (visual) words.

11 What's a Good Feature? The VisualRank paper:
- Search by web text first to narrow the number of images under consideration
- Goal: find the most "important" image in terms of its similarity to the other images
- Local features can capture more subtle differences
- Choose SIFT features, which are robust to scale, rotation, and illumination changes (but not to color)

12 What’s a Good Feature? VisualRank paper asks: could we make the choice of features adapt to users’ queries? We’ll save this for discussion.

13 Image Retrieval
(Concept map from slide 2, repeated as a section divider.)

14 Semantic Gap
- To cross the semantic gap for retrieval, we have to make links between the features we've extracted and what a user would be searching for.
- That's why, in our concept map, we say that the semantic gap makes us think about how to index and retrieve images (however they are represented).
- Think of building a data structure and devising an algorithm to traverse that data structure; see the sketch below.
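To make this concrete, here is a minimal Python sketch of the simplest such data structure: an inverted index mapping semantic terms (attributes or visual keywords) to image IDs. Nothing here is from the papers under discussion; the names and data are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Minimal sketch (hypothetical): an inverted index from attribute terms
# to image IDs, the simplest structure for semantic indexing.
index = defaultdict(set)

def add_image(image_id, attributes):
    """Index an image under each of its detected attributes."""
    for attr in attributes:
        index[attr].add(image_id)

def query(*attrs):
    """Return images carrying ALL of the queried attributes."""
    sets = [index[a] for a in attrs]
    return set.intersection(*sets) if sets else set()

add_image("img1", {"red", "has 4 wheels", "has engine"})
add_image("img2", {"red", "furry"})
print(query("red", "has engine"))  # {'img1'}
```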

15 Attributes (Farhadi et al., 2009)
Elements of semantic significance. They can be:
- Descriptive ("furry")
- Subcomponents ("has nose")
- Discriminative (something a dog has but a cat does not)

16 Attributes Lie inside the semantic gap, between low-level features and the full semantic interpretation of the image.
[Diagram] Image (raw pixels, e.g. (255, 0, 31)) → Features (e.g. [0, -0.5, 1.3, 1.6, 0.1, -0.2, …, 0.3]) → Attributes (e.g. "red," "has 4 wheels," "has engine") → Category ("Car"). The semantic gap spans the distance from features to category.

17–21 [Image-only slides.] Slide credit: Behjat Siddiquie

22 Semantic Hierarchies
- Organize images in a tree of increasingly specific categories (IS-A relationships)
- Need a large number of images for this to be non-trivial
- Useful for a variety of vision tasks, including retrieval: exploratory search, finding representatives of some category, building datasets
- Finds images that contain semantically similar objects, but not necessarily visually similar ones!
- Example: ImageNet; big crossover with NLP (WordNet)
A toy sketch of IS-A retrieval follows.
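As a toy illustration (not from ImageNet or any paper discussed here), a hedged sketch of retrieval over an IS-A hierarchy: querying a category returns images attached to that node or to any of its descendants. All names and data below are made up.

```python
# Hypothetical IS-A hierarchy: querying "dog" should also return images
# labeled with more specific categories like "beagle" or "husky".
hierarchy = {               # child -> parent (IS-A edges)
    "beagle": "dog", "husky": "dog", "dog": "mammal",
    "cat": "mammal", "mammal": "animal",
}
images = {"beagle": ["img_01"], "husky": ["img_02"], "cat": ["img_03"]}

def is_a(node, ancestor):
    """Walk parent pointers to test whether node IS-A ancestor."""
    while node is not None:
        if node == ancestor:
            return True
        node = hierarchy.get(node)
    return False

def retrieve(category):
    """Collect images attached to the category or any descendant."""
    return [img for node, imgs in images.items()
            if is_a(node, category) for img in imgs]

print(retrieve("dog"))     # ['img_01', 'img_02']
print(retrieve("animal"))  # all three images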

23 Retrieval with Semantic Hierarchies
- Semantic hierarchies and attributes can be used together for efficient retrieval methods
- Compute similarity ("image distance") by comparing attributes
- Use the hierarchy to weight the co-occurrence of attributes; that is, the hierarchy accounts for prior knowledge
For you math nerds:
$\mathrm{Similarity}(A, B) = \sum_{i,j} S_{ij} \cdot \delta_i(A) \cdot \delta_j(B)$
where A, B are images; i, j index attributes; $\delta_i(A)$ is the indicator function (1 if image A has attribute i, else 0); and $S_{ij}$ is the co-occurrence score.
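A small sketch of this formula in Python. The co-occurrence matrix S and the toy attribute vectors are made-up placeholders, since the slide does not specify how S is computed (only that the hierarchy weights it).

```python
import numpy as np

# Sketch of the slide's attribute similarity. S[i][j] is the co-occurrence
# score between attributes i and j (assumed given, e.g. hierarchy-weighted);
# delta is the binary indicator vector of which attributes an image has.
def similarity(delta_a, delta_b, S):
    """Similarity(A, B) = sum_ij S_ij * delta_i(A) * delta_j(B)."""
    return delta_a @ S @ delta_b

# Toy example with 3 attributes (illustrative values only).
S = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
A = np.array([1, 0, 1])   # image A has attributes 0 and 2
B = np.array([0, 1, 1])   # image B has attributes 1 and 2
print(similarity(A, B, S))  # 0.2 + 0.0 + 0.5 + 1.0 = 1.7
```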

24 Retrieval with Semantic Hierarchies
Use hashing to retrieve images in sub-linear time with respect to the size of the collection (Deng, A. Berg, Fei-Fei, 2011) Highly parallelizable

25 Image Retrieval AI / Vision Problem
Systems Design / Software Engineering Problem Domain Sensory Gap: “What features should we use?” Semantic Gap: “How should we index the images and retrieve them?” Design issues Broad Narrow Complex architecture Scalability Text Attributes SIFT Intention Gap: Type of search and user Integrity of results Visual Keywords Query-Dependent? Semantic Hierarchies Centrality Measures Efficiency PageRank Specific Image Set of Images Exploratory Search

26 Narrow Domain: Medical Image Search
Simultaneous phrase- and image-based search. Image retrieval pipeline:
- Extract low-level features (color, texture, shape)
- Transform features into visual keywords and annotations
- Compute similarity between the query image and database images

27 Types of Centrality
- Degree centrality: $x_i = \sum_j A_{ij}$
- Eigenvector centrality: $x_i = \lambda^{-1} \sum_j A_{ij} x_j$ (with $\lambda$ the leading eigenvalue)
- Katz centrality: $x_i = \alpha \sum_j A_{ij} x_j + \beta$
- PageRank: $x_i = \alpha \sum_j A_{ij} x_j / k_j^{\mathrm{out}} + \beta$
[Diagram] Nodes A and B: both have eigenvector centrality 0, but non-zero Katz centrality. From Networks: An Introduction, by M.E.J. Newman, 2010.
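For concreteness, a hedged power-iteration sketch of the PageRank variant written above (Newman's formulation with the additive β term). The toy graph, α, β, and iteration count are illustrative assumptions, not values from any paper here.

```python
import numpy as np

# PageRank by power iteration, matching the slide's
# x_i = alpha * sum_j A_ij * x_j / k_j_out + beta.
def pagerank(A, alpha=0.85, beta=1.0, iters=100):
    n = len(A)
    k_out = A.sum(axis=0)      # out-degree of node j (A_ij = 1 for edge j -> i)
    k_out[k_out == 0] = 1      # avoid division by zero for dangling nodes
    x = np.ones(n)
    for _ in range(iters):
        x = alpha * A @ (x / k_out) + beta
    return x / x.sum()         # normalize to a distribution

# Toy 3-node graph (illustrative only).
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)
print(pagerank(A))
```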

28 Image Rank at Web Scale
1.8 billion photos are shared per day (source: KPCB, Internet Trends 2014).
How long would it take just to compute the similarity matrix? With $N = 1.8 \times 10^9$ images, 100 cycles per similarity, and 1,000 CPUs at 3 GHz:
$(N^2/2) \times 100 \,/\, (3 \times 10^9) \,/\, 1000 \,/\, 86{,}400 \,/\, 365 \approx 1.7$ years
O(n^2) is far too slow.
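The same back-of-envelope arithmetic, spelled out in Python (the 3 GHz clock and the 1,000-CPU pool are the assumptions reconstructed above):

```python
# Reproducing the slide's back-of-envelope estimate.
N = 1.8e9                    # images shared per day
pairs = N**2 / 2             # entries in the (symmetric) similarity matrix
cycles = pairs * 100         # 100 CPU cycles per similarity computation
seconds = cycles / 3e9       # wall time on a single 3 GHz core
years = seconds / 1000 / 86400 / 365   # 1,000 cores; seconds -> years
print(f"{years:.1f} years")  # ~1.7 years
```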

29 Locality-Sensitive Hashing (LSH)
Key idea: avoid computing the entire distance matrix.
- Most pairs of images will be extremely dissimilar.
- Find a way to compare only the images that have a good chance of being similar.
Hashing is normally used to spread data uniformly; LSH does the opposite, deliberately colliding similar items. It is also used for dimensionality reduction.

30 LSH on Sets (MinHash)
Similarity of two sets (of features, n-grams, etc.) is measured by the Jaccard similarity: $J(S, T) = |S \cap T| \,/\, |S \cup T|$.
MinHash: use a normal hash function to hash every element of both sets; then assign each set to the bucket denoted by the minimum (numerical) hash of any of its elements.
Question: what is the probability that two sets S and T will be assigned to the same bucket?
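A minimal MinHash sketch illustrating the bucket-collision property: for one random hash function, two sets collide with probability equal to their Jaccard similarity. The seeded-MD5 construction of the "random" hash family is an arbitrary illustration, not what any production system uses.

```python
import hashlib

def h(x, seed):
    """Deterministic stand-in for a random hash function, parameterized by seed."""
    return int(hashlib.md5(f"{seed}:{x}".encode()).hexdigest(), 16)

def minhash(s, seed=0):
    """The MinHash of a set: the minimum hash value over its elements."""
    return min(h(x, seed) for x in s)

S = {"cat", "dog", "fish"}
T = {"cat", "dog", "bird"}
# Estimate the collision probability by averaging over many hash functions.
trials = 1000
agree = sum(minhash(S, i) == minhash(T, i) for i in range(trials))
print(agree / trials)  # close to the Jaccard similarity 2/4 = 0.5
```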

31 LSH on n-Dimensional Feature Vectors [image-only slide]

32 LSH for VisualRank: Algorithm
- Extract local (SIFT) features from images A-D
- Hash features using many LSH functions of the form $h_{a,b}(V) = \lfloor (a \cdot V + b) / W \rfloor$
- Features match if they hash to the same bucket in more than 3 tables
- Images match if they share more than 3 matching features
- Estimate the similarity of matching images as (number of matched features) / (average total number of features)
A sketch of this hash family follows.
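A hedged sketch of this hash family, assuming the standard p-stable LSH construction the formula suggests (a drawn from a Gaussian, b uniform in [0, W)). The bucket width W, the number of tables, and the matching threshold are illustrative parameters, not the paper's settings.

```python
import numpy as np

class LSHTable:
    """One hash table implementing h_{a,b}(V) = floor((a . V + b) / W)."""
    def __init__(self, dim, W=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=dim)   # random projection direction
        self.b = rng.uniform(0, W)      # random offset
        self.W = W                      # bucket width

    def hash(self, v):
        return int(np.floor((self.a @ v + self.b) / self.W))

# Per the slide: two features "match" if they collide in more than 3 tables.
tables = [LSHTable(dim=128, seed=s) for s in range(10)]

def match(f1, f2, threshold=3):
    collisions = sum(t.hash(f1) == t.hash(f2) for t in tables)
    return collisions > threshold

f = np.random.rand(128)       # a toy 128-dim SIFT-like descriptor
print(match(f, f + 0.01))     # near-duplicate features should match
```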

33 LSH for VisualRank: Performance
Time to compute a single similarity matrix (single CPU): 1,000 images in about 15 minutes.
Large-scale estimate (if your name is Google): 1,000 CPUs, top 100,000 queries, top 1,000 images for each query: less than 30 hours.
Specifics are not published, but MapReduce is the likely platform.

34 MapReduce (sort of) [image-only slide]

35 MapReduce (more accurate)
[Diagram: word count over the complete works of Shakespeare. The input ("All the world's a stage, and all…", "And all my soul, and all my…", "And this the hand that slew…") is split among mappers, which emit (word, count) pairs such as (all, 2) and (the, 1); the shuffle phase groups pairs by word; reducers sum the counts per word, e.g. (all, 4) and (and, 4); the results are collected into a histogram.]
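A self-contained word-count sketch mirroring the diagram, with the map, shuffle, and reduce phases done in-process. Real MapReduce distributes these phases across machines, and the diagram's mappers pre-combine counts (emitting (all, 2) rather than two (all, 1) pairs), which this sketch skips for simplicity.

```python
from itertools import groupby

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield (word.strip(",.'"), 1)

def reducer(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return (word, sum(counts))

lines = ["All the world's a stage, and all",
         "And all my soul, and all my"]
# Sorting the emitted pairs by key plays the role of the shuffle phase.
pairs = sorted(kv for line in lines for kv in mapper(line))
result = [reducer(word, [c for _, c in group])
          for word, group in groupby(pairs, key=lambda kv: kv[0])]
print(dict(result))  # e.g. {'all': 4, 'and': 3, 'my': 2, ...}
```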

36 Questions
- Suggest a method for implementing the VisualRank LSH algorithm at a large scale. MapReduce: what are the mappers and reducers? UNC Kure/Killdevil: what would each 12-core node do?
- Say you are not Google: how would you approach this problem without knowing the 100,000 most likely queries beforehand?

37 Questions
- Why might you wish to use graph centrality as a ranking mechanism for image retrieval? Why might you prefer to use a semantic hierarchy instead?
- (Open-ended) If you were a large search engine, how might you learn and deploy query-dependent feature representations of images? Could you also leverage the information in a semantic hierarchy?

38 References
(Survey paper) Datta, Ritendra, Dhiraj Joshi, Jia Li, and James Z. Wang. "Image Retrieval: Ideas, Influences, and Trends of the New Age." ACM Computing Surveys 40, no. 2 (May 2008): 5:1–5:60.
Deng, Jia, Wei Dong, R. Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. "ImageNet: A Large-Scale Hierarchical Image Database." In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), 248–55.
Ghosh, P., S. Antani, L.R. Long, and G.R. Thoma. "Review of Medical Image Retrieval Systems and Future Directions." In International Symposium on Computer-Based Medical Systems (CBMS), 1–6.
Kurtz, Camille, Adrien Depeursinge, Sandy Napel, Christopher F. Beaulieu, and Daniel L. Rubin. "On Combining Image-Based and Ontological Semantic Dissimilarities for Medical Image Retrieval Applications." Medical Image Analysis 18, no. 7 (October 2014): 1082–.
Siddiquie, B., R.S. Feris, and L.S. Davis. "Image Ranking and Retrieval Based on Multi-Attribute Queries." In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), 801–8.
Zhang, Hanwang, Zheng-Jun Zha, Yang Yang, Shuicheng Yan, Yue Gao, and Tat-Seng Chua. "Attribute-Augmented Semantic Hierarchy: Towards a Unified Framework for Content-Based Image Retrieval." ACM Transactions on Multimedia Computing, Communications, and Applications 11, no. 1s (October 2014): 21:1–21:21.

