CS 2770: Computer Vision Feature Matching and Indexing


CS 2770: Computer Vision Feature Matching and Indexing Prof. Adriana Kovashka University of Pittsburgh February 16, 2017

Plan for Today: matching features, indexing features, visual words, and an application to image retrieval.

Matching local features (Image 1 vs. Image 2): To generate candidate matches, find patches that have the most similar appearance (e.g., the lowest Euclidean distance between descriptors). Simplest approach: compare the query feature to all other features, and take the closest (or the closest k, or all within a thresholded distance) as matches. K. Grauman

Robust matching (Image 1 vs. Image 2): At what Euclidean distance value do we have a good match? To add robustness to matching, consider the ratio: distance of the query to the best match / distance to the second-best match. If this ratio is low, the first match looks good; if it is high, the match could be ambiguous. K. Grauman

Matching SIFT descriptors: nearest neighbor (Euclidean distance), then threshold the ratio of the distance to the 1st nearest vs. the 2nd nearest descriptor. Lowe, IJCV 2004
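A minimal sketch of this matching scheme in Python/NumPy (the descriptor arrays are random stand-ins; the 0.8 threshold follows the value suggested in Lowe's paper):

import numpy as np

def match_descriptors(query_desc, db_desc, ratio_thresh=0.8):
    # Match each query descriptor to its nearest database descriptor,
    # keeping only matches that pass the ratio test.
    matches = []
    for i, q in enumerate(query_desc):
        # Euclidean distance from this query descriptor to all database descriptors
        dists = np.linalg.norm(db_desc - q, axis=1)
        first, second = np.argsort(dists)[:2]
        # Accept only if the best match is clearly better than the runner-up
        if dists[first] / dists[second] < ratio_thresh:
            matches.append((i, first))
    return matches

# Stand-in 128-dimensional SIFT descriptors
query_desc = np.random.rand(50, 128)
db_desc = np.random.rand(1000, 128)
print(len(match_descriptors(query_desc, db_desc)))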

Efficient matching: So far we have discussed matching features across just two images. What if you wanted to match a query feature from one image to features from all frames in a video? Or an image to other images in a giant database? With potentially thousands of features per image, and hundreds to millions of images to search, how do we efficiently find those that are relevant to a new image? Adapted from K. Grauman

Indexing local features: Setup. Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT). (figure: descriptors from database images plotted in the descriptor's feature space) K. Grauman

Indexing local features: Setup. When we see close points in feature space, we have similar descriptors, which indicates similar local content. (figure: a query image's descriptor lands near database descriptors in feature space) K. Grauman

Indexing local features: Inverted file index For text documents, an efficient way to find all pages on which a word occurs is to use an index… We want to find all images in which a feature occurs. To use this idea, we’ll need to map our features to “visual words”. K. Grauman

Visual words: Main idea. Extract some local features from a number of images (e.g., SIFT descriptors; each point in the descriptor space is 128-dimensional). D. Nister, CVPR 2006


Each point is a local descriptor, e.g. SIFT feature vector. D. Nister, CVPR 2006

“Quantize” the space by grouping (clustering) the features. Note: For now, we’ll treat clustering as a black box. D. Nister, CVPR 2006
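The slides treat clustering as a black box; as one concrete (and standard) choice, here is a k-means sketch using scikit-learn. The descriptor matrix is random stand-in data, and the vocabulary size k is an arbitrary illustrative value:

import numpy as np
from sklearn.cluster import KMeans

# Stand-in for SIFT descriptors pooled from many training images (128-D each)
descriptors = np.random.rand(10000, 128)

# Cluster descriptor space; the k cluster centers become the visual words
k = 500  # illustrative vocabulary size; real systems use far larger vocabularies
kmeans = KMeans(n_clusters=k, n_init=1, random_state=0).fit(descriptors)
vocabulary = kmeans.cluster_centers_  # shape (k, 128)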

Visual words. Each patch in the figure is an image region used to compute SIFT; each group of patches belongs to the same “visual word”. Figure from Sivic & Zisserman, ICCV 2003. Adapted from K. Grauman

Visual words for indexing. Map high-dimensional descriptors to tokens/words by quantizing the feature space; each cluster has a center. Determine which word to assign to each new image region by finding the closest cluster center. To compare features: only compare the query feature to others in the same cluster (a speed-up). To compare images: see next slide. Adapted from K. Grauman
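Assigning a new region to a visual word is just a nearest-center lookup; a minimal sketch, assuming the vocabulary array produced by the clustering step above:

import numpy as np

def assign_word(descriptor, vocabulary):
    # Index of the closest cluster center = the visual word id
    dists = np.linalg.norm(vocabulary - descriptor, axis=1)
    return int(np.argmin(dists))

# e.g. word_id = assign_word(some_descriptor, vocabulary)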

Inverted file index. Database images are loaded into the index by mapping each word to the numbers of the images that contain it. K. Grauman

Inverted file index. When will this indexing process give us a gain in efficiency? For a new query image, find which database images share a word with it, and retrieve those images as matches (or inspect only those further). We call this retrieval process instance recognition. Adapted from K. Grauman
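A toy inverted index in Python, assuming each database image has already been reduced to a list of visual word ids (the image and word ids below are made up for illustration):

from collections import defaultdict

def build_inverted_index(db_words):
    # db_words: dict mapping image id -> list of visual word ids in that image
    index = defaultdict(set)
    for image_id, words in db_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def candidate_images(index, query_words):
    # Database images that share at least one word with the query
    candidates = set()
    for w in query_words:
        candidates |= index.get(w, set())
    return candidates

db_words = {0: [3, 91, 7], 1: [91, 15], 2: [8, 20]}
index = build_inverted_index(db_words)
print(candidate_images(index, [91, 5]))  # -> {0, 1}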

How to describe documents with words? Two example documents and the words extracted from them:

“Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.” → sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

“China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.” → China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

ICCV 2005 short course, L. Fei-Fei

Describing images w/ visual words. Summarize the entire image based on its distribution (histogram) of word occurrences. Analogous to the bag-of-words representation commonly used for documents. (figure: feature patches grouped into clusters 1–4) Adapted from K. Grauman

Describing images w/ visual words. (figure: for each visual word, the number of times it appears in the image, plotted as a histogram over feature patches) K. Grauman
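Computing the histogram is a single counting pass over an image's word assignments; a minimal sketch, assuming word ids in the range [0, vocab_size):

import numpy as np

def bag_of_words(word_ids, vocab_size):
    # Histogram of visual word occurrences for one image
    return np.bincount(word_ids, minlength=vocab_size)

hist = bag_of_words([3, 91, 91, 7], vocab_size=500)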

Comparing bags of words. Rank images by the normalized scalar product between their occurrence counts (nearest-neighbor search for similar images), e.g. d_j = [1 8 1 4] vs. q = [5 1 1 0]:

sim(d_j, q) = <d_j, q> / (||d_j|| ||q||)
            = ( Σ_{i=1..V} d_j(i) · q(i) ) / ( sqrt(Σ_{i=1..V} d_j(i)²) · sqrt(Σ_{i=1..V} q(i)²) )

for a vocabulary of V words. K. Grauman
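The same similarity in NumPy, using the example count vectors from the slide:

import numpy as np

def bow_similarity(d, q):
    # Normalized scalar product (cosine similarity) of two word-count vectors
    return np.dot(d, q) / (np.linalg.norm(d) * np.linalg.norm(q))

print(bow_similarity(np.array([1, 8, 1, 4]), np.array([5, 1, 1, 0])))  # ~0.30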

Bags of words: pros and cons
+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ good results in practice
– basic model ignores geometry; must verify afterwards, or encode it via the features
– background and foreground are mixed when the bag covers the whole image
– optimal vocabulary formation remains unclear
Adapted from K. Grauman

Summary: Inverted file index and bag-of-words similarity
Offline: extract features in database images; cluster them to find the words (= cluster centers); build the index
Online (during search): extract words in the query (extract features and map each to the closest cluster center); use the inverted file index to find database images relevant to the query; rank database images by comparing word counts of the query and each database image
Adapted from K. Grauman

One more trick: tf-idf weighting (term frequency – inverse document frequency). Describe the image by the frequency of each word within it, but downweight words that appear often in the database. (This is the standard weighting for text retrieval.) The weight of word i in document d is

t_i = (n_id / n_d) · log(N / n_i)

where n_id = number of occurrences of word i in document d, n_d = number of words in document d (so n_id / n_d is the normalized bag-of-words entry), n_i = number of documents in which word i occurs, and N = total number of documents in the database. Adapted from K. Grauman
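A sketch of this weighting applied to a whole database of count vectors (variable names are mine; the counts matrix is a stand-in):

import numpy as np

def tfidf_weights(counts):
    # counts: (num_docs, vocab_size) matrix of raw word counts
    N = counts.shape[0]                      # total number of documents
    n_d = counts.sum(axis=1, keepdims=True)  # words per document
    n_i = (counts > 0).sum(axis=0)           # documents containing each word
    tf = counts / n_d                        # normalized bag-of-words
    idf = np.log(N / np.maximum(n_i, 1))     # guard against words seen nowhere
    return tf * idf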

Bags of words for content-based image retrieval Slide from Andrew Zisserman Sivic & Zisserman, ICCV 2003

Slide from Andrew Zisserman Sivic & Zisserman, ICCV 2003

Video Google System
1. Collect all words within the query region
2. Use the inverted file index to find relevant frames
3. Compare word counts (bag of words)
4. Spatial verification
(figure: retrieved frames) Sivic & Zisserman, ICCV 2003. Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html K. Grauman

Preview: Spatial Verification. (figure: two query images, each paired with a DB image of high BoW similarity) Both image pairs have many visual words in common. Ondrej Chum

Preview: Spatial Verification. (figure: the same query / DB image pairs) Only some of the matches are mutually consistent. Ondrej Chum

Example Applications: Mobile tourist guide (e.g. the Aachen Cathedral): object/building recognition, self-localization, photo/video augmentation [Quack, Leibe, Van Gool, CIVR’08] B. Leibe

Scoring retrieval quality. Example: database size 10 images; relevant (total): 5 images (e.g. images of the Golden Gate). Given the ordered results for a query:
precision = # returned relevant / # returned
recall = # returned relevant / # total relevant
(plot: precision vs. recall curve) Ondrej Chum
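A quick check of these definitions in code (the ranked result list and relevance labels below are invented for illustration):

def precision_recall(returned, relevant):
    # returned: ordered list of retrieved image ids; relevant: set of relevant ids
    hits = sum(1 for r in returned if r in relevant)
    return hits / len(returned), hits / len(relevant)

relevant = {1, 4, 5, 7, 9}                       # 5 relevant images in a 10-image DB
print(precision_recall([1, 4, 2, 5], relevant))  # -> (0.75, 0.6)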

Indexing and retrieval: Summary
Bag-of-words representation: quantize the feature space to make a discrete set of visual words; index individual words; summarize each image by its distribution of words
Inverted index: pre-compute the index to enable faster search at query time
Recognition of instances: match local features; optionally, perform spatial verification
Adapted from K. Grauman