1 Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. Ondrej Chum, James Philbin, Josef Sivic, Michael Isard and Andrew Zisserman, University of Oxford, ICCV 2007.

2 Outline: Image description; Scaling up visual vocabularies; Query expansion to improve recall; Conclusion.

3 When do (images of) objects match? Two requirements: "patches" (parts) correspond, and the configuration (spatial layout) corresponds. Can we use retrieval mechanisms from text retrieval? We need a visual analogy of a textual word. Text retrieval succeeds because it is efficient, scalable, and offers high precision.

4 Feature matching algorithm. Detectors: Moravec detector, Harris corner detector, SIFT detector. Descriptor: SIFT descriptor.

5 Image description: SIFT (Scale-Invariant Feature Transform). SIFT detector: scale-space extrema detection, keypoint localization. SIFT descriptor: orientation assignment, keypoint descriptor.

6 SIFT detector. 1. Detection of scale-space extrema: improves on the Harris corner detector by adding scale invariance, using a DoG (Difference-of-Gaussian) filter to build the scale space.

7 SIFT detector. σ doubles for the next octave; within an octave, adjacent scales differ by a factor k = 2^(1/s), where s is the number of intervals per octave.

8 SIFT detector. Keypoint localization (local extrema): a sample point X is selected if it is larger or smaller than all 26 neighbors (8 in its own scale, 9 in the scale above, and 9 in the scale below).
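
A minimal sketch of this extremum test, assuming the DoG pyramid is already built as a 3-D NumPy array indexed (scale, row, column); only strict extrema over all 26 neighbours are kept:

```python
import numpy as np

def dog_extrema(dog):
    # dog: DoG stack of shape (scales, H, W).
    # A point is kept if it is strictly larger (or strictly smaller)
    # than all 26 neighbours: 8 in its own scale, 9 above, 9 below.
    keypoints = []
    s, h, w = dog.shape
    for i in range(1, s - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                cube = dog[i - 1:i + 2, y - 1:y + 2, x - 1:x + 2]
                v = dog[i, y, x]
                # strict extremum: the centre value occurs exactly once in the cube
                if (v == cube.max() or v == cube.min()) and (cube == v).sum() == 1:
                    keypoints.append((i, y, x))
    return keypoints
```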

9 SIFT detector. 2. Accurate keypoint localization: reject points with low contrast (flat regions); fit a 3D quadratic function to locate the sub-pixel (and sub-scale) extremum.

10 SIFT detector. Throw out low-contrast points: discard a keypoint if |D| < 0.03 at the interpolated extremum.

11 SIFT detector. Reject points poorly localized along an edge. Background, the Harris detector: the change of intensity for the shift [u,v] is E(u,v) = Σ_{x,y} w(x,y) [I(x+u, y+v) − I(x,y)]², where I(x,y) is the intensity, I(x+u, y+v) the shifted intensity, and w(x,y) the window function, either a Gaussian or 1 inside the window and 0 outside.

12 Harris Detector: Mathematics. For small shifts [u,v] we have the bilinear approximation E(u,v) ≈ [u v] M [u v]ᵀ, where M is a 2×2 matrix computed from image derivatives: M = Σ_{x,y} w(x,y) [[Ix², Ix Iy], [Ix Iy, Iy²]].

13 Harris Detector. λ1, λ2 are the eigenvalues of M. Compute the response of the detector at each pixel: R = det(M) − k (trace M)², where k is an empirical constant (≈ 0.04–0.06).
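
As a hedged illustration (not the authors' code), this response can be computed densely from Gaussian-weighted derivative products; the constant k and the smoothing scale below are the usual empirical choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(img, sigma=1.5, k=0.04):
    # Image derivatives
    ix = sobel(img.astype(float), axis=1)
    iy = sobel(img.astype(float), axis=0)
    # Gaussian-weighted entries of the second-moment matrix M
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    # R = det(M) - k * trace(M)^2, computed at every pixel
    return ixx * iyy - ixy ** 2 - k * (ixx + iyy) ** 2
```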

14 Hessian detector. Let H be the 2×2 Hessian matrix of D at the keypoint. Keep the points with Tr(H)² / Det(H) < (r+1)²/r, using r = 10; this rejects keypoints poorly localized along an edge.

15 SIFT descriptor. 3. Orientation assignment: to make the feature invariant to rotation. For a keypoint, L is the Gaussian-smoothed image with the closest scale. From the gradient (Lx, Ly), compute the magnitude m = sqrt(Lx² + Ly²) and orientation θ = atan2(Ly, Lx), and accumulate them into an orientation histogram (36 bins).
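
A sketch of the 36-bin orientation histogram for one keypoint, assuming L is the Gaussian-smoothed image at the keypoint's scale; the Gaussian weighting of samples used in the full method is omitted for brevity:

```python
import numpy as np

def orientation_histogram(L, y, x, radius=8, nbins=36):
    # Histogram of gradient orientations around (y, x), weighted by
    # gradient magnitude; the dominant peak gives the keypoint orientation.
    hist = np.zeros(nbins)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if not (0 < yy < L.shape[0] - 1 and 0 < xx < L.shape[1] - 1):
                continue
            gx = L[yy, xx + 1] - L[yy, xx - 1]   # Lx
            gy = L[yy + 1, xx] - L[yy - 1, xx]   # Ly
            m = np.hypot(gx, gy)                 # magnitude
            theta = np.arctan2(gy, gx) % (2 * np.pi)
            hist[int(theta / (2 * np.pi) * nbins) % nbins] += m
    return hist
```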

18 SIFT descriptor. 4. Local image descriptor: thresholded image gradients are sampled over a 16×16 array of locations in scale space. Create an array of orientation histograms: 8 orientations × a 4×4 histogram array = a 128-D descriptor.

19 Bag of visual words. Quantize the visual descriptors to index the image. The representation is a sparse histogram of visual words for each image. The similarity measure is the L2 distance between tf-idf weighted histograms.
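
A minimal sketch of tf-idf weighting and similarity over sparse visual-word histograms, with a toy three-image corpus; all names are illustrative. Ranking by the dot product of L2-normalized vectors is equivalent to ranking by their L2 distance, since ||q − d||² = 2 − 2 q·d:

```python
import math
from collections import Counter

def tfidf_normalize(hist, idf):
    # tf-idf weight a sparse word-count histogram, then L2-normalize it
    total = sum(hist.values())
    vec = {w: (c / total) * idf[w] for w, c in hist.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}

def score(q, d):
    # Dot product of two sparse L2-normalized vectors
    return sum(v * d.get(w, 0.0) for w, v in q.items())

# toy corpus of three images given as visual-word counts
docs = [Counter([1, 1, 2]), Counter([2, 3]), Counter([1, 3, 3])]
idf = {w: math.log(len(docs) / sum(1 for d in docs if w in d))
       for w in {1, 2, 3}}
q = tfidf_normalize(Counter([1, 2]), idf)
ranked = sorted(docs, key=lambda d: -score(q, tfidf_normalize(d, idf)))
```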

20 Datasets

21 Train the dictionary. K-means is far too slow for our needs: D ≈ 128, N ≈ 20M+, K ≈ 1M. Approximate k-means (AKM): reduce the number of candidate nearest cluster heads via approximate nearest neighbour search, using multiple randomized k-d trees; points nearby in the space can be found by backtracking around the tree some small number of steps.

22 AKM. Building a k-d tree: point list = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]; sorted along the splitting dimension (here x): [(2,3), (4,7), (5,4), (7,2), (8,1), (9,6)].

23 AKM. Multiple randomized trees increase the chances of finding nearby points. Original k-means complexity: O(NK) per iteration; approximate k-means: O(N log K). This means we can scale to very large K.
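
A sketch of one AKM iteration, using SciPy's cKDTree for the assignment step. Note cKDTree's query is exact; the paper instead searches a forest of randomized k-d trees with a limited number of backtracking steps, trading a little accuracy for speed:

```python
import numpy as np
from scipy.spatial import cKDTree

def akm_step(points, centers):
    # Assign each point to its nearest centre via a k-d tree, then
    # recompute the centres: ~O(N log K) instead of O(N K).
    _, assign = cKDTree(centers).query(points)
    new_centers = centers.copy()
    for k in range(len(centers)):
        members = points[assign == k]
        if len(members):
            new_centers[k] = members.mean(axis=0)
    return new_centers

rng = np.random.default_rng(0)
pts = rng.normal(size=(10000, 128)).astype(np.float32)   # SIFT-like data
centers = pts[rng.choice(len(pts), 100, replace=False)]  # random init
for _ in range(10):
    centers = akm_step(pts, centers)
```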

24 Matching a query region. Stage 1: generate a short list of possible frames using the bag-of-visual-words representation. Accumulate all visual words within the query region; use the inverted index (like a book's index) to find other frames containing them; compute the similarity for those frames; output a tf-idf ranked list of all the frames in the dataset.
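
A sketch of this stage with an inverted index: each visual word maps to a posting list of the frames containing it, so only frames sharing at least one word with the query are ever touched; normalization by document length is omitted for brevity, and all names are illustrative:

```python
from collections import defaultdict, Counter

def build_inverted_index(frames):
    # frames: list of visual-word lists, one list per frame
    index = defaultdict(list)            # word -> [(frame id, term freq)]
    for fid, words in enumerate(frames):
        for w, tf in Counter(words).items():
            index[w].append((fid, tf))
    return index

def short_list(query_words, index, idf):
    # Accumulate tf-idf scores only over frames in the posting lists
    scores = defaultdict(float)
    for w, qtf in Counter(query_words).items():
        for fid, tf in index.get(w, []):
            scores[fid] += (qtf * idf[w]) * (tf * idf[w])
    return sorted(scores.items(), key=lambda item: -item[1])
```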

25 Matching a query region. Stage 2: re-rank the short list on spatial consistency. Discard mismatches, compute a matching score, and accumulate the score over all matches.

26 Beyond Bag of Words We can measure spatial consistency between the query and each result to improve retrieval quality Many spatially consistent matches – correct result Few spatially consistent matches – incorrect result

27 Spatial Verification. It is vital for query expansion that we do not expand using false positives, or use features which occur in the result image but not in the object of interest. Use a hypothesize-and-verify procedure to estimate a homography between query and target; more than 20 inliers = a spatially verified result.

28 Spatial Verification. Usage: re-ranking the top-ranked results. Procedure: 1. Estimate a transformation for each target image. 2. Refine the estimates, reducing the errors due to outliers, with RANSAC (RANdom SAmple Consensus). 3. Re-rank target images: score each target image by the sum of the idf values of the inlier words, and place verified images above unverified images.

29 Estimating spatial correspondences 1. Test each correspondence

30 Estimating spatial correspondences. 2. Compute a (restricted) affine transformation: each region correspondence, together with its position, scale, and shape, determines a restricted affine transformation, so a single correspondence yields a hypothesis.

31 Estimating spatial correspondences. 3. Score by the number of consistent matches; use RANSAC to refit a full affine transformation.
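
A compact RANSAC sketch for the full-affine refit (6 dof, so 3 correspondences per random hypothesis). The paper's faster variant seeds each hypothesis from a single region correspondence as noted above; this generic version omits that:

```python
import numpy as np

def fit_affine(src, dst):
    # Least-squares 2x3 affine A mapping src -> dst; src, dst are (n, 2)
    X = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A.T                                    # shape (2, 3)

def ransac_affine(src, dst, iters=500, thresh=3.0, seed=0):
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        A = fit_affine(src[idx], dst[idx])
        pred = src @ A[:, :2].T + A[:, 2]
        inliers = np.linalg.norm(pred - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    # refit on all inliers of the best hypothesis; score = best.sum()
    return fit_affine(src[best], dst[best]), best
```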

32 Methods for Computing Latent Object Models Query expansion baseline Transitive closure expansion Average query expansion Recursive average query expansion Multiple image resolution expansion

33 Query expansion baseline 1. Find top 5 (unverified) results from original query 2. Average the term-frequency vectors 3. Requery once 4. Append these results

34 Transitive closure expansion 1. Create a priority queue based on number of inliers 2. Get top image in the queue 3. Find region corresponding to original query 4. Use this region to issue a new query 5. Add new verified results to queue 6. Repeat until queue is empty

35 Average query expansion 1. Obtain top (m < 50) verified results of original query 2. Construct new query using average of these results 3. Requery once 4. Append these results
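
A sketch of this procedure over sparse tf-idf vectors (dicts); `search` and `verify` are assumed helper callables standing in for the retrieval and spatial-verification stages, not the paper's API:

```python
def average_vectors(vectors):
    # Component-wise mean of sparse tf-idf vectors represented as dicts
    avg = {}
    for vec in vectors:
        for w, v in vec.items():
            avg[w] = avg.get(w, 0.0) + v / len(vectors)
    return avg

def average_query_expansion(query_vec, search, verify, m=50):
    # 1. keep the top m spatially verified results of the original query
    verified = [r for r in search(query_vec) if verify(query_vec, r)][:m]
    # 2. average them with the original query and requery once
    expanded = average_vectors([query_vec] + verified)
    return search(expanded)   # append these results to the originals
```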

36 Recursive average query expansion. An improvement of the average query expansion method: recursively generate queries from all verified results returned so far; stop once 30 verified images are found, or once no new images can be positively verified.

37 Multiple image resolution expansion. 1. For each verified result of the original query, calculate the relative change in resolution required to project the verified region onto the query region. 2. Place results in 3 bands: (0, 4/5), (2/3, 3/2), (5/4, infinity). 3. Construct an average query for each band. 4. Execute independent queries. 5. Merge the results: verified images from the first query first, then expanded queries in order of number of inliers.
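
A small sketch of the banding step, assuming `scale_change` is the relative resolution change computed from the verifying transformation; the bands deliberately overlap, so one result can seed more than one expanded query:

```python
# (lower, upper) bounds of the three resolution bands
BANDS = [(0.0, 4 / 5), (2 / 3, 3 / 2), (5 / 4, float("inf"))]

def bands_for(scale_change):
    # Indices of the bands a verified result falls into
    return [i for i, (lo, hi) in enumerate(BANDS) if lo < scale_change < hi]
```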

38 Query Expansion

39 Experiment Datasets. Oxford Buildings dataset: ~5,000 high-resolution (1024×768) images, crawled from Flickr using 11 landmarks as queries.

40 Datasets. Ground-truth labels: Good, a nice, clear picture of the landmark; OK, more than 25% of the object visible; Bad, object not present; Junk, less than 25% of the object visible. Flickr1 dataset: ~100k high-resolution images, crawled from Flickr using the 145 most popular tags. Flickr2 dataset: ~1M medium-resolution (500×333) images, crawled from Flickr using the 450 most popular tags.

41 Evaluation Procedure. Compute the Average Precision (AP) score, the area under the precision-recall curve, for each of the 5 queries for a landmark. Precision = RPI / TNIR and Recall = RPI / TNPC, where RPI = retrieved positive images, TNIR = total number of images retrieved, and TNPC = total number of positives in the corpus. Average these to obtain a Mean Average Precision (MAP) for the landmark. Use two databases: D1 = Oxford + Flickr1 datasets (~100k); D2 = D1 + Flickr2 dataset (~1M).
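
A minimal AP computation from a ranked list of binary relevance flags, matching the precision and recall definitions above (a sketch; the benchmark's official evaluation additionally ignores Junk images, which is omitted here):

```python
def average_precision(ranked_relevant, total_positives):
    # ranked_relevant: 0/1 relevance flags in retrieval order.
    # AP approximates the area under the precision-recall curve.
    hits, ap = 0, 0.0
    for i, rel in enumerate(ranked_relevant, start=1):
        if rel:
            hits += 1
            ap += hits / i        # precision at each recall point
    return ap / total_positives

# e.g. average_precision([1, 0, 1, 1, 0], total_positives=4) ≈ 0.604
```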

42 Method and Dataset Comparison ori = original query qeb = query expansion baseline trc = transitive closure expansion avg = average query expansion rec = recursive average query expansion sca = multiple image resolution expansion


46 Conclusion. Have successfully ported methods from text retrieval to the visual domain: visual words enable posting lists for efficient retrieval of specific objects; spatial re-ranking improves precision; query expansion improves recall, without drift. Outstanding problems: include spatial information in the index; universal vocabularies.