CANONICAL IMAGE SELECTION FROM THE WEB
ACM International Conference on Image and Video Retrieval, 2007
Yushi Jing, Shumeet Baluja, Henry Rowley


Outline  Introduction  Computation of Image Features  SIFT  Canonical Image Selection  Experiments & Results  Analysis  Conclusions and Future Work

Introduction

 Image search has become a popular feature of web search engines  Most search engines rely on text-based search and use very little image information  This follows from the success of text-based search of web pages, and the difficulty and expense of using image-based signals  Search engines such as Yahoo, MSN, and Google examine the text of the pages from which the images are linked

 Example: searching for "Taipei 101" relies on text-based matching rather than examining visual content.

Search results for "cayman" and "coca-cola" (snapshots from Google).

 Why do searches yield such results?  Difficulty in associating images with keywords  Large variation in image quality  Variation in user-perceived semantic content  Approach: exploit visual similarities among the images  Rather than assuming that every user gets a good image result, the approach relies on the combined preferences of many users.

 Common "visual theme": select the images that best capture the visual themes returned to the user  Content-based image retrieval is an actively explored area  Prior work analyzes the "coherence" of the top results from a traditional image search engine:  G. Park, Y. Baek, and H. Lee. Majority based ranking approach in web image retrieval  R. Fergus, P. Perona, and A. Zisserman. A visual category filter for Google images  This approach is a logical extension of their work

 Global features like color histograms and curvature capture only coarse information and are not distinctive.  Example: given 1000 images from a Google search for "starbucks", a color histogram alone cannot identify the common logo.
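As a small illustration of the point above (not code from the paper), a color histogram discards spatial layout, so two images with the same pixel values in different arrangements look identical to it:

```python
# Sketch: color histograms discard spatial layout, so different images
# can produce identical histograms. Illustrative only; bin count and
# "images" are made up.

def histogram(pixels, bins=4, max_val=256):
    """Count pixel intensities into equal-width bins."""
    counts = [0] * bins
    width = max_val // bins
    for p in pixels:
        counts[min(p // width, bins - 1)] += 1
    return counts

def intersection(h1, h2):
    """Histogram intersection similarity, normalized to [0, 1]."""
    overlap = sum(min(a, b) for a, b in zip(h1, h2))
    return overlap / sum(h1)

# Two "images" with identical pixel values in different spatial orders:
logo      = [0, 0, 255, 255, 0, 0, 255, 255]   # a structured pattern
scrambled = [255, 0, 255, 0, 0, 255, 0, 255]   # same pixels, shuffled

print(intersection(histogram(logo), histogram(scrambled)))  # 1.0
```

The two "images" are maximally similar under the histogram even though their layouts differ, which is why the slides argue for local features instead.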

 Local features are more robust to image deformation, variation, and noise  Prior work did not check whether an image-based system can improve the quality of search results when applied to a large set of queries  This work attempts to find the single most representative image for a popular product using only image features  Experiment: evaluation by human evaluators

 Product searches (e.g., "ipod", "Coca Cola", "polo shirt") were chosen for two reasons:  This is an extremely popular category of searches.  It provides a good set of queries from which to quantitatively evaluate performance.  Examining the single most representative image matters because of the importance and wide applicability of this task: sites from Froogle, NexTag.com, and Shopping.com to Amazon.com show a single image next to a product listing.

Computation of Image Features

Query on “golden gate” or “Starbucks”

 We need the ability to identify similar sub-images  Global features are too restrictive for this task  Use local features, which capture local information content:  Harris corners, Scale-Invariant Feature Transform (SIFT), Shape Context, Spin Images, etc.  K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," demonstrated experimentally that SIFT gives the best matching results

SIFT (Scale-Invariant Feature Transform)  Advantage  Generates highly distinctive features that are invariant to image transformations (translation, rotation, scaling) and robust to illumination variation.  The SIFT algorithm's four main stages:  Scale-space extrema detection  Accurate keypoint localization  Orientation assignment  Keypoint descriptor

Scale-space construction: the image is repeatedly convolved with Gaussians; each octave contains s layers.
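The octave/layer schedule can be sketched as follows, using the standard SIFT convention (Lowe, 2004) rather than anything specific to these slides: within an octave of s layers, sigma grows by a factor of 2^(1/s) per layer, so it doubles once per octave. Parameter values here are illustrative.

```python
# Sketch of the SIFT scale-space schedule: sigma multiplies by 2^(1/s)
# per layer, doubling after s layers (one octave). Standard convention;
# sigma0, s, and octave count below are illustrative defaults.

def sigma_schedule(sigma0=1.6, s=3, octaves=2):
    """Return per-octave lists of Gaussian sigmas for DoG construction."""
    k = 2 ** (1.0 / s)
    schedule = []
    for o in range(octaves):
        base = sigma0 * (2 ** o)       # sigma doubles each octave
        schedule.append([base * (k ** i) for i in range(s + 1)])
    return schedule

for octave in sigma_schedule():
    print([round(x, 2) for x in octave])
```

The last layer of one octave has the same sigma as the first layer of the next, which is what lets the image be downsampled by 2 between octaves.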

Accurate keypoint localization

Canonical Image Selection

Local Coherence-based Image Selection Algorithm  1. Given a text query, retrieve the top 1000 images from Google image search and generate SIFT features for these images.  2. Identify matching features with the Spill Tree algorithm.  3. Identify common regions shared between images by clustering the matched feature points.  4. Construct a similarity graph. If there is more than one cluster, select the best cluster based on its size and the average similarity among its images.  5. From the chosen cluster, select the image with the most, and most strongly weighted, edges.
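Steps 4 and 5 above can be sketched as follows. This is a minimal illustration assuming a precomputed pairwise similarity matrix; the threshold and cluster-scoring details are hypothetical, not the authors' implementation.

```python
# Sketch of steps 4-5: build a similarity graph, pick the best cluster
# (by size and average intra-cluster similarity), then pick the most
# strongly connected image in it. Threshold and scoring are illustrative.

def select_canonical(sim, threshold=0.1):
    """sim[i][j]: similarity between images i and j (symmetric)."""
    n = len(sim)
    # Edges above threshold form the similarity graph.
    adj = {i: {j for j in range(n)
               if j != i and sim[i][j] >= threshold} for i in range(n)}
    # Clusters = connected components of the graph.
    seen, clusters = set(), []
    for i in range(n):
        if i in seen or not adj[i]:
            continue
        stack, comp = [i], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        clusters.append(comp)
    if not clusters:
        return 0   # fall back to the first search-engine result
    # Best cluster: size times average intra-cluster similarity.
    def cluster_score(c):
        pairs = [(i, j) for i in c for j in c if i < j]
        avg = sum(sim[i][j] for i, j in pairs) / len(pairs)
        return len(c) * avg
    best = max(clusters, key=cluster_score)
    # Canonical image: highest total edge weight inside the cluster.
    return max(best, key=lambda i: sum(sim[i][j] for j in best if j != i))

sim = [[0, .8, .7, 0],
       [.8, 0, .6, 0],
       [.7, .6, 0, 0],
       [0, 0, 0, 0]]
print(select_canonical(sim))  # 0: most connected image in the {0,1,2} cluster
```

Image 3 has no edges above the threshold, so it is treated as an outlier; image 0 wins because its total edge weight (0.8 + 0.7) is the largest inside the chosen cluster.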

 The 1000 images are resized to a maximum dimension of 400 pixels  Each resized image contains 300 to 800 SIFT features  Goal: find the images with the most matching features  Finding nearest matches among roughly half a million high-dimensional features can be computationally expensive  Use a spill tree, an approximation to a metric tree  Two features are a potential match when their Euclidian distance is less than a threshold
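The matching criterion can be sketched with a brute-force scan; the paper replaces this scan with a spill-tree approximation to make it tractable at scale. The distance threshold below is hypothetical.

```python
# Sketch of threshold matching: two features match when their Euclidean
# distance is below a threshold. The paper approximates this brute-force
# scan with a spill tree; the threshold here is illustrative.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_features(feats_a, feats_b, threshold=0.5):
    """Return (index_a, index_b) pairs whose descriptors are close."""
    return [(i, j)
            for i, fa in enumerate(feats_a)
            for j, fb in enumerate(feats_b)
            if euclidean(fa, fb) < threshold]

a = [(0.0, 0.0), (1.0, 1.0)]
b = [(0.1, 0.0), (5.0, 5.0)]
print(match_features(a, b))  # [(0, 0)]
```

The brute-force version is O(n*m) per image pair, which is why an approximate structure is needed for half a million 128-dimensional SIFT descriptors.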

Common Object Verification  Similar local features can originate from different objects, so matches are verified by clustering and geometric verification  Group the matched points according to their corresponding image pairs  Hough transform for object verification:  A 4-dimensional histogram stores the "votes" over the pose space (translation, scaling, and rotation)  Finally, select the histogram entry with the most votes as the most consistent interpretation
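The voting step can be sketched as coarse Hough voting: each matched pair votes for a quantized (dx, dy, scale, rotation) pose bin, and the bin with the most votes wins. Bin widths below are hypothetical, not the paper's values.

```python
# Sketch of Hough-style pose verification: each matched keypoint pair
# votes for a quantized pose bin; the bin with the most votes gives the
# most consistent interpretation. Bin widths are illustrative.
from collections import Counter

def pose_votes(matches, t_bin=10.0, s_bin=0.5, r_bin=30.0):
    """matches: list of (dx, dy, log_scale, rotation_deg) per match."""
    votes = Counter()
    for dx, dy, ls, rot in matches:
        key = (round(dx / t_bin), round(dy / t_bin),
               round(ls / s_bin), round(rot / r_bin))
        votes[key] += 1
    return votes.most_common(1)[0]  # (bin, count) with the most votes

matches = [(10, 2, 0.0, 5), (11, 3, 0.0, 4), (12, 1, 0.1, 6), (90, 80, 1.0, 120)]
bin_key, count = pose_votes(matches)
print(count)  # 3 matches agree on one pose
```

Three of the four matches fall in the same pose bin, so they are kept as a consistent object interpretation while the fourth is discarded as a spurious match.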

Image Selection  Similarity score between two images: the number of matching points divided by their total number of interest points  Similarity graph: images as nodes, similarities as weighted edges  Images with no strong connections are outliers and are removed  When a query has multiple themes, the resulting graph usually contains several distinctive clusters of images

 How to select the image?  If the similarity graph has no cluster, select the first image returned by Google as the best image.  Why might there be no cluster?  The object lacks visually distinctive features  The object category is too vague or broad

Experiment & Results

Experiment  Setup  130 product queries  Up to 1000 images extracted per query from Yahoo, MSN, and Google  105 human evaluators  Each evaluator saw 50 randomly selected sets of images, presented in randomly adjusted order  Images resized to a maximum dimension of 130 pixels

 Evaluators were asked: "Which of the following images best describes the query?"  The algorithm fails to find a "common theme" among the images for 53 of the 130 queries  Each presentation position received approximately 24% to 26% of the votes, indicating little position bias

Analysis

 LC (local coherence) selection significantly outperforms Google, Yahoo, and MSN (see Table 3).

 Some images selected by the search engines are relevant and appropriate, but better choices are available.  Example: "Batman Returns" screenshots  The LC algorithm improves image selection by identifying the common "theme" in the initial image set and selecting the images containing the most visually distinctive representation of that theme

 There are three reasons behind this result:  People usually strive to take the best photos they can  Popular images on the web: relevant, good-quality photos tend to be reused repeatedly (e.g., Starbucks)  Images containing a dominant view of the object usually have more matches; this is crucial for selecting not only relevant but also high-quality images (e.g., Mona Lisa)

Conclusions & Future Work

Conclusions  Presented a method for selecting the best image among a group of images returned by a conventional text-based image search engine  The method is computationally expensive  Similarity measurements can only be generated off-line over a list of queries  Methods to improve efficiency to be explored:  Limiting the size of the images  Limiting the number of interest points  Reducing the dimensionality of the local features  Discriminatively selecting the features most related to the query of interest

Future work  Expanding the range of queries  Further domains might require the use of other image features.  Face recognition methods may provide a useful similarity measure when a large portion of the image results contain faces.  For queries whose results form an object category (e.g., "chair"), features typically used for content-based retrieval (such as color distributions) may be more fruitful.  The spanning trees illustrated in Figures 8 and 9 contain a great deal of information to be exploited.  The edges may be usable in the same way the web link structure is used to improve web page ranking.