Published by Corey Pierce. Modified over 9 years ago.
1
CANONICAL IMAGE SELECTION FROM THE WEB
ACM International Conference on Image and Video Retrieval, 2007
Yushi Jing, Shumeet Baluja, Henry Rowley
2
Outline
Introduction
Computation of Image Feature
  SIFT
Canonical Image Selection
Experiments & Results
Analysis
Conclusions and Future Work
3
Introduction
4
Image search has become a popular feature.
Most search engines just use text-based search; image searches use very little image information.
  Success of text-based search of web pages
  Difficulty and expense of using image-based signals
Most search engines, such as Yahoo, MSN, and Google, examine the text of the pages from which the images are linked.
5
Example: Searching for Taipei 101 relies on text rather than on examining visual content.
Picture from: http://zh-yue.wikipedia.org/wiki/TAIPEI_101
Picture from: http://jerome.anyday.com.tw
6
↑ Search results for “cayman” snapshot from Google. Search results for “coca-cola” →
7
Why do search engines yield such results?
Difficulty in associating images with keywords
Large variation in image quality
User-perceived semantic content
Approach: use visual similarities among the images.
Rather than assuming that every user gets a good image result, the approach relies on the combined preferences of many users.
8
Common “visual theme”: find the images that best capture the visual themes returned to the user.
Content-based image retrieval is an actively explored area.
Analyzing the “coherence” of the top results from a traditional image search engine:
  G. Park, Y. Baek, and H. Lee. Majority based ranking approach in web image retrieval. 2003
  R. Fergus, P. Perona, and A. Zisserman. A visual category filter for google images. 2004
The approach is a logical extension of theirs.
9
Global features, such as color histograms and curvature, capture little information and are not distinctive.
Example: Given 1000 images from a Google search for “starbucks”, only the color histogram is used.
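A minimal sketch of why a global color histogram is not distinctive. The helper names, the bin count, and the toy pixel lists are assumptions made for illustration and are not taken from the talk. Two "images" with the same colors in a different spatial layout get identical histograms, so the feature cannot tell them apart.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Normalized coarse RGB histogram of an iterable of (r, g, b) tuples (0-255)."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms; 1.0 means identical distributions."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

# Toy data (hypothetical): same colors, different pixel order.
green, white = (0, 112, 74), (255, 255, 255)
logo_like = [green] * 6 + [white] * 4
rearranged = [white] * 4 + [green] * 6
all_green = [green] * 10

same_layout_score = histogram_intersection(
    color_histogram(logo_like), color_histogram(rearranged))
different_score = histogram_intersection(
    color_histogram(logo_like), color_histogram(all_green))
```

Here `same_layout_score` is maximal even though the pixel arrangement differs, which is the weakness the slide points at.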
10
Local features are more robust against image deformation, variations, and noise.
Earlier studies did not check whether an image-based system can improve the quality of search results when applied to a large set of queries.
This work attempts to find the single most representative image for popular products using only image features.
Experiment: human evaluators
11
Product searches (e.g. “ipod”, “Coca Cola”, “polo shirt”) are used for two reasons:
  This is an extremely popular category of searches.
  It provides a good set of queries from which to quantitatively evaluate our performance.
Examining the single most representative image
Importance and wide applicability of this task: sites from Froogle, NextTag.com, and Shopping.com to Amazon.com show a single image next to a product listing.
12
Computation of Image Feature
13
Query on “golden gate” or “Starbucks”
14
The task requires the ability to identify similar sub-images.
Global features are too restrictive for our task.
Use local features: local information content
  Harris corners, Scale Invariant Feature Transform (SIFT), Shape Context, Spin Images, etc.
K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. 2005
  Demonstrated experimentally that SIFT gives the best matching results.
15
SIFT (Scale Invariant Feature Transform)
Advantage: its ability to generate highly distinctive features that are invariant to image transformations (translation, rotation, scaling) and robust to illumination variation.
The four main stages of the SIFT algorithm:
1. Scale-space extrema detection
2. Accurate keypoint localization
3. Orientation assignment
4. Keypoint descriptor
16
Convolution operation; each octave is divided into s layers
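A small sketch of how one octave of the SIFT scale space is laid out. The defaults `sigma0=1.6` and `s=3` and the rule of `s + 3` blurred images per octave come from Lowe's SIFT paper, not from this slide, and the function name is hypothetical.

```python
def octave_sigmas(sigma0=1.6, s=3):
    """Gaussian blur levels for one octave split into s intervals.

    Adjacent levels differ by k = 2**(1/s), so the blur doubles after s steps.
    s + 3 blurred images are produced so that extrema can be detected on
    s difference-of-Gaussian layers, each compared with its two neighbors.
    """
    k = 2.0 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]

sigmas = octave_sigmas()
dog_layer_count = len(sigmas) - 1  # one DoG image per adjacent pair of blurs
```

Each image in the octave is the input convolved with a Gaussian of the corresponding sigma; the next octave starts from the image whose blur is twice `sigma0`, downsampled by two.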
17
Accurate keypoint localization
19
Canonical Image Selection
20
Local Coherence-based Image Selection Algorithm
1. Given a text query, retrieve the top 1000 images from Google image search and generate SIFT features for these images.
2. Identify matching features with the Spill Tree algorithm.
3. Identify common regions shared between images by clustering the matched feature points.
4. Construct a similarity graph. If there is more than one cluster, select the best cluster based on its size and average similarity among the images.
5. From the chosen cluster, select the image with the most and highly connected edges.
21
The 1000 images are resized to have a maximum dimension of 400 pixels.
Each resized image contains 300 to 800 SIFT features.
Algorithm: find the images with the most matching features.
Finding nearest matches among roughly half a million high-dimensional features can be computationally expensive.
A spill tree, an approximation to the metric tree, is used.
If the Euclidean distance between two features is less than a threshold, they are a potential match.
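The thresholded nearest-neighbor test can be sketched as follows. This is an exact brute-force search used as a stand-in; it is not the spill tree, which approximates the same search far more cheaply. The function name, the toy 3-dimensional descriptors, and the threshold value are assumptions for illustration (SIFT descriptors have 128 dimensions).

```python
import math

def match_features(desc_a, desc_b, max_dist):
    """Return (i, j) pairs where desc_b[j] is the nearest neighbor of desc_a[i]
    and their Euclidean distance is below max_dist."""
    matches = []
    for i, da in enumerate(desc_a):
        best_j, best_d = None, float("inf")
        for j, db in enumerate(desc_b):
            d = math.dist(da, db)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d < max_dist:
            matches.append((i, best_j))
    return matches

# Toy descriptors (hypothetical values).
features_a = [(0.0, 0.0, 1.0), (5.0, 5.0, 5.0)]
features_b = [(0.1, 0.0, 1.0), (9.0, 9.0, 9.0)]
potential_matches = match_features(features_a, features_b, max_dist=0.5)
```

Brute force costs one distance computation per feature pair, which is why an approximate tree structure matters at half a million features.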
22
Common Object Verification
Similar local features can originate from different objects.
  Clustering
  Geometric verification
Group the matched points according to their corresponding image pairs.
Hough transform for object verification
A 4-dimensional histogram is used to store the “votes” in the pose space (translation, scaling, and rotation).
Finally, we select the histogram entry with the most votes as the most consistent interpretation.
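A simplified sketch of the pose-space voting for one image pair. The keypoint layout `(x, y, scale, angle_in_degrees)`, the bin widths, and the single-bin voting are assumptions chosen for illustration; the talk does not give these details.

```python
import math
from collections import Counter

def pose_bin(ka, kb, loc_bin=40.0, rot_bin=30.0):
    """Map one keypoint match to a bin (dx, dy, log2 scale, rotation)."""
    xa, ya, sa, ta = ka
    xb, yb, sb, tb = kb
    scale = sb / sa
    rot = (tb - ta) % 360.0
    r = math.radians(rot)
    # Where keypoint A lands after the implied rotation and scaling.
    xr = scale * (xa * math.cos(r) - ya * math.sin(r))
    yr = scale * (xa * math.sin(r) + ya * math.cos(r))
    dx, dy = xb - xr, yb - yr
    return (math.floor(dx / loc_bin), math.floor(dy / loc_bin),
            round(math.log2(scale)), math.floor(rot / rot_bin))

def most_consistent_pose(matches):
    """Return (bin, votes) for the histogram entry with the most votes, or None."""
    votes = Counter(pose_bin(ka, kb) for ka, kb in matches)
    return votes.most_common(1)[0] if votes else None

# Three matches agree on a shift of (+100, +50); the last one is an outlier.
pair_matches = [
    ((10, 10, 2, 0), (110, 60, 2, 0)),
    ((20, 30, 2, 10), (120, 80, 2, 10)),
    ((50, 5, 1, 90), (150, 55, 1, 90)),
    ((10, 10, 2, 0), (300, 10, 4, 45)),
]
best_bin, vote_count = most_consistent_pose(pair_matches)
```

Matches that fall outside the winning bin can then be discarded as originating from different objects.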
24
Image Selection
Similarity score between two images: the number of matching points divided by their total number of interest points
Similarity graph: images as nodes, similarities as weighted edges
Outliers are identified and removed.
Multiple themes: the resulting graph usually contains several distinctive clusters of images.
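One way to sketch the similarity graph. Reading "total number of interest points" as the sum over both images, treating images without edges as the outliers, and using connected components as clusters are all assumptions; the names and counts are hypothetical.

```python
def similarity(num_matches, n_points_a, n_points_b):
    """Matching points divided by the total interest points of both images."""
    return num_matches / (n_points_a + n_points_b)

def build_graph(n_points, match_counts):
    """n_points: {image: interest point count}; match_counts: {(a, b): matches}.
    Returns {image: {neighbor: similarity}} with edge-less images removed."""
    graph = {img: {} for img in n_points}
    for (a, b), m in match_counts.items():
        if m > 0:
            s = similarity(m, n_points[a], n_points[b])
            graph[a][b] = s
            graph[b][a] = s
    return {img: nbrs for img, nbrs in graph.items() if nbrs}

def find_clusters(graph):
    """Connected components of the graph, each returned as a set of images."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node].keys() - comp)
        seen |= comp
        components.append(comp)
    return components

point_counts = {"a": 400, "b": 600, "c": 500, "d": 300, "e": 700, "f": 350}
verified_matches = {("a", "b"): 50, ("b", "c"): 33, ("d", "e"): 20}
sim_graph = build_graph(point_counts, verified_matches)
image_clusters = find_clusters(sim_graph)
```

In this toy input, image "f" shares no matches with any other image, so it is dropped, and the remaining images form two themes.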
26
How to select the image?
If the similarity graph does not have a cluster, select the first image returned by Google as the best image.
Why might there be no cluster? For example:
  The query lacks visually distinctive features.
  The object category is too vague or broad.
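A sketch of the final choice, including the fallback to the search engine's first result. Scoring a cluster as size times average edge similarity, and scoring an image by the sum of its edge weights inside the cluster, are assumptions; the talk only says the choice depends on cluster size, average similarity, and how well connected an image is.

```python
def select_canonical(ranked_images, graph, clusters):
    """ranked_images: images in search-engine order.
    graph: {image: {neighbor: similarity}}; clusters: list of sets of images."""
    if not clusters:
        return ranked_images[0]  # no common theme found: keep the top result

    def cluster_score(cluster):
        weights = [graph[a][b] for a in cluster for b in graph[a]
                   if b in cluster and a < b]  # count each edge once
        average = sum(weights) / len(weights) if weights else 0.0
        return len(cluster) * average

    best = max(clusters, key=cluster_score)
    # Weighted degree: many edges and strong edges both raise the score.
    return max(best, key=lambda img: sum(
        s for nb, s in graph[img].items() if nb in best))

toy_graph = {
    "a": {"b": 0.05},
    "b": {"a": 0.05, "c": 0.03},
    "c": {"b": 0.03},
    "d": {"e": 0.02},
    "e": {"d": 0.02},
}
toy_clusters = [{"a", "b", "c"}, {"d", "e"}]
search_order = ["d", "a", "b", "c", "e"]

chosen = select_canonical(search_order, toy_graph, toy_clusters)
fallback = select_canonical(search_order, {}, [])
```

With the toy graph the larger, more similar cluster wins and its best-connected member is returned; with no clusters the first image in `search_order` is returned.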
27
Experiment & Results
28
Experiment Environment
130 product queries
Extract images (up to 1000) from Yahoo, MSN, and Google
105 human evaluators
50 randomly selected sets of images, with randomly adjusted order
Images resized to a maximum dimension of 130 pixels
30
Question asked: “Which of the following images best describes”
If it fails to find a “common theme” among the images: 53/130
Each position receives approximately 24% to 26%.
32
Analysis
33
LC significantly outperforms Google, Yahoo and MSN. Analysis table 3
36
Some images selected by search engines are relevant and appropriate, but better choices are available.
Example: “Batman returns” screenshots
The LC algorithm is able to improve image selection by identifying the common “theme” in the initial image set and selecting images containing the most visually distinctive representation of that theme.
38
There are three reasons behind this result:
1. People usually strive to take the best photos they can.
2. Popularity of images on the web: relevant, good-quality photos tend to be used repeatedly. Example: Starbucks
3. Images that contain a dominant view of the object usually have more matches. This is crucial in selecting not only relevant but also high-quality images. Example: Mona Lisa
39
Conclusions & Future Work
40
Conclusions
Presented a method for selecting the best image among a group of images returned by a conventional text-based image search engine
Computationally expensive
  Similarity measurements can only be generated off-line over a list of queries.
Methods to explore for improving efficiency:
  Limiting the size of the images
  Limiting the number of interest points
  Reducing the dimensions of local features
  Using discriminative features that are most related to the query of interest
41
Future work
Expanding the range of queries
Further domains might require the use of other image features.
  Face recognition methods may provide a useful similarity measure when a large portion of image results contain faces.
  For queries where the results are an object category (e.g. “chair”), features typically used for content-based retrieval (color distributions) may be more fruitful.
The spanning trees illustrated in Figures 8 and 9 contain a great deal of information to be exploited.
  The edges may be usable in the same way the web link structure is used to improve web page ranking.