CANONICAL IMAGE SELECTION FROM THE WEB ACM International Conference on Image and Video Retrieval, 2007 Yushi Jing Shumeet Baluja Henry Rowley.


1 CANONICAL IMAGE SELECTION FROM THE WEB ACM International Conference on Image and Video Retrieval, 2007 Yushi Jing Shumeet Baluja Henry Rowley

2 Outline  Introduction  Computation of Image Feature  SIFT  Canonical Image Selection  Experiments & Results  Analysis  Conclusions and Future Work

3 Introduction

4  Image search has become a popular feature.  Most search engines rely on text-based search alone.  Image search uses very little image information, for two reasons:  The success of text-based search for web pages  The difficulty and expense of using image-based signals  Most search engines (Yahoo, MSN, Google, etc.) examine the text of the pages from which the images are linked.

5  Example: a search for Taipei 101 matches on text rather than examining visual content. Picture from: http://zh-yue.wikipedia.org/wiki/TAIPEI_101 Picture from: http://jerome.anyday.com.tw

6 ↑ Search results for “cayman” snapshot from Google. Search results for “coca-cola” →

7  Why does text-based search yield these results?  Difficulty in associating images with keywords  Large variation in image quality  User-perceived semantic content  Approach: use visual similarities among the images  Rather than relying on any single user's judgment of a good image result, the approach relies on the combined preferences of many users.

8  Common “visual theme”  Select the images that best capture the visual themes returned to the user  Content-based image retrieval is an actively explored area  Prior work analyzes the “coherence” of the top results from a traditional image search engine  G. Park, Y. Baek, and H. Lee. Majority based ranking approach in web image retrieval. 2003  R. Fergus, P. Perona, and A. Zisserman. A visual category filter for google images. 2004  The approach is a logical extension of their work

9  Global features such as color histograms and curvature capture little information and are not distinctive.  Example: given 1000 images from a Google search for “starbucks”, only the color histogram is used.
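To illustrate why a global color histogram carries so little distinctive information, here is a minimal sketch in plain Python. The function names, the 4-bins-per-channel quantization, and the toy pixel values are all assumptions made for illustration; none of them come from the presentation.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Quantize each RGB pixel into a coarse bin and return normalized bin frequencies."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_id: n / total for bin_id, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of the per-bin minimum of two normalized histograms."""
    return sum(min(value, h2.get(bin_id, 0.0)) for bin_id, value in h1.items())

# Two toy "images" as flat pixel lists: same colors, different spatial layout.
green, white = (0, 112, 74), (255, 255, 255)
logo_like = [green, white, white, green]
scrambled = [white, green, green, white]

similarity = histogram_intersection(color_histogram(logo_like),
                                    color_histogram(scrambled))
print(similarity)
```

The two pixel lists have different layouts but identical color proportions, so the histogram comparison cannot tell them apart. Local features address this weakness.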

10  Local features are more robust against image deformation, variations and noise  Prior work does not examine whether an image-based system can improve the quality of search results when applied to a large set of queries.  This work attempts to find the single most representative image for a popular product using only image features  Experiment: human evaluators

11  Product searches (e.g. “ipod”, “Coca Cola”, “polo shirt”, etc.) are chosen for two reasons.  This is an extremely popular category of searches.  It provides a good set of queries from which to quantitatively evaluate our performance.  Examining the single most representative image  Importance and wide applicability of this task: sites from Froogle, NextTag.com and Shopping.com to Amazon.com show a single image next to a product listing.

12 Computation of Image Feature

13 Query on “golden gate” or “Starbucks”

14  The task requires the ability to identify similar sub-images.  Global features are too restrictive for our task.  Use local features: local information content  Harris corners, Scale Invariant Feature Transform (SIFT), Shape Context, Spin Images, etc.  K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. 2005  Demonstrated experimentally that SIFT gives the best matching results.

15 SIFT (Scale Invariant Feature Transform)  Advantage  Its ability to generate highly distinctive features that are invariant to image transformations (translation, rotation, scaling) and robust to illumination variation.  The SIFT algorithm's four main stages:  Scale-space extrema detection  Accurate keypoint localization  Orientation assignment  Keypoint descriptor

16 Scale-space extrema detection  Convolution operation  One octave = s layers
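For reference, the slide's notes on convolution and on s layers per octave appear to refer to the standard difference-of-Gaussians construction from Lowe's SIFT paper. It is given here as background, not as something stated on the slide; the symbols follow Lowe's notation, where I is the input image, G a Gaussian kernel, and s the number of intervals per octave.

```latex
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \, e^{-(x^{2} + y^{2}) / (2\sigma^{2})}

D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma), \qquad k = 2^{1/s}
```

Candidate keypoints are the local extrema of D, found by comparing each sample with its neighbors in the same and the adjacent scales. After each octave (a doubling of σ) the image is downsampled by a factor of 2.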

17 Accurate keypoint localization

18

19 Canonical Image Selection

20 Local Coherence-based Image Selection Algorithm  1. Given a text query, retrieve the top 1000 images from Google image search and generate SIFT features for these images.  2. Identify matching features with the Spill Tree algorithm.  3. Identify common regions shared between images by clustering the matched feature points.  4. Construct a similarity graph. If there is more than one cluster, select the best cluster based on its size and the average similarity among its images.  5. From the chosen cluster, select the image with the most, and most strongly connected, edges.

21  The images (up to 1000) are resized to a maximum dimension of 400 pixels  Each resized image contains 300 to 800 SIFT features  Algorithm: most matching features  Finding nearest matches among roughly half a million high-dimensional features can be computationally expensive  Spill tree: an approximation to a metric tree  If the Euclidean distance between two features is less than a threshold, they are a potential match
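The thresholded matching rule can be sketched as follows. This is a brute-force nearest-neighbor search, not a spill tree, which is an approximate index that avoids the exhaustive inner loop. The function names, the 4-dimensional toy descriptors (real SIFT descriptors have 128 dimensions), and the threshold value are assumptions for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_features(query_descs, pool_descs, threshold):
    """Pair each query descriptor with its nearest pool descriptor,
    keeping the pair only if the distance is below the threshold."""
    matches = []
    for qi, q in enumerate(query_descs):
        best_j, best_d = None, float("inf")
        for pj, p in enumerate(pool_descs):
            d = euclidean(q, p)
            if d < best_d:
                best_j, best_d = pj, d
        if best_j is not None and best_d < threshold:
            matches.append((qi, best_j, best_d))
    return matches

# Toy descriptors for two images (hypothetical values).
image_a = [(0.0, 0.0, 1.0, 0.0), (0.9, 0.1, 0.0, 0.0)]
image_b = [(0.8, 0.2, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]

matches = match_features(image_a, image_b, threshold=0.3)
print(matches)
```

Only the second descriptor of `image_a` has a neighbor in `image_b` closer than the threshold, so one potential match is reported. The exhaustive loop costs time proportional to the product of the two descriptor counts, which is why an approximate index matters at the scale of half a million features.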

22 Common Object Verification  Similar local features can originate from different objects.  Clustering  Geometric verification  Group the matched points according to their corresponding image pairs.  Hough transform for object verification  A 4-dimensional histogram is used to store the “votes” in the pose space (translation, scaling and rotation)  Finally, we select the histogram entry with the most votes as the most consistent interpretation.
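A simplified sketch of the pose-space voting described above: each matched keypoint pair votes for a bin in a 4-dimensional histogram over x-translation, y-translation, scale, and rotation, and the fullest bin wins. The keypoint layout `(x, y, scale, orientation_degrees)`, the bin widths, and the use of raw coordinate differences for translation are assumptions. Lowe's original scheme predicts the object location from each keypoint and votes into neighboring bins as well.

```python
import math
from collections import defaultdict

def pose_bin(kp_a, kp_b, xy_bin=32.0, angle_bin=30.0):
    """Map one matched keypoint pair to a coarse (dx, dy, log-scale, rotation) bin."""
    xa, ya, sa, ta = kp_a
    xb, yb, sb, tb = kp_b
    return (
        math.floor((xb - xa) / xy_bin),                # x-translation bin
        math.floor((yb - ya) / xy_bin),                # y-translation bin
        round(math.log2(sb / sa)),                     # scale bin (factor of 2)
        math.floor(((tb - ta) % 360.0) / angle_bin),   # rotation bin
    )

def most_consistent_pose(matched_pairs):
    """Return the bin with the most votes and the pairs that voted for it."""
    votes = defaultdict(list)
    for kp_a, kp_b in matched_pairs:
        votes[pose_bin(kp_a, kp_b)].append((kp_a, kp_b))
    if not votes:
        return None, []
    best = max(votes, key=lambda b: len(votes[b]))
    return best, votes[best]

# Three pairs agree on roughly the same shift; the last one is an outlier.
pairs = [
    ((10, 10, 2.0, 0), (50, 20, 2.0, 0)),
    ((100, 50, 1.5, 90), (141, 61, 1.5, 92)),
    ((30, 80, 3.0, 45), (69, 88, 3.1, 47)),
    ((60, 60, 2.0, 10), (20, 200, 8.0, 200)),
]
best_bin, inliers = most_consistent_pose(pairs)
print(best_bin, len(inliers))
```

Matches that land outside the winning bin are treated as geometrically inconsistent, which is how similar-looking features from different objects get filtered out.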

23

24 Image Selection  Similarity score between two images  Number of matching points divided by their total number of interest points  Similarity graph  Images as nodes, similarities as weighted edges  Outliers are identified and removed  With multiple themes, the resulting graph usually contains several distinct clusters of images
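The similarity graph can be sketched with plain dictionaries. The slide does not say exactly how "total number of interest points" is computed, so taking the sum over the two images is an assumption here, as are the function names and the toy counts. Outliers are taken to be images left with no edges.

```python
def similarity(num_matches, num_points_a, num_points_b):
    """Matched points divided by the two images' combined interest-point count
    (the choice of denominator is an assumption)."""
    return num_matches / (num_points_a + num_points_b)

def build_graph(interest_points, match_counts):
    """Images become nodes; each pair with at least one verified match gets a weighted edge."""
    graph = {image: {} for image in interest_points}
    for (a, b), count in match_counts.items():
        score = similarity(count, interest_points[a], interest_points[b])
        if score > 0.0:
            graph[a][b] = score
            graph[b][a] = score
    return graph

def drop_outliers(graph):
    """Remove images that share no matched region with any other image."""
    return {image: nbrs for image, nbrs in graph.items() if nbrs}

# Hypothetical interest-point counts and verified match counts.
interest_points = {"a": 400, "b": 600, "c": 500, "d": 300}
match_counts = {("a", "b"): 100, ("b", "c"): 55, ("a", "d"): 0}

graph = build_graph(interest_points, match_counts)
connected = drop_outliers(graph)
print(sorted(connected))
```

Image `d` has no verified matches, so it ends up isolated and is dropped, leaving a three-node graph.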

25

26  How is the image selected?  If the similarity graph does not have a cluster, select the first image returned by Google as the best image.  Why might there be no cluster?  The object lacks visually distinctive features  The object category is too vague or broad
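The selection rule, including the fallback to the search engine's first result, might look like the sketch below. The presentation says the best cluster is chosen by size and average similarity, and the image by its edges, but it does not say how those are combined. Scoring a cluster as size times mean edge weight and ranking images by weighted degree are assumptions, as are all names and the toy graph.

```python
def connected_components(graph):
    """Group nodes of an undirected graph (dict of dicts) into sorted components."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(sorted(component))
    return components

def cluster_score(graph, cluster):
    """Assumed scoring: cluster size multiplied by its mean edge weight."""
    members = set(cluster)
    weights = [w for n in cluster for nbr, w in graph[n].items() if nbr in members]
    return len(cluster) * (sum(weights) / len(weights))

def select_canonical(graph, ranked_images):
    """Pick the most strongly connected image of the best cluster,
    or the search engine's first result when no cluster exists."""
    clusters = [c for c in connected_components(graph) if len(c) > 1]
    if not clusters:
        return ranked_images[0]
    best = max(clusters, key=lambda c: cluster_score(graph, c))
    return max(best, key=lambda image: sum(graph[image].values()))

graph = {
    "a": {"b": 0.1, "c": 0.2}, "b": {"a": 0.1}, "c": {"a": 0.2},
    "d": {"e": 0.05}, "e": {"d": 0.05}, "f": {},
}
chosen = select_canonical(graph, ranked_images=["f", "d", "a", "b", "c", "e"])
fallback = select_canonical({"x": {}, "y": {}}, ranked_images=["y", "x"])
print(chosen, fallback)
```

In the toy graph, the cluster containing `a`, `b`, and `c` outscores the pair `d` and `e`, and `a` has the largest weighted degree. With no edges at all, the first ranked image is returned unchanged.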

27 Experiment & Results

28 Experiment  Environment  130 product queries  Extract images (up to 1000) from Yahoo, MSN, Google  105 human evaluators  50 randomly selected sets of images, with randomly adjusted display order  Images resized to a maximum dimension of 130 pixels

29

30  “Which of the following image best describes”  If it fails to find a “common theme” among the images  53/130  Each position receives approximately 24%~26%

31

32 Analysis

33  LC significantly outperforms Google, Yahoo and MSN (analysis of Table 3).

34

35

36  Some images selected by search engines are relevant and appropriate, but better choices are available.  “Batman returns” screen shots  The LC algorithm is able to improve image selection by identifying the common “theme” in the initial image set and selecting images containing the most visually distinctive representation of that theme

37

38  There are three reasons behind this result  People usually strive to take the best photos they can  Popularity of images on the web: relevant, good-quality photos tend to be used repeatedly (e.g. Starbucks)  Images containing a dominant view of the object usually have more matches. This is crucial in selecting images that are not only relevant but also of high quality (e.g. Mona Lisa)

39 Conclusions & Future Work

40 Conclusions  Presented a method for selecting the best image among a group of images returned by a conventional text-based image search engine  Computationally expensive  Similarity measurements can only be generated off-line over a list of queries.  Methods to explore for improving efficiency:  Limiting the size of the image  Limiting the number of interest points  Reducing the dimensions of local features  Discriminatively selecting the features most related to the query of interest

41 Future work  Expanding the range of queries  Further domains might require the use of other image features.  Face recognition methods may provide a useful similarity measure when a large portion of image results contain faces.  For queries where the results are an object category (e.g. “chair”), features typically used for content-based retrieval (color distributions) may be more fruitful.  The spanning trees illustrated in Figures 8 and 9 contain a great deal of information to be exploited.  The edges may be usable in the same way the web link structure is used to improve web page ranking.

