PageRank for Product Image Search
Kevin Jing (Google Inc.; GVU, College of Computing, Georgia Institute of Technology), Shumeet Baluja (Google Inc.)
WWW 2008
Shimin Chen, Big Data Reading Group

Motivation
Important part of commercial search engines
Based on the text of the pages from which the images are linked.
–Anchor text
–Quality of the anchor page
–Etc.

Why?
Text-based search is well studied.
General object detection/recognition in images remains an open problem.
Image processing is much more expensive than text processing.
Discussions (by Shimin)

Search Result (Eiffel Tower)

Search Result (d80)

Search Result (McDonalds)

Search Result (coca cola)

Image Search (Integrated Results) Search quality is more important

Contribution
Extending PageRank to image search
Visual-hyperlinks estimated from local feature patches
Most comprehensive experiment to date
–Limited and noisy real-world images
–Large number of user evaluations
–2000 queries

Limitations of prior works: Visual Category Recognition Filters (Fergus et al. ECCV 2004)
–Probabilistic graphical model with hidden layers
»Susceptible to data noise
»Large number of parameters to estimate
»High dimensionality in feature space
»Limited training data
–Limited experiment
»11 hand-selected, hand-labeled queries (bottles, etc.)
–Cannot handle multiple visual concepts
–Computationally expensive

Our observation
Due to the high dimensionality of the feature space, learning feature correlations can be difficult with limited and noisy data.
Estimating image similarities is a slightly easier task.
Visual Image Ranking != Object Category model
–Modeling the relationship among images, instead of the features
–As most users rarely look beyond the first page of results,

Outline
VisualRank
–Robust estimation of image similarities (Visual-Hyperlinks).
–Random walk on visual-hyperlinks to find “visual authority.”
Experiments
–2000 product queries
–150 user evaluation
–Click analysis

Idea
Extract local features of an image
Construct a graph with images as nodes, similarity as edge weights
Use PageRank to generate the ranking
Visual-hyperlinks
discussions

Visual-hyperlinks
Step 1) Generate visual-hyperlinks via robust image similarity estimation
Find similar patches (L2 distance)
Geometric verification (affine transformation)
Interest point selection + descriptor representation
SIFT: 128-dimensional vectors
Similarity = (# similar patches) / (average # patches)
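The similarity formula on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `max_dist` threshold, and the nearest-neighbour matching rule are assumptions, and the geometric verification step is omitted. Descriptors are assumed to be already extracted (for example, 128-dimensional SIFT vectors stored as rows of a NumPy array).

```python
import numpy as np

def count_similar_patches(desc_a, desc_b, max_dist=0.5):
    """Count descriptors in desc_a whose nearest neighbour in desc_b
    lies within max_dist (L2). max_dist is a hypothetical threshold."""
    if len(desc_a) == 0 or len(desc_b) == 0:
        return 0
    # Pairwise L2 distances, shape (len(desc_a), len(desc_b)).
    diffs = desc_a[:, None, :] - desc_b[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return int(np.sum(dists.min(axis=1) <= max_dist))

def patch_similarity(desc_a, desc_b, max_dist=0.5):
    """Similarity = (# similar patches) / (average # patches).
    Geometric (affine) verification is NOT performed in this sketch."""
    avg_patches = (len(desc_a) + len(desc_b)) / 2.0
    if avg_patches == 0:
        return 0.0
    return count_similar_patches(desc_a, desc_b, max_dist) / avg_patches

# Toy data: image A has 4 patches, image B shares 2 of them and has
# 1 unrelated patch.
image_a = np.eye(4, 128)
image_b = np.vstack([image_a[:2], 10.0 * np.ones((1, 128))])
similarity_ab = patch_similarity(image_a, image_b)
```

With the toy data, two patches match and the average patch count is 3.5, so the similarity is 2 / 3.5. A real pipeline would discard matches that fail the affine consistency check before counting.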

Visual-hyperlinks
Step 1) Generate visual-hyperlinks via robust image similarity estimation

Query Dependent Ranking
Too expensive to construct a graph for all images
Construct a graph for images returned from a (text-based) search
In other words, the purpose is to better rank images returned from a text-based search
discussions

Visual-hyperlinks Generated from the top 1000 results of “mona-lisa”

Visual-hyperlinks
Lincoln Memorial
Top 5 images with the highest weighted “neighbors.”

Visual-hyperlinks + PageRank
Intuition
Eigen-centrality
Visual “authority”
Random surfer
Principal eigenvector of weighted similarity matrix
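The random-surfer view above can be sketched as a power iteration over a column-normalized similarity matrix. This is an illustrative sketch under stated assumptions: the function name `visual_rank`, the damping value 0.85, the uniform teleport vector, and the toy matrix are not taken from the slides.

```python
import numpy as np

def visual_rank(similarity, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for rank = damping * S_norm @ rank + (1 - damping) / n,
    where S_norm is the column-normalized similarity matrix."""
    s = np.asarray(similarity, dtype=float).copy()
    n = s.shape[0]
    np.fill_diagonal(s, 0.0)  # ignore self-similarity
    col_sums = s.sum(axis=0)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)
    # Images with no visual-hyperlinks jump uniformly to any image.
    transition = np.where(col_sums > 0, s / safe_sums, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = damping * (transition @ rank) + (1.0 - damping) / n
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break
    return rank

# Toy graph: images 0, 1, 2 are mutually similar; image 3 is an outlier
# that only weakly resembles image 0.
toy_similarity = np.array([
    [0.0, 0.8, 0.8, 0.1],
    [0.8, 0.0, 0.8, 0.0],
    [0.8, 0.8, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
])
toy_rank = visual_rank(toy_similarity)
ranking = np.argsort(-toy_rank)  # best image first
```

In the toy graph the outlier image receives the lowest score, which is the behaviour the slides describe: images that many other results resemble accumulate "visual authority", while isolated images sink.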

Visual-hyperlinks + PageRank
[Figure: results with PageRank vs. without PageRank; annotation: “SPAM!”]

Outline
VisualRank
–Robust estimation of image similarities (Visual-Hyperlinks).
–Random walk on visual-hyperlinks to find “visual authority.”
Experiments
–2000 product queries
–150 user evaluation
–Click analysis

Experiment/Results
Selection of queries
–2000 most popular product search queries
Product items are a popular set of queries
Well suited for the patch-based features we are studying.
153 user evaluation
–Combined both sets of results and asked which images are irrelevant to the query
–User click analysis
Back testing. Lower bound on the improvement
Alternative experiment methods considered
–Mark our own ground-truth data
–Ask users to rank results
–Ask users to compare groups of results

Experiment/Results wii picasso Microsoft zune ipod

Experiment/Results 1) 85% of the irrelevant images are removed. 2) 10% increase in user clicks on the top 20 results.

Mistakes Dell Playstation USB keychain

Click Study
Idea: images clicked after a search are good
Given click stats for the top 40 images of 130 common product queries
Examine: # of clicks on the first 20 images
ImageRank: 17.5% more clicks than default ranking
More results

Conclusion/Future Work
Conclusion
–Robust visual-hyperlinks + graph algorithms are a pragmatic choice for web images
Future work
–How to make local feature matching efficient
–Incorporate more features into the construction of visual-hyperlinks
–Incorporate Google's initial ranking into PageRank
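The back-testing arithmetic of the click study can be sketched as below. The data layout (a dictionary of click counts per image and two ranked lists), the function names, and the toy numbers are assumptions for illustration only; they are not the study's data.

```python
def top_k_clicks(ranking, clicks, k=20):
    """Total recorded clicks on the first k images of a ranking."""
    return sum(clicks.get(image_id, 0) for image_id in ranking[:k])

def relative_click_gain(default_ranking, new_ranking, clicks, k=20):
    """Fractional change in top-k clicks when the same click log is
    replayed against a re-ranked result list."""
    base = top_k_clicks(default_ranking, clicks, k)
    if base == 0:
        raise ValueError("default ranking has no clicks in its top k")
    return (top_k_clicks(new_ranking, clicks, k) - base) / base

# Toy example with k=2: the re-ranking promotes a clicked image ("c")
# above an unclicked one ("b").
toy_clicks = {"a": 10, "b": 0, "c": 5, "d": 5}
default_order = ["a", "b", "c", "d"]
reranked_order = ["a", "c", "d", "b"]
gain = relative_click_gain(default_order, reranked_order, toy_clicks, k=2)
```

Because the clicks were logged under the default ordering, images shown lower on the page were less likely to be clicked at all, which is why the slides describe the measured gain as a lower bound.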
