PageRank for Product Image Search Yushi Jing, Shumeet Baluja College of Computing, Georgia Institute of Technology Google, Inc. WWW 2008 Refereed Track:

PageRank for Product Image Search
Yushi Jing, Shumeet Baluja
College of Computing, Georgia Institute of Technology; Google, Inc.
WWW 2008, Refereed Track: Rich Media
Summarized and presented by Seungseok Kang, IDS Lab

Copyright © 2008 by CEBT

Outline
- Introduction
- Background and Related Work
- Approach and Algorithm
  - Features Generation and Representation
  - Query Dependent Ranking
- Full Retrieval System
  - Queries with Homogeneous Visual Concepts
  - Queries with Heterogeneous Visual Concepts
- Experimental Results
- Conclusion

Introduction
- Image search has become a popular feature of search engines (Yahoo, MSN, Google)
- Most common image search is text-based
  - It relies on the text of the page in which the image is embedded: body text, anchor text, image name, etc.
  - Text-based search of web pages is a well-studied problem
  - The fundamental task of image analysis is still unsolved
  - The image processing required can be quite expensive
- PageRank for image search
  - Analyze the distribution of visual similarities among the images
  - Find the multiple visual themes, and their relative strengths, in a large set of images

Eiffel Tower vs. McDonald's

Challenge Issues
- Turning the idea of inferring common visual themes into a scalable and effective algorithm
  - Image processing
    - The goal for a query is to find what is common among the images
    - The common features may occur anywhere in the images
    - Local features
  - Utilization of information
    - Simple counting will yield poor results
    - Instead, infer a graph in which images are linked to each other based on their similarity

Background and Related Work
- Object category model
  - Trained from the top search results
  - Re-ranks images based on their fit to the model
  - Does not handle heterogeneous image queries, e.g. “Apple”, “Nemo”, “Jaguar”
- Intuitive graph model
  - Based on content-based image ranking
  - Uses expected user behavior (visual similarities)
  - Treats images as web documents
  - Estimates the likelihood of an image being visited by a user traversing the visual hyperlinks
- Similarity-based graph
  - Used for semi-supervised learning

Contribution
- Introduces a novel, simple algorithm to rank images based on their visual similarities
- Introduces a system to re-rank current Google image search results
  - Similarity scores among images can be derived from a comparison of their local descriptors
- Significantly improves image search results for the queries that are of most interest to a large set of people

Approach and Algorithm
- Preliminaries
  - Eigenvector centrality
    - Defined as the principal eigenvector of a square stochastic adjacency matrix
    - Provides a principled method to combine the “importance” of a vertex with that of its neighbors in ranking
    - PageRank pre-computes a rank vector to estimate the importance of all Web pages
  - Random walk explanation
    - The ranking scores correspond to the likelihood of arriving at each vertex by traversing the graph from a random starting point
    - If a user is viewing an image, other related (similar) images may also be of interest; the user may also jump to some other random image

Approach and Algorithm (cont’d)
- Image rank (IR)
  - S* is the column-normalized version of the symmetric adjacency matrix S, where S u,v measures the visual similarity between images u and v
  - Iterating IR yields the dominant eigenvector of the matrix S*
- Image rank with random walk
  - d is the damping factor (commonly d > 0.8 in practice)
  - Allows a small probability that the random walk goes to some other image that is not connected in the graph
- How, then, can we calculate visual similarities?
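
The update rules themselves are not preserved in this transcript. Assuming they follow the standard PageRank form, they would read IR = S* x IR and, with damping, IR = d x S* x IR + (1 - d) x p, where p is a uniform vector with entries 1/n. The sketch below is a minimal pure-Python power iteration under that assumption; the function name `image_rank`, the default d = 0.85, the convergence tolerance, and the toy similarity matrix are all illustrative choices, not values from the slides.

```python
def image_rank(similarity, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for IR = d * S_star * IR + (1 - d) * p, with p uniform.

    `similarity` is a symmetric n x n list of lists (the matrix S).
    The update rule is an assumption based on standard PageRank.
    """
    n = len(similarity)
    # Column-normalize S so that every column sums to 1 (this is S_star).
    col_sums = [sum(similarity[u][v] for u in range(n)) for v in range(n)]
    s_star = [
        [similarity[u][v] / col_sums[v] if col_sums[v] > 0 else 1.0 / n
         for v in range(n)]
        for u in range(n)
    ]
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new_rank = [
            d * sum(s_star[u][v] * rank[v] for v in range(n)) + (1.0 - d) / n
            for u in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy graph (made-up numbers): images 0, 1 and 2 are mutually similar,
# image 3 is only weakly similar to image 0.
similarity = [
    [0.0, 0.8, 0.7, 0.1],
    [0.8, 0.0, 0.9, 0.0],
    [0.7, 0.9, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
ranks = image_rank(similarity)
for image_id, score in enumerate(ranks):
    print(image_id, round(score, 4))
```

In this toy graph the weakly connected image 3 ends up with the lowest score, which matches the intuition that images near the "center" of the similarity graph rank highest.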

Features generation and representation
- A reliable measure of image similarity is needed
  - Global features are often too restrictive (e.g. “Prius”)
    - Color histograms
    - Shape analysis
  - Local descriptors are useful
    - They contain a richer set of image information
    - They are relatively stable under different transformations
    - Types of local descriptors: Harris corners, Scale Invariant Feature Transform (SIFT), Shape Context, Spin Images

Local Descriptors
- SIFT with Difference of Gaussians (DoG)
  - Extracts image features that are invariant to scale and rotation
  - Simple process
    1. Convert the original image to grayscale
    2. Apply a Gaussian filter
    3. Find the DoG interest points (candidate keypoints)
    4. Prune the candidate keypoints
    5. Derive the descriptor vectors
    6. Compare the descriptor vectors
- The similarity is defined as the number of interest points (keypoints) shared between two images divided by their average number of interest points
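
The similarity definition above can be sketched as follows. The ratio itself (shared keypoints over the average keypoint count) comes from the slide; everything about how keypoints are declared "shared" is an assumption made for illustration. Here `count_shared_keypoints` uses a hypothetical greedy nearest-neighbor match with a distance threshold `max_dist`, and the descriptors are made-up 2-D points rather than real 128-dimensional SIFT vectors.

```python
import math


def count_shared_keypoints(desc_a, desc_b, max_dist=0.5):
    """Greedy one-to-one matching (an illustrative stand-in for SIFT matching).

    Each descriptor in A is matched to its nearest still-unused descriptor
    in B, provided the Euclidean distance is at most max_dist.
    """
    unused = list(range(len(desc_b)))
    shared = 0
    for a in desc_a:
        best, best_dist = None, max_dist
        for j in unused:
            dist = math.dist(a, desc_b[j])
            if dist <= best_dist:
                best, best_dist = j, dist
        if best is not None:
            unused.remove(best)
            shared += 1
    return shared


def keypoint_similarity(desc_a, desc_b, max_dist=0.5):
    """Shared keypoints divided by the average number of keypoints."""
    average = (len(desc_a) + len(desc_b)) / 2.0
    if average == 0:
        return 0.0
    return count_shared_keypoints(desc_a, desc_b, max_dist) / average


# Made-up descriptors: two of the three keypoints in image A have a close
# counterpart in image B.
image_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_b = [(0.1, 0.0), (1.0, 1.2), (9.0, 9.0), (8.0, 8.0)]

print(count_shared_keypoints(image_a, image_b))  # 2
print(keypoint_similarity(image_a, image_b))     # 2 / 3.5
```

Dividing by the average count keeps the score from growing simply because one image has many more keypoints than the other.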

Query Dependent Ranking
- Generating the similarity graph S is computationally infeasible for billions of images
  - The computational cost must be reduced
- Query-dependent ranking
  - A practical way to obtain the initial set of candidates
  - Relies on an existing commercial search engine for the initial grouping of semantically similar images
  - Example: given the query “Eiffel Tower”
    - Extract the top-N results from an existing search engine
    - Build the visual similarity graph on the N images
    - Compute the image rank only on this subset
- How, then, can this approach improve the relevance and diversity of image search results?
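
A quick count shows why restricting the graph to the top-N results matters. Because S is symmetric, a full similarity graph needs one comparison per unordered pair of images, i.e. n(n - 1)/2 comparisons. The snippet below evaluates that for N = 1000 (the per-query result count used in the experiments) and for a hypothetical index of one billion images; the one-billion figure is only an illustrative stand-in for "billions".

```python
def pairwise_comparisons(n_images):
    """Number of unordered image pairs in a full similarity graph."""
    return n_images * (n_images - 1) // 2


top_n_pairs = pairwise_comparisons(1_000)       # per-query subset
full_index_pairs = pairwise_comparisons(10**9)  # hypothetical whole index

print(top_n_pairs)       # 499500
print(full_index_pairs)  # 499999999500000000
```

Roughly half a million descriptor comparisons per query is tractable, whereas about 5 x 10^17 comparisons for a whole index is not.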

A Full Retrieval System
- The goal of image search engines
  - Retrieve image results that are relevant to the query and diverse enough to cover variations of visual or semantic concepts
  - “Without analyzing the content of images, there is no reliable way to actively promote the diversity of the results”
- Queries with homogeneous visual concepts
  - “Mona-Lisa”, “Eiffel Tower”, “Albert Einstein”
  - Handled by identifying the vertices located at the “center” of the weighted similarity graph
- Queries with heterogeneous visual concepts
  - “Jaguar”, “Apple”, “Monet Painting”
  - The approach can identify a relevant and diverse set of images as top-ranking results
  - Simple heuristics can help in analyzing the graph

Homogeneous Concept: Example

Heterogeneous Concept: Example

Experimental Results
- Test set
  - The 2000 most popular product queries on Google (Google product search): “ipod”, “xbox”, “Picasso”, “Fabreze”, ...
  - Extracted the top 1000 search results from Google Image Search
  - Set aside queries for which fewer than 5% of the images had at least 1 connection
  - Concentrated on the approximately 1000 remaining queries
- Challenge issues
  - Quantifying the quality of a set of images is very hard
    - A user's preference for an image is heavily influenced by personal tastes and biases
    - Asking users to compare the quality of sets of images is a difficult and time-consuming task
    - Assessing differences in ranking is error-prone and imprecise
- Two evaluation strategies
  - Minimizing irrelevant images
  - Click study

Experimental Results (cont’d)
- Minimizing irrelevant images
  - Studies a conservative version of the “relevancy” of the ranking results
  - Users were asked: “Which of the images are the least relevant?”
  - Results
    - IR > Google: 762
    - Google > IR: 70
    - Google = IR: 202
- Click study
  - User satisfaction is not purely a function of relevance
    - Users usually click on the images they are interested in
  - An effective way to measure search quality is to analyze the total number of “clicks” each image receives
    - Collected clicks for the top 40 images on 130 common queries
    - The images selected by IR for the top 20 would have received approximately 17.5% more clicks than those in the default ranking
  - Cases illustrated: inflated logos, screenshots of Web pages
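
To put the three relevancy counts in proportion, they can be converted to shares of the total. This assumes the counts are outcomes over one common set of comparisons, which the slide does not state explicitly; the dictionary labels are paraphrases chosen for this sketch.

```python
# Counts as reported on the slide; the labels are paraphrased.
outcomes = {"IR better": 762, "Google better": 70, "tie": 202}

total = sum(outcomes.values())
shares = {name: count / total for name, count in outcomes.items()}

print(total)  # 1034
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# IR better: 73.7%
# Google better: 6.8%
# tie: 19.5%
```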

Conclusion
- Proposed a simple mechanism to bring the advances made in link and network analysis for Web document search into image search
  - Image ranking
- Demonstrated an effective method to infer a graph in which the images can be embedded
  - Visual similarities serve as visual hyperlinks
  - Relies on human knowledge and the intelligence of crowds
- Proposed the ability to customize the similarity function based on the expected distribution of queries
- Future work
  - Determining the performance of the system under adversarial conditions
  - Studying the role of duplicate and near-duplicate images in terms of their potential for biasing the approach and the transition probabilities