Relevance Propagation for Web Search Dr. Tie-Yan Liu Web Search and Mining Group Microsoft Research Asia Joint Work with Tao Qin, Tsinghua University.

DCWC Outline
- Introduction
- Generic framework for relevance propagation
- Evaluations
  - Effectiveness analysis
  - Complexity analysis
- Conclusions

Introduction
Web Search ≠ Information Retrieval
- Besides content relevance, several kinds of structural information also play an important role in Web search:
  - Hyperlink graph
  - Local sitemap
  - Webpage layout

Introduction
Three ways of using structural information for Web search:
- Linear combination of content relevance and importance scores computed from the hyperlink graph
  - β∙Relevance + (1-β)∙PageRank
- Enhancing link analysis with the help of content relevance
  - Query-dependent link graph in HITS
  - Topic-sensitive PageRank
- Propagating content relevance along the Web structure
  - Use of anchor text in search engines
  - Hyperlink-based relevance score propagation (TREC 2003)
  - Sitemap-based feature propagation (TREC 2004)
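The linear combination in the first approach can be illustrated with a minimal Python sketch. The function name `combine`, the page names, and the score values are hypothetical, and the sketch assumes both scores have already been normalised to a comparable range.

```python
def combine(relevance: float, pagerank: float, beta: float) -> float:
    """Linear combination from the slide: beta*Relevance + (1-beta)*PageRank."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * relevance + (1.0 - beta) * pagerank


# Hypothetical, already-normalised scores per page: (relevance, pagerank).
pages = {"a": (0.9, 0.1), "b": (0.5, 0.8), "c": (0.2, 0.3)}

# Rank pages by the combined score; a larger beta favours content relevance.
ranked_content_heavy = sorted(
    pages, key=lambda p: combine(*pages[p], beta=0.7), reverse=True
)
ranked_link_heavy = sorted(
    pages, key=lambda p: combine(*pages[p], beta=0.3), reverse=True
)
```

With these made-up numbers, page `a` ranks first when β = 0.7 and page `b` ranks first when β = 0.3, which shows how strongly the ordering depends on the choice of β.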

Hyperlink-based Relevance Score Propagation (Zhai et al., TREC 2003)
Assumption
- Hyperlinked pages have correlated content (figure labels: inlinks, outlinks)
Propagation model (terms: original relevance score, propagation from the inlinks, propagation from the outlinks)
- Weighted inlink model
- Weighted outlink model
- Uniform outlink model
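The update formula on this slide did not survive in the transcript. The following Python sketch assumes a three-term iterative update that matches the slide's labels (original score, contribution from inlinks, contribution from outlinks). The parameter names `alpha`, `beta`, `gamma`, the `weight` callback, and the fixed iteration count are all assumptions made for illustration, not the authors' exact formulation.

```python
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]  # page -> pages it links to


def propagate_scores(
    scores: Dict[str, float],
    outlinks: Graph,
    alpha: float,
    beta: float,
    gamma: float,
    weight: Callable[[str, str], float] = lambda src, dst: 1.0,
    iterations: int = 10,
) -> Dict[str, float]:
    """Assumed update: h(p) = alpha*s(p) + beta*sum_in + gamma*sum_out.

    Every link target must also appear as a key in `scores`.
    """
    # Build the reverse adjacency list once.
    inlinks: Graph = {p: [] for p in scores}
    for src, dsts in outlinks.items():
        for dst in dsts:
            inlinks[dst].append(src)

    h = dict(scores)
    for _ in range(iterations):
        # The comprehension reads the previous h and only then rebinds it.
        h = {
            p: alpha * scores[p]
            + beta * sum(weight(q, p) * h[q] for q in inlinks[p])
            + gamma * sum(weight(p, q) * h[q] for q in outlinks.get(p, []))
            for p in scores
        }
    return h


# Two pages, a -> b. Only page a matches the query initially.
toy_scores = {"a": 1.0, "b": 0.0}
toy_links = {"a": ["b"]}

# Inlink-only variant (gamma = 0): b inherits relevance from a.
inlink_result = propagate_scores(toy_scores, toy_links, 0.5, 0.5, 0.0, iterations=1)
# Outlink-only variant (beta = 0): a would inherit from b, which has none.
outlink_result = propagate_scores(toy_scores, toy_links, 0.5, 0.0, 0.5, iterations=1)
```

Under this assumed form, setting `gamma = 0` gives an inlink model, setting `beta = 0` gives an outlink model, and the default constant `weight` corresponds to a uniform variant.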

Sitemap-based Feature Propagation (Liu and Qin, TREC 2004)
Assumption
- Child pages are extensions of their parent page.
- The contribution of the child pages should be considered when computing the relevance of the parent page to a query.
Propagation model
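The propagation formula itself is missing from the transcript. As a rough illustration of the stated assumption, the Python sketch below mixes a parent's query-term frequencies with the average of its children's. The convex mixing rule, the parameter `alpha`, and the choice of term frequency as the feature are assumptions, not the model as published.

```python
from typing import Dict, List

Features = Dict[str, float]  # query term -> term frequency on a page


def propagate_features(
    parent: Features, children: List[Features], alpha: float
) -> Features:
    """Assumed rule: alpha * parent feature + (1 - alpha) * mean child feature."""
    if not children:
        # A leaf page has nothing to absorb.
        return dict(parent)
    # Consider every term seen on the parent or on any child.
    terms = set(parent).union(*children)
    return {
        t: alpha * parent.get(t, 0.0)
        + (1.0 - alpha) * sum(c.get(t, 0.0) for c in children) / len(children)
        for t in terms
    }


# Hypothetical term frequencies for the query "tax form".
parent_page = {"tax": 2.0}
child_pages = [{"tax": 4.0, "form": 2.0}, {"form": 2.0}]
updated = propagate_features(parent_page, child_pages, alpha=0.5)
```

In this toy case the parent gains a non-zero frequency for "form" purely from its children, which is the effect the assumption on the slide describes. The updated features would then be fed to the base ranking function.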

Generic Relevance Propagation Framework
- Modification of the sitemap-based feature propagation model
- Reminder of the hyperlink-based propagation model
- A generic framework that covers both hyperlink-based and sitemap-based propagation

More Derived Propagation Models

            Score level                                        Feature level
Hyperlink   Hyperlink-based score propagation model            Hyperlink-based feature propagation model (derived)
Sitemap     Sitemap-based score propagation model (derived)    Sitemap-based feature propagation model

The derived hyperlink-based feature propagation model has three variants: weighted inlink, weighted outlink, and uniform outlink.

Summary: All Models Covered by the Generic Framework

Abbreviation  Algorithm
HS-WI         Weighted in-link case of the hyperlink-based score propagation model
HS-WO         Weighted out-link case of the hyperlink-based score propagation model
HS-UO         Uniform out-link case of the hyperlink-based score propagation model
HF-WI         Weighted in-link case of the hyperlink-based feature propagation model
HF-WO         Weighted out-link case of the hyperlink-based feature propagation model
HF-UO         Uniform out-link case of the hyperlink-based feature propagation model
SS            Sitemap-based score propagation model
SF            Sitemap-based feature propagation model

Benchmark Datasets
Corpora
- .GOV: 1M pages; queries: TD 2003, 2004
- MSN: 2M pages; queries: the 100 most popular queries from the MSN query log
Base ranking function
- BM2500

Experimental Results (1): TREC 2003

Experimental Results (2): TREC 2004

Experimental Results (3): MSN

Conclusions on Effectiveness
- In general, relevance propagation can boost search performance with proper parameter settings.
- The sitemap-based models are more effective than the hyperlink-based models.
  - Hyperlinks ≠ content correlation, whereas pages in the same sub-site usually discuss correlated topics.
- Detailed comparisons
  - The two sitemap-based models have similar performance.
  - Among the hyperlink-based models, the HF-WI model performs best.

Online Complexity
Notation: w is the size of the working set, q is the number of query terms, l is the average number of inlinks or outlinks, and t is the number of iterations.
- For the SS model, the complexity is O(w).
  - If propagation proceeds bottom-up from the leaf nodes, the SS model needs to propagate the relevance score of a page to its parent only once.
- For the SF model, the complexity is O(qw).
- For the HS models, the complexity is O(twl).
  - In each of the t iterations, the HS models propagate the relevance score of a page along its inlinks or outlinks within the subgraph of the working set.
- For the HF models, the complexity is O(tqwl).
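The O(w) bound for the SS model follows from visiting each page of the working set once, children before parents. The Python sketch below illustrates such a single bottom-up pass over a sitemap tree. The mixing rule, the parameter `alpha`, the recursive use of already-propagated child scores, and the tree representation are assumptions for illustration only.

```python
from typing import Dict, List


def propagate_bottom_up(
    root: str,
    children: Dict[str, List[str]],
    scores: Dict[str, float],
    alpha: float,
) -> Dict[str, float]:
    """Single pass over the sitemap tree; each parent-child edge is used once."""
    # Parent-first visit order; reversing it yields children before parents.
    order: List[str] = []
    stack = [root]
    while stack:
        page = stack.pop()
        order.append(page)
        stack.extend(children.get(page, []))

    out: Dict[str, float] = {}
    for page in reversed(order):
        kids = children.get(page, [])
        if kids:
            # Assumed rule: mix the page's own score with its children's mean.
            child_avg = sum(out[k] for k in kids) / len(kids)
            out[page] = alpha * scores[page] + (1.0 - alpha) * child_avg
        else:
            out[page] = scores[page]
    return out


# Hypothetical four-page site: root -> {a, b}, a -> {c}.
site = {"root": ["a", "b"], "a": ["c"]}
base = {"root": 1.0, "a": 0.0, "b": 1.0, "c": 1.0}
result = propagate_bottom_up("root", site, base, alpha=0.5)
```

The total work is proportional to the number of pages plus the number of parent-child edges, and a tree with w pages has w - 1 such edges, so the pass stays linear in w.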

Online Complexity

Algorithm  Complexity  average w  average l  average t  average q  CPU time
HS-WI      O(twl)
HS-WO      O(twl)
HS-UO      O(twl)
HF-WI      O(tqwl)
HF-WO      O(tqwl)
HF-UO      O(tqwl)
SS         O(w)
SF         O(qw)

- The sitemap-based models are more efficient than the hyperlink-based models.
- The score-level propagation models are faster than the feature-level models.

Offline Complexity
- Score-level propagation is very difficult to implement offline.
  - The score can only be computed online, with respect to the query.
- For feature-level propagation:
  - The time complexity of the SF model for offline implementation is acceptable: 62.2 hours, or 2.6 days, to re-index 8 billion pages.
  - The time complexity of the HF model is out of tolerance: … hours, or 45 days, to re-index 8 billion pages.
  - The SF model is easy to parallelize, while a parallel implementation of the HF model is non-trivial.
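The quoted re-indexing figures can be sanity-checked with a few lines of Python. The sketch uses only the numbers stated on the slide (62.2 hours, 45 days, 8 billion pages); the derived throughput and ratio are simple arithmetic on those figures, not additional measurements.

```python
HOURS_PER_DAY = 24
SECONDS_PER_HOUR = 3600

sf_hours = 62.2        # SF re-index time quoted on the slide
hf_days = 45           # HF re-index time quoted on the slide
pages = 8_000_000_000  # corpus size quoted on the slide

# 62.2 hours expressed in days (the slide rounds this to 2.6).
sf_days = sf_hours / HOURS_PER_DAY

# Implied SF throughput in pages per second.
sf_pages_per_second = pages / (sf_hours * SECONDS_PER_HOUR)

# How many times longer the HF model takes than the SF model.
hf_to_sf_ratio = hf_days / sf_days
```

On these figures the SF model implies a throughput of roughly 36 thousand pages per second, and the HF model takes about 17 times as long.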

Conclusions of this Study
- In general, relevance propagation can boost the performance of Web information retrieval.
- Sitemap-based propagation models outperform hyperlink-based propagation models in both effectiveness and efficiency.
  - Notably, sitemap-based propagation can be implemented in parallel.
- Score-level and feature-level propagation have similar effectiveness.
  - Although score-level propagation is more efficient online, it is not practical for real-world search engines because it cannot be implemented offline.
- Overall, the sitemap-based feature propagation model is the best choice for real search engines.

Thanks!