1
Relevance Propagation for Web Search
Dr. Tie-Yan Liu
Web Search and Mining Group, Microsoft Research Asia
Joint work with Tao Qin, Tsinghua University
2
2006-3-13, DCWC 2006
Outline
- Introduction
- Generic framework for relevance propagation
- Evaluations
  - Effectiveness analysis
  - Complexity analysis
- Conclusions
3
Introduction
- Web search ≠ information retrieval
  - Besides content relevance, various kinds of structure information also play an important role in Web search:
    - Hyperlink graph
    - Local sitemap
    - Webpage layout
4
Introduction
Three ways of using structure information for Web search
- Linear combination of content relevance and importance scores computed from the hyperlink graph
  - β∙Relevance + (1-β)∙PageRank
- Enhancing link analysis with the help of content relevance
  - Query-dependent link graph in HITS
  - Topic-sensitive PageRank
- Propagating content relevance along the Web structure
  - The use of anchor text in search engines
  - Hyperlink-based relevance score propagation (TREC 2003)
  - Sitemap-based feature propagation (TREC 2004)
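The linear combination β∙Relevance + (1-β)∙PageRank can be illustrated with a minimal sketch. The function name, the page identifiers, and the score values below are hypothetical, and both score sets are assumed to be normalized to a comparable range beforehand.

```python
def combine_scores(relevance, pagerank, beta):
    """Return beta * relevance + (1 - beta) * pagerank for each page."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return {
        # Pages without a PageRank entry are treated as having importance 0.
        page: beta * relevance[page] + (1.0 - beta) * pagerank.get(page, 0.0)
        for page in relevance
    }


# Hypothetical, already-normalized scores for three pages.
relevance = {"a": 0.9, "b": 0.4, "c": 0.1}
pagerank = {"a": 0.2, "b": 0.8, "c": 0.5}

combined = combine_scores(relevance, pagerank, beta=0.7)
ranking = sorted(combined, key=combined.get, reverse=True)
```

With β close to 1 the ranking follows content relevance; with β close to 0 it follows link-based importance.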
5
Hyperlink-based Relevance Score Propagation (Zhai et al., TREC 2003)
Assumption
- Hyperlinked pages have correlated content
[Figure: a page with its inlinks and outlinks]
6
Hyperlink-based Relevance Score Propagation (Zhai et al., TREC 2003)
Assumption
- Hyperlinked pages have correlated content
Propagation model
- Weighted inlink model
- Weighted outlink model
- Uniform outlink model
[Equation with three labeled terms: original relevance score, propagation from the inlinks, propagation from the outlinks]
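To make the three terms concrete (original relevance score, propagation from the inlinks, propagation from the outlinks), here is a rough sketch of an iterative score propagation. The exact update rule is not given in this transcript, so the formula in the docstring, the mixing weights alpha, beta and gamma, the uniform averaging over neighbours, and all names are assumptions made for illustration only.

```python
def propagate_scores(scores, outlinks, alpha=0.6, beta=0.2, gamma=0.2, iterations=10):
    """Iteratively mix each page's original score with its neighbours' scores.

    Assumed update (not taken from the slides):
        h_next(p) = alpha * s(p)
                  + beta  * mean(h(q) for q linking to p)
                  + gamma * mean(h(q) for q linked from p)
    Only links between pages present in `scores` (the working set) are used.
    """
    inlinks = {page: [] for page in scores}
    for src, targets in outlinks.items():
        for dst in targets:
            if src in scores and dst in scores:
                inlinks[dst].append(src)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    current = dict(scores)
    for _ in range(iterations):
        # The comprehension reads the previous iteration's values only.
        current = {
            page: alpha * scores[page]
            + beta * mean([current[q] for q in inlinks[page]])
            + gamma * mean([current[q] for q in outlinks.get(page, []) if q in scores])
            for page in scores
        }
    return current


# Hypothetical working set: page "a" is relevant and links to page "b".
propagated = propagate_scores({"a": 1.0, "b": 0.0}, {"a": ["b"]})
```

In this toy example page "b" has no relevance of its own but receives a share of the score of "a" through its inlink. Setting gamma to 0 gives an inlink-only variant, and setting beta to 0 gives an outlink-only variant.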
7
Sitemap-based Feature Propagation (Liu and Qin, TREC 2004)
Assumption
- Child pages are extensions of their parent page
- One should consider the contribution of the child pages when computing the relevance of the parent page to a query
Propagation model
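The idea that child pages contribute to the parent page's features can be sketched as follows. The propagation formula itself is not preserved in this transcript, so the blending rule (a weighted average of the parent's features and the mean of its children's features), the weight alpha, and the example feature vectors are all assumptions for illustration.

```python
def propagate_features(features, children, alpha=0.5):
    """Blend each parent's feature vector with the mean of its children's.

    Assumed rule (not taken from the slides):
        f_new(p) = (1 - alpha) * f(p) + alpha * mean(f(c) for c in children(p))
    Pages without children keep their original features.
    """
    propagated = {}
    for page, own in features.items():
        kids = [features[c] for c in children.get(page, []) if c in features]
        if not kids:
            propagated[page] = list(own)
            continue
        # Average the children's vectors component by component.
        child_mean = [sum(column) / len(kids) for column in zip(*kids)]
        propagated[page] = [
            (1 - alpha) * x + alpha * m for x, m in zip(own, child_mean)
        ]
    return propagated


# Hypothetical per-query-term features (e.g. term frequencies) for a tiny site.
features = {"/": [0.0, 2.0], "/a": [4.0, 0.0], "/b": [2.0, 2.0]}
children = {"/": ["/a", "/b"]}
new_features = propagate_features(features, children)
```

The propagated features would then be fed to the base ranking function, so the relevance score is computed after propagation rather than before it.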
8
Generic Relevance Propagation Framework
- Modification of the sitemap-based feature propagation model
- Reminder of the hyperlink-based propagation model
- A generic framework to cover both hyperlink-based and sitemap-based propagation
9
More Derived Propagation Models
          | Score level                             | Feature level
Hyperlink | Hyperlink-based score propagation model | Hyperlink-based feature propagation model (weighted inlink, weighted outlink, and uniform outlink models)
Sitemap   | Sitemap-based score propagation model   | Sitemap-based feature propagation model
10
Summary: All Models Covered by the Generic Framework
Algorithm                                                             | Abbreviation
Weighted in-link case of hyperlink-based score propagation model     | HS-WI
Weighted out-link case of hyperlink-based score propagation model    | HS-WO
Uniform out-link case of hyperlink-based score propagation model     | HS-UO
Weighted in-link case of hyperlink-based feature propagation model   | HF-WI
Weighted out-link case of hyperlink-based feature propagation model  | HF-WO
Uniform out-link case of hyperlink-based feature propagation model   | HF-UO
Sitemap-based score propagation model                                 | SS
Sitemap-based feature propagation model                               | SF
11
Benchmark Datasets
Corpora
- .GOV: 1M pages
  - Queries: TD 2003, 2004
- MSN: 2M pages
  - Queries: 100 most popular queries from the MSN query log
Base ranking function
- BM2500
12
Experimental Results (1): TREC 2003
13
Experimental Results (2): TREC 2004
14
Experimental Results (3): MSN
15
Conclusions on Effectiveness
- In general, relevance propagation can boost search performance with proper parameter settings.
- The sitemap-based models are more effective than the hyperlink-based models.
  - Hyperlinks ≠ content correlation, while pages in the same sub-site usually cover correlated topics.
- Detailed comparisons
  - The two sitemap-based models have similar performance.
  - Among the hyperlink-based models, the HF-WI model performs best.
16
Online Complexity
w is the size of the working set, q is the number of query terms, l is the average number of inlinks/outlinks, and t is the number of iterations.
- For the SS model, the complexity is O(w).
  - The SS model needs to propagate the relevance score of a page to its parent only once if the propagation runs bottom-up from the leaf nodes.
- For the SF model, the complexity is O(qw).
- For the HS models, the complexity is O(twl).
  - In each of the t iterations, the HS models propagate the relevance score of a page along its in-links or out-links in the subgraph of the working set.
- For the HF models, the complexity is O(tqwl).
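The O(w) bound for the SS model rests on visiting each page once in bottom-up order. A minimal sketch of such a single pass is shown below. The blending rule, the weight alpha, and the example sitemap are assumptions for illustration; only the bottom-up, once-per-page traversal reflects the argument above.

```python
def propagate_bottom_up(scores, children, root, alpha=0.5):
    """Propagate scores from leaves to the root of a sitemap tree in one pass.

    Assumed rule (not taken from the slides):
        s_new(p) = (1 - alpha) * s(p) + alpha * mean(s_new(c) for c in children(p))
    Every page and every parent-child edge is touched once, so the cost is
    linear in the number of pages in the working set.
    """
    # Pre-order traversal: each parent appears before all of its descendants.
    order = []
    stack = [root]
    while stack:
        page = stack.pop()
        order.append(page)
        stack.extend(children.get(page, []))

    result = dict(scores)
    # Reversing the pre-order guarantees children are finished before parents.
    for page in reversed(order):
        kids = children.get(page, [])
        if kids:
            child_mean = sum(result[k] for k in kids) / len(kids)
            result[page] = (1 - alpha) * scores[page] + alpha * child_mean
    return result


# Hypothetical sitemap: "/" has children "/a" and "/b"; "/a" has child "/a/x".
scores = {"/": 0.2, "/a": 0.8, "/b": 0.0, "/a/x": 0.4}
children = {"/": ["/a", "/b"], "/a": ["/a/x"]}
final = propagate_bottom_up(scores, children, root="/")
```

No iteration to convergence is needed here, which is the contrast with the hyperlink-based models, where scores travel around a general graph for t rounds.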
17
Online Complexity
Algorithm | Complexity | Average w | Average l | Average t | Average q | CPU time
HS-WI     | O(twl)     | 6796.5    | 11.0      | 7.4       | -         | 47.9
HS-WO     | O(twl)     | 6796.5    | 11.0      | 6.5       | -         | 36.5
HS-UO     | O(twl)     | 6796.5    | 11.0      | 6.6       | -         | 39.8
HF-WI     | O(tqwl)    | 6796.5    | 11.0      | 9.1       | 1.5       | 54.0
HF-WO     | O(tqwl)    | 6796.5    | 11.0      | 11.1      | 1.5       | 63.3
HF-UO     | O(tqwl)    | 6796.5    | 11.0      | 8.9       | 1.5       | 51.6
SS        | O(w)       | 10000.0   | -         | 1         | -         | 1.9
SF        | O(qw)      | 10000.0   | -         | 1         |           | 38.3
- The sitemap-based models are more efficient than the hyperlink-based models.
- The score-level propagation models are faster than the feature-level models.
18
Offline Complexity
- Score-level propagation is very difficult to implement offline.
  - The score can only be computed online, with respect to the query.
- For feature-level propagation:
  - The time complexity of the SF model for offline implementation is acceptable: 62.2 hours, or 2.6 days, to re-index 8 billion pages.
  - The time complexity of the HF model is beyond tolerance: 1083 hours, or 45 days, to re-index 8 billion pages.
  - The SF model is easy to parallelize, while a parallel implementation of the HF model is non-trivial.
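The hour-to-day conversions quoted above, and the gap between the two feature-level models, can be checked with a few lines of arithmetic. Only the hour figures come from the slide; the variable names are illustrative.

```python
HOURS_PER_DAY = 24

# Offline re-indexing times for 8 billion pages, in hours, as quoted above.
offline_hours = {"SF": 62.2, "HF": 1083.0}

# Convert each figure to days.
offline_days = {model: hours / HOURS_PER_DAY for model, hours in offline_hours.items()}

# How many times longer the hyperlink-based feature model takes.
slowdown = offline_hours["HF"] / offline_hours["SF"]
```

The SF figure comes to about 2.6 days and the HF figure to about 45 days, so the hyperlink-based feature model takes roughly 17 times as long before any parallelization is considered.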
19
Conclusions of this Study
- Generally speaking, relevance propagation can boost the performance of Web information retrieval.
- Sitemap-based propagation models outperform hyperlink-based propagation models in terms of both effectiveness and efficiency. Notably, sitemap-based propagation can be implemented in parallel.
- Score-level propagation and feature-level propagation have similar effectiveness. Although the former is more efficient online, it is not practical for real-world search engines because it cannot be implemented offline.
- Overall, the sitemap-based feature propagation model is the best choice for real search engines.
20
Thanks! tyliu@microsoft.com http://research.microsoft.com/users/tyliu/