Web Information retrieval (Web IR)

Web Information retrieval (Web IR)
Handout #12: Combinational Ranking
Ali Mohammad Zareh Bidoki
ECE Department, Yazd University
alizareh@yaduni.ac.ir
Autumn 2011

Ranking Algorithm Problems
- Rich-get-richer effect in connectivity-based algorithms (already popular pages keep gaining rank)
- Low precision (at most 0.30)
- Each ranking algorithm works well only in some situations

Combinational Ranking
- Content + connectivity + ???
- How can we combine these features?
- R = f(query, content, connectivity)
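The slide leaves f unspecified. One of the simplest choices is a weighted sum of a query-dependent content score and a query-independent connectivity score. The sketch below assumes exactly that. `combined_score`, the weight `lam`, and the assumption that both scores are already normalized to [0, 1] are illustrative, not the handout's method.

```python
def combined_score(content: float, connectivity: float, lam: float = 0.7) -> float:
    """Hypothetical f: lam * content + (1 - lam) * connectivity.

    content:      similarity of the page to the query (assumed in [0, 1])
    connectivity: link-based importance, e.g. a PageRank value (assumed in [0, 1])
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * content + (1.0 - lam) * connectivity


# Toy pages: (content score, connectivity score)
pages = {"a": (0.9, 0.1), "b": (0.5, 0.8), "c": (0.2, 0.9)}
ranking = sorted(pages, key=lambda p: combined_score(*pages[p]), reverse=True)
```

With `lam = 0.7` the content-heavy page "a" comes first. With `lam = 0.3` the ordering flips toward the well-linked pages. That sensitivity to a hand-tuned weight is one motivation for learning the combination instead.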

Relevance Propagation Model (by Shakery)
- A hyper-relevance score h(p) is computed for each document p.
- S(p) is the similarity between query q and page p (self-relevance).
- WI and WO are weighting functions for in-link and out-link pages, respectively.

Three Iterative Models
- Weighted In-Link
- Weighted Out-Link
- Uniform Out-Link

Weighted In-Link
This model of user behavior is quite similar to the random surfer model, except that it is query-dependent. The probability that the surfer visits a page is the page's hyper-relevance score.
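The slide's own formula did not survive, so this is a minimal sketch under stated assumptions: a query-biased random surfer. It jumps to page p with probability proportional to S(p). When following a link out of page q, it picks target p with probability proportional to S(p). The visit probability of each page is then its hyper-relevance score. The name `weighted_in_link`, the jump probability `jump`, and the rule for dangling pages are hypothetical choices, not Shakery's exact formulation.

```python
def weighted_in_link(links, sim, jump=0.15, iters=200, tol=1e-12):
    """Query-biased random surfer (hypothetical sketch of Weighted In-Link).

    links: dict page -> list of pages it links to
    sim:   dict page -> S(p) >= 0, similarity of p to the query (not all zero)
    Returns h: dict page -> visit probability (values sum to 1).
    """
    pages = list(sim)
    total = sum(sim.values())
    teleport = {p: sim[p] / total for p in pages}  # jump target chosen ~ S(p)
    h = dict(teleport)
    for _ in range(iters):
        new = {p: jump * teleport[p] for p in pages}
        for q in pages:
            targets = [p for p in links.get(q, []) if p in sim]
            weight = sum(sim[p] for p in targets)
            if weight == 0:
                # Dangling page (or only zero-relevance links): jump instead.
                for p in pages:
                    new[p] += (1 - jump) * h[q] * teleport[p]
            else:
                # Follow an out-link, preferring query-relevant targets, so
                # in-links into relevant pages carry more probability mass.
                for p in targets:
                    new[p] += (1 - jump) * h[q] * sim[p] / weight
        delta = sum(abs(new[p] - h[p]) for p in pages)
        h = new
        if delta < tol:
            break
    return h


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
sim = {"a": 0.2, "b": 0.3, "c": 0.5}
h = weighted_in_link(links, sim)
```

Compared with plain PageRank, the only change is that both the jump distribution and the link-following distribution are biased by S(p). This is what makes the score query-dependent.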

Weighted Out-Link
In this model, we assume that when a user is given a page, they read its content with probability α and follow its outgoing links with probability (1 − α). The pages linked from a page do not all have the same impact on its score: pages whose content is more similar to the query are assumed to contribute more than less similar ones.
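Reading the prose above literally gives the recursion h(p) = α·S(p) + (1 − α)·Σ_{p→c} w(p, c)·h(c). In this sketch, w(p, c) is assumed to be S(c) normalized over p's out-links. This is one plausible choice for the slide's WO, not a confirmed formula. `weighted_out_link` is a hypothetical helper. Because the weights of each page sum to 1, the update is a contraction with factor (1 − α), so fixed-point iteration converges.

```python
def weighted_out_link(links, sim, alpha=0.5, iters=200, tol=1e-12):
    """Hypothetical sketch of the Weighted Out-Link model.

    h(p) = alpha * S(p) + (1 - alpha) * sum_{p -> c} w(p, c) * h(c),
    with the assumed weighting w(p, c) = S(c) / sum_{p -> c'} S(c').
    links: dict page -> list of pages it links to
    sim:   dict page -> S(p) >= 0, similarity of p to the query
    """
    pages = list(sim)
    h = dict(sim)  # start from self-relevance
    for _ in range(iters):
        new = {}
        for p in pages:
            children = [c for c in links.get(p, []) if c in sim]
            weight = sum(sim[c] for c in children)
            # Expected hyper score gained by following one of p's out-links.
            follow = sum(sim[c] / weight * h[c] for c in children) if weight > 0 else 0.0
            new[p] = alpha * sim[p] + (1 - alpha) * follow
        delta = max(abs(new[p] - h[p]) for p in pages)
        h = new
        if delta < tol:
            break
    return h


links = {"hub": ["x", "y"]}
sim = {"hub": 0.1, "x": 0.9, "y": 0.1}
h = weighted_out_link(links, sim)
```

Here the "hub" page has low self-relevance but links to a highly relevant page "x", so its hyper score (about 0.255) ends up well above its own α·S(p) = 0.05. Pages without out-links keep only α·S(p) in this sketch. Treating them differently, for example by reusing S(p) as the follow term, is an equally reasonable design choice.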

Uniform Out-Link
In this special case, the user is assumed to read the content of every page visited and, with probability (1 − α), to also read all the pages linked from it.

Algorithm Implementation
The algorithm is run on a working set, constructed as follows:
1. Find the top 100,000 pages with the highest content similarity to the query.
2. From these 100,000 pages, select a small number (about 200) of the most similar pages as the core set.
3. Expand the core set into the working set by adding pages that are among the 100,000 and that either point to pages in the core set or are pointed to by pages in the core set.
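The three construction steps translate directly into set operations. In this minimal sketch, `build_working_set` is a hypothetical name. The content-similarity scores and the link graph are assumed to be given as plain dictionaries. The defaults mirror the sizes quoted on the slide.

```python
def build_working_set(content_scores, links, top_n=100_000, core_size=200):
    """Hypothetical sketch of the working-set construction.

    content_scores: dict page -> content similarity to the query
    links:          dict page -> iterable of pages it links to
    Returns (core, working) as sets of pages.
    """
    ranked = sorted(content_scores, key=content_scores.get, reverse=True)
    candidates = set(ranked[:top_n])  # step 1: top pages by content similarity
    core = set(ranked[:core_size])    # step 2: most similar pages form the core
    working = set(core)
    # Step 3a: candidate pages that point into the core set.
    for p in candidates:
        if set(links.get(p, ())) & core:
            working.add(p)
    # Step 3b: candidate pages that are pointed to by core pages.
    for p in core:
        working |= set(links.get(p, ())) & candidates
    return core, working


scores = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.4, "e": 0.1}
links = {"c": ["a"], "a": ["d"], "e": ["b"]}
core, working = build_working_set(scores, links, top_n=4, core_size=2)
```

Page "e" links into the core but is dropped, because it is not among the top `top_n` pages. This matches the requirement that expansion stays within the 100,000 candidates.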

Algorithm Properties
- Online??
- Recursive
- Query-dependent
- Experiments on TREC show that the Weighted In-Link model outperforms the other two

Frequency Propagation (by Song)
- Instead of propagating scores, the frequencies of the query terms are propagated
- It can be used online
- Propagation follows the site structure

Propagation Formula
f_t(p) is the frequency of term t in page p; f'_t(p) is the frequency of term t in page p after propagation.
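The formula itself is missing from the transcript. Purely to illustrate the idea, this sketch assumes a one-level form: f'_t(p) = f_t(p) + α · mean_{c ∈ C(p)} f_t(c), where C(p) are p's children in the site tree. Only query terms are propagated. `propagate_frequencies`, `alpha`, and the use of the mean are hypothetical choices, not Song's published formula.

```python
def propagate_frequencies(tf, children, query_terms, alpha=0.5):
    """Hypothetical sketch of site-structure term-frequency propagation.

    Assumed form: f'_t(p) = f_t(p) + alpha * mean_{c in C(p)} f_t(c)
    tf:       dict page -> dict term -> frequency f_t(p)
    children: dict page -> list of child pages in the site tree
    """
    propagated = {}
    for p, counts in tf.items():
        kids = [c for c in children.get(p, []) if c in tf]
        new_counts = {}
        for t in query_terms:  # only query terms are propagated
            own = counts.get(t, 0)
            mean_child = sum(tf[c].get(t, 0) for c in kids) / len(kids) if kids else 0.0
            new_counts[t] = own + alpha * mean_child
        propagated[p] = new_counts
    return propagated


tf = {"home": {"jaguar": 1}, "cars": {"jaguar": 4}, "cats": {"jaguar": 2}}
children = {"home": ["cars", "cats"]}
result = propagate_frequencies(tf, children, ["jaguar"])
```

The propagated frequencies can then be fed to an ordinary term-weighting function, such as a tf-based scorer, in place of the raw counts. This is why the method stays cheap enough to use online.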

Overall Framework for Propagation
- SS is the best
- ST and HT-WI perform similarly

Combinational Ranking Algorithms Based on Learning (Learning to Rank)

Combination Framework
Training set (labels: relevance judgments or click orders):
  q1: {(x11, 4), (x12, 3), …, (x1m, 0)}
  q2: {(x21, 3), (x22, 2), …, (x2m, 1)}
  …
  qn: {(xn1, 4), (xn2, 3), …, (xnm, 2)}
Learning system: training set → ranking model g(x, w)
Test set: (x1, ?), (x2, ?), …
Ranking system: test set → (x1, g(x1, w)), (x2, g(x2, w)), (x3, g(x3, w)), …
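A minimal end-to-end sketch of this framework follows. It assumes a linear model g(x, w) = w · x fitted by stochastic gradient descent on squared error, which is a simple pointwise learner chosen for illustration. The feature vectors (content score, link score), the labels, and the helpers `train_pointwise` and `rank` are all hypothetical.

```python
def g(x, w):
    """Linear ranking model g(x, w) = w . x (assumed model family)."""
    return sum(xi * wi for xi, wi in zip(x, w))


def train_pointwise(training_set, dim, lr=0.05, epochs=1000):
    """Learning system: fit w by SGD on the squared error (g(x, w) - label)^2.

    training_set: dict query -> list of (feature_vector, label)
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for docs in training_set.values():
            for x, label in docs:
                err = g(x, w) - label
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def rank(test_docs, w):
    """Ranking system: score unseen documents with g and sort descending."""
    return sorted(test_docs, key=lambda x: g(x, w), reverse=True)


# Toy features (content score, link score); labels follow 2*content + 1*link.
training_set = {
    "q1": [((1.0, 0.0), 2), ((0.0, 1.0), 1)],
    "q2": [((1.0, 1.0), 3), ((0.0, 0.0), 0)],
}
w = train_pointwise(training_set, dim=2)
print(rank([(0.0, 1.0), (1.0, 1.0), (1.0, 0.0)], w))
# -> [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
```

The learned weights play the role of the hand-tuned combination weight in R = f(query, content, connectivity), but here they are estimated from the labeled training set.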

Three learning categories
- Pointwise
- Pairwise
- Listwise
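The categories differ in what a training instance is:
- Pointwise: one (x, label) pair per instance.
- Pairwise: one preference between two documents of the same query per instance.
- Listwise: a whole ranked list per instance.

The sketch below shows only the pairwise transformation, together with a hinge loss in the style of a ranking SVM. Pairs are formed only within a query, because labels from different queries are not directly comparable. `to_pairs`, `score`, and `pairwise_hinge_loss` are hypothetical helpers.

```python
from itertools import combinations


def to_pairs(docs):
    """docs: list of (feature_vector, label) for ONE query.

    Returns (x_better, x_worse) for every pair with different labels;
    tied pairs carry no preference and are skipped.
    """
    pairs = []
    for (xa, ya), (xb, yb) in combinations(docs, 2):
        if ya > yb:
            pairs.append((xa, xb))
        elif yb > ya:
            pairs.append((xb, xa))
    return pairs


def score(x, w):
    """Linear scoring function w . x (assumed model family)."""
    return sum(a * b for a, b in zip(x, w))


def pairwise_hinge_loss(pairs, w, margin=1.0):
    """Penalize each pair whose better document does not win by `margin`."""
    return sum(max(0.0, margin - (score(hi, w) - score(lo, w))) for hi, lo in pairs)


q1 = [((1, 0), 4), ((0, 1), 3), ((0, 0), 0)]
pairs = to_pairs(q1)
```

A weight vector that orders all three documents correctly with enough margin, such as (3, 1), has zero loss. The all-zero vector pays the full margin on every pair.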