Web Search – Summer Term 2006 VI. Web Search - Ranking (c) Wolfgang Hürst, Albert-Ludwigs-University.

Personalized and Topic-Sensitive PageRank

A uniform distribution (1-d) for modelling jumps to random web pages is not realistic (bookmarks, known URLs, ...).

Idea: Use the jump probability for personalization, i.e.
- Instead of the identity matrix: weighting based on personal preferences (e.g. bookmarks)
- Problem: Pre-calculation is impossible (personal data!) and online calculation is too expensive

Another characteristic of PageRank: it is query independent
- Might be critical if a page with high PageRank accidentally gets selected as relevant
- Idea: Create a topic-sensitive PageRank
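The personalization idea above can be sketched as a power iteration in which the uniform jump distribution is replaced by a preference vector. This is only a minimal illustration, not the lecture's implementation: the function name `personalized_pagerank`, the toy link graph, the value d = 0.85, and the handling of pages without outgoing links are all assumptions.

```python
def personalized_pagerank(links, jump, d=0.85, iterations=50):
    """Power iteration for PageRank with a non-uniform jump distribution.

    links: dict mapping each page to the list of pages it links to.
    jump:  dict mapping pages to jump probabilities (must sum to 1).
    d:     probability of following a link; (1 - d) is the jump probability.
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Jump part: (1 - d) is distributed according to the preferences.
        new_rank = {p: (1.0 - d) * jump.get(p, 0.0) for p in pages}
        for p, targets in links.items():
            if targets:
                # Link part: p passes its rank on in equal shares.
                share = d * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page without links behaves like a jump.
                for q in pages:
                    new_rank[q] += d * rank[p] * jump.get(q, 0.0)
        rank = new_rank
    return rank


# Hypothetical four-page web; page "d" has no incoming links.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
uniform = {p: 1.0 / len(links) for p in links}   # classic PageRank
bookmarks = {"d": 1.0}                           # user has bookmarked only "d"

uniform_rank = personalized_pagerank(links, uniform)
personal_rank = personalized_pagerank(links, bookmarks)
```

With the bookmark-based jump vector, page "d" receives the full jump mass and therefore ranks higher than under the uniform distribution. The sketch also makes the stated problem visible: each user's jump vector requires its own full iteration over the graph.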

Topic-Sensitive PageRank

Basic idea:
1. Identify the topic that might be interesting for the user (e.g. via classification of the query, evaluation of context, ...)
2. Use a pre-calculated, topic-sensitive PageRank

Similarities to personalization, but
- Fixed, pre-specified topics (can be pre-calculated!)
- Depending on the actual situation (more flexible)

Topic-specific PageRank rank_jd:
- Normally: Identity matrix (a_ij = 1 or 1/N)
- Now: Topics c_1, ..., c_n, e.g. the 16 top-level categories from the Open Directory Project
- Topic-dependent weighting (1/|T_i|)
- Advantage: Can be calculated in advance

SOURCE: [5]
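The formula for the topic-specific PageRank did not survive in the transcript. A plausible form, following the general construction in [5] and the (1-d) jump convention used earlier, is given below. The symbols B_d (pages linking to document d), N_p (number of outgoing links of page p) and T_j (set of pages listed under topic c_j) are introduced here for illustration. The damping factor is written as \delta to avoid a clash with the document index d, and the topic index is written j throughout, where the slide writes T_i.

```latex
\mathrm{rank}_{jd}
  = \delta \sum_{p \in B_d} \frac{\mathrm{rank}_{jp}}{N_p}
  + (1 - \delta)\, v_{jd},
\qquad
v_{jd} =
\begin{cases}
  \dfrac{1}{|T_j|} & \text{if } d \in T_j, \\[1ex]
  0                & \text{otherwise.}
\end{cases}
```

Under this reading, the jump vector v_j is the only part that depends on the topic, so one rank vector per topic can be computed offline.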

Topic-Sensitive PageRank (cont.)

Question: Which one to select during run time?

Idea: Automatic classification of the topic based on the query q given by the user

Extension: Consider the context q' of query q, e.g.
- surrounding text if the query was entered via highlighting
- based on the history (if available)
- etc.

Calculation (e.g.) using a unigram language model:

SOURCE: [5]
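A unigram language model for this classification step can be sketched as follows: each topic c_j is scored by P(c_j) times the product of P(term | c_j) over the query terms, and the scores are normalized over all topics. The function name `topic_probabilities`, the toy term counts, the uniform prior, and the add-one smoothing are assumptions made for illustration. They are not taken from the slides or from [5].

```python
import math


def topic_probabilities(query_terms, term_counts, priors=None):
    """Estimate P(topic | query) with a unigram model per topic.

    query_terms: list of terms from the query (or its context q').
    term_counts: dict topic -> dict term -> count in that topic's pages.
    priors:      dict topic -> P(topic); uniform if omitted (assumption).
    """
    topics = list(term_counts)
    if priors is None:
        priors = {c: 1.0 / len(topics) for c in topics}
    vocabulary = {t for counts in term_counts.values() for t in counts}

    log_scores = {}
    for c in topics:
        counts = term_counts[c]
        total = sum(counts.values())
        score = math.log(priors[c])
        for term in query_terms:
            # Add-one smoothing so unseen terms do not zero out a topic.
            p_term = (counts.get(term, 0) + 1) / (total + len(vocabulary))
            score += math.log(p_term)
        log_scores[c] = score

    # Normalize in a numerically stable way to obtain probabilities.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


# Hypothetical term statistics for two topics.
term_counts = {
    "sports": {"football": 8, "goal": 5, "jaguar": 1},
    "animals": {"jaguar": 6, "cat": 7, "habitat": 3},
}
probs = topic_probabilities(["jaguar", "habitat"], term_counts)
```

In this toy example the query is assigned mostly to "animals", because both query terms are far more frequent in that topic's counts.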

Topic-Sensitive PageRank (cont.)

Alternative approach: Use probabilities, i.e.
- Weighted summation of all topic-specific PageRanks for one document
- Weights: Depending on the probability of a particular topic being relevant given the query q
- Definition: Query-Sensitive Importance Score s_qd

In practice: Usually just take the three topic-sensitive PageRanks with the highest probability

Disadvantages:
- Fixed set of topics
- Depends on training set

SOURCE: [5]
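The weighted summation described above can be sketched as s_qd = sum over topics j of P(c_j | q) * rank_jd, restricted to the most probable topics. The function name `query_sensitive_score`, the example probabilities and the example rank values are hypothetical. The choice not to renormalize the weights after keeping only the top three topics is also an assumption of this sketch.

```python
def query_sensitive_score(doc, topic_probs, topic_ranks, top_k=3):
    """Combine topic-specific PageRanks into one score for a document.

    topic_probs: dict topic -> P(topic | query).
    topic_ranks: dict topic -> dict document -> topic-specific PageRank.
    top_k:       number of most probable topics to include.
    """
    best = sorted(topic_probs, key=topic_probs.get, reverse=True)[:top_k]
    return sum(topic_probs[c] * topic_ranks[c].get(doc, 0.0) for c in best)


# Hypothetical values for four topics and two documents.
topic_probs = {"sports": 0.1, "animals": 0.6, "cars": 0.25, "arts": 0.05}
topic_ranks = {
    "sports": {"page1": 0.2, "page2": 0.1},
    "animals": {"page1": 0.05, "page2": 0.4},
    "cars": {"page1": 0.3, "page2": 0.1},
    "arts": {"page1": 0.5, "page2": 0.0},
}

score1 = query_sensitive_score("page1", topic_probs, topic_ranks)
score2 = query_sensitive_score("page2", topic_probs, topic_ranks)
```

Here "arts" is dropped as the least probable topic, so the high "arts" rank of page1 does not contribute, and page2 scores higher because it ranks well in the most probable topic.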

References - Indexing

[1] A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, S. Raghavan: "Searching the Web", ACM Transactions on Internet Technology, Vol 1/1, Aug. Chapter 5 (Ranking and Link Analysis)
[2] S. Brin, L. Page: "The Anatomy of a Large-Scale Hypertextual Web Search Engine", WWW 1998. Chapter 2 and
[3] Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins, Wiener: "Graph Structure in the Web", WWW 2000
[4] Page, Brin, Motwani, Winograd: "The PageRank Citation Ranking: Bringing Order to the Web", Stanford Technical Report
[5] Haveliwala: "Topic-Sensitive PageRank", WWW 2002

General Web Search Engine Architecture

[Figure: architecture diagram (cf. [1], Fig. 1). Components: Client, Query Engine, Ranking, Crawl Control, Crawler(s), WWW, Collection Analysis Module, Indexer Module, Page Repository, Indexes (Structure, Utility, Text). Labelled flows: Queries, Results, Usage Feedback.]