Web Search – Summer Term 2006 VII. Selected Topics - PageRank (closer look) (c) Wolfgang Hürst, Albert-Ludwigs-University.


1 Web Search – Summer Term 2006 VII. Selected Topics - PageRank (closer look) (c) Wolfgang Hürst, Albert-Ludwigs-University

2 The evolution of search engines [1]
1st generation: Use only "on-page" text data
- Word frequency, language
2nd generation: Use off-page, web-specific data
- Link (or connectivity) analysis
- Click-through data (what results people click on)
- Anchor text (how people refer to a page)
3rd generation: Answer the need behind the query
- Semantic analysis (what is it about?)
- Focus on user need, rather than on query
- Context determination
- Helping the user
- Integration of search and text analysis

3 Ranking based on link analysis (recap)
First web search engines:
1. Search: Term processing to find pages
2. Ranking: Analysis of term distribution (and some other things)
Current web search engines:
1. Search: Term processing to find pages (including anchor text)
2. Ranking: Link analysis (and some other things)
Examples of link-based ranking:
- PageRank
- HITS

4 PageRank (recap)
PageRank measures the quality / importance of a web page using a recursive formula.
- Content and query independent
- Harder to manipulate than (e.g.) backlinks
- Usage for ranking: Higher PageRank -> higher quality -> higher relevance
- In practice: Combined with IR score
- Intuitive interpretation: Random Surfer Model
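The recursive definition can be sketched as a simple fixed-point iteration. The following is a minimal illustration, not the formula from the slide (which is not part of this transcript): it assumes the unnormalized form x_p = (1 - d) + d * sum of x_q / outdeg(q) over all pages q linking to p, with damping factor d = 0.85. The function name `pagerank`, the three-page graph, and the iteration count are made up for the example.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate x_p = (1 - d) + d * sum(x_q / outdeg(q)) over all q linking to p.

    `links` maps each page to the list of pages it links to.
    Assumption: pages without outgoing links simply pass on no rank.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {p: 1.0 - d for p in pages}
        for q, targets in links.items():
            if targets:
                share = d * rank[q] / len(targets)  # rank passed along each link of q
                for p in targets:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C collects rank from both A and B and therefore ends up ahead of A, which in turn ends up ahead of B. The values depend only on the link structure, not on page content or a query.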

5 PageRank: Further comments
Other applications for PageRank exist (e.g. crawling, cf. the respective lecture).
In addition, newer work exists that
- extends the original work by Brin and Page
- takes a closer look at its characteristics (with practical experiments and theoretical evaluations)
In the following: A few selected examples

6 Communities and energy [2]
Community = subgraph $G_I$ of somehow related web pages (e.g. web site, homepage, pages on a particular topic, etc.), where $I$ is the set of all pages of $G_I$ and $|I|$ is their number
Energy $E_I$ of a community = sum of the PageRank of all pages in $I$: $E_I = \sum_{i \in I} x_i$, with $x_i$ denoting the PageRank of page $i$
out(I) = set of all pages in I with links to pages outside of the community
in(I) = set of all pages outside of the community with links pointing to pages in I
dp(I) = set of all pages in I with no outgoing links

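7 Energy of a community
Theorem: If $E_I^{in}$, $E_I^{out}$, and $E_I^{dp}$ are the energies associated with the pages from in(I), out(I), and dp(I), respectively, then:
$$E_I = |I| + E_I^{in} - E_I^{out} - E_I^{dp}$$
and
$$E_I^{in} = \frac{d}{1-d} \sum_{i \in \mathrm{in}(I)} f_i x_i, \quad E_I^{out} = \frac{d}{1-d} \sum_{i \in \mathrm{out}(I)} (1 - f_i) x_i, \quad E_I^{dp} = \frac{d}{1-d} \sum_{i \in \mathrm{dp}(I)} x_i$$
with $x_i$ being the PageRank of page $i$, $f_i$ being the fraction of links pointing from page $i$ to pages in $I$, and $d$ being the damping factor from the PageRank formula
Proof: see [2] (not relevant for the exam)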
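The energy balance can be checked numerically on a small graph. The sketch below is only an illustration under stated assumptions: PageRank is taken in the unnormalized form x_p = (1 - d) + d * sum of x_q / outdeg(q), pages without links pass on no rank, and the five-page graph as well as the names `pagerank` and `energy_balance` are hypothetical.

```python
def pagerank(links, d=0.85, iterations=200):
    """Fixed-point iteration for x_p = (1 - d) + d * sum(x_q / outdeg(q))."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {p: 1.0 - d for p in pages}
        for q, targets in links.items():
            if targets:
                share = d * rank[q] / len(targets)
                for p in targets:
                    new_rank[p] += share
        rank = new_rank
    return rank


def energy_balance(links, community, d=0.85):
    """Return (energy summed directly, energy predicted by the balance)."""
    x = pagerank(links, d)
    scale = d / (1.0 - d)
    e_in = e_out = e_dp = 0.0
    for page in x:
        targets = links.get(page, [])
        if not targets:
            if page in community:
                e_dp += scale * x[page]  # page of dp(I): no outgoing links
            continue
        # f = fraction of this page's links that point into the community
        f = sum(t in community for t in targets) / len(targets)
        if page in community:
            e_out += scale * (1.0 - f) * x[page]  # nonzero only for out(I)
        else:
            e_in += scale * f * x[page]  # nonzero only for in(I)
    direct = sum(x[p] for p in community)
    predicted = len(community) + e_in - e_out - e_dp
    return direct, predicted


# Hypothetical graph: community {a, b, c}; u and v lie outside of it.
links = {
    "a": ["b", "c"],
    "b": ["c", "u"],  # b is in out(I): one of its two links leaves the community
    "c": [],          # c is in dp(I): no outgoing links
    "u": ["a", "v"],  # u is in in(I): one of its two links enters the community
    "v": ["u"],
}
direct, predicted = energy_balance(links, {"a", "b", "c"})
print(round(direct, 6), round(predicted, 6))
```

Both printed numbers should agree up to rounding. The balance relies on the constant term (1 - d) in the PageRank form assumed here; with a differently normalized PageRank the term |I| would have to be rescaled accordingly.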

8 Energy of a community - Examples

9 How your own links can influence the energy of your site Distributing the content over several web pages increases the energy (note: not necessarily the PageRank of a single page!)

10 How your own links can influence the energy of your site Leaks drain energy from the community. This loss can be limited if pages pointing to them have a low PageRank and many links to other pages of the community.

11 How your own links can influence the energy of your site Links to the outside drain energy from the community. This loss can be limited if the respective pages have a low PageRank and many links to other pages of the community.
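The effects described on the last three slides can be reproduced on toy graphs. The sketch below is an illustration only: it assumes the unnormalized PageRank form x_p = (1 - d) + d * sum of x_q / outdeg(q) with d = 0.85, and the graphs `closed`, `split`, and `leaky` as well as the page names are invented for the example.

```python
def pagerank(links, d=0.85, iterations=200):
    """Fixed-point iteration for x_p = (1 - d) + d * sum(x_q / outdeg(q))."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {p: 1.0 - d for p in pages}
        for q, targets in links.items():
            if targets:
                share = d * rank[q] / len(targets)
                for p in targets:
                    new_rank[p] += share
        rank = new_rank
    return rank


def energy(links, community, d=0.85):
    """Energy of a community = sum of the PageRank of its pages."""
    x = pagerank(links, d)
    return sum(x[p] for p in community)


# Three pages linking in a cycle, no links to or from the outside.
closed = {"a": ["b"], "b": ["c"], "c": ["a"]}
# Same site with the content of c distributed over two pages c and c2.
split = {"a": ["b"], "b": ["c"], "c": ["c2"], "c2": ["a"]}
# Same site as `closed`, but a additionally links to an external page.
leaky = {"a": ["b", "ext"], "b": ["c"], "c": ["a"], "ext": []}

e_closed = energy(closed, {"a", "b", "c"})
e_split = energy(split, {"a", "b", "c", "c2"})
e_leaky = energy(leaky, {"a", "b", "c"})
print(round(e_closed, 3), round(e_split, 3), round(e_leaky, 3))
```

In the closed and split sites every page keeps a PageRank of 1, so the energy equals the number of pages and grows from 3 to 4 while no single page gains rank. In the leaky site, page a sends half of its links outside, and the energy of the community drops well below 3.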

12 Newer work related to PageRank
- More efficient calculation
- Storage (and speed) issues
- Updating
- Stability and sensitivity, influence of link structure, influence of different parameters
- Personalization, specialized PageRank
- Spam detection
- Better consideration of the web structure
- Consideration of the continuous change of the web (e.g. other media types, change in structure, ...)

13 Example for the influence of different parameters
PageRank as a function of the damping factor d (see [3])
[Figure: example with labels 0 to 9]
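How the damping factor shapes the result can be explored by recomputing PageRank for several values of d. The sketch below is not the experiment from [3]; it assumes the unnormalized form x_p = (1 - d) + d * sum of x_q / outdeg(q), and the four-page graph and the chosen values of d are hypothetical.

```python
def pagerank(links, d=0.85, iterations=500):
    """Fixed-point iteration for x_p = (1 - d) + d * sum(x_q / outdeg(q))."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {p: 1.0 - d for p in pages}
        for q, targets in links.items():
            if targets:
                share = d * rank[q] / len(targets)
                for p in targets:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical graph: D links into the rest of the graph but has no in-links.
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"], "D": ["C"]}

table = {}
for d in (0.0, 0.5, 0.85, 0.95):
    table[d] = pagerank(links, d)
    row = "  ".join(f"{p}={table[d][p]:.3f}" for p in sorted(table[d]))
    print(f"d={d:.2f}  {row}")
```

With d = 0 the link structure is ignored and every page gets the same value. As d grows, the page without in-links (D) falls to 1 - d while the pages receiving links pull further ahead, so the choice of d affects how strongly the link structure separates the pages.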

14 References
[1] A. Broder: A Taxonomy of Web Search. SIGIR Forum 36(2), 3-10, 2002
[2] Bianchini, Gori, Scarselli: Inside PageRank. ACM Transactions on Internet Technology, February 2005
[3] Boldi, Santini, Vigna: PageRank as a Function of the Damping Factor. WWW 2005 Conference, May 2005

