Scaling Personalized Web Search
Authors: Glen Jeh, Jennifer Widom (Stanford University)
Written in: 2003
Cited by: 923 articles
Presented by Sugandha Agrawal

Topics
- How PageRank works
- Personalized PageRank Vector (PPV)
- Algorithms that scale the computation of PPVs effectively
- Experimental results

Brief introduction to PageRank
- When Larry Page and Sergey Brin conceived PageRank, search engines usually ranked pages by keyword density, favoring the pages with the highest density.
- PageRank uses the link structure of the Web to score the importance of a page.
- The notion is recursive: important pages are those linked to by many important pages.
- Plain PageRank does not incorporate user preferences when displaying search results.

Brief introduction to PageRank
- Random surfer model: imagine trillions of surfers browsing the Web. The model gives the expected percentage of surfers looking at page p at any one time.
- The result it converges to is independent of the distribution of starting points.
- It reflects a "democratic" importance, with no preference for any particular pages.
- Hmm… how can we incorporate user preferences?
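The random-surfer description can be made concrete with a few lines of power iteration. This is a minimal sketch, not the authors' code: the four-page graph, the teleport probability c = 0.15, and the function name `pagerank` are our assumptions. Running it from two different starting distributions illustrates the start-independence claimed on the slide.

```python
# Minimal PageRank power iteration under the random-surfer model.
# Assumptions: the toy graph is hypothetical, every page has at least one
# out-link, and c = 0.15 is the probability of teleporting to a random page.

def pagerank(graph, c=0.15, start=None, iters=200):
    """Return {page: score} solving v = (1 - c) * A v + c * u, u uniform."""
    pages = list(graph)
    n = len(pages)
    # Initial distribution of surfers (uniform unless `start` is given).
    v = dict(start) if start is not None else {p: 1.0 / n for p in pages}
    for _ in range(iters):
        nxt = {p: c / n for p in pages}  # teleport: land on a random page
        for p in pages:
            share = (1 - c) * v[p] / len(graph[p])  # follow a random out-link
            for q in graph[p]:
                nxt[q] += share
        v = nxt
    return v


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
uniform_start = pagerank(graph)
skewed_start = pagerank(graph, start={"a": 0.0, "b": 0.0, "c": 0.0, "d": 1.0})
```

Both runs end at the same scores, with page c (three in-links) ranked first, regardless of where the surfers started.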

Personalized PageRank Vector (PPV)
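In the usual formulation (our notation: c is the teleport probability, O(j) the set of out-neighbors of page j, and u a preference vector with nonnegative entries summing to 1), a PPV v is the solution of:

```latex
\mathbf{v} = (1-c)\,A\,\mathbf{v} + c\,\mathbf{u},
\qquad
A_{ij} =
\begin{cases}
  1/|O(j)| & \text{if page } j \text{ links to page } i,\\
  0        & \text{otherwise.}
\end{cases}
```

A uniform u gives ordinary PageRank; concentrating u on the pages a user prefers biases the surfer's teleports toward them, which is what makes the vector personalized.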

Assume every page has at least one out-neighbor!
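The experiments work on a crawl with leaf pages removed, which is one way to satisfy this assumption. A minimal sketch, assuming the graph is a dict mapping each page to its out-links and that every link target also appears as a key. Removing a leaf can turn its predecessors into leaves, so this version (our choice, not necessarily the authors' preprocessing) repeats until none remain:

```python
def remove_leaf_pages(graph):
    """Repeatedly drop pages with no out-links, plus the links pointing to them.

    Assumes every link target is itself a key of `graph`.
    """
    g = {p: set(out) for p, out in graph.items()}
    while True:
        leaves = {p for p, out in g.items() if not out}
        if not leaves:
            return {p: sorted(out) for p, out in g.items()}
        # Deleting leaves may create new leaves, hence the loop.
        g = {p: out - leaves for p, out in g.items() if p not in leaves}


raw = {"a": ["b", "d"], "b": ["a", "c"], "c": ["e"], "d": [], "e": []}
cleaned = remove_leaf_pages(raw)  # {"a": ["b"], "b": ["a"]}
```

Here page c only becomes a leaf after page e is removed, which is why a single pass would not be enough.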

How to compute PPVs efficiently
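The hub-vector approach rests on linearity: PPVs are linear in the preference vector, so if a hub vector r_h is precomputed for every page h in a hub set H, the PPV for any preference vector over H is the weighted sum of those hub vectors. The sketch below checks this numerically; the graph, the hub set, the weights, and the helper `ppv` are hypothetical.

```python
def ppv(graph, pref, c=0.15, iters=200):
    """Power iteration for v = (1 - c) A v + c u, with u given by `pref`."""
    v = {q: pref.get(q, 0.0) for q in graph}
    for _ in range(iters):
        nxt = {q: c * pref.get(q, 0.0) for q in graph}  # teleport to preferred pages
        for q, out in graph.items():
            for o in out:  # follow a random out-link
                nxt[o] += (1 - c) * v[q] / len(out)
        v = nxt
    return v


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
hubs = ["a", "c"]
# Offline: one hub vector per hub page (preference = unit vector x_h).
hub_vectors = {h: ppv(graph, {h: 1.0}) for h in hubs}

# Query time: a user's preference over the hubs (weights sum to 1).
weights = {"a": 0.3, "c": 0.7}
combined = {q: sum(weights[h] * hub_vectors[h][q] for h in hubs) for q in graph}
direct = ppv(graph, weights)  # solving from scratch gives the same vector
```

Only the |H| hub vectors have to be stored; any user whose preferences fall within H is served by a weighted sum at query time.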

Not quite solved yet: computing and storing a full hub vector for every page in H is still expensive.

Decomposition of hub vectors
- To compute and store the hub vectors efficiently, we break each one down further into:
  - Partial vector: the component unique to each hub vector
  - Hubs skeleton: encodes the interrelationships among hub vectors
- The full hub vector is constructed at query time.
- Sharing components among hub vectors saves computation time and storage.

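Inverse P-distance
- Hub vector r_p can be represented as an inverse P-distance vector:
  $r_p(q) = \sum_{t:\, p \rightsquigarrow q} P(t)\, c\, (1-c)^{l(t)}$
  where the sum ranges over all paths t from p to q (paths may revisit pages), l(t) is the number of edges in path t, P(t) is the probability of traveling on path t (for t = <w_0, ..., w_k>, $P(t) = \prod_{i=0}^{k-1} 1/|O(w_i)|$), and c is the teleport probability.
- We will use r_p(q) to denote both the inverse P-distance and the personalized PageRank score.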
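To see that the inverse P-distance r_p(q) = Σ_t P(t) c (1 - c)^{l(t)} really is the PPV, the sketch below evaluates the path sum by grouping paths by length, truncates it at a maximum length we chose, and compares the result with power iteration. The graph, `max_len`, and both helper names are our assumptions.

```python
def inverse_p_distance(graph, p, c=0.15, max_len=150):
    """r_p(q) = sum over paths t from p to q of P(t) * c * (1 - c)**l(t).

    Paths may revisit pages. `prob[q]` is the total P(t) of all paths of the
    current length that start at p and end at q; longer paths are dropped.
    """
    r = dict.fromkeys(graph, 0.0)
    prob = {p: 1.0}  # the single path of length 0
    for length in range(max_len + 1):
        for q, pr in prob.items():
            r[q] += pr * c * (1 - c) ** length
        nxt = {}
        for q, pr in prob.items():  # extend every path by one random out-link
            for o in graph[q]:
                nxt[o] = nxt.get(o, 0.0) + pr / len(graph[q])
        prob = nxt
    return r


def ppv(graph, pref, c=0.15, iters=200):
    """Power iteration for v = (1 - c) A v + c u, for comparison."""
    v = {q: pref.get(q, 0.0) for q in graph}
    for _ in range(iters):
        nxt = {q: c * pref.get(q, 0.0) for q in graph}
        for q, out in graph.items():
            for o in out:
                nxt[o] += (1 - c) * v[q] / len(out)
        v = nxt
    return v


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
r_path = inverse_p_distance(graph, "a")
r_power = ppv(graph, {"a": 1.0})
```

Paths longer than `max_len` carry total weight (1 - c)^(max_len + 1), so the truncation error is negligible here.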

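Partial Vectors
- r_p^H(q) denotes the part of r_p(q) contributed by paths that go through some page in H.
- The partial vector r_p - r_p^H is contributed by the remaining paths, which avoid H.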

Still not good enough…

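Partial Vectors and Hubs Skeleton
- Paths that go through some page in H are recovered by combining the hubs skeleton with the partial vectors of the hubs.
- Special handling is needed for the case where p or q is itself in H.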

Hub vectors = partial vectors + hubs skeleton
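Written out for a hub page p in H, the decomposition can be sketched as below. Here r_p^H(q) collects the paths from p to q that visit some page of H strictly between their endpoints, the values r_p(h) for h in H are the hubs-skeleton entries, and x_p is the unit vector for page p. This is our reconstruction from the path definition of r_p, offered as a sketch rather than a verbatim statement of the paper's theorem:

```latex
r_p \;=\; \underbrace{\left(r_p - r_p^H\right)}_{\text{partial vector}}
\;+\; \frac{1}{c}\sum_{h \in H}
\underbrace{\left(r_p(h) - c\,x_p(h)\right)}_{\text{skeleton entry}}
\left[\left(r_h - r_h^H\right) - c\,x_h\right]
```

Each path through H is split at its last interior hub h: the piece from p to h contributes to r_p(h), and the rest is a path from h that avoids H, so it lives in h's partial vector. Both pieces carry a factor c while the whole path needs only one, hence the 1/c. The -c x terms remove zero-length pieces, which is how the case where p or q is itself in H is handled.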

Overview of the whole process
- Partial vectors are precomputed.
- Computation of the hubs skeleton may be deferred to query time.

Choice of H
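The experiments take H to be the top pages by global PageRank, and report that partial vectors are much more effective when H contains high-PageRank pages. A minimal sketch of that selection rule; the score dictionary is hypothetical and would in practice come from a global PageRank run:

```python
import heapq


def choose_hubs(pagerank_scores, k):
    """Pick the k pages with the highest global PageRank as the hub set H."""
    return set(heapq.nlargest(k, pagerank_scores, key=pagerank_scores.get))


# Hypothetical PageRank scores for a four-page graph.
scores = {"a": 0.37, "b": 0.20, "c": 0.39, "d": 0.04}
H = choose_hubs(scores, 2)  # {"a", "c"}
```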

Algorithms
- Decomposition theorem
- Basic dynamic programming algorithm
- Partial vectors: selective expansion algorithm
- Hubs skeleton: repeated squaring algorithm

Decomposition theorem
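Stated in the notation of the inverse P-distance slide (c the teleport probability, O_i(p) the i-th out-neighbor of p, x_p the unit vector for p), the theorem reads as follows. The reasoning: a path from p either has length 0 (weight c at p) or takes a first step to one of the |O(p)| out-neighbors, each with probability 1/|O(p)| and an extra factor (1 - c), and then continues as a path from that neighbor.

```latex
r_p \;=\; \frac{1-c}{|O(p)|}\sum_{i=1}^{|O(p)|} r_{O_i(p)} \;+\; c\,x_p
```

Applied to every page, this expresses each PPV through the PPVs of its out-neighbors, which is the relation the dynamic programming algorithms iterate.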

Basic Dynamic programming algorithm
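A sketch of the basic algorithm: for every page p it keeps an approximation D_k[p] of r_p and an error vector E_k[p], and applies the decomposition theorem once per round. We initialise with D_0[p] = 0 and E_0[p] = x_p (our choice, which makes the invariant r_p = D_k[p] + Σ_q E_k[p](q) r_q hold trivially); the graph, round count, and function name are hypothetical. The total error mass shrinks by a factor (1 - c) per round.

```python
def basic_dp(graph, c=0.15, rounds=150):
    """Approximate every r_p at once by iterating the decomposition theorem.

    D[p] approximates r_p; E[p] is the error vector. Invariant after each
    round: r_p = D[p] + sum_q E[p][q] * r_q.
    """
    D = {p: {} for p in graph}        # D_0[p] = 0
    E = {p: {p: 1.0} for p in graph}  # E_0[p] = x_p
    for _ in range(rounds):
        new_D, new_E = {}, {}
        for p, out in graph.items():
            w = (1 - c) / len(out)
            d, e = {p: c}, {}         # start from the c * x_p term
            for o in out:             # add (1 - c)/|O(p)| times each neighbor's vectors
                for q, val in D[o].items():
                    d[q] = d.get(q, 0.0) + w * val
                for q, val in E[o].items():
                    e[q] = e.get(q, 0.0) + w * val
            new_D[p], new_E[p] = d, e
        D, E = new_D, new_E
    return D, E


graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"],
         "d": ["e"], "e": ["a", "d"], "f": ["a", "e"]}
D, E = basic_dp(graph)
```

After 150 rounds the remaining error mass is (1 - c)^150, so D[p] is essentially r_p for every page.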

Selective Expansion Algorithm
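Selective expansion applies the same idea to one page p at a time, but only expands the pages in a chosen set Q_k(p). In the sketch below (hypothetical names and graph), p itself is expanded first and afterwards every page holding error mass except the hubs, so the error ends up concentrated on H. Each expansion keeps a fraction c of a page's error mass in D and pushes the rest to its out-neighbors, which preserves r_p = D + Σ_q E(q) r_q; the leftover error on hub pages can then be resolved with hub vectors. How the paper extracts the partial vector from D and E is not reproduced here.

```python
def selective_expansion(graph, p, hubs, c=0.15, rounds=200):
    """Expand page p, then every non-hub page holding error mass.

    Invariant after each round: r_p = D + sum_q E[q] * r_q.
    """
    D, E = {}, {p: 1.0}        # D_0 = 0, E_0 = x_p
    frontier = [p]             # Q_0(p) = {p}
    for _ in range(rounds):
        if not frontier:
            break
        new_E = {q: m for q, m in E.items() if q not in frontier}
        for q in frontier:
            m = E[q]
            D[q] = D.get(q, 0.0) + c * m           # keep c * E(q) as score at q
            for o in graph[q]:                     # push (1 - c) * E(q) onward
                new_E[o] = new_E.get(o, 0.0) + (1 - c) * m / len(graph[q])
        E = new_E
        frontier = [q for q in E if q not in hubs]  # Q_k(p) = V - H
    return D, E


def ppv(graph, pref, c=0.15, iters=200):
    """Reference PPV by power iteration, used only to check the invariant."""
    v = {q: pref.get(q, 0.0) for q in graph}
    for _ in range(iters):
        nxt = {q: c * pref.get(q, 0.0) for q in graph}
        for q, out in graph.items():
            for o in out:
                nxt[o] += (1 - c) * v[q] / len(out)
        v = nxt
    return v


graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"],
         "d": ["e"], "e": ["a", "d"], "f": ["a", "e"]}
hubs = {"a"}
D, E = selective_expansion(graph, "d", hubs)
r = {q: ppv(graph, {q: 1.0}) for q in graph}
gap = max(abs(r["d"][q] - D.get(q, 0.0) - sum(E[h] * r[h][q] for h in E))
          for q in graph)
```

`gap` measures how far D plus the hub-weighted hub vectors is from the true r_p; it stays at floating-point noise because every expansion preserves the invariant.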

Repeated Squaring Algorithm
- The error is squared on each iteration, so it shrinks much faster than with the basic algorithm.
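A sketch of repeated squaring on full vectors. It starts from one step of the decomposition theorem (D_1[p] = c x_p, with E_1[p] spreading 1 - c over p's out-neighbors) and then doubles k each round using D_2k[p] = D_k[p] + Σ_q E_k[p](q) D_k[q] and E_2k[p] = Σ_q E_k[p](q) E_k[q]. For the hubs skeleton only the entries r_p(h) with p, h in H are needed; this toy version skips that restriction, and its names and graph are our own. The error mass goes from (1 - c)^k to (1 - c)^2k, i.e. it is squared each round.

```python
def repeated_squaring(graph, c=0.15, squarings=7):
    """Approximate every r_p, doubling the number of steps k each round.

    Invariant: r_p = D[p] + sum_q E[p][q] * r_q, with error mass (1 - c)**k.
    """
    D = {p: {p: c} for p in graph}  # D_1[p] = c * x_p
    E = {}
    for p, out in graph.items():    # E_1[p] = (1 - c)/|O(p)| * sum_i x_{O_i(p)}
        E[p] = {}
        for o in out:
            E[p][o] = E[p].get(o, 0.0) + (1 - c) / len(out)
    for _ in range(squarings):
        new_D, new_E = {}, {}
        for p in graph:
            d, e = dict(D[p]), {}
            for q, w in E[p].items():
                for s, val in D[q].items():   # D_2k[p] += E_k[p](q) * D_k[q]
                    d[s] = d.get(s, 0.0) + w * val
                for s, val in E[q].items():   # E_2k[p] += E_k[p](q) * E_k[q]
                    e[s] = e.get(s, 0.0) + w * val
            new_D[p], new_E[p] = d, e
        D, E = new_D, new_E
    return D, E


graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"],
         "d": ["e"], "e": ["a", "d"], "f": ["a", "e"]}
D, E = repeated_squaring(graph)  # k = 2**7 = 128 steps
```

Seven squarings reach k = 128, the same error bound as 128 rounds of the basic dynamic programming algorithm.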

Experiments
- Experiments used real web data from Stanford's WebBase, containing 80 million pages after removing leaf pages.
- Experiments were run on a machine with a 1.4 GHz CPU and 3.5 GB of memory.
- The partial vector approach is much more effective when H contains high-PageRank pages.
- H ranged from the top 1,000 to the top 100,000 pages by PageRank.

Experiments
- The hubs skeleton was computed for |H| = 10,000.
- The average size is 9,021 entries, much less than the dimensions of full hub vectors.
- Instead of using the entire set r_p(H), only the m highest entries are used.
- A hub vector containing 14 million nonzero entries can be constructed from partial vectors in 6 seconds.

The End