Chapter IV: Link Analysis Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2011/12.

Slides:



Advertisements
Similar presentations
1 The PageRank Citation Ranking: Bring Order to the web Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd Presented by Fei Li.
Advertisements

The math behind PageRank A detailed analysis of the mathematical aspects of PageRank Computational Mathematics class presentation Ravi S Sinha LIT lab,
Information Networks Link Analysis Ranking Lecture 8.
Graphs, Node importance, Link Analysis Ranking, Random walks
CS345 Data Mining Link Analysis Algorithms Page Rank Anand Rajaraman, Jeffrey D. Ullman.
Link Analysis: PageRank
How Does a Search Engine Work? Part 2 Dr. Frank McCown Intro to Web Science Harding University This work is licensed under a Creative Commons Attribution-NonCommercial-
CS345 Data Mining Page Rank Variants. Review Page Rank  Web graph encoded by matrix M N £ N matrix (N = number of web pages) M ij = 1/|O(j)| iff there.
More on Rankings. Query-independent LAR Have an a-priori ordering of the web pages Q: Set of pages that contain the keywords in the query q Present the.
DATA MINING LECTURE 12 Link Analysis Ranking Random walks.
Web Search – Summer Term 2006 VI. Web Search - Ranking (c) Wolfgang Hürst, Albert-Ludwigs-University.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 3 March 23, 2005
Link Analysis Ranking. How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would.
Introduction to PageRank Algorithm and Programming Assignment 1 CSC4170 Web Intelligence and Social Computing Tutorial 4 Tutor: Tom Chao Zhou
1 CS 430 / INFO 430: Information Retrieval Lecture 16 Web Search 2.
Authoritative Sources in a Hyperlinked Environment Hui Han CSE dept, PSU 10/15/01.
Multimedia Databases SVD II. Optimality of SVD Def: The Frobenius norm of a n x m matrix M is (reminder) The rank of a matrix M is the number of independent.
Introduction to Information Retrieval Introduction to Information Retrieval Hinrich Schütze and Christina Lioma Lecture 21: Link Analysis.
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, Slides for Chapter 1:
ICS 278: Data Mining Lecture 15: Mining Web Link Structure
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 3 April 2, 2006
Multimedia Databases SVD II. SVD - Detailed outline Motivation Definition - properties Interpretation Complexity Case studies SVD properties More case.
Link Analysis, PageRank and Search Engines on the Web
Scaling Personalized Web Search Glen Jeh, Jennfier Widom Stanford University Presented by Li-Tal Mashiach Search Engine Technology course (236620) Technion.
Presented By: Wang Hao March 8 th, 2011 The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd.
1 Hyperlink Analysis A Survey (In Progress). 2 Overview of This Talk  Introduction to Hyperlink Analysis  Classification of Hyperlink Analysis  Two.
CS345 Data Mining Link Analysis Algorithms Page Rank Anand Rajaraman, Jeffrey D. Ullman.
Topic-Sensitive PageRank Taher H. Haveliwala. PageRank Importance is propagated A global ranking vector is pre-computed.
Link Structure and Web Mining Shuying Wang
Link Analysis. 2 HITS - Kleinberg’s Algorithm HITS – Hypertext Induced Topic Selection For each vertex v Є V in a subgraph of interest: A site is very.
1 COMP4332 Web Data Thanks for Raymond Wong’s slides.
1 Uniform Sampling from the Web via Random Walks Ziv Bar-Yossef Alexander Berg Steve Chien Jittat Fakcharoenphol Dror Weitz University of California at.
Prestige (Seeley, 1949; Brin & Page, 1997; Kleinberg,1997) Use edge-weighted, directed graphs to model social networks Status/Prestige In-degree is a good.
Link Analysis HITS Algorithm PageRank Algorithm.
Information Retrieval Link-based Ranking. Ranking is crucial… “.. From our experimental data, we could observe that the top 20% of the pages with the.
Motivation When searching for information on the WWW, user perform a query to a search engine. The engine return, as the query’s result, a list of Web.
Λ14 Διαδικτυακά Κοινωνικά Δίκτυα και Μέσα
Presented By: - Chandrika B N
The PageRank Citation Ranking: Bringing Order to the Web Presented by Aishwarya Rengamannan Instructor: Dr. Gautam Das.
Link Analysis Hongning Wang
1 University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:
CS315 – Link Analysis Three generations of Search Engines Anchor text Link analysis for ranking Pagerank HITS.
Web Intelligence Web Communities and Dissemination of Information and Culture on the www.
How Does a Search Engine Work? Part 2 Dr. Frank McCown Intro to Web Science Harding University This work is licensed under a Creative Commons Attribution-NonCommercial-
CS 533 Information Retrieval Systems.  Introduction  Connectivity Analysis  Kleinberg’s Algorithm  Problems Encountered  Improved Connectivity Analysis.
Overview of Web Ranking Algorithms: HITS and PageRank
PageRank. s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31.
Lecture #10 PageRank CS492 Special Topics in Computer Science: Distributed Algorithms and Systems.
Ch 14. Link Analysis Padmini Srinivasan Computer Science Department
Chapter 5: Link Analysis for Authority Scoring
Algorithmic Detection of Semantic Similarity WWW 2005.
Link Analysis Rong Jin. Web Structure  Web is a graph Each web site correspond to a node A link from one site to another site forms a directed edge 
Ranking Link-based Ranking (2° generation) Reading 21.
1 1 COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani.
Information Retrieval and Web Search Link analysis Instructor: Rada Mihalcea (Note: This slide set was adapted from an IR course taught by Prof. Chris.
- Murtuza Shareef Authoritative Sources in a Hyperlinked Environment More specifically “Link Analysis” using HITS Algorithm.
Chapter 14: Link Analysis IRDM WS 2015 Money isn't everything... but it ranks right up there with oxygen. -- Rita Davenport We didn't know exactly what.
1 CS 430: Information Discovery Lecture 5 Ranking.
Winter Semester 2003/2004Selected Topics in Web IR and Mining1-1 1 Topic-specific Authority Ranking 1.1 Page Rank Method and HITS Method 1.2 Towards a.
Link Analysis Algorithms Page Rank Slides from Stanford CS345, slightly modified.
CS 540 Database Management Systems Web Data Management some slides are due to Kevin Chang 1.
Extrapolation to Speed-up Query- dependent Link Analysis Ranking Algorithms Muhammad Ali Norozi Department of Computer Science Norwegian University of.
1 CS 430 / INFO 430: Information Retrieval Lecture 20 Web Search 2.
Search Engines and Link Analysis on the Web
Link Analysis 2 Page Rank Variants
Link-Based Ranking Seminar Social Media Mining University UC3M
DTMC Applications Ranking Web Pages & Slotted ALOHA
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
Link Structure Analysis
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
Presentation transcript:

Chapter IV: Link Analysis Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2011/12

Chapter IV: Link Analysis* IV.1 Page Rank IV.2 HITS Algorithm IV.3 Extensions & Comparisons IV.4 Personalized Link Analysis IV.5 Link Spam Detection IV.6 Distributed Link Analysis *mostly following Chapter 21 from Manning/Raghavan/Schütze, with additions from other sources November 22, 2011IR&DM, WS'11/12IV.2

IV.2 HITS: Hyperlink-Induced Topic Search Idea: Determine good content sources: Authorities (high indegree) good link sources: Hubs (high outdegree) Find better authorities that have good hubs as predecessors better hubs that have good authorities as successors For Web graph G = (V, E) define for nodes p, q  V Authority score and Hub score “Mutual reinforcement” between hubs & authorities November 22, 2011IR&DM, WS'11/12IV.3 [Kleinberg: JACM’99]

HITS as Eigenvector Computation November 22, 2011IR&DM, WS'11/12IV.4 Iteration with adjacency matrix E: → a and h are Eigenvectors of E T E and E E T, respectively Authority and hub scores in matrix notation: Intuitive interpretation: is the cocitation matrix: M (auth) ij is the number of nodes that point to both i and j is the bibliographic-coupling matrix: M (hub) ij is the number of nodes to which both i and j point with constants , 

HITS Algorithm November 22, 2011IR&DM, WS'11/12IV.5 Compute fixpoint solution by Iteration (until convergence) with length normalization: Initialization: a (0) = (1, 1,..., 1) T, h (0) = (1, 1,..., 1) T Repeat until sufficient convergence h (i+1) := E a (i) h (i+1) := h (i+1) / ||h (i+1) || 1 a (i+1) := E T h (i) a (i+1) := a (i+1) / ||a (i+1) || 1 Convergence is guaranteed under fairly general conditions: -For a symmetric n x n matrix M and a vector v that is not orthogonal to the principal Eigenvector w(M), the unit vector in the direction of M k v converges to w(M), for k → ∞.

Implementation & Tuning of HITS November 22, 2011IR&DM, WS'11/12IV.6 1)Determine a sufficient number r of “root pages” (e.g., pages) via relevance ranking (e.g., using TF*IDF ranking) 2)Add all successors of root pages 3)For each root page add up to d predecessors 4)Compute iteratively authority and hub scores of this “expansion set” (e.g., pages) with initialization a i := h i := 1 / |expansion set| and L 1 normalization after each iteration  converges to principal Eigenvectors of EE T and E T E, resp. 5)Return pages in descending order of authority scores (e.g., the 10 largest elements of vector a) “Drawback” of HITS algorithm:  relevance ranking (by keywords) within root set is not considered

expansion set Example: Construction of HITS Graph November 22, 2011IR&DM, WS'11/12IV root set keyword query

Improved HITS Algorithm [Bharat, Henzinger: SIGIR’98] November 22, 2011IR&DM, WS'11/12IV.8 Potential weakness of the HITS algorithm: irritating links (automatically generated links, spam, etc.) topic drift (e.g., from “Jaguar car” to “car” in general) Improvement: Introduce edge weights: 0 for links within the same host, 1/k with k links from k URLs of the same host to 1 URL (aweight) 1/m with m links from 1 URL to m URLs on the same host (hweight) Consider relevance weights w.r.t. query topic (e.g., TF*IDF)  Iterative computation of authority score hub score

Finding Related URLs [Bharat, Henzinger: WWW’99] November 22, 2011IR&DM, WS'11/12IV.9 Cocitation algorithm: Determine up to B predecessors of given URL u For each predecessor p determine up to BF successors  u Determine among all siblings s of u those with the largest number of predecessors that point to both s and u (degree of cocitation) Companion algorithm (next slide): Determine appropriate base set for URL u (“vicinity” of u) Apply HITS algorithm to this base set

Companion Algorithm for Finding Related URLs [Bharat, Henzinger: WWW’99] November 22, 2011IR&DM, WS'11/12IV.10 Companion algorithm: 1) Determine expansion set: u plus up to B predecessors of u and for each predecessor p up to BF successors  u plus up to F successors of u and for each successor c up to FB predecessors  u with elimination of stop URLs (e.g., 2) Duplicate elimination: Merge nodes both of which have more than 10 successors and have 95 % or more overlap among their successors 3) Compute authority scores using the improved HITS algorithm (previous slide)

Application of HITS for “Community Detection” November 22, 2011IR&DM, WS'11/12IV.11 Root set may contain multiple topics or communities, e.g., for queries “jaguar”, “Java”, or “randomized algorithm”. Approach: Compute k largest Eigenvalues of E T E and the corresponding Eigenvectors a (authority scores) (e.g., using SVD on E) For each of these k Eigenvectors a, the largest authority scores indicate a densely connected “community”

IV.3 Comparison/Extensions of PR & HITS PRHITS Matrix constructionstaticquery time Matrix sizehugemoderate Stochastic matrixyesno Dampening by random jumpsyesno Outdegree normalizationyesno Preference for bipartite coresnoyes Score stability to perturbationsyesno Resilience to topic driftn/ano Resilience to spamnono But: features of PR can be incorporated into HITS extensions, and vice versa November 22, 2011IR&DM, WS'11/12IV.12

Comparison of PR/HITS with In/Out-Degree [Najork, Zaragoza, Taylor: HITS on the Web: How does it Compare? SIGIR’07] November 22, 2011IR&DM, WS'11/12IV.13 Comparative study of link analysis methods: PageRank, HITS, in/out-degree in combination with field-weighted BM25F all – all links id – only inter-domain links ih – only inter-host links (based on a crawl of 463 million web pages containing 17.6 billion hyperlinks and referencing 2.9 billion distinct URLs; and a set of 28,043 queries sampled from a query log)

November 22, 2011IR&DM, WS'11/12IV.14 Comparison of PR/HITS with In/Out-Degree [Najork, Zaragoza, Taylor: HITS on the Web: How does it Compare? SIGIR’07]

SALSA: Random Walk on Hubs and Authorities [Lempel/Moran, TOIS 2001] View each node v of the link graph as two nodes v h and v a Construct bipartite undirected graph G’(V’,E’) from link graph G(V,E): V’ = {v h | v  V and outdegree(v)>0}  {v a | v  V and indegree(v)>0} E’ = {(v h, w a ) | (v,w)  E} November 22, 2011IR&DM, WS'11/12IV G(V,E) 1h 2h 4h 5h 2a 3a 4a 6a G’(V’,E’)

November 22, 2011IR&DM, WS'11/12IV.16 SALSA: Random Walk on Hubs and Authorities [Lempel/Moran: TOIS 2001] Stochastic hub matrix H: for i, j and k ranging over all nodes with (i h,k a ), (j h,k a )  E’ Stochastic authority matrix A: for i, j and k ranging over all nodes with (k h,i a ), (k h,j a )  E’ The corresponding Markov chains are ergodic in each connected component. Stationary solution:  [v h ] ~ outdegree(v) for H,  [v a ] ~ indegree(v) for A Further extension with random jumps: PHITS (Probabilistic HITS) → Perform PageRank-like random walks for both Hubs and Authorities:

More Link Analysis Variants Hub-Averaging: Authority-Threshold (only k best authorities per hub): Max-Authority (authority threshold with k=1): with Breadth-First-Search (transitive citations up to depth k): where N (j) (q) are nodes that have a path to q by alternating o OUT and i IN steps with j=o+i November 22, 2011IR&DM, WS'11/12IV.17 and

Comparison of Link Analysis Methods: Queries Source: Borodin et al., ACM TOIT 2005 Experimental setup: 34 queries rootsets of 200 pages each obtained from Google base sets computed using Google with first 50 predecessors per page November 22, 2011IR&DM, WS'11/12IV.18

Comparison of Link Analysis Methods: November 22, 2011IR&DM, WS'11/12IV.19 Source: Borodin et al., ACM TOIT 2005

So who’s the winner? November 22, 2011IR&DM, WS'11/12IV.20 Source: Borodin et al., ACM TOIT 2005 Comparison of Link Analysis Methods: Key Authorities

Link Analysis for Query “Classical Guitar” (1) November 22, 2011IR&DM, WS'11/12IV.21 Source: Borodin et al., ACM TOIT 2005

Link Analysis for Query “Classical Guitar” (2) November 22, 2011IR&DM, WS'11/12IV.22 Source: Borodin et al., ACM TOIT 2005

Link Analysis for Query “Classical Guitar” (3) November 22, 2011IR&DM, WS'11/12IV.23 Source: Borodin et al., ACM TOIT 2005

Efficiency of PageRank Computation [Kamvar/Haveliwala/Manning/Golub 2003] November 22, 2011IR&DM, WS'11/12IV.24 Exploit block structure of the link graph: 1)Partition link graph by domain names 2) Compute local PR vector of pages within each block  LPR(i) for page i 3) Compute block rank of each block: a) block link graph B with b) run PR computation on B, yielding BR(I) for block I 4) Approximate global PR vector using LPR and BR: a) set x j (0) := LPR(j)  BR(J) where J is the block that contains j b) run PR computation on A Speeds up convergence by factor of 2 in good “block cases”, however unclear how effective it is in general. Dotplot of the Web graph: Point (i,j) for link between pages i and j (pages sorted by URL)

Efficiency of Storing PageRank Vectors [T. Haveliwala: Internet Computing 2003] November 22, 2011IR&DM, WS'11/12IV.25 Memory-efficient encoding of PR vectors (important for large number of topic-specific vectors)  100 topics * 20 Bio. pages * 4 Bytes would cost 8 TB Key idea: Map real PR scores to n cells and encode cell no into ceil(log 2 n) bits Approx. PR score of page i is the mean score of the cell that contains i Should use non-uniform partitioning of score values to form cells Possible encoding schemes: Equi-depth partitioning: choose cell boundaries such that is the same for each cell Equi-width partitioning with log values: first transform all PR values into log PR, then choose equi-width boundaries Cell no. could be variable-length encoded (e.g., using Huffman code)

SimRank: Measuring Structural Context Similarity [Jeh, Widom: KDD 02] November 22, 2011IR&DM, WS'11/12IV.26 {A,A} {A,B} {B,B} {sugar, frosting} {sugar, flour} {frosting, frosting} {frosting, eggs} {frosting, flour} {eggs, eggs} {eggs, flour} {sugar, eggs} O i (a) – i-th object in a’s inlink neighborhood I i (a) – i-th object in a’s outlink neighborhood and constants c 1,c 2  [0,1] Idea: Two objects are similar if they relate to objects that are similar objects! B sugar frosting eggs flour Person A Person B

IV.4 Topic-specific and Personalized PageRank random walk: uniformly random choice of links + biased jumps to personal favorites (or trusted pages, or...) Idea: random jumps favor designated high-quality pages (B) such as personal bookmarks, frequently visited pages, etc. with Authority (page q) = stationary prob. of visiting q [see also: Haveliwala 2002, Jeh 2003, Benczur 2004, Gyöngyi 2004, Guha 2004] November 22, 2011IR&DM, WS'11/12IV.27

Topic-specific PageRank [Haveliwala: TKDE 2003] November 22, 2011IR&DM, WS'11/12IV.28 Given: A (small) set of topics c k, each with a set T k of authorities (taken from a directory such as ODP ( or bookmark collection) Idea : Change the PageRank random walk by biasing the random-jump probabilities to the topic authorities T k : p k =  C p +(1  ) r k with usual transition matrix C and (r k ) j = 1/|T k | for j  T k, 0 else (instead of r j = 1/n) Method: 1) Precompute topic-specific PageRank vectors p k 2) Classify user query q (incl. query context) w.r.t. each topic c k  probability w k := P[c k | q] 3) Total authority score of doc d is

Performance of Topic-Specific PageRank [Haveliwala: TKDE 2003] November 22, 2011IR&DM, WS'11/12IV.29

Personalized PageRank November 22, 2011IR&DM, WS'11/12IV.30 Linearity Theorem: Let r 1 and r 2 be personal preference vectors for random-jump targets, and let p 1 and p 2 denote the corresponding PPR vectors. Then for all  1,  2  0 with  1 +  2 = 1 the following holds:  1 p 1 +  2 p 2 =  C (  1 p 1 +  2 p 2 ) + (1  ) (  1 r 1 +  2 r 2 ) Corollary: For preference vector r with m non-zero components and base vectors e k (k=1..m) with (e k ) i =1 for i=k, 0 for i  k, we obtain: with constants  1...  m and for PPR vector p with p k =  C p k +(1  ) e k PageRank equation: p =  C p +(1  ) r Goal: Efficient computation and efficient storage of user-specific personalized PageRank vectors (PPR) [for further optimizations see Jeh/Widom: WWW 2003]

Exploiting Click Streams November 22, 2011IR&DM, WS'11/12IV.31 Simple idea: Modify HITS or Page-Rank algorithm by weighting edges with the relative frequency of users clicking on a link (as observed by DirectHit, Alexa, etc.). More sophisticated approach: [Chen et al.: WISE 2002] Consider link graph A and link-visit matrix V (V ij =1 if user i visits page j, 0 else): Define authority score vector:a =  A T h + (1-  )V T u hub score vector: h =  Aa + (1-  )V T u user importance vector: u = (1-  )V(a+h) with a tunable parameter  (with  =1: HITS,  =0: DirectHit)

Link Analysis based on Implicit Links November 22, 2011IR&DM, WS'11/12IV.32 Apply simple data mining to browsing sessions of many users, where each session i is a sequence (pi 1, pi 2,...) of visited pages: consider all pairs (pi j, pi j+1 ) of successively visited pages, compute their total frequency f, and select those with f above some min-support threshold. Construct implicit-link graph with the selected page pairs as edges and their normalized total frequencies f as edge weights. [Xue et al.: SIGIR 2003] Apply edge-weighted Page-Rank for authority scoring, and linear combination of query relevance and link authority for overall scoring.

Exploiting Query Logs and Click Streams [Luxenburger et al.: WISE 2004, WebDB 2006] November 22, 2011IR&DM, WS'11/12IV.33 QRank : links+ query-doc transitions + query-query transitions + doc-doc transitions on implicit links (w/ thesaurus) with probabilities estimated from log statistics max planck gesellschaft max planck mpg budget A.M. MPII with

QRank Experiments November 22, 2011IR&DM, WS'11/12IV.34 Setup: Wikipedia docs, 18 volunteers posing Trivial-Pursuit queries ca. 500 queries, ca. 300 refinements, ca positive clicks ca implicit links based on doc-doc similarity Results (assessment by blind-test users): QRank top-10 result preferred over PageRank in 81% of all cases QRank has 50.3% PageRank has 33.9% Untrained example query “philosophy”: PageRankQRank 1. PhilosophyPhilosophy 2. GNU free doc. licenseGNU free doc. license 3. Free software foundationEarly modern philosophy 4. Richard StallmanMysticism 5. DebianAristotle

Additional Literature for Sections IV.1-2 November 22, 2011IR&DM, WS'11/12IV.35 Web Graph Modeling in General: A. Z. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, J.L. Wiener: Graph structure in the Web, WWW 2000 G.W. Flake, S. Lawrence, C.L. Giles, F. Coetzee: Self-Organization and Identification of Web Communities. IEEE Computer 35(3), 2002 Link Analysis Principles & Algorithms: J.M. Kleinberg: Authoritative Sources in a Hyperlinked Environment, JACM 46(5), 1999 S Brin, L. Page: The Anatomy of a Large-Scale Hypertextual Web Search Engine, WWW 1998 K. Bharat, M. Henzinger: Improved Algorithms for Topic Distillation in a Hyperlinked Environment, SIGIR 1998 J. Dean, M. Henzinger: Finding Related Pages in the World Wide Web, WWW 1999 C. Ding, X. He, P. Husbands, H. Zha, H. Simon: PageRank, HITS, and a Unified Framework for Link Analysis, SIAM Int. Conf. on Data Mining, A. Borodin, G.O. Roberts, J.S. Rosenthal, P. Tsaparas: Link analysis ranking: algorithms, theory, and experiments. ACM TOIT 5(1), 2005 A.N. Langville, C.D. Meyer: Deeper inside PageRank. Internet Math., 1(3), 2004 A.N. Langville, C.D. Meyer: Google‘s PageRank and Beyond – The Science of Search Engine Rankings, Princeton University Press, 2006 M. Najork, H. Zaragoza, M. Taylor: HITS on the Web: How does it Compare?, SIGIR 2007

Comparison and Extensions of Link Analysis Algorithms: R. Lempel, S. Moran: SALSA: The Stochastic Approach for Link-Structure Analysis, ACM TOIS 19(2), G. Jeh, J. Widom: SimRank: a Measure of Structural-Context Similarity, KDD 2002 A. Borodin, G.O. Roberts, J.S. Rosenthal, P. Tsaparas: Finding Authorities and Hubs from Link Structures on the World Wide Web, WWW 2001 Matt Richardson, Pedro Domingos: The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank, NIPS 2001 C. Ding, X. He, P. Husbands, H. Zha, H. Simon: PageRank, HITS, and a Unified Framework for Link Analysis, SIAM Int. Conf. on Data Mining, M. Bianchini, M. Gori, F. Scarselli: Inside PageRank.ACM TOIT 5(1), 2005 A.N. Langville, C.D. Meyer: Deeper inside PageRank. Internet Math., 1(3), 2004 A.X. Zheng, A.Y. Ng, M.I.: Stable Algorithms for Link Analysis, SIGIR 2001 November 22, 2011IR&DM, WS'11/12IV.36 Additional Literature for Sections IV.3

Efficient PageRank Computation: Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, Gene H. Golub: Extrapolation methods for accelerating PageRank computations. WWW 2003: Taher H. Haveliwala: Efficient Encodings for Document Ranking Vectors (Extended Abstract). International Conference on Internet Computing 2003: 3-9 Personalized Authority Scoring: Taher Haveliwala: Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search, IEEE Trans. on Knowledge and Data Engineering, G. Jeh, J. Widom: Scaling personalized web search, WWW D. Fogaras, B. Racz, K. Csalogany, A. Benczur: Towards Scaling Fully Personalized PageRank: Algorithms, Lower Bounds, and Experiments, Internet Mathematics 2(3): , J. Luxenburger, G. Weikum: Query-Log Based Authority Analysis for Web Information Search, WISE 2004 J. Luxenburger, G. Weikum: Exploiting Community Behavior for Enhanced Link Analysis and Web Search, WebDB 2006 November 22, 2011IR&DM, WS'11/12IV.37 Additional Literature for Sections IV.4