The PageRank of a graph. Fan Chung, University of California, San Diego.

What is PageRank? Ranking the vertices of a graph: partially ordered sets + graph theory. A new graph invariant for dealing with: applications (web search, IT data sets, partitioning algorithms, …) and theory (correlations among vertices).

Outline of the talk: Motivation. Define PageRank. Local cuts and the Cheeger constant. Four versions of the Cheeger inequality using eigenvectors, random walks, PageRank, and the heat kernel. Four partitioning algorithms. Green’s functions and hitting time.

What is PageRank? What is Rank?

What is PageRank? PageRank is defined on any graph.

An induced subgraph of the collaboration graph with authors of Erdős number ≤ 2.

A subgraph of the Hollywood graph.

A subgraph of a BGP graph

The Octopus graph: Yahoo IM graph (Reid Andersen, 2005).

Graph theory has 250 years of history. Leonhard Euler and the Bridges of Königsberg: is it possible to walk over every bridge once and only once?

Geometric graphs, algebraic graphs, general graphs, topological graphs.

Massive data, massive graphs: WWW graphs, call graphs, acquaintance graphs, graphs from any database. Protein interaction network (Jawoong Jeong).

Big and bigger graphs: new directions in graph theory.

Many basic questions: Correlation among vertices? The ‘geometry’ of a network? Quantitative analysis? Distance, flow, cut, …; eigenvalues, rapid mixing, … Local versus global?

Google’s answer: The definition for PageRank?

A measure for the “importance” of a website The “importance” of a website is proportional to the sum of the importance of all the sites that link to it.

A solution for the “importance” of a website: solve $x = \rho A x$ for $x$, where $A$ is the adjacency matrix.
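As a rough illustration of solving x = ρ A x, the sketch below runs plain power iteration on a tiny undirected graph. The function name `importance_scores`, the dense list-of-rows input, and the example graph are assumptions made for this illustration; the slides do not prescribe a method. Power iteration is also assumed to converge here because the example graph is connected and not bipartite.

```python
def importance_scores(adj, iterations=200):
    """Power iteration for x = rho * A x on a small dense adjacency matrix.

    adj is a list of rows with 0/1 entries (hypothetical input format).
    Returns (x, rho) with x normalized so that its entries sum to 1.
    """
    n = len(adj)
    x = [1.0 / n] * n
    rho = 0.0
    for _ in range(iterations):
        # y = A x
        y = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)      # approaches the largest eigenvalue of A
        rho = 1.0 / total   # so that x = rho * A x at convergence
        x = [value / total for value in y]
    return x, rho


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
x, rho = importance_scores(adj)
```

In this example vertex 2, which has the most neighbors, receives the largest score and the pendant vertex 3 the smallest.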

Graph models: directed graphs, weighted graphs, (undirected) graphs.

In a directed graph, there are two types of “importance”: authority and hub (Jon Kleinberg, 1998).

Two types of the “importance” of a website: importance as authorities and importance as hubs. Solve $x = r A y$ and $y = s A^{T} x$, which give $x = rs\, A A^{T} x$ and $y = rs\, A^{T} A\, y$.
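The coupled hub and authority equations can be approximated by alternating updates, in the spirit of Kleinberg's iteration. The sketch below is a minimal version under stated assumptions: links are given as (u, v) pairs meaning "u links to v", a page's authority score is the sum of the hub scores of pages linking to it, and a page's hub score is the sum of the authority scores of pages it links to. The function name `hits` and the example links are hypothetical.

```python
def hits(links, n, iterations=100):
    """Alternating hub/authority updates on a directed graph.

    links: list of (u, v) pairs, read as "page u links to page v".
    Returns (authority, hub), each normalized to sum to 1.
    """
    authority = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # Authority of v: sum of hub scores of the pages linking to v.
        new_authority = [0.0] * n
        for u, v in links:
            new_authority[v] += hub[u]
        # Hub score of u: sum of authority scores of the pages u links to.
        new_hub = [0.0] * n
        for u, v in links:
            new_hub[u] += new_authority[v]
        a_total = sum(new_authority) or 1.0
        h_total = sum(new_hub) or 1.0
        authority = [a / a_total for a in new_authority]
        hub = [h / h_total for h in new_hub]
    return authority, hub


# Pages 0, 1 and 3 link to page 2; page 0 also links to page 4.
links = [(0, 2), (1, 2), (3, 2), (0, 4)]
authority, hub = hits(links, 5)
```

Here page 2 ends up as the strongest authority and page 0 as the strongest hub, since page 0 points at both pages that receive links.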

Eigenvalue problem for an n × n matrix, with n ≈ 30 billion websites. Hard to compute eigenvalues. Even harder to compute eigenvectors.

In the old days, compute for a given (whole) graph. In reality, can only afford to compute “locally”.

A traditional algorithm. Input: a given graph on n vertices. Efficient algorithm means polynomial algorithms: n^3, n^2, n log n, n. New algorithmic paradigm. Input: access to a (huge) graph (e.g., for a vertex v, find its neighbors). Bounded number of accesses. Exponential versus polynomial; infinity versus finite.
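One way to picture the "access to a graph" model is an oracle that answers neighbor queries and counts them, so that an algorithm's cost is the number of accesses rather than the size of the graph. The class `GraphOracle`, the function `explore`, and the query budget below are hypothetical names chosen for this sketch.

```python
from collections import deque


class GraphOracle:
    """Wraps an adjacency structure and counts neighbor queries."""

    def __init__(self, adjacency):
        self._adjacency = adjacency
        self.queries = 0

    def neighbors(self, v):
        self.queries += 1
        return self._adjacency[v]


def explore(oracle, seed, budget):
    """Breadth-first exploration from seed using at most `budget` queries."""
    seen = {seed}
    queue = deque([seed])
    while queue and oracle.queries < budget:
        u = queue.popleft()
        for w in oracle.neighbors(u):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


# A path on 1000 vertices stands in for a huge graph.
n = 1000
path = {v: [w for w in (v - 1, v + 1) if 0 <= w < n] for v in range(n)}
oracle = GraphOracle(path)
visited = explore(oracle, seed=0, budget=5)
```

With a budget of 5 queries the exploration touches only vertices 0 through 5, regardless of how long the path is.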

The definition of PageRank given by Brin and Page is based on random walks.

Random walks in a graph. G: a graph. P: the transition probability matrix, with $P(u,v) = 1/d_u$ if $u \sim v$ and $0$ otherwise, where $d_u$ := the degree of u. A lazy walk: $W = (I + P)/2$.
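A small sketch of these objects, assuming the usual definitions P(u,v) = 1/d_u for adjacent u, v and the lazy walk W = (I + P)/2. Exact fractions are used so that the stationary distribution (proportional to the degrees) can be checked without rounding. The function names and the three-vertex path are illustrative choices.

```python
from fractions import Fraction


def transition_matrix(neighbors):
    """P[u][v] = 1/deg(u) if v is a neighbor of u, else 0."""
    n = len(neighbors)
    P = [[Fraction(0)] * n for _ in range(n)]
    for u, nbrs in enumerate(neighbors):
        for v in nbrs:
            P[u][v] += Fraction(1, len(nbrs))
    return P


def lazy_walk(P):
    """W = (I + P) / 2: stay put with probability 1/2, else take a step."""
    n = len(P)
    return [[(P[u][v] + (1 if u == v else 0)) / 2 for v in range(n)]
            for u in range(n)]


def step(distribution, W):
    """One step of the walk: row vector times matrix."""
    n = len(W)
    return [sum(distribution[u] * W[u][v] for u in range(n)) for v in range(n)]


# Path 0 - 1 - 2, given as neighbor lists.
neighbors = [[1], [0, 2], [1]]
W = lazy_walk(transition_matrix(neighbors))

# The stationary distribution is proportional to the degrees: (1, 2, 1) / 4.
stationary = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
after_one_step = step(stationary, W)
```

Applying one lazy step to the degree-proportional distribution returns the same distribution, which is what "stationary" means here.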

Original definition of PageRank: a (bored) surfer either surfs a random webpage with probability 1 - α or surfs a linked webpage with probability α. α: the jumping constant.

Definition of PageRank. Two equivalent ways, (1) and (2), to define PageRank p = pr(α,s), where α is the jumping constant and s is the seed as a row vector. With s = (1/n, 1/n, …, 1/n), p is the (original) PageRank; with s some “seed”, e.g., concentrated on a single vertex, p is the personalized PageRank.
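The two definitions did not survive in the transcript. A common way to state them, assumed here following the convention of Andersen, Chung and Lang, is the recurrence p = α s + (1 - α) p W and the geometric sum p = α Σ_k (1 - α)^k s W^k, with W the lazy walk matrix. In this convention α is the weight on the seed; if α is instead read as the probability of following a link, swap α and 1 - α. The sketch below checks numerically that the two forms agree on a three-vertex path. All function names are hypothetical.

```python
def step(vector, W):
    """Row vector times matrix."""
    n = len(W)
    return [sum(vector[u] * W[u][v] for u in range(n)) for v in range(n)]


def pagerank_recurrence(W, s, alpha, iterations=500):
    """Iterate p <- alpha * s + (1 - alpha) * p W until it settles."""
    p = list(s)
    for _ in range(iterations):
        pW = step(p, W)
        p = [alpha * s[v] + (1 - alpha) * pW[v] for v in range(len(s))]
    return p


def pagerank_series(W, s, alpha, terms=500):
    """Truncated sum alpha * sum_k (1 - alpha)^k * s W^k."""
    total = [0.0] * len(s)
    walk = list(s)       # holds s W^k
    weight = alpha       # holds alpha * (1 - alpha)^k
    for _ in range(terms):
        total = [t + weight * x for t, x in zip(total, walk)]
        walk = step(walk, W)
        weight *= 1 - alpha
    return total


# Lazy walk on the path 0 - 1 - 2 (assumed example).
W = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
seed = [1.0, 0.0, 0.0]   # personalized PageRank seeded at vertex 0
p_recurrence = pagerank_recurrence(W, seed, alpha=0.15)
p_series = pagerank_series(W, seed, alpha=0.15)
```

Replacing `seed` with the uniform vector gives the non-personalized variant under the same assumptions.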

How good is PageRank as a measure of correlation? Depends on the applications? Isoperimetric properties: how “good” is the cut?

Isoperimetric properties. “What is the shortest curve enclosing a unit area?” Given a graph G and an integer m, what is the minimum cut disconnecting a subgraph of ≥ m vertices? In a graph G, what is the cut e(S,V-S) for which the ratio e(S,V-S) / vol S is the smallest?

Two types of cuts: vertex cut and edge cut. How “good” is the cut E(S,V-S) around a set S?

Two ratios for a cut between S and V-S: $e(S,V-S)/\mathrm{vol}\,S$ and $e(S,V-S)/|S|$, where $\mathrm{vol}\,S = \sum_{v \in S} \deg(v)$ and $|S| = \sum_{v \in S} 1$.

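The Cheeger constant for graphs: $h_G = \min_{S} \dfrac{e(S,V-S)}{\min(\mathrm{vol}\,S,\ \mathrm{vol}(V-S))}$, where the volume of S is $\mathrm{vol}\,S = \sum_{v \in S} \deg(v)$. $h_G$ and its variations are sometimes called “conductance”, “isoperimetric number”, …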
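To make the definition concrete, the sketch below computes the Cheeger ratio of a set and, by brute force over all subsets, the Cheeger constant of a six-vertex graph. It assumes the usual definition in which the edge boundary is divided by the smaller of vol S and vol(V-S). Brute force is exponential and is only meant for tiny examples; the function names and the "two triangles joined by an edge" graph are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def volume(neighbors, subset):
    """Sum of degrees over the subset."""
    return sum(len(neighbors[v]) for v in subset)


def edge_boundary(neighbors, subset):
    """Number of edges with exactly one endpoint in the subset."""
    inside = set(subset)
    return sum(1 for u in inside for v in neighbors[u] if v not in inside)


def cheeger_ratio(neighbors, subset):
    inside = set(subset)
    outside = set(range(len(neighbors))) - inside
    smaller = min(volume(neighbors, inside), volume(neighbors, outside))
    return Fraction(edge_boundary(neighbors, inside), smaller)


def cheeger_constant(neighbors):
    """Minimum Cheeger ratio over all nonempty proper subsets (tiny graphs only)."""
    n = len(neighbors)
    best_ratio, best_set = None, None
    for size in range(1, n):
        for subset in combinations(range(n), size):
            ratio = cheeger_ratio(neighbors, subset)
            if best_ratio is None or ratio < best_ratio:
                best_ratio, best_set = ratio, set(subset)
    return best_ratio, best_set


# Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
barbell = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
h, S = cheeger_constant(barbell)
```

The best cut separates the two triangles: one edge is cut and each side has volume 7, so the ratio is 1/7.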

The Cheeger inequality: $2 h_G \ge \lambda_1 \ge h_G^2/2$, where $h_G$ is the Cheeger constant and $\lambda_1$ is the first nontrivial eigenvalue of the (normalized) Laplacian of a connected graph.

The spectrum of a graph. Many ways to define the spectrum of a graph. How are the eigenvalues related to properties of graphs? Adjacency matrix $A$. Combinatorial Laplacian $L = D - A$, where $D$ is the diagonal degree matrix and $A$ is the adjacency matrix (Gustav Robert Kirchhoff): matrix tree theorem, number of spanning trees. Normalized Laplacian: random walks, rate of convergence.
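The matrix tree theorem can be checked on small graphs: the number of spanning trees equals the determinant of the combinatorial Laplacian D - A with one row and the matching column removed. The sketch below uses exact fractions and plain Gaussian elimination; the function names and test graphs are choices made for this illustration.

```python
from fractions import Fraction


def laplacian(neighbors):
    """Combinatorial Laplacian L = D - A from neighbor lists."""
    n = len(neighbors)
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, nbrs in enumerate(neighbors):
        L[u][u] = Fraction(len(nbrs))
        for v in nbrs:
            L[u][v] -= 1
    return L


def determinant(matrix):
    """Exact determinant by Gaussian elimination with row swaps."""
    M = [row[:] for row in matrix]
    n = len(M)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return det


def count_spanning_trees(neighbors):
    """Determinant of the Laplacian with row 0 and column 0 deleted."""
    L = laplacian(neighbors)
    reduced = [row[1:] for row in L[1:]]
    return determinant(reduced)


complete_k4 = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
cycle_c5 = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
trees_k4 = count_spanning_trees(complete_k4)
trees_c5 = count_spanning_trees(cycle_c5)
```

The complete graph on 4 vertices has 16 spanning trees (Cayley's formula gives 4^2), and a 5-cycle has 5, one for each edge that can be dropped.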

The spectrum of a graph (loopless, simple). Discrete Laplace operator $I - P = D^{-1}(D - A)$: not symmetric in general. Normalized Laplacian $\mathcal{L} = D^{-1/2}(D - A)D^{-1/2}$: symmetric, normalized, with eigenvalues $0 = \lambda_0 \le \lambda_1 \le \dots \le \lambda_{n-1}$.

The spectrum dictates many properties of a graph: expander, diameter, discrepancy, subgraph containment, … Spectral implications for finding good cuts?

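Finding a cut by a sweep. For a function $f$ on the vertices, order the vertices $v_1, v_2, \dots, v_n$ so that $f(v_1) \ge f(v_2) \ge \dots \ge f(v_n)$. Consider the sets $S_j = \{v_1, \dots, v_j\}$ and the Cheeger constant of each $S_j$. Define the value of the sweep as the minimum of these over $j$.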

Finding a cut by a sweep. Using a sweep by the eigenvector can reduce the exponential number of choices of subsets to a linear number. Still, there is a lower bound guarantee by using the Cheeger inequality.
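A sweep can be sketched as follows: sort the vertices by a score, then examine only the n - 1 prefixes of that order, updating the volume and the edge boundary incrementally. The score vector below is a hypothetical stand-in for an eigenvector (or a PageRank vector); computing the actual eigenvector is outside this sketch. The graph is assumed to be simple with no isolated vertices.

```python
from fractions import Fraction


def sweep_cut(neighbors, score):
    """Return the best prefix set of the score ordering and its Cheeger ratio."""
    n = len(neighbors)
    total_volume = sum(len(nbrs) for nbrs in neighbors)
    order = sorted(range(n), key=lambda v: score[v], reverse=True)

    in_set = set()
    vol = 0          # volume of the current prefix
    cut = 0          # edges leaving the current prefix
    best_ratio, best_size = None, 0
    for size, v in enumerate(order[:-1], start=1):
        already_inside = sum(1 for w in neighbors[v] if w in in_set)
        in_set.add(v)
        vol += len(neighbors[v])
        # Edges from v into the prefix stop being cut; the rest become cut.
        cut += len(neighbors[v]) - 2 * already_inside
        ratio = Fraction(cut, min(vol, total_volume - vol))
        if best_ratio is None or ratio < best_ratio:
            best_ratio, best_size = ratio, size
    return best_ratio, set(order[:best_size])


# Two triangles joined by the edge 2-3, with an assumed score vector that is
# positive on one triangle and negative on the other.
barbell = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
score = [0.5, 0.4, 0.3, -0.3, -0.4, -0.5]
ratio, side = sweep_cut(barbell, score)
```

Only five candidate sets are examined for this six-vertex graph, compared with the 62 nonempty proper subsets a brute-force search would visit.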

Four types of Cheeger inequalities. Four proofs using: eigenvectors, random walks, PageRank, heat kernel. Leading to four different one-sweep partitioning algorithms.

Four proofs of Cheeger inequalities. Graph spectral method: spectral partition algorithm. Random walks, PageRank, heat kernel: local partition algorithms.

Graph partitioning Local graph partitioning

What is a local graph partitioning algorithm? A local graph partitioning algorithm finds a small cut near the given seed(s) with running time depending only on the size of the output.

Examples of local partitioning

Four proofs of Cheeger inequalities. Graph spectral method: Cheeger, 60’s; Fiedler, ’73; Alon, ’86; Jerrum and Sinclair, ’89. Random walks: Lovász, Simonovits, ’90, ’93; Spielman, Teng, ’04. PageRank: Andersen, Chung, Lang, ’06. Heat kernel: Chung, PNAS, ’08.

The Cheeger inequality and a partition algorithm. Using the eigenvector, the Cheeger inequality can be stated as $2 h_G \ge \lambda_1 \ge \alpha_G^2/2 \ge h_G^2/2$, where $\lambda_1$ is the first non-trivial eigenvalue of the Laplacian and $\alpha_G$ is the minimum Cheeger ratio in a sweep using the eigenvector.

Proof of the Cheeger inequality. The steps follow, in order: from the definition; by the Cauchy-Schwarz inequality; by summation by parts; from the definition.

A Cheeger inequality using random walks (Lovász, Simonovits, ’90, ’93). Their work leads to a Cheeger inequality in which the sweep quantity is the minimum Cheeger ratio over sweeps using a lazy walk of k steps from every vertex, for an appropriate range of k.

A Cheeger inequality using PageRank. Using the PageRank vector: recall the two definitions (1) and (2) of PageRank p = pr(α,s). Organize the random walks by a scalar α.

Random walks versus PageRank. Random walks: how fast is the convergence to the stationary distribution? For what k can one have the k-step distribution close to stationary? PageRank: choose α to satisfy the required property.

A Cheeger inequality using PageRank. Using the PageRank vector with a subset S as the seed, a Cheeger inequality can be obtained, where $\lambda_S$ is the Dirichlet eigenvalue of the Laplacian and the sweep quantity is the minimum Cheeger ratio over sweeps using the appropriate personalized PageRank with seeds in S.

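Dirichlet eigenvalues for a subset S: $\lambda_S = \inf_{f} \dfrac{\sum_{u \sim v} (f(u) - f(v))^2}{\sum_{w} f(w)^2 d_w}$ over all nonzero $f$ satisfying the Dirichlet boundary condition: $f(v) = 0$ for all $v \notin S$.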

Local Cheeger constant for a subset

Algorithmic aspects of PageRank. Fast approximation algorithm for personalized PageRank: a greedy-type algorithm with almost linear complexity. Can use the jumping constant to approximate PageRank with a support of the desired size. Errors can be effectively bounded.
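A greedy approximation of this kind can be sketched in the style of the "push" procedure described by Andersen, Chung and Lang. This is a reconstruction under stated assumptions, not the slide's own pseudocode: the walk is lazy, α weights the seed, and a vertex u is pushed while its residual r(u) is at least ε times its degree. The names `approximate_pagerank`, `alpha` and `eps` are illustrative.

```python
from collections import deque


def approximate_pagerank(neighbors, seed, alpha, eps):
    """Greedy push approximation of personalized PageRank.

    Returns (p, r): the approximation p and the residual r, both as dicts.
    On exit every vertex satisfies r[v] < eps * deg(v), and the total mass
    of p and r together is still 1.
    """
    p = {}
    r = {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(neighbors[u])
        while r.get(u, 0.0) >= eps * deg_u:
            mass = r[u]
            p[u] = p.get(u, 0.0) + alpha * mass      # settle a share at u
            r[u] = (1 - alpha) * mass / 2            # lazy half stays at u
            share = (1 - alpha) * mass / (2 * deg_u)  # rest goes to neighbors
            for v in neighbors[u]:
                before = r.get(v, 0.0)
                r[v] = before + share
                threshold = eps * len(neighbors[v])
                if before < threshold <= r[v]:
                    queue.append(v)   # v just became eligible for a push
    return p, r


# Two triangles joined by the edge 2-3, seeded at vertex 0.
barbell = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
p, r = approximate_pagerank(barbell, seed=0, alpha=0.1, eps=1e-4)
```

Only vertices that receive enough residual mass are ever touched, which is what lets the support of the approximation be controlled through `alpha` and `eps`. Sorting the vertices by p(v)/deg(v) and sweeping is then the natural next step for finding a cut near the seed.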

A graph partition algorithm using PageRank Given a set S with randomly choose a vertex v in S. With probability at least the one-sweep algorithm using has an initial segment with the Cheeger constant at most

Graph partitioning using PageRank vector. 198,430 nodes and 1,133,512 edges

Kevin Lang 2007

Four proofs of Cheeger inequalities. Graph spectral method: Cheeger, 60’s; Fiedler, ’73; Alon, ’86. Random walks: Lovász, Simonovits, ’90, ’93; Spielman, Teng, ’04. PageRank: Andersen, Chung, Lang, ’06. Heat kernel: Chung, PNAS, ’08.

PageRank versus heat kernel. PageRank: geometric sum, recurrence. Heat kernel: exponential sum, heat equation.
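The formulas behind this comparison did not survive in the transcript. A sketch of how they are usually written, assuming the convention in which $\alpha$ weights the seed $s$ and $W$ denotes the random-walk matrix in use, is given below. The symbol $\rho_{t,s}$ for heat kernel pagerank is an assumption of this sketch.

```latex
% PageRank: geometric sum and its recurrence
\mathrm{pr}(\alpha, s) \;=\; \alpha \sum_{k=0}^{\infty} (1-\alpha)^{k}\, s\, W^{k},
\qquad
\mathrm{pr}(\alpha, s) \;=\; \alpha\, s + (1-\alpha)\, \mathrm{pr}(\alpha, s)\, W .

% Heat kernel pagerank: exponential sum and the heat equation it satisfies
\rho_{t,s} \;=\; e^{-t} \sum_{k=0}^{\infty} \frac{t^{k}}{k!}\, s\, W^{k},
\qquad
\frac{\partial}{\partial t}\, \rho_{t,s} \;=\; -\,\rho_{t,s}\,(I - W) .
```

Differentiating the exponential sum term by term gives $-\rho_{t,s} + \rho_{t,s} W$, which is the heat equation on the right; substituting the geometric sum into the recurrence recovers the same sum shifted by one term.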

A Cheeger inequality using the heat kernel Theorem: where is the minimum Cheeger ratio over sweeps by using heat kernel pagerank over all u in S. Theorem: For

Definition of heat kernel

A Cheeger inequality using the heat kernel. Using the upper and lower bounds, a Cheeger inequality can be obtained, where $\lambda_S$ is the Dirichlet eigenvalue of the Laplacian and the sweep quantity is the minimum Cheeger ratio over sweeps using the heat kernel with seeds in S for appropriate t.

Many applications of PageRank for problems in graph theory: graph drawing using PageRank; graph embedding using PageRank; pebbling and routing using PageRank; covering and packing using PageRank; relating graph invariants of subgraphs to the host graph using PageRank. Your favorite old problem using PageRank?

New directions in graph theory for information networks. Topics: random graphs with general degrees; PageRank; algorithmic game theory, graphical games. Using: spectral methods, probabilistic methods, quasirandom …