Sublinear Algorithms for Personalized PageRank, with Applications

Slides:



Advertisements
Similar presentations
Size-estimation framework with applications to transitive closure and reachability Presented by Maxim Kalaev Edith Cohen AT&T Bell Labs 1996.
Advertisements

1 The PageRank Citation Ranking: Bring Order to the web Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd Presented by Fei Li.
Link Analysis: PageRank
A Probabilistic Model for Road Selection in Mobile Maps Thomas C. van Dijk Jan-Henrik Haunert W2GIS, 5 April 2013.
Introduction to Sampling based inference and MCMC Ata Kaban School of Computer Science The University of Birmingham.
Entropy Rates of a Stochastic Process
More on Rankings. Query-independent LAR Have an a-priori ordering of the web pages Q: Set of pages that contain the keywords in the query q Present the.
DATA MINING LECTURE 12 Link Analysis Ranking Random walks.
Simulation Where real stuff starts. ToC 1.What, transience, stationarity 2.How, discrete event, recurrence 3.Accuracy of output 4.Monte Carlo 5.Random.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 8 May 4, 2005
Link Analysis, PageRank and Search Engines on the Web
1 Fast Incremental Proximity Search in Large Graphs Purnamrita Sarkar Andrew W. Moore Amit Prakash.
Maximum Entropy Model LING 572 Fei Xia 02/07-02/09/06.
Efficient Search Engine Measurements Maxim Gurevich Technion Ziv Bar-Yossef Technion and Google.
Motivation When searching for information on the WWW, user perform a query to a search engine. The engine return, as the query’s result, a list of Web.
Learning Structure in Bayes Nets (Typically also learn CPTs here) Given the set of random variables (features), the space of all possible networks.
1 Applications of Relative Importance  Why is relative importance interesting? Web Social Networks Citation Graphs Biological Data  Graphs become too.
The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd Presented by Anca Leuca, Antonis Makropoulos.
CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 11: Bayesian learning continued Geoffrey Hinton.
Optimal Link Bombs are Uncoordinated Sibel Adali Tina Liu Malik Magdon-Ismail Rensselaer Polytechnic Institute.
Query Suggestion Naama Kraus Slides are based on the papers: Baeza-Yates, Hurtado, Mendoza, Improving search engines by query clustering Boldi, Bonchi,
Learning With Bayesian Networks Markus Kalisch ETH Zürich.
1 1 COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani.
Exact Inference in Bayes Nets. Notation U: set of nodes in a graph X i : random variable associated with node i π i : parents of node i Joint probability:
By: Gang Zhou Computer Science Department University of Virginia 1 Medians and Beyond: New Aggregation Techniques for Sensor Networks CS851 Seminar Presentation.
CS774. Markov Random Field : Theory and Application Lecture 15 Kyomin Jung KAIST Oct
Estimating PageRank on Graph Streams Atish Das Sarma (Georgia Tech) Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)
Complexity and Efficient Algorithms Group / Department of Computer Science Testing the Cluster Structure of Graphs Christian Sohler joint work with Artur.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
CS Statistical Machine learning Lecture 25 Yuan (Alan) Qi Purdue CS Nov
Link Analysis Algorithms Page Rank Slides from Stanford CS345, slightly modified.
1 Random Walks on the Click Graph Nick Craswell and Martin Szummer Microsoft Research Cambridge SIGIR 2007.
1 Relational Factor Graphs Lin Liao Joint work with Dieter Fox.
Kevin Stevenson AST 4762/5765. What is MCMC?  Random sampling algorithm  Estimates model parameters and their uncertainty  Only samples regions of.
FAST-PPR: Personalized PageRank Estimation for Large Graphs Peter Lofgren (Stanford) Joint work with Siddhartha Banerjee (Stanford), Ashish Goel (Stanford),
Random Sampling Algorithms with Applications Kyomin Jung KAIST Aug ERC Workshop.
Purnamrita Sarkar (Carnegie Mellon) Andrew W. Moore (Google, Inc.)
Spectral Algorithms for Learning HMMs and Tree HMMs for Epigenetics Data Kevin C. Chen Rutgers University joint work with Jimin Song (Rutgers/Palentir),
Hidden Markov Models BMI/CS 576
Introduction to Sampling based inference and MCMC
Spatial Data Management
Learning Deep Generative Models by Ruslan Salakhutdinov
Stochastic tree search and stochastic games
Sequential Algorithms for Generating Random Graphs
Search Engines and Link Analysis on the Web
Intro to Sampling Methods
FORA: Simple and Effective Approximate Single­-Source Personalized PageRank Sibo Wang, Renchi Yang, Xiaokui Xiao, Zhewei Wei, Yin Yang School of Information.
Link-Based Ranking Seminar Social Media Mining University UC3M
DTMC Applications Ranking Web Pages & Slotted ALOHA
Gephi Gephi is a tool for exploring and understanding graphs. Like Photoshop (but for graphs), the user interacts with the representation, manipulate the.
Probabilistic Robotics
CS 4/527: Artificial Intelligence
Markov chain monte carlo
Density Independent Algorithms for Sparsifying
CAP 5636 – Advanced Artificial Intelligence
CS 188: Artificial Intelligence Spring 2007
Haim Kaplan and Uri Zwick
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
Randomized Algorithms CS648
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
Jinhong Jung, Woojung Jin, Lee Sael, U Kang, ICDM ‘16
CS 188: Artificial Intelligence
Asymmetric Transitivity Preserving Graph Embedding
Active Learning with Unbalanced Classes & Example-Generation Queries
Topological Signatures For Fast Mobility Analysis
Sequential Learning with Dependency Nets
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 7
PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs.
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 7
Interesting Algorithms for Real World Problems
Presentation transcript:

Sublinear Algorithms for Personalized PageRank, with Applications Ashish Goel Joint work with Peter Lofgren; Sid Banerjee; C Seshadhri

Personalized PageRank Assume a directed graph with n nodes and m edges ----- Meeting Notes (7/23/15 17:56) ----- Animate

Motivation: Personalized Search Sentence defining PPR

Motivation: Personalized Search Re-ranked by PPR

Applications Personalized Web Search [Haveliwala, 2003] Product Recommendation [Baluja, et al, 2008] Friend Recommendation (SALSA) [Gupta et al, 2013] ----- Meeting Notes (7/23/15 17:56) ----- Another example

This screenshot shows my twitter home page This screenshot shows my twitter home page. The main view is a collection of tweets from people that I choose to follow. On the left sidebar is a prominent module called ‘Who to follow’ which displays other users that I might want to follow. This module is present in a variety of places throughout Twitter. In fact, we present user recommendations contextually in many other places in the Twitter product.

A Dark Test for Twitter’s People Recommendation System Run various algorithms to predict follows, but don’t display the results. Instead, just observe how many of the top predictions get followed organically (Money = Personalized PageRank on a bipartite graph; Love = HITS) [Bahmani, Chowdhury, Goel; 2010]

Promoted Tweets and Promoted Accounts

Applications Community Detection Personalized PageRank [Yang, Lescovec 2015],[Andersen, Chung, Lang 2006] Say Heat Kernel is a random walk score like PageRank, but it uses a Poisson length distribution rather than Geometric length distribution.

Estimation Goal Transition:

The Challenge Every user has different score vector: Full pre-computation: O(n2) Computing from scratch previously took Ω(n) time—several minutes on Twitter-2010

Previous Algorithms Summary Monte-Carlo: Sample random walks. (Local) Power-Iteration: Iteratively improve estimates based on recursive equation

Results Preview (Experiment) Runtime on Twitter-2010 (1.5 billion edges) 10+ min 6 min Running Time per Estimate (s) Say 120x 3s Monte-Carlo Power Iteration Bidirectional Mean relative error set to ≈10% for all algorithms.

Results Preview (Theory) Task: estimate of size within relative error Previous Algorithms: Monte Carlo: Power Iteration/ Local Update: Bidirectional Estimator for average target: # Nodes # Edges ----- Meeting Notes (7/23/15 17:56) ----- Merge into challenge? On Twitter-2010, n=40M, m=1.5B, =40K

Generalizations Arbitrary starting distributions. Uniform ⇒ Global PageRank in average time Other Walk Length Distributions like Heat Kernel (used in community detection [Kloster, Gleich 2014],[Chung 2007]): Our estimator is 100x faster on 4 graphs Arbitrary Discrete Markov Chain Transpose O(m) time ----- Meeting Notes (7/23/15 17:56) ----- Formulas don't matter Think of this as more applications

Previous Algorithm: Monte-Carlo [Avrachenkov, et al 2007]

Previous Algorithm: Local Update [Andersen, et al 2007] Computes from all s to a single t Works from t backwards along edges, updating Personalized PageRank estimates locally. Running time for average t: Average Degree Additive Error

Local Update Background 0.2 0.8 State they’re equivalent because stationary distribution.

Local Update Example Consider pre-adding alpha*r to p

Local Update Example ----- Meeting Notes (7/23/15 17:56) ----- One per slide. Write equations. Add box saying r and p are .... Use previous graph?

Local Update Example

Local Update Example

Local Update Example

Local Update Example

Analogy: Bidirectional Shortest Path For PPR, using path length doesn’t seem to work.

Bidirectional Estimation The estimates p and residuals r satisfy a loop invariant [Anderson, et al 2007]: Describe trade-off for r_max Reinterpret the residuals as an expectation!

Bidirectional-PPR Algorithm Draw Picture

Number of samples Every walk gives a sample, with Maximum value rmax Expected value at least ± Number of walks needed to get a (1 +/- ²)-approximation with high probability =

Bidirectional-PPR Example

Theoretical Results Given any graph $G$, teleport probability $\alpha$, source node (or distribution) $s$, target $t$, FAST-PPR estimates $\pi_s(t)$ of size $\delta$ within relative error $\epsilon$ (with high probability).

Forward vs Reverse Work Trade-off u More Reverse Push Fewer Walks More Walks Fewer Reverse Pushes Forward Walks Reverse Pushes 31

Theoretical Results Time-Space Trade-off Given any graph $G$, teleport probability $\alpha$, source node (or distribution) $s$, target $t$, FAST-PPR estimates $\pi_s(t)$ of size $\delta$ within relative error $\epsilon$ (with high probability). ----- Meeting Notes (7/29/15 18:01) ----- d bar delta to n Time-Space Trade-off [Lofgren, Banerjee, Goel, Seshadhri 2014; Lofgren, Banerjee, Goel 2015]

Problem: Unbalanced Forward and Reverse Runtime 1000 s Reverse 0.5 s Forward Runtime (s) 0.1 ms Reverse Global PageRank of Target

Heuristic: Balancing Forward and Reverse Runtime 1000 s Reverse 50s Forward and Reverse 0.5 s Forward Runtime (s) Median is 25 ms Forward and Reverse 0.1 ms Reverse Global PageRank of Target

Experiments

Experimental Results: 70x Faster Running Time (Targets PageRank) 5min 20s 3s Running Time per Estimate (ms) 50ms We’re 70x faster on all graphs! On UK 40s -> 0.6s. Now I’m going to describe how to set an important parameter… Mean relative error set to ≈10% for all algorithms.

Alternative Estimator for Undirected Graphs Key property: We push forwards from s, and take random walks from t. Picture ----- Meeting Notes (7/29/15 18:01) ----- O~(m/epsilon^2)

Alternative Algorithm for Undirected Graphs Loop Invariant of push-forward algorithm [Andersen, Chung, Lang, 2006] Use symmetry, and then interpret as expectation Push forward, random walks backwards

Alternative Algorithm for Undirected Graphs Loop Invariant of push-forward algorithm [Andersen, Chung, Lang, 2006] rmax Use symmetry, and then interpret as expectation Push forward, random walks backwards

Running time for Undirected Graphs Push forward, random walks backwards

Open Problems Get rid of the dependence on degree, to get an amortized bound of O(±1/2) Get a worst-case bound of O(m1/2) for directed graphs under the condition that the target has a high global PageRank Find sharding and sampling algorithms that preserve Personalized PageRank (eg. a sparsifier for Personalized PageRank?) Build an index around Personalized PageRank to enable network based Personalized Search

Open Problems Get rid of the dependence on degree, to get an amortized bound of O(±1/2) Get a worst-case bound of O(m1/2) for directed graphs under the condition that the target has a high global PageRank Find sharding and sampling algorithms that preserve Personalized PageRank (eg. a sparsifier for Personalized PageRank?) Build an index around Personalized PageRank to enable network based Personalized Search

Personalized Search Problem Given A network with nodes (with keywords) and edges (weighted, directed)—Twitter A query, filtering nodes to a set T— “People named Adam” A user s (or distribution over nodes) —me Rank the approximate top-k targets by

Personalized Search Problem Baselines: Monte Carlo: Needs many walks to find enough samples within T unless T is very large Bidirectional-PPR to each t: Slow unless T is small Challenge: Can we efficiently find top-k for any size of T? Idea: Modify Bidirectional-PPR to sample in proportion to

Personalized Search Example b t1 c People Named Adam Searching User s t2 t3 Expand targets Random walks To sample a target: layer 1: sample (a,b,c) w.p. (0, 10%, 90%) Layer 2: b→ t1 c→ sample (t1, t2, t3) w.p (56%, 22%, 22%)

Personalized Search Running Time Runtime on Twitter-2010 (1.5 billion edges) 1000 100 Running Time per Search (s) 10 1 ----- Meeting Notes (7/29/15 18:01) ----- log scale point 0.1 10 100 1000 10,000 Target Set Size Precision@3 set to 90% for all algorithms. Significant Pre-computation (3-30MB per keyword)

Personalized Search Result

Demo

Demo personalizedsearchdemo.com Task: Find applications of entropy in computer networking. personalizedsearchdemo.com

Distributed PageRank Problem: Computing PageRank on graph too large for one machine. Algorithm: Shard edges randomly, compute on each machine average results Basic idea: Duplicate edges from low-degree nodes. Gives an unbiased* estimator. ----- Meeting Notes (7/27/15 17:50) ----- Explain why communication is expensive.

Sharded Global PageRank Accuracy on Twitter-2010 L1 Error Mean relative error? Min Degree Parameter Average 2.1x Duplication No Edge Duplication