Estimating PageRank on Graph Streams Atish Das Sarma (Georgia Tech) Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)

Presentation transcript:

PageRank
- Determines a ranking of the nodes in a graph
- Typically run on large graphs: the WWW, social networks
- Run daily by commercial search engines

PageRank Computation (figure: nodes u, a, b, c)

PageRank Computation
- Our approach: no matrix-vector multiplication!

Our Result
- Many random walk samples, generated efficiently
- The samples give an approximation of PageRank
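How random walk samples can stand in for matrix-vector multiplication is sketched below. This is a standard in-memory Monte Carlo estimator, not the streaming algorithm of the talk: each walk stops with a fixed reset probability, and a node's score is its share of all visits. The function name `estimate_pagerank`, the reset probability 0.15, and the example graph are assumptions chosen for illustration.

```python
import random
from collections import Counter

def estimate_pagerank(out_edges, walks_per_node=200, reset_prob=0.15, seed=0):
    """Estimate PageRank from visit counts of short random walks.

    out_edges maps each node to a list of its out-neighbors (assumed format).
    A walk ends with probability reset_prob at each step, or at a dangling node.
    """
    rng = random.Random(seed)
    visits = Counter()
    for start in out_edges:
        for _ in range(walks_per_node):
            node = start
            while True:
                visits[node] += 1
                if rng.random() < reset_prob or not out_edges[node]:
                    break
                node = rng.choice(out_edges[node])
    total = sum(visits.values())
    # Normalize so that the scores form a probability distribution.
    return {v: visits[v] / total for v in out_edges}

# Hypothetical example: a, b and c all link to u, and u links back to a.
graph = {"u": ["a"], "a": ["u"], "b": ["u"], "c": ["u"]}
scores = estimate_pagerank(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

The sketch needs random access to the graph; the point of the talk is to obtain the same kind of walk samples when the edges are only available as a stream.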

Other Results from Random Walks
Using streams, we can estimate:
- Mixing time
- Conductance

Streaming
- Input is a “stream”: e_1, e_2, e_3, e_4, e_5, e_6, e_7, ...
- Small working memory (RAM)
- Few passes
- Examples: frequency moments, quantiles
- Graphs: the stream consists of edges, in arbitrary order

Related Work
- Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08)
  - Given an undirected graph, produce a sparse one
  - Approximately preserve x'Lx
  - Can be used to compute sparse cuts
- Streaming version of BK96 (Ahn, Guha 09)
  - Sparse cuts in 1 pass and Õ(n) space
- Accelerated PageRank (McSherry 08)
  - Heuristics

Key Idea
- Perform one walk of length l from u efficiently
- Later extend to many walks

Single Random Walk: Naive Algorithm
- One step with every pass!
- Constant space
- Passes: one per step of the walk
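The naive algorithm can be sketched as follows, assuming the stream is a sequence of directed `(u, v)` pairs that can be re-read from the start. The name `naive_walk` and the `edge_stream` callable are hypothetical. Each pass picks a uniformly random out-edge of the current node by reservoir sampling, so only a constant number of values is held besides the walk itself.

```python
import random

def naive_walk(edge_stream, start, length, seed=0):
    """Simulate a random walk using one pass over the stream per step.

    edge_stream is a callable returning a fresh iterator over (u, v) edges;
    this interface is an assumption made for the sketch.
    """
    rng = random.Random(seed)
    walk = [start]
    current = start
    for _ in range(length):
        chosen, seen = None, 0
        for u, v in edge_stream():          # one sequential pass
            if u == current:
                seen += 1
                # Reservoir sampling: keep the new edge with probability 1/seen.
                if rng.randrange(seen) == 0:
                    chosen = v
        if chosen is None:                  # no out-edge: the walk cannot continue
            break
        current = chosen
        walk.append(current)
    return walk

edges = [(0, 1), (1, 2), (2, 0)]            # hypothetical directed 3-cycle
print(naive_walk(lambda: iter(edges), start=0, length=5))   # [0, 1, 2, 0, 1, 2]
```

The number of passes equals the walk length, which is the cost the rest of the talk sets out to reduce.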

Second Naive Algorithm
- Single pass
- Sample sufficient edges!
- If , then sample 2 out-edges from each node (store order)
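A sketch of the single-pass idea, under the assumption that "sufficient" means keeping as many independent out-edge samples per node as the walk has steps, stored in order. The i-th visit to a node then consumes that node's i-th sample. The helper names `sample_out_edges` and `walk_from_samples` are hypothetical.

```python
import random

def sample_out_edges(edge_stream, k, seed=0):
    """In one pass, keep k independent uniform out-edge samples per node."""
    rng = random.Random(seed)
    samples, counts = {}, {}
    for u, v in edge_stream:
        counts[u] = counts.get(u, 0) + 1
        slots = samples.setdefault(u, [None] * k)
        for i in range(k):
            # Each slot is its own size-1 reservoir over the out-edges of u.
            if rng.randrange(counts[u]) == 0:
                slots[i] = v
    return samples

def walk_from_samples(samples, start, length):
    """Replay the stored samples in order; no further pass is needed."""
    used = {}
    walk, node = [start], start
    for _ in range(length):
        if node not in samples:             # node has no out-edge
            break
        i = used.get(node, 0)
        used[node] = i + 1
        node = samples[node][i]
        walk.append(node)
    return walk

edges = [(0, 1), (0, 2), (1, 0), (2, 0)]    # hypothetical example stream
samples = sample_out_edges(edges, k=4)
print(walk_from_samples(samples, start=0, length=4))
```

The trade-off is the reverse of the first naive algorithm: a single pass, but storage proportional to the number of nodes times the walk length.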

Comparison
- Naive (single walk):
- Our result:
- In fact, walks!
- Automatically:

Insight: Merge Short Walks
- Sample a fraction of nodes (centers)
- passes - length walks
- Merge and extend short walks!
- Two problems:
  - End up at a node a second time
  - End up at a non-sampled node

Stuck Nodes
- Sample an edge from the stuck node. Again. And again...
- Slow?
- If new nodes, good in passes!

Stuck Nodes
- Stuck on the same nodes?
- Sample s edges from each
- s progress OR new node!
- Must add previously seen centers to the set

Summary
- Perform short walks from sampled centers
- Concatenate walks until stuck
- Sample edges from stuck nodes
- Make local progress until a new node is reached
- Local progress = s
- New node: center with prob
- Amortized progress, every pass
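The stitching idea summarized above can be illustrated with a heavily simplified in-memory sketch. It is not the procedure from the talk: here a stuck node takes a single freshly sampled step, whereas the slides sample s edges from every stuck node in one pass. The names `stitched_walk`, `center_prob` and `short_len` are hypothetical, and every node is assumed to have at least one out-edge.

```python
import random

def short_walk(adj, start, length, rng):
    """A plain random walk of the given length, as a list of nodes."""
    walk = [start]
    for _ in range(length):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

def stitched_walk(adj, start, length, center_prob=0.5, short_len=3, seed=0):
    """Build a long walk by concatenating precomputed short walks from centers."""
    rng = random.Random(seed)
    # Sample the centers; the start node is always treated as a center.
    centers = [v for v in adj if v == start or rng.random() < center_prob]
    unused = {c: short_walk(adj, c, short_len, rng) for c in centers}
    walk, stuck_steps = [start], 0
    while len(walk) - 1 < length:
        node = walk[-1]
        if node in unused:
            # Each short walk is used at most once, keeping the steps independent.
            walk.extend(unused.pop(node)[1:])
        else:
            # Stuck: non-center, or a center whose short walk is already spent.
            walk.append(rng.choice(adj[node]))
            stuck_steps += 1
    return walk[:length + 1], stuck_steps

adj = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}   # hypothetical example graph
walk, stuck = stitched_walk(adj, start=0, length=20)
print(walk, stuck)
```

In the streaming setting each stuck step would cost a pass, which is why the slides sample several edges per stuck node and amortize the progress made per pass.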

Summary
- Total number of passes:
- Total space:

Summary
- Set
- Number of passes =
- Space =

Many Walks
- Naive space bound:
- Observation: many short walks are not used in a single random walk
- We show:

Many Random Walks
- : probability that node 's short walk is used in a single random walk
- If known: saves a lot of space!
- Perform K random walks
- Total number of short walks required is about
- Don’t know. But can estimate.

Estimating
- Run K = Θ(log n) walks of length
- Gives a crude estimate of
- Sufficient to double K
- Continue doubling K
- Gives K walks in space
- Passes

Distributions
- samples
- Distribution:
- Space
- Passes

Mixing Time, Conductance
- Undirected graphs: compare the walk distribution with the steady state
- Estimating the difference: samples [Batu et al. '01], giving an approximate mixing time
- Directed graphs: run until the distribution “stabilizes”: samples
- Conductance:
- Recall the space for walks:
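The comparison for undirected graphs can be illustrated as follows. On a connected undirected graph the steady state assigns node v probability deg(v)/2m, so the sketch samples walk endpoints and measures their L1 distance to that distribution. It uses a plain empirical estimate on an in-memory graph, not the sample-efficient tester of Batu et al., and the name `endpoint_distance` is hypothetical.

```python
import random
from collections import Counter

def endpoint_distance(adj, start, length, samples, rng):
    """L1 distance between the empirical endpoint distribution of walks of
    the given length from start and the stationary distribution deg(v)/2m."""
    ends = Counter()
    for _ in range(samples):
        node = start
        for _step in range(length):
            node = rng.choice(adj[node])
        ends[node] += 1
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return sum(abs(ends[v] / samples - len(adj[v]) / two_m) for v in adj)

# Hypothetical example: the complete graph on 4 nodes, which mixes quickly.
k4 = {v: [w for w in range(4) if w != v] for v in range(4)}
rng = random.Random(0)
for length in (0, 1, 2, 5, 10):
    print(length, round(endpoint_distance(k4, 0, length, 4000, rng), 3))
```

The smallest length at which the distance drops below a chosen threshold serves as a rough mixing-time estimate; bipartite graphs never pass such a test because the walk's parity alternates.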

Results Recap
- Mixing time for undirected graphs:
- Quadratic approximation to conductance
- PageRank to accuracy

Open Questions
- Improve the number of passes for random walks; in particular, sub-linear space and constant passes
- Graph cuts and graph sparsification for directed graphs
- Better (streaming) algorithms for computing eigenvectors

Thank You!

Summary
- Perform short walks from sampled centers
- Concatenate walks until stuck
- Sample edges from stuck nodes
- Make local progress until a new node is reached
- Local progress = s
- New node = nodes gives center
- Amortized, every pass

Analysis
- Total number of passes:
- Total space:
- Set
- Number of passes =
- Space =