1 Connectivity Structure of Bipartite Graphs via the KNC-Plot Erik Vee joint work with Ravi Kumar, Andrew Tomkins.

Presentation transcript:

1 Connectivity Structure of Bipartite Graphs via the KNC-Plot Erik Vee joint work with Ravi Kumar, Andrew Tomkins

2 The fundamental question… Given graph with millions/billions of nodes, how do we understand it?

3 Macroscopic Success Stories Given graph with millions/billions of nodes, how do we understand it? Spectral Graph Analysis –Eigenvalues reveal intuition for mixing time, connectivity Conductance of a graph Degree distribution

4 Macroscopic models of graphs: Understanding connectivity Bow tie model [Broder et al] Web graph Jellyfish model [Faloutsos et al] Internet AS graph No equivalent model for bipartite graphs

5 Our Goals Develop macroscopic tools to analyze social networks –Massive networks –What are simple, easy-to-understand properties? –Today: KNC-plot for bipartite graphs Given implicit graph representation, do something smarter than explicitly building graph –Bipartite representation gives an implicit graph –Our algorithms never build actual graph –Same spirit as work of [Feder, Motwani 95]

6 Outline Definition of the KNC-plot –k-neighborhood graph Analysis of real social networks using the KNC-plot Description of algorithm

7 The k-neighborhood graph, G_k Given bipartite graph B, users on left, interests on right Connect two users if they share at least k interests in common

8 The k-neighborhood graph, G_k Given bipartite graph B, users on left, interests on right Connect two users if they share at least k interests in common [Figure: G_1]

9 The k-neighborhood graph, G_k Given bipartite graph B, users on left, interests on right Connect two users if they share at least k interests in common [Figure: G_2]

10 The k-neighborhood graph, G_k Given bipartite graph B, users on left, interests on right Connect two users if they share at least k interests in common [Figure: G_3]
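
The definition above can be made concrete with a minimal sketch. It assumes the bipartite graph is stored as a dict from user to a set of interests; the function name and the toy data are hypothetical and not from the talk. It builds G_k explicitly by checking every pair of users, which is exactly what the algorithms later in the talk avoid, so it only suits tiny inputs.

```python
from itertools import combinations


def k_neighborhood_edges(interests, k):
    """Return the edges of G_k: pairs of users sharing at least k interests."""
    edges = set()
    for u, v in combinations(sorted(interests), 2):
        # Two users are adjacent in G_k when their interest sets overlap
        # in at least k elements.
        if len(interests[u] & interests[v]) >= k:
            edges.add((u, v))
    return edges


# Hypothetical toy data: three users and their interests.
toy = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "C", "D"},
    "u3": {"D"},
}

g1 = k_neighborhood_edges(toy, 1)
g2 = k_neighborhood_edges(toy, 2)
g3 = k_neighborhood_edges(toy, 3)
```

On this toy input the edge set shrinks as k grows, which is the effect the KNC-plot measures.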

11 Illustration k=1

12 Illustration k=2

13 Illustration k=3

14 Illustration k=4

15 Illustration k=5

16 The KNC-plot The k-neighbor connectivity plot –How many connected components does G_k have? –What is the size of the largest component? Answers the question: how many shared interests are meaningful? –Communities, Cuts
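
As an illustration of what the plot records, the sketch below computes, for each k, the number of connected components of G_k and the size of the largest one. The user-to-interest-set dict, the function name, and the toy data are assumptions for the example. It uses a brute-force pairwise check with a small union-find, not the faster algorithm described later, and it expects at least one user.

```python
from itertools import combinations


def knc_plot(interests, max_k):
    """Return [(k, number of components, largest component size)] for k = 1..max_k."""
    users = sorted(interests)
    rows = []
    for k in range(1, max_k + 1):
        parent = {u: u for u in users}  # fresh union-find forest for each k

        def find(x):
            # Follow parent pointers to the root, halving the path on the way.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for u, v in combinations(users, 2):
            if len(interests[u] & interests[v]) >= k:
                parent[find(u)] = find(v)  # merge the two components

        sizes = {}
        for u in users:
            root = find(u)
            sizes[root] = sizes.get(root, 0) + 1
        rows.append((k, len(sizes), max(sizes.values())))
    return rows


# Hypothetical toy data.
toy = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "C", "D"},
    "u3": {"D"},
}
rows = knc_plot(toy, 3)
```

Plotting the second and third columns of `rows` against k gives the two curves shown in the example slides.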

17 Analysis Four graphs: –LiveJournal Blogging site, users can specify interests –Y! query logs (interests = queries) Queries issued for Yahoo! Search (Try it at –Content match (users = web pages, interests = ads) Ads shown on web pages –Flickr photo tags (users = photos, interests = tags) All data anonymized, sanitized, downsampled –Graphs have 100s of thousands to a million users

18 Examples — Largest component — Number of components At k=5, all connected. At k=6, interesting! At k=6, nobody connected

19 Examples — Largest component — Number of components At k=5, all connected. At k=6, interesting! At k=6, nobody connected Content match Web pages = “users” Ads = “interests” Flickr Photos = “users” Tags = “interests”

20 Examples — Largest component — Number of components Connectivity smoothly varies “Heavy-tailed” At k=14, 10% connected At k=36, 1% connected

21 Examples — Largest component — Number of components Connectivity smoothly varies “Heavy-tailed” At k=14, 10% connected At k=36, 1% connected Y! queries Users = users Queries = “interests” LiveJournal Users = users Interests = interests

22 Algorithms Naïve implementation takes O(mn) time –Impractical for large graphs — Naïve — Ours For k = 2

23 Algorithms Naïve implementation takes O(mn) time –Impractical for large graphs Our implementation takes O(m^{2-1/k}) time –Social networks are generally sparse –Faster for power-law distribution (no change in the algorithm) –Very fast for k=2, can trim graph for k=3, etc. Space O(km) — Naïve — Ours For k = 2

24 Alg-Intersect Roughly speaking, for every pair of users, determine whether they have k interests in common For each node u, record its neighborhood –For each node v, see if u’s and v’s neighborhoods intersect in at least k nodes –If so, connect them, otherwise don’t Takes O(nm) time (n= # nodes, m = # edges) Space = O(m)

25 Alg-Intersect Roughly speaking, for every pair of users, determine whether they have k interests in common For each node u ∈ S, record its neighborhood –For each node v, see if u’s and v’s neighborhoods intersect in at least k nodes –If so, connect them, otherwise don’t Takes O(nm) time (n= # nodes, m = # edges) BUT: May explore only nodes in set S. –Takes O(|S|m) time Space = O(m)
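
One way to realize the restricted version of Alg-Intersect is sketched below. This is a counting variant chosen for the example, not necessarily the authors' implementation: for each u in S it walks from u's interests back to the users holding them and counts shared interests, which touches each bipartite edge at most once per u and so stays within the O(|S|m) bound. The function name, data layout, and toy data are hypothetical.

```python
def alg_intersect(interests, S, k):
    """For each u in S, find every user v sharing at least k interests with u."""
    # Invert the bipartite graph: interest -> set of users holding it.
    holders = {}
    for user, items in interests.items():
        for item in items:
            holders.setdefault(item, set()).add(user)

    edges = set()
    for u in S:
        shared = {}  # v -> number of interests v has in common with u
        for item in interests[u]:
            for v in holders[item]:
                if v != u:
                    shared[v] = shared.get(v, 0) + 1
        for v, count in shared.items():
            if count >= k:
                edges.add(frozenset((u, v)))  # undirected edge of G_k
    return edges


# Hypothetical toy data.
toy = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "C", "D"},
    "u3": {"D"},
}
edges_from_u1 = alg_intersect(toy, ["u1"], 2)
edges_from_u3 = alg_intersect(toy, ["u3"], 1)
```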

26 Alg-Tuples Consider k=2. Suppose user 1 has interests {A,B,C} user 2 has interests {A,C,D} Create “virtual nodes” Connect user 1 to {AB}, {AC}, {BC} Connect user 2 to {AC}, {AD}, {CD} There is an edge between user 1 and user 2 in G_k iff there is a virtual node that both are connected to.

27 Alg-Tuples For each node u, –Create virtual nodes for u (if not already created) –Connect u to those virtual nodes // (note: there are O(deg(u)^k) of them) Figure out connectivity of G_k using virtual graph Runtime O(Σ_u deg(u)^k) –Uses Union-Set structure –Edges not actually explicitly computed Space O(Σ_u deg(u)^k)
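
The virtual-node idea can be sketched as follows, assuming the same user-to-interest-set dict as before. Each k-subset of a user's interests becomes a virtual node, and a union-find structure (taken here to be what the slide calls the Union-Set structure) merges every user with its virtual nodes, so the components of G_k fall out without any user-to-user edge being materialized. All names and the toy data are hypothetical.

```python
from itertools import combinations


def alg_tuples_components(interests, k):
    """Return the connected components of G_k via virtual k-tuple nodes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)  # create the node lazily on first use
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for user, items in interests.items():
        find(("user", user))  # register users that have fewer than k interests
        # One virtual node per k-subset of this user's interests.
        for combo in combinations(sorted(items), k):
            union(("user", user), ("virtual", combo))

    components = {}
    for user in interests:
        components.setdefault(find(("user", user)), set()).add(user)
    return sorted(sorted(c) for c in components.values())


# Toy data extending the slide's two users with a hypothetical third.
toy = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "C", "D"},
    "u3": {"D"},
}
components_k2 = alg_tuples_components(toy, 2)
```

With k = 2, users u1 and u2 both attach to the virtual node for (A, C) and end up in one component, matching the example on the previous slide.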

28 Combining them Run Alg-Intersect for some subset S of nodes –We know all edges in G_k that go from u ∈ S to any node v –Runtime O(|S|m) [Figure: S (high degree nodes), other nodes]

29 Combining them Run Alg-Intersect for some subset S of nodes –We know all edges in G_k that go from u ∈ S to any node v –Runtime O(|S|m) Run Alg-Tuple on the rest of the nodes –We “know” all edges in G_k that go from u ∉ S to v ∉ S –Runtime O(Σ_{u∉S} deg(u)^k) [Figure: S, other nodes]

30 Finding S Order u_1, u_2, … by decreasing deg(u_i) Initialize b=1. Increase b until Σ_{i≥b} deg(u_i)^k ≤ bm Let S = {u_1, u_2, …, u_b} Run Alg-Intersect on nodes in S Run Alg-Tuple on nodes not in S –Connect the two Runtime is O(bm) + O(Σ_{i≥b} deg(u_i)^k) = O(2bm) [Figure: high degree nodes]
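
The threshold search for S can be sketched as below, assuming degrees are given as a dict from user to degree and m is the sum of those degrees. The function name and the sample degrees are hypothetical. The sketch starts from b = 0 and sums deg^k over the nodes still outside S, a slight departure from the slide's indexing, and keeps moving the highest-degree remaining node into S until the tuple cost of the rest is at most b·m.

```python
def choose_high_degree_set(degrees, k):
    """Pick S = the b highest-degree users so that the Alg-Tuples cost of the
    remaining users is at most b * m (the Alg-Intersect cost for S)."""
    order = sorted(degrees, key=degrees.get, reverse=True)
    m = sum(degrees.values())  # number of edges in the bipartite graph
    # Sum of deg^k over users not yet in S; initially S is empty.
    tail = sum(degrees[u] ** k for u in order)
    b = 0
    while b < len(order) and tail > b * m:
        tail -= degrees[order[b]] ** k  # move the next-highest-degree user into S
        b += 1
    return order[:b]


# Hypothetical degree sequence: one heavy user and three light ones.
sample_degrees = {"a": 4, "b": 2, "c": 1, "d": 1}
S = choose_high_degree_set(sample_degrees, 2)
```

Here m = 8 and the initial tuple cost is 22; after moving "a" into S the remaining cost is 6, which is below 1 · 8, so the search stops with a single high-degree node.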

31 Combining them Runtime is O(bm) + O(Σ_{i≥b} deg(u_i)^k) But, for any graph, deg(u_i) ≤ m/i (by Markov) –Do not need power-law Hence, bm = Σ_{i≥b} deg(u_i)^k ≤ Σ_{i≥b} m^k/i^k = O(m^k/b^{k-1}) So b^k = O(m^{k-1}), i.e. b = O(m^{1-1/k}) ⇒ Runtime is O(m^{2-1/k})

32 Extensions Power-law distributed provably faster –O(m^{1+(1-1/k)/α}) for power law with exponent α –Algorithm works exactly the same –No need to know whether power-law ahead of time When set of interests is logarithmic, can get quasi-linear time algorithms –Different algorithm –In paper

33 Conclusion KNC-plot useful tool –Exposes how meaningful shared interests are The k-neighborhood graph defined implicitly –Efficient algorithm for implicit graph –Other algorithms for G_k, given bipartite representation Find additional social graph properties that are meaningful, computable –Describe macroscopic structure of social networks