1 Dimension matching in Facebook and LinkedIn networks Anthony Bonato Ryerson University Seminar on Social Networks, Big Data, Influence, and Decision-Making.

Slides:



Advertisements
Similar presentations
1 Dynamics of Real-world Networks Jure Leskovec Machine Learning Department Carnegie Mellon University
Advertisements

Analysis and Modeling of Social Networks Foudalis Ilias.
Week 5 - Models of Complex Networks I Dr. Anthony Bonato Ryerson University AM8002 Fall 2014.
Week 4 – Random Graphs Dr. Anthony Bonato Ryerson University AM8002 Fall 2014.
Lecture 21 Network evolution Slides are modified from Jurij Leskovec, Jon Kleinberg and Christos Faloutsos.
SILVIO LATTANZI, D. SIVAKUMAR Affiliation Networks Presented By: Aditi Bhatnagar Under the guidance of: Augustin Chaintreau.
CS 599: Social Media Analysis University of Southern California1 The Basics of Network Analysis Kristina Lerman University of Southern California.
Universal Random Semi-Directed Graphs
CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian.
CS728 Lecture 5 Generative Graph Models and the Web.
Small Worlds Presented by Geetha Akula For the Faculty of Department of Computer Science, CALSTATE LA. On 8 th June 07.
Mining and Searching Massive Graphs (Networks)
CS 728 Lecture 4 It’s a Small World on the Web. Small World Networks It is a ‘small world’ after all –Billions of people on Earth, yet every pair separated.
Sampling from Large Graphs. Motivation Our purpose is to analyze and model social networks –An online social network graph is composed of millions of.
Advanced Topics in Data Mining Special focus: Social Networks.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 May 14, 2006
Computer Science 1 Web as a graph Anna Karpovsky.
Measurement and Evolution of Online Social Networks Review of paper by Ophir Gaathon Analysis of Social Information Networks COMS , Spring 2011,
Networks - Bonato1 Modelling, Mining, and Searching Networks Anthony Bonato Ryerson University Master’s Seminar November 2012.
The Erdös-Rényi models
Lecture 6 - Models of Complex Networks II Dr. Anthony Bonato Ryerson University AM8002 Fall 2014.
Topic 13 Network Models Credits: C. Faloutsos and J. Leskovec Tutorial
Log Dimension Hypothesis1 The Logarithmic Dimension Hypothesis Anthony Bonato Ryerson University MITACS International Problem Solving Workshop July 2012.
Complex networks - Bonato1 Complex networks and their models Anthony Bonato Ryerson University Graduate Seminar October 2011.
1 Vertex-pursuit in heirarchical social networks Anthony Bonato Ryerson University TAMC’12 Complex Networks.
Week 3 - Complex Networks and their Properties
1 Burning a graph as a model of social contagion Anthony Bonato Ryerson University Institute of Software Chinese Academy of Sciences.
Random-Graph Theory The Erdos-Renyi model. G={P,E}, PNP 1,P 2,...,P N E In mathematical terms a network is represented by a graph. A graph is a pair of.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
On-line Social Networks - Anthony Bonato 1 Dynamic Models of On-Line Social Networks Anthony Bonato Ryerson University WAW’2009 February 13, 2009 nt.
Random Dot Product Graphs Ed Scheinerman Applied Mathematics & Statistics Johns Hopkins University IPAM Intelligent Extraction of Information from Graphs.
Week 1 – Introduction to Graph Theory I Dr. Anthony Bonato Ryerson University AM8002 Fall 2014.
1 Modelling, Mining, and Searching Networks Anthony Bonato Ryerson University Graduate Seminar October 2015.
KPS 2007 (April 19, 2007) On spectral density of scale-free networks Doochul Kim (Department of Physics and Astronomy, Seoul National University) Collaborators:
Most of contents are provided by the website Network Models TJTSD66: Advanced Topics in Social Media (Social.
How Do “Real” Networks Look?
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
Miniconference on the Mathematics of Computation
1 How to burn a graph Anthony Bonato Ryerson University GRASCan 2015.
On-line Social Networks - Anthony Bonato 1 Dynamic Models of On-line Social Networks Anthony Bonato Ryerson University ICMCM’09 December, 2009.
A short course on complex networks
1 Friends and Neighbors on the Web Presentation for Web Information Retrieval Bruno Lepri.
1 Dimension matching in Facebook and LinkedIn networks Anthony Bonato Ryerson University Toronto, Canada ICMCE 2015.
Modelling, Mining, and Searching Networks Anthony Bonato Ryerson University Graduate Seminar September 2016.
Mining and Modeling Character Networks Anthony Bonato Ryerson University Complex Systems Seminar Queen Mary University of London.
Network (graph) Models
Lecture 23: Structure of Networks
Cohesive Subgraph Computation over Large Graphs
Modelling, Mining, and Searching Networks
Topics In Social Computing (67810)
Miniconference on the Mathematics of Computation
Peer-to-Peer and Social Networks
How Do “Real” Networks Look?
Lecture 23: Structure of Networks
How Do “Real” Networks Look?
How Do “Real” Networks Look?
Lecture 13 Network evolution
How Do “Real” Networks Look?
Discrete Mathematics and its Applications Lecture 1 – Graph Theory
Modelling and Searching Networks Lecture 3 – ILT model
Lecture 23: Structure of Networks
Discrete Mathematics and its Applications Lecture 3 – ILT model
Lecture 21 Network evolution
Modelling and Searching Networks Lecture 2 – Complex Networks
Modelling and Searching Networks Lecture 5 – Random graphs
Modelling and Searching Networks Lecture 6 – PA models
18th Ontario Combinatorics Workshop On-line Social Networks
Discrete Mathematics and its Applications Lecture 5 – Random graphs
Network Models Michael Goodrich Some slides adapted from:
Discrete Mathematics and its Applications Lecture 6 – PA models
Presentation transcript:

1 Dimension matching in Facebook and LinkedIn networks Anthony Bonato Ryerson University Seminar on Social Networks, Big Data, Influence, and Decision-Making University of Toronto

Friendship networks network of on- and off-line friends form a large web of interconnected links 2Dimension matching in OSNs

6 degrees of separation 3 (Stanley Milgram, 67): famous chain letter experiment Dimension matching in OSNs

6 Degrees in Facebook? 1.31 billion users (Backstrom et al., 2012) –4 degrees of separation in Facebook –when considering another person in the world, a friend of your friend knows a friend of their friend, on average similar results for Twitter and other OSNs 4Dimension matching in OSNs

5 Complex networks in the era of Big Data web graph, social networks, biological networks, internet networks, … Dimension matching in OSNs

What is a complex network? no precise definition however, there is general consensus on the following observed properties 1.large scale 2.evolving over time 3.power law degree distributions 4.small world properties 6Dimension matching in OSNs

Examples of complex networks technological/informational: web graph, router graph, AS graph, call graph, graph social: on-line social networks (Facebook, Twitter, LinkedIn,…), collaboration graphs, co- actor graph biological networks: protein interaction networks, gene regulatory networks, food networks 7Dimension matching in OSNs

Properties of complex networks 1.Large scale: relative to order and size web graph: order > trillion –some sense infinite: number of strings entered into Google Facebook: > 1 billion nodes; Twitter: > 270 million nodes –much denser (ie higher average degree) than the web graph protein interaction networks: order in thousands 8Dimension matching in OSNs

Properties of complex networks 2.Evolving: networks change over time web graph: billions of nodes and links appear and disappear each day Facebook: grew to 1 billion users –denser than the web graph protein interaction networks: order in the thousands –evolves much more slowly 9Dimension matching in OSNs

10 Properties of Complex Networks 3.Power law degree distribution for a graph G of order n and i a positive integer, let N i,n denote the number of nodes of degree i in G we say that G follows a power law degree distribution if for some range of i and some b > 2, b is called the exponent of the power law Dimension matching in OSNs

Power laws in OSNs 11Dimension matching in OSNs

12 Graph parameters average distance: clustering coefficient: Dimension matching in OSNs

13 Properties of Complex Networks 4.Small world property introduced by Watts & Strogatz in 1998: –low distances diam(G) = O(log n) L(G) = O(loglog n) –higher clustering coefficient than random graph with same expected degree Dimension matching in OSNs

14 Sample data: Flickr, YouTube, LiveJournal, Orkut (Mislove et al,07): short average distances and high clustering coefficients Dimension matching in OSNs

15 Other properties of complex networks many complex networks (including on-line social networks) obey two additional laws: Densification power law (Leskovec, Kleinberg, Faloutsos,05): –networks are becoming more dense over time; i.e. average degree is increasing |(E(G t )| ≈ |V(G t )| a where 1 < a ≤ 2: densification exponent Dimension matching in OSNs

16 Decreasing distances (Leskovec, Kleinberg, Faloutsos,05): –distances (diameter and/or average distances) decrease with time (Kumar et al,06): Dimension matching in OSNs

Other properties Connected component structure: emergence of components; giant components Spectral properties: adjacency matrix and Laplacian matrices, spectral gap, eigenvalue distribution Small community phenomenon: most nodes belong to small communities (ie subgraphs with more internal than external links) … 17Dimension matching in OSNs

Blau space OSNs live in social space or Blau space: –each user identified with a point in a multi-dimensional space –coordinates correspond to socio- demographic variables/attributes homophily principle: the flow of information between users is a declining function of distance in Blau space 18Dimension matching in OSNs

Underlying geometry Feature space thesis: every complex network is naturally associated with an underlying feature space. For eg: –web graph: topic space –OSNs: Blau space –PPIs: biochemical space 19Dimension matching in OSNs

Dimensionality Question: What is the dimension of the Blau space of OSNs? what is a credible mathematical formula for the dimension of an OSN? 20Dimension matching in OSNs

Six dimensions of separation21

22 Why model complex networks? uncover and explain the generative mechanisms underlying complex networks predict the future nice mathematical challenges models can uncover the hidden reality of networks Dimension matching in OSNs

Networks - Bonato23

24 “All models are wrong, but some are more useful.” – G.P.E. Box Dimension matching in OSNs

25 Random geometric graphs n nodes are randomly placed in the unit square each node has a constant sphere of influence, radius r nodes are joined if their Euclidean distance is at most r G(n,r), r = r(n) Dimension matching in OSNs

Some properties of G(n,r) Theorem (Penrose,97) Let μ = nexp(-πr 2 n). 1.If μ = o(1), then asymptotically almost surely (a.a.s.) G is connected. 2.If μ = Θ(1), then a.a.s. G has a component of order Θ(n). 3.If μ →∞, then a.a.s. G is disconnected. many other properties studied of G(n,r): chromatic number, clique number, Hamiltonicity, random walks, … 26Dimension matching in OSNs

Spatially Preferred Attachment (SPA) model (Aiello, Bonato, Cooper, Janssen, Prałat,08), (Cooper, Frieze, Prałat,12) 27 volume of sphere of influence proportional to in- degree nodes are added and spheres of influence shrink over time a.a.s. leads to power laws graphs, low directed diameter, and small separators Dimension matching in OSNs

Ranking models (Fortunato, Flammini, Menczer,06), (Łuczak, Prałat, 06), (Janssen, Prałat,09) parameter: α in (0,1) each node is ranked 1,2, …, n by some function r –1 is best, n is worst at each time-step, one new node is born, one randomly node chosen dies (and ranking is updated) link probability r -α many ranking schemes a.a.s. lead to power law graphs: random initial ranking, degree, age, etc. 28Dimension matching in OSNs

Geometric model for OSNs we consider a geometric model of OSNs, where –nodes are in m- dimensional Euclidean space –volume of spheres of influence variable: a function of ranking of nodes 29Dimension matching in OSNs

Geometric Protean (GEO-P) Model (Bonato, Janssen, Prałat, 12) parameters: α, β in (0,1), α+β < 1; positive integer m nodes live in an m-dimensional hypercube each node is ranked 1,2, …, n by some function r –1 is best, n is worst –we use random initial ranking at each time-step, one new node v is born, one randomly node chosen dies (and ranking is updated) each existing node u has a region of influence with volume add edge uv if v is in the region of influence of u 30Dimension matching in OSNs

Notes on GEO-P model models uses both geometry and ranking number of nodes is static: fixed at n –order of OSNs at most number of people (roughly…) top ranked nodes have larger regions of influence 31Dimension matching in OSNs

Simulation with 5000 nodes 32Dimension matching in OSNs

Simulation with 5000 nodes 33 random geometric GEO-P Dimension matching in OSNs

Properties of the GEO-P model (Bonato, Janssen, Prałat, 2012) a.a.s. the GEO-P model generates graphs with the following properties: –power law degree distribution with exponent b = 1+1/α –average degree d = (1+o(1))n (1-α-β) /2 1-α densification –diameter D = n Θ(1/m) small world: constant order if m = Clog n –bad spectral expansion and high clustering coefficient 34Dimension matching in OSNs

Dimension of OSNs given the order of the network n and diameter D, we can calculate m gives formula for dimension of OSN: 35Dimension matching in OSNs

Logarithmic Dimension Hypothesis In an OSN of order n and diameter D, the dimension of its Blau space is posed independently by (Leskovec,Kim,11), (Frieze, Tsourakakis,11) 36Dimension matching in OSNs

Uncovering the hidden reality reverse engineering approach –given network data (n, D), dimension of an OSN gives smallest number of attributes needed to identify users that is, given the graph structure, we can (theoretically) recover the social space 37Dimension matching in OSNs

6 Dimensions of Separation OSNDimension Facebook7 YouTube6 Twitter4 Flickr4 Cyworld7 38Dimension matching in OSNs

MITACS team, UBC L to R: Amanda Tian, David Gleich, Myughwan Kim, Me, Stephen Young, Dieter Mitsche Dimension matching in OSNs

40

MGEO-P (Bonato, Gleich, Mitsche, Prałat, Tian, Young,14) time-steps in GEO-P form a computational bottleneck consider a GEO-P where we forget the history of ranks –memoryless GEO-P (MGEO-P) place n points u.a.r. in the hypercube assign ranks from via a random permutation σ for each pair i > j, ij is an edge if j is in the ball of volume σ(i) –α n -β 41Dimension matching in OSNs

Contrasting the models by considering the evolution of ranks in GEO-P, the probability that an edge is present in GEO-P and not in MGEO-P is: intuitively, the models generate similar graphs many a.a.s properties hold in MGEO-P with similar parameters 42Dimension matching in OSNs

Properties of the MGEO-P model (BGMPTY,14) a.a.s. the MGEO-P model generates graphs with the following properties: –power law degree distribution with exponent b = 1+1/α –average degree d = (1+o(1))n (1-α-β) /2 1-α densification –diameter D = n Θ(1/m) 43Dimension matching in OSNs

Proof sketch: diameter eminent node: –highly ranked: ranking greater than some fixed R partition hypercube into small hypercubes choose size of hypercubes and R so that –each hypercube contains at least log 2 n eminent nodes –sphere of influence of each eminent node covers each hypercube and all neighbouring hypercubes choose eminent node in each hypercube: backbone show all nodes in hypercube distance at most 2 from backbone 44Dimension matching in OSNs

Back to question… How would we measure the dimensionality of Blau space? 45Dimension matching in OSNs

Aside: machine learning machine learning is a branch of AI where computers make decisions and answer questions based on data sets examples: –spam filters –Netflix recommender systems especially useful when the data or number of decisions are too large for humans to process 46Dimension matching in OSNs

Facebook100 Dimension matching in OSNs47

Validating the LDH we tested the dimensionality of large-scale samples from real OSN data –Facebook100 and LinkedIn (sampled over time) IDEA: use machine learning (SVM) to predict dimensions –features: small subgraph counts (3- and 4- vertex subgraphs) –compared sampled data vs simulations of MGEO-P with dimensions 1 through 12 48Dimension matching in OSNs

Graphlets Dimension matching in OSNs49

Experimental design Dimension matching in OSNs50

Sample: Michigan Dimension matching in OSNs51

52 Stanford3: n: edges: avgdeg: plexp: GeoP parameters alphabeta: alpha: beta: python geop_dim_experiment.py --logcount -s 50 -t 0 --mmax 12 --prob Stanford M-GeoP dimensions: LADTree: 2 J48: 3 Logistic: 5 SVM: 5 Dimension matching in OSNs

FB and LinkedIn - SVM Dimension matching in OSNs53

FB and LinkedIn - Eigenvalues 54Dimension matching in OSNs

Figure 6. For three of the Facebook networks, we show the eigenvalue histogram in red, the eigenvalue histogram from the best fit MGEO-P network in blue, and the eigenvalue histograms for samples from the other dimensions in grey. Bonato A, Gleich DF, Kim M, Mitsche D, et al. (2014) Dimensionality of Social Networks Using Motifs and Eigenvalues. PLoS ONE 9(9): e doi: /journal.pone

Future directions Other data sets Fractal dimension What are the attributes? What implications does LDH have for OSNs or social networks in general? 56Dimension matching in OSNs