The Logarithmic Dimension Hypothesis. Anthony Bonato, Ryerson University. MITACS International Problem Solving Workshop, July 2012.


1 The Logarithmic Dimension Hypothesis. Anthony Bonato, Ryerson University. MITACS International Problem Solving Workshop, July 2012.

2 Workshop team: David Gleich (Purdue), Dieter Mitsche (Ryerson), Stephen Young (UCSD), Myunghwan Kim (Stanford), Amanda Tian (York)

3 Friendship networks: networks of friends (some real, some virtual) form a large web of interconnected links

4 Six degrees of separation (Stanley Milgram, 67): famous chain letter experiment

5 Six degrees in Facebook?
900 million users, more than 70 billion friendship links
(Backstrom et al., 2012)
– 4 degrees of separation in Facebook
– when considering another person in the world, a friend of your friend knows a friend of their friend, on average
similar results for Twitter and other OSNs

6 Complex networks: web graph, social networks, biological networks, internet networks, …

7 The web graph. Nodes: web pages; edges: links. Over 1 trillion nodes, with billions of nodes added each day.

8 On-line Social Networks (OSNs): Facebook, Twitter, LinkedIn, Google+, …

9 Key parameters: degree distribution, average distance, clustering coefficient
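The slide's own formulas for these three parameters did not survive in the transcript. As a minimal sketch, assuming the standard textbook definitions (fraction of nodes of each degree, mean shortest-path length over connected pairs, and average local clustering), they can be computed for a small graph as follows. The function names and the toy graph are illustrative and not taken from the talk.

```python
from collections import Counter, deque

def degree_distribution(adj):
    """Map each degree k to the fraction of nodes that have degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}

def average_distance(adj):
    """Mean shortest-path length over pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def clustering_coefficient(adj):
    """Average, over all nodes, of the fraction of neighbour pairs that are adjacent."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        nb = list(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

# Toy graph (illustrative): triangle 0-1-2 with a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degree_distribution(toy))
print(average_distance(toy))
print(clustering_coefficient(toy))
```

The adjacency-set representation keeps the sketch dependency-free; for real OSN data a graph library would be the more practical choice.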

10 Properties of complex networks: power law degree distribution (Broder et al, 01)

11 Power laws in OSNs (Mislove et al, 07)

12 Small world property
small world networks (Watts & Strogatz, 98)
– low distances: diam(G) = O(log n), L(G) = O(log log n)
– higher clustering coefficient than a random graph with the same expected degree

13 Sample data: Flickr, YouTube, LiveJournal, Orkut (Mislove et al, 07): short average distances and high clustering coefficients

14 Community structure. (Zachary, 72), (Mason et al, 09), (Fortunato, 10), (Li, Peng, 11): small community property

15 (Leskovec, Kleinberg, Faloutsos, 05): densification power law (average degree increases with time) and decreasing distances. (Kumar et al, 06): observed in Flickr and Yahoo! 360.

16 Geometry of OSNs? OSNs live in social space: proximity of nodes depends on common attributes (such as geography, gender, age, etc.). IDEA: embed the OSN in 2-, 3- or higher-dimensional space.

17 Dimension of an OSN. The dimension of an OSN is the minimum number of attributes needed to classify nodes. It is like a game of “20 Questions”: each question narrows the range of possibilities. What is a credible mathematical formula for the dimension of an OSN?

18 Geometric model for OSNs
we consider a geometric model of OSNs, where
– nodes are in m-dimensional Euclidean space
– the threshold value is variable: a function of the ranking of nodes

19 Geometric Protean (GEO-P) model (Bonato, Janssen, Prałat, 12)
parameters: α, β in (0,1), α+β < 1; positive integer m
nodes live in an m-dimensional hypercube (torus metric)
each node is ranked 1, 2, …, n by some function r
– 1 is best, n is worst
– we use random initial ranking
at each time-step, one new node v is born and one node chosen at random dies (and the ranking is updated)
each existing node u has a region of influence whose volume is determined by its rank
add edge uv if v is in the region of influence of u
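The birth and death process on slide 19 can be sketched in a few lines of Python. The transcript lost the slide's formula for the volume of the region of influence, so this sketch assumes the form used in the published GEO-P paper, volume = r^(-α) · n^(-β) for a node of rank r. It also assumes the L-infinity torus metric (so each region is a cube), that the dying node is chosen among the nodes that existed before the birth, and that a newborn receives a uniformly random rank. The names `simulate_geo_p` and `torus_distance` are illustrative, and the talk's own simulation code is not shown in the slides.

```python
import random

def torus_distance(x, y):
    """L-infinity distance on the unit torus [0,1)^m (metric choice is an assumption)."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(x, y))

def simulate_geo_p(n, m, alpha, beta, steps, seed=0):
    """Run `steps` birth/death rounds of a simplified GEO-P process on n nodes."""
    if not (0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1):
        raise ValueError("need alpha, beta in (0,1) with alpha + beta < 1")
    rng = random.Random(seed)
    pos = {v: tuple(rng.random() for _ in range(m)) for v in range(n)}
    ranking = list(pos)       # ranking[i] is the node of rank i + 1 (1 is best)
    rng.shuffle(ranking)      # random initial ranking
    edges = set()
    next_id = n
    for _ in range(steps):
        # Birth: a new node v appears at a uniformly random point.
        v, next_id = next_id, next_id + 1
        point = tuple(rng.random() for _ in range(m))
        for i, u in enumerate(ranking):
            # Assumed volume of u's region of influence: rank^(-alpha) * n^(-beta).
            volume = (i + 1) ** (-alpha) * n ** (-beta)
            radius = 0.5 * volume ** (1.0 / m)   # half the side of a cube of that volume
            if torus_distance(pos[u], point) <= radius:
                edges.add((u, v))
        pos[v] = point
        # Death: one pre-existing node, chosen uniformly at random, is removed.
        dead = rng.choice(ranking)
        ranking.remove(dead)                     # ranks below it shift up by one
        del pos[dead]
        edges = {e for e in edges if dead not in e}
        # The newborn takes a uniformly random rank; lower ranks shift down.
        ranking.insert(rng.randrange(len(ranking) + 1), v)
    return pos, ranking, edges

pos, ranking, edges = simulate_geo_p(n=200, m=2, alpha=0.5, beta=0.3, steps=2000)
print(len(pos), "nodes,", len(edges), "edges")
```

Because the graph starts empty, the run needs enough steps for the initial nodes to be replaced before the degree statistics become meaningful.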

20 Notes on GEO-P model
the model uses both geometry and ranking
the number of nodes is static: fixed at n
– the order of OSNs is at most the number of people (roughly…)
top ranked nodes have larger regions of influence

21 Simulation with 5000 nodes

22 Simulation with 5000 nodes (panels: random geometric, GEO-P)

23 Properties of the GEO-P model (Bonato, Janssen, Prałat, 2012)
a.a.s. the GEO-P model generates graphs with the following properties:
– power law degree distribution with exponent $b = 1 + 1/\alpha$
– average degree $d = (1+o(1))\, n^{1-\alpha-\beta} / 2^{1-\alpha}$ (densification)
– diameter $D = O\left(n^{\beta/((1-\alpha)m)} \log^{2\alpha/((1-\alpha)m)} n\right)$ (small world: constant order if $m = C \log n$)
– clustering coefficient larger than in a comparable random graph
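To get a feel for these asymptotic expressions, the sketch below evaluates them for concrete parameters. It reads the slide's exponents as b = 1 + 1/α, d ≈ n^(1-α-β) / 2^(1-α) and D ≈ n^(β/((1-α)m)) · (log n)^(2α/((1-α)m)), drops the (1+o(1)) factor, and treats the O-bound on the diameter as if it were an equality with constant 1. Those simplifications are assumptions made for illustration only, and `geo_p_predictions` is a hypothetical helper name.

```python
import math

def geo_p_predictions(n, m, alpha, beta):
    """Return (b, d, D): leading-order exponent, average degree, diameter scale."""
    if not (0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1):
        raise ValueError("need alpha, beta in (0,1) with alpha + beta < 1")
    b = 1 + 1 / alpha                                  # power law exponent
    d = n ** (1 - alpha - beta) / 2 ** (1 - alpha)     # average degree
    exponent_n = beta / ((1 - alpha) * m)
    exponent_log = 2 * alpha / ((1 - alpha) * m)
    D = n ** exponent_n * math.log(n) ** exponent_log  # diameter scale (constant taken as 1)
    return b, d, D

# Fixed dimension: the diameter scale grows polynomially in n.
for n in (10**4, 10**6, 10**8):
    print(n, geo_p_predictions(n, m=2, alpha=0.5, beta=0.25))

# Dimension m = C log n (here C = 1): the diameter scale stays bounded.
for n in (10**4, 10**6, 10**8):
    print(n, geo_p_predictions(n, m=math.log(n), alpha=0.5, beta=0.25)[2])
```

The second loop illustrates the slide's remark that the diameter is of constant order when m = C log n: the factor n^(β/((1-α)m)) then equals e^(β/((1-α)C)), independent of n.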

24 Spectral properties. The spectral gap λ of G is defined by the difference between the two largest eigenvalues of the adjacency matrix of G. For G(n,p) random graphs, λ tends to 0 as the order grows; in the GEO-P model, λ is close to 1. (Estrada, 06): bad spectral expansion in real OSN data.

25 Dimension of OSNs. Given the order of the network n, power law exponent b, average degree d, and diameter D, we can calculate m. This gives a formula for the dimension of an OSN.
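The formula shown on this slide is missing from the transcript, so the following is only a sketch of how such a formula can be obtained by inverting the properties on slide 23. It is not necessarily the expression the author used. Assume the leading-order relations hold with equality: α = 1/(b-1) from the power law exponent, β = 1 - α - log(2^(1-α) d) / log n from the average degree, and log D = (β log n + 2α log log n) / ((1-α) m) from the diameter. Solving the last relation for m gives m = (β log n + 2α log log n) / ((1-α) log D). The function name `estimate_dimension` is hypothetical.

```python
import math

def estimate_dimension(n, b, d, D):
    """Solve the leading-order GEO-P relations for m (sketch under stated assumptions)."""
    alpha = 1.0 / (b - 1.0)                                  # from b = 1 + 1/alpha
    beta = 1.0 - alpha - math.log(2 ** (1 - alpha) * d) / math.log(n)
    numerator = beta * math.log(n) + 2 * alpha * math.log(math.log(n))
    return numerator / ((1 - alpha) * math.log(D))           # from the diameter relation

# Synthetic round trip (not real OSN data): build (b, d, D) from known
# parameters, then check that the dimension m is recovered.
n, m, alpha, beta = 10**6, 4, 0.5, 0.25
b = 1 + 1 / alpha
d = n ** (1 - alpha - beta) / 2 ** (1 - alpha)
D = n ** (beta / ((1 - alpha) * m)) * math.log(n) ** (2 * alpha / ((1 - alpha) * m))
m_recovered = estimate_dimension(n, b, d, D)
print(m_recovered)
```

On measured data the inputs are noisy and the diameter relation is only an upper bound, so a value computed this way is at best a rough estimate.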

26 Six dimensions of separation
OSN        Dimension
Facebook   7
YouTube    6
Twitter    4
Flickr     4
Cyworld    7

27 Uncovering the hidden reality
reverse engineering approach
– given network data (n, b, d, D), the dimension of an OSN gives the smallest number of attributes needed to identify users
that is, given the graph structure, we can (theoretically) recover the social space

28 Logarithmic Dimension Hypothesis
Logarithmic Dimension Hypothesis (LDH): the dimension of an OSN is best fit by about log n, where n is the number of users of the OSN
– theoretical evidence: GEO-P and MAG (Leskovec, Kim, 12) models
– empirical evidence?
– (Sweeney, 2001)

29 Experimental design
supervised machine learning
– Alternating Decision Trees (ADT)
– approach of (Janssen et al, 12+), based on earlier work on PIN by (Middendorf et al, 05)
classify OSN data vs simulated graphs from the GEO-P model in various dimensions
develop a feature vector (graphlets, degree distribution percentiles, average distance, etc.) to classify the correct dimension
ADT will classify which dimension best fits the data
– cross-validation and robustness testing

30 Example

31 preprints, reprints, contact: search: “Anthony Bonato”


