Published by Alvin Benson. Modified over 9 years ago.
1
Eurecom, Sophia-Antipolis Thrasyvoulos Spyropoulos / spyropoul@eurecom.fr Part II: Complex Networks Empirical Properties and Metrics
2
- “Networks, Crowds, and Markets: Reasoning About a Highly Connected World” by D. Easley and J. Kleinberg (“NCM”: publicly available online)
- “Networks: An Introduction” by M. Newman (“Networks”: shared copies in library)
- “Networked Life: 20 Questions and Answers” by M. Chiang (some chapters; shared copies in library)
3
A set of “nodes”: humans, routers, web pages, telephone switches, airports, proteins, scientific articles, …
Relations between these nodes:
- humans: friendship/relation or online friendship
- routers, switches: connected by a communication link
- web pages: hyperlinks from one to another
- airports: direct flights between them
- articles: one citing the other
- proteins: link if chemically interacting
A network is often represented as a graph: vertex = node, link = relation (weight = strength).
4
The social network of friendships within a 34-person karate club provides clues to the fault lines that eventually split the club apart (Zachary, 1977).
5
High school dating
Peter S. Bearman, James Moody and Katherine Stovel, “Chains of affection: The structure of adolescent romantic and sexual networks”, American Journal of Sociology 110, 44-91 (2004)
Image drawn by Mark Newman
6
Mostly done by social scientists interested in human (social) networks: spread of diseases, influence, etc.
Methodology: questionnaires (cumbersome, lots of bias)
Network size: 10s or at most 100s of nodes
7
Email flows amongst a large project team. Colors denote each participant’s department.
10
The study of large networks coming from all sorts of diverse areas.
We will focus on technological networks (e.g. the Internet) and information networks (e.g. Web, Facebook).
Such networks cannot be observed visually (unlike the old social networks of a few 10s of nodes), so we need ways to measure them and quantify their properties.
The field is often called Social Networks, Network Science, or Network Theory.
Question 1: What are the statistical properties of real networks?
- Connectivity, path lengths, degree distributions
- How do we measure such huge networks? By sampling
Question 2: Why do these properties arise?
- Models of large networks: random graphs
- Deterministic approaches are too complex/restrictive
Question 3: How can we take advantage of these properties?
- Connectivity (epidemiology, resilience)
- Spread (information, disease)
- Search (Web page, person)
11
There are many different properties we might be interested in; which ones matter also depends on the application.
But some properties are commonly studied, for two reasons:
1. These properties are important for key applications.
2. The majority of networks exhibit surprising similarities with respect to these properties.
The commonly studied properties:
1. Degree distribution (“scale-free structure”)
2. Path length (“small-world phenomena”)
3. Clustering (“community structure”)
12
Problem: find the probability distribution that best fits the observed data.
[Figure: frequency $f_k$ plotted against degree $k$.]
$f_k$ = fraction of nodes with degree $k$ = probability that a randomly selected node has degree $k$
13
Exponential distribution: probability of having $k$ neighbors
$p(k) = \lambda e^{-\lambda k}$
Identified by a line in the log-linear plot (log frequency against degree), with slope $-\lambda$:
$\log p(k) = -\lambda k + \log \lambda$
14
Power-law distribution: $p(k) = C k^{-\alpha}$
- Right-skewed/heavy-tailed distribution: a non-negligible fraction of nodes has very high degree (hubs)
- Scale-free: $f(ax) = b f(x)$; there is no characteristic scale, and the average is not informative
- A power law gives a line in the log-log plot: $\log p(k) = -\alpha \log k + \log C$
- $\alpha$: power-law exponent (typically $2 \le \alpha \le 3$)
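A short worked check of the scale-free property, using only the slide's symbols $p(k)$, $C$, $\alpha$ and the rescaling factor $a$ from $f(ax) = b f(x)$. The comparison with the exponential $p(k) = \lambda e^{-\lambda k}$ is included as a contrast; it is a sketch, not part of the original slides.

```latex
% Power law: rescaling k by a only multiplies p(k) by a constant b = a^{-\alpha}.
p(ak) = C (ak)^{-\alpha} = a^{-\alpha} \, C k^{-\alpha} = a^{-\alpha} \, p(k)
\quad\Longrightarrow\quad b = a^{-\alpha}.

% Exponential: the ratio depends on k, so there is a characteristic scale 1/\lambda.
\frac{p(ak)}{p(k)} = \frac{\lambda e^{-\lambda a k}}{\lambda e^{-\lambda k}} = e^{-\lambda (a-1) k}.
```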
15
[Figure: the two distributions compared on a log-log plot.]
This difference is particularly obvious if we plot them on a log vertical scale: for large x there are orders of magnitude differences between the two functions.
(Source: Network Science: Scale-Free Property, February 7, 2011)
16
Internet backbone and regional connectivity
Multi-tier AS topology
Gateway routers inside ASs
17
Holds for both AS and router topologies
19
$\alpha$: power-law exponent (typically $2 \le \alpha \le 3$)
20
$d_{ij}$ = length of the shortest path between $i$ and $j$
Diameter: $d = \max_{i,j} d_{ij}$
Average path length: $l = \frac{1}{n(n-1)} \sum_{i \ne j} d_{ij}$
Also of interest: the distribution of all shortest-path lengths
21
A total of $n$ nodes arranged in a $\sqrt{n} \times \sqrt{n}$ grid; only neighbors (up, down, left, right) are connected.
Q: What is the diameter of the network?
A: $2(\sqrt{n} - 1)$
Q: What is the average distance, i.e. between two nodes picked at random?
A: It is of the order of $\sqrt{n}$ (i.e. $c\sqrt{n}$)
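The grid diameter can be checked numerically with a breadth-first search. The sketch below assumes a square grid with `side` nodes per row (so n = side²) and measures the corner-to-opposite-corner distance, which is the longest shortest path in such a grid; the function and variable names are illustrative.

```python
from collections import deque

def grid_distances(side):
    """BFS hop distances from corner (0, 0) in a side x side grid (4-neighbor links)."""
    dist = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < side and 0 <= ny < side and (nx, ny) not in dist:
                dist[(nx, ny)] = dist[(x, y)] + 1
                queue.append((nx, ny))
    return dist

side = 10                                   # n = side * side = 100 nodes
corner_dist = grid_distances(side)
grid_diameter = max(corner_dist.values())   # distance to the opposite corner
```

For `side = 10` the result is 2(10 − 1) hops, in line with a diameter that grows like √n.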
22
$n$ wireless nodes in an area of 1x1
Each transmits at distance $R$
$R$ must be at least for connectivity
Q: Choose two random nodes: what is the expected hop count (distance) between them?
A:
23
Letters were handed out to people in Nebraska, to be sent to a target in Boston.
People were instructed to pass the letters on to someone they knew on a first-name basis.
~60 letters, only about 35% delivered.
The letters that reached the destination followed paths of length around 6.
“Six degrees of separation” (also the title of a play by John Guare)
24
In 2001, Duncan Watts, a professor at Columbia University, recreated Milgram's experiment using an e-mail message as the “package” that needed to be delivered. Surprisingly, after reviewing the data collected by 48,000 senders and 19 targets in 157 different countries, Watts found that the average number of intermediaries was again 6.
25
[Figure: a chain of co-starring links between actors. Films shown: A Few Good Men; Austin Powers: The Spy Who Shagged Me; Wild Things; Let’s Make It Legal; What Price Glory; Monsieur Verdoux. Actors shown: Robert Wagner, Barry Norton.]
26
(statistics from IMDB)
~740,000 linkable actors
Average path length = 3
99% of actors are less than 6 hops away
Try your own actor here: http://www.cs.virginia.edu/oracle/
27
Legendary mathematician Paul Erdős: around 1500 papers and 509 collaborators
Collaboration graph: link between two authors who wrote a paper together
Erdős number of X: hop count between Erdős and author X in the collaboration graph
~260,000 authors in the connected component
[Figure labels: Kostas Psounis, T. Spyropoulos]
28
Number of ASs traversed by an email message: ~35,000 nodes, avg. path ~ 5!
Number of routers traversed by an email message: >200,000 nodes, avg. path ~ 15
(plots taken from R. V. Hofstad)
30
Milgram’s experiment => small-world phenomenon
Short paths exist between most nodes: path length $l \ll$ total number of nodes $N$ (by contrast, in a line network the path length is $l = O(N)$)
“Small world” = avg. path length $l$ grows at most as $\log N$
31
Clustering measures the density of triangles (local clusters) in the graph.
Two different ways to measure it. The first is the ratio of the means:
$C^{(1)} = \frac{\sum_i (\text{triangles centered at node } i)}{\sum_i (\text{triples centered at node } i)}$
33
Clustering coefficient for node $i$:
$C_i = \frac{\text{triangles centered at node } i}{\text{triples centered at node } i}$
The second measure is the mean of the ratios:
$C^{(2)} = \frac{1}{n} \sum_i C_i$
34
The two clustering coefficients give different measures.
$C^{(2)}$ is increased by nodes with low degree.
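To see the two measures disagree, the sketch below computes both the "ratio of the means" and the "mean of the ratios" on a small graph. The five-node edge list is the undirected example graph used elsewhere in these slides; treating nodes with fewer than two neighbors as having coefficient 0 is an assumption of this sketch, as are all names.

```python
from itertools import combinations

edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def triples_and_triangles(node):
    """Count neighbor pairs of `node` (triples) and how many of those pairs are linked."""
    pairs = list(combinations(sorted(neighbors[node]), 2))
    closed = sum(1 for a, b in pairs if b in neighbors[a])
    return len(pairs), closed

counts = {v: triples_and_triangles(v) for v in neighbors}

# Ratio of the means: total closed triples over total triples.
c1 = sum(t for _, t in counts.values()) / sum(p for p, _ in counts.values())

# Mean of the ratios; assumption: nodes with no triples contribute 0.
c2 = sum((t / p if p else 0.0) for p, t in counts.values()) / len(counts)
```

On this graph the first measure is 3/6 and the second is (1 + 1 + 1/3 + 0 + 0)/5 = 7/15, so the two already differ on five nodes.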
37
Most real networks have…
1. Short paths between nodes (“small world”)
2. Transitivity/clustering coefficient that is finite (> 0)
3. Degree distribution that follows a power law
Q1. Can we design graph models that exhibit similar characteristics?
Q2. Can we explain how/why these phenomena occur in the first place?
Q3. Can we take advantage of these properties (e.g. searching, advertising, viral infection/immunization, etc.)?
38
Graph $G = (V, E)$
$V$ = set of vertices
$E$ = set of edges
Undirected graph on nodes 1 to 5: $E = \{(1,2), (1,3), (2,3), (3,4), (4,5)\}$
39
Graph $G = (V, E)$
$V$ = set of vertices
$E$ = set of edges
Directed graph on nodes 1 to 5: E = {‹1,2›, ‹2,1›, ‹1,3›, ‹3,2›, ‹3,4›, ‹4,5›}
40
Edges have / do not have a weight associated with them: weighted vs. unweighted graphs
41
Degree $d(i)$ of node $i$: number of edges incident on node $i$
Degree distribution (for the example graph on nodes 1 to 5):
- 1 node with degree 1
- 3 nodes with degree 2
- 1 node with degree 3
$P(1) = 1/5$, $P(2) = 3/5$, $P(3) = 1/5$
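The degree distribution above can be reproduced in a few lines. This sketch assumes the figure shows the same five-node undirected graph whose edge list appears earlier in the slides, E = {(1,2), (1,3), (2,3), (3,4), (4,5)}; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]

# Degree of a node = number of edges incident on it.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

n = len(degree)
histogram = Counter(degree.values())          # how many nodes have each degree
P = {k: Fraction(count, n) for k, count in sorted(histogram.items())}
```

`P` maps each degree k to the fraction of nodes with that degree, kept as exact fractions so the values can be compared directly with the slide.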
42
[Figure: histogram of the degree distribution, $P(k)$ against $k$.]
(Source: Network Science: Graph Theory, January 24, 2011)
43
In-degree $d_{in}(i)$ of node $i$: number of edges pointing to node $i$
Out-degree $d_{out}(i)$ of node $i$: number of edges leaving node $i$
In-degree sequence: [1,2,1,1,1]
Out-degree sequence: [2,1,2,1,0]
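A minimal sketch that recomputes the two sequences, assuming the figure shows the directed example graph from the earlier slide, E = {‹1,2›, ‹2,1›, ‹1,3›, ‹3,2›, ‹3,4›, ‹4,5›}. Each arc is written as a (tail, head) pair; the names are illustrative.

```python
arcs = [(1, 2), (2, 1), (1, 3), (3, 2), (3, 4), (4, 5)]
nodes = range(1, 6)

# In-degree counts arcs whose head is v; out-degree counts arcs whose tail is v.
in_degree = [sum(1 for _, head in arcs if head == v) for v in nodes]
out_degree = [sum(1 for tail, _ in arcs if tail == v) for v in nodes]
```

Both sequences sum to the number of arcs, since every arc leaves exactly one node and enters exactly one node.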
44
Path from node $i$ to node $j$: a sequence of edges (directed or undirected) from node $i$ to node $j$
- path length: number of edges on the path
- nodes $i$ and $j$ are connected if there is a path between them
- cycle: a path that starts and ends at the same node
45
Shortest path from node $i$ to node $j$: also known as BFS path, or geodesic path
46
Diameter: the longest shortest path in the graph
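Shortest paths, the diameter and the average path length can all be obtained by running a breadth-first search from every node. The sketch below does this for the five-node undirected example graph from the earlier slide (assumed to be the one in the figure); it presumes a connected, unweighted graph, and the names are illustrative.

```python
from collections import deque

edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

def bfs_distances(source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

all_dist = {s: bfs_distances(s) for s in adj}
pair_lengths = [all_dist[i][j] for i in adj for j in adj if i < j]
diameter = max(pair_lengths)                              # longest shortest path
average_path_length = sum(pair_lengths) / len(pair_lengths)
```

Here the diameter is 3 (for example between nodes 1 and 5) and the ten pairwise distances sum to 17, giving an average of 1.7.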
47
Connected graph: a graph where every pair of nodes is connected
Disconnected graph: a graph that is not connected
Connected components: maximal subsets of vertices that are connected
48
Clique $K_n$: a graph that has all possible $n(n-1)/2$ edges
49
Strongly connected graph: there exists a directed path from every $i$ to every $j$
Weakly connected graph: if the edges are made undirected, the graph is connected
50
Adjacency matrix: $A_{ij} = 1$ if there is an edge between $i$ and $j$, and 0 otherwise
Symmetric matrix for undirected graphs
51
Adjacency matrix: non-symmetric matrix for directed graphs
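A small sketch of both cases, built from the two five-node example edge lists given earlier in the slides. The helper names `adjacency_matrix` and `is_symmetric` are illustrative, and the convention that row i, column j holds the arc from i to j is an assumption of this sketch.

```python
def adjacency_matrix(n, links, directed):
    """Return an n x n 0/1 matrix; nodes are labeled 1..n."""
    A = [[0] * n for _ in range(n)]
    for i, j in links:
        A[i - 1][j - 1] = 1
        if not directed:
            A[j - 1][i - 1] = 1      # an undirected edge is stored in both directions
    return A

def is_symmetric(A):
    return all(A[i][j] == A[j][i] for i in range(len(A)) for j in range(len(A)))

A_undirected = adjacency_matrix(5, [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)], directed=False)
A_directed = adjacency_matrix(5, [(1, 2), (2, 1), (1, 3), (3, 2), (3, 4), (4, 5)], directed=True)
```

The undirected matrix is symmetric by construction; the directed one is not, because for instance the arc from 1 to 3 has no reverse arc.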
52
[Figure: example graphs $G_1$, $G_2$, $G_3$ with their adjacency matrices.]
Storage: undirected (symmetric): $n^2/2$ entries; directed: $n^2$ entries
53
Random Graph Models: Create/Explain Complex Network Properties
54
The networks discussed are quite large! It is impossible to describe or visualize them explicitly.
Consider this example:
- You have a new Internet routing algorithm
- You want to evaluate it, but do not have a trace of the Internet topology
- You decide to create an “Internet-like” graph on which you will run your algorithm
- How do you describe/create this graph?
Random graphs: local and probabilistic rules by which vertices are connected
Goal: from simple probabilistic rules to observed complexity
Q: Which rules give us (most of) the observed properties?
56
This is “Conway’s Game of Life” (there are many other automata)
- Demo: http://www.youtube.com/watch?v=ma7dwLIEiYU&feature=related
- Try your own: http://www.bitstorm.org/gameoflife/
Local rules:
- Each cell is either white or blue
- Each cell interacts with its 8 neighbors
- Time is discrete (rounds)
1. Any blue cell with fewer than two blue neighbors becomes white
2. Any blue cell with two or three blue neighbors lives on to the next round
3. Any blue cell with more than three blue neighbors becomes white
4. Any white cell with exactly three blue neighbors becomes blue
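The four local rules fit in a short function. This is a minimal sketch, assuming an unbounded grid and representing the blue cells as a set of (row, column) pairs; `life_step` and the "blinker" test pattern are illustrative choices, not taken from the slides.

```python
def life_step(blue):
    """One round of the rules above; `blue` is the set of (row, col) blue cells."""
    counts = {}
    for r, c in blue:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0):
                    cell = (r + dr, c + dc)
                    counts[cell] = counts.get(cell, 0) + 1   # blue neighbors of `cell`
    # Rules 2 and 4: a cell is blue next round if it has exactly 3 blue neighbors,
    # or has 2 and is already blue. Rules 1 and 3 cover every other blue cell.
    return {cell for cell, k in counts.items()
            if k == 3 or (k == 2 and cell in blue)}

blinker = {(1, 0), (1, 1), (1, 2)}      # three blue cells in a row
after_one = life_step(blinker)          # flips to a vertical bar
after_two = life_step(after_one)        # and back again
```

Even this three-cell pattern shows the point of the slide: purely local rules produce global behavior (here, a period-2 oscillation).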
57
Erdős-Rényi model (1960): connect with probability $p$
A very (very!) simple local rule: (any) two vertices are connected with probability $p$
Only inputs: number of vertices $n$ and probability $p$
Denote this class of graphs as $G(n,p)$
Example: $p = 1/6$, $N = 10$, $\langle k \rangle \approx 1.5$
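Sampling one realization of G(n,p) takes a single pass over all node pairs. The sketch below uses the slide's example values n = 10 and p = 1/6; the function name `gnp` and the fixed seed are illustrative assumptions.

```python
import random
from itertools import combinations

def gnp(n, p, seed=None):
    """Sample G(n, p): each of the n(n-1)/2 node pairs is linked independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]

n, p = 10, 1 / 6
sample = gnp(n, p, seed=1)
mean_degree = 2 * len(sample) / n      # every link adds one to the degree of two nodes
expected_mean_degree = p * (n - 1)     # 9/6 = 1.5 for the slide's example
```

Different seeds give different graphs with the same n and p, which is why the mean degree of any single sample only fluctuates around 1.5.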
58
$N$ and $p$ do not uniquely define the network: we can have many different realizations of it. How many?
Example: $G(10, 1/6)$, i.e. $N = 10$, $p = 1/6$
$G(N,L)$: a graph with $N$ nodes and $L$ links
The probability of forming a particular graph $G(N,L)$ is
$P(G(N,L)) = p^L (1-p)^{\frac{N(N-1)}{2} - L}$
That is, each graph $G(N,L)$ appears with probability $P(G(N,L))$.
59
$P(L)$: the probability of having exactly $L$ links in a network of $N$ nodes and connection probability $p$:
$P(L) = \binom{N(N-1)/2}{L} p^L (1-p)^{\frac{N(N-1)}{2} - L}$
- $\frac{N(N-1)}{2}$: the maximum number of links in a network of $N$ nodes
- $\binom{N(N-1)/2}{L}$: the number of different ways we can choose $L$ links among all potential links
This is a binomial distribution.
60
$P(L)$: the probability of having a network with exactly $L$ links
The average number of links in a random graph: $\langle L \rangle = p \frac{N(N-1)}{2}$
The standard deviation: $\sigma = \sqrt{p(1-p)\frac{N(N-1)}{2}}$
Average node degree: $\langle k \rangle = 2\langle L \rangle / N = p(N-1)$
61
As the network size increases, the distribution becomes increasingly narrow, which means that we are increasingly confident that the number of links the graph has is in the vicinity of $\langle L \rangle$.
62
The degree distribution follows a binomial:
- average degree $\langle k \rangle = p(N-1)$
- variance $\sigma^2 = p(1-p)(N-1)$
Assuming $z = Np$ is fixed, as $N \to \infty$, $B(N,k,p)$ is approximated by a Poisson distribution: $P(k) = e^{-z} \frac{z^k}{k!}$
As $N \to \infty$:
- Highly concentrated around the mean
- Probability of very high node degrees is exponentially small
- Very different from a power law!
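The quality of the Poisson approximation is easy to check numerically. In this sketch the values N = 10,000 and z = 4 are arbitrary illustrative choices, and p is set so that the mean degree p(N − 1) equals z exactly (for large N this is indistinguishable from the slide's z = Np).

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability that a node has k links out of n possible ones."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, z):
    return exp(-z) * z**k / factorial(k)

N, z = 10_000, 4.0
p = z / (N - 1)                       # mean degree p (N - 1) equals z

# Largest pointwise gap between the exact binomial and its Poisson limit.
max_gap = max(abs(binomial_pmf(k, N - 1, p) - poisson_pmf(k, z)) for k in range(30))

# Both put almost no mass on high degrees: no hubs.
tail = sum(poisson_pmf(k, z) for k in range(20, 60))
```

The gap is tiny, and the probability of a degree of 20 or more is negligible when the mean is 4, which is the sense in which a random graph has no heavy tail.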
63
The secret behind the small-world effect: looking at the network volume, i.e. the number of nodes within a given distance
- Polynomial growth (regular lattices)
- Exponential growth (random networks)
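A back-of-the-envelope version of the volume argument, as a sketch. The symbols $N(d)$ (number of nodes within distance $d$), $D$ (lattice dimension) and $d_{\max}$ are introduced here for illustration; $\langle k \rangle$ is the average degree, and the tree-like counting for random networks is an approximation.

```latex
% Regular lattice in D dimensions: the volume grows polynomially with distance,
% so reaching all N nodes needs a distance that is a power of N.
N(d) \sim d^{D} \quad\Longrightarrow\quad d_{\max} \sim N^{1/D}

% Random network: each step multiplies the number of reachable nodes by about <k>,
% so the volume grows exponentially and the distance only logarithmically.
N(d) \approx 1 + \langle k \rangle + \langle k \rangle^{2} + \dots + \langle k \rangle^{d}
      \approx \langle k \rangle^{d}
\quad\Longrightarrow\quad
d_{\max} \approx \frac{\ln N}{\ln \langle k \rangle}
```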
67
Given the huge differences in scope, size, and average degree, the agreement is excellent!
68
Consider a random graph $G(n,p)$
Q: What is the probability that two of your neighbors are also neighbors?
A: It is equal to $p$, independent of local structure
- clustering coefficient $C = p$
- when $z$ is fixed (sparse networks): $C = z/n = O(1/n)$
69
Given the huge differences in scope, size, and average degree, there is a clear disagreement.
70
Erdős-Rényi graphs are “small world” (√)
- path lengths are $O(\log n)$
Erdős-Rényi graphs are not “scale-free” (X)
- Degree distribution is binomial and highly concentrated (no power law)
- Exponentially small probability of having “hubs” (no heavy tail)
Erdős-Rényi graphs are not “clustered” (X)
- $C \to 0$ as $N$ becomes larger
Conclusion: ER random graphs are not a good model of real networks
BUT: they still provide a great deal of insight!
71
Some of your neighbors’ neighbors are also your own neighbors.
Exponential growth: clustering inhibits small-worldness.
72
Short paths must be combined with a high clustering coefficient
Watts and Strogatz model [WS98]
- Start with a ring, where every node is connected to the next $k$ nodes
- Rewire each edge with probability $p$ (or, add a shortcut) to a random node
$p = 0$: order; $0 < p < 1$: in between; $p = 1$: randomness
73
The Watts-Strogatz model: it takes a lot of randomness to ruin the clustering, but a very small amount to overcome locality
[Figure: clustering coefficient $C$ and characteristic path length $L$ against $p$ (log scale in $p$).]
When $p = 0$: $C = \frac{3(k-2)}{4(k-1)} \approx 3/4$, $L = n/k$
For small $p$: $C \approx 3/4$, $L \sim \log n$
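A compact sketch of the construction, useful for checking the p = 0 clustering value. It assumes that k counts all ring neighbors of a node (k/2 on each side, k even), which is the convention under which C = 3(k−2)/(4(k−1)) holds; the rewiring step moves one endpoint of a link to a uniformly chosen node while avoiding self-loops and duplicate links. All function names are illustrative.

```python
import random

def ring_lattice(n, k):
    """Each node is linked to its k nearest ring neighbors (k/2 per side, k even)."""
    return {frozenset((i, (i + s) % n)) for i in range(n) for s in range(1, k // 2 + 1)}

def rewire(links, n, p, seed=None):
    """With probability p, replace each link (u, v) by (u, w) for a random new node w."""
    rng = random.Random(seed)
    result = set(links)
    for u, v in sorted(tuple(sorted(link)) for link in links):
        if rng.random() < p:
            candidates = [w for w in range(n)
                          if w != u and frozenset((u, w)) not in result]
            if candidates:
                result.discard(frozenset((u, v)))
                result.add(frozenset((u, rng.choice(candidates))))
    return result

def average_clustering(links, n):
    """Mean over nodes of (linked neighbor pairs) / (all neighbor pairs)."""
    nbrs = {i: set() for i in range(n)}
    for link in links:
        u, v = tuple(link)
        nbrs[u].add(v)
        nbrs[v].add(u)
    total = 0.0
    for i in range(n):
        d = len(nbrs[i])
        if d >= 2:
            closed = sum(1 for a in nbrs[i] for b in nbrs[i] if a < b and b in nbrs[a])
            total += closed / (d * (d - 1) / 2)
    return total / n

lattice = ring_lattice(20, 4)
c_ordered = average_clustering(lattice, 20)     # p = 0
rewired = rewire(lattice, 20, 0.1, seed=0)      # a little randomness
```

For k = 4 the ordered ring gives C = 3·2/(4·3) = 1/2; rewiring keeps the number of links fixed, so any change in clustering or path length comes from where the links go, not how many there are.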
74
Nodes: online users
Links: email contact, tweet, or friendship
(Alan Mislove, Measurement and Analysis of Online Social Networks)
All distributions show fat-tail behavior: the degrees spread over orders of magnitude
76
The configuration model
input: the degree sequence $[d_1, d_2, \ldots, d_n]$
process:
- Create $d_i$ copies of node $i$
- Take a random matching (pairing) of the copies, linking them randomly
- Self-loops and multiple edges are allowed
But: too artificial!
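The process above can be sketched as a random pairing of "stubs" (the copies of each node). The degree sequence [4, 1, 3, 2] and the function name are illustrative assumptions; shuffling the stub list and pairing consecutive entries is one simple way to draw a uniformly random matching.

```python
import random

def configuration_model(degrees, seed=None):
    """Random stub matching; the result may contain self-loops and repeated edges."""
    if sum(degrees) % 2:
        raise ValueError("the degree sequence must have an even sum")
    rng = random.Random(seed)
    # d_i copies ("stubs") of node i, with nodes labeled 1..n.
    stubs = [node for node, d in enumerate(degrees, start=1) for _ in range(d)]
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))

degree_sequence = [4, 1, 3, 2]
multigraph = configuration_model(degree_sequence, seed=7)

# Recount the degrees from the links (a self-loop adds 2 to its node).
realized = [0] * len(degree_sequence)
for u, w in multigraph:
    realized[u - 1] += 1
    realized[w - 1] += 1
```

Whatever the random pairing, the realized degrees always match the input sequence, which is the whole point of the model: the degree distribution is imposed rather than explained.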
77
ER, WS models: the number of nodes, $N$, is fixed (static models)
Real networks continuously expand by the addition of new nodes
(Barabási & Albert, Science 286, 509 (1999))
78
(1) Networks continuously expand by the addition of new nodes: add a new node with $m$ links
(Barabási & Albert, Science 286, 509 (1999))
79
Q: Where will the new node link to?
ER, WS models: choose randomly.
A: New nodes prefer to link to highly connected nodes.
PREFERENTIAL ATTACHMENT: the probability that a new node connects to a node with $k$ links is proportional to $k$.
(Barabási & Albert, Science 286, 509 (1999))
80
“The rich get richer”
First considered by [Price 65] as a model for citation networks:
- each new paper is generated with $m$ citations (on average)
- new papers cite previous papers with probability proportional to their in-degree (citations)
- what about papers without any citations?
  - each paper is considered to have a “default” citation
  - the probability of citing a paper with in-degree $k$ is proportional to $k+1$
Power law with exponent $\alpha = 2 + 1/m$
81
The BA model (undirected graph)
input: some initial subgraph $G_0$, and $m$, the number of edges per new node
the process:
- nodes arrive one at a time
- each node connects to $m$ other nodes, selecting them with probability proportional to their degree
- if $[d_1, \ldots, d_t]$ is the degree sequence at time $t$, node $t+1$ links to node $i$ with probability $\frac{d_i}{\sum_{j=1}^{t} d_j}$
Results in a power law with exponent $\alpha = 3$
Various problems: cannot account for every power law observed (e.g. the Web), correlates age with degree, etc.
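A minimal sketch of the growth process. Two choices here are assumptions of the sketch, not of the slides: the initial subgraph G0 is taken to be a clique on m + 1 nodes, and the m targets of a new node are forced to be distinct by redrawing duplicates, which only approximates strict degree-proportional selection.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node links to m distinct existing nodes,
    drawn with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed initial subgraph G0: a clique on m + 1 nodes.
    links = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `endpoints` once per incident link, so a uniform
    # draw from this list is a degree-proportional draw.
    endpoints = [v for link in links for v in link]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            links.append((new, t))
            endpoints.extend((new, t))
    return links

ba_links = barabasi_albert(200, 2, seed=3)
ba_degree = {}
for u, v in ba_links:
    ba_degree[u] = ba_degree.get(u, 0) + 1
    ba_degree[v] = ba_degree.get(v, 0) + 1
```

Sorting `ba_degree` by value typically shows the oldest nodes at the top, which illustrates the "correlates age with degree" criticism on the slide.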
82
As $p$ increases, so does the density of the graph
For small $p$ (< 0.2), notice that not all nodes are connected
For $p = 0.2$ there is only one isolated node
[Figure: three sample graphs, for $p = 0$, $p = 0.1$, $p = 0.2$.]
83
We saw that increasing $p$ gives denser networks
In the large-$N$ case, we increase $z = Np$, the average degree
But what really happens as $p$ (or $z$) increases?
A random network on 50 nodes: $p = 0.01$: disconnected, largest component = 3
84
$p = 0.03$: a large component appears, but almost 40% of nodes are still disconnected
85
$p = 0.05$: a “giant” component emerges; only 3 nodes are disconnected
Giant component: the graph “percolates”
86
$p = 0.10$: all nodes are connected
87
$S$: the fraction of nodes in the giant component, $S = N_{GC}/N$
There is a phase transition at $\langle k \rangle = 1$:
- for $\langle k \rangle < 1$ there is no giant component
- for $\langle k \rangle > 1$ there is a giant component
- for large $\langle k \rangle$ the giant component contains all nodes ($S = 1$)
[Figure: $S$ against $\langle k \rangle$; http://linbaba.files.wordpress.com/2010/10/erdos-renyi.png]
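The transition can be observed by sampling G(n,p) on either side of an average degree of 1 and measuring the largest connected component. This is a sketch under stated assumptions: n = 1000, the two mean degrees 0.5 and 3.0, and the seed are arbitrary illustrative choices, and components are tracked with a simple union-find structure.

```python
import random
from itertools import combinations

def giant_fraction(n, mean_degree, seed=None):
    """Fraction S of nodes in the largest component of one G(n, p) sample."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the component root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            parent[find(i)] = find(j)      # merge the two components

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

S_below = giant_fraction(1000, 0.5, seed=0)   # below the transition
S_above = giant_fraction(1000, 3.0, seed=0)   # well above it
```

Below the transition the largest component holds only a small fraction of the nodes, while above it most nodes belong to a single component; sweeping `mean_degree` between the two values traces out the S curve in the figure.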