851-0585-04L – Modeling and Simulating Social Systems with MATLAB
Lecture 6 – Graphs (Networks)
Karsten Donnay and Stefano Balietti
Chair of Sociology, in particular of Modeling and Simulation
13.06.2018 | © ETH Zürich
Schedule of the course
Session dates: 21.02., 28.02., 07.03., 14.03., 21.03., 28.03., 04.04., 18.04., 02.05., 09.05., 16.05., 23.05., 30.05.
  Introduction to MATLAB
  Introduction to social-science modeling and simulation
  Working on projects (seminar theses)
  Handing in seminar thesis and giving a presentation (final deadlines to be communicated)
Schedule of the course: different ways of representing space
Session dates: 21.02., 28.02., 07.03., 14.03., 21.03., 28.03., 04.04., 18.04., 02.05., 09.05., 16.05., 23.05., 30.05.
  Introduction to MATLAB
  Dynamical Systems (no space)
  Cellular Automata (grid)
  Networks (graphs)
  Continuous Space (…)
  Working on projects (seminar theses)
  Handing in seminar thesis and giving a presentation (final deadlines to be communicated)
There are different ways to model the space in which social interaction takes place, e.g.:
  No space, e.g. Dynamical Systems
  Discrete space, e.g. Cellular Automata
  Continuous space (next week)
  Graphs
Goals of Lecture 6: students will
  Consolidate the knowledge acquired during lecture 5, through a brief repetition of the main points and exploration of the additional material (Prisoner's Dilemma Tournament).
  Discover the origins of graph theory.
  Learn how to define a graph rigorously in terms of its main mathematical properties.
  Understand how the degree distribution characterizes different network topologies.
  Get a firm grasp of algorithms to generate different network topologies.
  Be introduced to software for visualizing complex graphs.
Repetition: Game Theory
Game Theory: a powerful analytical framework to formalize the decision process of interacting individuals.
Nash equilibrium (NE): every player adopts their 'best strategy' given the strategies of the other players.
An NE does not mean that the solution found is (Pareto) efficient. Players can get stuck in local optima (e.g. the Prisoner's Dilemma) with a low payoff for all of them.
Evolutionary Learning: in iterated games, the mechanisms of Selection, Replication and Mutation create evolutionary dynamics.
Repetition: Programming a Simulator
5 phases (at least): Initialization, Time loop, Agents loop, Update state, Save data. They can be separated in the code with %% (double percent), which creates an executable cell.
Code files and the diagram labels they correspond to: tournament.m -> B, iteratedpd.m -> A, play.m -> C. There, the Time and Agents loops are inverted.
Ah, if you played the game you noticed that TIT for TAT always wins.
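A minimal sketch of how the five phases might be laid out in a single script with %% cells. The model (agents taking random steps of +1 or -1) and all variable names are placeholders chosen for illustration; this is not the tournament code from lecture 5.

```matlab
%% Initialization
nAgents = 5;                        % hypothetical number of agents
nSteps  = 20;                       % hypothetical number of time steps
state   = zeros(nAgents, 1);        % current state of each agent
history = zeros(nAgents, nSteps);   % storage for the saved data

%% Simulation
for t = 1:nSteps                    % Time loop
    for a = 1:nAgents               % Agents loop
        % Update state: placeholder rule, a random step of +1 or -1
        state(a) = state(a) + 2 * (rand < 0.5) - 1;
    end
    history(:, t) = state;          % Save data
end

%% Inspect results
disp(history(:, end)');             % final state of every agent
```

Each %% cell can be run on its own from the MATLAB editor, which makes it easy to rerun the simulation without repeating the setup.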
Seven Bridges of Königsberg
Graph Theory was born in 1736, when Euler posed the following problem: is it possible to take a walk through the city of Königsberg that crosses each of the seven bridges exactly once?
The city of Königsberg in Prussia (now Kaliningrad, Russia) was set on both sides of the Pregel River, and included two large islands which were connected to each other and the mainland by seven bridges.
Seven Bridges of Königsberg (II)
In order to approach the problem, Euler represented the important information as a graph.
Next, Euler observed that (except at the endpoints of the walk), whenever one enters a vertex by a bridge, one leaves the vertex by a bridge. In other words, during any walk in the graph, the number of times one enters a non-terminal vertex equals the number of times one leaves it.
Now, if every bridge is traversed exactly once, it follows that, for each land mass (except possibly for the ones chosen for the start and finish), the number of bridges touching that land mass is even (half of them, in the particular traversal, will be traversed "toward" the land mass; the other half, "away" from it).
However, all four of the land masses in the original problem are touched by an odd number of bridges (one is touched by 5 bridges, and each of the other three is touched by 3). Since, at most, two land masses can serve as the endpoints of a putative walk, the proposition of a walk traversing each bridge once leads to a contradiction.
Euler shows that the possibility of a walk through a graph, traversing each edge exactly once, depends on the degrees of the nodes. The degree of a node is the number of edges touching it. Euler's argument shows that a necessary condition for a walk of the desired form is that the graph be connected and have exactly zero or two nodes of odd degree.
Source: wikipedia.org
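Euler's degree argument can be checked numerically. The sketch below encodes the bridges as a matrix of bridge counts; the node order and the variable names are arbitrary choices, and connectedness is taken for granted rather than tested.

```matlab
% Land masses as nodes; entry (i,j) counts the bridges between i and j.
% Node 1 is the land mass touched by 5 bridges, nodes 2-4 are touched by 3.
B = [0 2 2 1;
     2 0 0 1;
     2 0 0 1;
     1 1 1 0];

degree = sum(B, 2);                  % number of bridges per land mass
nOdd   = sum(mod(degree, 2) == 1);   % land masses with odd degree

% Necessary condition (the graph itself is connected here)
walkPossible = (nOdd == 0) || (nOdd == 2);

fprintf('degrees: %s, odd nodes: %d, walk possible: %d\n', ...
        mat2str(degree'), nOdd, walkPossible);
```

All four nodes have odd degree, so the condition fails, in line with the contradiction derived above.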
Definition of Graph
A graph consists of two entities:
  Nodes (vertices): N
  Links: L
    Edge: undirected link
    Arc: directed link
The graph is defined as G = (N,L)
Source: Batagelj
Properties of Links and Nodes
A link can be encoded either as a:
  boolean flag (connection vs. no connection), or
  value or weight (distance, traveling time, etc.)
A node can also contain information ("attributes").
When a graph is enriched with extra information encoded either in the nodes or in the links, we call it a network.
Graphs - Examples
GRAPH                               NODES         LINKS
Protein interaction                 Proteins      Metabolic reactions
Internet                            Routers       Communication channels
Social networks                     Individuals   Social relations
WWW                                 Webpages      Hyperlinks
Scientific coauthorship networks    Authors       Papers
Graphs - examples
Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]; Friendship Network [Moody ’01]; Protein Interactions [genomebiology.com]
(C) 2010, C. Faloutsos
Mathematical Description of a Graph
A node can be characterized by:
  Degree k: number of connections.
  Importance: degree, betweenness centrality, closeness, eigenvector centrality (e.g. PageRank).
  (More measures later on in this lecture and in the course)
A graph can be characterized by:
  Degree distribution P(k): fraction of nodes with k connections.
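Both quantities follow directly from an adjacency matrix. The sketch below uses a small undirected example graph; the matrix and the variable names are illustrative assumptions.

```matlab
A = [0 1 1 0;
     1 0 1 0;
     1 1 0 1;
     0 0 1 0];                      % small undirected example graph

k = sum(A, 2);                      % degree of each node

kValues = 0:max(k);                 % degrees that can occur
Pk = zeros(size(kValues));
for i = 1:numel(kValues)
    Pk(i) = mean(k == kValues(i));  % fraction of nodes with that degree
end

disp([kValues; Pk]);                % first row: k, second row: P(k)
```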
Degree Distribution
Graphs can be classified by their topology, by measuring the degree-distribution function P(k) of the number of connections k per node:
  Random graph: P(k) = binomial distribution
  Scale-free graph: $P(k) \sim k^{-\gamma}$ (power law)
Source: www.computerworld.com
The Small World Phenomenon
Graphs are useful for modeling social networks, disease spreading, transportation, and so on.
One of the most famous graph studies is the Small World Experiment (S. Milgram), which suggests that any two persons are typically connected through a short chain of acquaintances, on the order of 5 intermediate friends.
Small World Example: Oracle of Bacon
There is a web page, http://oracleofbacon.org/, that finds the path from any actor at any time to the Hollywood actor Kevin Bacon.
It can also be used to find the shortest path between any two actors.
Paths
Path of length n = ordered collection of
  n+1 nodes, e.g. A, C, D, E in G = (N,L)
  n links, e.g. (A,C), (C,D), (D,E) in G = (N,L)
Circuit = closed path (last node = first node)
Paths and connectedness
A graph G = (N,L) is connected if and only if there exists a path connecting any two nodes in G.
Figure panels: Connected (Tree); Not Connected (Forest); Connected with loops
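One way to test this definition on an adjacency matrix is sketched below. It relies on the fact that entry (i,j) of (I + A)^(N-1) is positive exactly when some path of at most N-1 links joins i and j. The two example graphs and the variable names are assumptions made for illustration, and the approach is only practical for small N.

```matlab
% Connected example: links 1-2, 1-3, 2-3, 3-4
A = [0 1 1 0;
     1 0 1 0;
     1 1 0 1;
     0 0 1 0];
N = size(A, 1);
R = (eye(N) + A)^(N - 1) > 0;     % R(i,j): is j reachable from i?
isConnected = all(R(:));

% Not connected: two separate pieces, 1-2 and 3-4
F = [0 1 0 0;
     1 0 0 0;
     0 0 0 1;
     0 0 1 0];
RF = (eye(4) + F)^3 > 0;
forestConnected = all(RF(:));

fprintf('A connected: %d, F connected: %d\n', isConnected, forestConnected);
```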
Giant Component
The giant component is the connected component that contains the vast majority of the nodes of a graph.
Shortest paths
The shortest path between i and j is the path with the minimum number of traversed edges.
Distance l(i,j) = length of the shortest path between i and j
Diameter D of the graph = max(l(i,j))
In real networks that evolve over time, D is observed to shrink or stay constant.
Figure: two example graphs with nodes A, B, D, H, I, J, X.
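For an unweighted graph, the distances l(i,j) and the diameter D can be obtained with a breadth-first search from every node. The following is a sketch on a small example graph; the matrix and all variable names are illustrative assumptions.

```matlab
A = [0 1 1 0;
     1 0 1 0;
     1 1 0 1;
     0 0 1 0];                 % example graph: links 1-2, 1-3, 2-3, 3-4
N = size(A, 1);

dist = inf(N);                 % dist(i,j) = l(i,j); Inf if unreachable
for s = 1:N
    dist(s, s) = 0;
    frontier = s;              % nodes reached in the previous step
    d = 0;
    while ~isempty(frontier)
        d = d + 1;
        next = [];
        for u = frontier
            nb  = find(A(u, :));            % neighbours of u
            new = nb(isinf(dist(s, nb)));   % those not reached yet
            dist(s, new) = d;
            next = [next, new];
        end
        frontier = next;
    end
end

D = max(dist(:));              % diameter (Inf if the graph is not connected)
disp(dist);
```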
Centrality Measures: Betweenness Centrality
Expresses the number of shortest paths passing through a node v.
Figure: example of a node v with high betweenness centrality.
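One common way to write betweenness centrality, given here as an assumption since normalizations differ between textbooks, counts for every pair of other nodes the share of shortest paths that run through v:

```latex
C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}
```

Here $\sigma_{st}$ is the number of shortest paths between nodes s and t, and $\sigma_{st}(v)$ is the number of those paths that pass through v.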
MATLAB Implementation
A graph can be implemented in MATLAB via its adjacency matrix, i.e. an N x N matrix defining how each of the N nodes is connected to the other N-1 nodes:
    N = 10;
    A = zeros(N, N);
    A(1,2) = 1;
    A(10,4) = 1;
    …
Graphs
If the nodes are cities and the links define connections and travel times for the SBB network, it looks like this:
Figure: nodes 1 = Zurich, 2 = Basel, 3 = Bern, 4 = Geneva; links 1-2, 1-3, 2-3, 3-4.
Adjacency matrix (rows and columns 1 to 4):
    A = [0 1 1 0; 1 0 1 0; 1 1 0 1; 0 0 1 0];
Graphs: adding the travel times
Figure: Zurich (1) - Basel (2) 0:54; Basel (2) - Bern (3) 0:55; Zurich (1) - Bern (3) 0:57; Bern (3) - Geneva (4) 1:41.
With the travel times in minutes as link weights, the matrix becomes:
           1     2     3     4
    1      0    54    57     0
A = 2     54     0    55     0
    3     57    55     0   101
    4      0     0   101     0
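The weighted matrix can be typed in directly and queried. In the sketch below, the node numbering (1 = Zurich, 2 = Basel, 3 = Bern, 4 = Geneva) is inferred from the travel times and should be read as an assumption, as should the variable names.

```matlab
cities = {'Zurich', 'Basel', 'Bern', 'Geneva'};   % assumed node order

% Travel times in minutes; 0 means "no direct link"
W = [  0  54  57   0;
      54   0  55   0;
      57  55   0 101;
       0   0 101   0];

A  = double(W > 0);        % unweighted adjacency matrix
nb = find(W(3, :));        % direct connections of node 3 (Bern)

for j = nb
    fprintf('%s - %s: %d min\n', cities{3}, cities{j}, W(3, j));
end
```

Using 0 for "no link" works here because no travel time is zero; for weights that can legitimately be 0, a separate boolean matrix or Inf entries would be safer.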
Alternative Ways to Store Network Data
Edge/arc lists can easily be stored in a file and loaded when needed:
    1 2
    1 3
    2 1
    2 3
    3 1
    3 2
    3 4
    4 3
Figure: the SBB graph with nodes 1 to 4 (Zurich, Basel, Bern, Geneva).
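A sketch of the round trip between an arc list, a text file and an adjacency matrix. The temporary file name is a placeholder, and `dlmwrite`/`dlmread` are used because they are available in both older MATLAB releases and Octave; newer MATLAB releases also offer `writematrix`/`readmatrix`.

```matlab
E = [1 2; 1 3; 2 1; 2 3; 3 1; 3 2; 3 4; 4 3];   % arc list: from, to

fname = [tempname '.txt'];     % temporary file (placeholder name)
dlmwrite(fname, E, ' ');       % one arc per line
E2 = dlmread(fname, ' ');      % load the list back
delete(fname);

% Arc list -> adjacency matrix
N = max(E2(:));
A = zeros(N);
A(sub2ind([N N], E2(:, 1), E2(:, 2))) = 1;

% Adjacency matrix -> arc list
[from, to] = find(A);
E3 = sortrows([from, to]);

disp(A);
```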
Alternative Ways to Store Network Data
Cell arrays can contain vectors of different size:
    >> A = [2 3];
    >> B = [1 3];
    >> C = [1 2 4];
    >> D = [3];
    >> Net = {A;B;C;D};
    >> Net{1}(1)
    ans = 2
Figure: the SBB graph with nodes 1 to 4 (Zurich, Basel, Bern, Geneva).
Alternative Ways to Store Network Data
Cell arrays grant more freedom in representing data structures, at the cost of losing the simplicity and clarity of the matrix notation. Here each row of a node's array holds a neighbor and the travel time to it (54, 57, 55 and 101 minutes):
    >> A = [2,54; 3,57];
    >> B = [1,54; 3,55];
    >> C = [1,57; 2,55; 4,101];
    >> D = [3,101];
    >> Net = {A;B;C;D};
Alternative Ways to Store Network Data
Warning: you must validate your own data structure!
A single mistyped separator is enough to break it. For example, with a comma in place of a semicolon,
    >> C = [1,57; 2,55, 4,101];
the rows no longer have the same length, and MATLAB rejects the assignment with a dimension mismatch error.
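A sketch of what such a validation could look like for the neighbor-list cell array. The rules checked here (two columns per entry, indices in range, no self-links, every link listed in both directions with the same weight) are assumptions about what "valid" means for this undirected example, not requirements stated in the lecture.

```matlab
A = [2,54; 3,57];
B = [1,54; 3,55];
C = [1,57; 2,55; 4,101];
D = [3,101];
Net = {A; B; C; D};

N = numel(Net);
valid = true;
for i = 1:N
    nbrs = Net{i};
    % Shape, index range and self-link checks
    % (note: a node without any neighbours fails the shape check)
    if size(nbrs, 2) ~= 2 || ...
       any(nbrs(:,1) < 1 | nbrs(:,1) > N | nbrs(:,1) == i)
        valid = false;
        continue;
    end
    % Symmetry check: j must list i with the same weight
    for r = 1:size(nbrs, 1)
        j = nbrs(r, 1);
        back = Net{j};
        if size(back, 2) ~= 2
            valid = false;
            continue;
        end
        hit = back(back(:,1) == i, 2);
        if numel(hit) ~= 1 || hit ~= nbrs(r, 2)
            valid = false;
        end
    end
end

fprintf('network valid: %d\n', valid);
```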
Generators
How to generate random, realistic graphs?
  Probabilistic generators
  Degree-based generators
  Process-based generators
  Recursive/self-similar generators
Probabilistic Generators: Erdős-Rényi
Algorithm:
  Start with n nodes and no links
  Define a connection probability p
  For every possible pair of nodes, create a link with probability p
The average number of links is given by: p*n*(n-1)/2
The greater p, the higher the average degree of the network
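The algorithm above can be sketched in a few lines. The values of n and p are arbitrary examples, and the variable names are illustrative.

```matlab
n = 100;                         % number of nodes
p = 0.02;                        % connection probability (example value)

U = triu(rand(n) < p, 1);        % decide each pair i < j exactly once
A = double(U | U');              % symmetric adjacency matrix, no self-links

nLinks   = sum(A(:)) / 2;        % links in this realization
expected = p * n * (n - 1) / 2;  % average number of links

fprintf('links: %d (average over many runs: %.1f)\n', nLinks, expected);
```

Taking only the upper triangle and mirroring it ensures that each pair of nodes is drawn once, so the graph is undirected.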
Probabilistic Generators: Erdős-Rényi
Figure: random graph with 100 nodes, avg degree = 2
Fascinating properties (phase transition)
But: unrealistic (the binomial degree distribution is approximately Poisson for large n, not a power law)
E-R model & Percolation
The formation of the giant component is not a smooth process: it emerges all of a sudden when p > 1/n.
This phenomenon is called a phase transition.
Figure: Pc (from 0 to 1) plotted against K for N -> infinity, with the threshold K0 marked; K = avg(k), Pc = Prob(there is a giant connected component)
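The sudden appearance of the giant component can be observed by comparing the largest component of a random graph below and above the threshold. In this sketch the network size, the two average degrees and the variable names are arbitrary choices; components are found with a breadth-first search.

```matlab
n = 500;                               % number of nodes (example value)
K = [0.2 4];                           % average degree below / above 1
giantFraction = zeros(size(K));

for t = 1:numel(K)
    p = K(t) / (n - 1);                % connection probability for this K
    U = triu(rand(n) < p, 1);
    A = U | U';                        % Erdos-Renyi graph

    label = zeros(1, n);               % component label of each node
    nComp = 0;
    for s = 1:n
        if label(s) > 0, continue; end
        nComp = nComp + 1;
        label(s) = nComp;
        frontier = s;
        while ~isempty(frontier)       % breadth-first search from s
            reached = any(A(frontier, :), 1) & (label == 0);
            label(reached) = nComp;
            frontier = find(reached);
        end
    end

    sizes = accumarray(label', 1);     % size of every component
    giantFraction(t) = max(sizes) / n;
end

fprintf('K = %.1f: largest component holds %.0f%% of the nodes\n', ...
        [K; 100 * giantFraction]);
```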
Graphs: Laws and patterns
Are real graphs random? If we look at the data, the answer most of the time is: NO!!
Figure: two plots of count versus degree k.
(C) 2010, C. Faloutsos
Degree-based generators
  Figure out the degree distribution (e.g. 'Zipf')
  Assign degrees to nodes
  Place edges so that they match the original degree distribution
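One possible realization of these steps is stub matching, in the style of the configuration model; the slide does not name a specific method, so this choice is an assumption. The target degrees are made-up example values, and removing self-links and duplicate links means the achieved degrees can fall slightly short of the targets.

```matlab
deg = [3 2 2 2 1 1 1];             % target degrees (example; sum is even)
n = numel(deg);

stubs = [];                        % node i appears deg(i) times
for i = 1:n
    stubs = [stubs, i * ones(1, deg(i))];
end
stubs = stubs(randperm(numel(stubs)));   % shuffle the link ends
pairs = reshape(stubs, 2, [])';          % consecutive ends form a link

A = zeros(n);
for r = 1:size(pairs, 1)
    i = pairs(r, 1);
    j = pairs(r, 2);
    if i ~= j                      % drop self-links; duplicates collapse
        A(i, j) = 1;
        A(j, i) = 1;
    end
end

achieved = sum(A, 2)';             % degrees actually realized
disp([deg; achieved]);
```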
Process-based: Preferential attachment
Algorithm:
  Start with a random connected graph
  At each time step, create a new node and attach it to an existing node i with probability
    $\Pi(k_i) = k_i / \sum_j k_j$, where $k_i$ = degree of node i
That is: if a node has many links, it will get more in the future…
Process-based: Preferential attachment
Generates power-law tails (rich-get-richer)
The degree distribution is a power law of the form: $P(k) \sim k^{-3}$
But still, it does not reproduce the shrinking diameter observed in evolving networks…
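A sketch of a preferential-attachment generator. The triangle used as the starting graph, the number m of links added per new node, and the variable names are assumptions made for illustration; the slides only specify that the attachment probability is proportional to the degree.

```matlab
N = 200;                  % final number of nodes (example value)
m = 2;                    % links added per new node (assumption)

A = zeros(N);
A(1,2) = 1; A(2,1) = 1;   % seed: a triangle on nodes 1-3
A(2,3) = 1; A(3,2) = 1;
A(1,3) = 1; A(3,1) = 1;

for newNode = 4:N
    k = sum(A(1:newNode-1, 1:newNode-1), 2);   % current degrees
    targets = [];
    while numel(targets) < m
        % Pick node c with probability k(c) / sum(k)
        c = find(cumsum(k) >= rand * sum(k), 1);
        if ~ismember(c, targets)               % no duplicate links
            targets(end+1) = c;
        end
    end
    A(newNode, targets) = 1;
    A(targets, newNode) = 1;
end

k = sum(A, 2);
fprintf('max degree: %d, mean degree: %.2f\n', max(k), mean(k));
```

A few nodes end up with a degree far above the mean, which is the rich-get-richer effect described above.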
Recursive/Self-similar Generators: Kronecker product (intuition of)
The main idea is to create self-similar graphs, recursively.
Figure: the adjacency matrix A of a graph with nodes 1 to 4, and its Kronecker product Kr, in which copies of A appear as blocks.
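MATLAB's built-in `kron` makes the recursion easy to try out. The 4-node initiator matrix below is a hypothetical example (the matrix from the slide figure is not reproduced here), and the variable names are illustrative.

```matlab
A = [1 1 0 0;
     1 1 1 0;
     0 1 1 1;
     0 0 1 1];            % hypothetical 4-node initiator (with self-links)

K2 = kron(A, A);          % 16 x 16: every 1 in A becomes a copy of A
K3 = kron(K2, A);         % 64 x 64: one more level of recursion

fprintf('nodes:    %d -> %d -> %d\n', size(A, 1), size(K2, 1), size(K3, 1));
fprintf('nonzeros: %d -> %d -> %d\n', nnz(A), nnz(K2), nnz(K3));
```

Calling `spy(K3)` displays the nested block pattern that gives these graphs their self-similar structure.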
Projects
Today, there are no exercises. Instead, you can work on your projects and we will supervise you.
References: Software Packages
The following programs are valuable tools for representing and visualizing networks:
  Pajek (http://pajek.imfm.si/doku.php) -> Easy to use
  NWB (http://nwb.cns.iu.edu/) -> Good for Analysis
  Gephi (http://gephi.org/) -> New
  Visone (http://visone.info/) -> made in Konstanz
  JUNG (http://jung.sourceforge.net/) -> library
  Net Draw (http://www.analytictech.com/netdraw/netdraw.htm)
  Pegasus (http://www.cs.cmu.edu/~pegasus/) -> for huge data
Use them!!
References
  Handbook of Graphs and Networks: From the Genome to the Internet, edited by S. Bornholdt and H. G. Schuster. John Wiley and Sons, 2003.
  Jure Leskovec, Jon Kleinberg and Christos Faloutsos: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. KDD 2005 (Best Research Paper award).
  Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg and Christos Faloutsos: Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. PKDD 2005: 133-145.
  Jon Kleinberg (1999): Authoritative Sources in a Hyperlinked Environment. Journal of the ACM 46 (5): 604–632.