Dynamics of interactions in a large communication network. Temporal aspects of social behavior from mobile phone data. János Kertész, Budapest University of Technology and Economics and Aalto University, Helsinki. In collaboration with Márton Karsai, Mikko Kivelä, Raj Pan, Jari Saramäki, Kimmo Kaski, Albert-László Barabási. Support: EU FP7 FET-Open, FiDiPro, OTKA.
Transmission on a linear chain VERY slow
May 29, 2003: calls to the Hungarian Central Office for Combating Catastrophes reporting that people had received SMS messages such as: "Large nuclear accident in Paks. Stay home, close doors and windows, don't eat lettuce." In reality nothing happened. The police investigation revealed that a nurse had heard children talking in the kindergarten about such things (they had heard them from their parents as a rumor); she called her relatives, the "information" reached a journalist, who started an SMS campaign... Was the spreading fast or slow?
Outline - Spreading phenomena in complex networks, small world - Mobile phone call network: A proxy for the social network. Nodes, links and weights - The Granovetterian structure of the society - Event list and modeling spreading - Different sources of correlations: - topology - weight-topology - daily (weekly) patterns - burstiness - link-link - Differentiating between contributions to spreading - Burst statistics - Summary
Spreading phenomena in networks - epidemics (bio- and computer) - rumors, information, opinion - innovations - etc. Nodes of a network can be: - Susceptible - Infected - Recovered (immune) Corresponding models: SI, SIR, SIS... Important: speed of spreading (SLOW)
Spreading curve (SI): m(t) = N_inf(t)/N_tot, the fraction of infected nodes. Regimes marked in the figure: early, intermediate, late.
Spreading in the society. Small world property: "Six Degrees of Separation", Erdős number, WWW, collaboration network, Kevin Bacon game etc. Not only social networks: Internet, genetic transcription, etc. In many networks the average distance between two arbitrary nodes is small (it grows at most logarithmically with system size). Distance: length of the shortest path between two nodes.
Small world: fast spreading? There are short, efficient paths. Are they used? Needed information: - Structure of the society: network at the societal level - Local transmission dynamics: detailed description of how information (rumor, opinions etc.) is transmitted. Impossible to know, so use a proxy: the usage of mobile phones in the adult population is close to 100%. All interactions are recorded; use the call network as a proxy of the network at the societal level.
Constructing social network from mobile phone data - Over 7 million private mobile phone subscriptions - Focus: voice calls within the home operator - Data aggregated from a period of 18 weeks - Require reciprocity (X → Y AND Y → X) for a link - Customers are anonymous (hash codes) - Data from a European mobile operator (20% market share) - Weights: either call duration or number of calls. Illustration: calls of 15 min and 5 min between X and Y aggregate to a link weight of 20 min. J.-P. Onnela, et al. PNAS 104, (2007) J.-P. Onnela, et al. New J. Phys. 9, 179 (2007)
Huge network: proxy for network at societal level Small world
The strength of weak ties (M.Granovetter, 1973) Hypothesis about the small scale (micro-) structure of the society: 1. “The strength of a tie is a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services which characterize the tie.” 2. “The stronger the tie between A and B, the larger the proportion of individuals S to whom both are tied.” Consequences on large (macro-) scale: Society consists of strongly wired communities linked by weak ties. The latter hold the society together. Granovetter, Mark S. (May 1973), "The Strength of Weak Ties", American Journal of Sociology 78 (6): 1360–1380
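Overlap. Definition: relative neighborhood overlap (topological), O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij), where n_ij is the number of triangles around edge (v_i, v_j), i.e. the number of common neighbors of v_i and v_j, and k_i, k_j are the degrees of the two nodes. Illustration of the concept: O_ij = 0 if the two nodes share no neighbors, O_ij = 1 if all their other neighbors are shared.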
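A minimal sketch of the overlap computation, assuming the definition O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij) with k_i the degree of node i. The function name, the adjacency-set representation and the toy graph are illustrative assumptions, not part of the original analysis.

```python
def overlap(adj, i, j):
    """Relative neighbourhood overlap O_ij of the edge (i, j).

    adj maps each node to the set of its neighbours (undirected graph).
    """
    n_ij = len(adj[i] & adj[j])  # common neighbours = triangles around the edge
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij
    return n_ij / denom if denom > 0 else 0.0


# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

for edge in [("a", "b"), ("b", "c"), ("a", "d")]:
    print(edge, overlap(adj, *edge))
```

The edge b-c lies entirely inside the triangle and gets overlap 1, while the pendant edge a-d has no common neighbours and gets overlap 0.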
Empirical Verification. Let ⟨O⟩_w denote O_ij averaged over a bin of w-values. Use the cumulative link weight distribution P_cum(w') (the fraction of links with weights less than w'). Relative neighbourhood overlap increases as a function of link weight. Verifies Granovetter's hypothesis (~95%). (Exception: top 5% of weights.) Blue curve: empirical network. Red curve: weight randomised network.
High Weight Links? Node strength: s_i = Σ_j w_ij. Weak links: strength of both adjacent nodes (min & max) considerably higher than the link weight. Strong links: strength of both adjacent nodes (min & max) about as high as the link weight. Indication: high-weight relationships clearly dominate the on-air time of both nodes; the others are negligible. The ratio of time spent communicating with one other person converges to 1 at roughly w ≈ 10^4. Consequence: less time to interact with others, which explains the onset of the decreasing trend at the largest w.
Possible to ask unprecedented questions and even find the answers to them. The study revealed the structure of the network, the interplay between weights and communities, and the relations between local, mesoscopic and global structure.
Thresholding Initial connected network ( f=0 ) All links are intact, i.e. the network is in its initial stage
Thresholding Increasing weight thresholded network ( f=0.8 ) 80% of the weakest links removed, strongest 20% remain
Thresholding Initial connected network ( f=0 ) All links are intact, i.e. the network is in its initial stage
Thresholding Decreasing weight thresholded network ( f=0.8 ) 80% of the strongest links removed, weakest 20% remain
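The thresholding experiment can be sketched as follows: remove a fraction f of the links in order of weight and watch the largest connected component. This is a hedged illustration only; the function names, the union-find helper and the toy network (two strongly wired triangles joined by one weak bridge) are assumptions made for the example.

```python
def largest_component_fraction(nodes, edges):
    """Fraction of nodes in the largest connected component (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, _w in edges:
        parent[find(u)] = find(v)
    sizes = {}
    for v in nodes:
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(nodes)


def threshold(edges, f, weakest_first=True):
    """Remove the fraction f of links, weakest first or strongest first."""
    ordered = sorted(edges, key=lambda e: e[2], reverse=not weakest_first)
    n_remove = int(round(f * len(ordered)))
    return ordered[n_remove:]


# Hypothetical toy network: two strong triangles joined by one weak bridge (2, 3).
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1, 10), (1, 2, 10), (0, 2, 10),
         (3, 4, 10), (4, 5, 10), (3, 5, 10),
         (2, 3, 1)]

weak_removed = threshold(edges, 1 / 7, weakest_first=True)
strong_removed = threshold(edges, 1 / 7, weakest_first=False)
print(largest_component_fraction(nodes, weak_removed))
print(largest_component_fraction(nodes, strong_removed))
```

In this toy case removing the single weakest link splits the network in two, while removing one strong link leaves it connected, which mirrors the role of weak ties described on the slides.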
Percolation aspects. Figure panels: Weights, Overlap. The local relationship between weights and topology has global consequences.
Spreading of information. Knowledge of information diffusion is based on unweighted networks. Use the present network to study diffusion on a weighted network: does the local relationship between topology and tie strength have an effect? Spreading simulation: infect one node with new information. (1) Granovetterian: p_ij ∝ w_ij. (2) Reference: p_ij ∝ ⟨w⟩ (average weight). Spreading is significantly faster on the reference (average weight) network. Information gets trapped in communities in the real network. Curves in the figure: Reference, Granovetterian.
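A hedged sketch of such a spreading simulation: a discrete-time SI process in which an infected node i infects a susceptible neighbour j with probability p_ij = min(1, x * w_ij) per step. The proportionality constant x, the step-based update, the function names and the toy network are all assumptions for illustration; the reference case replaces every weight by the average weight.

```python
import random


def si_spread(weights, seed_node, x, steps, rng):
    """Discrete-time SI on a weighted undirected graph.

    weights: dict {(i, j): w_ij}. Per-step transmission probability on a
    link is min(1, x * w_ij). Returns the list m(0), m(1), ..., m(steps).
    """
    nbrs = {}
    for (i, j), w in weights.items():
        nbrs.setdefault(i, []).append((j, w))
        nbrs.setdefault(j, []).append((i, w))
    infected = {seed_node}
    curve = [len(infected) / len(nbrs)]
    for _ in range(steps):
        newly = set()
        for i in infected:
            for j, w in nbrs[i]:
                if j not in infected and rng.random() < min(1.0, x * w):
                    newly.add(j)
        infected |= newly
        curve.append(len(infected) / len(nbrs))
    return curve


def reference_weights(weights):
    """Reference network: same topology, every link carries the mean weight."""
    mean_w = sum(weights.values()) / len(weights)
    return {edge: mean_w for edge in weights}


# Hypothetical toy network: two strong triangles joined by one weak bridge.
toy = {(0, 1): 10, (1, 2): 10, (0, 2): 10,
       (3, 4): 10, (4, 5): 10, (3, 5): 10,
       (2, 3): 1}

rng = random.Random(0)
granovetterian = si_spread(toy, 0, 0.05, 50, rng)
reference = si_spread(reference_weights(toy), 0, 0.05, 50, rng)
```

A single run is noisy; comparing the two cases meaningfully requires averaging the curves over many runs and seed nodes.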
Small but slow world. We have data about - who called whom (voice, SMS, MMS) - when - how long they talked (+ metadata: gender, age, postal code, mostly used tower, …). 306 million mobile call records of 4.9 million individuals during 4 months with 1 s resolution (M. Karsai et al.). Figure panels: voice calls, SMS.
More accurate study of spreading is possible: infect a node (with info, gossip, etc.) at time t_0 = 0. Transmission happens whenever a call with an uninfected node takes place (SI model). The time sequence is made periodic. Watch m(t) = ⟨N_inf(t)/N_tot⟩, the ratio of infected nodes, averaged over initiators. Is this fast or slow? What to compare with? The problem of null models.
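The event-driven SI process can be sketched as below. This is an illustration under stated assumptions: events are (time, caller, callee) tuples, information passes in either direction during a call, and the period used for repeating the sequence is taken as the covered time span plus one time unit. The function name and the toy events are hypothetical.

```python
def si_on_events(events, seed_node, n_nodes, max_passes=10):
    """SI spreading along a time-ordered event list, repeated periodically.

    events: iterable of (t, u, v) calls. Returns a list of (time, m) pairs,
    one for the initial state and one per new infection.
    """
    events = sorted(events)
    # Assumption: period = covered time span + 1 time unit.
    period = events[-1][0] - events[0][0] + 1
    infected = {seed_node}
    curve = [(events[0][0], 1 / n_nodes)]
    for k in range(max_passes):
        for t, u, v in events:
            if (u in infected) != (v in infected):  # exactly one end infected
                infected.update((u, v))
                curve.append((t + k * period, len(infected) / n_nodes))
        if len(infected) == n_nodes:
            break
    return curve


# Hypothetical toy event list on four nodes.
toy_events = [(0, "b", "c"), (1, "a", "b"), (2, "c", "d")]
curve = si_on_events(toy_events, "a", 4)
print(curve)  # [(0, 0.25), (1, 0.5), (3, 0.75), (5, 1.0)]
```

In the toy run the call b-c at t = 0 happens before b is infected, so the information has to wait for the next period to move on. The average over initiators is obtained by repeating the run for many seed nodes and averaging the curves.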
Correlations influence spreading speed - Topology (community structure) - Weight-topology (Granovetter structure) - Bursty dynamics - Daily pattern - Link-link dynamic correlations
A.-L. Barabási, Nature 435, 207 (2005). Bursty dynamics: inhomogeneous activity patterns. Figure panels: Poissonian, Bursty.
Average user Busy user Note the different scales Bursty call patterns for individual users
Daily pattern of call density. Weekly pattern too (here disregarded)
Calls are non-Poissonian. Scaled inter-event time distribution, binned according to weights (here: number of calls). Inset: time shuffled.
Dynamic link-link correlations: triggered calls, cascades, etc. How to identify the effect of the different correlations on the spreading? Introduce different null models by appropriate shuffling of the data.
Time shuffling. Event table: one column per link (Link 1, Link 2, ..., Link N); column i lists the event times t_i1, t_i2, ..., t_in_i of link i. Destroys burstiness (and link-link correlations) but keeps weight and daily pattern.
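One way to realize a time-shuffled null model is to permute the time stamps among all events while leaving the link of each event in place. This is a sketch under that assumption; the function name and the (time, link) event format are illustrative. Each link keeps its number of events (its weight) and the global set of time stamps (hence the daily pattern) is unchanged.

```python
import random


def time_shuffle(events, rng):
    """Permute time stamps among all events; each event keeps its link.

    events: list of (t, link) pairs. Returns a new list of (t, link) pairs.
    """
    times = [t for t, _ in events]
    rng.shuffle(times)
    return [(t, link) for t, (_, link) in zip(times, events)]


# Hypothetical toy event list.
toy_events = [(1, "ab"), (2, "ab"), (5, "bc"), (9, "cd")]
shuffled_events = time_shuffle(toy_events, random.Random(1))
```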
Link sequence shuffling. Event table: one column per link (Link 1, Link 2, ..., Link N); column i lists the event times t_i1, t_i2, ..., t_in_i of link i. Select random pairs of links and exchange their event sequences. Destroys topology-weight and link-link correlations, keeps burstiness.
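A minimal sketch of link sequence shuffling, assuming the data is held as a mapping from link to its list of event times. The function name, the default number of swaps and the toy data are assumptions for illustration.

```python
import random


def link_sequence_shuffle(sequences, rng, n_swaps=None):
    """Exchange whole event sequences between randomly chosen pairs of links.

    sequences: dict {link: [t1, t2, ...]} with at least two links.
    """
    links = list(sequences)
    shuffled = dict(sequences)
    if n_swaps is None:
        n_swaps = 10 * len(links)  # assumption: enough swaps to mix well
    for _ in range(n_swaps):
        a, b = rng.sample(links, 2)
        shuffled[a], shuffled[b] = shuffled[b], shuffled[a]
    return shuffled


# Hypothetical toy data: three links with different event sequences.
toy_sequences = {"ab": [1, 2, 3], "bc": [10], "cd": [4, 20]}
shuffled_sequences = link_sequence_shuffle(toy_sequences, random.Random(2))
```

Each sequence stays intact, so the bursty timing within a sequence survives, but a sequence may end up on a link with a different position in the topology.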
Equal weight link sequence shuffling. Event table restricted to links of the same weight w (Link 1', Link 2', ..., Link N'); column i' lists the event times t_i'1, t_i'2, ..., t_i'n_i' of link i'. Sequences are exchanged only between links of equal weight. Destroys link-link correlations. Keeps weight-topology correlations and bursty dynamics.
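The equal-weight variant can be sketched by grouping links by weight and permuting sequences only within a group. Here the weight is taken to be the number of events on the link, which is an assumption; the function name and toy data are illustrative.

```python
import random


def equal_weight_shuffle(sequences, rng):
    """Permute event sequences among links that have the same weight.

    sequences: dict {link: [t1, t2, ...]}; weight = number of events.
    """
    groups = {}
    for link, times in sequences.items():
        groups.setdefault(len(times), []).append(link)
    shuffled = {}
    for links in groups.values():
        donors = links[:]
        rng.shuffle(donors)
        for target, donor in zip(links, donors):
            shuffled[target] = sequences[donor]
    return shuffled


# Hypothetical toy data: two links of weight 2 and two of weight 1.
toy_sequences = {"ab": [1, 2], "bc": [7, 30], "cd": [4], "de": [15]}
shuffled_sequences = equal_weight_shuffle(toy_sequences, random.Random(3))
```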
Long time behavior
The role of the daily pattern. Model calculation: take the empirical topology, with weights. Compare homogeneous and inhomogeneous Poissonian processes. Little effect. Slowing down is mainly due to the Granovetterian structure and the bursty character of human activity.
A closer look at burstiness. Define a bursty period (BP): in a series of signals, a BP(Δt) is a sequence of signals with an empty period of length at least Δt both at the beginning and at the end of the sequence. The end is measured from the end of the talk.
Bursty dynamics: a closer look. What is a burst? Define it relative to a window Δt: a bursty period (BP) is a sequence of events separated from the rest by empty periods of length at least Δt.
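The definition can be turned into a short routine that splits an event sequence into bursty periods and returns the number of events E in each. This is a simplified sketch: it treats events as instantaneous (call durations are ignored) and counts the first and last groups as bursts even though they are bounded by the edge of the data. The function name and the sample times are hypothetical.

```python
def burst_sizes(times, dt):
    """Number of events E in each bursty period for window dt.

    times: sorted event times. Consecutive events closer than dt belong to
    the same burst; a gap of at least dt starts a new one.
    """
    if not times:
        return []
    sizes = []
    current = 1
    for prev, nxt in zip(times, times[1:]):
        if nxt - prev < dt:
            current += 1
        else:
            sizes.append(current)
            current = 1
    sizes.append(current)
    return sizes


print(burst_sizes([0, 1, 2, 10, 11, 30], 5))  # [3, 2, 1]
```

A histogram of the returned sizes over all users gives an estimate of the distribution of E for the chosen Δt.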
Statistics of bursts: length of BP. Figure: frequency of the length of BP for window Δt. Δt=10 is too small.
Distribution of the number of events E in bursts. Exponent marked in the figure: -4.2.
Autocorrelation of events
Modeling: 1. Independent events: whatever the distribution of inter-event times is, we get P(E = n) ~ exp(-An), in contrast to the observed power law.
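The exponential form for independent events can be made explicit with a short derivation. It assumes independent, identically distributed inter-event times τ and the burst definition with window Δt; the symbol q is introduced here for illustration. A burst contains exactly n events when n - 1 consecutive inter-event times are shorter than Δt and the next one is not.

```latex
% q: probability that an inter-event time is shorter than the window
q = P(\tau < \Delta t), \qquad
P(E = n) = q^{\,n-1}\,(1 - q) = \frac{1 - q}{q}\, e^{-A n}, \qquad
A = -\ln q > 0 .
```

The distribution of τ enters only through q, so any independent process gives a geometric, i.e. exponentially decaying, P(E = n).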
Queuing model (Barabási, 2005). We have a to-do list, which contains the tasks in a hierarchical order. Always the highest priority task is executed. Tasks arrive at random and get a hierarchy parameter at random. Qualitatively good (power law waiting times and number of events in BP) but wrong exponents.
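A minimal sketch of a priority-list model of this kind, following the slide's description: a fixed-length list, the highest priority task is always executed, and the executed task is replaced by a new one with a random priority. The list length, the uniform priorities and the function name are assumptions for illustration, not the parameters of the original study.

```python
import random


def priority_queue_waits(list_length, steps, rng):
    """Simulate a fixed-length priority list and record waiting times.

    At every step the highest-priority task is executed and replaced by a
    new task with a uniformly random priority. Returns the waiting time
    (in steps) of each executed task.
    """
    # Each task is a (priority, arrival_step) pair.
    tasks = [(rng.random(), 0) for _ in range(list_length)]
    waits = []
    for step in range(1, steps + 1):
        k = max(range(list_length), key=lambda i: tasks[i][0])
        waits.append(step - tasks[k][1])
        tasks[k] = (rng.random(), step)
    return waits


waits = priority_queue_waits(list_length=2, steps=1000, rng=random.Random(4))
print(max(waits))
```

Most tasks are executed after one step, while a low-priority task can stay on the list for a long time, which is the source of the broad waiting time distribution.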
Summary - Mobile phone call network used as a proxy for the human network at the societal level - Structure of the society follows Granovetter's picture (up to 95%) - Micro and macro structures are related - Several different types of correlations - Spreading slowed down mainly by weight-topology correlations and burstiness of human activities - Bursts are highly correlated events (not explainable by circadian patterns) - Strong short time correlations, no model; new explanation needed
J.-P. Onnela, et al. PNAS 104, (2007) J.-P. Onnela, et al. New J. Phys. 9, 179 (2007) J. Kumpula et al. PRL 99, (2007) M. Karsai et al. arXiv: