Social networks from the perspective of Physics János Kertész 1,2 Jukka-Pekka Onnela 2, Jari Saramäki 2, Jörkki Hyvönen 2, Kimmo Kaski 2, Jussi Kumpula.


1 Social networks from the perspective of Physics. János Kertész (1,2), Jukka-Pekka Onnela (2), Jari Saramäki (2), Jörkki Hyvönen (2), Kimmo Kaski (2), Jussi Kumpula (2), David Lazer (3), Gábor Szabó (3,4), Albert-László Barabási (3,4). Affiliations: (1) Budapest University of Technology and Economics, Hungary; (2) Helsinki University of Technology, Finland; (3) Harvard University; (4) University of Notre Dame, USA

2 Outline 0. Introduction 1. Constructing the social network 2. Basic statistics 3. Granovetter’s hypothesis 4. Thresholding (percolation) 5. Spreading 6. Modeling 7. Conclusions

3 Introduction Complex systems: many interacting units such that the resulting behavior is more than a mere sum (brain, internet, society…). Much is known about the interactions, but complex behavior is often still puzzling. N = 3 can be many! See: three-body problem of mechanics. Statistical physics: N ~ 10^23. Social sciences: N = 3 – 10^9.

4 Introduction Complex systems: More input needed than mere interactions → Forget about interactions. Networks: scaffold of complexity. Useful to concentrate on the carrying NW structure (nodes and links): holistic approach with very general statements. Spectacular recent development: abundance of data due to IT + new concepts.

5 Introduction
Phenomenon                 Nodes        Links
Cell metabolism            Molecules    Chemical reactions
Scientific collaboration   Scientists   Joint papers
WWW                        Pages        URL links
Air traffic                Airports     Airline connections
Economy                    Firms        Trading
Language                   Words        Synonymous meaning
Society                    People       Acquaintances

6 Introduction Characterization of many empirical NW-s - BROAD DEGREE DISTRIBUTION in many natural and human made NW-s - SMALL WORLD property: Average distance between two nodes usually very small ( ~ log( N ) ) – „6 degrees of separation” - HIGH CLUSTERING: The number of triangles is significantly high Studied in many networks: WWW, Internet, actor, citation, metabolic etc…

7 World not only small and scale free but clustered! Friends of friends are often friends. Clustering coeff. C: fraction of pairs of a node’s neighbors that are themselves connected. ER graph: clustering too small!
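
The clustering coefficient on this slide can be illustrated with a minimal Python sketch. It assumes the usual local definition (linked neighbour pairs divided by all neighbour pairs); the toy graph and the function name `local_clustering` are hypothetical and not taken from the talk.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of `node`'s neighbours that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention chosen here: nodes with fewer than 2 neighbours get 0
    linked_pairs = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return linked_pairs / (k * (k - 1) / 2)


# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

for node in sorted(adj):
    print(node, local_clustering(adj, node))
```

Node a has three neighbour pairs of which only (b, c) is linked, so its coefficient is 1/3, while b and c sit in a closed triangle and get 1.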

8 Introduction WEIGHTED NW-S Step toward reductionism: interactions have different strengths → weights on links. Weights: fluxes (traffic or chemical reactions), correlation-based networks, etc. (Often no negative weights, w_ij ≥ 0.) How to characterize weighted NW-s? E.g. STRENGTH of node i: s_i = Σ_j w_ij. Intensity, coherence of subgraphs; clustering, motifs etc. (see: Onnela et al., PRE 71, 065103(R) (2005))
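
As a small illustration of node strength, s_i = Σ_j w_ij, here is a Python sketch for an undirected weighted link list. The link list and the name `node_strengths` are made up for the example.

```python
from collections import defaultdict


def node_strengths(weighted_links):
    """Return s_i = sum_j w_ij for an undirected list of (i, j, w_ij) links."""
    strength = defaultdict(float)
    for i, j, w in weighted_links:
        # An undirected link contributes its weight to both end nodes.
        strength[i] += w
        strength[j] += w
    return dict(strength)


# Hypothetical weighted triangle.
links = [("a", "b", 2.0), ("a", "c", 0.5), ("b", "c", 1.5)]
strengths = node_strengths(links)
print(strengths)
```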

9 Introduction SOCIAL NW-S: Much has been taken from Sociology: betweenness, clustering, assortativity… Main method: questionnaires (10 - 10 000). Weighted social nw-s: strength of social relationships varies over a wide range: „I know him/her”, „We are on first name basis”, „We are friends”, „We are good friends”, „We are very good friends”… Scale? Subjectivity? How to measure?

10 Introduction Advantage of questionnaires: Ask whatever you are interested in. It enables complex studies, multi-factor analyses. Disadvantage: difficulty in quantification, and subjectivity. E.g., AddHealth: quantification of tie strength by number of joint activities. Mutuality test fails very often (M. Gonzales et al., Physica A 379, 307-316 (2007)). Alternative approach: use communication databases (email, phone etc.)

11 Outline 0. Introduction 1. Constructing the social network 2. Basic statistics 3. Granovetter’s hypothesis 4. Thresholding (percolation) 5. Spreading 6. Modeling 7. Conclusions

12 Constructing the Network Use a network constructed from mobile phone calls as a proxy for a social network. In the network: nodes → individuals, links → voice calls. Link weights: number of calls; total call duration (time & money).

13 Constructing the Network Over 7 million private mobile phone subscriptions. Focus: voice calls within the home operator. Data aggregated from a period of 18 weeks. Require reciprocity (X → Y AND Y → X) for a link. Customers are anonymous (hash codes). Data from a European mobile operator. [Figure: X and Y with call durations of 15 min and 5 min, and a link of 20 min]
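
The construction rule (aggregate calls over the period, keep a link only if calls went both ways) might look roughly like the following Python sketch. The record format `(caller, callee, duration_seconds)` and the function name are assumptions for illustration; the actual operator data format is not described in the slides.

```python
from collections import defaultdict


def build_reciprocal_network(call_records):
    """Aggregate directed calls into undirected links, keeping reciprocated pairs only.

    call_records: iterable of (caller, callee, duration_seconds)  [assumed format].
    Returns {frozenset({x, y}): {"calls": n, "duration": total_seconds}}.
    """
    directed = defaultdict(lambda: [0, 0])  # (caller, callee) -> [count, duration]
    for caller, callee, duration in call_records:
        if caller == callee:
            continue  # ignore self-calls
        directed[(caller, callee)][0] += 1
        directed[(caller, callee)][1] += duration

    links = {}
    for (x, y), (n_xy, d_xy) in directed.items():
        pair = frozenset((x, y))
        # Skip pairs already handled and pairs with calls in one direction only.
        if pair in links or (y, x) not in directed:
            continue
        n_yx, d_yx = directed[(y, x)]
        links[pair] = {"calls": n_xy + n_yx, "duration": d_xy + d_yx}
    return links


# Hypothetical records: X and Y call each other, X calls Z but Z never calls back.
records = [("X", "Y", 15 * 60), ("Y", "X", 5 * 60), ("X", "Z", 10 * 60)]
network = build_reciprocal_network(records)
print(network)
```

With these records only the X-Y link survives, carrying two calls and 20 minutes of total duration; either quantity can serve as the link weight.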

14 Outline 0. Introduction 1. Constructing the social network 2. Basic statistics 3. Granovetter’s hypothesis 4. Thresholding (percolation) 5. Spreading 6. Modeling 7. Conclusions

15 Basic Statistics: Visualisation Largest connected component dominates: 3.9M / 4.6M nodes, 6.5M / 7.0M links. Use it for analysis!

16 Basic Statistics: Distributions [Figure: vertex degree distribution and link weight distribution; annotation: fat tail] Dunbar number (monkeysphere): max ~150 connections

17 Outline 0. Introduction 1. Constructing the social network 2. Basic statistics 3. Granovetter’s hypothesis 4. Thresholding (percolation) 5. Spreading 6. Modeling 7. Conclusions

18 Granovetter’s Weak Ties Hypothesis Granovetter* suggests analysis of social networks as a tool for linking micro and macro levels of sociological theory. Considers the macro level implications of tie (micro level) strengths: “The strength of a tie is a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services which characterize the tie.” Formulates a hypothesis: the relative overlap of two individuals’ friendship networks varies directly with the strength of their tie to one another. Explores the impact of the hypothesis on, e.g., diffusion of information, stressing the cohesive power of weak ties. * M. Granovetter, The Strength of Weak Ties, The American Journal of Sociology 78, 1360-1380, 1973.

19 Granovetter’s Weak Ties Hypothesis Hypothesis based on theoretical work and some direct evidence. Present network is suitable for testing the hypothesis: (i) Call durations → time commitment → tie strength (ii) Call durations → monetary commitment → tie strength (iii) Largest weighted social network so far. (Problem: other factors, such as emotional intensity or reciprocal services?) What is the coupling between network topology and link weights? Consider two connected nodes. We would like to characterize their relative neighborhood overlap, i.e. the proportion of common friends. This leads naturally to link neighborhood overlap.

20 Overlap Definition: relative neighborhood overlap (topological) O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij), where n_ij is the number of triangles around edge (v_i, v_j) and k_i, k_j are the degrees of its end nodes. Illustration of the concept:
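
A minimal Python sketch of the link neighbourhood overlap follows. It assumes the definition used in the Onnela et al. PNAS paper listed under Publications, O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij), with n_ij the number of common neighbours of i and j and k_i, k_j their degrees. The toy graph and the function name are hypothetical.

```python
def neighbourhood_overlap(adj, i, j):
    """Overlap O_ij of the link (i, j): common neighbours over all other neighbours."""
    if j not in adj[i]:
        raise ValueError("overlap is defined for existing links only")
    n_ij = len(adj[i] & adj[j])  # triangles around the link
    other_neighbours = (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij
    # Convention chosen here: an isolated link (no other neighbours) gets overlap 0.
    return n_ij / other_neighbours if other_neighbours > 0 else 0.0


# Hypothetical toy graph: triangle a-b-c, plus d attached to a and e attached to b.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "e"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"b"},
}

print(neighbourhood_overlap(adj, "a", "b"))  # one common neighbour out of three
print(neighbourhood_overlap(adj, "a", "d"))  # no common neighbours
```

O_ij = 0 means the two end nodes share no acquaintances, and O_ij = 1 means all their other acquaintances are shared.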

21 Empirical Verification Let <O>_w denote O_ij averaged over a bin of w-values. Use cumulative link weight distribution (the fraction of links with weights less than w’). Relative neighbourhood overlap increases as a function of link weight → verifies Granovetter’s hypothesis (~95%). (Exception: top 5% of weights.) Blue curve: empirical network. Red curve: weight-randomised network.

22 Local Implications Implication for strong links? Neighbourhood overlap is high → people form strongly connected communities. Implication for weak links? Neighbourhood overlap is low → communities are connected by weak links.

23 A Piece of the Network [Figure labels: community, weak links, strong links]

24 Overlap Global optimization for transport would put high weights on links with high betweenness centrality b (number of shortest paths passing through the link). In contrast, <O>_b decreases with b.

25 High Weight Links? (a) Average O_ij as a function of weight w: w < 10^4: stronger tie → larger overlap; w > 10^4: stronger tie → smaller overlap. Contradicts the weak ties hypothesis! Links in the decreasing part correspond to over 3h of communication over the period. (b) Putting it into perspective: - For only 5% of links w > 10^4 - Corresponds to 325 000 links, so it cannot be a matter of insufficient statistics

26 High Weight Links? Weak links: strength of both adjacent nodes (min & max) considerably higher than link weight. Strong links: strength of both adjacent nodes (min & max) about as high as the link weight. Indication: high-weight relationships clearly dominate the on-air time of both, others negligible. Time ratio spent communicating with one other person converges to 1 at roughly w ≈ 10^4. Consequence: less time to interact with others. Explains the onset of the decreasing trend of <O>_w.

27 Outline 0. Introduction 1. Constructing the social network 2. Basic statistics 3. Granovetter’s hypothesis 4. Thresholding (percolation) 5. Spreading 6. Modeling 7. Conclusions

28 Thresholding Analysis: Introduction Children’s approach: Break to learn! We do this systematically using thresholding analysis: order the links by weight; delete the links, one by one, based on their order. Control parameter f is the fraction of removed links. We can continuously interpolate, in either direction, between the initial connected network (f=0) and the set of isolated nodes (f=1). We use two different thresholding schemes: (i) increasing thresholding (remove low w_ij / O_ij links first); (ii) descending thresholding (remove high w_ij / O_ij links first). Question: How does the network respond to link removal? How similar is the response to w_ij- and O_ij-driven thresholding?
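
The thresholding procedure can be sketched in Python as follows: sort the links by weight, remove them one at a time, and record the fraction of nodes in the largest connected component for each removed fraction f. This is a deliberately naive toy version (components are recomputed from scratch after every removal), not the procedure used on the phone data; all names and the toy network are hypothetical.

```python
def largest_component_fraction(nodes, links):
    """Fraction of nodes in the largest connected component (R_LCC)."""
    adj = {n: set() for n in nodes}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / len(nodes)


def threshold_curve(nodes, weighted_links, ascending=True):
    """Remove links in weight order; return a list of (f, R_LCC) points."""
    ordered = sorted(weighted_links, key=lambda link: link[2], reverse=not ascending)
    curve = []
    for removed in range(len(ordered) + 1):
        remaining = [(u, v) for u, v, _ in ordered[removed:]]
        f = removed / len(ordered)
        curve.append((f, largest_component_fraction(nodes, remaining)))
    return curve


# Hypothetical toy network: two strong triangles joined by one weak bridge.
nodes = [1, 2, 3, 4, 5, 6]
weighted_links = [
    (1, 2, 10), (2, 3, 10), (1, 3, 10),
    (4, 5, 10), (5, 6, 10), (4, 6, 10),
    (3, 4, 1),
]
weak_first = threshold_curve(nodes, weighted_links, ascending=True)
strong_first = threshold_curve(nodes, weighted_links, ascending=False)
print(weak_first[1], strong_first[1])
```

In the toy network, removing the single weak bridge first already halves the largest component, whereas removing one strong link first leaves the network connected. The same function works for overlap-driven thresholding if O_ij is passed in place of the weight.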

29 Thresholding Initial connected network (f=0) → All links are intact, i.e. the network is in its initial stage

30 Thresholding Increasing weight thresholded network (f=0.8) → 80% of the weakest links removed, strongest 20% remain

31 Thresholding Initial connected network (f=0) → All links are intact, i.e. the network is in its initial stage

32 Thresholding Decreasing weight thresholded network (f=0.8) → 80% of the strongest links removed, weakest 20% remain

33 Thresholding We will study, as a function of the control parameter f, the following: 1. Order parameter (size of the largest component) 2. “Susceptibility” (average size of other components) 3. Average path lengths (in LCC) 4. Average clustering coefficient in the LCC

34 Thresholding: Size of Largest Component R_LCC is the fraction of nodes in the largest connected component (LCC). The LCC is able to sustain its integrity for moderate values of f. Least affected by removal of high O_ij links (in tight communities). Most affected by removal of low O_ij links (between communities). Difference between removal of low and high w_ij links is small, but the LCC breaks earlier if weak links are removed (Granovetter). Very few links are required for global connectivity. [Figure legend: remove low first; remove high first]

35 Thresholding: Size of Other Components Collapse for different values of f, but what is its nature? “Susceptibility” S (average cluster size excl. LCC), where n_s is the number of clusters with s nodes. Percolation theory: S→∞ as f→f_c. Finite signature of divergence: f_c ≈ 0.60 (increasing O_ij), f_c ≈ 0.82 (increasing w_ij). Demarcation between weak and strong links given by f_c ≈ 0.82. Qualitatively different role for weak and strong links. [Figure legend: remove low first; remove high first]
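
The slide's formula for the “susceptibility” did not survive in the transcript. The Python sketch below assumes the common percolation-theory form S = Σ_s n_s s² / N, with the sum running over all clusters except the largest one and N the total number of nodes; that normalisation is an assumption made here, not something stated on the slide.

```python
from collections import Counter


def susceptibility(component_sizes):
    """Assumed form: S = sum_s n_s * s**2 / N, excluding the largest component."""
    if not component_sizes:
        return 0.0
    sizes = sorted(component_sizes)
    n_nodes = sum(sizes)
    others = sizes[:-1]        # drop one copy of the largest component
    n_s = Counter(others)      # n_s: number of clusters with s nodes
    return sum(count * s ** 2 for s, count in n_s.items()) / n_nodes


# Hypothetical component sizes after some thresholding step.
print(susceptibility([5, 2, 2, 1]))  # (1*1 + 2*4) / 10
```

Near the percolation threshold f_c the non-giant clusters become large, so S peaks there in a finite network.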

36 Thresholding: Path Lengths in LCC Granovetter refers to “interpersonal flow” (information, rumour) from one person to another. In order for a flow to exist, the two people (nodes) need to be connected through at least one path. The size of the LCC says nothing about how tightly connected the component is, only that it is connected. Granovetter’s corollary: weak ties create a large number of short paths between nodes in different communities, and thus removing them should increase average path lengths and make it more difficult for the flow to happen.

37 Connectedness is a necessary but not sufficient condition for flow. But how is the LCC connected? Use the average path length (a.p.l.) <l> to study the role of different links for global paths. Removing weak links leads to longer paths: f=0.75: <l>=45 vs. <l>=30. Supports the weak ties conjecture on path lengths (communities are locally connected by weak ties). [Figure legend: remove low first; remove high first]
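
For small graphs the average path length can be computed exactly with a breadth-first search from every node, as in the Python sketch below. The toy graphs and the function name are hypothetical; the slides do not say how <l> was computed or estimated for the full network.

```python
from collections import deque


def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs of nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical example: a ring of four nodes, and the same ring with link 1-4 removed.
ring = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(average_path_length(ring), average_path_length(path))
```

Removing the single shortcut raises the average from 8/6 to 10/6, a miniature version of the effect described on the slide.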

38 Thresholding: Clustering in LCC Effect of different links on the structure of communities? Quantify this with <C>, the average clustering coefficient. Strong links are mostly within communities (triangles abundant), and thus removing them lowers clustering. Weak links are mostly between communities (rarely participate in triangles), and thus removing them has little effect. Removing high O_ij links shatters communities quickly. Removing low O_ij links brings out communities. [Figure legend: remove low first; remove high first]

39 Outline 0. Introduction 1. Constructing the social network 2. Basic statistics 3. Granovetter’s hypothesis 4. Thresholding (percolation) 5. Diffusion of information 6. Modeling 7. Conclusions

40 Diffusion of information Knowledge of information diffusion is based on unweighted networks. Use the present network to study diffusion on a weighted network: does the local relationship between topology and tie strength have an effect? Spreading simulation: infect one node with new information. (1) Empirical: p_ij ∝ w_ij (2) Reference: p_ij ∝ <w>. Spreading is significantly faster on the reference (average weight) network. Information gets trapped in communities in the real network. [Figure panels: Reference, Empirical]
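
A spreading simulation of this kind could be sketched in Python as a discrete-time SI process: in every step each infected node i passes the information to each uninformed neighbour j with probability p_ij = x·w_ij (empirical case) or x·<w> (reference case). The proportionality constant `x`, the synchronous update, the cap of the probability at 1 and all names are assumptions made for this sketch; the slide only states the proportionalities.

```python
import random


def spread(weighted_adj, seed_node, x, rng, use_average_weight=False, max_steps=10_000):
    """Return {node: infection time} for an SI process started at seed_node."""
    mean_w = 0.0
    if use_average_weight:
        all_w = [w for nbrs in weighted_adj.values() for w in nbrs.values()]
        mean_w = sum(all_w) / len(all_w) if all_w else 0.0
    infected_at = {seed_node: 0}
    for t in range(1, max_steps + 1):
        if len(infected_at) == len(weighted_adj):
            break  # everybody is informed
        newly = set()
        for i in infected_at:
            for j, w in weighted_adj[i].items():
                if j in infected_at or j in newly:
                    continue
                p = min(1.0, x * (mean_w if use_average_weight else w))
                if rng.random() < p:
                    newly.add(j)
        # Nodes infected in this step start transmitting in the next step.
        for j in newly:
            infected_at[j] = t
    return infected_at


# Hypothetical weighted chain a - b - c (weights 2 and 1).
chain = {"a": {"b": 2.0}, "b": {"a": 2.0, "c": 1.0}, "c": {"b": 1.0}}
rng = random.Random(42)
print(spread(chain, "a", x=1.0, rng=rng))
print(spread(chain, "a", x=0.1, rng=rng, use_average_weight=True))
```

With x = 1 every transmission probability on the chain is 1, so the infection times equal the hop distances from the seed; smaller x gives stochastic arrival times that can be averaged over many runs.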

41 Diffusion of information Where do individuals get their information? Majority of infections through (1) Empirical: ties of intermediate strength (2) Reference: (would be) weak ties. Both weak and strong ties have a diminishing role as information sources: the weakness of weak and strong ties. [Figure panels: Reference, Empirical]

42 Diffusion of information - Start spreading 100 times (large red node) - Information flows differently due to the local organizational principle (1) Empirical: information flows along a strong tie backbone (2) Reference: information mainly flows along the shortest paths. Best search results: Reach out of your own community. [Figure panels: Empirical, Reference]

43 Spreading In simplified terms, we can think of each link as transmitting information locally between the two individuals it connects. Strong links involve larger time commitments, so it is natural to assume that information flow through a link is proportional to its weight w_ij. Flow through weak (low w_ij) links: (i) Low per se (by definition) (ii) Low overlap O_ij → few alternative paths of length 2, so information can easily get trapped. Flow through strong (high w_ij) links: (i) High per se (by definition) (ii) High overlap O_ij → many alternative paths enhance flow further, so particularly well suited to efficient local transfer.

44 Searching -Fix a set of search strategies -Study which strategies are successful in finding information -Best search results: Reach out of your own community!

45 Outline 0. Introduction 1. Constructing the social network 2. Basic statistics 3. Granovetter’s hypothesis 4. Thresholding (percolation) 5. Spreading 6. Modeling 7. Conclusions

46 Modeling What is all this good for? Understanding structure and mechanisms of the society Improving spreading of news and opinions (Developing marketing strategies and other tools of mass manipulation) MODELING needed

47 Modeling Needed: a weighted network model which reflects the observations with possibly limited input. Links created by random encounters on acquaintance basis. Weights generated by one-to-one activities (phone calls). Take into account the different time scales: encounter (call) frequency; lifetime of relationships, lifetime of nodes → treated together.

48 Microscopic mechanisms in sociology
Network sociology*: Cyclic closure (exponential decay for growing geodesic distance); Focal closure (distance independent); “Sample window”
Network model: Local attachment (LA) (special case of cyclic closure: triadic closure); Global attachment (GA); Node deletion (ND)
* G. Kossinets and D. J. Watts, “Empirical Analysis of an Evolving Social Network”, Science 311, 88 (2006)

49 Modeling i meets j with prob. ∝ w_ij, who meets k with prob. ∝ w_jk. If k is a common friend, w_ij, w_jk and w_ki are increased by δ (a). If k is not connected to i, a link with w_ik = w_0 (= 1) is created with probability p_Δ (b). With prob. p_r new links with weight w_0 are created (c). With prob. p_d a node with all its links is deleted and a new one is born with no links.
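
Rules (a) and (b) can be sketched in Python as a single local-attachment step. This follows the slide wording literally: reinforcement by δ happens only when the triangle already exists, and a new link of weight w_0 is created with probability p_Δ otherwise. Global attachment (c) and node deletion are left out. The data structure (a symmetric dict of dicts), the function names and the parameter names are hypothetical.

```python
import random


def add_weight(weights, u, v, amount):
    """Add `amount` to the undirected link u-v, creating the link if needed."""
    weights[u][v] = weights[u].get(v, 0.0) + amount
    weights[v][u] = weights[v].get(u, 0.0) + amount


def local_attachment_step(weights, i, delta, p_delta, rng, w0=1.0):
    """One weighted local search started from node i; returns what happened."""
    if not weights[i]:
        return None
    nbrs = list(weights[i])
    # i picks a neighbour j with probability proportional to w_ij.
    j = rng.choices(nbrs, weights=[weights[i][n] for n in nbrs])[0]
    candidates = [n for n in weights[j] if n != i]
    if not candidates:
        return None
    # j picks one of its other neighbours k with probability proportional to w_jk.
    k = rng.choices(candidates, weights=[weights[j][n] for n in candidates])[0]
    if k in weights[i]:
        # (a) k is a common friend: reinforce all three links of the triangle.
        for u, v in ((i, j), (j, k), (k, i)):
            add_weight(weights, u, v, delta)
        return "reinforced"
    if rng.random() < p_delta:
        # (b) close the triangle with a new link of weight w0.
        add_weight(weights, i, k, w0)
        return "created"
    return None


# Hypothetical open triple a - b - c: the step from a can only reach c via b.
weights = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
rng = random.Random(0)
print(local_attachment_step(weights, "a", delta=0.5, p_delta=1.0, rng=rng))
print(local_attachment_step(weights, "a", delta=0.5, p_delta=1.0, rng=rng))
print(weights)
```

In the example the first step closes the triangle and the second step, whichever neighbour is chosen, reinforces all three links by δ. This is the mechanism by which triangles accumulate weight in the model.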

50 Summary of the model Microscopic rules in the model: weighted local search for new acquaintances; reinforcement of existing (popular) links; unweighted global search for new acquaintances; node removal, exp. link & weight lifetimes: =2 =(p_d)^-1. Model parameters: δ: free weight reinforcement parameter. p_d = 10^-3: sets the time scale of the model, 1/p_d (average node lifetime of 1000 time steps). p_r = 5×10^-4: global connections; results not sensitive to it (one random link per node during 1000 time steps). p_Δ: adjusted in relation to δ to keep <k> constant (structure changes are due only to link re-organisations).

51 Modeling Changing δ while keeping <k> fixed by adjusting p_Δ. Communities emerge, with strong internal links. Communities are interconnected by weak links. [Figure panels: δ = 0.001, 0.1, 0.5; p_d = 0.001; p_r = 0.0005; N = 30 000]

52 Social network model Tie strength: weak → intermediate → strong tie. Samples of N=10^5 network for variable weight-increase δ: no communities; communities start nucleating; communities forming; communities with dense & strong internal and sparse & weak external connections (cf. phone network)

53 Communities by inspection Average number of links constant: L = N<k>/2 (<k> ≈ 10) => All changes in structure are due to re-organisation of links. Increasing δ traps search in communities, further enhancing the trapping effect => Clear communities form. Triangles accumulate weight and act as nuclei for communities to emerge. [Figure panels: δ = 0, δ = 0.1, δ = 0.5, δ = 1]

54 Communities by k-clique method k-clique algorithm as definition for communities*. Focus on 4-cliques (smallest non-trivial cliques). Relative largest community size R_{k=4} ∈ [0,1]. Average community size (excl. largest). Observe clique percolation through the system for small δ. Increasing δ leads to condensation of communities. * G. Palla et al., “Uncovering the overlapping community structure...”, Nature 435, 814 (2005)

55 Global consequences [Figure legend: remove weak links first; remove strong links first]

56 Global consequences Phase transition for ascending tie removal (weaker first). [Figure panels: model network and phone network, each with ascending and descending link removal; horizontal axis: fraction of links, f, from 0 to 1]

57 Modeling The model fulfills essential criteria of social nw-s: Broad (but not scale-free) degree distribution. Assortative mixing (popular people attract each other). High clustering: many triangles (by construction). Community structure with strong links inside and weak ones between them.

58 Outline 0. Introduction 1. Constructing the social network 2. Basic statistics 3. Granovetter’s hypothesis 4. Thresholding (percolation) 5. Spreading 6. Modeling 7. Conclusions

59 Discussion and Conclusion Weak ties maintain the network’s structural integrity; strong ties maintain local communities; intermediate ties are mostly responsible for first-time infections. How can one efficiently search for information in a social network? “Go out of your community!” Social networks seem better suited to local processing than global transmission of information. Are there simple rules or mechanisms that lead to the observed properties? Efficient modeling possible. Publications: J.-P. Onnela et al., PNAS 104, 7332-7336 (2007); J.-P. Onnela et al., New J. Phys. 9, 179 (2007); J.M. Kumpula et al., PRL (to be published). www.phy.bme.hu/~kertesz/

60 Mark Granovetter, Connections, 1990: >>In the history of public speaking, there have been many famous denials. One sunny day in 1880, Karl Marx declared: "I am not a Marxist". On a less auspicious occasion in 1973, Richard Nixon insisted "I am not a crook". Neither Marx’ nor Nixon’s audience gave much credence to their denials, and you too may respond with disbelief when I tell you that "I am not a networker".<< >>Instead, the slogan of the day will be "We are all networkers now".<<

