Overlapping communities of large social networks: From “snapshots” to evolution Tamás Vicsek Dept. of Biological Physics, Eötvös University, Hungary

Slides:



Advertisements
Similar presentations
The overlapping community structure of complex networks.
Advertisements

Complex Networks: Complex Networks: Structures and Dynamics Changsong Zhou AGNLD, Institute für Physik Universität Potsdam.
Mobile Communication Networks Vahid Mirjalili Department of Mechanical Engineering Department of Biochemistry & Molecular Biology.
Scale Free Networks.
Algorithmic and Economic Aspects of Networks Nicole Immorlica.
Traffic-driven model of the World-Wide-Web Graph A. Barrat, LPT, Orsay, France M. Barthélemy, CEA, France A. Vespignani, LPT, Orsay, France.
Analysis and Modeling of Social Networks Foudalis Ilias.
VL Netzwerke, WS 2007/08 Edda Klipp 1 Max Planck Institute Molecular Genetics Humboldt University Berlin Theoretical Biophysics Networks in Metabolism.
Information Networks Generative processes for Power Laws and Scale-Free networks Lecture 4.
Advanced Topics in Data Mining Special focus: Social Networks.
CS 599: Social Media Analysis University of Southern California1 The Basics of Network Analysis Kristina Lerman University of Southern California.
4. PREFERENTIAL ATTACHMENT The rich gets richer. Empirical evidences Many large networks are scale free The degree distribution has a power-law behavior.
Weighted networks: analysis, modeling A. Barrat, LPT, Université Paris-Sud, France M. Barthélemy (CEA, France) R. Pastor-Satorras (Barcelona, Spain) A.
Hierarchy in networks Peter Náther, Mária Markošová, Boris Rudolf Vyjde : Physica A, dec
Emergence of Scaling in Random Networks Barabasi & Albert Science, 1999 Routing map of the internet
Scale-free networks Péter Kómár Statistical physics seminar 07/10/2008.
Mining and Searching Massive Graphs (Networks)
The structure of the Internet. How are routers connected? Why should we care? –While communication protocols will work correctly on ANY topology –….they.
1 Complex systems Made of many non-identical elements connected by diverse interactions. NETWORK New York Times Slides: thanks to A-L Barabasi.
Web as Graph – Empirical Studies The Structure and Dynamics of Networks.
Peer-to-Peer and Grid Computing Exercise Session 3 (TUD Student Use Only) ‏
Imperial College LondonFebruary 2007 Bubble Rap: Forwarding in Small World DTNs in Ever Decreasing Circles Part 2 - People Are the Network Jon Crowcroft.
Complex networks and random matrices. Geoff Rodgers School of Information Systems, Computing and Mathematics.
Global topological properties of biological networks.
Quantifying Social Group Evolution Gergely Palla, Albert-Laszlo Barabasi, and Tamas Vicsek Nature Vol 446 April 2007 Presented by: Liang Ding Finance Department,
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
Computer Science 1 Web as a graph Anna Karpovsky.
Peer-to-Peer and Social Networks Random Graphs. Random graphs E RDÖS -R ENYI MODEL One of several models … Presents a theory of how social webs are formed.
Models of Influence in Online Social Networks
Weighted networks: analysis, modeling A. Barrat, LPT, Université Paris-Sud, France M. Barthélemy (CEA, France) R. Pastor-Satorras (Barcelona, Spain) A.
Large-scale organization of metabolic networks Jeong et al. CS 466 Saurabh Sinha.
Optimization Based Modeling of Social Network Yong-Yeol Ahn, Hawoong Jeong.
Information Networks Power Laws and Network Models Lecture 3.
(Social) Networks Analysis III Prof. Dr. Daning Hu Department of Informatics University of Zurich Oct 16th, 2012.
Analysis of Topological Characteristics of Huge Online Social Networking Services Friday 10am Telefonica Barcelona Yong-Yeol Ahn Seungyeop Han.
Statistical Mechanics of Complex Networks: Economy, Biology and Computer Networks Albert Diaz-Guilera Universitat de Barcelona.
Community detection algorithms: a comparative analysis Santo Fortunato.
Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.
Weighted networks: analysis, modeling A. Barrat, LPT, Université Paris-Sud, France M. Barthélemy (CEA, France) R. Pastor-Satorras (Barcelona, Spain) A.
1 Burning a graph as a model of social contagion Anthony Bonato Ryerson University Institute of Software Chinese Academy of Sciences.
Complex Networks First Lecture TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA TexPoint fonts used in EMF. Read the.
Part 1: Biological Networks 1.Protein-protein interaction networks 2.Regulatory networks 3.Expression networks 4.Metabolic networks 5.… more biological.
Emergence of Scaling and Assortative Mixing by Altruism Li Ping The Hong Kong PolyU
Social Network Analysis Prof. Dr. Daning Hu Department of Informatics University of Zurich Mar 5th, 2013.
Chapter 3. Community Detection and Evaluation May 2013 Youn-Hee Han
Class 9: Barabasi-Albert Model-Part I
Lecture 10: Network models CS 765: Complex Networks Slides are modified from Networks: Theory and Application by Lada Adamic.
LECTURE 2 1.Complex Network Models 2.Properties of Protein-Protein Interaction Networks.
Class 19: Degree Correlations PartII Assortativity and hierarchy
Equilibrium statistical mechanics of network structures Physica A 334, 583 (2004); PRE 69, (2004); cond-mat ; cond-mat Thanks to:
Social Networking: Large scale Networks
Properties of Growing Networks Geoff Rodgers School of Information Systems, Computing and Mathematics.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Hierarchical Organization in Complex Networks by Ravasz and Barabasi İlhan Kaya Boğaziçi University.
Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Network Biology.
Weighted Networks IST402 – Network Science Acknowledgement: Roberta Sinatra Laszlo Barabasi.
Cmpe 588- Modeling of Internet Emergence of Scale-Free Network with Chaotic Units Pulin Gong, Cees van Leeuwen by Oya Ünlü Instructor: Haluk Bingöl.
Graph clustering to detect network modules
Network (graph) Models
Lecture 23: Structure of Networks
Cohesive Subgraph Computation over Large Graphs
Structures of Networks
Hiroki Sayama NECSI Summer School 2008 Week 2: Complex Systems Modeling and Networks Network Models Hiroki Sayama
Topics In Social Computing (67810)
Date of download: 11/12/2017 Copyright © ASME. All rights reserved.
Lecture 23: Structure of Networks
Peer-to-Peer and Social Networks Fall 2017
Lecture 23: Structure of Networks
Affiliation Network Models of Clusters in Networks
Advanced Topics in Data Mining Special focus: Social Networks
Presentation transcript:

Overlapping communities of large social networks: From “snapshots” to evolution Tamás Vicsek Dept. of Biological Physics, Eötvös University, Hungary

Why communities/modules (densely interconnected parts)? The internal organization of large networks is responsible for their function. Complex systems/networks are typically hierarchical. The units organize (become more closely connected) into groups which can themselves be regarded as units on a higher level. We call these densely interconnected groups of nodes as modules/communities/cohesive groups/clusters etc. They are the “building blocks” of the complex networks on many scales. For example: Person->group->department->division->company->industrial sector Letter->word->sentence->paragraph->section->chapter->book

Community/modul finding: An important new subfield of the science of networks Amaral, Barabási, Bornholdt, Newman,….. Questions: How can we recover the hierarchy of overlapping groups/modules/communities in the network if only a (very long) list of links between pairs of units is given? What are their main characteristics? Outline Basic facts and principle Definitions of new quantities Results for phone call, school friendships and collaboration networks

Basic observations: A large complex network is bounded to be highly structured (has modules; function follows from structure) The internal organization is typically hierarchical (i.e., displays some sort of self-similarity of the structure) An important new aspect: Overlaps of modules are essential “mess”, no function Too constrained, limited function Complexity is between randomness and regularity

Role of overlaps Is this like a tree? (hierarchical methods)

Hierarchical methods k-clique template rolling Finding communities Two nodes belong to the same community if they can be connected through adjacent k-cliques a 4-clique

Hierarchical methods k-clique template rolling Finding communities Two nodes belong to the same community if they can be connected through adjacent k-cliques a 4-clique

Hierarchical methods k-clique template rolling Finding communities Two nodes belong to the same community if they can be connected through adjacent k-cliques a 4-clique

Hierarchical versus clique percolation clustering Common clustering methods lead to a partitioning in which someone (a node) can belong to a single community at a time only. For example, I can be located as a member of the community “physicists”, but not, at the same time, be found as a member of my community “family” or “friends”, etc. k-clique template rolling allows large scale, systematic (deterministic) analysis of the network of overlapping communities (network of networks)

Home page of CFinder

UNCOVERING THE OVERLAPPING COMMUNITY STRUCTURE OF COMPLEX NETWORKS IN NATURE AND SOCIETY with G. Palla, I. Derényi, and I. Farkas Definitions An order k community is a k-clique percolation cluster Such communities/clusters obviously can overlap This is why a lot of new interesting questions can be posed New fundamental quantities (cumulative distributions) defined: P(d com ) community degree distribution P(m) membership number distribution P(s ov ) community overlap distribution P(s) community size distribution (not new) G.P,I.D,I.F,T.V Nature 2005

DATA cond-mat authors (electronic preprints, about 30,000 authors) mobile phone (~ 4,000,000 users calling each other) school friendship (84 schools from USA) large data sets: efficient algorithm is needed! Our method is the fastest known to us for these type of data Steps: determine: cliques (not k-cliques!) clique overlap matrix components of the corresponding adjacency matrix Do this for “optimal” k and w, where optimal corresponds to the “richest” (most widely distributed cluster sizes) community structure

Visualization of the communities of a nodecommunities You can download the program and check your own communities

“Web of networks” Each node is a community Nodes are weighted for community size Links are weighted for overlap size DIP “core” data base of protein interactions (S. cerevisiase, a yeast) The other networks we analysed are much larger!!

Community size distribution Community degree distribution Combination of exponential and power law! Emergence of a new feature as going “up” to the next level

Community overlap size membership number.

A brief overview of a few case studies School friendships (disassortativity of communities, role of races) Phone calls (geographical and service usage correlations) Community dynamics for collaborators and phone callers

Three schools from the Add-Health school friendship data set Grades 7-12

Network of school friendship communities with M. Gonzalez, J. Kertész and H Herrmann k=3 (less dense) k=4 (more dense, cohesive) Minorities tend to form more densely interconnected groups

P(k) – degree distribution C(k) – clustering coefficient (k) – degree of neighbour (individuals: assortative communities: diassortative) communities individuals Distribution functions (for k=3)

Quantitative social group dynamics on a large scale i) attachment preferences (with G. Palla and P. Pollner) ii) tracking the evolution of communities (with G. Palla and A-L Barabási)

Community dynamics Dynamics of community growth: the preferential attachment principle applies on the level of communities as well The probability that a previously unlinked community joins a community larger than s grows approximately linearly (for the cond-mat coauthorship network) P.P,G.P,T.V Europhys Lett with P. Pollner and G. Palla

Communities in a “tiny” part of a phone calls network of 4 million users ( with A-L Barabási and G. Palla, Nature, April, 2007)

Callers with the same zip code or age are over-represented in the communities we find

Examples for tracking individual communities.

Lifetime (  ) of a social group as a function of steadiness (  ) and size (s) Cond-mat collaboration network Phone call network Thus, a large group is around for a longer time if it is less steady (and the opposite is true for small groups)

Screen shot of CFinder CFinder has become a commercial product by Firmlinks. GORDIO, a Budapest based HR company has been producing a quickly growing profit by using it.

Outlook: Networks of communities - further aspects of hierarchical organization - correlations, clustering, etc., i.e., everything you can do for vertices - applications, e.g., predictions (fate of a community, key players, etc)

Evolution of a single large community of collaborators s – size (number of authors), t – time (in months)

Small part of the phone call network (surrounding the circled yellow node up to the fourth neighbour) Small part of the collaboration network (surrounding the circled green node up to the fourth neighbour

Distribution of community sizes Over-representation of the usage of a given service as a function of the number of users in a community

Dedicated home page (software, papers, data) Home Screen shots

Basic observations: A large complex network is bounded to be highly structured (has modules; function follows from structure) The internal organization is typically hierarchical (i.e., displays some sort of self-similarity of the structure) An important new aspect: Overlaps of modules are essential

Information about the age distribution of users in communities of size s (Ratio of the standard deviation in a randomized set over actual) Information about the Zip code (spatial) distribution of users in communities of size s (Ratio of the standard deviation in a randomized set over actual)

The number of vertices in the largest component As N grows the width of the quickly growing region decays as 1/N 1/2

Evolution of the social network of scientific collaborations A.-L. B., H.J, Z.N., E.R., A. S., T. V. (Physica A, 2002) Data: collaboration graphs in (M) Mathematics and (NS) Neuroscience The Erdős graph and the Erdős number (Ei=2,W=8,BG=4) R. Faudree L. Lovász P. Erdős B. Bollobás

Collaboration network Cumulative data, Degree distribution: power-law with due to growth and preferential attachment

Collaboration network Internal preferential attachment: cumulative attachment rate: Measured data shows : is quadratic in k 1 k 2 is linear in k 1 k 2 Attachment rate Due to preferential growth and internal reorganization a complex network with all sorts of communities of collaborators are formed (e.g., due to specific topics or geographical reasons)

For k  3, N k * /N k (p c ) ~ N -k/6 For k > 3 N k * /N k (p c ) ~ N 1-k/2 The scaling of the relative size of the giant cluster of k-cliques at p c