CUT: Community Update and Tracking in Dynamic Social Networks
Hao-Shang Ma and Jen-Wei Huang
Knowledge and Information Discovery Lab, Dept. of Electrical Engineering, National Cheng Kung University
The 7th Workshop on Social Network Mining and Analysis (SNA-KDD'13), held jointly with the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'13)
About Me
Jen-Wei Huang (黃仁暐)
Knowledge and Information Discovery Lab
Dept. of Electrical Engineering, National Cheng Kung University
mail.ncku.edu.tw
2013/11/22, KID Lab, National Cheng Kung University
Research
Data Mining and Database
◦ Time Series Mining
◦ Social Network Analysis
Multimedia Information Retrieval
Ubiquitous Computing
◦ Mobile Computing
◦ Cloud Computing
Bioinformatics
Outline
Introduction
CUT Algorithm
Experiments
Conclusions
References
Introduction
Social networking websites allow users to build their own personal communities, or social networks, based on their friendship relationships.
Introduction
Based on the relationships between users, social networks exhibit a community structure.
Introduction
Community detection usually partitions the nodes of a network into groups such that nodes in the same group are densely connected to one another.
An objective function is chosen to measure the quality of the resulting communities.
Modularity [1] measures the quality of a partition in terms of the numbers of intra-community and inter-community edges: it compares the fraction of edges that fall inside communities with the fraction expected if edges were placed at random while preserving node degrees.
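To make the modularity measure concrete, here is a minimal Python sketch of the Newman-Girvan formula Q = sum over communities c of [L_c/m - (D_c/2m)^2], where m is the number of edges, L_c the number of edges inside community c, and D_c the total degree of its nodes. The function name `modularity` and the toy graph are illustrative assumptions, not part of CUT.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity: Q = sum over c of L_c/m - (D_c / 2m)^2.

    edges: undirected edges (u, v) without duplicates or self-loops.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    intra = defaultdict(int)       # L_c: edges with both ends in c
    degree_sum = defaultdict(int)  # D_c: sum of degrees of the nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 4))  # 0.3571
```

Putting all six nodes into a single community gives Q = 0, so the two-triangle partition scores higher, as intended.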
Introduction
Social networks change constantly over time.
We want to identify the community structure of a network quickly and efficiently at every timestamp.
To do so, we update the community structure by tracking previously known information instead of recalculating the relationships among all nodes and edges in the network.
Introduction
In this work, we define the seed of community: a collection of 3-cliques in which any two 3-cliques are connected through a chain of 3-cliques, each sharing one edge (that is, two nodes) with the next.
By tracking seeds of communities, we can efficiently update and track the dynamics of communities in a social network.
Example Network and 3-clique
CUT Algorithm
We propose the CUT (Community Update and Tracking) algorithm to update and track seeds of communities. CUT has two phases:
◦ Initial phase (executed only once)
  Find seeds of communities
  Extend seeds of communities to communities
◦ Update and Tracking phase
  Maintain and update the CAB graph
Find Seeds of Communities
1. Find all 3-cliques in the network
2. Build the CAB (Clique Adjacent Bipartite) graph
3. Determine the seeds of communities in the network
Find All 3-cliques
Backtracking algorithm
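The slide names a backtracking algorithm without giving its details. The sketch below is one plausible backtracking enumeration, assuming an adjacency-set representation with comparable node ids: it grows a partial clique one node at a time, only tries higher-numbered neighbours, and backs out when no candidate remains, so every 3-clique is reported exactly once. The helper names `adjacency` and `find_3_cliques` are hypothetical.

```python
def adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def find_3_cliques(adj):
    """Enumerate all 3-cliques as sorted tuples by backtracking."""
    cliques = []

    def extend(partial, candidates):
        if len(partial) == 3:
            cliques.append(tuple(partial))
            return
        for w in sorted(candidates):
            # Keep only candidates adjacent to w and larger than w, so each
            # clique is built once, in increasing node order.
            extend(partial + [w], {x for x in candidates & adj[w] if x > w})

    for u in sorted(adj):
        extend([u], {x for x in adj[u] if x > u})
    return cliques


adj = adjacency([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)])
print(find_3_cliques(adj))  # [(1, 2, 3), (2, 3, 4)]
```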
All 3-cliques in the Network
Clique Adjacent Bipartite Graph
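The slides do not spell out how the CAB graph is constructed. Assuming, from the O(3|C|) building cost quoted for it, that one side holds the 3-cliques and the other side holds network edges, with each 3-clique linked to its three edges, it could be built as below. The function name `build_cab` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cab(cliques):
    """Build a clique-edge bipartite graph (an assumed reading of CAB).

    Returns (clique_to_edges, edge_to_cliques). Every 3-clique gets
    exactly three links, one per edge, so the graph has 3|C| links.
    """
    clique_to_edges = {}
    edge_to_cliques = defaultdict(set)
    for clique in cliques:
        edges = [tuple(sorted(pair)) for pair in combinations(clique, 2)]
        clique_to_edges[clique] = edges
        for edge in edges:
            edge_to_cliques[edge].add(clique)
    return clique_to_edges, dict(edge_to_cliques)


clique_to_edges, edge_to_cliques = build_cab([(1, 2, 3), (2, 3, 4)])
print(sum(len(e) for e in clique_to_edges.values()))  # 6 links = 3 * 2 cliques
print(sorted(edge_to_cliques[(2, 3)]))  # [(1, 2, 3), (2, 3, 4)]
```

The shared edge node (2, 3) is what later places both 3-cliques in the same seed of community.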
All 3-cliques in CAB
Determine Seeds of Communities
DFS-like algorithm to find the connected components of the CAB graph
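A minimal sketch of the DFS-like step, assuming the CAB graph links each 3-clique to its three edges: an iterative depth-first search walks from 3-cliques to their edges and back, and each connected component becomes one seed, reported as the set of nodes its 3-cliques cover. Marking edges as visited means each of the 3|C| links is followed once. This grouping coincides with 3-clique percolation (CPM with k = 3); that is our observation, not a claim from the slides. The function name `find_seeds` is hypothetical, and the edge index is rebuilt inside so the block runs on its own.

```python
from collections import defaultdict
from itertools import combinations


def find_seeds(cliques):
    """Seeds of communities = connected components of the clique-edge graph."""
    # Edge side of the bipartite graph: edge -> 3-cliques containing it.
    edge_to_cliques = defaultdict(list)
    for clique in cliques:
        for edge in combinations(sorted(clique), 2):
            edge_to_cliques[edge].append(clique)

    seen_cliques, seen_edges, seeds = set(), set(), []
    for start in cliques:
        if start in seen_cliques:
            continue
        seen_cliques.add(start)
        stack, nodes = [start], set()
        while stack:                        # iterative DFS: clique -> edge -> clique
            clique = stack.pop()
            nodes.update(clique)
            for edge in combinations(sorted(clique), 2):
                if edge in seen_edges:      # each edge is expanded only once
                    continue
                seen_edges.add(edge)
                for other in edge_to_cliques[edge]:
                    if other not in seen_cliques:
                        seen_cliques.add(other)
                        stack.append(other)
        seeds.append(nodes)
    return seeds


cliques = [(1, 2, 3), (2, 3, 4), (5, 6, 7), (6, 7, 8), (3, 9, 10)]
print([sorted(s) for s in find_seeds(cliques)])
# [[1, 2, 3, 4], [5, 6, 7, 8], [3, 9, 10]]
```

The 3-clique (3, 9, 10) shares only node 3 with (1, 2, 3), not an edge, so it forms a seed of its own.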
CAB Graph
The complexity of tracking the CAB graph is lower than that of tracking the original graph
◦ Building the CAB graph takes O(3|C|) = O(|C|) time, where |C| is the number of 3-cliques
◦ Determining the connected components takes O(3|C|) = O(|C|) time
Seeds of communities are easy to combine or split
Extend to Communities
Ignore sparse nodes whose degree is smaller than 2.
Assign each remaining node to its closest seed of community.
◦ Closest: the seed of community that has the most links to the node
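The extension rule can be sketched as follows, assuming seeds are given as node sets. Two details are our own assumptions because the slide leaves them open: ties go to the seed listed first, and a node with no link to any seed stays unassigned. `extend_to_communities` is a hypothetical name, and the adjacency helper is repeated so the block runs on its own.

```python
def adjacency(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def extend_to_communities(adj, seeds):
    """Grow each seed (a node set) into a community."""
    communities = [set(seed) for seed in seeds]
    in_a_seed = set().union(*seeds)
    for node in sorted(adj):
        # Skip nodes already in a seed and sparse nodes with degree < 2.
        if node in in_a_seed or len(adj[node]) < 2:
            continue
        links = [len(adj[node] & seed) for seed in seeds]
        if links and max(links) > 0:
            # Closest seed = most links; list.index picks the first on ties.
            communities[links.index(max(links))].add(node)
    return communities


adj = adjacency([
    (1, 2), (1, 3), (2, 3), (2, 4), (3, 4),   # seed {1, 2, 3, 4}
    (5, 6), (5, 7), (6, 7), (6, 8), (7, 8),   # seed {5, 6, 7, 8}
    (9, 1), (9, 2), (9, 5),                   # node 9: 2 links vs 1 link
    (10, 6),                                  # node 10: degree 1, ignored
])
result = extend_to_communities(adj, [{1, 2, 3, 4}, {5, 6, 7, 8}])
print([sorted(c) for c in result])  # [[1, 2, 3, 4, 9], [5, 6, 7, 8]]
```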
Update and Tracking Phase
Maintain and update the CAB graph
◦ When the network changes, handle the following cases:
  Case 1: new nodes and new edges are added
  Case 2: existing nodes and edges are removed
Extend seeds of communities to communities
Case 1: Merge and Join
New nodes: 20, 21
New edges: (2,8), (5,20), (9,20), (11,21)
New 3-cliques: (2,6,8) and (5,9,20)
Case 1: Merge and Join
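One way to realise merge and join incrementally is sketched below; the union-find over seed ids and the edge-to-seed map are our assumptions, not structures taken from the slides. Adding an edge (u, v) creates exactly the 3-cliques (u, v, w) for each common neighbour w of u and v. A new 3-clique that shares an edge with one existing seed joins it, one that shares edges with several seeds merges them, and one that touches none starts a new seed. `add_edges` and `SeedIndex` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def add_edges(adj, new_edges):
    """Insert genuinely new edges (new nodes appear implicitly); return new 3-cliques."""
    for u, v in new_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in new_edges:
        for w in adj[u] & adj[v]:          # a common neighbour closes a triangle
            found.add(tuple(sorted((u, v, w))))
    return sorted(found)


class SeedIndex:
    """Edge -> seed id map plus union-find over seed ids."""

    def __init__(self):
        self.parent = {}
        self.edge_seed = {}
        self.next_id = 0

    def find(self, s):
        while self.parent[s] != s:
            self.parent[s] = self.parent[self.parent[s]]  # path halving
            s = self.parent[s]
        return s

    def add_clique(self, clique):
        edges = [tuple(sorted(p)) for p in combinations(clique, 2)]
        owners = {self.find(self.edge_seed[e]) for e in edges if e in self.edge_seed}
        if owners:                         # join one seed, or merge several
            root = min(owners)
            for s in owners:
                self.parent[s] = root
        else:                              # touches no seed: start a new one
            root = self.next_id
            self.parent[root] = root
            self.next_id += 1
        for e in edges:
            self.edge_seed.setdefault(e, root)

    def seeds(self):
        groups = defaultdict(set)
        for edge, s in self.edge_seed.items():
            groups[self.find(s)].update(edge)
        return sorted(groups.values(), key=min)


adj, index = {}, SeedIndex()
initial = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
for clique in add_edges(adj, initial):
    index.add_clique(clique)
for batch in ([(3, 5)], [(2, 4)]):         # first a join, then a merge
    for clique in add_edges(adj, batch):
        index.add_clique(clique)
    print([sorted(s) for s in index.seeds()])
# [[1, 2, 3], [3, 4, 5, 6]]
# [[1, 2, 3, 4, 5, 6]]
```

Edge (3, 5) creates (3, 4, 5), which shares edge (4, 5) with the right seed and joins it; edge (2, 4) then creates (2, 3, 4), which shares edges with both seeds and merges them.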
Case 2: Split and Removal
If nodes are removed, we find all edges that connect to the removed nodes.
Node 10 is removed; therefore, edges (4,10), (6,10), (8,10), (10,11), and (10,12) are removed.
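Collecting the edges of removed nodes needs only the adjacency sets. The sketch below, with the hypothetical names `adjacency` and `remove_nodes`, deletes the nodes in place and returns their incident edges; the toy adjacency contains just node 10's five edges from the example above, since the rest of that network is not reproduced here.

```python
def adjacency(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def remove_nodes(adj, removed_nodes):
    """Delete nodes from adj in place and return their incident edges."""
    removed_edges = set()
    for node in removed_nodes:
        for nb in adj.pop(node, set()):
            removed_edges.add(tuple(sorted((node, nb))))
            if nb in adj:
                adj[nb].discard(node)      # drop the reverse direction too
    return sorted(removed_edges)


adj = adjacency([(4, 10), (6, 10), (8, 10), (10, 11), (10, 12)])
print(remove_nodes(adj, [10]))
# [(4, 10), (6, 10), (8, 10), (10, 11), (10, 12)]
```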
Node Removed Case: Split
Remove the corresponding edges and 3-cliques
Run the FindSeedofCommunity algorithm again to obtain the new seeds of communities
Complexity is O(3|C| + |removed C|)
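The split step can be sketched as: drop every 3-clique that contains a removed edge, then regroup the survivors. Here the regrouping uses union-find over 3-cliques that share an edge instead of the DFS used when finding seeds; both yield the same connected components. The toy cliques and the name `split_after_removal` are hypothetical: removing node 10 deletes the bridging 3-cliques (3, 4, 10) and (4, 5, 10), so one seed splits in two.

```python
from collections import defaultdict
from itertools import combinations


def split_after_removal(cliques, removed_edges):
    """Drop 3-cliques that use a removed edge and recompute the seeds."""
    gone = {tuple(sorted(e)) for e in removed_edges}
    survivors = [c for c in cliques
                 if not any(p in gone for p in combinations(sorted(c), 2))]

    parent = list(range(len(survivors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_with_edge = {}
    for i, clique in enumerate(survivors):
        for edge in combinations(sorted(clique), 2):
            if edge in first_with_edge:    # shared edge: same seed
                parent[find(i)] = find(first_with_edge[edge])
            else:
                first_with_edge[edge] = i

    groups = defaultdict(set)
    for i, clique in enumerate(survivors):
        groups[find(i)].update(clique)
    return sorted(groups.values(), key=min)


cliques = [(1, 2, 3), (2, 3, 4), (3, 4, 10), (4, 5, 10), (4, 5, 6)]
removed = [(3, 10), (4, 10), (5, 10)]      # edges of the removed node 10
print([sorted(s) for s in split_after_removal(cliques, removed)])
# [[1, 2, 3, 4], [4, 5, 6]]
```

Before the removal all five 3-cliques form one seed; afterwards (2, 3, 4) and (4, 5, 6) share only node 4, not an edge, so the seed splits.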
Joint Case
New nodes are added and edges are removed at the same time
Joint Case
We handle Case 1 first and then Case 2, which reduces unnecessary splits.
Finally, we extend seeds of communities to communities.
Related Works: Updating the Community Structure
Nam P. Nguyen et al. propose the QCA algorithm [6].
◦ QCA uses the already known community structure and handles the changing cases (new nodes, new edges, removed nodes, and removed edges) based on modularity.
◦ QCA keeps the whole community structure at each timestamp.
◦ It runs the original CPM (Clique Percolation Method) in the removal case every time, which is time-consuming.
◦ It has to identify which type of case each node or edge belongs to, which also takes considerable time.
Experiments
Coauthor network (2002~2010)
◦ About authors in one network
◦ Densely connected graph
◦ Each time period spans five years; t1 is (first update)
◦ The variations of the network at each timestamp are small
Experiments
Experiments
p2p-Gnutella network
◦ t1-t4 are snapshots from August 4 to , with about 6000 nodes
◦ Sparsely connected graph
◦ The variations of the network at each timestamp are large
Experiments
Conclusions
We design the CUT algorithm to update community structures in dynamic social networks instead of recalculating the relationships among all nodes and edges in the network.
Keeping only the seeds of communities in memory at each timestamp is more efficient than keeping all communities.
Using the Clique Adjacent Bipartite (CAB) graph to update and track seeds of communities leads to lower complexity.
References
1. M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Phys. Rev. E 69.
2. Bowen Yan and Steve Gregory, "Detecting Communities in Networks by Merging Cliques," ICIS.
3. A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Phys. Rev. E 70.
4. Zhengzhang Chen, Kevin A. Wilson, Ye Jin, William Hendrix, and Nagiza F. Samatova, "Detecting and Tracking Community Dynamics in Evolutionary Networks," ICDMW.
5. Yi Wang, Bin Wu, and Xin Pei, "CommTracker: A Core-Based Algorithm of Tracking Community Evolution," ADMA.
6. Nam P. Nguyen, Thang N. Dinh, Ying Xuan, and My T. Thai, "Adaptive Algorithms for Detecting Community Structure in Dynamic Social Networks," INFOCOM.
7. Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre, "Fast unfolding of communities in large networks," JSTAT.
8. Nan Du, Bin Wu, Xin Pei, Bai Wang, and Liutong Xu, "Community Detection in Large-Scale Social Networks," SNA-KDD.