Overlapping Community Search
Graph Data Management Lab, School of Computer Science
Online Search of Overlapping Communities
  Wanyun Cui, Fudan University
  Yanghua Xiao, Fudan University
  Haixun Wang, Microsoft Research Asia
  Yiqi Lu, Fudan University
  Wei Wang, Fudan University
  Presenter: Wanyun Cui
Outline
  Motivation
  Model
  Algorithm
  Experiments
  Applications
Complex networks
  Complex networks are everywhere: social networks, the Internet, protein networks.
Community structures
  Most real-life networks have community structures.
  The graph can be divided into groups such that vertices within each group are closely connected and vertices in different groups are sparsely connected.
Overlapping community structure
  Overlapping community: a vertex may belong to multiple communities.
  Example communities: C1: small boat; C2: meaning of bucket; C3: big boat; C4: tableware.
Finding community structures
  Two possible ways to find the community structure:
  OCD: overlapping community detection
  OCS: overlapping community search
OCD vs. OCS
  OCD divides the entire network to find communities.
Disadvantages of OCD
  Too costly. The Facebook network has over 800 million nodes and 100 billion links.
    Algorithm                  Complexity
    Girvan–Newman algorithm    O(|E|^3)
    LPA                        Almost linear
    LA                         O(|C||E| + |V|)
  Global criterion. A fixed parameter or criterion is not appropriate for all vertices and queries (compare the communities of a student with the communities of Barack Obama).
  Unfriendly to dynamic graphs. Real-life graphs keep evolving over time, and we cannot afford to run OCD very frequently, so OCD results lose their freshness and effectiveness.
  As a result, OCD is usually performed in an offline fashion.
OCS: problem definition
  Given: a graph G and a query vertex v
  Return: all communities that v belongs to
Advantages of OCS
  More efficient. We only need to find communities within the local neighborhood of the query vertex. Our OCS solution needs only several milliseconds to find an answer.
  Personalized criterion.
  Lightweight and friendly to dynamic graphs.
  A good choice for finding communities in an online fashion.
Applications of OCS
  Friend recommendation on Facebook
  Semantic expansion
  Infectious disease control
  Etc.
Challenges of OCS
  Modeling: a community should be dense enough, and the model should be overlapping-aware and general.
  Complexity and scalability: in the worst case, OCS may need to enumerate an exponential number of valid communities. The problem is computationally hard, which motivates an approximate approach.
Model
  Requirements: community structure awareness, overlapping awareness, generality.
  The inner edges of a community should be dense, so a clique is used as the unit of a community (example: a clique of 6 vertices).
  Two k-cliques are adjacent if they share k-1 vertices.
  A community is a connected component in the k-clique graph (figure: original graph and its clique graph for k=4).
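The k-clique community definition can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the authors' implementation: the helper names (`build_adj`, `k_cliques`, `clique_communities`, `communities_of`) and the toy graph are assumptions made for the example, and enumerating every k-clique is only practical on small graphs.

```python
from itertools import combinations

def build_adj(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_cliques(adj, k):
    """Return all k-cliques as frozensets (brute force over vertex subsets)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]

def clique_communities(adj, k):
    """Vertex sets of the connected components of the k-clique graph."""
    cliques = k_cliques(adj, k)
    seen, communities = set(), []
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            clique = stack.pop()
            members |= clique
            for other in cliques:
                # two k-cliques are adjacent if they share k-1 vertices
                if other not in seen and len(clique & other) == k - 1:
                    seen.add(other)
                    stack.append(other)
        communities.append(members)
    return communities

def communities_of(adj, k, v):
    """All k-clique communities that contain the query vertex v."""
    return [c for c in clique_communities(adj, k) if v in c]

# Toy graph (hypothetical): triangles abc and bcd share two vertices,
# triangle def shares only vertex d with them.
toy = build_adj([("a", "b"), ("b", "c"), ("a", "c"), ("b", "d"),
                 ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")])
# With k=3, vertex d belongs to two communities: {a, b, c, d} and {d, e, f}.
overlap = communities_of(toy, 3, "d")
```

Because this version builds the whole clique graph before answering a single query, it behaves like detection (OCD) rather than search (OCS); it is only meant to make the definition concrete.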
Model (continued)
  It is acceptable if a few edges are missing in a clique.
  Two cliques are adjacent if they share at least α vertices.
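The two relaxations can be written as small predicates. The sketch below rests on an assumption the slides do not spell out: γ is taken to be the fraction of the k(k-1)/2 possible edges that must be present, so a vertex set of size k qualifies when it induces at least ⌊γ·k(k-1)/2⌋ edges. The function names and the sample graph are hypothetical.

```python
import math
from itertools import combinations

def is_quasi_clique(adj, vertices, gamma):
    """True if `vertices` induce at least floor(gamma * k(k-1)/2) edges.

    Assumed definition of a relaxed clique; gamma = 1 requires a full clique.
    """
    vs = list(vertices)
    k = len(vs)
    present = sum(1 for u, v in combinations(vs, 2) if v in adj.get(u, ()))
    return present >= math.floor(gamma * k * (k - 1) / 2)

def are_adjacent(clique_a, clique_b, alpha):
    """True if the two cliques share at least alpha vertices."""
    return len(set(clique_a) & set(clique_b)) >= alpha

# Hypothetical example: four vertices with one edge (3-4) missing.
sample = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2}, 4: {1, 2}}
relaxed_ok = is_quasi_clique(sample, [1, 2, 3, 4], 0.8)   # 5 of 6 edges present
strict_ok = is_quasi_clique(sample, [1, 2, 3, 4], 1.0)    # a full clique is required
```

Under this reading, the classic k-clique model is the special case α = k-1 and γ = 1, which matches the (3,1)-OCS setting with k=4 used in the algorithm example.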
Model (figure: original graph)
Figure: k=4
(α, γ)-OCS (figure: k=3)
Parameter selection
Algorithm
  Exact algorithm
  Approximate algorithm
Exact algorithm
  Example: k=4, (3,1)-OCS, query vertex = Bob.
  Drawback: an exponential number of enumerations.
Approximate algorithm
  Example: k=4, (3,1)-OCS, query vertex = Bob.
  Approximation: each new clique must contain at least one new vertex.
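The difference between the two algorithms can be illustrated with a local search that starts from the cliques containing the query vertex and grows each community clique by clique. This is a hedged sketch for the plain k-clique case (α = k-1, γ = 1), not the authors' code: all function names are hypothetical, and the rule for skipping seed cliques in approximate mode is a simplification chosen for this example.

```python
from collections import deque
from itertools import combinations

def cliques_containing(adj, v, k):
    """All k-cliques (frozensets) that contain vertex v."""
    nbrs = sorted(adj[v])
    return [frozenset(c) | {v} for c in combinations(nbrs, k - 1)
            if all(b in adj[a] for a, b in combinations(c, 2))]

def neighbors_of_clique(adj, clique):
    """Yield (adjacent k-clique, swapped-in vertex) pairs sharing k-1 vertices."""
    for dropped in clique:
        rest = clique - {dropped}
        # a replacement vertex must be adjacent to every vertex that is kept
        candidates = set.intersection(*(adj[u] for u in rest)) - clique
        for w in candidates:
            yield rest | {w}, w

def search_communities(adj, v, k, approximate=False):
    """Communities of v found by expanding cliques outward from v."""
    communities, visited = [], set()
    for seed in cliques_containing(adj, v, k):
        # sketch-level shortcut: in approximate mode, a seed already covered
        # by a found community is not expanded again
        if seed in visited or (approximate and any(seed <= c for c in communities)):
            continue
        visited.add(seed)
        members, queue = set(seed), deque([seed])
        while queue:
            clique = queue.popleft()
            for nxt, new_vertex in neighbors_of_clique(adj, clique):
                if nxt in visited:
                    continue
                # approximate rule: follow a clique only if it adds a new vertex
                if approximate and new_vertex in members:
                    continue
                visited.add(nxt)
                members.add(new_vertex)
                queue.append(nxt)
        communities.append(members)
    return communities
```

In exact mode every clique of a community is visited, which is where the exponential enumeration comes from. In approximate mode each accepted clique adds a vertex, so the number of cliques expanded per community is bounded by the community size, at the cost of possibly missing vertices that are reachable only through cliques made entirely of already-seen vertices.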
Experiments: setup
  Machine: Intel Core2 2.13GHz, 4GB memory, 64-bit Windows 7.
  Datasets (table columns: dataset, |V|, |E|): WordNet, DBLP, Google, LiveJournal.
Effectiveness
  The OCS model successfully unveils multiple research interests.
  Example: Jiawei Han, k=6. C1: multimedia data mining; C2: stream data mining; C3: information network.
  The model is flexible enough to support different parameters. Example: Jiawei Han, k=9.
  For most vertices, the OCS model finds non-trivial results.
Performance
  OCS is more efficient than OCD.
  Competitors: LA, OSLOM.
  Amortized time of OCD = (total time of OCD) / n.
Performance: influence of parameters
Accuracy of the approximate algorithm
  More than 70% accuracy is achieved consistently, and in some cases almost 90%.
Applications
Diversity-based social network analysis
  What is the distribution of diversity?
  Can we find people with really large diversity?
Name disambiguation
  Ambiguous names that cover a significant number of entities also have a large number of communities.
  A real person belongs to fewer communities than such an ambiguous name.
Contributions
  Problem definition
  Model
  Guide for parameter selection
  Algorithms
  Extensive experiments and applications
Q&A
  Thank you!