Overlapping Community Search. Graph Data Management Lab, School of Computer Science

Presentation transcript:

1 Overlapping Community Search
Graph Data Management Lab, School of Computer Science, GDM@FUDAN, www.gdm.fudan.edu.cn
Email: zhenjiong@gmail.com
Online Search of Overlapping Communities
Wanyun Cui, Fudan University
Yanghua Xiao, Fudan University
Haixun Wang, Microsoft Research Asia
Yiqi Lu, Fudan University
Wei Wang, Fudan University
Presenter: Wanyun Cui

2 Outline
- Motivation
- Model
- Algorithm
- Experiments
- Applications

3 Outline
- Motivation
- Model
- Algorithm
- Experiments
- Applications

4 Complex network
- Complex networks are everywhere. Example: social network.

5 Complex network
- Complex networks are everywhere. Example: Internet.

6 Complex network
- Complex networks are everywhere. Example: protein network.

7 Complex network
- Complex networks are everywhere. Examples: Internet, social network, protein network.

8 Community structures
- Complex networks are everywhere.
- Most real-life networks have community structure: the graph can be divided into groups such that vertices within a group are densely connected and vertices in different groups are sparsely connected.
Examples: Internet, social network, protein network.

9 Overlapping community structure
- Overlapping community: a vertex may belong to multiple communities.

10 Overlapping community structure
- Overlapping community: a vertex may belong to multiple communities.
Figure labels: C1: small boat; C2: meaning of bucket; C3: big boat; C4: tableware.

11 Finding community structures
- Two possible ways to find the community structure:
  - OCD: overlapping community detection
  - OCS: overlapping community search

12 OCD vs. OCS
- OCD: divides the entire network into communities.

13 OCD vs. OCS
- Disadvantages of OCD: too costly; global criterion; unfriendly to dynamic graphs.
- Facebook network: over 800 million nodes and 100 billion links.
Algorithm        Complexity
Girvan–Newman    O(|E|^3)
LPA              Almost linear
LA               O(|C||E| + |V|)

14 OCD vs. OCS
- Disadvantages of OCD: too costly; global criterion; unfriendly to dynamic graphs.
- A fixed parameter or criterion is not appropriate for all vertices and queries.
Figure labels: communities of a student; communities of Barack Obama.

15 OCD vs. OCS
- Disadvantages of OCD: too costly; global criterion; unfriendly to dynamic graphs.
- Real-life graphs are always evolving over time.
- We cannot afford to run OCD very frequently.
- OCD results therefore lose their freshness and effectiveness.

16 OCD vs. OCS
- Disadvantages of OCD: too costly; global criterion; unfriendly to dynamic graphs.
- OCD is usually performed in an offline fashion.

17 OCS: problem definition
- Given: a graph G and a query vertex v.
- Return: all communities that v belongs to.

18 OCD vs. OCS
- Advantages of OCS: more efficient; personalized criterion; lightweight.
- We only need to find communities within the local neighborhood of the query vertex.
- Our OCS solution needs only several milliseconds to find the answer.

19 OCD vs. OCS
- Advantages of OCS: more efficient; personalized criterion; friendly to dynamic graphs.

20 OCD vs. OCS
- Advantages of OCS: more efficient; personalized criterion; lightweight.
- A good choice for finding communities in an online fashion.

21 Applications of OCS
- Friend recommendation on Facebook
- Semantic expansion
- Infectious disease control
- Etc.

22 Challenges of OCS
Challenges: modeling; complexity and scalability.
Modeling:
- A community should be dense enough.
- The model should be overlap-aware.
- Generality

23 Challenges of OCS
Challenges: modeling; complexity and scalability.
Complexity and scalability:
- In the worst case, OCS may need to enumerate an exponential number of valid communities, so the problem is computationally hard.
- This motivates an approximate approach.

24 Outline
- Introduction
- Model
- Algorithm
- Experiments
- Applications

25 Model
- Community structure awareness
- Overlapping awareness
- Generality
The inner edges of a community should be dense.
The clique is used as the unit of a community.
Figure label: a clique of 6 vertices.

26 Model
- Community structure awareness
- Overlapping awareness
- Generality
Two k-cliques are adjacent if they share k-1 vertices.
A community is a connected component in the k-clique graph.
Figure labels: original graph; clique graph (k=4).
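
The two rules on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the toy graph are made up for illustration, and cliques are enumerated by brute force, which is only practical on tiny graphs.

```python
from itertools import combinations

def make_graph(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)} (illustrative helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_cliques(adj, k):
    """Enumerate all k-cliques as frozensets by brute force (toy graphs only)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]

def k_clique_communities(adj, k):
    """Communities = connected components of the k-clique graph,
    where two k-cliques are adjacent if they share k-1 vertices."""
    cliques = k_cliques(adj, k)
    seen, communities = set(), []
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            current = stack.pop()
            members |= current
            for other in cliques:
                if other not in seen and len(current & other) == k - 1:
                    seen.add(other)
                    stack.append(other)
        communities.append(members)
    return communities

# Toy graph (assumption): two triangles that share only vertex "c".
toy = make_graph([("a", "b"), ("b", "c"), ("a", "c"),
                  ("c", "d"), ("d", "e"), ("c", "e")])
communities = k_clique_communities(toy, 3)
```

With k=3 the two triangles share a single vertex, fewer than k-1=2, so they form two separate communities and vertex "c" belongs to both. This is the overlapping behavior the model is meant to capture.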

27 Model
- Community structure awareness
- Overlapping awareness
- Generality

28 Model
- Community structure awareness
- Overlapping awareness
- Generality
It is acceptable for a few edges to be missing from a clique.

29 Model
- Community structure awareness
- Overlapping awareness
- Generality
If two cliques share at least α vertices, they are adjacent.
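
The two relaxations on slides 28 and 29 can be written as simple predicates. The sketch below is hedged: the slides do not give the exact edge threshold for a clique with missing edges, so `floor(gamma * k * (k - 1) / 2)` is an assumed form, and the names `is_quasi_clique`, `are_adjacent`, `alpha` and `gamma` are illustrative (chosen to match the "(3,1)-OCS" and "Alpha-gamma" notation used on later slides).

```python
from itertools import combinations
from math import floor

def is_quasi_clique(adj, nodes, gamma):
    """True if `nodes` induce at least floor(gamma * k*(k-1)/2) edges.
    The threshold formula is an assumption; gamma = 1 means a full clique."""
    k = len(nodes)
    present = sum(1 for u, v in combinations(nodes, 2) if v in adj.get(u, ()))
    return present >= floor(gamma * k * (k - 1) / 2)

def are_adjacent(c1, c2, alpha):
    """Two distinct cliques are adjacent if they share at least alpha vertices."""
    return set(c1) != set(c2) and len(set(c1) & set(c2)) >= alpha

# Toy graph (assumption): a 4-cycle 1-2-3-4 plus the diagonal 1-3, i.e. 5 of 6 possible edges.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
strict = is_quasi_clique(adj, [1, 2, 3, 4], gamma=1.0)
relaxed = is_quasi_clique(adj, [1, 2, 3, 4], gamma=0.8)
share_two = are_adjacent({1, 2, 3, 4}, {2, 3, 5, 6}, alpha=2)
share_three = are_adjacent({1, 2, 3, 4}, {2, 3, 5, 6}, alpha=3)
```

With gamma = 1 and alpha = k-1 the predicates reduce to the strict clique model of slide 26. Lowering either parameter accepts looser communities.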

30 Model
- Community structure awareness
- Overlapping awareness
- Generality
Figure label: original graph.

31 Figure label: k=4.

32 Alpha-gamma OCS
Figure label: k=3.

33 Parameter selection

34 Outline
- Introduction
- Model
- Algorithm
- Experiments
- Applications

35 Algorithm
- Exact algorithm
- Approximate algorithm

36 Exact Algorithm
- Example: k=4, (3,1)-OCS, query vertex = Bob

37 Exact Algorithm
- Example: k=4, (3,1)-OCS, query vertex = Bob
- Drawback: an exponential number of enumerations
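
The transcript does not spell out the exact procedure, so the following is only a plausible sketch of an exact search under stated assumptions: gamma = 1 (full cliques), brute-force clique enumeration, and a made-up toy graph around a query vertex "Bob". The names `search_communities`, `cliques_of_size` and `complete` are hypothetical.

```python
from itertools import combinations

def cliques_of_size(adj, k):
    """All k-cliques as frozensets, by brute force. The number of candidate
    subsets grows combinatorially, which is the drawback named on the slide."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(b in adj[a] for a, b in combinations(c, 2))]

def search_communities(adj, query, k, alpha):
    """Return every community containing `query`: start from each k-clique that
    contains the query and absorb cliques sharing at least alpha vertices."""
    cliques = cliques_of_size(adj, k)
    visited, result = set(), []
    for seed in cliques:
        if query not in seed or seed in visited:
            continue
        visited.add(seed)
        frontier, community = [seed], set(seed)
        while frontier:
            current = frontier.pop()
            for other in cliques:
                if other not in visited and len(current & other) >= alpha:
                    visited.add(other)
                    community |= other
                    frontier.append(other)
        result.append(community)
    return result

def complete(nodes):
    """Edges of a complete graph on `nodes` (illustrative helper)."""
    return list(combinations(nodes, 2))

# Toy graph (assumption): Bob sits in two 4-cliques that share only Bob;
# D extends the first one; P, Q, R, S form an unrelated clique.
adj = {}
for u, v in (complete(["Bob", "A", "B", "C"]) + complete(["A", "B", "C", "D"])
             + complete(["Bob", "X", "Y", "Z"]) + complete(["P", "Q", "R", "S"])):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

found = search_communities(adj, "Bob", k=4, alpha=3)
```

Here the search returns two communities for Bob, and the unrelated clique is never part of the answer, which reflects the local nature of OCS.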

38 Approximate Algorithm
- Example: k=4, (3,1)-OCS, query vertex = Bob
- Approximation: each newly added clique must contain at least one new vertex

39 Approximate Algorithm
- Example: k=4, (3,1)-OCS, query vertex = Bob
- Approximation: each newly added clique must contain at least one new vertex
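
The pruning rule can be sketched as a flag on a single-seed expansion. This is an illustration under assumptions rather than the authors' algorithm: the function `expand`, its return value, and the toy input (all 4-cliques of a complete graph on five vertices) are hypothetical.

```python
from itertools import combinations

def expand(seed, cliques, alpha, require_new_vertex):
    """Grow one community from `seed` over cliques sharing >= alpha vertices.
    With require_new_vertex=True, a clique is skipped when all of its vertices
    are already in the community (the approximation rule on the slide).
    Returns (community, number of cliques actually expanded)."""
    community = set(seed)
    visited = {seed}
    frontier = [seed]
    while frontier:
        current = frontier.pop()
        for other in cliques:
            if other in visited or len(current & other) < alpha:
                continue
            if require_new_vertex and other <= community:
                continue  # pruned: this clique contributes no new vertex
            visited.add(other)
            community |= other
            frontier.append(other)
    return community, len(visited)

# Toy input (assumption): every 4-subset of a complete graph on 5 vertices is a 4-clique.
nodes = ["A", "B", "Bob", "C", "D"]
cliques = [frozenset(c) for c in combinations(nodes, 4)]
seed = next(c for c in cliques if "Bob" in c)

exact_community, exact_expanded = expand(seed, cliques, alpha=3, require_new_vertex=False)
approx_community, approx_expanded = expand(seed, cliques, alpha=3, require_new_vertex=True)
```

On this input both variants recover the same five vertices, but the pruned variant expands 2 cliques instead of 5. In general the pruned search explores a subset of what the unpruned one explores, so it can miss vertices.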

40 Outline
- Introduction
- Model
- Algorithm
- Experiments
- Applications

41 Experiments
- Setup
  - Intel Core2 2.13 GHz
  - 4 GB memory
  - 64-bit Windows 7
- Dataset

42 Experiments
- Setup
- Dataset
Dataset        |V|        |E|
WordNet        82676      133445
DBLP           560851     1816613
Google         916427     4322051
Livejournal    4847572    42851237

43 Effectiveness
- OCS successfully unveils a researcher's multiple research interests.
- Example: Jiawei Han, k=6
Figure labels: C1: multimedia data mining; C2: stream data mining; C3: information network.

44 Effectiveness
- Our model is flexible enough to support different parameters.
- Example: Jiawei Han, k=9

45 Effectiveness
- For most vertices, the OCS model can find non-trivial results.

46 Performance
- OCS is more efficient than OCD.
- Competitors: LA, OSLOM
- Amortized time of OCD: (total time of OCD) / n

47 Performance: influence of parameters

48 Accuracy of approximate algorithm
- More than 70% accuracy is achieved consistently, and in some cases accuracy reaches almost 90%.

49 Outline
- Introduction
- Model
- Algorithm
- Experiments
- Applications

50 Diversity-based Social Network Analysis
- What is the distribution of diversity?
- Can we find people with really large diversity?

51 Name disambiguation
- Ambiguous names that cover a significant number of entities also have a large number of communities.
- A real person has fewer communities than these ambiguous names.

52 Contributions
- Problem definition
- Model
- Guide for parameter selection
- Algorithms
- Extensive experiments and applications

53 Q&A
Thank you!

