Graph Data Management Lab, School of Computer Science Put conference information here: The 12-th International Conference.


Presentation on theme: "Graph Data Management Lab, School of Computer Science Put conference information here: The 12-th International Conference."— Presentation transcript:

1 Graph Data Management Lab, School of Computer Science GDM@FUDAN www.gdm.fudan.edu.cn Put conference information here: The 12-th International Conference of Date Engineering Version 1 (2012-3-25) 张俊骏 A Large-Scale Community Structure Analysis in Facebook Email: 08302010022@fudan.edu.cn

2 Outline Introduction; Data Collection Algorithm: (1) BFS sampling, (2) Uniform sampling; Detecting Communities: (1) LPA algorithm, (2) FNCA algorithm; Experimentation: (1) Community structure similarity, (2) Out-of-scale community

3 Introduction Large scale: more than 500 million users had registered on Facebook by 2011. Community structure: (1) Relationships are very tight within certain areas of social life, such as family, colleagues, and friends. (2) Outgoing connections that do not belong to any of these categories are less likely to occur.

4 Introduction (2)

5 Introduction (3) Community: a substructure of the overall graph in which the density of relationships inside the community is much greater than the density of relationships between communities. Clustering: finding the communities of a given graph (the whole graph or a generated subgraph). In mathematical terms, find a partition V = V1 ∪ V2 ∪ ... ∪ Vn, where V1, ..., Vn are vertex sets and Vx ∩ Vy = Ø for any x ≠ y.
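The partition condition above can be checked mechanically. The following is a minimal sketch, assuming vertices and communities are represented as Python sets; the function name `is_partition` is illustrative and does not come from the slides.

```python
def is_partition(vertices, communities):
    """Return True if the communities are pairwise disjoint and cover `vertices`."""
    covered = set()
    for community in communities:
        if covered & community:
            # Some vertex already belongs to an earlier community.
            return False
        covered |= community
    # Every vertex must belong to exactly one community.
    return covered == set(vertices)
```

For example, `is_partition({1, 2, 3, 4}, [{1, 2}, {3, 4}])` holds, while `[{1, 2}, {2, 3}]` fails because vertex 2 appears twice.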

6 Introduction (4) Datasets: (1) Two different samples of the graph of relationships among the social network's users. (2) Each contains millions of entities and is analyzed with two fast and efficient community detection algorithms. (3) The analysis uses no a priori knowledge.

7 Data Collection Algorithm BFS Sampling

8 Data Collection Algorithm (2) BFS Sampling (1) Start from a single node. (2) Stop when the required depth or number of nodes is reached. (3) Easy to implement and efficient. (4) The result depends on the node selected at the start.
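The BFS sampling steps above can be sketched as follows. This is an illustration under stated assumptions rather than the crawler used for the talk: `neighbors` is a hypothetical callable that returns the friends of a node, and `max_nodes` and `max_depth` are illustrative names for the two stopping conditions.

```python
from collections import deque


def bfs_sample(neighbors, seed, max_nodes=None, max_depth=None):
    """Collect nodes breadth-first from `seed` until a node or depth budget is hit."""
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand nodes that sit on the last allowed level
        for nxt in neighbors(node):
            if nxt in visited:
                continue
            if max_nodes is not None and len(visited) >= max_nodes:
                return visited  # node budget exhausted
            visited.add(nxt)
            queue.append((nxt, depth + 1))
    return visited
```

On a toy graph stored as a dict of adjacency lists, `bfs_sample(graph.__getitem__, seed, max_depth=1)` returns the seed and its direct friends. Different seeds generally return different samples, which is the dependence on the start node noted in point (4).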

9 Data Collection Algorithm (3) Uniform Sampling

10 Data Collection Algorithm (4) Uniform Sampling Legal Facebook IDs: about 2^32. Existing Facebook IDs: about 500 million (2011). Thus, in theory, to mine a dataset of 1 million existing IDs we need to test S = 1,000,000 / (500,000,000 / 2^32) ≈ 8,590,000 legal IDs. So generate 8,590,000 legal IDs at random and check whether each ID exists. If it does, mine the information of that node; otherwise, drop it.
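The arithmetic and the rejection step on this slide can be reproduced with a short sketch. The constants are taken from the slide; `exists` is a hypothetical stand-in for the check of whether an ID belongs to a real profile, and `uniform_sample` is an illustrative name.

```python
import random

ID_SPACE = 2 ** 32        # number of legal IDs (from the slide)
EXISTING = 500_000_000    # approximate number of existing IDs in 2011
TARGET = 1_000_000        # existing IDs we want to collect

hit_rate = EXISTING / ID_SPACE    # chance that a random legal ID exists
ids_to_test = TARGET / hit_rate   # about 8.59 million candidates


def uniform_sample(exists, n_candidates, rng=random):
    """Draw random legal IDs and keep those for which `exists(id)` is true."""
    kept = set()
    for _ in range(n_candidates):
        candidate = rng.randrange(ID_SPACE)
        if exists(candidate):
            kept.add(candidate)  # mine this node
        # otherwise the candidate is simply dropped
    return kept
```

Because each candidate is drawn independently of the friendship graph, the sample does not depend on any starting node.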

11 Data Collection Algorithm (5) Uniform Sampling The advantage of uniform sampling is that the social connections of the sampled nodes do not affect the result. In the actual experiment, the resulting dataset is slightly smaller than the BFS dataset because some users hide themselves from random search.

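12 Data Collection Algorithm (6) Dataset Description The average clustering coefficient is the mean of the local clustering coefficients of all nodes Vi. The local clustering coefficient Ci of a node Vi is the ratio of the number of links between its neighbors to the number of links that could possibly exist between them.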
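The two definitions above translate directly into code. This is a minimal sketch, assuming an undirected graph stored as a dict that maps each node to the set of its neighbors, with comparable node identifiers; counting nodes with fewer than two neighbors as 0 is a convention assumed here, not something the slide states.

```python
def local_clustering(adj, v):
    """C_v: links among v's neighbors divided by the links possible among them."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for the undefined case
    # Count each neighbor-to-neighbor link once (u < w avoids double counting).
    links = sum(1 for u in nbrs for w in adj[u] if w in nbrs and u < w)
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)
```

For a triangle 0-1-2 with an extra node 3 attached to node 2, nodes 0 and 1 have coefficient 1, node 2 has 1/3, and node 3 has 0.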

13 Detecting Communities LPA algorithm

14 Detecting Communities (2) LPA algorithm (1) Under specific conditions, the algorithm may not converge. To avoid deadlocks and guarantee efficient network clustering, we suggest an "asynchronous" update of the labels, which uses the values of some neighbors from the previous iteration and of others from the current one. (2) About 5 iterations are sufficient to correctly classify 95% of the vertices of the network.
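The asynchronous update described above can be sketched as follows. This is a generic label propagation loop written for illustration, not the implementation used in the talk; the graph representation (dict of neighbor sets), the random tie-breaking, and the names `label_propagation`, `max_iter`, and `seed` are all assumptions.

```python
import random
from collections import Counter


def label_propagation(adj, max_iter=20, seed=0):
    """Asynchronous label propagation on a dict mapping node -> set of neighbors."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            # Labels updated earlier in this sweep are already visible here,
            # which is what makes the update asynchronous.
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:
                continue  # current label is already among the most frequent
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:
            break  # every node carries a most frequent neighbor label
    return labels
```

Since labels from the current sweep are mixed with labels from the previous one, two neighbors cannot keep swapping labels indefinitely the way they can under a fully synchronous update.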

15 Detecting Communities (3) LPA algorithm (3) A path connecting a pair of vertices in a group may pass through vertices belonging to different groups. We add a final step that splits such groups into one or more contiguous communities. (4) Near-linear cost. (5) Not stable in some cases.
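One way to realize the final splitting step in point (3) is a flood fill restricted to nodes that share a label, so that each connected piece of a group becomes its own community. This is a sketch under that assumption; the slides do not say how the split is implemented, and `split_contiguous` is an illustrative name.

```python
def split_contiguous(adj, labels):
    """Relabel nodes so that every community is a connected subgraph."""
    new_labels = {}
    next_id = 0
    for start in adj:
        if start in new_labels:
            continue
        # Flood-fill over neighbors that carry the same original label.
        new_labels[start] = next_id
        stack = [start]
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u not in new_labels and labels[u] == labels[start]:
                    new_labels[u] = next_id
                    stack.append(u)
        next_id += 1
    return new_labels
```

On the path 0-1-2 with labels a, b, a, the two a-labeled endpoints are connected only through node 1, so the sketch places them in separate communities.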

16 Detecting Communities (4) FNCA algorithm (Pre)

17 Detecting Communities (4) FNCA algorithm (Pre)

18 Detecting Communities (5) FNCA algorithm Aij = 1 if and only if nodes i and j are connected to each other. δ(u, v) = 1 if and only if u = v. ki is the sum of Aij over all other nodes j (that is, the number of edges of node i). m is half of the sum of the k values of all nodes (that is, the total number of edges in the graph). r(i) is the community to which i belongs.
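The formula these symbols belong to did not survive in the transcript. They match the standard Newman-Girvan modularity, Q = (1 / 2m) Σij [Aij − ki kj / (2m)] δ(r(i), r(j)), so the sketch below assumes that this is the quantity meant; the function name and the dict-of-sets graph representation are illustrative.

```python
def modularity(adj, r):
    """Newman-Girvan modularity Q for an undirected graph and an assignment r."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # total number of edges
    q = 0.0
    for i in adj:
        for j in adj:
            if r[i] != r[j]:
                continue  # delta(r(i), r(j)) = 0
            a_ij = 1 if j in adj[i] else 0
            q += a_ij - len(adj[i]) * len(adj[j]) / (2 * m)
    return q / (2 * m)
```

For two triangles joined by a single edge, with each triangle as one community, this gives Q = 5/14 ≈ 0.357; putting all six nodes into one community gives Q = 0. The double loop is quadratic in the number of nodes and is meant only to mirror the formula.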

19 Detecting Communities (6)

20 Detecting Communities (7) FNCA algorithm (1) Experimental results show that the clustering solution of FNCA is good enough before the iteration count reaches 50 for most networks (even large-scale ones). (2) Generally speaking, the community structure of a network is evident when its Q-value is greater than 0.3. (3) The time complexity of the FNCA algorithm is no worse than O(T * n * k * c).

21 Detecting Communities (8) Experimentation Result

22 Detecting Communities (9) Experimentation Result

23 Experimentation Community structure similarity

24 Experimentation (2) Community structure similarity Rough method: Improved method: M11 is the total number of elements shared by v and w (their intersection), M01 corresponds to w − v, and M10 to v − w. J equals 1 if and only if v = w.
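The formulas for both methods are missing from the transcript. The symbols M11, M01, M10 and the property that J = 1 only when v = w match the standard Jaccard coefficient, J = M11 / (M01 + M10 + M11), so the sketch below assumes that this is the improved method. Communities are represented as Python sets, and returning 1 for two empty sets is a convention assumed here.

```python
def jaccard(v, w):
    """Jaccard similarity between two communities given as sets of members."""
    m11 = len(v & w)  # members shared by v and w
    m10 = len(v - w)  # members only in v
    m01 = len(w - v)  # members only in w
    total = m11 + m10 + m01
    return m11 / total if total else 1.0  # assumed convention for two empty sets
```

For example, `jaccard({1, 2, 3}, {2, 3, 4})` is 0.5, since two of the four distinct members are shared.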

25 Experimentation (3) Experimental results

26 Experimentation (4) Out-of-scale community This may be due to a shortcoming of the algorithms, or such communities may really exist. Either way, it will be studied in future work.

27 Thank you!

