Jie Tang Computer Science, Tsinghua

Jie Tang Computer Science, Tsinghua University @WWW’2017
Computational Models for Social Network Analysis: mining big social networks (Part IV: Representation)

Fundamental question: how to represent users, relationships, and groups in a big network?
Fast representation
Embedding representation

Example: Who Is Similar to Barabási?
[1] Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li. Panther: Fast Top-k Similarity Search on Large Networks. KDD'15.

Similar Authors in AMiner.org

Let us recall Similarity Search…

Common Neighbors
Two users in a social network are similar if they share many of the same friends (Jaccard, 1901).
Limitation: this measure does not really consider the network topology.
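A minimal sketch of the common-neighbors idea, using the Jaccard coefficient of two neighbor sets. The toy graph and the function name `jaccard_similarity` are illustrative assumptions, not part of the slides.

```python
def jaccard_similarity(adj, u, v):
    """Jaccard coefficient of the neighbor sets of u and v."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    if not union:
        return 0.0
    return len(nu & nv) / len(union)


# Hypothetical friendship graph (undirected, stored as neighbor sets).
adj = {
    "alice": {"carol", "dave", "erin"},
    "bob": {"carol", "dave", "frank"},
    "carol": {"alice", "bob"},
    "dave": {"alice", "bob"},
    "erin": {"alice"},
    "frank": {"bob"},
    "gina": {"hank"},
    "hank": {"gina"},
}

# alice and bob share 2 of 4 distinct friends.
print(jaccard_similarity(adj, "alice", "bob"))   # 0.5
# alice and gina share no friends, so the score is 0 whatever their structure.
print(jaccard_similarity(adj, "alice", "gina"))  # 0.0
```

The second call illustrates the limitation named on the slide: two vertices with no shared friends always score zero, even if their positions in the network look alike.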

SimRank SimRank is a general similarity measure, based on a simple and intuitive graph-theoretic model (Jeh and Widom, KDD’02).
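The slide does not spell out the SimRank recurrence. The sketch below uses the standard form from Jeh and Widom (two vertices are similar if their neighbors are similar), applied to an undirected graph. The decay constant 0.8, the iteration count, and the toy star graph are assumptions chosen for illustration.

```python
from itertools import product


def simrank(adj, decay=0.8, iterations=5):
    """Naive SimRank over all vertex pairs; adj maps each vertex to its neighbor set."""
    nodes = sorted(adj)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
            elif not adj[a] or not adj[b]:
                new[(a, b)] = 0.0
            else:
                # Average similarity over all pairs of neighbors, scaled by the decay.
                total = sum(sim[(i, j)] for i in adj[a] for j in adj[b])
                new[(a, b)] = decay * total / (len(adj[a]) * len(adj[b]))
        sim = new
    return sim


# Hypothetical star graph: center c with leaves x and y.
star = {"c": {"x", "y"}, "x": {"c"}, "y": {"c"}}
scores = simrank(star)
print(scores[("x", "y")])  # the two leaves share their only neighbor
```

Every iteration touches all vertex pairs, which is why the method needs quadratic space and is hard to use for real-time search on large networks.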

Path Similarity
Intuition: two vertices are similar if they frequently appear on the same paths.
A path is a T-length sequence of vertices p = (v1, ..., vT+1). Π is the set of all T-paths in G.
[Figure: example graph with vertices v1 to v5, T = 2]
Path weight: Sps(v1,v2) = 0.37, Sps(v1,v3) = 0.42, Sps(v1,v4) = 0.39, Sps(v1,v5) = 0.09.
Low efficiency! Cannot support real-time search!

Related Work and Challenges
Goal: find the top-K similar vertices for any vertex in a network. Two kinds of similarity:
1. Share many direct/indirect common neighbors.
2. Disconnected, but share similar structure.
Method             Time Complexity         Space Complexity
SimRank [KDD'02]   O(I N^2 d^2)            O(N^2)
TopSim [ICDE'12]   O(N T d^T)              O(N + M)
RWR [KDD'04]       O(I N^2 d)
RoleSim [KDD'11]
ReFex [KDD'11]     O(N + I(fM + N f^2))    O(N + Mf)
d: average degree, f: feature number, T: path length
Challenges
C1: How to design a similarity method that applies to both kinds of similarity?
C2: Computational efficiency.

Panther: Fast Top-k Similarity Search on Big Networks

Path Similarity
Intuition: two vertices are similar if they frequently appear on the same paths.
A path is a T-length sequence of vertices p = (v1, ..., vT+1). Π is the set of all T-paths in G.
[Figure: example graph with vertices v1 to v5, T = 2]
Path weight: Sps(v1,v2) = 0.37, Sps(v1,v3) = 0.42, Sps(v1,v4) = 0.39, Sps(v1,v5) = 0.09.
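The path-weight formula did not survive in this transcript, so the following is only a sketch under a simplifying assumption: every T-path counts equally, and the similarity of two vertices is the fraction of all T-paths that contain both. Paths are treated as T-step walks (vertices may repeat). The function names and the three-vertex graph are hypothetical.

```python
def enumerate_paths(adj, T):
    """All T-step walks (T + 1 vertices each), found by exhaustive expansion."""
    paths = [(v,) for v in sorted(adj)]
    for _ in range(T):
        paths = [p + (n,) for p in paths for n in sorted(adj[p[-1]])]
    return paths


def exact_path_similarity(adj, T, u, v):
    """Fraction of all T-paths that contain both u and v (uniform weights assumed)."""
    paths = enumerate_paths(adj, T)
    both = sum(1 for p in paths if u in p and v in p)
    return both / len(paths)


# Hypothetical line graph a - b - c.
line = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

print(len(enumerate_paths(line, 2)))             # 6 walks of length 2
print(exact_path_similarity(line, 2, "a", "b"))  # 4 of the 6 walks contain a and b
print(exact_path_similarity(line, 2, "a", "c"))  # 2 of the 6 walks contain a and c
```

The number of enumerated paths grows roughly like N times d^T, which is the efficiency problem that motivates sampling.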

Pantherps. Basic idea: random path sampling. Simplified path similarity:
O(RT) O(dT)
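A rough sketch of random path sampling. It assumes an unweighted graph, a uniformly random start vertex, uniform transitions to neighbors, and a similarity estimate equal to the fraction of the R sampled paths that contain both vertices. These choices, and the names `sample_paths` and `estimate_similarities`, are assumptions for illustration; the simplified formula on the slide was not preserved.

```python
import random
from collections import Counter
from itertools import combinations


def sample_paths(adj, R, T, seed=0):
    """Draw R random walks of T steps each from uniformly chosen start vertices."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    paths = []
    for _ in range(R):
        path = [rng.choice(nodes)]
        for _ in range(T):
            nbrs = sorted(adj[path[-1]])
            if not nbrs:  # dead end: stop this walk early
                break
            path.append(rng.choice(nbrs))
        paths.append(path)
    return paths


def estimate_similarities(paths):
    """Estimated similarity of each co-occurring pair: (#paths containing both) / R."""
    counts = Counter()
    for path in paths:
        for u, v in combinations(sorted(set(path)), 2):
            counts[(u, v)] += 1
    total = len(paths)
    return {pair: c / total for pair, c in counts.items()}


# Hypothetical line graph a - b - c.
line = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
paths = sample_paths(line, R=2000, T=2)
sims = estimate_similarities(paths)
print(sorted(sims.items()))
```

Sampling costs on the order of R times T steps regardless of how many T-paths the graph contains, so the work is controlled by the sample size rather than by d^T.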

Theoretical Analysis
How many random paths shall we sample?
1. Domain and range set
2. Upper bound of range set's VC dimension
3. Distribution
Result: required sample size

Theoretical Analysis: Details
Domain: Π
Range set:
VC bound:
Distribution:
Path similarity is
Conclusion: R random paths can guarantee an approximation error of at most ε with probability at least 1−δ.
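The slide's own formulas are missing here, so the sketch below falls back on the generic form of the VC-dimension sample bound, R = (c / ε²)(d + ln(1/δ)), where d is an upper bound on the VC dimension of the range set and c is a constant. The value passed as `vc_bound` is left to the caller; the function name and the default c = 0.5 (the value quoted in the experiment settings) are assumptions.

```python
import math


def required_sample_size(eps, delta, vc_bound, c=0.5):
    """Samples needed for error at most eps with probability at least 1 - delta."""
    return math.ceil(c / eps ** 2 * (vc_bound + math.log(1.0 / delta)))


# Hypothetical inputs: eps = 0.1, delta = 0.1, VC-dimension bound of 3.
print(required_sample_size(eps=0.1, delta=0.1, vc_bound=3))

# Halving eps roughly quadruples the sample size.
print(required_sample_size(eps=0.05, delta=0.1, vc_bound=3))
```

Note that the bound depends on ε, δ, and the VC bound only, not directly on the number of vertices, which is what makes sampling attractive on large networks.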

Proof of Details
1. Assume […] and a set Q of size l can be shattered by RG.
2. Then there is a one-to-one correspondence between each subset of Q and each range Pi in RG.
3. But a path belongs only to the ranges with respect to a pair of vertices in the path.
4. Contradiction.

Vector Similarity and Panthervs
Limitation of path similarity: bias toward close neighbors.
Vector Similarity: if two vertices have similar topological structures, their probability distributions of linking to all other vertices are similar.
Panthervs: use the top-D path similarities calculated by Pantherps to construct a vector for each vertex.
[Figure: example with vertices u, v, w and their path-similarity values, T = 2; Svs(u,w) = 0.27 > Svs(u,v) = 0.16]
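A small sketch of the vector construction described above. Each vertex gets a vector of its D largest path similarities, sorted in descending order and padded with zeros. The distance-to-similarity mapping 1 / (1 + Euclidean distance) is an assumption chosen to avoid division by zero, and all numbers are hypothetical.

```python
import math


def top_d_vector(path_sims, D):
    """Keep the D largest path similarities in descending order, padded with zeros."""
    top = sorted(path_sims, reverse=True)[:D]
    return top + [0.0] * (D - len(top))


def vector_similarity(vec_a, vec_b):
    """Similarity that decreases with the Euclidean distance between two vectors."""
    return 1.0 / (1.0 + math.dist(vec_a, vec_b))


# Hypothetical path similarities of three vertices to their top partners.
u = top_d_vector([0.40, 0.25, 0.10], D=3)
w = top_d_vector([0.38, 0.24, 0.12], D=3)  # similar profile, possibly far from u
v = top_d_vector([0.05, 0.02], D=3)        # close neighbor with a different profile

print(vector_similarity(u, w) > vector_similarity(u, v))  # True
```

Because only the sorted similarity values are compared, two vertices can score highly even when they are disconnected, which addresses the bias toward close neighbors.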

Time Complexity
Method      Time Complexity          Space Complexity
SimRank     O(I N^2 d^2)             O(N^2)
TopSim      O(N T d^T)               O(N + M)
RWR         O(I N^2 d)
RoleSim
ReFex       O(N + I(fM + N f^2))     O(N + Mf)
Pantherps   O(RTc + NdT)             O(RT + Nd)
Panthervs   O(RTc + NdT + Nc)        O(RT + Nd + ND)
Time cost components: random path sampling; top-k similarity search for any vertex; build and query kd-tree.
Space cost components: random paths; vertex-to-path index; kd-tree.
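The vertex-to-path index named in the table can be sketched as an inverted index from each vertex to the sampled paths that contain it. A top-k query then only scans the paths of the query vertex. The function names and the four hard-coded paths are hypothetical, and the score (co-occurrence count divided by the number of paths) is the same simplifying assumption as a plain sampling estimate.

```python
from collections import Counter, defaultdict


def build_vertex_to_path_index(paths):
    """Map each vertex to the ids of the paths that contain it."""
    index = defaultdict(set)
    for pid, path in enumerate(paths):
        for v in path:
            index[v].add(pid)
    return index


def top_k_similar(paths, index, query, k):
    """Rank vertices by how many sampled paths they share with the query vertex."""
    counts = Counter()
    for pid in index.get(query, ()):
        for v in set(paths[pid]):
            if v != query:
                counts[v] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(v, c / len(paths)) for v, c in ranked[:k]]


# Hypothetical sampled paths.
paths = [["a", "b", "c"], ["a", "b", "d"], ["c", "d", "e"], ["b", "a", "b"]]
index = build_vertex_to_path_index(paths)
print(top_k_similar(paths, index, "a", k=2))  # [('b', 0.75), ('c', 0.25)]
```

The index is built once after sampling, so each query avoids scanning paths that cannot contribute to the result.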

Experiments

Efficiency Performance
Tencent network. Each cell is preprocessing time + top-k similarity search time; "-" marks a cell with no reported value.
|V|         |E|            RWR [KDD'04]  TopSim [ICDE'12]  RoleSim [KDD'11]  ReFex          Pantherps       Panthervs
6,523       10,000         +7.79hr       +38.58m           +37.26s           3.85s+0.07s    0.07s+0.26s     0.99s+0.21s
25,844      50,000         +>150hr       +11.20hr          +12.98m           26.09s+0.40s   0.28s+1.53s     2.45s+4.21s
48,837      100,000        -             +30.94hr          +1.06hr           2.02m+0.57s    0.58s+3.48s     5.30s+5.96s
169,209     500,000        -             +>120hr           +>72hr            17.18m+2.51s   8.19s+16.08s    27.94s+24.17s
230,103     1,000,000      -             -                 -                 31.50m+3.29s   15.31s+30.63s   49.83s+22.86s
443,070     5,000,000      -             -                 -                 24.15hr+8.55s  50.91s+2.82m    4.01m+1.29m
702,049     10,000,000     -             -                 -                 >48hr          2.21m+6.24m     8.60m+6.58m
2,767,344   50,000,000     -             -                 -                 -              15.787m+1.36hr  1.60hr+2.17hr
5,355,507   100,000,000    -             -                 -                 -              44.09m+4.50hr   5.61hr+6.47hr
26,033,969  500,000,000    -             -                 -                 -              4.82hr+25.01hr  32.90hr+47.34hr
51,640,620  1,000,000,000  -             -                 -                 -              13.32hr+80.38hr 98.15hr+…hr
Highlights: 390X speed up; 270X speed up; can scale up to handle 1 billion edges.
Settings: T = 5, c = 0.5, ε = √(1/|E|), δ = 0.1, R = 16,609,640.
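The slide does not say which row the 390X and 270X figures come from. One reading that roughly reproduces them is the |E| = 5,000,000 row, comparing ReFex against Pantherps and Panthervs on total time. That pairing is an inference, and the helper names below are hypothetical.

```python
UNIT_SECONDS = {"hr": 3600.0, "m": 60.0, "s": 1.0}


def to_seconds(text):
    """Convert a duration such as '24.15hr', '2.82m' or '8.55s' to seconds."""
    for unit in ("hr", "m", "s"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNIT_SECONDS[unit]
    raise ValueError(f"unknown time unit in {text!r}")


def total_seconds(cell):
    """Sum the preprocessing and search parts of a 'prep+search' table cell."""
    return sum(to_seconds(part) for part in cell.split("+"))


# Cells from the |E| = 5,000,000 row of the table.
refex = total_seconds("24.15hr+8.55s")
panther_ps = total_seconds("50.91s+2.82m")
panther_vs = total_seconds("4.01m+1.29m")

print(round(refex / panther_ps))  # about 395
print(round(refex / panther_vs))  # about 273
```

Both ratios land close to the rounded speedups quoted on the slide, which supports this reading without confirming it.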

Accuracy Performance of Pantherps
Goal: evaluate how well Pantherps approximates common neighbors. The score represents the improvement over a random method.
[Figure panels: KDD, Twitter, Mobile]
Co-author networks: |V| = 3K, |E| = 7K.
Twitter network: |V| = 100K, |E| = 500K.
Mobile network: |V| = 200K, |E| = 200K.

Accuracy Performance of Panthervs
Identity Resolution: assume that the same author appearing in different networks of the same domain should be similar across those networks.
Settings: given any two co-author networks, e.g., KDD and ICDM, if the top-k similar vertices retrieved from ICDM include the query author from KDD, we say that the method hits a correct instance.
[Figure panels: KDD-ICDM, SIGIR-CIKM, SIGMOD-ICDE]
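The hit criterion above can be turned into a hit rate over all query authors. This is a sketch that assumes an author carries the same identifier in both networks and that the ranked lists were already produced by some similarity method; the names and data are hypothetical.

```python
def hit_rate(ranked_by_query, k):
    """Fraction of query authors that appear in their own top-k list from the other network."""
    if not ranked_by_query:
        return 0.0
    hits = sum(1 for author, ranked in ranked_by_query.items() if author in ranked[:k])
    return hits / len(ranked_by_query)


# Hypothetical results: query author in one network -> ranked authors from the other.
ranked_by_query = {
    "author_1": ["author_1", "author_7", "author_3"],
    "author_2": ["author_9", "author_2", "author_4"],
    "author_3": ["author_5", "author_6", "author_8"],
    "author_4": ["author_2", "author_1", "author_4"],
}

print(hit_rate(ranked_by_query, k=1))  # 0.25
print(hit_rate(ranked_by_query, k=3))  # 0.75
```

Larger k can only keep or raise the hit rate, so results are best compared across methods at the same k.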

Parameter Analysis: Path Length T
Performance improves as T increases and becomes almost stable when T ≥ 5.
Figure: effect of path length T on the accuracy of Panthervs.

Parameter Analysis: Error Bound ε
Tencent networks. When |E|/(1/ε)^2 ranges from 5 to 20, Pantherps almost converges. The value (1/ε)^2 is almost linearly and positively correlated with the number of edges in a network. Therefore, we empirically set ε = √(1/|E|) in our experiments.

Deploy in AMiner.org

Big Network Analysis
[Diagram: BIG networks and big heterogeneous social data, viewed from micro to macro level (user, tie, topology), grounded in social theories and graph theories]
Topics: user modeling, demographics, social role, social tie/link, homophily, social influence, triad formation, community, group behavior.

Future Work
Question 1: How to incorporate the huge-volume and streaming individual behavior data into network analysis?
Question 2: What is the interplay between (micro) individual behavior and (macro) network distribution?
[Diagram: same overview as the Big Network Analysis slide, covering user, tie, and topology from micro to macro level]

Related Publications
Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. Social Influence Analysis in Large-scale Networks. In KDD'09, 2009.
Chenhao Tan, Jie Tang, Jimeng Sun, Quan Lin, and Fengjiao Wang. Social action tracking via noise tolerant time-varying factor graphs. In KDD'10, pages 807–816, 2010.
Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, and Jingyi Guo. Mining Advisor-Advisee Relationships from Research Publication Networks. In KDD'10.
Jie Tang, Sen Wu, and Jimeng Sun. Confluence: Conformity Influence in Large Social Networks. In KDD'13, 2013.
Yuxiao Dong, Yang Yang, Jie Tang, Yang Yang, and Nitesh V. Chawla. Inferring User Demographics and Social Strategies in Mobile Social Networks. In KDD'14, 2014.
Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip Yu. COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency. In KDD'15.
Jing Zhang, Biao Liu, Jie Tang, Ting Chen, and Juanzi Li. Social Influence Locality for Modeling Retweeting Behaviors. In IJCAI'13, 2013.
Jing Zhang, Jie Tang, Honglei Zhuang, Cane Wing-Ki Leung, and Juanzi Li. Role-aware Conformity Influence Modeling and Analysis in Social Networks. In AAAI'14, 2014.
Jing Zhang, Jie Tang, Yuanyi Zhong, Yuchen Mo, Juanzi Li, Guojie Song, Wendy Hall, and Jimeng Sun. StructInf: Mining Structural Influence from Social Streams. In AAAI'17.
Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In KDD'08, 2008.
Tiancheng Lou and Jie Tang. Mining Structural Hole Spanners Through Information Diffusion in Social Networks. In WWW'13, 2013.
Jiezhong Qiu, Yixuan Li, Jie Tang, Zheng Lu, Hao Ye, Bo Chen, Qiang Yang, and John Hopcroft. The Lifecycle and Cascade of WeChat Social Messaging Groups. In WWW'16.
Jie Tang, Tiancheng Lou, Jon Kleinberg, and Sen Wu. Transfer Learning to Infer Social Ties across Heterogeneous Networks. In TOIS, Vol 34 (2), No. 7, 2016.
Jimeng Sun and Jie Tang. A Survey of Models and Algorithms for Social Influence Analysis. Social Network Data Analytics, Aggarwal, C. C. (Ed.), Kluwer Academic Publishers, pages 177–214, 2011.

References
S. Milgram. The Small World Problem. Psychology Today, 1967, Vol. 2, 60–67.
J. H. Fowler and N. A. Christakis. The Dynamic Spread of Happiness in a Large Social Network: Longitudinal Analysis Over 20 Years in the Framingham Heart Study. British Medical Journal 2008; 337: a2338.
R. Dunbar. Neocortex size as a constraint on group size in primates. Human Evolution, 1992, 20: 469–493.
R. M. Bond, C. J. Fariss, J. J. Jones, A. D. I. Kramer, C. Marlow, J. E. Settle, and J. H. Fowler. A 61-million-person experiment in social influence and political mobilization. Nature, 489, 2012.
Why I Deleted My Klout Profile, by Pam Moore, at Social Media Today, originally published November 19, 2011; retrieved November
S. Aral and D. Walker. Identifying Influential and Susceptible Members of Social Networks. Science, 337, 2012.
J. Ugander, L. Backstrom, C. Marlow, and J. Kleinberg. Structural diversity in social contagion. PNAS, 109 (20), 2012.
S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. PNAS, 106 (51), 2009.
J. Scripps, P.-N. Tan, and A.-H. Esfahanian. Measuring the effects of preprocessing decisions and network forces in dynamic network analysis. In KDD'09, pages 747–756, 2009.
D. B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 5, 688–701.

References (cont.)
A. Anagnostopoulos, R. Kumar, and M. Mahdian. Influence and correlation in social networks. In KDD'08, pages 7–15, 2008.
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report SIDL-WP, Stanford University, 1999.
G. Jeh and J. Widom. Scaling personalized web search. In WWW'03, 2003.
G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In KDD'02, 2002.
A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities in social networks. In WSDM'10, pages 207–217, 2010.
P. Domingos and M. Richardson. Mining the network value of customers. In KDD'01, pages 57–66, 2001.
D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In KDD'03, pages 137–146, 2003.
J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In KDD'07, pages 420–429, 2007.
W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social networks. In KDD'09, 2009.
E. Bakshy, D. Eckles, R. Yan, and I. Rosenn. Social influence in social advertising: evidence from field experiments. In EC'12, 2012.
A. Goyal, F. Bonchi, and L. V. Lakshmanan. Discovering leaders from community actions. In CIKM'08, pages 499–508, 2008.
N. Agarwal, H. Liu, L. Tang, and P. S. Yu. Identifying the influential bloggers in a community. In WSDM'08, pages 207–217, 2008.

References (cont.)
E. Bakshy, B. Karrer, and L. A. Adamic. Social influence and the diffusion of user-created content. In EC'09, pages 325–334, New York, NY, USA, ACM.
P. Bonacich. Power and centrality: a family of measures. American Journal of Sociology, 92:1170–1182, 1987.
R. B. Cialdini and N. J. Goldstein. Social influence: compliance and conformity. Annu Rev Psychol, 55:591–621, 2004.
D. Crandall, D. Cosley, D. Huttenlocher, J. Kleinberg, and S. Suri. Feedback effects between similarity and social influence in online communities. In KDD'08, pages 160–168, 2008.
P. W. Eastwick and W. L. Gardner. Is it a game? Evidence for social influence in the virtual world. Social Influence, 4(1):18–32, 2009.
S. M. Elias and A. R. Pratkanis. Teaching social influence: Demonstrations and exercises from the discipline of social psychology. Social Influence, 1(2):147–162, 2006.
T. L. Fond and J. Neville. Randomization tests for distinguishing social influence and homophily effects. In WWW'10, 2010.
M. Gomez-Rodriguez, J. Leskovec, and A. Krause. Inferring Networks of Diffusion and Influence. In KDD'10, pages 1019–1028, 2010.
M. E. J. Newman. A measure of betweenness centrality based on random walks. Social Networks, 2005.
D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, pages 440–442, Jun 1998.
J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. In ICDM'05, pages 418–425, 2005.

Thank you!
Collaborators:
John Hopcroft, Jon Kleinberg, Chenhao Tan (Cornell)
Jiawei Han (UIUC), Philip Yu (UIC)
Jian Pei (SFU), Hanghang Tong (ASU)
Tiancheng Lou (Google & Baidu), Jimeng Sun (GIT)
Wei Chen, Ming Zhou, Long Jiang, Chi Wang, Yuxiao Dong (Microsoft)
Yutao Zhang, Jing Zhang, Zhanpeng Fang, Zi Yang, Sen Wu, etc. (THU)
Jie Tang, KEG, Tsinghua U
Download all data & codes
