Jie Tang Computer Science, Tsinghua


Computational Models for Social Network Analysis: Mining Big Social Networks (Part IV: Representation). Jie Tang, Computer Science, Tsinghua University, WWW’2017

Fundamental question: How to represent users, relationships, and groups in a big network? Two directions: fast representation and embedding representation.

Example: Who is similar to Barabási? [1] Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li. Panther: Fast Top-k Similarity Search on Large Networks. KDD'15. pp. 1445-1454.

Similar Authors in AMiner.org

Let us recall Similarity Search…

Common Neighbors: Two users in a social network are similar if they share many of the same friends (Jaccard, 1901). Limitation: does not really consider the topology.
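
As a rough illustration of the common-neighbor idea, the sketch below computes the Jaccard coefficient of two users' friend sets. The toy graph and the function name `jaccard_similarity` are hypothetical and only serve the example.

```python
def jaccard_similarity(adj, u, v):
    """Jaccard coefficient of the neighbor sets of u and v."""
    nu, nv = adj[u], adj[v]
    union = nu | nv
    if not union:
        return 0.0  # two isolated vertices: define the score as 0
    return len(nu & nv) / len(union)


# Hypothetical undirected friendship graph, stored as neighbor sets.
adj = {
    "a": {"c", "d", "e"},
    "b": {"c", "d", "f"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"a"},
    "f": {"b"},
}

score_ab = jaccard_similarity(adj, "a", "b")  # a and b share friends c and d
score_ef = jaccard_similarity(adj, "e", "f")  # e and f share no friend
```

Note that `e` and `f` score 0 even though they sit in structurally identical positions, which is the limitation the slide points out: the measure ignores topology beyond direct neighbors.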

SimRank: a general similarity measure, based on a simple and intuitive graph-theoretic model (Jeh and Widom, KDD’02).
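
A minimal sketch of the naive SimRank iteration follows, assuming the standard recurrence from Jeh and Widom: two distinct vertices are similar in proportion to the average similarity of their in-neighbors, scaled by a decay factor. The decay value 0.8, the iteration count, and the toy graph are assumptions for illustration. Every iteration touches all vertex pairs, which is why this form does not suit real-time search on large networks.

```python
def simrank(in_nbrs, decay=0.8, iterations=10):
    """Naive all-pairs SimRank over a dict: vertex -> list of in-neighbors."""
    nodes = list(in_nbrs)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[a][b] = 0.0  # no in-neighbors: similarity is 0
                else:
                    total = sum(sim[x][y]
                                for x in in_nbrs[a]
                                for y in in_nbrs[b])
                    new[a][b] = decay * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim


# Hypothetical graph: one university vertex pointing to two professors.
in_nbrs = {"univ": [], "profA": ["univ"], "profB": ["univ"]}
sim = simrank(in_nbrs)
```

In this toy graph the two professors share their only in-neighbor, so their score equals the decay factor.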

Path Similarity. Intuition: two vertices are similar if they frequently appear on the same paths. A path is a T-length sequence of vertices p = (v1, ..., vT+1). Π is the set of all T-paths in G. Path weight: (formula omitted in the transcript). Example (T=2, vertices v1 to v5): Sps(v1,v2)=0.37, Sps(v1,v3)=0.42, Sps(v1,v4)=0.39, Sps(v1,v5)=0.09. Limitation: low efficiency; cannot support real-time search!

Related Work and Challenges
Goal: find top-K similar vertices for any vertex in a network.
Two kinds of similarity: (1) vertices that share many direct/indirect common neighbors; (2) vertices that are disconnected but share similar structure.
Method | Time Complexity | Space Complexity
SimRank [KDD’02] | O(I N^2 d^2) | O(N^2)
TopSim [ICDE’12] | O(N T d^T) | O(N+M)
RWR [KDD’04] | O(I N^2 d) | -
RoleSim [KDD’11] | - | -
ReFex [KDD’11] | O(N + I(fM + N f^2)) | O(N + Mf)
(d: average degree, f: feature number, T: path length; a dash marks a cell with no value in the transcript.)
Challenges: C1: How to design a similarity method that applies to both kinds of similarity? C2: Computational efficiency.

Panther: Fast Top-k Similarity Search on Big Networks [1] Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li. Panther: Fast Top-k Similarity Search on Large Networks. KDD'15. pp. 1445-1454.

1. Path Similarity. Intuition: two vertices are similar if they frequently appear on the same paths. A path is a T-length sequence of vertices p = (v1, ..., vT+1). Π is the set of all T-paths in G. Path weight: (formula omitted in the transcript). Example (T=2, vertices v1 to v5): Sps(v1,v2)=0.37, Sps(v1,v3)=0.42, Sps(v1,v4)=0.39, Sps(v1,v5)=0.09.

Pantherps. Basic idea: random path sampling. Simplified path similarity; costs shown on the slide: O(RT) and O(dT).
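
The random path sampling idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: start vertices and next hops are drawn uniformly at random (the transcript does not preserve the transition probabilities), and the similarity of two vertices is estimated as the fraction of the R sampled paths that contain both. The names `sample_paths`, `build_index`, and `top_k_similar` are hypothetical.

```python
import random
from collections import defaultdict


def sample_paths(adj, num_paths, path_len, rng):
    """Sample num_paths random walks with up to path_len edges each."""
    vertices = sorted(adj)
    paths = []
    for _ in range(num_paths):
        v = rng.choice(vertices)        # assumption: uniform start vertex
        path = [v]
        for _ in range(path_len):
            if not adj[v]:
                break                   # dead end: stop this walk early
            v = rng.choice(adj[v])      # assumption: uniform neighbor choice
            path.append(v)
        paths.append(path)
    return paths


def build_index(paths):
    """Vertex-to-path index: vertex -> ids of sampled paths containing it."""
    index = defaultdict(set)
    for pid, path in enumerate(paths):
        for v in path:
            index[v].add(pid)
    return index


def top_k_similar(paths, index, query, k):
    """Rank vertices by the fraction of sampled paths shared with query."""
    counts = defaultdict(int)
    for pid in index.get(query, ()):
        for v in set(paths[pid]):
            if v != query:
                counts[v] += 1
    scored = [(v, c / len(paths)) for v, c in counts.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Hypothetical graph: a triangle (a, b, c) and a separate edge (x, y).
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
       "x": ["y"], "y": ["x"]}
rng = random.Random(0)
paths = sample_paths(adj, num_paths=2000, path_len=3, rng=rng)
index = build_index(paths)
result = top_k_similar(paths, index, "a", k=3)
```

Because a query only scans the paths listed under its own vertex in the index, its cost depends on the number of sampled paths through that vertex rather than on the size of the network. Vertices in a different component never share a path with the query, so they never appear in the result.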

Theoretical Analysis. How many random paths shall we sample? Steps: (1) domain and range set; (2) upper bound of the range set’s VC dimension; (3) distribution. Result: required sample size.

Theoretical Analysis: Details. Domain: Π. Range set, VC bound, and distribution: (formulas omitted in the transcript). Conclusion: R random paths can guarantee an error bound ε with probability 1−δ.

Proof of Details. Assume a set Q of size l can be shattered by RG. Then there is a one-to-one correspondence between each subset of Q and a range Pi in RG. However, a path belongs only to the ranges with respect to a pair of vertices in the path. Contradiction.

2. Vector Similarity and Panthervs. Limitation of path similarity: bias toward close neighbors. Vector similarity: the probability distributions of a vertex linking to all other vertices are similar if their topology structures are similar. Panthervs: use the top-D path similarities calculated by Pantherps to construct a vector for each vertex. Example (T=2, vertices u, v, w): Svs(u,w)=0.27 > Svs(u,v)=0.16.
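
A small sketch of the vector idea, under assumptions: each vertex is described by its top-D path similarities sorted in descending order and padded with zeros, and two vectors are compared with 1 / (1 + Euclidean distance). The transcript does not preserve the exact comparison formula, so this choice is an assumption picked to avoid division by zero; the function names and the toy similarity values are hypothetical.

```python
import math


def top_d_vector(path_sims, d):
    """Top-d path similarities of a vertex, descending, zero-padded."""
    vec = sorted(path_sims.values(), reverse=True)[:d]
    return vec + [0.0] * (d - len(vec))


def vector_similarity(vec_a, vec_b):
    """Assumed comparison: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(vec_a, vec_b))


# Hypothetical path similarities from three vertices to their neighbors.
sims_u = {"x": 0.4, "y": 0.1}
sims_w = {"p": 0.1, "q": 0.4}   # different neighbors, same profile as u
sims_v = {"x": 0.9}             # shares a neighbor with u, different profile

D = 3
vec_u, vec_w, vec_v = (top_d_vector(s, D) for s in (sims_u, sims_w, sims_v))
score_uw = vector_similarity(vec_u, vec_w)
score_uv = vector_similarity(vec_u, vec_v)
```

Here `u` and `w` have no neighbor in common yet receive the highest possible score, because only the shape of the similarity distribution is compared. This is how a vector-based measure can match vertices that are disconnected but structurally alike.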

Time Complexity
Data structures: random paths, vertex-to-path index, kd-tree.
Steps: random path sampling; top-k similarity search for any vertex; build and query kd-tree.
Method | Time Complexity | Space Complexity
SimRank | O(I N^2 d^2) | O(N^2)
TopSim | O(N T d^T) | O(N+M)
RWR | O(I N^2 d) | -
RoleSim | - | -
ReFex | O(N + I(fM + N f^2)) | O(N + Mf)
Pantherps | O(RTc + NdT) | O(RT + Nd)
Panthervs | O(RTc + NdT + Nc) | O(RT + Nd + ND)
(A dash marks a cell with no value in the transcript.)

Experiments

Efficiency Performance (Tencent network). Each cell is preprocessing time + top-k similarity search time.
|V| | |E| | RWR [KDD’04] | TopSim [ICDE’12] | RoleSim [KDD’11] | ReFex | Pantherps | Panthervs
6,523 | 10,000 | +7.79hr | +38.58m | +37.26s | 3.85s+0.07s | 0.07s+0.26s | 0.99s+0.21s
25,844 | 50,000 | +>150hr | +11.20hr | +12.98m | 26.09s+0.40s | 0.28s+1.53s | 2.45s+4.21s
48,837 | 100,000 | - | +30.94hr | +1.06hr | 2.02m+0.57s | 0.58s+3.48s | 5.30s+5.96s
169,209 | 500,000 | - | +>120hr | +>72hr | 17.18m+2.51s | 8.19s+16.08s | 27.94s+24.17s
230,103 | 1,000,000 | - | - | - | 31.50m+3.29s | 15.31s+30.63s | 49.83s+22.86s
443,070 | 5,000,000 | - | - | - | 24.15hr+8.55s | 50.91s+2.82m | 4.01m+1.29m
702,049 | 10,000,000 | - | - | - | >48hr | 2.21m+6.24m | 8.60m+6.58m
2,767,344 | 50,000,000 | - | - | - | - | 15.787m+1.36hr | 1.60hr+2.17hr
5,355,507 | 100,000,000 | - | - | - | - | 44.09m+4.50hr | 5.61hr+6.47hr
26,033,969 | 500,000,000 | - | - | - | - | 4.82hr+25.01hr | 32.90hr+47.34hr
51,640,620 | 1,000,000,000 | - | - | - | - | 13.32hr+80.38hr | 98.15hr+120.01hr
(A dash marks a cell with no value on the slide.)
Callouts: 390X speed up; 270X speed up; can scale up to handle 1 billion edges.
Settings: T=5, c=0.5, ε=√(1/|E|), δ=0.1, R=16,609,640.

Accuracy Performance of Pantherps. Evaluate how well Pantherps approximates common neighbors. The score represents the improvement over a random method. Panels: KDD, Twitter, Mobile. Datasets: co-author networks: |V| = 3K, |E| = 7K; Twitter network: |V| = 100K, |E| = 500K; Mobile network: |V| = 200K, |E| = 200K.

Accuracy Performance of Panthervs: Identity Resolution. Assumption: the same author in different networks of the same domain should be similar to himself or herself across those networks. Settings: given any two co-author networks, e.g., KDD and ICDM, if the top-k similar vertices retrieved from ICDM contain the query author from KDD, we say that the method hits a correct instance. Panels: KDD-ICDM, SIGIR-CIKM, SIGMOD-ICDE.

Parameter Analysis: Path Length T. Effect of path length T on the accuracy performance of Panthervs: the performance gets better as T increases and becomes almost stable when T ≥ 5.

Parameter Analysis: Error Bound ε (Tencent networks). When |E|/(1/ε)^2 ranges from 5 to 20, the results of Pantherps almost converge. The value (1/ε)^2 is almost linearly and positively correlated with the number of edges in a network. Therefore, we empirically set ε=√(1/|E|) in our experiments.

Deploy in AMiner.org

Big Network Analysis. [Diagram: BIG networks built on heterogeneous data, analyzed at the micro and macro levels along three dimensions (user, tie, topology), grounded in social theories and graph theories. Topics listed: user modeling, demographics, social role, social tie/link, homophily, social influence, triad formation, community, group behavior.]

Future Work. Question 1: How to incorporate the huge-volume and streaming individual behavior data into network analysis? Question 2: What is the interplay between (micro) individual behavior and (macro) network distribution? [Diagram repeated from the previous slide: BIG networks, heterogeneous data, user / tie / topology, micro and macro levels, social theories and graph theories.]

Related Publications
Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. Social Influence Analysis in Large-scale Networks. In KDD’09, pages 807-816, 2009.
Chenhao Tan, Jie Tang, Jimeng Sun, Quan Lin, and Fengjiao Wang. Social action tracking via noise tolerant time-varying factor graphs. In KDD’10, pages 807–816, 2010.
Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, and Jingyi Guo. Mining Advisor-Advisee Relationships from Research Publication Networks. In KDD'10, pages 203-212.
Jie Tang, Sen Wu, and Jimeng Sun. Confluence: Conformity Influence in Large Social Networks. In KDD’13, pages 347-355, 2013.
Yuxiao Dong, Yang Yang, Jie Tang, Yang Yang, Nitesh V. Chawla. Inferring User Demographics and Social Strategies in Mobile Social Networks. In KDD’14, 2014.
Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip Yu. COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency. In KDD'15, pages 1485-1494.
Jing Zhang, Biao Liu, Jie Tang, Ting Chen, and Juanzi Li. Social Influence Locality for Modeling Retweeting Behaviors. In IJCAI'13, pages 2761-2767, 2013.
Jing Zhang, Jie Tang, Honglei Zhuang, Cane Wing-Ki Leung, and Juanzi Li. Role-aware Conformity Influence Modeling and Analysis in Social Networks. In AAAI'14, 2014.
Jing Zhang, Jie Tang, Yuanyi Zhong, Yuchen Mo, Juanzi Li, Guojie Song, Wendy Hall, and Jimeng Sun. StructInf: Mining Structural Influence from Social Streams. In AAAI'17.
Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In KDD’08, pages 990-998, 2008.
Tiancheng Lou and Jie Tang. Mining Structural Hole Spanners Through Information Diffusion in Social Networks. In WWW'13, pages 837-848, 2013.
Jiezhong Qiu, Yixuan Li, Jie Tang, Zheng Lu, Hao Ye, Bo Chen, Qiang Yang, and John Hopcroft. The Lifecycle and Cascade of WeChat Social Messaging Groups. In WWW'16, pages 311-320.
Jie Tang, Tiancheng Lou, Jon Kleinberg, and Sen Wu. Transfer Learning to Infer Social Ties across Heterogeneous Networks. In TOIS, Vol 34 (2), No. 7, 2016.
Jimeng Sun and Jie Tang. A Survey of Models and Algorithms for Social Influence Analysis. Social Network Data Analytics, Aggarwal, C. C. (Ed.), Kluwer Academic Publishers, pages 177–214, 2011.

References
S. Milgram. The Small World Problem. Psychology Today, 1967, Vol. 2, 60–67.
J.H. Fowler and N.A. Christakis. The Dynamic Spread of Happiness in a Large Social Network: Longitudinal Analysis Over 20 Years in the Framingham Heart Study. British Medical Journal 2008; 337: a2338.
R. Dunbar. Neocortex size as a constraint on group size in primates. Human Evolution, 1992, 20: 469–493.
R. M. Bond, C. J. Fariss, J. J. Jones, A. D. I. Kramer, C. Marlow, J. E. Settle and J. H. Fowler. A 61-million-person experiment in social influence and political mobilization. Nature, 489:295-298, 2012.
http://klout.com
Why I Deleted My Klout Profile, by Pam Moore, at Social Media Today, originally published November 19, 2011; retrieved November 26, 2011.
S. Aral and D. Walker. Identifying Influential and Susceptible Members of Social Networks. Science, 337:337-341, 2012.
J. Ugander, L. Backstrom, C. Marlow, and J. Kleinberg. Structural diversity in social contagion. PNAS, 109 (20):7591-7592, 2012.
S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. PNAS, 106 (51):21544-21549, 2009.
J. Scripps, P.-N. Tan, and A.-H. Esfahanian. Measuring the effects of preprocessing decisions and network forces in dynamic network analysis. In KDD’09, pages 747–756, 2009.
D. B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 5, 688–701, 1974.
http://en.wikipedia.org/wiki/Randomized_experiment

References (cont.)
A. Anagnostopoulos, R. Kumar, M. Mahdian. Influence and correlation in social networks. In KDD’08, pages 7-15, 2008.
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report SIDL-WP-1999-0120, Stanford University, 1999.
G. Jeh and J. Widom. Scaling personalized web search. In WWW '03, pages 271-279, 2003.
G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In KDD’02, pages 538-543, 2002.
A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities in social networks. In WSDM’10, pages 207–217, 2010.
P. Domingos and M. Richardson. Mining the network value of customers. In KDD’01, pages 57–66, 2001.
D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In KDD’03, pages 137–146, 2003.
J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In KDD’07, pages 420–429, 2007.
W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social networks. In KDD'09, pages 199-207, 2009.
E. Bakshy, D. Eckles, R. Yan, and I. Rosenn. Social influence in social advertising: evidence from field experiments. In EC'12, pages 146-161, 2012.
A. Goyal, F. Bonchi, and L. V. Lakshmanan. Discovering leaders from community actions. In CIKM’08, pages 499–508, 2008.
N. Agarwal, H. Liu, L. Tang, and P. S. Yu. Identifying the influential bloggers in a community. In WSDM’08, pages 207–217, 2008.

References (cont.)
E. Bakshy, B. Karrer, and L. A. Adamic. Social influence and the diffusion of user-created content. In EC ’09, pages 325–334, New York, NY, USA, 2009. ACM.
P. Bonacich. Power and centrality: a family of measures. American Journal of Sociology, 92:1170–1182, 1987.
R. B. Cialdini and N. J. Goldstein. Social influence: compliance and conformity. Annu Rev Psychol, 55:591–621, 2004.
D. Crandall, D. Cosley, D. Huttenlocher, J. Kleinberg, and S. Suri. Feedback effects between similarity and social influence in online communities. In KDD’08, pages 160–168, 2008.
P. W. Eastwick and W. L. Gardner. Is it a game? evidence for social influence in the virtual world. Social Influence, 4(1):18–32, 2009.
S. M. Elias and A. R. Pratkanis. Teaching social influence: Demonstrations and exercises from the discipline of social psychology. Social Influence, 1(2):147–162, 2006.
T. L. Fond and J. Neville. Randomization tests for distinguishing social influence and homophily effects. In WWW’10, 2010.
M. Gomez-Rodriguez, J. Leskovec, and A. Krause. Inferring Networks of Diffusion and Influence. In KDD’10, pages 1019–1028, 2010.
M. E. J. Newman. A measure of betweenness centrality based on random walks. Social Networks, 2005.
D. J. Watts and S. H. Strogatz. Collective dynamics of ’small-world’ networks. Nature, pages 440–442, Jun 1998.
J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. In ICDM’05, pages 418–425, 2005.

Thank you!
Collaborators: John Hopcroft, Jon Kleinberg, Chenhao Tan (Cornell); Jiawei Han (UIUC), Philip Yu (UIC); Jian Pei (SFU), Hanghang Tong (ASU); Tiancheng Lou (Google & Baidu), Jimeng Sun (GIT); Wei Chen, Ming Zhou, Long Jiang, Chi Wang, Yuxiao Dong (Microsoft); Yutao Zhang, Jing Zhang, Zhanpeng Fang, Zi Yang, Sen Wu, etc. (THU)
Jie Tang, KEG, Tsinghua U, http://keg.cs.tsinghua.edu.cn/jietang
Download all data & codes: http://arnetminer.org/data, http://arnetminer.org/data-sna