Jie Tang Computer Science, Tsinghua

Presentation transcript:

Computational Models for Social Network Analysis: Mining Big Social Networks (Part I: User Modeling). Jie Tang, Computer Science, Tsinghua University @WWW’2017

Roadmap: BIG Networks. Dimensions: User, Tie, Structure, Heterogeneous, Dynamic, Big&Big data (from micro to macro). Topics: User Modeling, Demographics, Social Role, Social Tie/Link, Homophily, Social Influence, Triad Formation, Community, Group Behavior. Foundations: Social Theories and Graph Theories.


User Modeling: Demographics and Social Strategies. Did you know? As of 2014, there are 7.3 billion mobile users. Users average 22 calls, 23 messages, and 110 status checks per day (Report by United Nations). [Figure: male vs. female and young vs. senior users; labels: fewer friends, more stable, 2x more social connections, 4x more opposite-gender circles.] [1] Yuxiao Dong, Yang Yang, Jie Tang, Yang Yang, Nitesh V. Chawla. Inferring User Demographics and Social Strategies in Mobile Social Networks. KDD’14, pages 15-24.

Our Data. Real-world large mobile network data [1] from an anonymous country, Aug. 2008 to Sep. 2008, with no communication content: > 7 million mobile users with demographic information, and > 1 billion communication records (calls and messages). Two networks:
Network   #nodes      #edges
CALL      7,440,123   32,445,941
SMS       4,505,958   10,913,601
[1] J.P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, A. L. Barabasi. Structure and tie strengths in mobile communication networks. PNAS 2007.

Ego Network. [Figure: correlations between user demographics and network properties; peak at 22 years old, valley and reversion at 38-40 years old.] Young people are very active in broadening their social circles, while seniors tend to maintain small but close connections.

Demographic Homophily. People tend to communicate with others of similar gender and similar age, i.e., demographic homophily.

Social Triad People expand both same-gender and opposite-gender social groups during the dating and reproductively active period.

Social Triad vs. People’s social attention to opposite-gender groups quickly disappears, and the insistence on same-gender social groups lasts for a lifetime.

Social Tie Strength vs. [Figure legend] Color: #calls per month. Interactions between two young opposite-gender people are much more frequent than those between young same-gender people.

Social Tie Strength vs. As people mature, the pattern reverses: same-gender interactions become more frequent than those between opposite-gender users.

Social Tie Strength vs. Cross-generation interactions between two females are more frequent than those between two males or one male and one female.

Null Model. Users’ gender and age are randomly shuffled, 10,000 times. $x$: empirical result from real data; $\hat{x}$: shuffled results; $\mu(\hat{x})$: the average of the shuffled data; $\sigma(\hat{x})$: the standard deviation of the shuffled data; $z(x)$: z-score, $z(x) = \frac{x - \mu(\hat{x})}{\sigma(\hat{x})}$.
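The shuffle test above can be sketched in a few lines. This is a minimal illustration under stated assumptions: only gender is shuffled, and the statistic $x$ is taken to be the number of same-gender ties (the slides apply the test to triads and other patterns). The function names `same_gender_edges` and `null_model_zscore` are hypothetical.

```python
import random
import statistics

def same_gender_edges(edges, gender):
    """Hypothetical statistic x: number of ties whose two endpoints share a gender."""
    return sum(1 for u, v in edges if gender[u] == gender[v])

def null_model_zscore(edges, gender, n_shuffles=10_000, seed=0):
    """z(x) = (x - mu(x_hat)) / sigma(x_hat), where x_hat is the statistic
    recomputed after randomly shuffling gender labels among users."""
    rng = random.Random(seed)
    users = list(gender)
    labels = [gender[u] for u in users]
    x = same_gender_edges(edges, gender)
    shuffled = []
    for _ in range(n_shuffles):
        # Keeps the network and the gender ratio, breaks the user-label link.
        rng.shuffle(labels)
        shuffled.append(same_gender_edges(edges, dict(zip(users, labels))))
    mu = statistics.mean(shuffled)
    sigma = statistics.pstdev(shuffled)
    if sigma == 0:
        raise ValueError("shuffled statistic has zero variance; z-score undefined")
    return (x - mu) / sigma
```

A large positive z means the real network has more same-gender ties than label shuffling alone would produce.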

Social Triad. [Figure panel: $x$, empirical result from real data.]

Social Triad. [Figure panel: $\mu(\hat{x})$, the average of the shuffled data.]

Social Triad. [Figure panel: $z(x)$, z-score.] |z-score| > 3.3 (p < 0.001) is considered extremely statistically significant.
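As a sanity check on the 3.3 threshold, assuming the shuffled statistic is approximately normal (the slides do not state this), the two-sided p-value of a z-score is erfc(|z|/√2). The helper name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value P(|Z| >= |z|) for a standard-normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))
```

With this formula, |z| = 3.3 gives p ≈ 0.00097, just under 0.001, which is why 3.3 is a convenient cutoff.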

User Modeling: Social Strategies across the Lifespan. [Figure: male vs. female and young vs. senior users; labels: fewer friends, more stable, 2x more social connections, 4x more opposite-gender circles, more friends, same-gender, only same-gender, opposite-gender, closed circles.] [1] Yuxiao Dong, Yang Yang, Jie Tang, Yang Yang, Nitesh V. Chawla. Inferring User Demographics and Social Strategies in Mobile Social Networks. KDD’14, pages 15-24.

Demographic Prediction: Infer Users’ Gender Y and Age Z Separately. Model correlations between gender Y and attributes X, i.e., P(Y | X); model correlations between age Z and attributes X, i.e., P(Z | X). [Figure: features X → gender Y and features X → age Z, each learned over a bag of instances and a bag of labels.]

Demographic Prediction: Infer Users’ Gender Y and Age Z Simultaneously. Model correlations between gender Y and attributes X, and between network G and Y; model correlations between age Z and attributes X, and between network G and Z; model interrelations between Y and Z. The target is the joint distribution P(Y, Z | G, X).

Demographic Prediction (cont.). Input: G = (VL, VU, E, YL, ZL) and attributes X, where VL are labeled users with known gender YL and age ZL, and VU are unlabeled users. Output: f(G, X) → (YU, ZU). Gender Y: Male (55%) / Female (45%). Age Z: Young (18-24) / Young-Adult (25-34) / Middle-Age (35-49) / Senior (>49).

WhoAmI Method. A joint distribution over random variables Y (gender) and Z (age), built from three kinds of factors: attribute factor f(), modeling social strategies on ego networks; dyadic factor g(), modeling social strategies on social ties; triadic factor h(), modeling social strategies on social triads. The model also captures interrelations between gender and age. Code is available at: http://arnetminer.org/demographic
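To make the factor structure concrete, here is a toy sketch, not the WhoAmI implementation: each user takes a joint (gender, age) state, and the unnormalized log-probability sums attribute f(), dyadic g() and triadic h() terms. The weight tables `w["attr"]`, `w["pair"]`, `w["tri"]`, the triad feature (number of females), and the function names are all hypothetical. Exact enumeration is only feasible for a handful of unlabeled users; a real model would learn the weights and need approximate inference.

```python
import itertools
import math

GENDERS = ("M", "F")
AGES = ("Young", "Young-Adult", "Middle-Age", "Senior")
STATES = list(itertools.product(GENDERS, AGES))  # joint (gender, age) state of one user

def log_potential(assign, X, edges, triangles, w):
    """Sum of attribute f(), dyadic g() and triadic h() log-factors (hypothetical forms)."""
    s = 0.0
    for u, state in assign.items():  # f(): user attributes X[u] vs. (Y_u, Z_u)
        s += sum(w["attr"].get((feat, state), 0.0) * val for feat, val in X[u].items())
    for u, v in edges:  # g(): joint states at the two ends of a tie
        s += w["pair"].get(frozenset((assign[u], assign[v])), 0.0)
    for tri in triangles:  # h(): gender composition of a closed triad
        s += w["tri"].get(sum(assign[n][0] == "F" for n in tri), 0.0)
    return s

def infer_marginals(labeled, unlabeled, X, edges, triangles, w):
    """Exact P(Y_u, Z_u | G, X) for each unlabeled user by enumerating all joint states.
    Cost grows as 8 ** len(unlabeled), so this is only for toy graphs."""
    mass = {u: dict.fromkeys(STATES, 0.0) for u in unlabeled}
    partition = 0.0
    for combo in itertools.product(STATES, repeat=len(unlabeled)):
        assign = {**labeled, **dict(zip(unlabeled, combo))}
        p = math.exp(log_potential(assign, X, edges, triangles, w))
        partition += p
        for u, state in zip(unlabeled, combo):
            mass[u][state] += p
    return {u: {st: m / partition for st, m in d.items()} for u, d in mass.items()}
```

For example, with two labeled young females in a triangle and a positive weight on all-female triads, the third user's marginal shifts toward "F", illustrating how a triadic factor propagates information from labeled to unlabeled users.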

WhoAmI: Experiments. Data: active users (#contacts ≥ 5 in two months): > 1.09 million users in CALL and > 304 thousand users in SMS. 50% of users are used as training data and 50% as test data.
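The data preparation above amounts to a filter followed by a random half split. A minimal sketch, assuming each user's contacts over the two-month window are available as a list of contact ids; the function names are hypothetical.

```python
import random

def active_users(contacts, min_contacts=5):
    """Users with at least `min_contacts` distinct contacts in the observation window."""
    return sorted(u for u, peers in contacts.items() if len(set(peers)) >= min_contacts)

def train_test_split_half(users, seed=0):
    """Randomly assign 50% of users to training and the remaining 50% to testing."""
    pool = list(users)
    random.Random(seed).shuffle(pool)
    mid = len(pool) // 2
    return pool[:mid], pool[mid:]
```

Counting distinct contacts (rather than records) keeps a user who repeatedly calls the same person from counting as active.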

WhoAmI: Experiments. Baselines: LRC: Logistic Regression; SVM: Support Vector Machine; NB: Naïve Bayes; RF: Random Forest; BAG: Bagged Decision Tree; RBF: Gaussian Radial Basis NN; FGM: Factor Graph Model; DFG (WhoAmI). Evaluation metrics: weighted precision, weighted recall, weighted F1-measure, and accuracy.
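The slides do not define the weighting. A common convention, assumed here, is to average per-class precision, recall and F1 weighted by each class's share of the true labels. The function name `weighted_prf` is hypothetical.

```python
from collections import Counter

def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall, F1, plus accuracy, for multi-class labels."""
    support = Counter(y_true)
    n = len(y_true)
    wp = wr = wf = 0.0
    for cls, s in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        prec = tp / predicted if predicted else 0.0
        rec = tp / s
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        # Each class contributes in proportion to its number of true instances.
        wp += s / n * prec
        wr += s / n * rec
        wf += s / n * f1
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return wp, wr, wf, acc
```

Under this convention weighted recall always equals accuracy, since the weights cancel the per-class denominators; weighted precision and F1 carry the extra information.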

Results

Summary: Predictability of User Demographic Profiles. The proposed WhoAmI (DFG) outperforms the baselines by up to 10% in terms of F1-measure. We can infer 80% of users’ gender from the CALL network, and 73% of users’ age from the SMS network. Phone call behavior reveals more about user gender than text messaging does, while text messaging behavior reveals more about user age than phone calls do.

Generalization Can we generalize the method to other networks?

Inferring Gender in AMiner: https://aminer.org/gender (an interesting API). [Figure: our method compared with FGNL, a baseline method.] [1] Xiaotao Gu, Hong Yang, Jie Tang, and Jing Zhang. Web User Profiling using Data Redundancy. ASONAM'16. (Best Student Paper Runner-up)

Addressing User Modeling as an Integration Problem: Beyond Extraction/Prediction

Model user profiles. A common method: find the homepage (and other relevant pages) and extract profile attributes. However, the extracted information is correct but not precise…

User profiles are distributed… Some information goes out of date… Much information is semi-structured, so the key is not extraction. [Figure: profile sources: Wikipedia, homepage, LinkedIn, AMiner.]

Connecting Multiple Networks. Identifying users across multiple heterogeneous networks and integrating semantics from the different networks is a fundamental issue in many applications. [Figure: the profiles of Jeannette Wing on LinkedIn, Wikipedia, Google Scholar, and AMiner.]

Considering the networks…

Local vs. Global Consistency. Given three networks [figure: three networks, including AMiner].

Local vs. Global Consistency. Local matching: matching users by their profiles. Pairwise similarity features: username similarity and uniqueness; profile content similarity; ego network similarity; social status. Local consistency is captured by an energy function over these features.
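The pairwise features listed above can be illustrated as follows. The concrete definitions are assumptions, not COSNET's: username similarity via `difflib.SequenceMatcher`, uniqueness as the inverse number of users sharing the name, content similarity as word-set Jaccard, and ego-network similarity as the Jaccard overlap between a user's already-matched friends and the candidate's friends. Social status is omitted. `pair_features`, `jaccard` and the dict layout are hypothetical.

```python
from difflib import SequenceMatcher

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def pair_features(u, v, name_freq, known_matches):
    """Hypothetical local features for a candidate pair (u from network A, v from network B).

    u, v: dicts with "name", "profile" (free text) and "friends" (user ids).
    name_freq: how many users share each lower-cased name (for uniqueness).
    known_matches: already-matched A-id -> B-id pairs used to compare ego networks.
    """
    name_u, name_v = u["name"].lower(), v["name"].lower()
    mapped_friends = {known_matches[f] for f in u["friends"] if f in known_matches}
    return {
        "name_sim": SequenceMatcher(None, name_u, name_v).ratio(),
        "name_uniqueness": 1.0 / name_freq.get(name_u, 1),
        "content_sim": jaccard(u["profile"].lower().split(), v["profile"].lower().split()),
        "ego_sim": jaccard(mapped_friends, v["friends"]),
    }
```

Uniqueness matters because an exact match on a rare name is much stronger evidence than an exact match on a common one.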

Local vs. Global Consistency. Network matching: matching users’ ego networks. [Figure labels: network matching, local consistency.] Goal: encourage “neighborhood-preserving matching”.

Network Matching. Network matching: matching users’ ego networks. [Figure: input networks, matching graph, and energy function.]
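The slide's energy function was an image and is not recoverable here. As a stand-in, a simple way to score "neighborhood preservation" is to count the edges of one network whose matched endpoints are also connected in the other, and use its negative as an energy (lower is better). This form and the function names are hypothetical.

```python
def preserved_edges(edges_a, edges_b, matching):
    """Count edges of network A whose endpoints are both matched and map onto an edge of B."""
    eb = {frozenset(e) for e in edges_b}
    return sum(
        1
        for x, y in edges_a
        if x in matching and y in matching and frozenset((matching[x], matching[y])) in eb
    )

def network_energy(edges_a, edges_b, matching):
    """Hypothetical energy: lower for matchings that preserve more neighborhoods."""
    return -preserved_edges(edges_a, edges_b, matching)
```

Edges are stored as `frozenset`s so the check ignores edge direction, matching the undirected social ties on the slides.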

Local vs. Global Consistency. Global consistency: matching users by avoiding global inconsistency. [Figure labels: network matching, local consistency, global inconsistency.] Goal: avoid “global inconsistency”.


Local vs. Global Consistency (cont.). Global consistency: matching users by avoiding global inconsistency. [Figure: matchings marked x that are locally consistent but jointly “Inconsistent!” across the three networks.]

Avoid Global Inconsistency. [Figure: input networks, matching graph, and energy function.]
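Global inconsistency can be checked across three networks. A minimal sketch, assuming one-to-one matchings stored as dicts: if user a in G1 matches b in G2 and b matches c in G3, then a should also match c directly. The function name is hypothetical, and COSNET's actual energy term is not reproduced here.

```python
def global_inconsistencies(m12, m23, m13):
    """Users a in G1 for which the path G1 -> G2 -> G3 disagrees with the direct G1 -> G3 match
    (including the case where a has no direct match at all)."""
    bad = []
    for a, b in m12.items():
        if b in m23 and m13.get(a) != m23[b]:
            bad.append(a)
    return bad
```

Each pairwise matching can look locally plausible while this composition check still fails, which is the situation the slides mark as “Inconsistent!”.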

COSNET: Connecting Social Networks with Local and Global Consistency. Input: G = {G1, G2, …, Gm}, with Gk = (Vk, Ek, Rk). Formalization: X = {xi}, the set of all possible pairwise matchings, with one variable xi for each candidate match. COSNET is an energy-based model. [1] Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip Yu. COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency. KDD’15, pages 1485-1494.

Model Construction. The objective function combines all the energy functions.

Model Learning. Max-margin learning. As the original problem is intractable, we use Lagrangian relaxation to decompose the original objective function into a set of easy-to-solve sub-problems.

Model Learning (cont.). Dual decomposition: this provides a lower bound on the original function. The resulting objective function is convex and non-differentiable, and can be solved by the projected sub-gradient method.
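The slides do not spell out COSNET's sub-problems, so the sketch below is a generic Lagrangian relaxation with projected subgradient steps, applied to a hypothetical global-consistency constraint x_ab + x_bc - x_ac ≤ 1 over three binary candidate matches. Relaxing the constraints makes every x_i an independent, trivially solvable sub-problem; for this minimization the dual is maximized, each dual value is a lower bound on the original optimum, and the multipliers are projected back onto λ ≥ 0 after each step. Function names, costs and step sizes are assumptions.

```python
import itertools

def lagrangian_relaxation(costs, constraints, iters=50):
    """Projected subgradient ascent on the Lagrangian dual of
        min_x sum_i costs[i] * x[i]  s.t.  sum_i A[i] * x[i] <= b for each (A, b),  x in {0,1}^n.
    Returns the best lower bound found and the final multipliers."""
    lam = [0.0] * len(constraints)
    best = float("-inf")
    for t in range(iters):
        # Relaxed costs: each x_i now decouples into an independent sub-problem.
        adj = list(costs)
        for (A, _), l in zip(constraints, lam):
            for i, a in enumerate(A):
                adj[i] += l * a
        x = [1 if c < 0 else 0 for c in adj]
        dual = sum(c * xi for c, xi in zip(adj, x)) - sum(l * b for (_, b), l in zip(constraints, lam))
        best = max(best, dual)  # every dual value lower-bounds the original minimum
        # Subgradient = constraint violation; take a step, then project onto lam >= 0.
        step = 1.0 / (t + 1)
        lam = [max(0.0, l + step * (sum(a * xi for a, xi in zip(A, x)) - b))
               for (A, b), l in zip(constraints, lam)]
    return best, lam

def brute_force(costs, constraints):
    """Exact optimum by enumeration, for checking the bound on tiny problems."""
    feasible = (x for x in itertools.product((0, 1), repeat=len(costs))
                if all(sum(a * xi for a, xi in zip(A, x)) <= b for A, b in constraints))
    return min(sum(c * xi for c, xi in zip(costs, x)) for x in feasible)
```

With costs [-1.0, -1.0, 0.5] for (x_ab, x_bc, x_ac), the unconstrained choice would match a↔b and b↔c but not a↔c, violating the constraint. The multiplier grows until accepting a↔c becomes worthwhile, and the dual bound closes on the true optimum of -1.5.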

Model Learning (cont.)

Results

Researcher Profile. [Figure: profile sources LinkedIn, VideoLectures, and USPTO.]

Data Sets.
Dataset    Network        #Users       #Relationships
SNS        Twitter        40,171,624   1,468,365,182
SNS        LiveJournal    3,017,286    87,037,567
SNS        Flickr         215,495      9,114,557
SNS        Last.fm        136,420      1,685,524
SNS        MySpace        854,498      6,489,736
Academia   LinkedIn       2,985,414    25,965,384
Academia   ArnetMiner     1,053,188    3,916,907
Academia   VideoLectures  11,178       786,353
Ground truth: thanks to Shlomo Berkovsky, Terence Chen, and Dali Kaafar for sharing the SNS data with ground truth [28]. In Academia, we chose 10,000 authors from ArnetMiner who were connected with LinkedIn profiles and VideoLectures profiles as the ground truth. Data&codes: https://aminer.org/cosnet

Connecting AMiner with … LinkedIn and VideoLectures. Methods: Name-match: match names only; SVM: use a classifier to identify the same user; MNA: an optimization method; SiGMa: local propagation; COSNET: our method; COSNET-: without global consistency. Data&codes: https://aminer.org/cosnet

Connecting Social Media Sites: Twitter, LiveJournal, Last.fm, Flickr, MySpace. Methods: Name-match: match names only; SVM: use a classifier to identify the same user; MNA: an optimization method; SiGMa: local propagation; COSNET: our method; COSNET-: without global consistency.

Effects of Global Consistency. COSNET-: without global consistency. [Figure panels: Academia collection; SNS collection.]

Big Network Analysis. BIG Network dimensions: User, Tie, Topology, Heterogeneous, Dynamic (from micro to macro). Topics: User Modeling, Demographics, Social Role, Social Tie/Link, Homophily, Social Influence, Triad Formation, Community, Group Behavior. Foundations: Social Theories and Graph Theories.

Thank you! Collaborators: John Hopcroft, Jon Kleinberg, Chenhao Tan (Cornell) Jiawei Han (UIUC), Philip Yu (UIC) Jian Pei (SFU), Hanghang Tong (ASU) Tiancheng Lou (Google&Baidu), Jimeng Sun (GIT) Wei Chen, Ming Zhou, Long Jiang, Chi Wang, Yuxiao Dong (Microsoft) Yutao Zhang, Jing Zhang, Zhanpeng Fang, Zi Yang, Sen Wu, etc. (THU) Jie Tang, KEG, Tsinghua U, http://keg.cs.tsinghua.edu.cn/jietang Download all data & Codes, http://arnetminer.org/data http://arnetminer.org/data-sna