1 Inferring User Demographics and Social Strategies in Mobile Social Networks Yuxiao Dong #, Yang Yang +, Jie Tang +, Yang Yang #, Nitesh V. Chawla # #

Slides:



Advertisements
Similar presentations
A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Advertisements

1 From Sentiment to Emotion Analysis in Social Networks Jie Tang Department of Computer Science and Technology Tsinghua University, China.
Active Learning for Streaming Networked Data Zhilin Yang, Jie Tang, Yutao Zhang Computer Science Department, Tsinghua University.
Confluence: Conformity Influence in Large Social Networks
Λ14 Διαδικτυακά Κοινωνικά Δίκτυα και Μέσα Strong and Weak Ties Chapter 3, from D. Easley and J. Kleinberg book.
Modelling Paying Behavior in Game Social Networks Zhanpeng Fang +, Xinyu Zhou +, Jie Tang +, Wei Shao #, A.C.M. Fong *, Longjun Sun #, Ying Ding -, Ling.
Vineel Pratap Girish Govind Abhilash Veeragouni. Human listeners are capable of extracting information from the acoustic signal beyond just the linguistic.
1 Social Influence Analysis in Large-scale Networks Jie Tang 1, Jimeng Sun 2, Chi Wang 1, and Zi Yang 1 1 Dept. of Computer Science and Technology Tsinghua.
1 Yuxiao Dong *$, Jie Tang $, Sen Wu $, Jilei Tian # Nitesh V. Chawla *, Jinghai Rao #, Huanhuan Cao # Link Prediction and Recommendation across Multiple.
CIKM’2008 Presentation Oct. 27, 2008 Napa, California
Graph Data Management Lab School of Computer Science , Bristol, UK.
1 1 Chenhao Tan, 1 Jie Tang, 2 Jimeng Sun, 3 Quan Lin, 4 Fengjiao Wang 1 Department of Computer Science and Technology, Tsinghua University, China 2 IBM.
Funding Networks Abdullah Sevincer University of Nevada, Reno Department of Computer Science & Engineering.
UNDERSTANDING VISIBLE AND LATENT INTERACTIONS IN ONLINE SOCIAL NETWORK Presented by: Nisha Ranga Under guidance of : Prof. Augustin Chaintreau.
Ensemble Learning: An Introduction
Chapter 5 Data mining : A Closer Look.
Walter Hop Web-shop Order Prediction Using Machine Learning Master’s Thesis Computational Economics.
1 Yuxiao Dong *, Jie Tang $, Tiancheng Lou #, Bin Wu &, Nitesh V. Chawla * How Long will She Call Me? Distribution, Social Theory and Duration Prediction.
Modelling Paying Behavior in Game Social Networks Zhanpeng Fang +, Xinyu Zhou +, Jie Tang +, Wei Shao #, A.C.M. Fong *, Longjun Sun #, Ying Ding -, Ling.
1 1 Chenhao Tan, 1 Jie Tang, 2 Jimeng Sun, 3 Quan Lin, 4 Fengjiao Wang 1 Department of Computer Science and Technology, Tsinghua University, China 2 IBM.
Active Learning for Networked Data Based on Non-progressive Diffusion Model Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing Dept. of Computer Science and.
Using Friendship Ties and Family Circles for Link Prediction Elena Zheleva, Lise Getoor, Jennifer Golbeck, Ugur Kuter (SNAKDD 2008)
Data Mining Joyeeta Dutta-Moscato July 10, Wherever we have large amounts of data, we have the need for building systems capable of learning information.
by B. Zadrozny and C. Elkan
Exploring the dynamics of social networks Aleksandar Tomašević University of Novi Sad, Faculty of Philosophy, Department of Sociology
Modeling Relationship Strength in Online Social Networks Rongjing Xiang: Purdue University Jennifer Neville: Purdue University Monica Rogati: LinkedIn.
Suggesting Friends using the Implicit Social Graph Maayan Roth et al. (Google, Inc., Israel R&D Center) KDD’10 Hyewon Lim 1 Oct 2014.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Jure Leskovec, CMU Eric Horwitz, Microsoft Research.
Using Transactional Information to Predict Link Strength in Online Social Networks Indika Kahanda and Jennifer Neville Purdue University.
Data Mining and Machine Learning Lab Network Denoising in Social Media Huiji Gao, Xufei Wang, Jiliang Tang, and Huan Liu Data Mining and Machine Learning.
1 From Sentiment to Emotion Analysis in Social Networks Jie Tang Department of Computer Science and Technology Tsinghua University, China.
Tie Strength, Embeddedness & Social Influence: Evidence from a Large Scale Networked Experiment Sinan Aral, Dylan Walker Presented by: Mengqi Qiu(Mendy)
Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, Xiaoming Sun.
To Blog or Not to Blog: Characterizing and Predicting Retention in Community Blogs Imrul Kayes 1, Xiang Zuo 1, Da Wang 2, Jacob Chakareski 3 1 University.
CONCLUSION There was a significant main effect of dyadic gender composition on dyadic tie strength, such that same-gender dyads demonstrated stronger ties.
Presenter: Lung-Hao Lee ( 李龍豪 ) January 7, 309.
1 Modeling Coherent Mortality Forecasts using the Framework of Lee-Carter Model Presenter: Jack C. Yue /National Chengchi University, Taiwan Co-author:
Predicting Positive and Negative Links in Online Social Networks
M Machine Learning F# and Accord.net. Alena Dzenisenka Software architect at Luxoft Poland Member of F# Software Foundation Board of Trustees Researcher.
Data Mining Algorithms for Large-Scale Distributed Systems Presenter: Ran Wolff Joint work with Assaf Schuster 2003.
The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign User Profiling in Ego-network: Co-profiling Attributes and Relationships.
Λ14 Διαδικτυακά Κοινωνικά Δίκτυα και Μέσα Networks and Surrounding Contexts Chapter 4, from D. Easley and J. Kleinberg book.
A Novel Local Patch Framework for Fixing Supervised Learning Models Yilei Wang 1, Bingzheng Wei 2, Jun Yan 2, Yang Hu 2, Zhi-Hong Deng 1, Zheng Chen 2.
User-Centric Data Dissemination in Disruption Tolerant Networks Wei Gao and Guohong Cao Dept. of Computer Science and Engineering Pennsylvania State University.
Online Social Networks and Media
1 From Sentiment to Emotion Analysis in Social Networks Jie Tang Department of Computer Science and Technology Tsinghua University, China.
Classification Ensemble Methods 1
Quantification in Social Networks Letizia Milli, Anna Monreale, Giulio Rossetti, Dino Pedreschi, Fosca Giannotti, Fabrizio Sebastiani Computer Science.
1 Friends and Neighbors on the Web Presentation for Web Information Retrieval Bruno Lepri.
Jure Leskovec (Stanford), Daniel Huttenlocher and Jon Kleinberg (Cornell)
1 CoupledLP: Link Prediction in Coupled Networks Yuxiao Dong #, Jing Zhang +, Jie Tang +, Nitesh V. Chawla #, Bai Wang* # University of Notre Dame + Tsinghua.
Chi-Square Chapter 14. Chi Square Introduction A population can be divided according to gender, age group, type of personality, marital status, religion,
PREDICTION ON TWEET FROM DYNAMIC INTERACTION Group 19 Chan Pui Yee Wong Tsz Wing Yeung Chun Kit.
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
A Simple Approach for Author Profiling in MapReduce
The Role of Optimal Distinctiveness and Homophily in Online Dating
Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook By: Lars Backstrom - Facebook Inc, Jon Kleinberg.
Learning Triadic Influence in Large Social Networks
Social Role-Aware Emotion Contagion in Image Social Networks
Jie Tang Computer Science, Tsinghua
Dieudo Mulamba November 2017
Binghui Wang, Le Zhang, Neil Zhenqiang Gong
GANG: Detecting Fraudulent Users in OSNs
Modeling IDS using hybrid intelligent systems
“The Spread of Physical Activity Through Social Networks”
Modeling Topic Diffusion in Scientific Collaboration Networks
Presentation transcript:

1 Inferring User Demographics and Social Strategies in Mobile Social Networks Yuxiao Dong #, Yang Yang +, Jie Tang +, Yang Yang #, Nitesh V. Chawla # # University of Notre Dame + Tsinghua University

2 Yuxiao Dong, Yang Yang, Jie Tang, Yang Yang, Nitesh V. Chawla. Inferring User Demographics and Social Strategies in Mobile Social Networks. KDD Did you know: As of 2014, there are 7.3 billion mobile phones, larger than the global population. Users average 22 calls, 23 messages, and 110 status checks per day. Younger Older have than 2x more social friends 4x more opposite-gender circles female More stable male Fewer friends

3 Big Mobile Data Real-world large-scale mobile data –An anonymous country. –No communication content. –Aug – Sep –> 7 million mobile users + demographic information. Gender: Male (55%) / Female (45%) Age: Young (18-24) / Young-Adult (25-34) / Middle-Age (35-49) / Senior (>49) –> 1 billion communication records (call and message). Two networks: Network#nodes#edges CALL7,440,12332,445,941 SMS4,505,95810,913,601

4 What We Do How do people communicate / interact with each other with mobile phones? –Infer human social strategies on demographics. To what extent can user demographic profiles be inferred from their mobile communication interactions? –Infer user demographics based on social strategies. Applications: –Viral marketing –Personalized services –User modeling –Customer churn warning –…–…

5 Infer human social strategies on demographics user demographics + mobile social network  social strategies

6 Social Strategy Human needs are defined according to the existential categories of being, having, doing, and interacting [1]. Two basic human needs [2] are to –Meet new people  Social needs. –Strengthen existing relationships  Social needs. Social strategies are used by people to meet social needs. –Human needs are constant across historical time periods. –However, the strategies by which these needs are satisfied change over time [1,3]. Barabasi and Dunbar [3] : –“Women are more focused on opposite-sex relationships than men during the reproductively active period of their lives.” … “As women age, their attention ships from their spouse to younger females---their daughters.” –“Human social strategies have more complex dynamics than previously assumed.” M.J. Piskorski. Social strategies that work. Harvard Business Review. Nov V. Palchykov, K. Kaski, J. Kertesz, A.-L. Barabasi, R. I. M. Dunbar. Sex differences in intimate relationships. Scientific Reports 2012.

7 Social Strategy We study demographic-based social strategy with respect to the micro-level network structures. –Ego network –Social tie –Social triad Male Female

8 Social Strategy: Ego Network Correlations between user demographics and network properties.

9 Social Strategy: Ego Network Correlations between user demographics and network properties. Social Strategies: Young people are active in broadening their social circles, while seniors have the tendency to maintain small but close connections. 2 times

10 Social Strategy: Ego Network In your mobile phone contact list, do you have more female or male friends? In your mobile phone contact list, do you have more female or male friends?

11 Social Strategy: Ego Network X: age of central user. Y: age of friends. Positive Y: female friends; Negative Y: male friends; Spectrum: distribution Social Strategies: People tend to communicate with others of both similar gender and age, i.e., demographic homophily. Female friends’ age Male friends’ age

12 Social Strategy: Social Tie “Social networks based on dyadic relationships are fundamentally important for understanding of human sociality.” [1] Social tie strength is defined by the frequency of communications (calls, messages) [2]. How frequently do you call your mother vs. your significant other? How frequently do you call your mother vs. your significant other?

13 Social Strategy: Social Tie X: age of one user. Y: age of the other user. Spectrum: #calls per month (a), (b), (c) are symmetric.

14 Social Strategy: Social Tie Social Strategies: Frequent cross-generation interactions are maintained to bridge age gaps. X: age of one user. Y: age of the other user. Spectrum: #calls per month (a), (b), (c) are symmetric. N PQ M,N,P,Q: 10~15 calls per month are made between parents and children. M

15 Social Strategy: Social Tie Social Strategies: Young male maintain more frequent and broader social connections than young females. E F E vs. F: E: Male: ±5 years old interactions F: Female: only same-age interactions. “Brother” phenomenon X: age of one user. Y: age of the other user. Spectrum: #calls per month (a), (b), (c) are symmetric.

16 Social Strategy: Social Tie Social Strategies: Opposite-gender interactions are much more frequent than those between young same-gender users. E F G E,F vs. G: G: f-m: >30 calls per months E/F: m-m or f-f: 10~15 calls X: age of one user. Y: age of the other user. Spectrum: #calls per month (a), (b), (c) are symmetric.

17 Social Strategy: Social Tie Social Strategies: When people become mature, reversely, same-gender interactions are more frequent than those between opposite-gender users. H I J H,I vs. J: X: age of one user. Y: age of the other user. Spectrum: #calls per month (a), (b), (c) are symmetric.

18 Social Strategy: Social Triad Social triad is one of the simplest grouping of individuals that can be studied and is mostly investigated by microsociology [1]. 1.D. Easley, J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge U. Press How do people maintain their social triadic relationships across their lifetime?

19 Social Strategy: Social Triad X: minimum age of 3 users. Y: maximum age of 3 users. Spectrum: distribution

20 Social Strategy: Social Triad Social Strategies: People expand both same-gender and opposite- gender social groups during the dating and reproductively active period. P MN Q M,N,P,Q: Intense red areas X: minimum age of 3 users. Y: maximum age of 3 users. Spectrum: distribution

21 Social Strategy: Social Triad E,H vs. F,G: #same-gender triads are ~6 times more than #opposite-gender triads. Social Strategies: People’s attention to opposite-gender groups quickly disappears, and the insistence and social investment on same-gender social groups lasts for a lifetime. E F GH X: minimum age of 3 users. Y: maximum age of 3 users. Spectrum: distribution

22 Infer user demographics based on social strategies social strategies + mobile social network  user demographics

23 Problem: Demographic Prediction Gender or Age Classification –Infer users’ gender Y and age Z separately. –Model correlations between gender Y and attributes X; –Model correlations between age Z and attributes X; Input: G = (V L, V U, E, Y L ), X Input: G = (V L, V U, E, Y L ), X Output: f(G, X)  (Y U ) Output: f(G, X)  (Y U ) Input: G = (V L, V U, E, Z L ), X Input: G = (V L, V U, E, Z L ), X Output: f(G, X)  ( Z U ) Output: f(G, X)  ( Z U ) Miss the interrelation between Y and Z !

24 Problem: Demographic Prediction Double Dependent-Variable Classification –Infer users’ gender Y and age Z simultaneously. –Model correlations between gender Y and attributes X; –Model correlations between age Z and attributes X; –Model interrelations between Y and Z; Gender: –Male (55%) / Female (45%) Age: –Young (18-24) / Young-Adult (25-34) / Middle-Age (35-49) / Senior (>49) Input: G = (V L, V U, E, Y L, Z L ), X Input: G = (V L, V U, E, Y L, Z L ), X Output: f(G, X)  (Y U, Z U ) Output: f(G, X)  (Y U, Z U )

25 WhoAmI Method ---A double dependent-variable factor graph Attribute factor f() Dyadic factor g()Triadic factor h() Random variable Y: Gender Random variable Z: Age Joint Distribution: Code is available at: Modeling social strategies on social ego Modeling interrelations between gender and age Modeling social strategies on social tie Modeling social strategies on social triad

26 WhoAmI: Model Initialization Attribute factor: Joint Distribution: Dyadic factor: Triadic factor: Interrelations between gender Y & age Z Code is available at:

27 WhoAmI: Objective Function Objective function: Model learning: gradient descent Circles?  LBP [1] 1.K. P. Murphy, Y. Weiss, M. I. Jordan. Loopy Belief Propagation for Approximate Inference: An Empirical Study. UAI’99. Code is available at:

28 Experiment Data: active users (#contacts >=5 in two months) >1.09 million users in CALL >304 thousand users in SMS 50% as training data 50% as test data Data: active users (#contacts >=5 in two months) >1.09 million users in CALL >304 thousand users in SMS 50% as training data 50% as test data

29 Experiment Baselines: LRC: Logistic Regression SVM: Support Vector Machine NB: Naïve Bayes RF: Random Forest BAG: Bagged Decision Tree RBF: Gaussian Radial Basis Function Neural Network FGM: Factor Graph Model DFG: WhoAmI: Double Dependent-Variable Factor Graph Baselines: LRC: Logistic Regression SVM: Support Vector Machine NB: Naïve Bayes RF: Random Forest BAG: Bagged Decision Tree RBF: Gaussian Radial Basis Function Neural Network FGM: Factor Graph Model DFG: WhoAmI: Double Dependent-Variable Factor Graph

30 Experiment Evaluation Metrics: Weighted Precision Weighted Recall Weighted F1 Measure Accuracy Evaluation Metrics: Weighted Precision Weighted Recall Weighted F1 Measure Accuracy

31 Experiment The proposed WhoAmI (DFG) outperforms baselines by up to 10% in terms of F1. We can infer 80% of the users’ GENDER in the CALL network correctly. The CALL behaviors reveal more users’ GENDER information than SMS. We can infer 73% of the users’ AGE in the SMS network correctly. The SMS behaviors reveal more users’ AGE information than CALL.

32 Experiment: Results DFG-d: stands for ignoring the interrelations between gender and age. DFG-df: stands for further ignoring tie features. DFG-dc: stands for further ignoring triad features. DFG-dcf: stands for further ignoring tie and triad features. The positive effects of interrelations between gender and age. Social Triad features are more powerful for inferring users’ gender. Social Tie features are more powerful for inferring users’ age.

33 Conclusion Unveil the demographic-based social strategies used by people to meet their social needs: Propose WhoAmI, a Double Dependent-Variable Factor Graph, for inferring users’ genders and ages simultaneously. Demonstrate the proposed WhoAmI method in a large-scale mobile social network. female More stable male Fewer friends Younger Older

34 Acknowledgements Army Research Laboratory U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA) National High-Tech R&D Program Natural Science Foundation of China National Basic Research Program of China

35 Inferring User Demographics and Social Strategies in Mobile Social Networks Yuxiao Dong #, Yang Yang +, Jie Tang +, Yang Yang #, Nitesh V. Chawla # # University of Notre Dame + Tsinghua University Code is available at: Thank You!

36 Big Network Data 1.26 billion users 700 billion minutes/month 1.26 billion users 700 billion minutes/month 280 million users 80% of users are 80-90’s 280 million users 80% of users are 80-90’s 560 million users influencing our daily life 560 million users influencing our daily life 800 million users ~50% revenue from network life 800 million users ~50% revenue from network life 555 million users.5 billion tweets/day 555 million users.5 billion tweets/day 79 million users per month 9.65 billion items/year 79 million users per month 9.65 billion items/year 500 million users 35 billion on 11/ million users 35 billion on 11/11

37 Big Network Data 280 million users 80% of users are 80-90’s 280 million users 80% of users are 80-90’s 560 million users influencing our daily life 560 million users influencing our daily life 800 million users ~50% revenue from network life 800 million users ~50% revenue from network life 555 million users.5 billion tweets/day 555 million users.5 billion tweets/day 79 million users per month 9.65 billion items/year 79 million users per month 9.65 billion items/year 500 million users 35 billion on 11/ million users 35 billion on 11/ billion users 700 billion minutes/month 1.26 billion users 700 billion minutes/month $19 billion acquisition Feb. 2014

International Telecommunications Union (ITU) at 2013 Mobile World Congress. Big Mobile Network Data 1.26 billion users 700 billion minutes/month 1.26 billion users 700 billion minutes/month 280 million users 80% of users are 80-90’s 280 million users 80% of users are 80-90’s 560 million users influencing our daily life 560 million users influencing our daily life 800 million users ~50% revenue from network life 800 million users ~50% revenue from network life 555 million users.5 billion tweets/day 555 million users.5 billion tweets/day 79 million users per month 9.65 billion items/year 79 million users per month 9.65 billion items/year 500 million users 35 billion on 11/ million users 35 billion on 11/ billion mobile devices in 2014 [1] >100% of global population 7.3 billion mobile devices in 2014 [1] >100% of global population

39 Big Mobile Network Data In 2013, 97% of adults have a mobile phone in the US [1] –made 3 billion phone calls per day –sent 6 billion text messages per day This talk (15 mins): –21 million calls & 42 million messages On average, in one day each mobile user in the US [2] –makes, receives or avoids 22 phone calls –sends or receives text messages 23 times –checks her/his phone 110 times

40 Related work Previous work on mobile social networks mainly focuses on macro-level models [1,2]. –No Demographics. Reality Mining [3] –The friendship network of 100 specific users (student of faculty in MIT). –Demographics + Human interactions. The 2012 Nokia Mobile Data Challenge [4] –Infer user demographics by using communication records of 200 users. 1.J.P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, A.-L. Barabasi. Structure and tie strengths in mobile communication networks. PNAS M. Seshadri, S. Machiraju, A. Sridharan, J. Bolot, C. Faloutsos, J. Leskovec. Mobile call graphs: Beyond power-law and lognormal distributions. KDD’

41 WhoAmI: Distributed Learning Slave Compute local gradient via random sampling Slave Compute local gradient via random sampling Master Global update Master Global update Graph Partition by Locations Master-Slave Computing Inevitable loss of correlation factors! 1.Jie Tang, Sen Wu, Jimeng Sun. Confluence: Conformity influence in large social networks. KDD’13.

42 Experiment: Features Given one node v and its ego network: –Individual feature: Individual attribute: degree, neighbor connectivity, clustering coefficient, embeddedness and weighted degree. –Friend feature: Friend attribute: # of connections to female/male, young/young-adult/middle-age/senior friends (from labeled friends). Dyadic factor: both labeled and unlabeled friends for social tie structures in v’s ego network. –Circle feature: Circle attribute: # of demographic triads, i.e., v-FF, v-FM, v-MM; v-AA, v-AB, v-AC, v-AD, v-BB, v-BC, v-BD, v-CC, v-CD, v-DD. (A/B/C/C denote the young/young-adult/middle-age/senior) Triadic factor: both labeled and unlabeled friends for social triad structures in v’s ego network. LCR/SVM/NB/RF/Bag/RBF: –Individual/Friend/Circle Attributes FGM/DFG –Individual/Friend/Circle Attributes –Structure feature: Dyadic factors –Structure feature: Triadic factors ? ? ? ? ?

43 Experiment: Results Performance of demographic prediction with different percentage of labeled data GenderAge

44 Social Strategy: Ego Network same-generation friends same-generation friends Social Strategies: The young put increasing focus on the same generation, but decrease it after entering middle-age.

45 Social Strategy: Ego Network same-generation friends same-generation friends older-generation friends older-generation friends Social Strategies: The young put decreasing focus on the older generation across their lifespans.

46 Social Strategy: Ego Network same-generation friends same-generation friends older-generation friends older-generation friends younger-generation friends younger-generation friends Social Strategies: The middle-age people devote more attention on the younger generation even along with the sacrifice of homophily.