Joint Label Inference in Networks

Presentation transcript:

Joint Label Inference in Networks. Deepayan Chakrabarti (deepay@utexas.edu), Stanislav Funiak (sfuniak@cerebras.net), Jonathan Chang (slycoder@gmail.com), Sofus A. Macskassy (sofmac@gmail.com)

Profile Inference. Example profile: Hometown: Palo Alto; High School: Gunn; College: Stanford; Employer: Facebook; Current city: Sunnyvale; Hobbies, Politics, Music, … A complete profile is a boon: people are easily searchable; tailored news recommendations; group recommendations; ad targeting (especially local). How can we fill in missing profile fields?

Previous Work: Label Propagation [Zhu+/02]. "Propagate" labels through the network: aggregate the hometowns of friends, iterate until convergence, then repeat for current city, college, and all other label types. [Figure: user u with unknown hometown (H = ?) and friends v1 to v5 labeled H = Palo Alto, H = Palo Alto, H = ?, H = MPK, H = Atlanta; the aggregated estimate for u is H = Palo Alto (0.5), MPK (0.25), Atlanta (0.25).]
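The aggregation step on this slide can be sketched in a few lines. This is a minimal illustration, not the formulation in [Zhu+/02]: the function name `propagate_labels`, the synchronous update, the fixed number of rounds, and the assignment of labels to specific friends v1 to v5 are all assumptions made for the example.

```python
from collections import defaultdict


def propagate_labels(friends, known, rounds=10):
    """Return {node: {label: probability}} after synchronous neighbor averaging."""
    # Labeled nodes start (and stay) as one-hot distributions.
    beliefs = {node: {label: 1.0} for node, label in known.items()}
    for _ in range(rounds):
        updated = dict(beliefs)
        for node, neighbors in friends.items():
            if node in known:
                continue  # never overwrite an observed label
            totals = defaultdict(float)
            for neighbor in neighbors:
                # Neighbors without any belief yet contribute nothing.
                for label, weight in beliefs.get(neighbor, {}).items():
                    totals[label] += weight
            mass = sum(totals.values())
            if mass > 0:
                updated[node] = {label: w / mass for label, w in totals.items()}
        beliefs = updated
    return beliefs


# Hypothetical friendship graph matching the slide's counts:
# two Palo Alto friends, one MPK, one Atlanta, one unlabeled.
friends = {
    "u": ["v1", "v2", "v3", "v4", "v5"],
    "v1": ["u"], "v2": ["u"], "v3": ["u"], "v4": ["u"], "v5": ["u"],
}
known = {"v1": "Palo Alto", "v2": "Palo Alto", "v4": "MPK", "v5": "Atlanta"}

result = propagate_labels(friends, known)
print(result["u"])
```

For this graph the estimate for u settles at Palo Alto 0.5, MPK 0.25, Atlanta 0.25, the distribution shown on the slide. The same routine would be run once per label type, which is why interactions between label types are invisible to it.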

Previous Work. Random walks [Talukdar+/09, Baluja+/08]; Statistical Relational Learning [Lu+/03, Macskassy+/07]; Relational Dependency Networks [Neville+/07]; Latent models [Palla+/12]. These approaches are either too generic, require too much labeled data, do not handle multiple label types, or are outperformed by label propagation [Macskassy+/07].

Problem: interactions between label types are not considered. [Figure: user u with H = ? and CC = ?, surrounded by friends labeled H = Kolkata, CC = Bangalore, or CC = Austin.]

The EdgeExplain Model. Explain friendships using shared labels. A friendship between two people is explained if they share the same hometown OR current city OR high school OR college OR employer.

The EdgeExplain Model. We set H and CC so as to jointly explain all friendships. [Figure: user u with H = ? and CC = ?; hometown friends carry the labels H = Kolkata and CC = Bangalore; current-city friends carry CC = Austin; the inferred profile for u is H = Kolkata, CC = Austin.]

The EdgeExplain Model. Each person has a latent profile. In practice, only some fields of some profiles are known. Fill in missing profile entries to … [objective shown as an equation on the slide; not captured in the transcript]

The EdgeExplain Model. Diminishing returns from sharing many profile fields. [Figure: objective function versus #shared profile fields; a parameter controls the steepness of the curve.]

The EdgeExplain Model (example, step 1). User u starts with H = ? and CC = ?. Hometown friends carry H = Kolkata and CC = Bangalore; a current-city friend carries CC = Austin. [Figure: objective function versus #shared profile fields, one curve per friendship.]

The EdgeExplain Model (example, step 2). Setting H = Kolkata for u (CC = ? still unknown) gives the hometown friendship 1 shared profile field. [Figure: objective function versus #shared profile fields, with the hometown friendship at 1.]

The EdgeExplain Model (example, step 3a). With H = Kolkata, choosing CC = Bangalore moves the hometown friendship from 1 to 2 shared profile fields: a small gain, because of diminishing returns. [Figure: objective function versus #shared profile fields, with the hometown friendship at 2.]

The EdgeExplain Model (example, step 3b). With H = Kolkata, choosing CC = Austin instead leaves the hometown friendship at 1 shared profile field and raises the current-city friendship to 1: a larger gain. [Figure: objective function versus #shared profile fields, with both friendships at 1.]
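The comparison between CC = Bangalore and CC = Austin can be reproduced numerically. The sketch below assumes a log-sigmoid score on the number of shared fields, which is one concave curve with the diminishing-returns shape drawn on the slides; the exact functional form, the steepness `alpha`, the `offset`, and the two-friend setup are assumptions for illustration, not values taken from the talk.

```python
import math

FIELDS = ("hometown", "current_city")


def shared_fields(a, b):
    """Count profile fields on which two profiles agree (unknown never matches)."""
    return sum(1 for f in FIELDS if a.get(f) is not None and a.get(f) == b.get(f))


def edge_score(a, b, alpha=2.0, offset=-2.0):
    """Log-sigmoid of the shared-field count: concave, so extra matches add less."""
    z = alpha * shared_fields(a, b) + offset
    return -math.log1p(math.exp(-z))  # equals log(sigmoid(z))


def objective(profile, friend_profiles):
    """Sum of edge scores over all friendships of one user."""
    return sum(edge_score(profile, friend) for friend in friend_profiles)


# Hypothetical friends: one hometown friend, one current-city friend.
hometown_friend = {"hometown": "Kolkata", "current_city": "Bangalore"}
city_friend = {"hometown": None, "current_city": "Austin"}
friend_profiles = [hometown_friend, city_friend]

base = objective({"hometown": "Kolkata", "current_city": None}, friend_profiles)
gain_bangalore = objective(
    {"hometown": "Kolkata", "current_city": "Bangalore"}, friend_profiles) - base
gain_austin = objective(
    {"hometown": "Kolkata", "current_city": "Austin"}, friend_profiles) - base

print(f"gain with CC = Bangalore: {gain_bangalore:.3f}")
print(f"gain with CC = Austin:    {gain_austin:.3f}")
```

Under these assumptions the Austin gain is the larger of the two: Bangalore only adds a second reason to a friendship that is already explained, while Austin explains a friendship that had no reason at all.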

The EdgeExplain Model. This problem is combinatorial and difficult to solve. Two approaches: relaxation labeling (solve a relaxed version with probabilistic profiles) and variational inference (maximize a lower bound on the objective). Relaxation labeling works better in general. A full comparison, and a hybrid method, are in the journal paper. We will show results with relaxation labeling.
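To make "a relaxed version with probabilistic profiles" concrete, here is a small sketch in that spirit. It is not the authors' update rule: the log-sigmoid edge score, the use of expected shared-field counts, the softmax re-normalization, the parameters `alpha` and `offset`, and the friends' hometowns Chennai and Mumbai are all assumptions chosen to keep the example short.

```python
import math


def log_sigmoid(z):
    return -math.log1p(math.exp(-z))


def match_prob(p, q):
    """Probability that two independent label distributions agree."""
    return sum(prob * q.get(label, 0.0) for label, prob in p.items())


def relaxation_labeling(friends, known, types, alpha=2.0, offset=-2.0, rounds=10):
    """Return beliefs[node][type] = {label: probability}; known fields stay one-hot."""
    beliefs = {
        n: {t: ({known[n][t]: 1.0} if t in known.get(n, {}) else {}) for t in types}
        for n in friends
    }
    for _ in range(rounds):
        for u in friends:
            for t in types:
                if t in known.get(u, {}):
                    continue  # observed fields are never re-estimated
                candidates = {lab for v in friends[u] for lab in beliefs[v][t]}
                if not candidates:
                    continue
                scores = {}
                for lab in candidates:
                    total = 0.0
                    for v in friends[u]:
                        # Expected number of fields u and v share if u takes `lab`.
                        expected = beliefs[v][t].get(lab, 0.0)
                        for other in types:
                            if other != t:
                                expected += match_prob(beliefs[u][other], beliefs[v][other])
                        total += log_sigmoid(alpha * expected + offset)
                    scores[lab] = total
                top = max(scores.values())
                weights = {lab: math.exp(s - top) for lab, s in scores.items()}
                norm = sum(weights.values())
                beliefs[u][t] = {lab: w / norm for lab, w in weights.items()}
    return beliefs


types = ("hometown", "current_city")
friends = {"u": ["a1", "a2", "a3", "b1", "b2"],
           "a1": ["u"], "a2": ["u"], "a3": ["u"], "b1": ["u"], "b2": ["u"]}
known = {
    "a1": {"hometown": "Kolkata", "current_city": "Bangalore"},
    "a2": {"hometown": "Kolkata", "current_city": "Bangalore"},
    "a3": {"hometown": "Kolkata", "current_city": "Bangalore"},
    "b1": {"hometown": "Chennai", "current_city": "Austin"},  # hypothetical hometown
    "b2": {"hometown": "Mumbai", "current_city": "Austin"},   # hypothetical hometown
}

beliefs = relaxation_labeling(friends, known, types)
best = {t: max(beliefs["u"][t], key=beliefs["u"][t].get) for t in types}
print(best)
```

In this toy graph three of u's five friends live in Bangalore, so a per-type neighbor vote would pick Bangalore as the current city. The joint score prefers Austin, because the Kolkata friendships are already explained by the hometown.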

Experiments. 1.1B users of the Facebook social network; O(10M) labels; only public friendships and profiles. Sparsify the network: for each person, keep links to the top K closest friends by age. Measure recall: did we get the correct label in our top prediction? In the top 3? 5-fold cross-validation.
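The two evaluation ingredients on this slide, sparsification and recall@k, are easy to sketch. The code below reads "closest friends by age" as the friends with the smallest age difference, which is an interpretation; the function names and the toy data are hypothetical.

```python
def top_k_closest_by_age(friends, age, k):
    """Keep, for each person, the k friends with the smallest age difference."""
    return {
        u: sorted(neighbors, key=lambda v: abs(age[v] - age[u]))[:k]
        for u, neighbors in friends.items()
    }


def recall_at_k(ranked_predictions, truth, k):
    """Fraction of held-out labels that appear among the top-k predictions."""
    hits = sum(
        1 for user, label in truth.items()
        if label in ranked_predictions.get(user, [])[:k]
    )
    return hits / len(truth)


# Toy data (hypothetical).
friends = {"u": ["a", "b", "c"]}
age = {"u": 30, "a": 31, "b": 45, "c": 29}
sparse = top_k_closest_by_age(friends, age, k=2)

ranked_predictions = {"x": ["Austin", "Bangalore"], "y": ["Kolkata", "Delhi"]}
truth = {"x": "Bangalore", "y": "Kolkata"}
recall_1 = recall_at_k(ranked_predictions, truth, k=1)
recall_2 = recall_at_k(ranked_predictions, truth, k=2)
print(sparse, recall_1, recall_2)
```

In a cross-validation setup, `truth` would hold the labels hidden in the current fold and `ranked_predictions` the model's labels sorted by inferred probability.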

Results (versus Label Propagation). Joint modeling helps most for employer. Significant gains for high school and college as well. [Figure: lift of EdgeExplain over Label Propagation for hometown, current city, high school, college, and employer; panels: Recall@1 and Recall@3.]

Results (varying closest friends K). K=100 or K=200 closest friends is best. K=400 hurts; these friendships are probably due to other factors. [Figure: lift of EdgeExplain over K=20 for hometown, current city, high school, college, and employer; panels: Recall@1 and Recall@3.]

Conclusions. Assumption: each friendship needs only one reason. Model: explain friendships via shared user profile attributes. Results: up to 120% lift for recall@1 and 60% for recall@3. Extension to Twitter [C., Annals of Applied Stats., 2017]: each "follow" link has one reason; the follower is interested in multiple topics; the person being followed is perceived as an expert on one topic.

Result (effect of α). High α is best, so one reason per friendship is enough. [Figure: lift of EdgeExplain over α=0.1 for hometown, current city, high school, college, and employer.]

Profile Inference. Use the social network and the assumption of homophily: friendships form between "similar" people. Infer missing labels to maximize similarity between friends. [Figure: user u and friends v1 to v5 with hometown (H) and employer (E) labels: H = Palo Alto, E = Microsoft; H = ?, E = ?; H = Palo Alto, E = ?; H = ?, E = ?; H = MPK, E = FB; H = Atlanta, E = Google.]