
Joint Label Inference in Networks


1 Joint Label Inference in Networks
Deepayan Chakrabarti, Stanislav Funiak, Jonathan Chang, Sofus A. Macskassy

2 Profile Inference
Profile: Hometown: Palo Alto · High School: Gunn · College: Stanford · Employer: Facebook · Current city: Sunnyvale · Hobbies, Politics, Music, …
A complete profile is a boon:
- People are easily searchable
- Tailored news recommendations
- Group recommendations
- Ad targeting (especially local)
How can we fill in missing profile fields?

3 Previous Work: Label Propagation [Zhu+/02]
"Propagate" labels through the network:
- Aggregate the hometowns of friends
- Iterate until convergence
- Repeat for current city, college, and all other label types
[Figure: node u with H = ?, whose friends have H = Palo Alto, MPK, and Atlanta; aggregation gives H = Palo Alto (0.5), MPK (0.25), Atlanta (0.25), so u is labeled H = Palo Alto]
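The aggregate-and-iterate loop above can be sketched in a few lines. This is a minimal illustration, not the cited implementation; the function and variable names are my own, and seeds are kept clamped between iterations.

```python
from collections import Counter

def label_propagation(friends, labels, iters=10):
    """Propagate one label type (e.g. hometown) over a friendship graph.
    friends: dict node -> list of neighbors; labels: dict node -> known label.
    Returns dict node -> Counter mapping label -> probability."""
    # Seed nodes start as point masses on their known label.
    dist = {u: Counter({lab: 1.0}) for u, lab in labels.items()}
    for _ in range(iters):
        new = {}
        for u, nbrs in friends.items():
            if u in labels:                      # seeds stay clamped
                new[u] = Counter({labels[u]: 1.0})
                continue
            agg = Counter()
            for v in nbrs:                       # aggregate neighbors' beliefs
                for lab, w in dist.get(v, Counter()).items():
                    agg[lab] += w
            total = sum(agg.values())
            new[u] = Counter({lab: w / total for lab, w in agg.items()}) if total else Counter()
        dist = new
    return dist
```

On the slide's example, a node whose three friends report Palo Alto, Palo Alto, and MPK ends up with a distribution of 2/3 vs. 1/3 and is labeled Palo Alto. The same routine is rerun per label type, which is exactly the independence that the later slides criticize.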

4 Previous Work
- Random walks [Talukdar+/09, Baluja+/08]
- Statistical Relational Learning [Lu+/03, Macskassy+/07]
- Relational Dependency Networks [Neville+/07]
- Latent models [Palla+/12]
These are either too generic, require too much labeled data, do not handle multiple label types, or are outperformed by label propagation [Macskassy+/07].

5 Problem
[Figure: node u with H = ? and CC = ?; friends with H = Kolkata, with H = Kolkata / CC = Bangalore, and with CC = Austin]
Interactions between label types are not considered.

6 The EdgeExplain Model
Explain friendships using shared labels: a friendship between two people is explained if they share the same hometown OR current city OR high school OR college OR employer.
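The "explained" predicate is just a disjunction over the five field types. A minimal sketch (field names here are illustrative, not the paper's schema):

```python
# The five profile field types from the slide; names are illustrative.
FIELDS = ("hometown", "current_city", "high_school", "college", "employer")

def is_explained(profile_a, profile_b):
    """A friendship is explained if the two profiles share a value
    for at least one field (the OR over field types)."""
    return any(
        profile_a.get(f) is not None and profile_a.get(f) == profile_b.get(f)
        for f in FIELDS
    )
```

Note the `is not None` guard: two missing fields do not count as a match, which matters since most fields are unobserved.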

7 The EdgeExplain Model
[Figure: node u with H = ? and CC = ?; hometown friends with H = Kolkata, current-city friends with CC = Austin, and one friend with H = Kolkata / CC = Bangalore]
We set H and CC so as to jointly explain all friendships.

8 The EdgeExplain Model
A latent profile for each person. In practice, only some fields of some profiles are known. Fill in the missing profile entries so as to best explain the observed friendships.

9 The EdgeExplain Model
Diminishing returns from sharing many profile fields: the objective is an increasing, saturating function of the #shared profile fields, and α controls its steepness.
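The slide does not show the formula itself; one plausible form consistent with "diminishing returns" and a steepness parameter α is a log-sigmoid of the shared-field count (a sketch; the paper's exact objective may differ):

```latex
\max_{\text{hidden labels}} \; \sum_{(u,v) \in E} \log \sigma\!\big(\alpha \, s_{uv}\big),
\qquad \sigma(x) = \frac{1}{1 + e^{-x}},
```

where $s_{uv}$ is the number of profile fields shared by friends $u$ and $v$. Because $\sigma$ saturates, raising $s_{uv}$ from 0 to 1 gains far more than raising it from 1 to 2; a large $\alpha$ makes a single shared field nearly sufficient to explain an edge, matching the later finding that high α is best.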

10 The EdgeExplain Model
[Figure: node u with H = ? and CC = ?; friends with H = Kolkata, with CC = Austin, and with H = Kolkata / CC = Bangalore; the objective is a saturating function of the #shared profile fields on each friendship]

11 The EdgeExplain Model
[Figure: u sets H = Kolkata; each hometown friendship now shares 1 field and is explained]

12 The EdgeExplain Model
[Figure: u also tries CC = Bangalore; the Bangalore friendship now shares 2 fields, but diminishing returns make this a small gain]
Small gain with CC = Bangalore.

13 The EdgeExplain Model
[Figure: u tries CC = Austin instead; the previously unexplained Austin friendships now share 1 field each, a larger gain]
Larger gain with CC = Austin.

14 The EdgeExplain Model
This problem is combinatorial and difficult to solve.
- Relaxation labeling: solve a relaxed version with probabilistic profiles
- Variational inference: maximize a lower bound on the objective
Relaxation labeling works better in general (full comparison, and a hybrid method, in the journal paper). We will show results with relaxation labeling.
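The flavor of a relaxation-labeling update can be sketched as follows: score each candidate value for one hidden field by the soft-OR edge objective it would yield, then normalize the scores into a probability distribution. This is a simplified illustration under the log-sigmoid objective assumed earlier, not the paper's exact update; all names are my own.

```python
import math

def sigma(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def update_field(u, field, candidates, profiles, friends, alpha=2.0):
    """Relaxation-labeling-style update for one hidden field of node u
    (a sketch). Returns a dict candidate label -> probability."""
    scores = {}
    for lab in candidates:
        trial = dict(profiles[u])
        trial[field] = lab                      # tentatively assign the label
        s = 0.0
        for v in friends[u]:                    # soft-OR objective over u's edges
            shared = sum(1 for f, val in trial.items()
                         if val is not None and profiles[v].get(f) == val)
            s += math.log(sigma(alpha * shared))
        scores[lab] = s
    # Softmax over scores (shifted by the max for numerical stability).
    m = max(scores.values())
    exps = {lab: math.exp(s - m) for lab, s in scores.items()}
    z = sum(exps.values())
    return {lab: e / z for lab, e in exps.items()}
```

On the slides' example (hometown already set to Kolkata, one friend with CC = Bangalore who is already explained, one friend with CC = Austin who is not), this update prefers CC = Austin, mirroring the "larger gain" step.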

15 Experiments
- 1.1B users of the Facebook social network, O(10M) labels
- Only public friendships and profiles
- Sparsify the network: for each person, keep links to the top K closest friends by age
- Measure recall: did we get the correct label in our top prediction? In the top 3?
- 5-fold cross-validation
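The recall metric described above is standard recall@k; a minimal sketch (function and argument names are illustrative):

```python
def recall_at_k(predictions, truth, k=1):
    """Fraction of held-out nodes whose true label appears among the
    top-k predicted labels. predictions[u] is a list sorted best-first;
    truth[u] is the held-out correct label."""
    if not truth:
        return 0.0
    hits = sum(1 for u, lab in truth.items()
               if lab in predictions.get(u, [])[:k])
    return hits / len(truth)
```

With k=1 this is the "correct label in our top prediction" number; with k=3 it is the top-3 recall the slide mentions.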

16 Results (versus Label Propagation)
Joint modeling helps most for employer, with significant gains for high school and college as well.
[Figure: lift of EdgeExplain over Label Propagation for hometown, current city, high school, college, and employer]

17 Results (varying closest friends K)
K = 100 or K = 200 closest friends is best. K = 400 hurts; these extra friendships are probably due to other factors.
[Figure: lift of EdgeExplain over K = 20 for each label type]

18 Conclusions
Assumption: each friendship needs only one reason.
Model: explain friendships via shared user profile attributes.
Results: lifts of up to 120% and 60% over the baseline.
Extension to Twitter [C., Annals of Applied Stats., 2017]: each "follow" link has one reason. The follower is interested in multiple topics; the person being followed is perceived as an expert on one topic.

19 Results (effect of α)
High α is best → one reason per friendship is enough.
[Figure: lift of EdgeExplain over α = 0.1 for college, employer, hometown, current city, and high school]

20 Profile Inference
Use the social network and the assumption of homophily: friendships form between "similar" people. Infer missing labels to maximize similarity between friends.
[Figure: node u with H = ? and E = ?; friends with H = Palo Alto / E = Microsoft, with H = MPK / E = FB, and with H = Atlanta / E = Google]

