Download presentation
Presentation is loading. Please wait.
Published byAubrie Barraclough Modified over 9 years ago
1
CO-AUTHOR RELATIONSHIP PREDICTION IN HETEROGENEOUS BIBLIOGRAPHIC NETWORKS Yizhou Sun, Rick Barber, Manish Gupta, Charu C. Aggarwal, Jiawei Han 1
2
Content Background and motivation Problem definition PathPredict: meta path-based relationship prediction model Meta path-based topological features The supervised learning framework and model Experiments Conclusions 2
3
Background Homogeneous networks One type of objects One type of links Examples: Friendship network in Facebook Link prediction in homogeneous networks Predict whether a link between two objects will appear in the future, according to: Topological feature of the network Attribute feature of the objects (usually cannot be fully obtained) 3
4
Motivation In reality, heterogeneous networks are ubiquitous Multiple types of objects Multiple types of links Examples: bibliographic network movie network From link prediction to relationship prediction A relationship between two objects could be a composition of two or more links E.g., two authors have a co-author relationship if and only if they have co- written a paper Need to re-design topological features in heterogeneous information network Our goal: Study the topological features in heterogeneous networks in predicting the co-author relationship building 4
5
Content Background and motivation Problem definition PathPredict: meta path-based relationship prediction model Meta path-based topological features The supervised learning framework and model Experiments Conclusions 5
6
Heterogeneous Information Networks 6
7
The DBLP Bibliographic Network The underneath meta structure is the network schema: 7
8
Co-author Relationship Prediction 8
9
Content Background and motivation Problem definition PathPredict: meta path-based relationship prediction model Meta path-based topological features The supervised learning framework and model Experiments Conclusions 9
10
The PathPredict Model PathPredict: meta path-based relationship prediction model Propose meta path-based topological features in HIN Topological features used in homogeneous networks cannot be directly used Using logistic regression-based supervised learning methods to learn the coefficients associated with each feature 10
11
Content Background and motivation Problem definition PathPredict: meta path-based relationship prediction model Meta path-based topological features The supervised learning framework and model Experiments Conclusions 11
12
Existing Topological Measure in Homogeneous Networks 12
13
Meta Path-based Topological Features 13
14
Meta Paths for Co-authorship Prediction in DBLP 14 List of all the meta paths between authors under length 4
15
Four Meta Path-based Measures 15
16
Example A-P-V-P-A meta path PC(J,M) = 7 NPC(J,M) = (7+7)/(7+9) RW(J,M) = ½ SRW(J,M) = ½ + 1/16 16
17
Content Background and motivation Problem definition PathPredict: meta path-based relationship prediction model Meta path-based topological features The supervised learning framework and model Experiments Conclusions 17
18
Supervised Learning Framework Training: T0-T1 time framework T0: feature collection (x) T1: label of relationship collection (y) Testing: T0’-T1’ time framework, which may have a shift of time compared with training stage 18
19
Prediction Model 19
20
Content Background and motivation Problem definition PathPredict: meta path-based relationship prediction model Meta path-based topological features The supervised learning framework and model Experiments Conclusions 20
21
Experiment Setting 21
22
Homogeneous measures vs. heterogeneous measures Homogeneous measures: Only consider co-author sub-network : common neighbor; rooted PageRank Consider the whole network and mix all types together: total path count Heterogeneous measure: Heterogeneous path count Heterogeneous Path Count produces the best accuracy! 22
23
Over Four Datasets 23
24
Compare among Different Heterogeneous Measures Normalized path count is slightly better and the hybrid measure that combines all measures is the best 24
25
Over four datasets 25
26
Impacts of Collaboration Frequency on Different Measures Symmetric random walk is better for predicting frequent co-author relationship 26
27
Model Generalization Over Time Conclusion: Historical training can help predict future relationship building 27
28
Learned Significance for Each Topological Feature The co-attending venues and the shared co-authors are very critical in determining two authors’ future collaboration 28
29
29 Case Studies for Queries
30
Content Background and motivation Problem definition PathPredict: meta path-based relationship prediction model Meta path-based topological features The supervised learning framework and model Experiments Conclusions 30
31
Conclusions Problem: Extend link prediction problem in homogeneous networks into relationship prediction in HIN, using co-authorship prediction as a case study Solution Propose meta path-based topological features and measures in HIN Using logistic regression-based supervised learning methods to learn the coefficients associated with each feature Results Hetero. measures beats homo. measures Hybrid measure beats single measures 31
32
Thank you! 32 Q & A
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.