Published by Rafe Sherman. Modified over 8 years ago.
1
Paper Presentation: Social Influence Based Clustering of Heterogeneous Information Networks. Qiwei Bao & Siqi Huang
2
What's it all about? There is growing interest in clustering a social network of people based on both their social relationships and their participation in information networks. This paper uses the concept of social influence to improve clustering quality. Social influence describes how the impact of people's activities and opinions propagates to other members of a social network via direct and indirect social connections.
3
Keywords: Graph Clustering, Heterogeneous Network, Kernels, Social Influence
4
Today's Presentation. Part One: Definitions, Concepts, Kernels, Similarity Measurement. Part Two: Clustering Algorithm (SI-Clustering), Parameter-based Optimization, Experiments, Conclusions.
5
Problem Statement: TWO KINDS OF INFLUENCE. Model activities, events, and experiences as information networks in addition to the social relationships of people. Social influence can propagate through these networks in two ways: 1. Self-influence: people influence one another based solely on the social network; 2. Co-influence: people influence one another through their participation in activity/event networks.
6
Problem Statement: THREE TYPES OF GRAPHS/NETWORKS. Social Collaboration Network (Social Graph, SG): SG = (U, E). U: the set of vertices, i.e. members of the social network (e.g., authors, customers). E: the set of edges denoting the collaborative relationships between members. N_SG: the size of U.
7
Problem Statement: THREE TYPES OF GRAPHS/NETWORKS. Associated Activity Network (Activity Graph, AG_i): AG_i = (V_i, S_i). V_i: the activity vertices in the i-th associated activity network AG_i. S_i: weighted edges representing the similarity between two activity vertices. N_AG_i: the size of the activity vertex set V_i.
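The two graph definitions above can be held in plain adjacency structures. The following is only a minimal sketch: the class names, the author and conference labels, and the similarity weight are all hypothetical and do not come from the paper.

```python
# Hypothetical toy data: every name and weight here is illustrative only.
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """SG = (U, E): members and their collaborative relationships."""
    members: set = field(default_factory=set)  # U
    edges: set = field(default_factory=set)    # E, stored as unordered pairs

    def add_edge(self, u, v):
        self.members.update((u, v))
        self.edges.add(frozenset((u, v)))

    @property
    def size(self):
        return len(self.members)               # N_SG


@dataclass
class ActivityGraph:
    """AG_i = (V_i, S_i): activity vertices and weighted similarity edges."""
    vertices: set = field(default_factory=set)      # V_i
    similarity: dict = field(default_factory=dict)  # S_i: pair -> weight

    def add_similarity(self, a, b, weight):
        self.vertices.update((a, b))
        self.similarity[frozenset((a, b))] = weight

    @property
    def size(self):
        return len(self.vertices)              # N_AG_i


sg = SocialGraph()
sg.add_edge("author_a", "author_b")
sg.add_edge("author_b", "author_c")

ag = ActivityGraph()
ag.add_similarity("conf_x", "conf_y", 0.8)
```

Storing edges as unordered pairs reflects the assumption that both collaboration and activity similarity are symmetric relations.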
8
Problem Statement: THREE TYPES OF GRAPHS/NETWORKS. Influence Network (Influence Graph, IG_i)
9
Problem Statement: HETEROGENEOUS NETWORK. When both the self-influence and co-influence networks are considered together, the network as a whole is heterogeneous.
10
Problem Statement HETEROGENEOUS NETWORK
11
Problem Statement: Social Influence-based graph Clustering (SI-Cluster). Given: a social graph, multiple activity graphs, and the corresponding influence graphs. Problem: partition the member vertices U into K disjoint clusters U_i. A desired clustering result should achieve a good balance between two properties: (1) vertices within one cluster should have similar collaborative patterns among themselves and similar interaction patterns with the activity networks; (2) vertices in different clusters should have dissimilar collaborative patterns and dissimilar interaction patterns with activities.
12
Problem Statement: Social Influence-based graph Clustering (SI-Cluster). The clustering algorithm should be fast and should scale with the number of influence graphs and the size of the activity graphs.
13
Dataset: DBLP. It consists of two types of entities (authors and conferences) and three types of links (co-authorship, author-conference, and conference similarity).
14
Influence-based Similarity Step 1: Heat Diffusion on Social Graph
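The slide's formula did not survive extraction, so the following is only a generic illustration of heat diffusion on a graph, not the paper's kernel. It assumes the standard graph-Laplacian form f(t) = exp(-αtL) f(0) with L = D - A; the function names, the toy path graph, and the values of α and t are all hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def heat_kernel(adj, alpha, t, terms=30):
    """Approximate exp(-alpha * t * L), L = D - A, by a truncated Taylor series."""
    n = len(adj)
    # Graph Laplacian: degree on the diagonal, minus the adjacency weights.
    lap = [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
           for i in range(n)]
    step = [[-alpha * t * lap[i][j] for j in range(n)] for i in range(n)]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]
    for k in range(1, terms):
        # After this update, term holds step^k / k!.
        term = [[x / k for x in row] for row in mat_mul(term, step)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return kernel


def diffuse(kernel, heat0):
    """Heat distribution f(t) = kernel * f(0)."""
    return [sum(kernel[i][j] * heat0[j] for j in range(len(heat0)))
            for i in range(len(kernel))]


# Hypothetical path graph a - b - c with all the initial heat on a.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
kernel = heat_kernel(adjacency, alpha=0.5, t=1.0)
heat = diffuse(kernel, [1.0, 0.0, 0.0])
```

In this toy setting the total heat is conserved and the amount received falls off with distance from the source, which is the intuition behind using diffused heat as an influence score.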
15
Influence-based Similarity Step 2: Compute Self-influence Similarity
16
Influence-based Similarity: Co-influence Kernel on Influence Graph. Non-propagating heat diffusion kernel H_i for each influence graph IG_i (one hop).
17
Influence-based Similarity Co-influence Kernel on Influence Graph
18
Influence-based Similarity Step 3: Compute Propagating Co-influence Kernel on Influence Graph. Example: Philip S. Yu and his co-authors with more than 45 co-publications.
19
Influence-based Similarity Step 4: Partition Activities into Clusters. Example: Philip S. Yu and his co-authors with more than 45 co-publications.
20
Influence-based Similarity: Propagate Heat Distribution. Initialize the heat distribution f_ij(0) for each cluster c_ij in each influence graph IG_i.
21
Influence-based Similarity Step 5: Compute Influence Score Based on Co-influence Model
22
Influence-based Similarity Step 6: Compute Co-influence Similarity. Example: Philip S. Yu and his co-authors with more than 45 co-publications.
23
Influence-based Similarity Step 6: Compute Co-influence Similarity. Co-influence similarity matrix W_i for each influence graph IG_i. Step 7: Compute Unified Co-influence based Similarity.
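The slide does not show how the per-graph matrices are unified. A minimal sketch, assuming the unified similarity is a weighted sum of one self-influence matrix and the co-influence matrices W_i; the function name, the toy matrices, and the weights are hypothetical.

```python
def unified_similarity(matrices, weights):
    """Weighted element-wise sum of equally sized similarity matrices."""
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per similarity matrix")
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)]
            for i in range(n)]


# Hypothetical 2-member example: one self-influence matrix and one
# co-influence matrix (N = 1), hence N + 1 = 2 weights.
w_self = [[1.0, 0.2], [0.2, 1.0]]
w_co = [[1.0, 0.6], [0.6, 1.0]]
unified = unified_similarity([w_self, w_co], [1.5, 0.5])
```

Under this assumption, raising the weight of one matrix makes the corresponding network count for more when two members are compared.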
24
SI-Clustering Algorithm: What is it? Initialization: choose the most centrally located point in each cluster as its centroid and assign the remaining points to their closest centroids. Iteration: calculate the clustering objective, update the N + 1 weights, and repeat until the clustering converges.
25
SI-Clustering Algorithm Cont. Initialization
26
SI-Clustering Algorithm Cont. Vertex Assignment and Centroid Update. Update each centroid with the most centrally located vertex in its cluster.
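The assignment and centroid-update steps can be sketched in a k-medoids style over a similarity matrix. This is an illustration under stated assumptions rather than the paper's implementation: "most centrally located" is taken to mean the member with the highest total similarity to its cluster, and the function names and toy matrix are hypothetical.

```python
def assign_vertices(sim, centroids):
    """Assign every vertex to the centroid it is most similar to."""
    clusters = {c: [] for c in centroids}
    for v in range(len(sim)):
        # A centroid always stays in its own cluster.
        best = v if v in clusters else max(centroids, key=lambda c: sim[v][c])
        clusters[best].append(v)
    return clusters


def update_centroids(sim, clusters):
    """Per cluster, pick the member with the highest total similarity."""
    return [max(members, key=lambda v: sum(sim[v][u] for u in members))
            for members in clusters.values()]


def cluster(sim, centroids, max_iter=100):
    """Alternate assignment and centroid update until centroids stop changing."""
    clusters = assign_vertices(sim, centroids)
    for _ in range(max_iter):
        new_centroids = update_centroids(sim, clusters)
        if set(new_centroids) == set(centroids):
            break
        centroids = new_centroids
        clusters = assign_vertices(sim, centroids)
    return clusters


# Hypothetical similarity matrix with two obvious groups: {0, 1} and {2, 3}.
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
result = cluster(similarity, centroids=[1, 3])
```

The weight adjustment step is left out here; in the full algorithm the similarity matrix would be recomputed whenever the weights change.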
27
SI-Clustering Algorithm Cont. Clustering Objective Function
28
SI-Clustering Algorithm Cont. Clustering Objective Function Cont.
29
SI-Clustering Algorithm Cont. Clustering Objective Function Cont. Simplified into three steps: (1) cluster assignment, (2) centroid update, (3) weight adjustment. Steps (1) and (2) are common to all partitioning clustering algorithms.
30
SI-Clustering Algorithm Cont. Parameter-based Optimization
31
SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.
32
SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.
33
SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.
34
SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.
35
SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.
36
SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.
37
SI-Clustering Algorithm Cont. Adaptive Weight Adjustment & Clustering Algorithm. Solving this nonlinear parametric programming problem (NPPP) involves two parts: (1) find a parameter β with F(β) = 0, which makes the NPPP equivalent to the nonlinear fractional programming problem (NFPP); (2) given β, solve a polynomial programming problem in the original variables.
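The link between a fractional objective and a parametric one with F(β) = 0 can be illustrated with Dinkelbach's classical method, named here plainly as a stand-in: it is not the paper's solver. The sketch maximizes f(x)/g(x) over a small hypothetical candidate set by repeatedly solving the parametric problem F(β) = max f(x) - β·g(x); the functions f and g and the candidates are made up.

```python
def dinkelbach(candidates, f, g, tol=1e-12, max_iter=100):
    """Maximize f(x) / g(x) over a finite candidate set, assuming g(x) > 0."""
    x = candidates[0]
    beta = f(x) / g(x)
    for _ in range(max_iter):
        # Parametric subproblem: F(beta) = max_x f(x) - beta * g(x).
        x = max(candidates, key=lambda c: f(c) - beta * g(c))
        if f(x) - beta * g(x) <= tol:
            break  # F(beta) = 0, so beta is the optimal ratio
        beta = f(x) / g(x)
    return x, beta


# Hypothetical toy problem: maximize x / (x^2 + 1) over four candidates.
best_x, best_ratio = dinkelbach(
    [0.5, 1.0, 2.0, 4.0],
    f=lambda x: x,
    g=lambda x: x * x + 1.0,
)
```

Each round replaces a ratio with a difference, which is usually easier to optimize; the iteration stops once the parametric optimum reaches zero.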
38
Evaluation: Datasets. Amazon product co-purchasing network: 20,000 products; activity graphs: product category graph and customer review graph. DBLP bibliography data, full version: 964,166 authors; activity graphs: Conference and Keyword. DBLP bibliography data, subset: 100,000 authors; activity graphs: Conference and Keyword.
39
Evaluation Cont. Baseline Methods. Algorithms to be compared: BAGC, SA-Cluster, Inc-Cluster, W-Cluster. Measures: Density, Entropy, Davies-Bouldin Index (DBI).
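The slide lists the measures without their formulas. As a rough illustration of the first one, the sketch below assumes a definition of density that is common in graph clustering work: the fraction of social-graph edges whose two endpoints land in the same cluster. The function name and toy data are hypothetical, and the paper's exact formula may differ.

```python
def density(edges, assignment):
    """Fraction of edges whose endpoints share a cluster (assumed definition)."""
    if not edges:
        return 0.0
    inside = sum(1 for u, v in edges if assignment[u] == assignment[v])
    return inside / len(edges)


# Hypothetical chain a - b - c - d split into two clusters.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]
toy_assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
toy_density = density(toy_edges, toy_assignment)
```

Higher values mean that more collaboration edges stay inside clusters, which is why density is read as "larger is better".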
40
Evaluation Cont. Cluster quality evaluation. Dataset: 20,000 Amazon products. Number of clusters: K = 40, 60, 80, 100.
41
Evaluation Cont. Cluster quality evaluation Cont. Dataset: DBI on DBLP with 100,000 authors. Number of clusters: K = 400, 600, 800, 1000.
42
Evaluation Cont. Cluster quality evaluation Cont. Dataset: DBI on DBLP with 964,166 authors. Number of clusters: K = 4000, 6000, 8000, 10000.
43
Evaluation Cont. Cluster efficiency evaluation
44
Evaluation Cont. Cluster convergence. Observation: both the social weight and the keyword weight increase with more iterations, while the conference weight decreases. Explanation: people who publish many papers in the same conferences may still work on different research topics, whereas people who have many papers with the same keywords usually share research topics and are therefore more likely to collaborate as co-authors.
45
Evaluation Cont. Case Study
46
Conclusion. [Diagram: links, entities, static activities and dynamic activities from different networks feed an influence-based model; SI-Clustering computes vertex similarity and updates centroids; a sophisticated nonlinear fractional programming problem is transformed into a straightforward nonlinear parametric programming problem.]
47
Conclusion Cont. Integrated different types of links, entities, static attributes, and dynamic activities from different networks into a unifying influence-based model. Proposed an iterative learning algorithm. Transformed a sophisticated nonlinear fractional programming problem in multiple weights into a straightforward nonlinear parametric programming problem in a single variable to speed up the clustering process.
48
Thanks! Q&A?