Multi-label Relational Neighbor Classification using Social Context Features Xi Wang and Gita Sukthankar Department of EECS University of Central Florida.

1 Multi-label Relational Neighbor Classification using Social Context Features
Xi Wang and Gita Sukthankar
Department of EECS, University of Central Florida

2 Motivation
- The conventional relational classification model focuses on the single-label classification problem.
- Real-world relational datasets contain instances associated with multiple labels.
- Connections between instances in multi-label networks are driven by various causal reasons.
Example: scientific collaboration network (Machine Learning, Data Mining, Artificial Intelligence)

3 Problem Formulation
- Node classification in multi-relational networks
- Input:
  - Network structure (i.e., connectivity information)
  - Labels of some actors in the network
- Output:
  - Labels of the other actors

4 Classification in Networked Data
- Homophily: nodes with similar labels are more likely to be connected.
- Markov assumption: the label of one node depends on that of its immediate neighbors in the graph.
- Relational models are built based on the labels of neighbors.
- Predictions are made using collective inference.

5 Contribution
- A new multi-label iterative relational neighbor classifier (SCRN)
- Extract social context features using edge clustering to represent a node’s potential group membership
- Use of social features boosts classification performance over benchmarks on several real-world collaborative networked datasets

6 Relational Neighbor Classifier
- The Relational Neighbor (RN) classifier, proposed by Macskassy and Provost (MRDM’03), is a simple relational probabilistic model that makes predictions for a given node based solely on the class labels of its neighbors.
[Figure: training graph, iteration 1, iteration 2]

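7 Relational Neighbor Classifier
- The weighted-vote relational neighbor classifier (wvRN) estimates the prediction probability as:
  $$P(l_i = c \mid N_i) = \frac{1}{Z} \sum_{v_j \in N_i} w(v_i, v_j)\, P(l_j = c \mid N_j)$$
  Here $Z$ is the usual normalization factor, $N_i$ is the set of neighbors of node $v_i$, and $w(v_i, v_j)$ is the weight of the link between nodes $v_i$ and $v_j$.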
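A single weighted-vote estimate can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `wvrn_estimate`, the dictionary layout, and the choice of the summed link weights as the normalization factor are assumptions made for the example.

```python
def wvrn_estimate(node, adj, class_probs, classes):
    """Weighted-vote estimate for one node (illustrative sketch).

    adj:         dict node -> dict neighbor -> link weight
    class_probs: dict node -> dict class -> current probability
    The normalizer is assumed to be the sum of the node's link weights.
    """
    z = sum(adj[node].values())
    if z == 0:
        # Isolated node: no neighbor evidence, fall back to a uniform guess.
        return {c: 1.0 / len(classes) for c in classes}
    return {
        c: sum(w * class_probs[nbr].get(c, 0.0) for nbr, w in adj[node].items()) / z
        for c in classes
    }


# Hypothetical toy graph: node "a" has a strong link to an ML author
# and a weaker link to a DM author.
adj = {"a": {"b": 2.0, "c": 1.0}}
class_probs = {"b": {"ML": 1.0, "DM": 0.0}, "c": {"ML": 0.0, "DM": 1.0}}
estimate = wvrn_estimate("a", adj, class_probs, ["ML", "DM"])
```

In this toy case the heavier link dominates, so "a" leans toward the label of "b".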

8 Apply RN in Multi-relational Network
[Figure: ground-truth network. Legend: nodes with both labels (red, green); nodes with green label only; nodes with red label only]

9 Edge-Based Social Feature Extraction
- Connections in human networks are mainly affiliation-driven.
- Since each connection can often be regarded as principally resulting from one affiliation, links possess a strong correlation with a single affiliation class.
- The edge class information is not readily available in most social media datasets, but an unsupervised clustering algorithm can be applied to partition the edges into disjoint sets (KDD’09, CIKM’09).

10 Cluster edges using K-Means
- Scalable edge clustering method proposed by Tang et al. (CIKM’09).
- Each edge is represented in a feature-based format, where each edge is characterized by its adjacent nodes.
- K-means clustering is used to separate the edges into groups, and the social feature (SF) vector is constructed based on edge cluster IDs.
[Figure: original network; Step 1: edge representations; Step 2: construct social features]
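The two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the scalable method from the cited paper: each edge is encoded as a 0/1 vector over nodes with ones at its two endpoints, a basic Lloyd-style k-means with squared Euclidean distance groups the edges, and a node's social feature vector counts its incident edges per cluster. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Basic k-means; returns one cluster ID per point. Requires k <= len(points)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: mean of the members; empty clusters keep their centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def social_features(edges, k, seed=0):
    """Step 1: represent edges by their endpoints. Step 2: build SF vectors."""
    nodes = sorted({n for e in edges for n in e})
    index = {n: i for i, n in enumerate(nodes)}
    points = []
    for u, v in edges:
        vec = [0.0] * len(nodes)
        vec[index[u]] = 1.0
        vec[index[v]] = 1.0
        points.append(vec)
    assign = kmeans(points, k, seed=seed)
    # SF(node)[c] = number of the node's edges that fell into edge cluster c.
    sf = {n: [0] * k for n in nodes}
    for (u, v), c in zip(edges, assign):
        sf[u][c] += 1
        sf[v][c] += 1
    return sf, assign


# Hypothetical toy network: two triangles sharing node "c".
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("c", "e")]
sf, assign = social_features(edges, k=2)
```

Because every edge lands in exactly one cluster, the entries of a node's SF vector always sum to its degree, whichever clustering k-means settles on.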

11 Edge-Clustering Visualization
Figure: A subset of DBLP with 95 instances. Edges are clustered into 10 groups, each shown in a different color.

12 Proposed Method: SCRN
- The initial set of reference features for class c is defined as the weighted sum of the social feature vectors of the nodes known to be in class c. [Equation not preserved in the transcript]
- Each node’s class propagation probability for class c is then computed conditioned on its social features. [Equation not preserved in the transcript]

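13 SCRN
- SCRN estimates the class-membership probability of node $v_i$ belonging to class c by combining three terms over the node's neighbors $N_i$:
  $$P(l_i^c \mid N_i, SF(v_i)) = \frac{1}{Z} \sum_{v_j \in N_i} P_{CP}(l_i^c \mid SF(v_i)) \cdot w(v_i, v_j) \cdot P(l_j^c \mid N_j)$$
  - $P_{CP}(l_i^c \mid SF(v_i))$: class propagation probability
  - $w(v_i, v_j)$: similarity between connected nodes (link weight)
  - $P(l_j^c \mid N_j)$: class probability of its neighbors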

14 SCRN Overview
Input: the network, the labels of the known nodes, Max_Iter
Output: class probabilities for the unlabeled (test) nodes
1. Construct the nodes’ social feature space
2. Initialize the class reference vectors for each class
3. Calculate the class-propagation probability for each test node
4. Repeat until # of iterations > Max_Iter or predictions converge:
   - Estimate each test node’s class probability
   - Update the test node’s class probability in collective inference
   - Update the class reference vectors
   - Re-calculate each node’s class-propagation probability
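The loop above can be sketched as follows. This is a simplified reading of the overview, not the authors' released implementation. Several choices are assumptions made for illustration: social features are passed in precomputed, test nodes start with zero class probability, the similarity is a histogram intersection of L1-normalized vectors, the normalizer is the sum of link weights, and all test nodes are updated simultaneously in place of full relaxation labeling.

```python
def l1_normalize(vec):
    total = sum(vec)
    return [x / total for x in vec] if total > 0 else list(vec)


def similarity(a, b):
    """Histogram intersection of L1-normalized vectors, a value in [0, 1]."""
    return sum(min(x, y) for x, y in zip(l1_normalize(a), l1_normalize(b)))


def reference_vectors(sf, probs, classes):
    """Class reference vector = probability-weighted sum of social features."""
    dim = len(next(iter(sf.values())))
    rv = {c: [0.0] * dim for c in classes}
    for node, p in probs.items():
        for c in classes:
            for d in range(dim):
                rv[c][d] += p[c] * sf[node][d]
    return rv


def scrn(adj, sf, known, test_nodes, classes, max_iter=10, tol=1e-6):
    """adj: node -> {neighbor: weight}; sf: node -> feature list;
    known: node -> set of labels. Returns class probabilities for test nodes."""
    probs = {n: {c: 1.0 if c in labels else 0.0 for c in classes}
             for n, labels in known.items()}
    for n in test_nodes:
        probs[n] = {c: 0.0 for c in classes}  # assumption: start from zero
    # Steps 2 and 3: reference vectors and class-propagation probabilities.
    rv = reference_vectors(sf, probs, classes)
    cp = {n: {c: similarity(sf[n], rv[c]) for c in classes} for n in test_nodes}
    for _ in range(max_iter):  # step 4
        new = {}
        for n in test_nodes:
            z = sum(adj[n].values())
            new[n] = {}
            for c in classes:
                vote = sum(w * probs[m][c] for m, w in adj[n].items())
                new[n][c] = cp[n][c] * vote / z if z > 0 else 0.0
        delta = max((abs(new[n][c] - probs[n][c])
                     for n in test_nodes for c in classes), default=0.0)
        probs.update(new)
        rv = reference_vectors(sf, probs, classes)
        cp = {n: {c: similarity(sf[n], rv[c]) for c in classes} for n in test_nodes}
        if delta < tol:
            break
    return {n: probs[n] for n in test_nodes}


# Hypothetical toy input: test node "t" links equally to an ML and a DM node,
# but its social features match only the ML node.
adj = {"a": {"t": 1.0}, "b": {"t": 1.0}, "t": {"a": 1.0, "b": 1.0}}
sf = {"a": [2, 0], "b": [0, 2], "t": [1, 0]}
known = {"a": {"ML"}, "b": {"DM"}}
result = scrn(adj, sf, known, ["t"], ["ML", "DM"])
```

In the toy input, a plain weighted vote would split evenly between the two classes; the class-propagation term suppresses the DM vote because the social features of "t" share nothing with the DM reference vector.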

15 SCRN Visualization
Figure: SCRN on synthetic multi-label network with 1000 nodes and 32 classes (15 iterations).

16 Datasets
- DBLP
  - We construct a weighted collaboration network for authors who have published at least 2 papers during the 2000 to 2010 timeframe.
  - We selected 15 representative conferences in 6 research areas:
    - Database: ICDE, VLDB, PODS, EDBT
    - Data Mining: KDD, ICDM, SDM, PAKDD
    - Artificial Intelligence: IJCAI, AAAI
    - Information Retrieval: SIGIR, ECIR
    - Computer Vision: CVPR
    - Machine Learning: ICML, ECML

17 Datasets
- IMDb
  - We extract movies and TV shows released between 2000 and 2010; those directed by the same director are linked together.
  - We only retain movies and TV programs with more than 5 links.
  - Each movie can be assigned to a subset of 27 different candidate movie genres in the database, such as “Drama”, “Comedy”, “Documentary” and “Action”.

18 Datasets
- YouTube
  - A subset of data (15000 nodes) drawn from the original YouTube dataset [1] using snowball sampling.
  - Each user in YouTube can subscribe to different interest groups and add other users as his/her contacts.
  - Class labels are 47 interest groups.
[1] http://www.public.asu.edu/~ltang9/social_dimension.html

19 Comparative Methods
- Edge (EdgeCluster)
- wvRN
- Prior
- Random

20 Experiment Setting
- Size of social feature space: 1000 for DBLP and YouTube; 10000 for IMDb
- Class propagation probability is calculated with the Generalized Histogram Intersection Kernel.
- Relaxation Labeling is used in the collective inference framework for SCRN and wvRN.
- We assume the number of labels for testing nodes is known.
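For reference, the Generalized Histogram Intersection kernel is commonly written as K(x, y) = sum over i of min(|x_i|^beta, |y_i|^beta). The sketch below implements that standard form; the function name `ghi_kernel` is illustrative, and the slides do not state which exponent beta was used or how the kernel value is turned into a probability, so both are left open here.

```python
def ghi_kernel(x, y, beta=1.0):
    """Generalized Histogram Intersection kernel between two feature vectors.

    beta is a free parameter (assumed default 1.0, which gives the plain
    histogram intersection); the value used in the experiments is not stated.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum(min(abs(a) ** beta, abs(b) ** beta) for a, b in zip(x, y))


# Hypothetical social feature vector and class reference vector.
node_sf = [4, 4]
class_rv = [9, 1]
plain = ghi_kernel(node_sf, class_rv)             # min(4, 9) + min(4, 1)
damped = ghi_kernel(node_sf, class_rv, beta=0.5)  # min(2, 3) + min(2, 1)
```

A smaller beta compresses large counts, so a few heavily populated edge clusters dominate the similarity less.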

21 Experiment Setting
- We employ the network cross-validation (NCV) method (KAIS’11) to reduce the overlap between test samples.
- Classification performance is evaluated based on Micro-F1, Macro-F1 and Hamming Loss.
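The three measures can be computed from per-class true-positive, false-positive and false-negative counts. The sketch below uses the standard definitions and also shows one way to turn class probabilities into a label set when the number of labels per test node is known (take the top-k classes). The helper names and the alphabetical tie-breaking rule are assumptions for illustration.

```python
def top_k_labels(probs, k):
    """Pick the k most probable classes (ties broken alphabetically)."""
    return set(sorted(probs, key=lambda c: (-probs[c], c))[:k])


def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(true_sets, pred_sets, classes):
    """Return (micro_f1, macro_f1, hamming_loss) for multi-label predictions."""
    counts = {c: [0, 0, 0] for c in classes}  # tp, fp, fn per class
    mismatches = 0
    for truth, pred in zip(true_sets, pred_sets):
        for c in classes:
            if c in truth and c in pred:
                counts[c][0] += 1
            elif c in pred:
                counts[c][1] += 1
                mismatches += 1
            elif c in truth:
                counts[c][2] += 1
                mismatches += 1
    # Micro-F1 pools the counts over classes; Macro-F1 averages per-class F1.
    micro = f1_from_counts(*(sum(col) for col in zip(*counts.values())))
    macro = sum(f1_from_counts(*counts[c]) for c in classes) / len(classes)
    hamming = mismatches / (len(true_sets) * len(classes))
    return micro, macro, hamming


# Hypothetical predictions for two test nodes over three classes.
classes = ["A", "B", "C"]
true_sets = [{"A", "B"}, {"C"}]
pred_sets = [top_k_labels({"A": 0.5, "B": 0.2, "C": 0.4}, 2),
             top_k_labels({"A": 0.1, "B": 0.1, "C": 0.7}, 1)]
micro, macro, hamming = evaluate(true_sets, pred_sets, classes)
```

Micro-F1 is dominated by frequent classes, while Macro-F1 weights every class equally; lower Hamming Loss is better, higher F1 is better.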

22 Results (Micro-F1): DBLP

23 Results (Macro-F1): DBLP

24 Results (Hamming Loss): DBLP

25 Results (Hamming Loss): IMDb

26 Results (Hamming Loss): YouTube

27 Conclusion
- Links in multi-relational networks are heterogeneous.
- SCRN exploits label homophily while simultaneously leveraging social feature similarity through the introduction of class propagation probabilities.
- SCRN significantly boosts classification performance on multi-label collaboration networks.
- Our open-source implementation of SCRN is available at: http://code.google.com/p/multilabel-classification-on-social-network/

28 Reference
- MACSKASSY, S. A., AND PROVOST, F. A simple relational classifier. In Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM) at KDD, 2003, pp. 64–76.
- TANG, L., AND LIU, H. Relational learning via latent social dimensions. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009, pp. 817–826.
- TANG, L., AND LIU, H. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of International Conference on Information and Knowledge Management (CIKM), 2009, pp. 1107–1116.
- NEVILLE, J., GALLAGHER, B., ELIASSI-RAD, T., AND WANG, T. Correcting evaluation bias of relational classifiers with network cross validation. Knowledge and Information Systems (KAIS), 2011, pp. 1–25.

29 Thank you!

