Guided Learning for Role Discovery (GLRD)
Presented by Rui Liu

Presentation transcript:

Guided Learning for Role Discovery (GLRD)
Presented by Rui Liu
Gilpin, Sean, Tina Eliassi-Rad, and Ian Davidson. "Guided learning for role discovery (GLRD): Framework, algorithms, and applications." SIGKDD, 2013.

Background
Role Discovery
– Find groups of nodes that share similar topological structure in the graph (e.g. hub nodes, clique members, peripheral nodes)
Feature matrix V for a graph
– Pre-computed from the graph
– Example features: node degree, the number of triangles a node participates in, maximal neighbor degree, average neighbor degree
– Existing algorithm to compute the feature matrix: ReFeX [1]
1. Henderson, Keith, et al. "It's who you know: graph mining using recursive structural features." SIGKDD, 2011.
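The four example features listed above can be computed directly from an adjacency structure. The following is a minimal sketch in plain Python, not ReFeX itself (ReFeX also aggregates features recursively, which is omitted here). The function name `structural_features` and the toy graph are hypothetical.

```python
def structural_features(adj):
    """Return {node: [degree, triangles, max_nbr_degree, avg_nbr_degree]}.

    `adj` maps each node to the set of its neighbors
    (undirected graph, no self-loops).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    features = {}
    for v, nbrs in adj.items():
        # A triangle through v is a pair of neighbors that are adjacent to
        # each other; each pair is counted twice (u, w) and (w, u).
        triangles = sum(len(nbrs & adj[u]) for u in nbrs) // 2
        nbr_degrees = [degree[u] for u in nbrs]
        max_nbr = max(nbr_degrees, default=0)
        avg_nbr = sum(nbr_degrees) / len(nbr_degrees) if nbr_degrees else 0.0
        features[v] = [degree[v], triangles, max_nbr, avg_nbr]
    return features


# Toy graph (hypothetical): triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
V = structural_features(toy)
```

Each value in `V` is one row of the feature matrix, so stacking the rows in a fixed node order gives an n × f matrix with f = 4.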

Contribution of this paper
Existing work on the role discovery problem
– RolX [1]
Obtain specialized solutions by adding convex constraints
– E.g. sparsity, diversity, alternativeness
The feature matrix V is factorized as V ≈ GF, where the n × r matrix G is the role assignment matrix and the r × f matrix F is the role explanation matrix (n nodes, r roles, f features).
1. Henderson, Keith, et al. "RolX: structural role extraction & mining in large graphs." SIGKDD, 2012.
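The constrained factorization described on this slide can be written out as below. This is a sketch consistent with the matrix definitions above, not a copy of the slide's formula: the choice of norm, the generic convex constraint functions g_i and f_j, and their bounds d_{G_i} and d_{F_j} are assumed notation.

```latex
\min_{G,\,F} \; \lVert V - GF \rVert^2
\quad \text{subject to} \quad
g_i(G) \le d_{G_i}, \qquad f_j(F) \le d_{F_j},
\qquad V \in \mathbb{R}^{n \times f},\;
G \in \mathbb{R}^{n \times r},\;
F \in \mathbb{R}^{r \times f}
```

Sparsity, diversity, and alternativeness would each correspond to a particular choice of g_i and f_j.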

Constraints: Sparsity
– Nodes are assigned to as few roles as possible
– Roles are defined with respect to as few features as possible
– Simple explanation of the data

Constraints: Diversity
– Prevent role definitions and role assignments from overlapping heavily
– Each role uses a different set of features, and nodes are assigned to different combinations of roles

Constraints: Alternativeness
– There may exist multiple explanations of the data
– The returned explanation may be undesirable
– Find another good explanation that is different from those already found
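One way to make the three constraint types concrete is to measure them on a role assignment matrix G. The sketch below assumes that sparsity is measured by the L1 norm of each role column, diversity by the inner product between distinct role columns, and alternativeness by the inner product between new and previously found role columns. These measures and all function names are illustrative assumptions; the slides do not state the exact constraint functions.

```python
def column(M, j):
    """Column j of a matrix stored as a list of rows."""
    return [row[j] for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def max_column_l1(G):
    """Sparsity proxy: largest L1 norm over the role columns of G."""
    return max(sum(abs(x) for x in column(G, j)) for j in range(len(G[0])))


def max_pairwise_overlap(G):
    """Diversity proxy: largest inner product between two distinct role columns."""
    cols = [column(G, j) for j in range(len(G[0]))]
    pairs = ((i, j) for i in range(len(cols)) for j in range(i + 1, len(cols)))
    return max((dot(cols[i], cols[j]) for i, j in pairs), default=0.0)


def max_overlap_with_previous(G, G_prev):
    """Alternativeness proxy: largest inner product between a new role column
    and any column of a previously found solution G_prev."""
    new_cols = [column(G, j) for j in range(len(G[0]))]
    old_cols = [column(G_prev, j) for j in range(len(G_prev[0]))]
    return max(dot(u, v) for u in new_cols for v in old_cols)


# Hypothetical 3-node, 2-role assignments.
G = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
G_prev = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
sparsity = max_column_l1(G)
diversity = max_pairwise_overlap(G)
alternativeness = max_overlap_with_previous(G, G_prev)
```

Bounding each of these quantities from above would give a convex constraint on one column of G when the other columns are held fixed. The same measures apply to the rows of F.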

Algorithm
Solve the general constrained form of the problem
Basic strategy
– Alternating Least Squares (ALS)
– Solve for one column of G or one row of F at a time
– The original problem becomes a series of subproblems, each of which is convex and easy to solve
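The column-by-column strategy can be sketched as follows. This is a simplified illustration in plain Python, not the GLRD solver: it assumes a rank-one residual formulation and enforces only nonnegativity, for which each subproblem has a closed-form solution. The guided constraints (sparsity, diversity, alternativeness) would need a convex solver in place of the closed-form step. All function names are hypothetical.

```python
import random


def residual_without(V, G, F, k):
    """R_k = V minus the contribution of every role except role k."""
    r = len(F)
    return [[V[a][b] - sum(G[a][i] * F[i][b] for i in range(r) if i != k)
             for b in range(len(V[0]))] for a in range(len(V))]


def squared_error(V, G, F):
    r = len(F)
    return sum((V[a][b] - sum(G[a][i] * F[i][b] for i in range(r))) ** 2
               for a in range(len(V)) for b in range(len(V[0])))


def als_sweep(V, G, F):
    """One pass over all roles: update column k of G, then row k of F, in place."""
    n, f = len(V), len(V[0])
    for k in range(len(F)):
        R = residual_without(V, G, F, k)  # independent of G[:, k] and F[k, :]
        ff = sum(x * x for x in F[k])
        if ff > 0:
            # Least-squares fit of column k, clipped at zero (nonnegativity).
            for a in range(n):
                G[a][k] = max(0.0, sum(R[a][b] * F[k][b] for b in range(f)) / ff)
        gg = sum(G[a][k] ** 2 for a in range(n))
        if gg > 0:
            # Least-squares fit of row k, clipped at zero (nonnegativity).
            for b in range(f):
                F[k][b] = max(0.0, sum(R[a][b] * G[a][k] for a in range(n)) / gg)


# Toy data (hypothetical): V built from a known nonnegative rank-2 factorization.
random.seed(0)
G_true = [[1, 0], [1, 0], [0, 1], [0, 2]]
F_true = [[1, 2, 0], [0, 1, 3]]
V = [[sum(G_true[a][i] * F_true[i][b] for i in range(2)) for b in range(3)]
     for a in range(4)]
G = [[random.random() for _ in range(2)] for _ in range(4)]
F = [[random.random() for _ in range(3)] for _ in range(2)]
errors = [squared_error(V, G, F)]
for _ in range(30):
    als_sweep(V, G, F)
    errors.append(squared_error(V, G, F))
```

Because each update solves its subproblem exactly while everything else is held fixed, the reconstruction error in `errors` should never increase from one sweep to the next.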

Experiment: Identity Resolution
DBLP dataset
– 6 co-author graphs from 6 different conferences: KDD, ICDM, SDM, CIKM, SIGMOD, VLDB
Steps for evaluation
4. Recall_icdm = num_match / num_total, where num_total is the size of the set S (the authors shared by ICDM and KDD) and num_match is the number of authors i in S for which the k nearest neighbors (rows) of G_kdd(i, :) in G_icdm include the original author i.
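The recall measure in step 4 can be sketched as below. Euclidean distance between role assignment rows is an assumption, since the slide does not name the metric, and the sketch also assumes both graphs use the same number of roles. The function names and toy matrices are hypothetical.

```python
import math


def knn_rows(query, rows, k):
    """Indices of the k rows closest to `query` in Euclidean distance."""
    order = sorted(range(len(rows)), key=lambda j: math.dist(query, rows[j]))
    return order[:k]


def cross_graph_recall(G_src, G_dst, shared, k):
    """Fraction of shared authors whose own row in G_dst is among the
    k nearest neighbors of their row in G_src.

    `shared` lists (row_in_src, row_in_dst) index pairs for the same author.
    """
    num_match = sum(1 for i, j in shared if j in knn_rows(G_src[i], G_dst, k))
    return num_match / len(shared)


# Toy role assignment matrices (hypothetical), two roles each.
G_kdd = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
G_icdm = [[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]
shared = [(0, 0), (1, 2), (2, 0)]  # same author: (row in KDD, row in ICDM)
recall_k1 = cross_graph_recall(G_kdd, G_icdm, shared, k=1)
recall_k3 = cross_graph_recall(G_kdd, G_icdm, shared, k=3)
```

In the toy data, the third author's KDD row is closest to a different author's ICDM row, so that author counts as a miss at k = 1 and as a match once k covers all rows.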

Experiment: Identity Resolution
Conclusion: GLRD does better on data mining conferences such as CIKM, SDM, and ICDM, but not on other conferences such as SIGMOD and VLDB.
Reason: The same authors play similar roles in KDD and the other data mining conferences (CIKM, SDM, ICDM), so their role assignment vectors should be similar, which results in high recall.

Experiment: Alternative roles
Dataset: KDD co-author graph
– Use RolX to get the original role definition
– Use GLRD to find a role definition different from the original one