Distributed Representations of Subgraphs

Bijaya Adhikari, Yao Zhang, Naren Ramakrishnan, and B. Aditya Prakash
Department of Computer Science, Virginia Tech
IEEE ICDM DaMNet, New Orleans, Nov 18th, 2017

Outline
Motivation, Problem Formulation, Method, Experiments, Conclusion

Motivation: Network Embedding Framework
Input network → embeddings → data mining tasks: classification, community detection, link prediction, anomaly detection, sense making, … Many possible applications!

Motivation: Previous Work
Most existing work focuses on node embeddings, such as DeepWalk [Perozzi+, KDD 2014], node2vec [Grover+, KDD 2016], SDNE [Wang+, KDD 2016], and LINE [Tang+, WWW 2015], which map the nodes of a graph G(V, E) to vectors. How can we embed entire subgraphs?

Motivation: Our Approach
Given a set of subgraphs from the same graph, learn a feature representation (embedding) of each subgraph that "preserves" a pre-defined "subgraph property".

Problem Formulation: Setting
Given: a set $S = \{g_1, g_2, \dots, g_n\}$ of subgraphs, typically from the same graph, and an integer $d$.
Learn: a $d$-dimensional embedding of each subgraph such that a pre-defined subgraph property is preserved.

Problem Formulation: Challenges
What subgraph property should be preserved? How should that property be characterized?
(Figure: example subgraphs $g_1$, $g_2$, $g_3$.)

Idea: Neighborhood Property
The neighborhood property captures the neighborhood information within a subgraph.
(Figure: subgraphs $g_1$, $g_2$, $g_3$.) Subgraphs $g_1$ and $g_2$ share neighborhood structure; subgraph $g_3$ does not.

Capturing the Neighborhood Property
The neighborhood property of a subgraph is defined as the set of all paths in the subgraph annotated by node IDs (ID-paths). Example ID-path sets:
{(a,b,a,c), (c,e,a,e), (e,c,a,c), (b,e,c,e), …}
{(c,d,d,c), (c,e,a,e), (e,c,a,c), (d,c,d,e), …}
{(i,h,j,k), (h,k,i,h), (k,h,j,i), (i,h,k,j), …}
This definition captures similarity in the neighborhood: the first two sets share ID-paths such as (c,e,a,e) and (e,c,a,c), while the third shares none with them.
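To make the ID-path idea concrete, here is a minimal Python sketch that enumerates every ID-path with a fixed number of nodes in a small subgraph and compares two subgraphs by the overlap of their ID-path sets. The toy edge lists are hypothetical (chosen so that g1 and g2 share nodes a, c, e while g3 uses disjoint IDs), and the Jaccard overlap is only an illustrative similarity, not part of SubVec. As in the examples above, an ID-path may revisit nodes.

```python
def adjacency(edges):
    """Undirected adjacency dict: node ID -> set of neighbor IDs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def id_paths(adj, length):
    """All ID-paths (walks, revisits allowed) with `length` nodes, as tuples."""
    walks = [(v,) for v in adj]
    for _ in range(length - 1):
        walks = [w + (u,) for w in walks for u in adj[w[-1]]]
    return set(walks)


def jaccard(p, q):
    """Share of ID-paths two subgraphs have in common (illustrative measure)."""
    union = p | q
    return len(p & q) / len(union) if union else 0.0


# Hypothetical toy subgraphs: g1 and g2 overlap on nodes a, c, e; g3 does not.
g1 = adjacency([("a", "b"), ("a", "c"), ("a", "e"), ("c", "e"), ("b", "e")])
g2 = adjacency([("c", "d"), ("c", "e"), ("a", "e"), ("a", "c"), ("d", "e")])
g3 = adjacency([("h", "i"), ("h", "j"), ("h", "k"), ("i", "j"), ("j", "k")])

P1, P2, P3 = (id_paths(g, 4) for g in (g1, g2, g3))
print(jaccard(P1, P2) > jaccard(P1, P3))  # True: g1 and g2 share ID-paths
```

The number of ID-paths from each start node grows roughly as (degree)^(length − 1), which is why exhaustive enumeration does not scale beyond toy subgraphs.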

Problem Statement
Given: a set of subgraphs $S = \{g_1, g_2, \dots, g_n\}$ and an integer $d$.
Learn: an embedding function $f: g_i \mapsto \mathbf{y}_i \in \mathbb{R}^d$ such that the neighborhood property of the subgraphs is preserved.

SubVec Framework Overview
1. Generate samples of ID-paths: enumerating all paths is infeasible, so sample paths instead.
2. Leverage the ID-paths to learn embeddings: learn the embeddings such that the nodes in each subgraph can be predicted.

Samples of ID-Paths
How can we efficiently generate samples of ID-paths? Run truncated random walks within each subgraph.
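A minimal sketch of sampling ID-paths with truncated random walks, assuming each subgraph is given as an adjacency dict. The parameter names `walks_per_node` and `walk_length` are hypothetical, and the values SubVec actually uses are not stated here. A walk only follows edges of the subgraph, so it never leaves it; it stops early only at a node with no neighbors.

```python
import random


def adjacency(edges):
    """Undirected adjacency dict: node ID -> set of neighbor IDs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def truncated_random_walks(adj, walks_per_node, walk_length, seed=0):
    """Sample ID-paths by random walks that stay inside the subgraph `adj`."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adj):  # one walk from every node per round
            walk = [start]
            while len(walk) < walk_length:
                # Sorting makes the walk reproducible for a fixed seed.
                neighbors = sorted(adj[walk[-1]])
                if not neighbors:  # dead end: truncate early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(tuple(walk))
    return walks


g1 = adjacency([("a", "b"), ("a", "c"), ("a", "e"), ("c", "e"), ("b", "e")])
samples = truncated_random_walks(g1, walks_per_node=2, walk_length=4)
print(len(samples))  # 8: 2 rounds x 4 start nodes
```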

Feature Learning
How can we learn a feature vector for each subgraph? Leverage the idea of Paragraph2vec [Le+, ICML 2014], which yields two models: SubVec Distributed Memory (DM) and SubVec Distributed Bag of Nodes (DBON).

SubVec: DM
Models the probability of a node occurring in an ID-path. The probability depends on the embedding of the node, the embeddings of the other nodes in the ID-path, and the embedding of the subgraph.

SubVec: DM Objective
The overall objective of SubVec DM is to maximize the log-likelihood.
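A PV-DM-style log-likelihood consistent with the description of SubVec DM can be sketched as follows. The notation is ours and the exact form in the paper may differ: $\mathcal{P}(g_i)$ is the set of sampled ID-paths of $g_i$, $w$ is a context window, $\boldsymbol{\phi}_v$ and $\mathbf{u}_v$ are input and output embeddings of node $v$, and $\mathbf{y}_i$ is the embedding of subgraph $g_i$. Averaging (rather than concatenating) the context vectors is an assumption, and context positions that fall outside the path are dropped.

```latex
\max_{\{\mathbf{y}_i\},\,\{\boldsymbol{\phi}_v\},\,\{\mathbf{u}_v\}}
\sum_{i=1}^{n} \sum_{p \in \mathcal{P}(g_i)} \sum_{t=1}^{|p|}
\log \Pr\left(p_t \mid p_{t-w}, \dots, p_{t-1}, p_{t+1}, \dots, p_{t+w},\, g_i\right),
\qquad
\Pr(p_t \mid \cdot) =
\frac{\exp\left(\mathbf{u}_{p_t}^{\top} \mathbf{h}_{i,t}\right)}
     {\sum_{v \in V} \exp\left(\mathbf{u}_{v}^{\top} \mathbf{h}_{i,t}\right)},
\quad
\mathbf{h}_{i,t} = \operatorname{mean}\left(\mathbf{y}_i,\ \{\boldsymbol{\phi}_{p_{t+s}} : 0 < |s| \le w\}\right)
```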

SubVec: DBON
Models the probability of a short walk $\theta$ appearing in the ID-paths of a subgraph. The probability depends on the embeddings of the nodes in the walk and the embedding of the subgraph.

SubVec: DBON Objective
The overall objective of SubVec DBON is to maximize the log-likelihood.
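Analogously, a PV-DBOW-style sketch of the DBON objective (again our notation; the paper's exact form may differ): $\Theta(g_i)$ is the multiset of short walks sampled from the ID-paths of $g_i$, and the output embedding of a walk $\theta$ is assumed, for illustration, to be the mean of its nodes' output embeddings $\mathbf{u}_v$. In practice the softmax normalizer over all walks would be approximated, for example by negative sampling.

```latex
\max_{\{\mathbf{y}_i\},\,\{\mathbf{u}_v\}}
\sum_{i=1}^{n} \sum_{\theta \in \Theta(g_i)} \log \Pr(\theta \mid g_i),
\qquad
\Pr(\theta \mid g_i) =
\frac{\exp\left(\mathbf{u}_{\theta}^{\top} \mathbf{y}_i\right)}
     {\sum_{\theta'} \exp\left(\mathbf{u}_{\theta'}^{\top} \mathbf{y}_i\right)},
\quad
\mathbf{u}_{\theta} = \frac{1}{|\theta|} \sum_{v \in \theta} \mathbf{u}_v
```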

Complete Algorithm
The pseudo-code is as follows.
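A simplified, self-contained Python sketch of the two stages (sampling ID-paths, then learning one vector per subgraph) follows. It is not the authors' implementation (their code is linked at the end of the talk). It trains subgraph vectors PV-DBOW style with negative sampling and, as a simplification, lets each subgraph vector predict the individual nodes on its walks rather than short walks θ. All function names, parameters, and default values are hypothetical.

```python
import math
import random


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-x))


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def sample_walks(adj, walks_per_node, walk_length, rng):
    """Stage 1: truncated random walks (ID-paths) inside one subgraph."""
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(sorted(adj[walk[-1]])))
            walks.append(walk)
    return walks


def train_subgraph_vectors(subgraphs, d=16, walks_per_node=5, walk_length=8,
                           epochs=20, negatives=3, lr=0.05, seed=0):
    """Stage 2: one d-dimensional vector per subgraph (PV-DBOW-style sketch)."""
    rng = random.Random(seed)
    vocab = sorted({v for adj in subgraphs for v in adj})
    Y = [[rng.uniform(-0.5, 0.5) / d for _ in range(d)] for _ in subgraphs]  # subgraph vectors
    U = {v: [0.0] * d for v in vocab}  # output node vectors
    corpora = [[v for walk in sample_walks(adj, walks_per_node, walk_length, rng) for v in walk]
               for adj in subgraphs]
    for _ in range(epochs):
        for y, tokens in zip(Y, corpora):
            for target in tokens:
                # One positive node plus `negatives` uniformly drawn negative nodes.
                pairs = [(target, 1.0)] + [(rng.choice(vocab), 0.0) for _ in range(negatives)]
                grad_y = [0.0] * d
                for v, label in pairs:
                    if label == 0.0 and v == target:
                        continue  # skip a negative that collides with the target
                    u = U[v]
                    # Gradient ascent on log sigmoid(+/- y.u).
                    g = lr * (label - sigmoid(sum(a * b for a, b in zip(y, u))))
                    for k in range(d):
                        grad_y[k] += g * u[k]
                        u[k] += g * y[k]
                for k in range(d):
                    y[k] += grad_y[k]
    return Y


subgraphs = [
    adjacency([("a", "b"), ("a", "c"), ("a", "e"), ("c", "e"), ("b", "e")]),
    adjacency([("c", "d"), ("c", "e"), ("a", "e"), ("a", "c"), ("d", "e")]),
    adjacency([("h", "i"), ("h", "j"), ("h", "k"), ("i", "j"), ("j", "k")]),
]
vectors = train_subgraph_vectors(subgraphs)
print(len(vectors), len(vectors[0]))  # 3 16
```

A DM-flavored variant would instead average the subgraph vector with the input embeddings of the context nodes before each prediction.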

Datasets
Dataset      |V|      |E|      Domain
Workplace    92       757      Contact
Cornell      195      304      Web
HighSchool   182      2221     Contact
Texas        187      328      Web
Washington   230      446      Web
Wisconsin    265      530      Web
PolBlogs     1490     16783
Youtube      1.13M    2.97M    Social

Community Detection Using SubVec
Problem: given a network, find partitions of the network such that intra-partition density is high and inter-partition density is low.
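The density criterion can be made concrete with a small helper. This is an illustrative measure under stated assumptions (undirected graph, no self-loops, every node assigned to exactly one community): intra density is the fraction of within-community node pairs that are linked, and inter density is the fraction of between-community pairs that are linked.

```python
def partition_densities(edges, communities):
    """Return (intra, inter) edge densities of a partition of an undirected graph.

    intra: fraction of node pairs inside the same community that are linked.
    inter: fraction of node pairs in different communities that are linked.
    Assumes no self-loops and that `communities` covers every node.
    """
    label = {v: c for c, members in enumerate(communities) for v in members}
    edge_set = {frozenset(e) for e in edges}  # deduplicate (u, v) / (v, u)
    intra_pairs = sum(len(m) * (len(m) - 1) // 2 for m in communities)
    n = len(label)
    inter_pairs = n * (n - 1) // 2 - intra_pairs
    intra_edges = sum(1 for e in edge_set if len({label[v] for v in e}) == 1)
    inter_edges = len(edge_set) - intra_edges
    intra = intra_edges / intra_pairs if intra_pairs else 0.0
    inter = inter_edges / inter_pairs if inter_pairs else 0.0
    return intra, inter


# Two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(partition_densities(edges, [{0, 1, 2}, {3, 4, 5}]))  # (1.0, 0.1111111111111111)
```

A good partition scores high on the first number and low on the second; mixing the two triangles lowers intra density and raises inter density.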

Community Detection: Method
Graph → ego-nets → embeddings → clusters.
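A minimal sketch of the graph → ego-nets → embeddings → clusters pipeline. To keep it self-contained, SubVec is replaced by a deliberately trivial stand-in embedding (a 0/1 indicator of which nodes each ego-net contains), and clustering uses plain k-means with a deterministic farthest-first initialization. Both substitutions are assumptions for illustration only. Each node is assigned to the cluster of its own ego-net.

```python
def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def ego_nets(adj):
    """Ego-net of v: v, its neighbors, and the edges among them (adjacency dict)."""
    nets = {}
    for v in adj:
        nodes = {v} | adj[v]
        nets[v] = {u: adj[u] & nodes for u in nodes}
    return nets


def kmeans(vectors, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda x: min(dist2(x, c) for c in centroids)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(x, centroids[j])) for x in vectors]
        for j in range(k):
            members = [x for x, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def detect_communities(edges, k):
    adj = adjacency(edges)
    nodes = sorted(adj)
    nets = ego_nets(adj)
    # Stand-in for SubVec (hypothetical): 0/1 indicator of the nodes in each ego-net.
    vectors = [[1.0 if u in nets[v] else 0.0 for u in nodes] for v in nodes]
    labels = kmeans(vectors, k)
    communities = {}
    for v, lab in zip(nodes, labels):
        communities.setdefault(lab, set()).add(v)
    return sorted(communities.values(), key=min)


# Two 4-cliques joined by the edge (3, 4).
edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]
edges += [(a, b) for a in range(4, 8) for b in range(a + 1, 8)] + [(3, 4)]
print([sorted(c) for c in detect_communities(edges, 2)])  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```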

Community Detection: Baselines
Newman [Newman, 2006]: classical modularity-based community detection algorithm.
Louvain [Blondel+, 2008]: fast modularity-based community detection algorithm.
DeepWalk [Perozzi+, 2014]: node embeddings based on vanilla random walks.
node2vec [Grover+, 2016]: node embeddings based on second-order random walks.

Community Detection: Results
Measure: average F1-score of the detected communities. SubVec outperforms its competitors on most datasets; more results are in the paper.
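The slide does not spell out how the average F1-score is computed. One common definition, assumed here and not necessarily the paper's exact variant, matches each ground-truth community to its best-matching detected community and vice versa, then averages the two directions:

```python
def f1(a, b):
    """F1 score between two node sets (harmonic mean of precision and recall)."""
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(b), overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def average_f1(truth, detected):
    """Symmetric average F1: best match in both directions, then averaged."""
    def one_way(xs, ys):
        return sum(max(f1(x, y) for y in ys) for x in xs) / len(xs)
    return 0.5 * (one_way(truth, detected) + one_way(detected, truth))


truth = [{1, 2, 3}, {4, 5, 6}]
detected = [{1, 2}, {3, 4, 5, 6}]
print(round(average_f1(truth, detected), 4))  # 0.8286
```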

Community Detection: Visualization
(Figure: ground-truth communities in the HighSchool dataset, alongside the node2vec and SubVec results.)
Our framework works well even for dense graphs.

Case Study: MemeTracker
The MemeTracker dataset consists of cascades of memes. A meme is a short phrase (e.g., "Lipstick on a pig"), and cascades flow through news and blog websites (e.g., NBC, BBC, CNN).
Steps: each cascade induces a subgraph in the network; embed the subgraphs induced by the cascades; cluster the embeddings; observe the common 'topics' in each cluster.
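The first step, inducing one subgraph per cascade, can be sketched as below. The site network and the second cascade are hypothetical stand-ins. The sketch assumes a cascade is simply the set of sites that mention the meme, and it keeps sites without links to the rest of the cascade as isolated nodes.

```python
def induced_subgraph(edges, nodes):
    """Adjacency dict of the subgraph induced by `nodes` (isolated nodes kept)."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def cascade_subgraphs(edges, cascades):
    """Map each meme to the subgraph induced by the sites in its cascade."""
    return {meme: induced_subgraph(edges, sites) for meme, sites in cascades.items()}


# Hypothetical site network and cascades.
network = [("NBC", "BBC"), ("BBC", "CNN"), ("CNN", "NBC"), ("CNN", "blogA"), ("blogB", "blogC")]
cascades = {"lipstick on a pig": ["NBC", "BBC", "CNN"], "meme-2": ["blogA", "blogB"]}
subs = cascade_subgraphs(network, cascades)
print(sorted(subs["lipstick on a pig"]["CNN"]))  # ['BBC', 'NBC']
```

Each resulting adjacency dict can then be passed to the walk-sampling and embedding steps.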

Case Study: MemeTracker
(Figure: clusters labeled Religious, Entertainment, Spanish, and Politics.)
SubVec vectors form meaningful clusters.

Case Study: DBLP
DBLP is a co-authorship network. We extract subgraphs based on keywords in the titles of papers; keywords include 'classification', 'clustering', 'XML', and so on. Each subgraph is annotated by its keyword.
Steps: embed the subgraphs using SubVec; visualize them in two dimensions; observe the similarity between keywords.
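The extraction step can be sketched under an explicit assumption: the slide does not say exactly how a keyword subgraph is built, so here it is taken to be the co-authorship edges of all papers whose title contains the keyword as a whole word. The paper records below are hypothetical.

```python
import re
from itertools import combinations


def keyword_subgraphs(papers, keywords):
    """papers: list of (title, authors). Returns keyword -> set of co-author edges.

    Assumption (for illustration): a keyword's subgraph holds the co-authorship
    edges of every paper whose title contains that keyword as a whole word.
    """
    subgraphs = {kw: set() for kw in keywords}
    for title, authors in papers:
        words = set(re.findall(r"[a-z0-9]+", title.lower()))
        for kw in keywords:
            if kw.lower() in words:
                # Sorting gives each undirected edge one canonical (a, b) order.
                subgraphs[kw].update(combinations(sorted(set(authors)), 2))
    return subgraphs


# Hypothetical records in the style of DBLP.
papers = [
    ("Semi-supervised Classification of Graphs", ["Alice", "Bob"]),
    ("Clustering XML Documents", ["Bob", "Carol", "Dan"]),
    ("Scalable Clustering", ["Eve", "Alice"]),
]
subgraphs = keyword_subgraphs(papers, ["classification", "clustering", "XML"])
print(sorted(subgraphs["XML"]))  # [('Bob', 'Carol'), ('Bob', 'Dan'), ('Carol', 'Dan')]
```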

Case Study: DBLP
SubVec vectors are meaningful.

Scalability
SubVec scales linearly with respect to the number of subgraphs. More results are in the paper.

Conclusion
Problem: formulated the novel subgraph embedding problem and introduced the neighborhood property.
Algorithm: proposed SubVec, an effective and efficient method.
Experiments: large datasets, performance, and scalability.
Applications: community detection and sense making.

Any questions?
Code at: http://people.cs.vt.edu/~bijaya