LINE: Large-scale Information Network Embedding
Jian Tang, Microsoft Research Asia
Acknowledgements: Meng Qu, Mingzhe Wang, Qiaozhu Mei, Ming Zhang
Ubiquitous Large-scale Information Networks
- Social networks, the World Wide Web, the Internet of Things (IoT), citation networks, ...
- Real-world networks are very large, e.g.:
  - Facebook social network: ~1 billion users
  - WWW: ~50 billion webpages
  - Internet of Things
- Analyzing large-scale networks is very challenging: the data are sparse and high-dimensional
Deep Learning Is Very Successful in Many Domains
- Images, speech, natural language ... and networks?
Deep Learning for Network Embedding
- Maps a sparse, high-dimensional network into dense, low-dimensional vectors
- Potentially useful in many domains and applications:
  - Text embedding
  - Link prediction
  - Ranking and recommendation
  - Node classification
  - Network visualization
Natural Language/Text
- Unsupervised embedding of text (e.g., words and documents) for text representation
- Free text is first converted into text networks:
  - Word co-occurrence network: words linked by co-occurrence within a context window
  - Word-document network: bipartite graph linking words to the documents containing them
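To make the construction concrete, here is a minimal sketch (illustrative, not the paper's pipeline) of building both networks from free text; the window size of 5 and the whitespace tokenizer are assumptions:

```python
from collections import Counter

docs = [
    "deep learning has been attracting increasing attention",
    "the skip-gram model is quite effective and efficient",
]

window = 5            # assumed co-occurrence window size
ww_edges = Counter()  # (word, word) -> co-occurrence weight
wd_edges = Counter()  # (word, doc_id) -> term frequency

for doc_id, text in enumerate(docs):
    tokens = text.split()  # naive whitespace tokenizer
    for i, w in enumerate(tokens):
        wd_edges[(w, doc_id)] += 1
        # link w to every word within the next (window - 1) positions
        for c in tokens[i + 1 : i + window]:
            ww_edges[tuple(sorted((w, c)))] += 1

print(ww_edges.most_common(3))
print(wd_edges.most_common(3))
```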
Natural Language/Text
- Predictive embedding of text (e.g., words and documents), exploiting labeled documents
- In addition to the word co-occurrence and word-document networks, document labels yield a word-label network: a bipartite graph linking words to class labels
Social Network
- User embedding
- Friend recommendation
- User classification
Academic Network
- Author, paper, and venue embedding
- Recommend related authors, papers, and venues
- Author, paper, and venue classification
Enterprise Network
- People, document, and project embedding
- Recommend related people, documents, and projects
- People, document, and project classification
Related Work
- Classical graph embedding algorithms: MDS, IsoMap, LLE, Laplacian Eigenmaps, etc.
  - Hard to scale up
- Graph factorization (Ahmed et al., 2013)
  - Not specifically designed for network embedding
  - Usually limited to undirected graphs
- DeepWalk (Perozzi et al., 2014)
  - Lacks a clear objective function
  - Only designed for networks with binary edges
Our Approach: LINE
- Applicable to various types of networks: directed, undirected, and/or weighted
- Has a clear objective function: preserves both the first-order and second-order proximity between vertices
- Very scalable: effective and efficient optimization through asynchronous stochastic gradient descent
  - Takes only a couple of hours on a single machine to embed a network with millions of nodes and billions of edges
What LINE Has Done so Far
- Unsupervised text embedding (Tang et al., WWW'15)
  - Outperforms SkipGram by embedding the word co-occurrence network
  - Outperforms ParagraphVec by embedding the word-document network
- Predictive text embedding (Tang et al., KDD'15)
  - Outperforms CNNs on long documents; comparable on short documents
  - More scalable than CNNs, with few parameters to tune
- Social and citation network embedding (Tang et al., WWW'15)
  - Outperforms DeepWalk and graph factorization

Tang et al. LINE: Large-scale Information Network Embedding. WWW'15
Tang et al. PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks. KDD'15
First-order Proximity
- The local pairwise proximity between vertices, determined by the observed links
- In the example network (vertices 1-10), vertices 6 and 7 have a large first-order proximity because they are directly connected
- However, many links between vertices are missing, so first-order proximity alone is not sufficient to preserve the entire network structure
Second-order Proximity
- The proximity between the neighborhood structures of the vertices
- Mathematically, the second-order proximity between a pair of vertices $(u, v)$ is determined by their neighbor-weight vectors
  $p_u = (w_{u,1}, w_{u,2}, \dots, w_{u,|V|})$ and $p_v = (w_{v,1}, w_{v,2}, \dots, w_{v,|V|})$
- In the example network, vertices 5 and 6 have a large second-order proximity:
  $p_5 = (1, 1, 1, 1, 0, 0, 0, 0, 0, 0)$, $p_6 = (1, 1, 1, 1, 0, 0, 5, 0, 0, 0)$
- "The degree of overlap of two people's friendship networks correlates with the strength of ties between them" (Mark Granovetter)
- "You shall know a word by the company it keeps" (John Rupert Firth)
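As a hedged illustration, one natural way to quantify this definition (not spelled out on the slide) is the cosine similarity between the neighbor-weight vectors; the vectors below reproduce the slide's example for vertices 5 and 6:

```python
import numpy as np

# Neighbor-weight vectors for vertices 5 and 6 from the example network
p5 = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
p6 = np.array([1, 1, 1, 1, 0, 0, 5, 0, 0, 0], dtype=float)

# Cosine similarity of the neighborhoods: positive because 5 and 6 share
# neighbors 1-4, even though there is no direct 5-6 edge
cosine = p5 @ p6 / (np.linalg.norm(p5) * np.linalg.norm(p6))
print(f"second-order proximity (cosine): {cosine:.3f}")
```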
Preserving the First-order Proximity
- Given an undirected edge $(v_i, v_j)$, the model's joint probability of $v_i, v_j$:
  $p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^T \cdot \vec{u}_j)}$
  where $\vec{u}_i$ is the embedding of vertex $v_i$
- The empirical distribution defined by the weighted edges:
  $\hat{p}_1(v_i, v_j) = \frac{w_{ij}}{\sum_{(i',j')} w_{i'j'}}$
- Objective: KL-divergence between the two distributions
  $O_1 = d(\hat{p}_1(\cdot,\cdot),\, p_1(\cdot,\cdot)) \propto -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)$
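A minimal sketch of these two quantities, assuming a toy weighted edge list and randomly initialized embeddings (all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 1.0)]  # (i, j, w_ij)
u = rng.normal(scale=0.1, size=(3, 4))           # |V| = 3 vertices, d = 4

def p1(i, j):
    # model joint probability of an undirected edge (v_i, v_j)
    return 1.0 / (1.0 + np.exp(-u[i] @ u[j]))

# O1, up to the constant normalizer of the empirical distribution
O1 = -sum(w * np.log(p1(i, j)) for i, j, w in edges)
print(O1)
```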
Preserving the Second-order Proximity
- Given a directed edge $(v_i, v_j)$, the conditional probability of $v_j$ given $v_i$:
  $p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^T \cdot \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}_k'^T \cdot \vec{u}_i)}$
  where $\vec{u}_i$ is the embedding of vertex $i$ as a source node and $\vec{u}_i'$ is its embedding as a target node
- The empirical distribution:
  $\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{\sum_{k \in V} w_{ik}}$
- Objective:
  $O_2 = \sum_{i \in V} \lambda_i \, d(\hat{p}_2(\cdot \mid v_i),\, p_2(\cdot \mid v_i)) \propto -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)$
  where $\lambda_i = \sum_j w_{ij}$ is the prestige of vertex $i$ in the network
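The matching sketch for the second-order model, with separate source and target ("context") embedding tables; again a toy illustration rather than the released implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 1.0)]  # directed (i, j, w_ij)
u = rng.normal(scale=0.1, size=(3, 4))      # source embeddings u_i
u_ctx = rng.normal(scale=0.1, size=(3, 4))  # target embeddings u'_i

def p2(j, i):
    # softmax over all target embeddings, conditioned on source v_i
    scores = u_ctx @ u[i]
    scores -= scores.max()  # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[j]

O2 = -sum(w * np.log(p2(j, i)) for i, j, w in edges)
print(O2)
```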
Preserving Both Proximities
- Concatenate the embeddings individually learned for the two proximities (first-order and second-order)
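The combination step itself is a per-vertex concatenation; a short sketch with placeholder embeddings:

```python
import numpy as np

emb_first = np.random.randn(1000, 64)   # placeholder first-order embeddings
emb_second = np.random.randn(1000, 64)  # placeholder second-order embeddings
emb = np.concatenate([emb_first, emb_second], axis=1)  # final shape: (1000, 128)
```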
Optimization
- Stochastic gradient descent + negative sampling: randomly sample an edge and multiple negative edges
- The gradient w.r.t. the embedding for edge $(i, j)$ is multiplied by the edge weight $w_{ij}$:
  $\frac{\partial O_2}{\partial \vec{u}_i} = w_{ij} \cdot \frac{\partial \log p_2(v_j \mid v_i)}{\partial \vec{u}_i}$
- Problematic when the edge weights diverge: the scale of the gradients across different edges diverges
- Solution: edge sampling — sample edges with probability proportional to their weights, then treat the sampled edges as binary
- Complexity: $O(dK|E|)$ — linear in the dimension $d$, the number of negative samples $K$, and the number of edges $|E|$
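A minimal sketch of one training step under edge sampling with negative sampling, for the second-order objective. This is not the released implementation: the alias-table sampler from the paper is replaced by numpy's weighted choice for brevity, and the uniform negative distribution is a simplifying assumption (the paper draws negatives from a noise distribution proportional to degree^0.75):

```python
import numpy as np

rng = np.random.default_rng(0)
edges = np.array([[0, 1], [1, 2], [0, 2]])
weights = np.array([1.0, 5.0, 1.0])
V, d, K, lr = 3, 4, 2, 0.025

u = rng.normal(scale=0.1, size=(V, d))  # source embeddings
u_ctx = np.zeros((V, d))                # target embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(1000):
    # edge sampling: draw an edge with probability proportional to its
    # weight, then treat it as binary (gradient no longer scaled by w_ij)
    i, j = edges[rng.choice(len(edges), p=weights / weights.sum())]
    grad_i = np.zeros(d)
    # one positive target (label 1) plus K negative targets (label 0)
    targets = [(j, 1.0)] + [(rng.integers(V), 0.0) for _ in range(K)]
    for t, label in targets:
        g = label - sigmoid(u[i] @ u_ctx[t])  # gradient of the log-sigmoid loss
        grad_i += g * u_ctx[t]
        u_ctx[t] += lr * g * u[i]
    u[i] += lr * grad_i
```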
Discussion
- Embedding vertices of small degrees
  - Sparse information in the neighborhood
  - Solution: expand the neighborhood by adding higher-order neighbors, e.g., neighbors of neighbors (breadth-first search); in practice only second-order neighbors are considered
- Embedding new vertices
  - Fix the existing embeddings and optimize w.r.t. the new ones (see the sketch below)
  - Objective: $-\sum_{j \in N(i)} w_{ji} \log p_1(v_j, v_i)$ or $-\sum_{j \in N(i)} w_{ji} \log p_2(v_j \mid v_i)$
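A hedged sketch of the new-vertex case: existing embeddings stay fixed and only the new vertex's embedding is optimized, here by plain gradient ascent on the first-order log-likelihood (names, sizes, and step count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
existing = rng.normal(scale=0.1, size=(5, 4))  # fixed embeddings of old vertices
neighbors = [(0, 1.0), (2, 3.0)]               # (j, w_ji) edges to the new vertex
u_new = rng.normal(scale=0.1, size=4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):
    grad = np.zeros(4)
    for j, w in neighbors:
        # d/du_new of w_ji * log p1(v_j, v_new); existing[j] is held fixed
        grad += w * (1.0 - sigmoid(existing[j] @ u_new)) * existing[j]
    u_new += 0.05 * grad
```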
Unsupervised Text Embedding: Word Analogy
- Entire Wikipedia corpus => word co-occurrence network (~2M words, ~1B edges)

| Algorithm | Semantic (%) | Syntactic (%) | Overall (%) | Running time |
|-----------|--------------|---------------|-------------|--------------|
| GF        | 61.38 | 44.08 | 51.93 | 2.96h |
| DeepWalk  | 50.79 | 37.70 | 43.65 | 16.64h |
| SkipGram  | 69.14 | 57.94 | 63.02 | 2.82h |
| LINE(1st) | 58.08 | 49.42 | 53.35 | 2.44h |
| LINE(2nd) | 73.79 | 59.72 | 66.10 | 2.55h |

- Effectiveness: LINE(2nd) > LINE(1st) > GF > DeepWalk; LINE(2nd) > SkipGram!
- Efficiency: LINE(1st) > LINE(2nd) > SkipGram > GF > DeepWalk
Unsupervised Text Embedding: Nearest Words

| Word | Proximity Type | Top Similar Words |
|------|----------------|-------------------|
| good | 1st | luck, bad, faith, assume, nice |
| good | 2nd | decent, bad, excellent, lousy, reasonable |
| information | 1st | provide, provides, detailed, facts, verifiable |
| information | 2nd | information, ifnormaiton, informations, nonspammy, animecons |
| graph | 1st | graphs, algebraic, finite, symmetric, topology |
| graph | 2nd | graphs, subgraph, matroid, hypergraph, undirected |
| learn | 1st | teach, learned, inform, educate, how |
| learn | 2nd | learned, teach, relearn, learnt, understand |

(Misspellings such as "ifnormaiton" are retrieved neighbors, shown verbatim.)
Unsupervised Text Embedding: Text Classification (Long Documents)
- Word co-occurrence network (w-w) and word-document network (w-d) are used to learn word embeddings
- A document embedding is the average of the word embeddings in the document
- Results on long documents: 20 Newsgroups (20NG), Wikipedia articles, IMDB; all methods are unsupervised embeddings

| Algorithm | 20NG Micro-F1 | 20NG Macro-F1 | Wikipedia Micro-F1 | Wikipedia Macro-F1 | IMDB |
|-----------|---------------|---------------|--------------------|--------------------|------|
| SkipGram | 70.62 | 68.99 | 75.80 | 75.77 | 85.34 |
| PV | 75.13 | 73.48 | 76.68 | 76.75 | 86.76 |
| LINE(w-w) | 72.78 | 70.95 | 77.72 |  | 86.16 |
| LINE(w-d) | 79.73 | 78.40 | 80.14 | 80.13 | 89.14 |
| LINE(w-w + w-d) | 78.74 | 77.39 | 79.91 | 79.94 | 89.07 |

- LINE(w-w) > SkipGram (Google); LINE(w-d) > PV (Google); LINE(w-d) > LINE(w-w)
Unsupervised Text Embedding: Text Classification (Short Documents)
- Same setup: word embeddings learned from the w-w and w-d networks; document embedding as the average word embedding (see the sketch below)
- Results on short documents: DBLP paper titles (DBLP), movie reviews (MR), tweets (Twitter)

| Algorithm | DBLP Micro-F1 | DBLP Macro-F1 | MR Micro-F1 | MR Macro-F1 | Twitter Micro-F1 | Twitter Macro-F1 |
|-----------|---------------|---------------|-------------|-------------|------------------|------------------|
| SkipGram | 73.08 | 68.92 | 67.05 |  | 73.02 | 73.00 |
| PV | 67.19 | 62.46 | 67.78 |  | 71.29 | 71.18 |
| LINE(w-w) | 73.98 | 69.92 | 71.07 | 71.06 | 73.19 | 73.18 |
| LINE(w-d) | 71.50 | 67.23 | 69.25 | 69.24 |  |  |
| LINE(w-w + w-d) | 74.22 | 70.12 | 71.13 | 71.12 | 73.84 |  |

- LINE(w-w) > SkipGram; LINE(w-d) > PV; LINE(w-w) > LINE(w-d)
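The document representation used on the last two slides is just the average of the learned word embeddings; a minimal sketch with a hypothetical embedding lookup:

```python
import numpy as np

# hypothetical lookup from words to learned LINE embeddings
word_emb = {"deep": np.array([0.1, 0.2]), "learning": np.array([0.3, 0.1])}

def doc_embedding(tokens, emb, dim=2):
    vecs = [emb[t] for t in tokens if t in emb]  # skip out-of-vocabulary words
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

print(doc_embedding("deep learning".split(), word_emb))
```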
Predictive Text Embedding: Long Documents
- Predictive text embedding through embedding a heterogeneous text network: word co-occurrence (w-w), word-document (w-d), and word-label (w-l) networks

| Type | Algorithm | 20NG Micro-F1 | 20NG Macro-F1 | Wikipedia Micro-F1 | Wikipedia Macro-F1 | IMDB |
|------|-----------|---------------|---------------|--------------------|--------------------|------|
| Unsupervised | LINE(w-d) | 79.73 | 78.40 | 80.14 | 80.13 | 89.14 |
| Predictive | CNN | 80.15 | 79.43 | 79.25 | 79.32 | 89.00 |
| Predictive | LINE(w-l) | 82.70 | 81.97 | 79.00 | 79.02 | 85.98 |
| Predictive | LINE(ALL) | 84.20 | 83.39 | 82.51 | 82.49 | 89.80 |

- LINE(ALL) > CNN
Predictive Text Embedding: Short Documents
- Same heterogeneous text network setup (w-w, w-d, w-l); results on short documents

| Type | Algorithm | DBLP Micro-F1 | DBLP Macro-F1 | MR Micro-F1 | MR Macro-F1 | Twitter Micro-F1 | Twitter Macro-F1 |
|------|-----------|---------------|---------------|-------------|-------------|------------------|------------------|
| Unsupervised | LINE(w-w + w-d) | 74.22 | 70.12 | 71.13 | 71.12 | 73.84 |  |
| Predictive | CNN | 76.16 | 73.08 | 72.71 | 72.69 | 75.97 | 75.96 |
| Predictive | LINE(w-l) | 76.45 | 72.74 | 73.44 | 73.42 | 73.92 | 73.91 |
| Predictive | LINE(ALL) | 77.15 | 73.61 | 73.58 | 73.57 | 75.21 |  |

- LINE(ALL) ≈ CNN
Document Visualization
- [Figure: 2D visualizations of document embeddings; panels: Train(LINE(l-w)), Train(LINE(d-w)), Test(LINE(l-w)), Test(LINE(d-w))]
Social Network Embedding: Node Classification
- Communities as the ground truth; columns = percentage of labeled training data; ** marks significant improvements

| Algorithm | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|-----------|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| GF | 53.23 | 53.68 | 53.98 | 54.14 | 54.32 | 54.38 | 54.43 | 54.50 | 54.48 |
| DeepWalk | 60.38 | 60.77 | 60.90 | 61.05 | 61.13 | 61.18 | 61.19 | 61.29 | 61.22 |
| DeepWalk(256 dim) | 60.41 | 61.09 | 61.35 | 61.52 | 61.69 | 61.76 | 61.80 | 61.91 | 61.83 |
| LINE(1st) | 63.27 | 63.69 | 63.82 | 63.92 | 63.96 | 64.03 | 64.06 | 64.17 | 64.10 |
| LINE(2nd) | 62.83 | 63.24 | 63.34 | 63.44 | 63.55 | 63.59 | 63.66 |  |  |
| LINE(1st+2nd) | 63.20** | 63.97** | 64.25** | 64.39** | 64.53** | 64.55** | 64.61** | 64.75** | 64.74** |

- LINE(1st+2nd) > LINE(1st) > LINE(2nd) > DeepWalk > GF
Author Citation Network: Author Classification
- Columns = percentage of labeled training data; values in parentheses were obtained on the network reconstructed with second-order neighbors (cf. the Discussion slide)

| Algorithm | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|-----------|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| DeepWalk | 63.98 | 64.51 | 64.75 | 64.81 | 64.92 | 64.99 | 65.00 | 64.90 |  |
| LINE-SGD(2nd) | 56.64 | 58.95 | 59.89 | 60.20 | 60.44 | 60.61 | 60.58 | 60.73 | 60.59 |
| LINE(2nd) | 62.49 (64.69**) | 63.30 (65.47**) | 63.63 (65.85**) | 63.77 (66.04**) | 63.84 (66.19**) | 63.94 (66.25**) | 63.96 (66.30**) | 64.00 (66.12**) | (66.06**) |

- LINE(2nd) > DeepWalk > LINE-SGD(2nd)
Paper Citation Network: Paper Classification
- Parenthesized values as on the previous slide

| Algorithm | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|-----------|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| DeepWalk | 52.83 | 53.80 | 54.34 | 54.75 | 55.07 | 55.13 | 55.48 | 55.42 | 55.90 |
| LINE(2nd) | 58.42 (60.10**) | 59.58 (61.06**) | 60.29 (61.46**) | 60.78 (61.73**) | 60.94 (61.85**) | 61.20 (62.10**) | 61.39 (62.21**) | (62.25**) | 61.79 (62.80**) |

- LINE(2nd) > DeepWalk
Network Layouts
- Coauthor network: 18,561 authors and 207,074 edges; communities: "data mining", "machine learning", "computer vision"
- [Figure: layouts produced by (a) graph factorization, (b) DeepWalk, (c) LINE(2nd)]
Scalability
- [Figure: (a) speed-up vs. #threads; (b) Micro-F1 vs. #threads]
Take-aways
- Deep learning for networks!
- LINE: a large-scale network embedding model
  - Preserves both the first-order and second-order proximity
  - General and scalable
- Useful in many applications:
  - Outperforms the unsupervised word embedding algorithm SkipGram
  - Outperforms the unsupervised document embedding algorithm ParagraphVec
  - Outperforms the supervised document embedding approach (CNN) on long documents
  - State-of-the-art performance in social and citation network embedding
Thanks!
- Open source: https://github.com/tangjianpku/LINE
- Jian Tang, jiatang@microsoft.com
- Thanks for your attention!