LINE: Large-scale Information Network Embedding


1 LINE: Large-scale Information Network Embedding
Jian Tang, Microsoft Research Asia. Acknowledgements: Meng Qu, Mingzhe Wang, Qiaozhu Mei, Ming Zhang

2 Ubiquitous Large-scale Information Networks
Examples: social networks, the World Wide Web, the Internet of Things (IoT), citation networks. Real-world networks are very large, e.g., the Facebook social network has ~1 billion users and the WWW has ~50 billion webpages. Analyzing such large-scale networks is very challenging: they are sparse and high-dimensional.

3 Deep Learning is Very Successful in Many Domains
[Diagram: deep learning connected to the image, speech, natural language, and network domains]

4 Deep Learning for Network Embedding
Network + deep learning: from a sparse, high-dimensional representation to a dense, low-dimensional one. Potentially useful in many domains and applications: text embedding, link prediction, ranking, recommendation, node classification, network visualization.

5 Natural Language/Text
Unsupervised embedding of text (e.g., words and documents) for text representation, e.g., word and document representations. Free text such as
"Deep learning has been attracting increasing attention …"
"A future direction of deep learning is to integrate unlabeled data …"
"The Skip-gram model is quite effective and efficient …"
"Information networks encode the relationships between the data objects …"
is converted into a word co-occurrence network (word-word edges) and a word-document network (word-document edges); a construction sketch follows below.
[Figure: free text transformed into a word co-occurrence network and a word-document network]
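As an illustration of how these two text networks can be built, a minimal sketch (the window size of 5 and whitespace tokenization are assumptions; the slide does not specify them):

```python
from collections import Counter

def build_text_networks(docs, window=5):
    """Build a word co-occurrence network (w-w) and a word-document
    network (w-d) from raw documents. Edge weights are co-occurrence
    counts within a sliding window and term frequencies, respectively."""
    ww_edges = Counter()  # (word_a, word_b) -> co-occurrence weight
    wd_edges = Counter()  # (word, doc_id)   -> term-frequency weight
    for doc_id, doc in enumerate(docs):
        tokens = doc.lower().split()
        for i, w in enumerate(tokens):
            wd_edges[(w, doc_id)] += 1
            for v in tokens[i + 1 : i + window]:      # words inside the window
                ww_edges[tuple(sorted((w, v)))] += 1  # undirected w-w edge
    return ww_edges, wd_edges

docs = ["deep learning has been attracting increasing attention",
        "information networks encode the relationships between the data objects"]
ww, wd = build_text_networks(docs)
```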

6 Natural Language/Text
Predictive embedding of text (e.g., words and documents): when documents carry labels, the same free text additionally yields a word-label network that connects words to the class labels of the documents in which they occur.
[Figure: free text plus document labels transformed into word co-occurrence, word-document, and word-label networks]

7 Social Network
User embedding: friend recommendation and user classification.

8 Academic Network
Author, paper, and venue embedding: recommend related authors, papers, and venues; author, paper, and venue classification.
[Figure: heterogeneous academic network of author, paper, and venue nodes]

9 Enterprise Network
People, document, and project embedding: recommend related people, documents, and projects; people, document, and project classification.

10 Related Work
Classical graph embedding algorithms (MDS, IsoMap, LLE, Laplacian Eigenmaps, etc.): hard to scale up.
Graph factorization (Ahmed et al., 2013): not specifically designed for network embedding; usually for undirected graphs.
DeepWalk (Perozzi et al., 2014): lacks a clear objective function; only designed for networks with binary edges.

11 Our Approach: LINE
Applicable to various types of networks: directed, undirected, and/or weighted.
Has a clear objective function: preserves the first-order and second-order proximity between vertices.
Very scalable: an effective and efficient optimization algorithm based on asynchronous stochastic gradient descent; it takes only a couple of hours to embed a network with millions of nodes and billions of edges on a single machine.

12 What LINE has Done so Far
Unsupervised text embedding (Tang et al., WWW'15): outperforms Skip-gram by embedding the word co-occurrence network; outperforms ParagraphVEC by embedding the word-document network.
Predictive text embedding (Tang et al., KDD'15): outperforms CNN on long documents and is comparable on short documents; more scalable than CNN; has few parameters to tune.
Social & citation network embedding (Tang et al., WWW'15): outperforms DeepWalk and graph factorization.
Tang et al. LINE: Large-scale Information Network Embedding. WWW'15.
Tang et al. PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks. KDD'15.

13 First-order Proximity
The local pairwise proximity between vertices, determined by the observed links. However, many links between vertices are missing, so first-order proximity alone is not sufficient for preserving the entire network structure.
[Figure: example network with vertices 1-10; vertices 6 and 7, which are directly connected, have a large first-order proximity]

14 Second-order Proximity
The proximity between the neighborhood structures of the vertices. Mathematically, the second-order proximity between a pair of vertices $(u, v)$ is determined by their neighborhood weight vectors

$p_u = (w_{u1}, w_{u2}, \dots, w_{u|V|})$ and $p_v = (w_{v1}, w_{v2}, \dots, w_{v|V|})$.

In the example network, vertices 5 and 6 have a large second-order proximity: $p_5 = (1,1,1,1,0,0,0,0,0,0)$ and $p_6 = (1,1,1,1,0,0,5,0,0,0)$ overlap on most of their neighbors (a similarity sketch follows below).

"The degree of overlap of two people's friendship networks correlates with the strength of ties between them" --Mark Granovetter
"You shall know a word by the company it keeps" --John Rupert Firth
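To make this concrete, a minimal sketch comparing the two neighborhood vectors from the example (cosine similarity is one natural choice; the slide does not fix a similarity measure):

```python
import numpy as np

# Neighborhood weight vectors from the slide's 10-vertex example:
# p_u = (w_u1, ..., w_u|V|) lists u's edge weight to every vertex.
p5 = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
p6 = np.array([1, 1, 1, 1, 0, 0, 5, 0, 0, 0], dtype=float)

# Vertices 5 and 6 share neighbors 1-4, so their neighborhood vectors
# point in similar directions -> a large second-order proximity.
cos = p5 @ p6 / (np.linalg.norm(p5) * np.linalg.norm(p6))
print(f"neighborhood cosine similarity: {cos:.3f}")  # ~0.371
```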

15 Preserving the First-order Proximity
Given an undirected edge $(v_i, v_j)$, the joint probability of $v_i, v_j$ is

$p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\,T} \cdot \vec{u}_j)}$,

where $\vec{u}_i$ is the embedding of vertex $v_i$. The empirical probability is

$\hat{p}_1(v_i, v_j) = \frac{w_{ij}}{\sum_{(i',j') \in E} w_{i'j'}}$.

Objective: the KL-divergence between the empirical and model distributions,

$O_1 = d(\hat{p}_1(\cdot,\cdot),\ p_1(\cdot,\cdot)) \propto -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)$.

A sketch of these quantities follows below.
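A direct NumPy sketch of the model probability and objective (vertex embeddings as rows of a matrix U; all names are illustrative):

```python
import numpy as np

def first_order_loss(U, edges, weights):
    """O_1, up to a constant: -sum_{(i,j) in E} w_ij * log p_1(v_i, v_j),
    with p_1(v_i, v_j) = sigmoid(u_i . u_j)."""
    loss = 0.0
    for (i, j), w in zip(edges, weights):
        p1 = 1.0 / (1.0 + np.exp(-U[i] @ U[j]))  # joint probability
        loss -= w * np.log(p1)
    return loss

# Toy usage: 4 vertices in 2 dimensions, two weighted undirected edges.
U = np.random.randn(4, 2)
print(first_order_loss(U, edges=[(0, 1), (2, 3)], weights=[1.0, 2.0]))
```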

16 Preserving the Second-order Proximity
Given a directed edge $(v_i, v_j)$, the conditional probability of $v_j$ given $v_i$ is

$p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^{\,T} \cdot \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}_k'^{\,T} \cdot \vec{u}_i)}$,

where $\vec{u}_i$ is the embedding of vertex $i$ when $i$ is a source node and $\vec{u}_i'$ is its embedding when $i$ is a target node. The empirical probability is

$\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{\sum_{k \in V} w_{ik}}$.

Objective:

$O_2 = \sum_{i \in V} \lambda_i\, d(\hat{p}_2(\cdot \mid v_i),\ p_2(\cdot \mid v_i)) \propto -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)$,

where $\lambda_i = \sum_j w_{ij}$ is the prestige of vertex $i$ in the network. A sketch follows below.
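The same kind of sketch for the second-order objective; note that the full softmax costs O(|V|) per edge, which is what the negative-sampling optimization two slides below avoids:

```python
import numpy as np

def second_order_loss(U, U_ctx, edges, weights):
    """O_2, up to a constant: -sum_{(i,j) in E} w_ij * log p_2(v_j | v_i),
    where p_2 is a softmax over the target ("context") embeddings U_ctx."""
    loss = 0.0
    for (i, j), w in zip(edges, weights):
        scores = U_ctx @ U[i]                          # u'_k . u_i for every k
        log_p2 = scores[j] - np.log(np.exp(scores).sum())
        loss -= w * log_p2
    return loss
```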

17 Preserving Both Proximities
Concatenate the embeddings individually learned for the two proximities (first-order and second-order); a sketch follows below.
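A minimal sketch of this combination step (the per-view L2 normalization is an assumption; the slide only says "concatenate"):

```python
import numpy as np

def combine(U1, U2):
    """Concatenate first-order (U1) and second-order (U2) embeddings,
    both |V| x d, into one |V| x 2d representation."""
    U1 = U1 / np.linalg.norm(U1, axis=1, keepdims=True)  # normalize each view
    U2 = U2 / np.linalg.norm(U2, axis=1, keepdims=True)  # so neither dominates
    return np.hstack([U1, U2])
```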

18 Optimization
Stochastic gradient descent + negative sampling: randomly sample an edge and multiple negative edges. For an edge $(i, j)$, the gradient w.r.t. the embedding is multiplied by the edge weight $w_{ij}$:

$\frac{\partial O_2}{\partial \vec{u}_i} = w_{ij} \cdot \frac{\partial \log p_2(v_j \mid v_i)}{\partial \vec{u}_i}$

This is problematic when the edge weights diverge, because the scale of the gradients then diverges across edges. Solution: edge sampling. Sample the edges with probability proportional to their weights and treat the sampled edges as binary (a sketch follows below).

Complexity: $O(dK|E|)$, linear in the dimension $d$, the number of negative samples $K$, and the number of edges $|E|$.
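A compact sketch of one training step under edge sampling with K negative samples. NumPy's weighted choice stands in for the alias table the paper uses for O(1) sampling, and negatives are drawn uniformly here rather than from the paper's noise distribution:

```python
import numpy as np

def train_step(U, U_ctx, edges, probs, K=5, lr=0.025, rng=np.random):
    """One edge-sampling SGD step for the second-order objective: draw
    an edge with probability proportional to its weight (probs), treat
    it as binary, and update with K negative samples, so gradient
    scales no longer depend on the raw edge weights."""
    i, j = edges[rng.choice(len(edges), p=probs)]   # weighted edge sampling
    grad_i = np.zeros_like(U[i])
    pairs = [(j, 1.0)] + [(rng.randint(len(U_ctx)), 0.0) for _ in range(K)]
    for k, label in pairs:                          # positive + negatives
        sig = 1.0 / (1.0 + np.exp(-U_ctx[k] @ U[i]))
        grad_i += (sig - label) * U_ctx[k]          # d(-log-lik)/d u_i
        U_ctx[k] -= lr * (sig - label) * U[i]       # update target embedding
    U[i] -= lr * grad_i                             # update source embedding

# probs: edge weights normalized to sum to 1, e.g.
# probs = np.asarray(weights, float); probs /= probs.sum()
```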

19 Discussion
Embedding vertices of small degree: the neighborhood carries only sparse information. Solution: expand the neighborhood by adding higher-order neighbors, e.g., neighbors of neighbors (breadth-first search); in practice only the second-order neighbors are considered.
Embedding new vertices: fix the existing embeddings and optimize w.r.t. the new ones, minimizing

$-\sum_{j \in N(i)} w_{ji} \log p_1(v_j, v_i)$ or $-\sum_{j \in N(i)} w_{ji} \log p_2(v_j \mid v_i)$.

A sketch for new vertices follows below.
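A sketch of embedding a new vertex under the first-order variant of this objective, holding all existing embeddings fixed (plain gradient descent; the step count and learning rate are illustrative):

```python
import numpy as np

def embed_new_vertex(U, neighbors, weights, steps=200, lr=0.05):
    """Embed a new vertex with all existing embeddings U frozen, by
    gradient descent on -sum_j w_ji * log sigmoid(u_j . u_new)."""
    u_new = 0.01 * np.random.randn(U.shape[1])
    for _ in range(steps):
        grad = np.zeros(U.shape[1])
        for j, w in zip(neighbors, weights):
            sig = 1.0 / (1.0 + np.exp(-U[j] @ u_new))
            grad += w * (sig - 1.0) * U[j]  # gradient of -w * log(sigmoid)
        u_new -= lr * grad                  # only the new vertex moves
    return u_new
```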

20 Unsupervised Text Embedding
Word analogy. Entire Wikipedia articles => word co-occurrence network (~2M words, 1B edges).

Algorithm    Semantic (%)  Syntactic (%)  Overall (%)  Running time
GF           61.38         44.08          51.93        2.96h
DeepWalk     50.79         37.70          43.65        16.64h
Skip-gram    69.14         57.94          63.02        2.82h
LINE(1st)    58.08         49.42          53.35        2.44h
LINE(2nd)    73.79         59.72          66.10        2.55h

Effectiveness: LINE(2nd) > LINE(1st) > GF > DeepWalk, and LINE(2nd) > Skip-gram!
Efficiency: LINE(1st) > LINE(2nd) > Skip-gram > GF > DeepWalk.


22 Unsupervised Text Embedding
Examples of nearest words:

Word         Proximity  Top similar words
good         1st        luck, bad, faith, assume, nice
good         2nd        decent, bad, excellent, lousy, reasonable
information  1st        provide, provides, detailed, facts, verifiable
information  2nd        information, ifnormaiton, informations, nonspammy, animecons
graph        1st        graphs, algebraic, finite, symmetric, topology
graph        2nd        graphs, subgraph, matroid, hypergraph, undirected
learn        1st        teach, learned, inform, educate, how
learn        2nd        learned, teach, relearn, learnt, understand

23 Unsupervised Text Embedding
Text classification: the word co-occurrence network (w-w) and the word-document network (w-d) are used to learn word embeddings; a document embedding is the average of the word embeddings in the document (see the sketch below). Results on long documents: 20 Newsgroups (20NG), Wikipedia articles, IMDB.

                   20NG               Wikipedia          IMDB
Algorithm          Micro-F1  Macro-F1  Micro-F1  Macro-F1  Micro-F1
Skip-gram          70.62     68.99     75.80     75.77     85.34
PV                 75.13     73.48     76.68     76.75     86.76
LINE(w-w)          72.78     70.95     77.72     …         86.16
LINE(w-d)          79.73     78.40     80.14     80.13     89.14
LINE(w-w + w-d)    78.74     77.39     79.91     79.94     89.07

LINE(w-w) > Skip-gram (Google); LINE(w-d) > PV (Google); LINE(w-d) > LINE(w-w).
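A one-function sketch of that averaging step (word_vecs is a hypothetical mapping from a word to its learned LINE vector):

```python
import numpy as np

def doc_embedding(tokens, word_vecs):
    """A document's embedding is the average of its word embeddings;
    out-of-vocabulary words are skipped."""
    vecs = [word_vecs[w] for w in tokens if w in word_vecs]
    return np.mean(vecs, axis=0) if vecs else None
```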

24 Unsupervised Text Embedding
Text classification, same setup as above. Results on short documents: DBLP paper titles (DBLP), movie reviews (MR), and tweets (Twitter).

                   DBLP               MR                 Twitter
Algorithm          Micro-F1  Macro-F1  Micro-F1  Macro-F1  Micro-F1  Macro-F1
Skip-gram          73.08     68.92     67.05     …         73.02     73.00
PV                 67.19     62.46     67.78     …         71.29     71.18
LINE(w-w)          73.98     69.92     71.07     71.06     73.19     73.18
LINE(w-d)          71.50     67.23     69.25     69.24     …         …
LINE(w-w + w-d)    74.22     70.12     71.13     71.12     73.84     …

LINE(w-w) > Skip-gram; LINE(w-d) > PV; LINE(w-w) > LINE(w-d).

25 Predictive Text Embedding
Predictive text embedding through embedding the heterogeneous text network: word co-occurrence network (w-w), word-document network (w-d), and word-label network (w-l). Results on long documents:

                           20NG               Wikipedia          IMDB
Type          Algorithm    Micro-F1  Macro-F1  Micro-F1  Macro-F1  Micro-F1
Unsupervised  LINE(w-d)    79.73     78.40     80.14     80.13     89.14
Predictive    CNN          80.15     79.43     79.25     79.32     89.00
Predictive    LINE(w-l)    82.70     81.97     79.00     79.02     85.98
Predictive    LINE(ALL)    84.20     83.39     82.51     82.49     89.80

LINE(ALL) > CNN.

26 Predictive Text Embedding
Same setup as above. Results on short documents:

                                 DBLP               MR                 Twitter
Type          Algorithm          Micro-F1  Macro-F1  Micro-F1  Macro-F1  Micro-F1  Macro-F1
Unsupervised  LINE(w-w + w-d)    74.22     70.12     71.13     71.12     73.84     …
Predictive    CNN                76.16     73.08     72.71     72.69     75.97     75.96
Predictive    LINE(w-l)          76.45     72.74     73.44     73.42     73.92     73.91
Predictive    LINE(ALL)          77.15     73.61     73.58     73.57     75.21     …

LINE(ALL) ≈ CNN.

27 Document Visualization
[Figure: four panels, Train(LINE(l-w)), Train(LINE(d-w)), Test(LINE(l-w)), Test(LINE(d-w)): 2-D visualizations of training and test documents embedded with the word-label vs. word-document networks]

28 Social Network Embedding
Node classification, with communities as the ground truth. Results with 10%-90% of the nodes labeled:

Algorithm         10%      20%      30%      40%      50%      60%      70%      80%      90%
GF                53.23    53.68    53.98    54.14    54.32    54.38    54.43    54.50    54.48
DeepWalk          60.38    60.77    60.90    61.05    61.13    61.18    61.19    61.29    61.22
DeepWalk(256dim)  60.41    61.09    61.35    61.52    61.69    61.76    61.80    61.91    61.83
LINE(1st)         63.27    63.69    63.82    63.92    63.96    64.03    64.06    64.17    64.10
LINE(2nd)         62.83    63.24    63.34    63.44    63.55    63.59    63.66    …        …
LINE(1st+2nd)     63.20**  63.97**  64.25**  64.39**  64.53**  64.55**  64.61**  64.75**  64.74**

LINE(1st+2nd) > LINE(1st) > LINE(2nd) > DeepWalk > GF.

29 Author Citation Network
Author classification:

Algorithm      10%              20%              30%              40%              50%              60%              70%              80%              90%
DeepWalk       63.98            64.51            64.75            64.81            64.92            64.99            65.00            64.90            …
LINE-SGD(2nd)  56.64            58.95            59.89            60.20            60.44            60.61            60.58            60.73            60.59
LINE(2nd)      62.49 (64.69**)  63.30 (65.47**)  63.63 (65.85**)  63.77 (66.04**)  63.84 (66.19**)  63.94 (66.25**)  63.96 (66.30**)  64.00 (66.12**)  … (66.06**)

LINE(2nd) > DeepWalk > LINE-SGD(2nd).

30 Paper Citation Network
Paper classification:

Algorithm   10%              20%              30%              40%              50%              60%              70%              80%              90%
DeepWalk    52.83            53.80            54.34            54.75            55.07            55.13            55.48            55.42            55.90
LINE(2nd)   58.42 (60.10**)  59.58 (61.06**)  60.29 (61.46**)  60.78 (61.73**)  60.94 (61.85**)  61.20 (62.10**)  61.39 (62.21**)  … (62.25**)      61.79 (62.80**)

LINE(2nd) > DeepWalk.

31 Network Layouts
Coauthor network with 18,561 authors and 207,074 edges.
[Figure: layouts of the coauthor network, colored by community ("data mining", "machine learning", "computer vision"): (a) graph factorization, (b) DeepWalk, (c) LINE(2nd)]

32 Scalability
[Figure: (a) speed-up vs. #threads; (b) Micro-F1 vs. #threads]

33 Take Away: Deep Learning for Networks!
LINE is a large-scale network embedding model that preserves the first-order and second-order proximity; it is general, scalable, and useful in many applications:
Outperforms the unsupervised word embedding algorithm Skip-gram.
Outperforms the unsupervised document embedding algorithm ParagraphVEC.
Outperforms the supervised document embedding approach CNN on long documents.
State-of-the-art performance in social & citation network embedding.

34 Thanks!
Open source: the code of LINE is available at https://github.com/tangjianpku/LINE
Jian Tang. Thanks for your attention!

