Variational Knowledge Graph Reasoning


1 Variational Knowledge Graph Reasoning
Wenhu Chen, Wenhan Xiong, Xifeng Yan, William Wang
Department of Computer Science, UC Santa Barbara

2 Outline
Introduction to Knowledge Graph Completion
Reinterpret the problem as a generative model
How to resolve the new intractable objective using variational inference
Experimental Results and Conclusion

3 Knowledge Graph
[Figure: example knowledge graph. Entities: Caesars Entertain…, Las Vegas, CA, United States, English, Neal McDonough, Tom Hanks, Band of Brothers. Relations: serviceLanguage, serviceLocation, personLanguages, nationality, castActor, awardWinner, countryOfOrigin.]

4 Knowledge Graph Completion
Query: ?(Band of Brothers, English)
[Figure: knowledge graph around the query pair. Entities: Caesars Entertain, United States, Band of Brothers, Neal McDonough, English. Relations: serviceLocation, serviceLanguage, countryOfOrigin, castActor, personLanguages.]

5 Problem Formulation
During training, we intentionally mask some relations as missing links and use them as training triples:
$D_{train} = (e_s, e_d, r)$, $KB = (head_i, tail_i, rel_i)$
During testing, we are interested in filling the relation slot given an entity pair:
$D_{test} = (e_s, e_d, ?)$, $KB = (head_i, tail_i, rel_i)$
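The masking step can be sketched in a few lines of Python. This is only an illustration: the triples below are a toy knowledge base assumed for the example (loosely based on the entities on the earlier slides), and `mask_relation` is a hypothetical helper, not part of the authors' code.

```python
# Toy knowledge base of (head, relation, tail) triples.
# These triples are illustrative assumptions, not the real dataset.
KB = [
    ("Band of Brothers", "castActor", "Neal McDonough"),
    ("Neal McDonough", "personLanguages", "English"),
    ("Band of Brothers", "countryOfOrigin", "United States"),
    ("Band of Brothers", "tvProgramLanguage", "English"),
]


def mask_relation(kb, target):
    """Hide every `target` link: return (remaining graph, masked queries)."""
    graph = [t for t in kb if t[1] != target]
    # Queries follow the slide's (e_s, e_d, r) ordering.
    queries = [(h, t, r) for (h, r, t) in kb if r == target]
    return graph, queries


graph, train_queries = mask_relation(KB, "tvProgramLanguage")
# At test time the relation slot is unknown, so only the entity pair is given.
test_queries = [(e_s, e_d, None) for (e_s, e_d, _) in train_queries]
```

The remaining `graph` plays the role of KB, while the masked links supply the $(e_s, e_d, r)$ training triples.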

6 Existing KGC methods
Embedding-based methods (fast and efficient)
TransE, Bordes et al. 2013
TransR/CTransR, Lin et al. 2015
DistMult, Yang et al. 2015
ComplEx, Trouillon et al. 2016
Path-based methods (accurate and explainable)
Path-Ranking Algorithm (PRA), Lao et al. 2011
Compositional Vector, Neelakantan et al. 2015
DeepPath, Xiong et al. 2017
Chains of Reasoning, Das et al. 2017
MINERVA, Das et al. 2018

7 KGC from a generative perspective
[Figure: graphical model with legend "KG Condition", "Observed Variable", "Latent Variable". Nodes: $e_s$ (Band of Brothers), $e_d$ (English), the latent path $L$ drawn from $p(L \mid e_s, e_d)$, and the relation $r$ (tvProgramLanguage) generated by $p(r \mid L)$.]
$r^* = \arg\max_r p(r \mid e_s, e_d) = \arg\max_r \log \sum_L p(r \mid L)\, p(L \mid e_s, e_d)$
where prior: $p(L \mid e_s, e_d)$, and likelihood: $p(r \mid L)$
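To make the marginalization over latent paths concrete, here is a minimal Python sketch. All numbers are made up for illustration, the path names `L1`..`L3` are placeholders, and the two relations are simply borrowed from the earlier slides; in a real graph the sum over $L$ is intractable, which is what motivates the variational treatment.

```python
import math

# Hypothetical prior over three candidate paths: p(L | e_s, e_d).
prior = {"L1": 0.5, "L2": 0.3, "L3": 0.2}

# Hypothetical likelihood of each relation given a path: p(r | L).
likelihood = {
    "L1": {"tvProgramLanguage": 0.8, "countryOfOrigin": 0.2},
    "L2": {"tvProgramLanguage": 0.4, "countryOfOrigin": 0.6},
    "L3": {"tvProgramLanguage": 0.1, "countryOfOrigin": 0.9},
}


def log_marginal(relation):
    """log sum_L p(r | L) * p(L | e_s, e_d) for one candidate relation."""
    return math.log(sum(prior[L] * likelihood[L][relation] for L in prior))


def predict(relations):
    """Pick the relation with the highest marginal log-probability."""
    return max(relations, key=log_marginal)


best = predict(["tvProgramLanguage", "countryOfOrigin"])
```

With these toy numbers the marginal of `tvProgramLanguage` is 0.5·0.8 + 0.3·0.4 + 0.2·0.1 = 0.54, so it is selected.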

8 Variational Inference
Variational Bayesian methods: optimizing intractable integrals:
$\log p(x) = \log \int p(x \mid z)\, p(z)\, dz$
Maximize the ELBO as a surrogate objective.
[Figure: $\log p(x)$ split into the Evidence Lower Bound (ELBO) and a KL-divergence $\geq 0$.]
DM Blei et al. 2016
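The bound follows in one step from Jensen's inequality. The derivation below is the standard textbook sketch; the variational distribution $q(z)$ is notation assumed here, since the slide does not name it.

```latex
\begin{align*}
\log p(x) &= \log \int p(x \mid z)\, p(z)\, dz
           = \log \mathbb{E}_{q(z)}\!\left[\frac{p(x \mid z)\, p(z)}{q(z)}\right] \\
          &\geq \mathbb{E}_{q(z)}\!\left[\log \frac{p(x \mid z)\, p(z)}{q(z)}\right]
           = \underbrace{\mathbb{E}_{q(z)}[\log p(x \mid z)]
             - \mathrm{KL}\big(q(z) \,\|\, p(z)\big)}_{\text{ELBO}} \\
\log p(x) - \text{ELBO} &= \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big) \geq 0
\end{align*}
```

The last line shows why maximizing the ELBO is a reasonable surrogate: the gap to $\log p(x)$ is a KL-divergence, which is never negative.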

9 Variational Auto-Encoder (VAE)
The Variational Auto-Encoder provides an efficient and practical way to perform variational inference.
[Figure: encoder and decoder.]
DP Kingma et al. 2013

10 Challenge of VAE in KG
Existing VAE methods only consider continuous latent vectors:
NLP applications:
Machine translation (Biao et al. 2016)
Text generation (K Guu et al. 2017)
Dialogue generation (TH Wen et al. 2017)
CV applications:
Image classification (DP Kingma et al. 2013)
Image captioning (Liwei et al. 2017)
Visual question generation (Unnat et al. 2017)
We are tackling sequential discrete variables.

11 KG Variational Inference (KG-VI)
No re-parameterization for $q_\varphi(L \mid e_s, e_d, r)$
Our prior distribution $p_\beta(L \mid e_s, e_d)$ is trainable
We view the sampling of the latent variable as a Markov Decision Process
[Figure: a path unrolled from $e_s$ through $e_1, \dots, e_\tau, e_{\tau+1}, e_{\tau+2}$, with actions $a_\tau, a_{\tau+1}, a_{\tau+2}$ selecting the next entity.]
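The Markov Decision Process view can be illustrated with a small random walk: the state is the current entity, an action is one outgoing edge, and a path is the sequence of chosen edges. This is a sketch under loud assumptions: the edge list is a toy graph invented for the example, and the policy is uniform, whereas the prior and posterior in the talk are learned distributions.

```python
import random

# Toy adjacency list: entity -> list of (relation, next_entity) actions.
# Illustrative assumption, not the real knowledge graph.
EDGES = {
    "Band of Brothers": [("castActor", "Neal McDonough"),
                         ("countryOfOrigin", "United States")],
    "Neal McDonough": [("personLanguages", "English"),
                       ("nationality", "United States")],
    "United States": [],
    "English": [],
}


def sample_path(e_s, max_steps, rng):
    """Roll out one path from e_s, choosing uniformly among outgoing edges."""
    path, entity = [], e_s
    for _ in range(max_steps):
        actions = EDGES.get(entity, [])
        if not actions:  # dead end: no action available in this state
            break
        relation, entity = rng.choice(actions)
        path.append((relation, entity))
    return path


def reaches(path, e_d):
    """True if the sampled path ends at the target entity e_d."""
    return bool(path) and path[-1][1] == e_d


rng = random.Random(0)
paths = [sample_path("Band of Brothers", 2, rng) for _ in range(200)]
connected = [p for p in paths if reaches(p, "English")]
```

Only the paths in `connected` link the entity pair; a learned policy would concentrate its probability mass on such paths instead of sampling uniformly.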

12 KG Variational Inference (KG-VI)
We view the likelihood $p_\theta(r \mid L)$ as a sequence classification model.
[Figure: paths $e_s \to e_1 \to e_d$, $e_s \to e_2 \to e_d$ and $e_s \to e_3 \to e_d$ are fed to a CNN/RNN with a softmax over relation 1, relation 2, relation 3 and n/a.]

13 Evidence Lower Bound
ELBO = Reconstruction term − KL-divergence
$\log p(r \mid e_s, e_d) \geq ELBO = \mathbb{E}_{q(L \mid e_s, e_d, r)}[\log p(r \mid L)] - KL(q(L \mid e_s, e_d, r) \,\|\, p(L \mid e_s, e_d))$
Reconstruction term: $\mathbb{E}_{q(L \mid e_s, e_d, r)}[\log p(r \mid L)]$
KL-divergence: $KL(q(L \mid e_s, e_d, r) \,\|\, p(L \mid e_s, e_d))$
where posterior distribution: $q(L \mid e_s, e_d, r)$
DP Kingma et al. 2013
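For a small discrete set of paths the bound can be checked numerically. The sketch below uses made-up probabilities for three placeholder paths and one fixed relation $r$; it is an illustration of the inequality, not the training procedure.

```python
import math

# Hypothetical numbers for a single query (e_s, e_d, r).
prior = {"L1": 0.5, "L2": 0.3, "L3": 0.2}  # p(L | e_s, e_d)
lik = {"L1": 0.8, "L2": 0.4, "L3": 0.1}    # p(r | L) for the observed r


def elbo(q):
    """Reconstruction term minus KL(q || prior) for a distribution q over paths."""
    recon = sum(q[L] * math.log(lik[L]) for L in q if q[L] > 0)
    kl = sum(q[L] * math.log(q[L] / prior[L]) for L in q if q[L] > 0)
    return recon - kl


evidence = sum(prior[L] * lik[L] for L in prior)  # p(r | e_s, e_d)
log_evidence = math.log(evidence)

# An arbitrary posterior guess, and the exact posterior p(L | e_s, e_d, r).
q_uniform = {L: 1.0 / 3.0 for L in prior}
q_exact = {L: prior[L] * lik[L] / evidence for L in prior}

gap_uniform = log_evidence - elbo(q_uniform)
gap_exact = log_evidence - elbo(q_exact)
```

The gap is strictly positive for the uniform guess and vanishes (up to rounding) when $q$ equals the exact posterior, which is the sense in which the ELBO is a lower bound that becomes tight.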

14 KG Variational Inference (KG-VI)
Training with Gradient Descent
Reconstruction term: $\mathbb{E}_{q_\varphi(L \mid e_s, e_d, r)}[\log p_\theta(r \mid L)]$
[Figure: the posterior $q_\varphi(L)$ takes $e_s$, $e_d$ and $r$ and produces a KG-connected path; the likelihood $p_\theta(r \mid L)$ predicts $r$ from that path.]
KL term: $KL(q_\varphi(L \mid e_s, e_d, r) \,\|\, p_\beta(L \mid e_s, e_d))$
[Figure: the prior $p_\beta(L)$ takes $e_s$ and $e_d$ and produces a KG-connected path.]

15 KG Variational Inference (KG-VI)
Testing
[Figure: the prior $p_\beta(L)$ produces a KG-connected path between $e_s$ and $e_d$; the likelihood $p_\theta(r \mid L)$ predicts $r$ from that path.]
posterior: $q_\varphi$, likelihood: $p_\theta$, prior: $p_\beta$

16 Comparison with MINERVA (Path-Finder)
[Figure: MINERVA (Das et al. 2018). Paths from $e_s$ marked X or ✔ receive a length/success reward, $R = 0.0$ or $R = 1.0$.]
[Figure: our model. Paths from $e_s$ marked X or ✔ are scored by the Path-Reasoner, e.g. $f(L) = 0.33$, $f(L) = 0.0$, $f(L) = 0.8$.]
Path-Finder: $\frac{\partial ELBO}{\partial \varphi} = \mathbb{E}_{L \sim q_\varphi}\left[-f(L)\, \frac{\partial \log q_\varphi(L \mid e_s, e_d, r)}{\partial \varphi}\right]$
Our model: $f(L) = p_\theta(r \mid L) - \log \frac{q_\varphi}{p_\beta}$
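The path-finder update is a score-function (REINFORCE-style) gradient. The sketch below shows the mechanics on a tiny case that can be enumerated exactly: a softmax distribution over three paths with the three example rewards from the slide held fixed. The softmax parameterization and the fixed rewards are assumptions made for illustration; the code computes the ascent direction on the expected reward, so a minimizer would use its negation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(logits, rewards):
    """E_{L ~ q}[f(L)] for a softmax distribution q over paths."""
    return sum(q * f for q, f in zip(softmax(logits), rewards))


def score_function_gradient(logits, rewards):
    """Exact E_{L ~ q}[f(L) * d log q(L) / d logits], by enumerating all paths."""
    q = softmax(logits)
    grad = [0.0] * len(logits)
    for L, (q_L, f_L) in enumerate(zip(q, rewards)):
        for k in range(len(logits)):
            # Derivative of log softmax: d log q(L) / d logit_k = 1[k == L] - q_k
            dlogq = (1.0 if k == L else 0.0) - q[k]
            grad[k] += q_L * f_L * dlogq
    return grad


rewards = [0.33, 0.0, 0.8]  # example f(L) values taken from the slide
logits = [0.0, 0.0, 0.0]    # hypothetical parameters: a uniform path-finder
grad = score_function_gradient(logits, rewards)
```

The gradient raises the logit of the highest-scoring path and lowers the others, which is how a graded reasoner score gives a denser learning signal than a 0/1 success reward. In practice the expectation is estimated from sampled paths rather than enumerated.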

17 Dataset
FB15k, link prediction for 20 relations.
NELL-995, link prediction for 12 relations.
FB15k has a more complex reasoning environment.

Dataset    Entity   Relation   Triple    Link-prediction relations
FB15k      14505    237        310116    20
NELL995    75492    200        154213    12

Dataset    Triple/Entity   Path Length   Potential links
FB15k      22.1            4             $22.1^4 = 238K$
NELL995    2               4             $2^4 = 16$
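The "potential links" figures can be reproduced with one line of arithmetic, assuming they are the average number of triples per entity raised to the path length. That reading of the flattened table is an interpretation, and `potential_links` is a hypothetical helper name.

```python
def potential_links(triples_per_entity, path_length):
    """Rough size of the path search space: branching factor ** path length."""
    return triples_per_entity ** path_length


fb15k_links = potential_links(22.1, 4)  # about 238.5 thousand
nell_links = potential_links(2, 4)      # exactly 16
```

The four orders of magnitude between the two numbers is what makes FB15k the harder reasoning environment.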

18 Evaluation
Given a list of entity pairs, compute the rank of the positive sample as the evaluation score.
Candidates: $(e_s, r, e_1^+)$, $(e_s, r, e_2^-)$, $(e_s, r, e_3^-)$, $(e_s, r, e_4^-)$, $(e_s, r, e_5^-)$
[Figure: beam search under the prior $p_\beta(L)$ returns paths $L_1$, $L_2$, $L_3$ or none for the candidates, scored $p(r \mid L_1) = 0.14$, $p(r \mid L_2) = 0.2$, $p(r \mid L_3) = 0.1$, and $0$ when no path is found.]
$MAP = \frac{1}{\#rank(e^+)} = \frac{1}{2} = 0.5$
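The ranking score on this slide can be reproduced with a short sketch. It assumes that the first score (0.14) belongs to the positive candidate and the remaining three to negatives, which is how the example arrives at rank 2; the function names are hypothetical.

```python
def rank_of_positive(positive_score, negative_scores):
    """1-based rank of the positive sample among all scored candidates."""
    return 1 + sum(1 for s in negative_scores if s > positive_score)


def reciprocal_rank(positive_score, negative_scores):
    """Per-query score 1 / rank(e+)."""
    return 1.0 / rank_of_positive(positive_score, negative_scores)


def mean_score(queries):
    """Average the per-query score over (positive_score, negative_scores) pairs."""
    return sum(reciprocal_rank(p, n) for p, n in queries) / len(queries)


# Scores from the slide: the positive gets 0.14; negatives get 0.2, 0.1
# and 0 (no path found).
score = reciprocal_rank(0.14, [0.2, 0.1, 0.0])
```

Only one negative (0.2) outscores the positive, so the positive ranks second and the query contributes 1/2.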

19 Experimental Results on NELL-995/FB-15k
The variational inference framework performs better in noisier environments.

Model                          NELL-995   FB15k
PRA (Lao et al. 2011)          67.5       54.1
TransE (Bordes et al. 2013)    75.0       53.2
TransR (Lin et al. 2015)       74.0       54.0
TransD (Ji et al. 2015)        77.3       -
DeepPath (Xiong et al. 2017)   81.2       57.2
RNN-Chain (Das et al. 2017)    79.0       51.2
MINERVA (Das et al. 2018)      88.8       55.2
CNN Path-Reasoner              82         54.2
Our model                      88.6       59.8

20 Conclusion and Future Work
Conclusions
Our framework can be seen as a new variational inference framework for dealing with sequential latent variables.
Our model shows its strength in more complex reasoning environments.
Future Directions
Extend our model to more tasks with sequential latent variables.
Das et al. 2017

21 Thanks! PPT Link: https://wenhuchen.github.io/images/naacl2018.pptx
Dataset link:

22 Error Analysis

Error Type            Positive Sample        Negative Sample
Path-finder Error     ✖ (finds no paths)     ✔ (finds paths)
Path-reasoner Error   $p(r \mid L^+) < p(r \mid L^-)$

23 Prior & Posterior
[Figure: posterior distribution over paths L for rel1, rel2 and rel3, shown alongside the prior distribution over L.]

