Variational Knowledge Graph Reasoning Wenhu Chen, Wenhan Xiong, Xifeng Yan, William Wang Department of Computer Science UC Santa Barbara
Outline
- Introduction to Knowledge Graph Completion
- Reinterpreting the problem as a generative model
- Resolving the new intractable objective with variational inference
- Experimental Results and Conclusion
Knowledge Graph
(figure: an example knowledge graph linking entities such as Tom Hanks, Neal McDonough, Band of Brothers, Caesars Entertainment, Las Vegas, CA, United States, and English through relations such as castActor, awardWinner, nationality, countryOfOrigin, serviceLanguage, serviceLocation, and personLanguages)
Knowledge Graph Completion
Query: ?(Band of Brothers, English). The missing relation between Band of Brothers and English must be inferred from the surrounding edges in the graph (castActor, personLanguages, countryOfOrigin, serviceLanguage, serviceLocation).
Problem Formulation
The knowledge base is a set of triples: KB = {(head_i, tail_i, rel_i)}.
During training, we intentionally mask some relations as missing links and use them as training triples: D_train = {(e_s, e_d, r)}.
During testing, we are interested in filling the relation slot given an entity pair: D_test = {(e_s, e_d, ?)}.
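The masking step can be illustrated on a toy KB (entity and relation names come from the running example; the code is a sketch of the setup, not the paper's actual pipeline):

```python
import random

# Toy KB: a set of (head, tail, relation) triples.
kb = [
    ("BandOfBrothers", "NealMcDonough", "castActor"),
    ("NealMcDonough", "UnitedStates", "nationality"),
    ("BandOfBrothers", "UnitedStates", "countryOfOrigin"),
    ("BandOfBrothers", "English", "tvProgramLanguage"),
]

random.seed(0)

# Mask one triple: its relation becomes the training label, and the
# masked edge is removed from the background KB used for reasoning.
masked = random.choice(kb)
background_kb = [t for t in kb if t != masked]
e_s, e_d, r = masked

train_example = ((e_s, e_d), r)   # D_train: (e_s, e_d) -> r
test_query = (e_s, e_d, "?")      # D_test: fill the relation slot
```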
Existing KGC methods
Embedding-based methods (fast and efficient):
- TransE (Bordes et al., 2013)
- TransR/CTransR (Lin et al., 2015)
- DistMult (Yang et al., 2015)
- ComplEx (Trouillon et al., 2016)
Path-based methods (accurate and explainable):
- Path-Ranking Algorithm (PRA) (Lao et al., 2011)
- Compositional Vector (Neelakantan et al., 2015)
- DeepPath (Xiong et al., 2017)
- Chains of Reasoning (Das et al., 2017)
- MINERVA (Das et al., 2018)
KGC from a generative perspective
Example: Band of Brothers (e_s) and English (e_d) are linked through a latent path L in the KG; the relation tvProgramLanguage is the observed variable and the KG is the condition.
We maximize the marginal log-likelihood over latent paths:
log p(r | e_s, e_d) = log Σ_L p(r | L) p(L | e_s, e_d)
where p(L | e_s, e_d) is the prior over paths and p(r | L) is the likelihood.
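On a toy example with two hand-assigned paths (all probabilities invented for illustration), the marginalization reads:

```python
# Toy marginalization over latent paths L:
#   p(r | e_s, e_d) = sum_L p(r | L) * p(L | e_s, e_d)

# Prior over paths connecting (BandOfBrothers, English).
prior = {
    ("castActor", "personLanguages"): 0.6,
    ("countryOfOrigin", "serviceLanguage"): 0.4,
}

# Likelihood of each relation given the path.
likelihood = {
    ("castActor", "personLanguages"): {"tvProgramLanguage": 0.9, "other": 0.1},
    ("countryOfOrigin", "serviceLanguage"): {"tvProgramLanguage": 0.7, "other": 0.3},
}

def marginal(r):
    return sum(likelihood[L][r] * prior[L] for L in prior)

p_r = marginal("tvProgramLanguage")  # 0.9*0.6 + 0.7*0.4 = 0.82
```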
Variational Inference
Variational Bayesian methods optimize intractable integrals of the form log p(x) = log ∫ p(x | z) p(z) dz.
We maximize the Evidence Lower Bound (ELBO) as a surrogate objective; since KL-divergence ≥ 0, the ELBO lower-bounds log p(x). (Blei et al., 2016)
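Written out, the standard decomposition behind this bound (see e.g. Blei et al., 2016) is:

```latex
\log p(x) = \log \int p(x \mid z)\, p(z)\, dz
          = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z)}{q(z)}\right]}_{\text{ELBO}}
          + \underbrace{\mathrm{KL}\!\left(q(z)\,\Vert\,p(z \mid x)\right)}_{\geq\, 0}
```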
Variational Auto-Encoder (VAE)
The Variational Auto-Encoder provides an efficient and practical way to perform variational inference.
(figure: encoder/decoder architecture; Kingma et al., 2013)
Challenge of VAE in KG
Existing VAE methods only consider continuous latent vectors:
NLP applications: machine translation (Zhang et al., 2016), text generation (Guu et al., 2017), dialogue generation (Wen et al., 2017)
CV applications: image classification (Kingma et al., 2013), image captioning (Wang et al., 2017), visual question generation (Jain et al., 2017)
We instead tackle sequential discrete latent variables.
KG Variational Inference (KG-VI)
- There is no re-parameterization trick for the discrete posterior q_φ(L | e_s, e_d, r).
- Our prior distribution p_β(L | e_s, e_d) is trainable.
- We view the sampling of the latent path as a Markov Decision Process: starting from e_s, the agent repeatedly picks an action a_τ (an outgoing edge) to move from entity e_τ to e_τ+1.
(figure: path rollout e_s → e_1 → … → e_τ → e_τ+1 → e_τ+2)
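A rollout under this MDP view might look like the following sketch, with a hand-built toy graph and a uniform policy standing in for the learned q_φ / p_β:

```python
import random

# State = current entity; actions = outgoing KG edges; a rollout is a
# latent path L. The graph and the uniform policy are illustrative only.
graph = {
    "BandOfBrothers": [("castActor", "NealMcDonough"),
                       ("countryOfOrigin", "UnitedStates")],
    "NealMcDonough": [("personLanguages", "English")],
    "UnitedStates": [("serviceLanguage", "English")],
    "English": [],
}

def rollout(e_s, e_d, max_steps=3, rng=random):
    """Sample a path from e_s; stop on reaching e_d or a dead end."""
    path, state = [], e_s
    for _ in range(max_steps):
        actions = graph[state]
        if state == e_d or not actions:
            break
        rel, nxt = rng.choice(actions)  # uniform policy for illustration
        path.append((state, rel, nxt))
        state = nxt
    return path, state == e_d

random.seed(1)
path, success = rollout("BandOfBrothers", "English")
```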
KG Variational Inference (KG-VI)
We view the likelihood p_θ(r | L) as a sequence classification model: a CNN/RNN encodes each path (e.g. e_s → relation_1 → e_1 → … → e_d) and a softmax layer scores the candidate relations (relation_1, relation_2, relation_3, …, plus an n/a class).
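A minimal stand-in for such a path classifier, with random untrained weights and illustrative dimensions (not the paper's architecture), could look like:

```python
import numpy as np

# Sketch of the path-reasoner p_theta(r | L): an RNN reads the embedded
# path and a softmax head scores every relation.
rng = np.random.default_rng(0)
d, n_relations, path_len = 8, 5, 4

W_h = rng.normal(size=(d, d)) * 0.1            # recurrent weights
W_x = rng.normal(size=(d, d)) * 0.1            # input weights
W_o = rng.normal(size=(n_relations, d)) * 0.1  # softmax head

def classify_path(path_embeddings):
    h = np.zeros(d)
    for x in path_embeddings:   # one RNN step per path element
        h = np.tanh(W_h @ h + W_x @ x)
    logits = W_o @ h
    p = np.exp(logits - logits.max())
    return p / p.sum()          # distribution over relations

path = rng.normal(size=(path_len, d))  # embedded (entity, relation, ...) path
probs = classify_path(path)
```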
Evidence Lower Bound
ELBO = Reconstruction − KL-divergence:
log p(r | e_s, e_d) ≥ ELBO = E_{q(L | e_s, e_d, r)}[log p(r | L)] − KL(q(L | e_s, e_d, r) || p(L | e_s, e_d))
where E_q[log p(r | L)] is the reconstruction loss, the KL term regularizes toward the prior, and q(L | e_s, e_d, r) is the posterior distribution. (Kingma et al., 2013)
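A toy Monte-Carlo estimate of this bound, over a small enumerable set of paths with made-up probabilities, can be sketched as:

```python
import math, random

# ELBO = E_{q(L)}[log p(r|L)] - KL(q(L) || p(L)); all values are toy.
q = {"path_a": 0.7, "path_b": 0.3}        # posterior q(L | e_s, e_d, r)
p_prior = {"path_a": 0.5, "path_b": 0.5}  # prior p(L | e_s, e_d)
p_lik = {"path_a": 0.9, "path_b": 0.4}    # likelihood p(r | L)

random.seed(0)
samples = random.choices(list(q), weights=list(q.values()), k=5000)

# Monte-Carlo reconstruction term; exact KL since q is enumerable here.
recon = sum(math.log(p_lik[L]) for L in samples) / len(samples)
kl = sum(q[L] * math.log(q[L] / p_prior[L]) for L in q)
elbo = recon - kl
```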
KG Variational Inference (KG-VI): Training
Training with gradient descent optimizes two terms:
- Reconstruction: E_{q_φ(L | e_s, e_d, r)}[log p_θ(r | L)] — the posterior q_φ samples a KG-connected path between e_s and e_d, and the reasoner p_θ(r | L) reconstructs the relation r.
- Regularization: KL(q_φ(L | e_s, e_d, r) || p_β(L | e_s, e_d)) — keeps the posterior close to the trainable prior p_β, which also samples KG-connected paths between e_s and e_d.
KG Variational Inference (KG-VI): Testing
At test time, the prior p_β(L | e_s, e_d) samples KG-connected paths between e_s and e_d, and the likelihood p_θ(r | L) predicts the relation r.
(posterior: q_φ, likelihood: p_θ, prior: p_β)
Comparison with MINERVA (Path-Finder)
MINERVA (Das et al., 2018) rewards its path-finder with a binary terminal signal: R = 1.0 when the rollout reaches the correct entity and R = 0.0 otherwise, regardless of path quality.
Our path-finder instead follows the score-function gradient of the ELBO:
∂ELBO/∂φ = E_{L ~ q_φ}[ f(L) ∂ log q_φ(L | e_s, e_d, r)/∂φ ], where f(L) = log p_θ(r | L) − log(q_φ(L)/p_β(L))
so the path-reasoner supplies a graded reward for each path (e.g. f(L) = 0.33, 0.0, 0.8 on the slide) rather than a 0/1 success signal.
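A minimal score-function (REINFORCE-style) estimator of this kind of gradient can be sketched with a one-parameter toy policy; the policy, paths, and rewards below are invented for illustration:

```python
import math, random

def q(L, phi):
    """Two-path policy: q(a) = sigmoid(phi), q(b) = 1 - sigmoid(phi)."""
    s = 1 / (1 + math.exp(-phi))
    return s if L == "a" else 1 - s

f = {"a": 0.8, "b": 0.0}  # toy per-path reward f(L)

def grad_estimate(phi, n=20000, seed=0):
    """Estimate E_{L~q}[f(L) * d log q(L)/d phi] by sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        L = "a" if rng.random() < q("a", phi) else "b"
        # d log q / d phi for this sigmoid policy
        dlogq = (1 - q("a", phi)) if L == "a" else -q("a", phi)
        total += f[L] * dlogq
    return total / n

# Analytic value at phi = 0: d/dphi [0.8 * sigmoid(phi)] = 0.8 * 0.25 = 0.2
g = grad_estimate(0.0)
```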
Dataset
FB15k: link prediction for 20 relations. NELL-995: link prediction for 12 relations. FB15k has a more complex reasoning environment.

| Dataset  | #Entity | #Relation | #Triple | #Relations (tasks) |
| FB15k    | 14,505  | 237       | 310,116 | 20                 |
| NELL-995 | 75,492  | 200       | 154,213 | 12                 |

| Dataset  | Triples per entity | Path length | Potential links  |
| FB15k    | 22.1               | 4           | 22.1^4 ≈ 238K    |
| NELL-995 | 2                  | 4           | 2^4 = 16         |
Evaluation
Given a list of entity pairs (one positive (e_s, r, e1+) and several negatives (e_s, r, e2−), …, (e_s, r, e5−)), compute the rank of the positive sample as the evaluation score: MAP = 1 / rank(e+).
Beam search over the prior p_β(L) returns candidate paths L1, L2, L3 (or none, which scores 0). Scoring with p(r | L) gives, e.g., 0.14 for the positive pair versus 0.2, 0.1, and 0 for the negatives, so the positive ranks 2nd and MAP = 1/2 = 0.5.
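The ranking metric can be reproduced on the slide's illustrative scores (candidate names are placeholders):

```python
# Score each candidate pair by the best beam-searched path, then
# MAP for a single positive = 1 / rank(e+).
scores = {
    "e1_pos": 0.14,  # p(r | L1) for the positive pair
    "e2_neg": 0.20,
    "e3_neg": 0.10,
    "e4_neg": 0.0,   # no path found -> score 0
    "e5_neg": 0.05,
}

ranked = sorted(scores, key=scores.get, reverse=True)
rank = ranked.index("e1_pos") + 1  # positive is ranked 2nd here
ap = 1 / rank                      # = 0.5, matching the slide
```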
Experimental Results on NELL-995/FB15k
The variational inference framework performs better in the noisier environment.

| Model                          | NELL-995 | FB15k |
| PRA (Lao et al., 2011)         | 67.5     | 54.1  |
| TransE (Bordes et al., 2013)   | 75.0     | 53.2  |
| TransR (Lin et al., 2015)      | 74.0     | 54.0  |
| TransD (Ji et al., 2015)       | 77.3     | -     |
| DeepPath (Xiong et al., 2017)  | 81.2     | 57.2  |
| RNN-Chain (Das et al., 2017)   | 79.0     | 51.2  |
| MINERVA (Das et al., 2018)     | 88.8     | 55.2  |
| CNN Path-Reasoner              | 82.0     | 54.2  |
| Our model                      | 88.6     | 59.8  |
Conclusion and Future Work
Conclusions:
- Our framework can be seen as a new variational inference framework for dealing with sequential latent variables.
- Our model shows its strength in more complex reasoning environments.
Future directions:
- Extend our model to more tasks with sequential latent variables. (Das et al., 2017)
Thanks! PPT Link: https://wenhuchen.github.io/images/naacl2018.pptx Dataset link: http://cs.ucsb.edu/~xwhan/datasets/NELL-995.zip
Error Analysis

| Error type          | Positive sample       | Negative sample   |
| Path-finder error   | ✖ (finds no paths)    | ✔ (finds paths)   |
| Path-reasoner error | p(r | L+) < p(r | L−) |                   |
Prior & Posterior
The posterior distribution q(L | e_s, e_d, r) conditions on the query relation (e.g. rel1, rel2, rel3), while the prior distribution p(L | e_s, e_d) sees only the entity pair.