DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning
Wenhan Xiong, Thien Hoang, William Wang
Department of Computer Science, UC Santa Barbara

Knowledge Graphs
[Figure: an example knowledge graph with entities such as Band of Brothers, Tom Hanks, Neal McDonough, United States, Caesars Entertainment, Actor, and English, connected by relations such as castActor, profession, nationality, personLanguages, countryOfOrigin, awardWorkWinner, countrySpokenIn-1, serviceLanguage, and serviceLocation-1.]
You can think of a knowledge graph as a very large graph data structure: nodes are entities and labeled edges are relations between them.

Reasoning on the Knowledge Graph
Query node: "Band of Brothers"; query relation: "tvProgramLanguage"
Query: tvProgramLanguage(Band of Brothers, ?)
[Figure: the same example knowledge graph; the task is to find a reasoning path from "Band of Brothers" to the answer entity.]

Related Work: Path-based Methods
Path-Ranking Algorithm (PRA), Lao et al., 2011
Subgraph Feature Extraction (SFE), Gardner et al., 2015
RNN + PRA, Neelakantan et al., 2015
Chains of Reasoning, Das et al., 2017
PRA: (1) finds potential path types between entity pairs, (2) computes random-walk probabilities
SFE: replaces random-walk probabilities with binary features

Related Work: Embedding-based Methods
TransE, Bordes et al., 2013
Neural Tensor Network, Socher et al., 2013
TransR/CTransR, Lin et al., 2015
Complex Embeddings, Trouillon et al., 2016

Our Approach
Learn the paths instead of using random walks
Model path finding as an MDP
Train an RL agent to find paths
Represent the KG with pretrained KG embeddings
Use the learned paths as Horn clauses

Reinforcement Learning
[Figure: the agent–environment loop (Richard Sutton, Reinforcement Learning: An Introduction): at step t the agent observes state s_t, takes action a_t, and the environment returns reward r_{t+1} and next state s_{t+1}.]
Agent: multi-layer neural networks computing ψ(s_t)
Environment: the KG modeled as an MDP

Components of the MDP
Markov decision process ⟨S, A, P, R⟩
S: continuous states represented with KG embeddings
A: action space, A = {r_1, r_2, …, r_n}, all relations in the KG
P(S_{t+1} = s′ | S_t = s, A_t = a): transition probability
R(s, a): reward received for each action taken
With pretrained KG embeddings, the state is s_t = e_t ⊕ (e_target − e_t).
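The state construction above (current entity embedding concatenated with its offset from the target) can be sketched in a few lines; the embedding dimension here is illustrative, not the paper's setting:

```python
import numpy as np

EMB_DIM = 100  # illustrative embedding size

def make_state(e_current: np.ndarray, e_target: np.ndarray) -> np.ndarray:
    """s_t = e_t concatenated with (e_target - e_t)."""
    return np.concatenate([e_current, e_target - e_current])

rng = np.random.default_rng(0)
e_t = rng.normal(size=EMB_DIM)       # stand-in for a pretrained embedding
e_target = rng.normal(size=EMB_DIM)
state = make_state(e_t, e_target)
assert state.shape == (2 * EMB_DIM,)  # 200-dim state vector
```

The offset term (e_target − e_t) tells the agent how far it still is from the answer entity in embedding space.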

Framework
[Figure: the policy network and reasoning loop. The state vector passes through two ReLU layers and a softmax layer to produce π(a|s); the sampled relation moves the agent to the next entity in the KG, yielding the next state and a reward at each reasoning step.]
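A minimal sketch of such a policy network (two ReLU layers followed by a softmax over all relations), with illustrative layer sizes and random weights standing in for trained parameters:

```python
import numpy as np

STATE_DIM, HIDDEN, N_RELATIONS = 200, 512, 400  # illustrative sizes
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W3 = rng.normal(scale=0.1, size=(HIDDEN, N_RELATIONS))

def policy(state: np.ndarray) -> np.ndarray:
    """pi(a|s): a distribution over all relations in the KG."""
    h1 = np.maximum(0.0, state @ W1)        # ReLU
    h2 = np.maximum(0.0, h1 @ W2)           # ReLU
    logits = h2 @ W3
    exp = np.exp(logits - logits.max())     # numerically stable softmax
    return exp / exp.sum()

probs = policy(rng.normal(size=STATE_DIM))
```

Sampling an action from `probs` selects the next relation edge to follow.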

Reward Functions
Global accuracy: positive reward for episodes that reach the target entity
Path efficiency: shorter reasoning paths receive higher reward
Path diversity: penalize paths similar to those already found
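The three reward signals can be sketched as follows; the exact functional forms and weighting in the paper may differ, so treat these as illustrative:

```python
import numpy as np

def global_reward(reached_target: bool) -> float:
    # Global accuracy: reward only successful episodes.
    return 1.0 if reached_target else -1.0

def efficiency_reward(path_length: int) -> float:
    # Path efficiency: shorter paths score higher.
    return 1.0 / path_length

def diversity_reward(path_emb: np.ndarray, found_paths: list) -> float:
    # Path diversity: penalize cosine similarity to paths already found.
    if not found_paths:
        return 0.0
    sims = [path_emb @ p / (np.linalg.norm(path_emb) * np.linalg.norm(p))
            for p in found_paths]
    return -float(np.mean(sims))
```

Combining these terms rewards agents that find short, successful, and mutually distinct reasoning paths.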

Training with Policy Gradient
Monte-Carlo Policy Gradient (REINFORCE; Williams, 1992)
Maximize the expected cumulative reward J(θ) = E_π[Σ_t r_t]; the policy is updated with the gradient estimate ∇_θ J(θ) ≈ ∇_θ log π_θ(a_t | s_t) R_t.
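A minimal REINFORCE update for a linear-softmax policy; sizes, names, and the learning rate are illustrative, not the paper's code:

```python
import numpy as np

STATE_DIM, N_ACTIONS, LR = 8, 4, 0.5  # illustrative hyperparameters
theta = np.zeros((STATE_DIM, N_ACTIONS))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_update(episode, theta):
    """episode: list of (state, action, reward) tuples."""
    rewards = [r for _, _, r in episode]
    returns = np.cumsum(rewards[::-1])[::-1]   # R_t = sum of future rewards
    for (s, a, _), G in zip(episode, returns):
        probs = softmax(s @ theta)
        grad_log = np.outer(s, -probs)          # d log pi(a|s) / d theta
        grad_log[:, a] += s
        theta = theta + LR * G * grad_log       # gradient ascent step
    return theta

s = np.ones(STATE_DIM)
before = softmax(s @ theta)[0]
theta = reinforce_update([(s, 0, 1.0)], theta)
after = softmax(s @ theta)[0]
assert after > before   # the rewarded action becomes more probable
```

Each Monte-Carlo rollout through the KG yields one such episode, and positive returns push up the log-probability of the relations taken.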

Challenge
Typical RL problems:
Atari games (Mnih et al., 2015): 4–18 valid actions
AlphaGo (Silver et al., 2016): ~250 valid actions
Knowledge graph reasoning: ≥ 400 actions
Issue: a large action (search) space leads to poor convergence properties.

Supervised (Imitation) Policy Learning
Use a randomized BFS to retrieve a few paths
Do imitation learning on the retrieved paths
All retrieved paths are assigned a +1 reward
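Retrieving teacher paths with a randomized BFS can be sketched over a toy KG (the graph and entity names below are illustrative):

```python
import random
from collections import deque

# Toy KG: entity -> list of (relation, neighbor) edges.
graph = {
    "BandOfBrothers": [("castActor", "TomHanks")],
    "TomHanks": [("nationality", "UnitedStates")],
    "UnitedStates": [("countrySpokenIn", "English")],
    "English": [],
}

def bfs_paths(start, goal, max_paths=5):
    """Return relation paths from start to goal, in BFS order."""
    paths, queue = [], deque([(start, [])])
    while queue and len(paths) < max_paths:
        node, rels = queue.popleft()
        if node == goal and rels:
            paths.append(rels)
            continue
        neighbors = list(graph.get(node, []))
        random.shuffle(neighbors)   # randomization injects path diversity
        for rel, nxt in neighbors:
            queue.append((nxt, rels + [rel]))
    return paths
```

Imitation learning then treats each retrieved path as a sequence of (state, action) pairs, every step labeled with a +1 reward.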

Datasets and Preprocessing

Dataset      # Entities  # Relations  # Triples  # Tasks
FB15k-237¹   14,505      237          310,116    20
NELL-995     75,492      200          154,213    12

FB15k-237: constructed from FB15k (Bordes et al., 2013), redundant relations removed
NELL-995: constructed from the 995th iteration of the NELL system (Carlson et al., 2010b)

Dataset processing:
Remove useless relations: haswikipediaurl, generalizations, etc.
Add inverse relation links to the knowledge graph
Remove the triples with task relations

1. Toutanova et al., Representing Text for Joint Embedding of Text and Knowledge Bases

Effect of Supervised Policy Learning
[Figure: success rate on a held-out test set (y-axis) versus number of training episodes (x-axis).]
After supervised pretraining, the agent is re-trained using the reward functions.

Inference Using Learned Paths
Paths as logical formulas (Horn clauses):
FilmCountry: actorFilm-1 -> personNationality
PersonNationality: placeOfBirth -> locationContains-1
etc.
Bi-directional path-constrained search: check whether the formulas hold for entity pairs
[Figure: uni-directional vs. bi-directional search.]
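Checking whether a learned relation path (a Horn clause body) holds for an entity pair can be sketched as a frontier expansion over a toy KG; the graph below is illustrative (Tom Hanks was born in Concord, California):

```python
# Toy KG: (head, relation) -> set of tail entities.
graph = {
    ("TomHanks", "placeOfBirth"): {"Concord"},
    ("California", "locationContains"): {"Concord"},
}

def follow(entity, relation):
    """Follow one relation edge; a '-1' suffix means the inverse direction."""
    if relation.endswith("-1"):
        base = relation[:-2]
        return {h for (h, r), tails in graph.items()
                if r == base and entity in tails}
    return graph.get((entity, relation), set())

def path_holds(head, relations, tail):
    """True if following the relation sequence from head can reach tail."""
    frontier = {head}
    for rel in relations:
        frontier = set().union(*[follow(e, rel) for e in frontier])
        if not frontier:
            return False
    return tail in frontier

# personNationality: placeOfBirth -> locationContains-1
assert path_holds("TomHanks", ["placeOfBirth", "locationContains-1"], "California")
```

Bi-directional search applies the same idea from both ends of the path at once, meeting in the middle.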

Link Prediction Results

Tasks                 PRA    Ours   TransE  TransR
worksFor              0.681  0.711  0.677   0.692
athletePlaysForTeam   0.987  0.955  0.896   0.784
athletePlaysInLeague  0.841  0.960  0.773   0.912
athleteHomeStadium    0.859  0.890  0.718   0.722
teamPlaysSports       0.791  0.738  0.761   0.814
orgHirePerson         0.599  0.742  0.719   0.737
personLeadsOrg        0.700  0.795  0.751   0.772
…
Overall               0.675  0.796          0.789

Mean average precision on NELL-995.

Qualitative Analysis
[Figure: path length distributions of the learned reasoning paths.]

Qualitative Analysis: Example Learned Paths
personNationality:
placeOfBirth -> locationContains-1
peoplePlaceLived -> locationContains-1
peopleMariage -> locationOfCeremony -> locationContains-1
tvProgramLanguage:
tvCountryOfOrigin -> countryOfficialLanguage
tvCountryOfOrigin -> filmReleaseRegion-1 -> filmLanguage
tvCastActor -> personLanguage

Conclusion and Future Work
Conclusions:
Proposed an RL framework for KG reasoning
A controllable path finder (walker) in the KG
Combines path-based and embedding-based methods
Future directions:
Adversarial learning to provide rewards
Joint reasoning with KG triples and text

Thanks!
Code: https://github.com/xwhan/DeepPath
Dataset: http://cs.ucsb.edu/~xwhan/datasets/NELL-995.zip

Bi-directional Search
Suppose relation link r_t belongs to the reasoning path p, and a supernode a has neighbors {b_1, b_2, …, b_N} linked by relation r_t, but only the path through b_m reaches the tail entity. Searching from head to tail, we must expand all N neighbors of a; searching from tail to head, we encounter b_m first, and following the inverse link r_t^-1, a is its only neighbor.
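The fan-out argument can be demonstrated on a toy star graph (entity names and counts are illustrative):

```python
N = 1000
# Supernode "a" links to b0 ... b999 via r_t; only b7 continues to the tail.
forward = {"a": [f"b{i}" for i in range(N)], "b7": ["tail"]}
inverse = {"tail": ["b7"], "b7": ["a"]}  # the r_t^-1 edges

def expansions(graph, start, depth):
    """Count node expansions in a simple breadth-first sweep."""
    frontier, count = [start], 0
    for _ in range(depth):
        nxt = []
        for node in frontier:
            count += 1
            nxt.extend(graph.get(node, []))
        frontier = nxt
    return count

head_to_tail = expansions(forward, "a", 2)     # expands a plus all N b's
tail_to_head = expansions(inverse, "tail", 2)  # expands only tail and b7
assert tail_to_head < head_to_tail
```

Searching from the tail visits 2 nodes where the head-to-tail direction visits N+1, which is why the path-constrained search runs in both directions.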

Feature Engineering in PRA vs. DRL
Reward learning is different from feature engineering
DRL enables learning directly on the symbolic space
In future work, we can try to learn better reward functions, which could further boost performance