Bidirectional Attentive Memory Networks for Question Answering over Knowledge Bases
Presenter: Yu Chen, Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY 12180
Joint work with Lingfei Wu & Mohammed J. Zaki

Task Definition
Given questions in natural language (NL), the goal of KBQA is to automatically find answers from the underlying knowledge base (KB). http://wjter.com/Research%20Papers/January%202018/PDF/Survey%20on%20Community%20Question%20Answering%20Systems.pdf

Semantic Parsing-based Approaches for KBQA
Converting NL questions into intermediate logical forms, which can then be executed against the underlying KB.

Information Retrieval-based Approaches for KBQA
Directly retrieving answers from the KB by mapping questions and the KB into the same embedding space, and ranking candidate answers by their matching scores with the question. Dong, Li, et al. "Question answering over freebase with multi-column convolutional neural networks." Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Vol. 1. 2015.

SP-based vs. IR-based Approaches
Most SP-based approaches require annotated logical forms as supervision, or hand-crafted rules/features to prune the program search space. Most IR-based approaches require no or very few hand-crafted rules/features, and are end-to-end trainable.

Typical IR-based Approaches for KBQA

Fetching Answers from a KB Subgraph
Question: who is the secretary of Ohio in 2011?
After topic entity linking, nodes within a k-hop KB subgraph of the topic entity are considered candidate answers.
Figure 1. A 2-hop KB subgraph from Freebase.
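A minimal sketch of the candidate-fetching step, assuming the KB is given as a list of (subject, relation, object) triples and treating k-hop reachability as undirected; the entity names are illustrative, not from the actual Freebase subgraph:

```python
from collections import deque

def k_hop_candidates(triples, topic_entity, k=2):
    """Collect all entities reachable from the topic entity within k hops.

    `triples` is a list of (subject, relation, object) facts; edges are
    traversed in both directions, mirroring an undirected k-hop subgraph.
    """
    # Build an adjacency list over the KB graph.
    adj = {}
    for s, r, o in triples:
        adj.setdefault(s, []).append(o)
        adj.setdefault(o, []).append(s)

    seen = {topic_entity}
    frontier = deque([(topic_entity, 0)])
    candidates = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:          # do not expand beyond k hops
            continue
        for nbr in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                candidates.add(nbr)
                frontier.append((nbr, depth + 1))
    return candidates
```

With k=2, entities two relations away from the topic entity (e.g., the office holder reached through a mediator node) are candidates, while anything three hops out is excluded.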

Encoding KB Subgraphs
For each candidate answer from the KB subgraph, we encode three types of information:
- Answer type: entity type of the candidate answer, e.g., PERSON.
- Answer path: a sequence of relations from the candidate answer to the topic entity, e.g., [office_holder, governing_officials].
- Answer context: surrounding entities of the candidate answer, e.g., {2011-01-09, secretary of state}.
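A toy sketch of how the three aspects of one candidate might be turned into vectors, assuming a simple mean-pooling encoder over token embeddings (the slide does not specify the encoder; the embedding table here is random and purely illustrative):

```python
import numpy as np

def encode_aspect(tokens, emb, dim=4):
    """Mean-pool token embeddings; return zeros for an empty aspect."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Hypothetical embedding table for the tokens in the Ohio example.
emb = {t: np.random.rand(4) for t in
       ["PERSON", "office_holder", "governing_officials",
        "2011-01-09", "secretary_of_state"]}

# One candidate answer, encoded as its three KB aspects.
candidate = {
    "type":    encode_aspect(["PERSON"], emb),
    "path":    encode_aspect(["office_holder", "governing_officials"], emb),
    "context": encode_aspect(["2011-01-09", "secretary_of_state"], emb),
}
```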

Bidirectional Attentive Memory Network

Motivation
Most existing IR-based methods encode questions and the KB independently, without considering the inter-relationships between them. When answering NL questions over a KB, different question components and KB aspects (entity type, relation path, context) play different roles.
Query: Who did France surrender to in WW2?
The goal is to distill the information that is most relevant to answering the question, on both the question side and the KB side.

Motivation (cont'd)
We propose to directly model the two-way flow of interactions between the question and the KB via a novel Bidirectional Attentive Memory Network (BAMnet). More generally, our method models the bidirectional interactions between a body of text and a KB, distilling the information most relevant to answering the question on both sides.

Overall Architecture of the BAMnet Model
- Input module: encoding the input question.
- Memory module: encoding candidate answers; employing a key-value memory to store them.
- Reasoning module: updating question and candidate answer representations.
- Answer module: ranking candidate answers based on their matching scores with the question.

Key-Value Memory Network to Store Candidate Answers
Memory networks are neural networks equipped with a long-term memory component that can be read from and written to. Unlike a basic memory network, a key-value memory network uses the key memory for the addressing stage and the value memory for the reading stage. We use a KV memory network to store candidate answers.
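A minimal sketch of one key-value memory read, assuming dot-product attention for addressing (the slide does not fix the similarity function):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kv_memory_read(query, keys, values):
    """One read from a key-value memory.

    Addressing uses the key memory (attention over key slots);
    reading returns the attention-weighted sum of the value memory.
    """
    scores = keys @ query        # (num_slots,) similarity to each key
    attn = softmax(scores)       # addressing distribution over slots
    return attn @ values         # weighted sum of value slots
```

When one key matches the query far better than the others, the read output is dominated by that slot's value.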

Two-layered Bidirectional Attention Network
Primary attention network:
- KB-aware attention module: focusing on important question parts.
- Importance module: focusing on important KB aspects.
Secondary attention network:
- Enhancing KB and question representations via two-way attention.

KB-aware Attention Module
Goal: focusing on important question components.
Motivation: Each word in a question serves a specific purpose (i.e., indicating answer type, path or context), and the importance of question components depends on the KB. Not all KB triples are equally helpful for detecting important question components; only those relevant to the question are.
Our solution:
1) Compute a question summary q using self-attention.
2) Compute a summary for each KB aspect, i.e., answer type m_t, path m_p and context m_c.
3) Compute the question-to-KB attention (a |Q| by 3 matrix) between each question word and each KB aspect.
4) Apply max pooling over the KB-aspect dimension of the question-to-KB attention to get an attention vector over the question words.
Max pooling helps find out each word's purpose, and the result is a KB-aware question attention vector since it indicates the importance of each word q_i in light of the KB.
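Steps 3) and 4) above can be sketched as follows, assuming dot-product similarity between word encodings and the three KB aspect summaries (dimensions and similarity function are assumptions, not specified on the slide):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kb_aware_question_attention(word_vecs, aspect_summaries):
    """word_vecs: (|Q|, d) question word encodings.
    aspect_summaries: (3, d) summaries of answer type / path / context.
    Returns a KB-aware attention vector over the question words."""
    # Question-to-KB attention: similarity of each word to each KB aspect.
    q2kb = word_vecs @ aspect_summaries.T      # (|Q|, 3)
    # Max-pool over the KB-aspect dimension: a word is important if it is
    # strongly tied to *some* aspect (type, path, or context).
    word_scores = q2kb.max(axis=1)             # (|Q|,)
    return softmax(word_scores)
```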

Importance Module
Goal: focusing on important KB aspects.
Motivation: The importance of KB aspects is measured by their relevance to the question.
Our solution:
1) Compute a |Q| x |A| x 3 attention tensor A_QM which indicates the strength of connection between each (question word, candidate answer, KB aspect) pair.
2) Take the max over the question-word dimension of A_QM and normalize it to get a |A| x 3 attention matrix, which indicates the importance of each answer aspect for each candidate answer.
3) Compute question-aware memory representations.
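Steps 1) and 2) above can be sketched as follows; dot-product similarity and the tensor layout are assumptions for illustration:

```python
import numpy as np

def importance_weights(word_vecs, aspect_vecs):
    """word_vecs: (|Q|, d) question word encodings.
    aspect_vecs: (|A|, 3, d) per-candidate encodings of the 3 KB aspects.
    Returns a (|A|, 3) matrix of aspect importances per candidate answer."""
    # Attention tensor A_QM: strength between each (word, candidate, aspect).
    a_qm = np.einsum('qd,akd->qak', word_vecs, aspect_vecs)  # (|Q|, |A|, 3)
    # Max over the question-word dimension, then normalize per candidate
    # (softmax over the 3 aspects) so each row sums to 1.
    m = a_qm.max(axis=0)                                     # (|A|, 3)
    e = np.exp(m - m.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```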

Enhancing Module
Goal: enhancing the question and KB representations by exploiting the two-way attention.
Our solution:
1) Compute a question-to-KB attention matrix by applying max pooling over the answer-aspect dimension of A_QM.
2) Compute the question-aware KB summary and incorporate it into the question representation, obtaining a KB-enhanced question vector.
3) Similarly, compute a question-enhanced KB representation which incorporates the relevant question information.

Training
Loss function: We force positive candidates to have higher scores than negative candidates by using a triplet-based loss function, summing a hinge loss of the form l = max(0, gamma - S(q, a+) + S(q, a-)) over pairs of candidates, where a+ and a- are drawn from the positive (i.e., correct) and negative (i.e., incorrect) answer sets, respectively, and S is a matching function which computes the dot product between question and candidate representations.
Sampling candidate answers: We extract the KB subgraph surrounding the gold topic entity and randomly sample candidates from it.
Intermediate modules such as the enhancing module generate "premature" representations of questions (e.g., q~) and candidate answers (e.g., M_k). Even though these intermediate representations are not optimal for answer prediction, we still use them along with the final representations to jointly train the model. We find this helps training, probably by providing more supervision, since we directly force intermediate representations to be helpful for prediction.
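The triplet-based hinge loss above can be sketched as follows, assuming the margin value and the pairwise summation scheme (both are illustrative; the slide only specifies a hinge loss over positive/negative pairs with dot-product matching):

```python
import numpy as np

def triplet_hinge_loss(q, positives, negatives, margin=0.5):
    """Sum of hinge losses over all (positive, negative) candidate pairs.

    The matching function S is a dot product between the question vector
    and a candidate vector; each positive must beat each negative by at
    least `margin`, otherwise a penalty is incurred.
    """
    loss = 0.0
    for a_pos in positives:
        for a_neg in negatives:
            s_pos = float(np.dot(q, a_pos))
            s_neg = float(np.dot(q, a_neg))
            loss += max(0.0, margin - s_pos + s_neg)
    return loss
```

When a positive candidate already outscores every negative by the margin, the loss is zero and no gradient is produced for that pair.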

Prediction
We extract all candidates from the KB subgraph surrounding the predicted topic entity (returned by a topic entity predictor). We compute the matching score (dot product) between every (question, candidate) pair, rank the candidates by their scores, and apply a threshold to return only a list of the most likely answers.
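The prediction step above amounts to score-rank-threshold; a minimal sketch, with the threshold value chosen arbitrarily for illustration:

```python
import numpy as np

def predict_answers(q, candidates, names, threshold=0.5):
    """Score every candidate by dot product with the question vector and
    return the names of those at or above `threshold`, best first."""
    scores = [float(np.dot(q, c)) for c in candidates]
    ranked = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
    return [n for n, s in ranked if s >= threshold]
```

Thresholding (rather than returning only the top-1 candidate) matters because WebQuestions questions can have multiple correct answers.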

Experimental Results
WebQuestions:
- 3,778 training and 2,032 test examples.
- Each example comprises a natural language utterance, the answer provided by MTurk workers, and the Freebase URL for the answer (topic entity).
- About 85% of questions can be answered via a single Freebase predicate; questions can have multiple answers.
Freebase:
- Facts stored as RDF subject-predicate-object triples; 41M entities, 19K properties, 596M assertions.
Our method significantly outperforms previous IR-based methods while remaining competitive with hand-crafted SP-based methods. There remains a gap due to topic entity linking: if we assume gold topic entities are given, BAMnet achieves an F1 of 0.557.
Table 1. Results on the WebQuestions test set. Bold: best in-category performance.

Ablation Study
How do the different modules impact performance?
- The two-layered bidirectional attention network is important: when it is turned off, performance drops from 0.557 to 0.534.
- Importance module (i.e., query-to-KB attention flow).
- KB-aware attention module (i.e., KB-to-query attention flow).
- Topic entity delexicalization is important.
Note that the gold topic entity is assumed to be known in this ablation study, because the error introduced by topic entity prediction might obscure the real performance impact of a module or strategy. Significant performance drops were observed after turning off some key attention modules, which confirms that the real power of our method comes from the idea of hierarchical two-way attention.
Table 2. Ablation results on WebQuestions. Gold topic entity is assumed to be known.

Interpretability Analysis
We divide the questions into three categories based on which kind of KB aspect is most crucial for answering them. Compared to the simplified version without bidirectional attention, our model is more capable of answering all three types of questions.
Table 3. Predicted answers of BAMnet w/ and w/o bidirectional attention on the WebQuestions test set.

Interpretability Analysis (cont'd)
The attention network successfully detects the interactions between "who" and the answer type, and between "surrender to" and the answer path, and focuses more on those words when encoding the question.
Figure 1. Attention heatmap generated by the reasoning module.

Conclusions and Future Work
Most existing embedding-based methods for KBQA ignore the subtle inter-relationships between the question and the KB. We proposed to directly model the two-way flow of interactions between the question and the KB. Our method significantly outperforms previous IR-based methods while remaining competitive with hand-crafted SP-based methods on a popular KBQA benchmark. Both the ablation study and the interpretability analysis verify the effectiveness of modeling mutual interactions. In the future, we would like to explore effective ways of modeling more complex types of constraints (e.g., ordinal, comparison and aggregation).

Thank you! Q&A Code: https://github.com/hugochan/BAMnet

Topic Entity Prediction
Goal: Given a question, find the best topic entity from the candidate set returned by external topic entity linking tools.
Motivation: We know an entity better by looking at its KB subgraph, e.g., an entity "Apple" with type "ORGANIZATION" and surrounding relations "founder", "CEO", etc.
Our solution:
1) Encode the question into a vector.
2) Encode each candidate (i.e., entity name, entity type and surrounding relations) into a vector.
3) Employ a memory module to store candidates.
4) Apply the aforementioned generalization module to update the question representation.
Training: We force positive candidates to have higher scores than negative candidates by using a triplet-based loss function.
Testing: The candidate with the highest score is returned as the best topic entity.

Error Analysis
We randomly sampled 100 questions on which our method performed poorly and categorized the errors:
- Label issues of gold answers (33%): e.g., incomplete and erroneous labels, and alternative correct answers.
- Constraint errors (11%): temporal constraints account for most.
- Type errors (13%): our method generates more answers than needed because it poorly utilizes answer type information.
- Lexical gap (5%).
- Other sources of errors (38%): e.g., question ambiguity, incomplete answers and other miscellaneous errors.