Reading Report: Open QA Systems
Yuzhong Qu, Department of Computer Science, Nanjing University
Articles
- Anthony Fader, Luke Zettlemoyer, Oren Etzioni: Open Question Answering over Curated and Extracted Knowledge Bases. KDD 2014: 1156-1165
- Anthony Fader, Luke S. Zettlemoyer, Oren Etzioni: Paraphrase-Driven Learning for Open Question Answering. ACL (1) 2013: 1608-1618
Paraphrase-Driven Learning for Open QA: Introduction
- The problem: answering questions with the noisy knowledge bases that IE systems produce.
- An approach for learning to map questions to formal queries over a large, open-domain database of extracted facts.
- Learns from a large, noisy question-paraphrase corpus, where the questions in a cluster share a common but unknown query.
- The WikiAnswers corpus supplies syntactic and lexical variations of the same question.
- Goal: learning to answer a broad class of factual questions.
Example extracted facts: authored(milne, winnie-the-pooh), treat(bloody-mary, hangover-symptoms)
Lexicon: mappings from NL phrases to DB concepts. The lexicon is used to generate a derivation y from an input question x to a database query z.
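The lexicon-to-query step can be sketched as follows. This is a minimal sketch under assumptions: the phrases, the `authored` relation, and the greedy substring matching are illustrative placeholders, not PARALEX's actual lexicon or derivation procedure.

```python
# Minimal sketch of deriving a DB query z from a question x via a lexicon.
# All entries are illustrative, not the paper's real lexicon.
lexicon = {
    # NL phrase          -> (kind, DB concept)
    "who wrote":         ("rel", "authored"),        # relation entry
    "winnie the pooh":   ("ent", "winnie-the-pooh"), # entity entry
}

def derive(question):
    """Greedy matching of lexicon phrases against the question; returns
    a query of the form rel(?x, entity) when one relation and one
    entity phrase are found."""
    q = question.lower().rstrip("?")
    rel, ent = None, None
    for phrase, (kind, concept) in lexicon.items():
        if phrase in q:
            if kind == "rel":
                rel = concept
            else:
                ent = concept
    if rel and ent:
        return f"{rel}(?x, {ent})"
    return None

print(derive("Who wrote Winnie the Pooh?"))  # authored(?x, winnie-the-pooh)
```

In the paper the derivation y is scored rather than produced greedily; this sketch only shows the lexicon's role in bridging NL phrases and DB concepts.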
Lexical learning from a paraphrase pair (x, x'):
- A derivation of x under the seed lexicon L0
- Word alignment between x and x'
- A new lexicon entry induced from the aligned phrases
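The alignment-driven induction of new lexicon entries can be sketched like this; the paraphrase pair, the hand-picked alignment, and the seed entry are all made-up illustrations, not the paper's data.

```python
# Sketch of lexical learning from a paraphrase pair (x, x'):
# x derives a query under the seed lexicon; word alignments between
# x and x' let us transfer x's phrase->concept mappings onto x'.
seed_entries = {"wrote": ("rel", "authored")}

x  = "who wrote winnie the pooh".split()
xp = "who is the author of winnie the pooh".split()
# Word alignment as (index_in_x, index_in_xp) pairs; "wrote" ~ "author".
alignment = [(0, 0), (1, 3), (2, 5), (3, 6), (4, 7)]

def induce_entries(x, xp, alignment, seed_entries):
    """For every seed phrase in x, collect the aligned words of x'
    and propose them as a new lexicon entry for the same concept."""
    new_entries = {}
    for i, word in enumerate(x):
        if word in seed_entries:
            aligned = [xp[j] for (a, j) in alignment if a == i]
            phrase = " ".join(aligned)
            if phrase and phrase not in seed_entries:
                new_entries[phrase] = seed_entries[word]
    return new_entries

print(induce_entries(x, xp, alignment, seed_entries))
# {'author': ('rel', 'authored')}
```

In the actual system the alignments come from a statistical aligner over the whole corpus (see the experiment slide), and induced entries are later filtered by the learned scoring model.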
The initial seed lexicon L0: 16 hand-written 2-argument question patterns.
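A couple of hand-written 2-argument patterns in that spirit might look like the following; the regexes and the entity normalization are hypothetical stand-ins, not the paper's 16 actual patterns.

```python
import re

# Illustrative 2-argument question patterns: each maps a question shape
# with a relation slot r and an entity slot e to a query template.
seed_patterns = [
    (r"^who (?P<r>\w+) (?P<e>.+)$",  "{r}(?x, {e})"),
    (r"^what (?P<r>\w+) (?P<e>.+)$", "{r}(?x, {e})"),
]

def parse(question):
    """Return a query for the first seed pattern that matches."""
    q = question.lower().rstrip("?").strip()
    for pattern, template in seed_patterns:
        m = re.match(pattern, q)
        if m:
            entity = m.group("e").replace(" ", "-")
            return template.format(r=m.group("r"), e=entity)
    return None

print(parse("Who authored winnie the pooh?"))  # authored(?x, winnie-the-pooh)
```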
Experiment
Compared systems:
- PARALEX: lexical learning + parameter learning
- NoParam: PARALEX without the learned parameters
- InitOnly: PARALEX using only the initial lexicon
Test data:
- 698 questions from WikiAnswers, grouped into 37 clusters
- A gold-standard set of approximately 48,000 (x, a, l) triples
Database (noisy KB):
- 15 million REVERB extractions (the full set of REVERB extractions contains over six billion triples)
Word alignment:
- MGIZA++ implementation of IBM Model 4, run on each paraphrase pair
Results
Results: question patterns that are used to derive a correct query
Results: relation and entity synonyms learned from the WikiAnswers corpus
Error Analysis: example failure questions
- How long does it take to drive from Sacramento to Cancun?
- What do cats and dogs have in common?
- How do you make axes in Minecraft?
- When were Bobby Orr's children born?
Open QA Over Curated and Extracted KB
Open QA Over Curated and Extracted KB: Techniques
- The operators and KB are noisy, so many different sequences of operations (called derivations) can be constructed for the same question.
- An inference algorithm derives high-confidence answers, and a hidden-variable structured perceptron algorithm learns a scoring function from data.
- Paraphrase operators are mined automatically from a question corpus, and KB-query rewrite operators from multiple KBs.
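The hidden-variable perceptron update can be sketched as follows. This is a toy sketch: the feature functions, derivation space, and tie-breaking are invented stand-ins, and only the update rule's shape follows the paper's description (promote the best derivation yielding the gold answer, demote the incorrect prediction).

```python
from collections import defaultdict

def score(weights, features):
    """Linear score of a derivation's features under the weight vector."""
    return sum(weights[f] * v for f, v in features.items())

def perceptron_update(weights, derivations, gold_answer):
    """derivations: list of (answer, feature_dict) pairs; which
    derivation produced the answer is the hidden variable."""
    best = max(derivations, key=lambda d: score(weights, d[1]))
    if best[0] == gold_answer:
        return weights  # prediction already correct, no update
    correct = [d for d in derivations if d[0] == gold_answer]
    if not correct:
        return weights  # no derivation reaches the gold answer
    best_correct = max(correct, key=lambda d: score(weights, d[1]))
    for f, v in best_correct[1].items():  # promote
        weights[f] += v
    for f, v in best[1].items():          # demote
        weights[f] -= v
    return weights

weights = defaultdict(float)
derivs = [("london", {"op:lookup": 1.0}),
          ("paris",  {"op:lookup": 1.0, "op:paraphrase": 1.0})]
weights = perceptron_update(weights, derivs, "paris")
print(dict(weights))  # {'op:lookup': 0.0, 'op:paraphrase': 1.0}
```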
Question Templates
Mining paraphrase operators from the WikiAnswers corpus, which consists of 23 million question clusters.
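The mining idea can be sketched as: within a cluster of paraphrases, abstract the shared entity string into a slot, and every pair of resulting templates becomes a candidate paraphrase operator. The cluster and entity below are made-up examples.

```python
# Sketch of mining paraphrase templates from a WikiAnswers-style cluster.
cluster = [
    "what is the population of france",
    "how many people live in france",
]

def mine_templates(cluster, entity):
    """Replace the shared entity with a slot and emit template pairs."""
    templates = [q.replace(entity, "$y") for q in cluster if entity in q]
    return [(a, b) for i, a in enumerate(templates)
                   for b in templates[i + 1:]]

print(mine_templates(cluster, "france"))
# [('what is the population of $y', 'how many people live in $y')]
```

In the actual system, operator candidates mined this way are noisy, which is exactly why the learned scoring function is needed downstream.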
Mining query rewrite rules, focusing on the mismatch between relation words in the question and relation words in the KB.
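Applying such rewrite rules at query time can be sketched as follows; the rules, relation strings, and KB triples are illustrative placeholders, not mined output.

```python
# Sketch of KB-query rewriting: if the question's relation phrase finds
# no match in the KB, retry with each rewrite candidate.
rewrite_rules = {
    # question relation phrase -> candidate KB relations
    "be the capital of": ["be the capital city of", "capital"],
}

kb = {("paris", "capital", "france")}

def find_subject(rel, obj):
    """Try the literal relation first, then each rewrite candidate."""
    for r in [rel] + rewrite_rules.get(rel, []):
        for (s, kb_rel, o) in kb:
            if kb_rel == r and o == obj:
                return s
    return None

print(find_subject("be the capital of", "france"))  # paris
```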
Experiments
- Three question sets
- Multiple knowledge sources: Open IE, Freebase, and Probase
Related papers
- Jonathan Berant, Andrew Chou, Roy Frostig, Percy Liang: Semantic Parsing on Freebase from Question-Answer Pairs. EMNLP 2013: 1533-1544
- Jonathan Berant, Percy Liang: Semantic Parsing via Paraphrasing. ACL (1) 2014: 1415-1425
- Yushi Wang, Jonathan Berant, Percy Liang: Building a Semantic Parser Overnight. ACL 2015
- Roy Bar-Haim, Ido Dagan, Jonathan Berant: Knowledge-Based Textual Inference via Parse-Tree Transformations. J. Artif. Intell. Res. (JAIR) 54: 1-57 (2015)
Thanks! Questions are welcome!