What is the Jeopardy Model? A Quasi-Synchronous Grammar for Question Answering. Mengqiu Wang, Noah A. Smith and Teruko Mitamura, Language Technologies Institute, Carnegie Mellon University
2 The task High-efficiency document retrieval High-precision answer ranking Who is the leader of France? 1. Bush later met with French president Jacques Chirac. 2. Henri Hadjenberg, who is the leader of France ’s Jewish community, … 3. … 1. Henri Hadjenberg, who is the leader of France ’s Jewish community, … 2. Bush later met with French president Jacques Chirac. (as of May ) 3. …
3 Challenges High-efficiency document retrieval High-precision answer ranking Who is the leader of France ? 1. Bush later met with French president Jacques Chirac. 2. Henri Hadjenberg, who is the leader of France ’s Jewish community, … 3. …
4 Semantic Transformations Q: “Who is the leader of France?” A: Bush later met with French president Jacques Chirac.
5 Syntactic Transformations [Figure: dependency trees for “Who is the leader of France?” and “Bush met with French president Jacques Chirac”; arc label shown: mod]
6 Syntactic Variations [Figure: dependency trees for “Who is the leader of France?” and “Henri Hadjenberg, who is the leader of France’s Jewish community”; arc label shown: mod]
7 Two key phenomena in QA. Semantic transformation: “leader” (Q) vs. “president” (A). Syntactic transformation: “leader of France” (Q) vs. “French president” (A).
8 Existing work in QA. Semantics: use WordNet as a thesaurus for expansion. Syntax: use dependency parse trees, but merely transform the feature space into a dependency-parse feature space. No fundamental changes in the algorithms (edit distance, classifier, similarity measure).
9 Where else have we seen these transformations? Machine Translation (especially in syntax-based MT) Paraphrasing Sentence compression Textual entailment F E
10 Noisy-channel. Machine Translation: S, E; language model, translation model. Question Answering: Q, A; retrieval model, Jeopardy model.
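Written as equations, the analogy on this slide can be sketched as follows. The symbols S, E, Q and A come from the slide; the pairing of each label with a factor is inferred from the order in which the labels appear, so treat it as an assumption rather than the authors' exact formulation.

```latex
\begin{align*}
\text{MT:}\quad \hat{E} &= \arg\max_{E}\;
  \underbrace{p(E)}_{\text{language model}}\;
  \underbrace{p(S \mid E)}_{\text{translation model}} \\
\text{QA:}\quad \hat{A} &= \arg\max_{A}\;
  \underbrace{p(A)}_{\text{retrieval model}}\;
  \underbrace{p(Q \mid A)}_{\text{Jeopardy model}}
\end{align*}
```

Under this reading, the Jeopardy model scores how well an answer sentence "generates" the question, which matches the game-show idea of supplying a question for a given answer.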
11 From wikipedia.org: Jeopardy! is a popular international television quiz game show ( #2 of the 50 Greatest Game Show of All Times ). 3 contestants select clues in the form of an answer, to which they must supply correct responses in the form of a question. The concept of "questioning answers" is original to Jeopardy!. What is Jeopardy! ?
12 Jeopardy Model We make use of a formalism called quasi-synchronous grammar [ D. Smith & Eisner ’06 ], originally developed for MT
13 Quasi-Synchronous Grammars. Based on key observations in MT: translated sentences often have some isomorphic syntactic structure, but usually not in their entirety. The strictness of the isomorphism may vary across words or syntactic rules. Key idea: unlike some synchronous grammars (e.g. SCFG, which is stricter and more rigid), QG defines a monolingual grammar for the target tree, “inspired” by the source tree.
14 Quasi-Synchronous Grammars In other words, we model the generation of the target tree, influenced by the source tree (and their alignment) QA can be thought of as extremely free translation within the same language. The linkage between question and answer trees in QA is looser than in MT, which gives a bigger edge to QG.
15 Jeopardy Model. Works on labeled dependency parse trees. Learns the hidden structure (the alignment between Q and A trees) by summing out ALL possible alignments. One particular alignment tells us both the syntactic configurations and the word-to-word semantic correspondences. An example follows: a question parse tree, an answer parse tree, and an alignment between them.
[Figure: example question and answer dependency trees, each headed by a $ root node. Q: who/WP/qword, leader/NN, is/VB, the/DT, France/NNP/location (arc labels: root, subj, obj, det, of). A: Bush/NNP/person, met/VBD, French/JJ/location, president/NN, Jacques Chirac/NNP/person (arc labels: root, subj, with, nmod).]
Our model makes local Markov assumptions to allow efficient computation via dynamic programming (details in paper): given its parent, a word is independent of all other words (including siblings).
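To see why the parent-only independence assumption makes the sum over all alignments tractable, here is a minimal sketch. It is not the authors' parameterization: `lex` and `config` are hypothetical scoring functions, the toy scores are invented, and each question word is simply aligned to exactly one answer word. Because siblings are independent given their parent, the sum factors over children and can be computed recursively instead of by enumeration.

```python
from itertools import product


def sum_over_alignments(q_children, q_root, a_words, lex, config):
    """Sum the scores of all alignments of a question tree to answer words.

    q_children maps a question word to the list of its children.
    lex(q, a) scores aligning question word q to answer word a (hypothetical).
    config(a_child, a_parent) scores the pair of answer words that a question
    child and its parent are aligned to (hypothetical).
    """
    def inside(q, a):
        # Total score of the subtree rooted at q, given q is aligned to a.
        score = lex(q, a)
        for c in q_children.get(q, []):
            # Each child is summed separately: independent given the parent.
            score *= sum(config(a_c, a) * inside(c, a_c) for a_c in a_words)
        return score

    return sum(inside(q_root, a) for a in a_words)


def brute_force(q_children, q_root, a_words, lex, config):
    """Same quantity by explicit enumeration (exponential; for checking only)."""
    q_words, parent, stack = [], {}, [q_root]
    while stack:
        q = stack.pop()
        q_words.append(q)
        for c in q_children.get(q, []):
            parent[c] = q
            stack.append(c)
    total = 0.0
    for choice in product(a_words, repeat=len(q_words)):
        align = dict(zip(q_words, choice))
        score = 1.0
        for q in q_words:
            score *= lex(q, align[q])
            if q in parent:
                score *= config(align[q], align[parent[q]])
        total += score
    return total


# Toy question tree and scores, invented for illustration only.
Q_CHILDREN = {"is": ["who", "leader"], "leader": ["the", "France"]}
A_WORDS = ["Bush", "met", "French", "president", "Chirac"]
GOOD_PAIRS = {("leader", "president"), ("France", "French")}


def toy_lex(q, a):
    return 0.5 if (q, a) in GOOD_PAIRS else 0.1


def toy_config(a_child, a_parent):
    return 0.1 if a_child == a_parent else 0.3


fast = sum_over_alignments(Q_CHILDREN, "is", A_WORDS, toy_lex, toy_config)
slow = brute_force(Q_CHILDREN, "is", A_WORDS, toy_lex, toy_config)
```

The recursive version as written recomputes `inside` for repeated (word, alignment) pairs; caching those values would turn it into a proper dynamic program.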
[Figure: the question tree is built up one word at a time against the fixed answer tree: who, then leader, then the, then France.]
23 6 types of syntactic configurations Parent-child
[Figure: the example question and answer trees, illustrating the parent-child configuration]
26 6 types of syntactic configurations Parent-child Same-word
[Figure: the example question and answer trees, illustrating the same-word configuration]
29 6 types of syntactic configurations Parent-child Same-word Grandparent-child
[Figure: the example question and answer trees, illustrating the grandparent-child configuration]
32 6 types of syntactic configurations Parent-child Same-word Grandparent-child Child-parent Siblings C-command (Same as [D. Smith & Eisner ’06])
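As a rough illustration of these configurations, the sketch below takes a question parent-child pair, looks at the two answer words they are aligned to, and names the relation between those two answer words. The definitions are assumptions on my part: in particular, c-command is taken to mean that the parent of one word is an ancestor of the other, and anything else is labeled "other". The toy answer tree is only loosely modeled on the example sentence.

```python
def ancestors(parent, node):
    """Return the set of proper ancestors of `node` (parent, grandparent, ...)."""
    found = set()
    node = parent.get(node)
    while node is not None:
        found.add(node)
        node = parent.get(node)
    return found


def configuration(parent, a_parent, a_child):
    """Name the relation between two answer words.

    parent: dict mapping each answer word to its head (None for the root).
    a_parent: answer word aligned to the question parent.
    a_child: answer word aligned to the question child.
    """
    if a_child == a_parent:
        return "same-word"
    if parent.get(a_child) == a_parent:
        return "parent-child"
    if parent.get(a_parent) == a_child:
        return "child-parent"
    if parent.get(parent.get(a_child)) == a_parent:
        return "grandparent-child"
    if parent.get(a_child) == parent.get(a_parent):
        return "siblings"
    # Assumed definition: the parent of one word is an ancestor of the other.
    if (parent.get(a_parent) in ancestors(parent, a_child)
            or parent.get(a_child) in ancestors(parent, a_parent)):
        return "c-command"
    return "other"


# Hypothetical answer tree (head of each word), for illustration only.
ANSWER_PARENT = {
    "met": None,
    "Bush": "met",
    "Chirac": "met",
    "president": "Chirac",
    "French": "president",
}
```

For example, `configuration(ANSWER_PARENT, "Chirac", "president")` falls under parent-child, while `configuration(ANSWER_PARENT, "Bush", "Chirac")` falls under siblings.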
34 Modeling alignment Base model
37 Modeling alignment cont. Base model. Log-linear model: lexical-semantic features from WordNet (identity, hypernym, synonym, entailment, etc.). Mixture model.
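One way to picture how a base model, a log-linear model and a mixture fit together is the sketch below. Everything specific in it is an assumption: the relation table is invented rather than looked up in WordNet, the feature weights and the mixing coefficient `lam` are made up, and the base model is just a uniform distribution over answer words.

```python
import math

# Invented lexical relations standing in for WordNet lookups (assumption).
TOY_RELATIONS = {("leader", "president"): "hypernym"}

# Hypothetical feature weights; a trained model would learn these.
TOY_WEIGHTS = {"identity": 2.0, "synonym": 1.5, "hypernym": 1.0}


def features(q_word, a_word):
    """Return the names of the lexical-semantic features that fire for a pair."""
    fired = []
    if q_word == a_word:
        fired.append("identity")
    relation = TOY_RELATIONS.get((q_word, a_word))
    if relation is not None:
        fired.append(relation)
    return fired


def loglinear(q_word, a_words, weights):
    """Log-linear distribution over answer words for one question word."""
    scores = [
        math.exp(sum(weights.get(f, 0.0) for f in features(q_word, a)))
        for a in a_words
    ]
    z = sum(scores)
    return {a: s / z for a, s in zip(a_words, scores)}


def mixture(q_word, a_words, base, weights, lam):
    """Interpolate a base distribution with the log-linear one."""
    ll = loglinear(q_word, a_words, weights)
    return {a: lam * base[a] + (1.0 - lam) * ll[a] for a in a_words}


A_WORDS = ["Bush", "met", "French", "president", "Chirac"]
UNIFORM_BASE = {a: 1.0 / len(A_WORDS) for a in A_WORDS}
dist = mixture("leader", A_WORDS, UNIFORM_BASE, TOY_WEIGHTS, lam=0.5)
```

Since both components are proper distributions over the same answer words, the interpolated result also sums to one for any `lam` between 0 and 1.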
38 Parameter estimation. Things to be learnt: multinomial distributions in the base model, log-linear model feature weights, and the mixture coefficient. Training involves summing out hidden structures and is thus non-convex. Solved using conditional Expectation-Maximization.
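The slide names conditional EM for all three parameter groups; the sketch below is a much smaller thing, plain EM for the mixture coefficient alone, with the two component probabilities held fixed. It is meant only to show the shape of the E-step and M-step. The function names and the numbers are hypothetical.

```python
import math


def log_likelihood(p_base, p_loglin, lam):
    """Log-likelihood of the observations under the two-component mixture."""
    return sum(
        math.log(lam * b + (1.0 - lam) * l) for b, l in zip(p_base, p_loglin)
    )


def em_mixture_coefficient(p_base, p_loglin, lam=0.5, iterations=50):
    """Estimate the weight of the base component with EM.

    p_base[i] and p_loglin[i] are the probabilities that each (fixed)
    component assigns to observation i; all values must be positive.
    """
    for _ in range(iterations):
        # E-step: posterior that each observation came from the base component.
        resp = [
            lam * b / (lam * b + (1.0 - lam) * l)
            for b, l in zip(p_base, p_loglin)
        ]
        # M-step: the new coefficient is the average responsibility.
        lam = sum(resp) / len(resp)
    return lam


# Made-up component probabilities in which the base component always fits better.
P_BASE = [0.6, 0.7, 0.5]
P_LOGLIN = [0.2, 0.1, 0.3]
lam_hat = em_mixture_coefficient(P_BASE, P_LOGLIN)
```

Each EM iteration cannot decrease the log-likelihood, but with hidden alignments in the full model the objective is non-convex, so the result depends on initialization.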
39 Experiments. TREC 8-12 data set for training; TREC 13 questions for development and testing.
40 Candidate answer generation. For each question, we take all documents from the TREC doc pool and extract sentences that contain at least one non-stop-word keyword from the question. For computational reasons (parsing speed, etc.), we only took answer sentences of <= 40 words.
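The filtering rule on this slide can be sketched in a few lines. The stop-word list and the tokenizer here are hypothetical stand-ins (the slide does not say which were used), and matching is by exact lowercase token.

```python
import re

# Tiny hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"who", "is", "the", "of", "a", "an", "in", "with", "s"}


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_sentences(question, sentences, max_words=40):
    """Keep sentences that share a non-stop keyword with the question
    and are at most `max_words` tokens long."""
    keywords = set(tokenize(question)) - STOP_WORDS
    kept = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if len(tokens) <= max_words and keywords & set(tokens):
            kept.append(sentence)
    return kept


QUESTION = "Who is the leader of France?"
SENTENCES = [
    "Henri Hadjenberg, who is the leader of France's Jewish community, spoke.",
    "The weather was fine.",
    "France " * 41,
]
kept = candidate_sentences(QUESTION, SENTENCES)
```

With exact token matching, a sentence that only contains "French" would not match the keyword "France", so a real pipeline presumably needs some normalization that this sketch leaves out.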
41 Dataset statistics
Manually labeled 100 questions for training. Total: 348 positive Q/A pairs
84 questions for dev. Total: 1415 Q/A pairs 3.1+,
100 questions for testing. Total: 1703 Q/A pairs 3.6+,
Automatically labeled another 2193 questions to create a noisy training set, for evaluating model robustness
42 Experiments cont. Each question and answer sentence is tokenized, POS tagged (MXPOST), parsed (MSTParser) and labeled with named-entity tags (IdentiFinder).
43 Baseline systems (replications). [Cui et al. SIGIR ’05]: the algorithm behind one of the best-performing systems in TREC evaluations. It uses a mutual information-inspired score computed over dependency trees and a single fixed alignment between them. [Punyakanok et al. NLE ’04]: measures the similarity between Q and A by computing tree edit distance. Both baselines are high-performing, syntax-based, and the most straightforward to replicate. We further enhanced the algorithms by augmenting them with WordNet.
44 Results [Table: Mean Average Precision and Mean Reciprocal Rank of Top 1. Note on slide: statistically significantly better than the 2nd best score in each column. Values shown: 28.2%, 23.9%, 41.2%, 30.3%]
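For readers unfamiliar with the two metrics, here is a minimal sketch of how they are commonly computed from ranked relevance labels. This follows the standard definitions and is not the authors' evaluation script; the function names are my own.

```python
def average_precision(labels):
    """Average precision for one ranked list of True/False relevance labels."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this relevant item
    return total / hits if hits else 0.0


def reciprocal_rank(labels):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def mean_over_questions(metric, ranked_lists):
    """Average a per-question metric over all questions."""
    return sum(metric(labels) for labels in ranked_lists) / len(ranked_lists)


# Two hypothetical questions with ranked candidate answers.
RANKINGS = [[False, True, True], [True, False, False]]
map_score = mean_over_questions(average_precision, RANKINGS)
mrr_score = mean_over_questions(reciprocal_rank, RANKINGS)
```

Both metrics reward placing correct answer sentences near the top; MRR looks only at the first correct one, while MAP accounts for all of them.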
45 Summing vs. Max
46 Conclusion. We developed a probabilistic model for QA based on quasi-synchronous grammar. Experimental results showed that our model is more accurate and robust than state-of-the-art syntax-based QA models. The mixture model is shown to be powerful. The log-linear model allows us to use arbitrary features. The model provides a general framework for many other NLP applications (compression, textual entailment, paraphrasing, etc.).
47 Future Work. Higher-order Markovization, both horizontally and vertically, allows us to look at more context, at the expense of higher computational cost. More features from external resources, e.g. a paraphrasing database. Extending it for cross-lingual QA: avoid the paradigm of translation as pre- or post-processing. We can naturally fit a lexical or phrase translation probability table into our model to model the translation inherently. Taking into account parsing uncertainty.
48 Thank you! Questions?