Quasi-Synchronous Grammars
Based on key observations in MT:
translated sentences often have some isomorphic syntactic structure, but usually not in its entirety;
the strictness of the isomorphism may vary across words or syntactic rules.
Key idea: unlike stricter, more rigid synchronous grammars (e.g. SCFG), QG defines a monolingual grammar for the target tree, “inspired” by the source tree.
Quasi-Synchronous Grammars cont.
In other words, we model the generation of the target tree, influenced by the source tree (and their alignment).
QA can be thought of as extremely free monolingual translation.
The linkage between question and answer trees in QA is looser than in MT, which gives a bigger edge to QG.
Model
Works on labeled dependency parse trees.
Learns the hidden structure (the alignment between Q and A trees) by summing out ALL possible alignments.
One particular alignment tells us both the syntactic configurations and the word-to-word semantic correspondences.
An example…
[Figure labels: question; answer parse tree; question parse tree; an alignment]
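[Figure: dependency parse trees for the example, each rooted at $. A: Bush/NNP/person, met/VBD, French/JJ/location, president/NN, Jacques Chirac/NNP/person (edge labels: root, subj, with, nmod). Q: who/WP/qword, leader/NN, is/VB, the/DT, France/NNP/location (edge labels: root, subj, obj, det, of).]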
Our model makes local Markov assumptions to allow efficient computation via dynamic programming (details in paper): given its parent, a word is independent of all other words (including siblings).
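A minimal sketch of how the Markov assumption lets the sum over all alignments factor over the question tree. The toy tree, the answer length, and `local_score` are hypothetical placeholders, not the paper's parameterization; the point is only that each question word's alignment depends on its parent's alignment and nothing else.

```python
from functools import lru_cache

# Toy question tree (hypothetical): word 0 is the root, with children 1 and 2.
Q_CHILDREN = {0: (1, 2), 1: (), 2: ()}
N_ANSWER = 3  # number of answer words a question word may align to


def local_score(q_word, a_word, a_parent):
    """Placeholder local factor (an assumption, not the paper's model):
    score of aligning q_word to a_word, given that q_word's parent
    is aligned to a_parent (None for the root word)."""
    parent_term = 0 if a_parent is None else a_parent
    return (1 + (q_word + 2 * a_word + parent_term) % 3) / 10.0


@lru_cache(maxsize=None)
def inside(q_word, a_parent):
    """Sum over all alignments of the subtree rooted at q_word,
    conditioned only on the alignment of q_word's parent."""
    total = 0.0
    for a_word in range(N_ANSWER):
        score = local_score(q_word, a_word, a_parent)
        # Children are independent of each other given q_word's alignment.
        for child in Q_CHILDREN[q_word]:
            score *= inside(child, a_word)
        total += score
    return total


def total_alignment_score():
    """Score of the whole question tree, summed over every alignment."""
    return inside(0, None)
```

Memoizing `inside` keeps the cost polynomial in the two sentence lengths instead of enumerating every alignment. Replacing the sum with a max would give the score of the single best alignment instead.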
[Figure sequence: the question tree is built up one word at a time: who, then leader, then the, then France.]
6 types of syntactic configurations
Parent-child
[Figure: parent-child configuration]
[Figure: same-word configuration]
[Figure: grandparent-child configuration]
6 types of syntactic configurations: parent-child, same-word, grandparent-child, child-parent, siblings, c-command (same as [D. Smith & Eisner ’06]).
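A rough sketch of how a configuration could be read off an alignment. Given a question word aligned to answer word `a_child`, and its question parent aligned to `a_parent`, the function looks at how those two answer words relate in the answer tree. The toy tree attachments, the "other" fallback, and the c-command test (the head of one word is an ancestor of the other) are assumptions made for illustration.

```python
def ancestors(node, head):
    """Proper ancestors of node in the answer tree, nearest first."""
    out = []
    while head.get(node) is not None:
        node = head[node]
        out.append(node)
    return out


def configuration(a_child, a_parent, head):
    """Classify the answer-tree relation between the word aligned to a
    question word (a_child) and the word aligned to its parent (a_parent).
    `head` maps each answer word to its head (None for the root word)."""
    if a_child == a_parent:
        return "same-word"
    if head.get(a_child) == a_parent:
        return "parent-child"
    if head.get(a_parent) == a_child:
        return "child-parent"
    anc = ancestors(a_child, head)
    if len(anc) >= 2 and anc[1] == a_parent:
        return "grandparent-child"
    if head.get(a_child) is not None and head.get(a_child) == head.get(a_parent):
        return "siblings"
    # Assumed reading of c-command: the head of one is an ancestor of the other.
    if (head.get(a_child) in ancestors(a_parent, head)
            or head.get(a_parent) in ancestors(a_child, head)):
        return "c-command"
    return "other"


# Toy answer tree loosely based on the example; attachments are assumed.
HEAD = {"met": None, "Bush": "met", "Chirac": "met",
        "president": "Chirac", "French": "president"}
```

For instance, `configuration("Bush", "Chirac", HEAD)` falls under siblings because both words attach to "met" in this toy tree.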
Modeling alignment
Base model
Modeling alignment cont.
Base model
Log-linear model: lexical-semantic features from WordNet (identity, hypernym, synonym, entailment, etc.)
Mixture model
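One way the three pieces could fit together, sketched under assumptions: the slide does not give the formula, so the linear interpolation with coefficient `alpha`, the feature names, and the toy relation table (standing in for WordNet lookups) are all hypothetical.

```python
import math

# Toy stand-in for WordNet relations (hypothetical entries, not real lookups).
TOY_RELATIONS = {("leader", "president"): "synonym"}


def features(q_word, a_word):
    """Names of the lexical-semantic features active for this word pair."""
    active = []
    if q_word == a_word:
        active.append("identity")
    relation = TOY_RELATIONS.get((q_word, a_word))
    if relation is not None:
        active.append(relation)
    return active


def log_linear_prob(q_word, a_word, vocab, weights):
    """p(q_word | a_word) proportional to exp(sum of active feature weights),
    normalized over the candidate question vocabulary."""
    def unnormalized(q):
        return math.exp(sum(weights.get(f, 0.0) for f in features(q, a_word)))
    return unnormalized(q_word) / sum(unnormalized(q) for q in vocab)


def mixture_prob(q_word, a_word, vocab, base, weights, alpha):
    """Assumed mixture: alpha * base multinomial + (1 - alpha) * log-linear."""
    base_p = base.get((q_word, a_word), 0.0)
    return alpha * base_p + (1.0 - alpha) * log_linear_prob(
        q_word, a_word, vocab, weights)
```

Because both components are normalized over the same vocabulary, the mixture remains a proper distribution for any `alpha` between 0 and 1.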
Parameter estimation
Things to be learnt: multinomial distributions in the base model; log-linear model feature weights; mixture coefficient.
Training involves summing out hidden structures, so the objective is non-convex.
Solved using conditional Expectation-Maximization.
Experiments
TREC 8-12 data set for training
TREC 13 questions for development and testing
Candidate answer generation
For each question, we take all documents from the TREC doc pool and extract the sentences that contain at least one non-stop keyword from the question.
For computational reasons (parsing speed, etc.), we keep only answer sentences of at most 40 words.
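The filtering step described above can be sketched as follows. The stop-word list and the whitespace tokenizer are simplified assumptions; the actual stop list and tokenization used for the experiments are not given on the slide.

```python
# Toy stop-word list (an assumption; the real list is not specified).
STOP_WORDS = {"who", "what", "is", "was", "the", "of", "a", "an", "in"}
PUNCTUATION = ".,?!;:\"'()"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (t.strip(PUNCTUATION).lower() for t in text.split())
    return [t for t in tokens if t]


def candidate_sentences(question, sentences, max_len=40):
    """Keep sentences that share at least one non-stop keyword with the
    question and are at most max_len tokens long."""
    keywords = {t for t in tokenize(question) if t not in STOP_WORDS}
    kept = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if len(tokens) <= max_len and keywords & set(tokens):
            kept.append(sentence)
    return kept
```

Note that exact keyword overlap would miss a sentence that says "French president" for a question about the "leader of France", which is part of why the candidates are then scored by a model rather than by overlap alone.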
Dataset statistics
Manually labeled:
100 questions for training. Total: 348 positive Q/A pairs
84 questions for dev. Total: 1415 Q/A pairs (3.1+,
100 questions for testing. Total: 1703 Q/A pairs (3.6+,
Automatically labeled another 2193 questions to create a noisy training set, for evaluating model robustness.
Experiments cont.
Each question and answer sentence is tokenized, POS-tagged (MXPOST), parsed (MSTParser), and labeled with named-entity tags (IdentiFinder).
Baseline systems (replications)
[Cui et al. SIGIR ’05]: the algorithm behind one of the best-performing systems in TREC evaluations. It uses a mutual-information-inspired score computed over dependency trees and a single fixed alignment between them.
[Punyakanok et al. NLE ’04]: measures the similarity between Q and A by computing tree edit distance.
Both baselines are high-performing, syntax-based, and the most straightforward to replicate.
We further enhanced both algorithms by augmenting them with WordNet.
Results
Metrics: Mean Average Precision; Mean Reciprocal Rank of Top 1
[Figure legend: statistically significantly better than the 2nd best score in each column]
[Figure values: 28.2%, 23.9%, 41.2%, 30.3%]
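For reference, a small sketch of how the two metrics are commonly computed from ranked candidate lists. Each list holds 1 for a correct answer sentence and 0 otherwise, in ranked order. This follows the standard definitions and assumes every correct candidate appears somewhere in the ranked list; it is not the evaluation script used for these results.

```python
def average_precision(labels):
    """Mean of precision@k taken at each rank k that holds a correct answer."""
    hits = 0
    precision_sum = 0.0
    for rank, correct in enumerate(labels, start=1):
        if correct:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(labels):
    """1 / rank of the first correct answer, or 0 if there is none."""
    for rank, correct in enumerate(labels, start=1):
        if correct:
            return 1.0 / rank
    return 0.0


def mean_average_precision(rankings):
    """Average of average_precision over all questions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def mean_reciprocal_rank(rankings):
    """Average of reciprocal_rank over all questions."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)
```

Both functions take one ranked label list per question, for example `mean_average_precision([[1, 0, 1], [0, 1]])`.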
Summing vs. Max
Switching back
Tree-edit CRFs