Traditional Question Answering Systems: An Overview
汤顺雷
Categories
  NLIDB (Natural Language Interface to Databases)
  QA over text
    Document-based: TREC, given a corpus
    Web-based: START (AI Lab, MIT, 1993)
  Semantic / ontology-based
  ...
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web?: a survey[J]. Semantic Web, 2011, 2(2): 125-155.
NLI over Databases
  Early: domain-specific, pattern matching; e.g. BASEBALL (1961), LUNAR (1971)
  Current: ...
Androutsopoulos I, Ritchie G D, Thanisch P. Natural language interfaces to databases–an introduction[J]. Natural language engineering, 1995, 1(01): 29-81.
NLI over Databases – Early example
  COUNTRY   CAPITAL   LANGUAGE
  France    Paris     French
  Italy     Rome      Italian
  ...
  pattern1: ... "capital" ... <country>
  pattern2: ... "capital" ... "country"
  "What is the capital of Italy?"      matches pattern1  ->  Rome
  "List the capital of every country"  matches pattern2  ->  <Paris, France>, <Rome, Italy>, ...
Androutsopoulos I, Ritchie G D, Thanisch P. Natural language interfaces to databases–an introduction[J]. Natural language engineering, 1995, 1(01): 29-81.
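To make the pattern-matching idea above concrete, here is a minimal sketch (the table contents and matching rules are illustrative assumptions, not the original BASEBALL/LUNAR code):

```python
# A minimal sketch of early pattern-matching NLIDB behaviour.  The table
# contents and matching rules are illustrative assumptions.
TABLE = [  # (COUNTRY, CAPITAL, LANGUAGE)
    ("France", "Paris", "French"),
    ("Italy", "Rome", "Italian"),
]

def answer(question: str):
    q = question.lower()
    if "capital" in q:
        # pattern1: ... "capital" ... <country>  -> that country's capital
        for country, capital, _ in TABLE:
            if country.lower() in q:
                return capital
        # pattern2: ... "capital" ... "country"  -> every <capital, country> pair
        if "country" in q or "countries" in q:
            return [(capital, country) for country, capital, _ in TABLE]
    return None  # no pattern matched

print(answer("What is the capital of Italy?"))      # Rome
print(answer("List the capital of every country"))  # [('Paris', 'France'), ('Rome', 'Italy')]
```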
NLI over Databases – Current
  Intermediate representation
  Front end / back end
  E.g. PRECISE, 2003 (http://www.cs.washington.edu/research/nli)
Popescu A M, Etzioni O, Kautz H. Towards a theory of natural language interfaces to databases[C]//Proceedings of the 8th international conference on Intelligent user interfaces. ACM, 2003: 149-157.
PRECISE (UW, 2003)
  Ambiguity resolution
Popescu A M, Etzioni O, Kautz H. Towards a theory of natural language interfaces to databases[C]//Proceedings of the 8th international conference on Intelligent user interfaces. ACM, 2003: 149-157.
PRECISE (UW, 2003)
  Question: "What are the HP jobs on a Unix system?"
  SQL: SELECT DISTINCT Description FROM JOB WHERE Platform = 'Unix' AND Company = 'HP'
Popescu A M, Etzioni O, Kautz H. Towards a theory of natural language interfaces to databases[C]//Proceedings of the 8th international conference on Intelligent user interfaces. ACM, 2003: 149-157.
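As a rough illustration of the front-end / back-end split in current NLIDBs, the following sketch maps question tokens to schema elements and then renders SQL; the lexicon, schema, and function names are assumptions for illustration, not PRECISE's actual algorithm:

```python
# Sketch of a lexicon-driven front end that maps question tokens to schema
# elements, and a back end that renders SQL (hypothetical schema/lexicon).
LEXICON = {
    "jobs": ("JOB", "Description"),   # token -> (relation, attribute) to SELECT
    "hp": ("Company", "HP"),          # token -> (attribute, value) constraint
    "unix": ("Platform", "Unix"),
}

def front_end(question: str):
    tokens = question.lower().replace("?", "").split()
    select, where = None, []
    for tok in tokens:
        if tok in LEXICON:
            entry = LEXICON[tok]
            if entry[0] == "JOB":     # relation/attribute entry
                select = entry
            else:                     # attribute/value constraint
                where.append(entry)
    return select, where

def back_end(select, where):
    table, column = select
    conds = " AND ".join(f"{attr} = '{val}'" for attr, val in where)
    return f"SELECT DISTINCT {column} FROM {table} WHERE {conds}"

sel, cond = front_end("What are the HP jobs on a Unix system?")
print(back_end(sel, cond))
# SELECT DISTINCT Description FROM JOB WHERE Company = 'HP' AND Platform = 'Unix'
```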
QA over text
  Input: NL queries (natural language questions)
  Resources:
    plain text: large, uncontrolled text / Web pages
    weak knowledge: e.g. WordNet, gazetteers
  Problems:
    semantic meaning of NL questions and text sentences
    logical reasoning
    errors, redundancies, and ambiguity in the text
    ...
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web?: a survey[J]. Semantic Web, 2011, 2(2): 125-155.
QA over text – Question types
  Factoid: "How many people live in Israel?"
  List: "What countries speak Spanish?"
  Definition: "What is a question answering system?"
  Complex interactive QA (ciQA): "What [familial ties] exist between [dinosaurs] and [birds]?"
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web?: a survey[J]. Semantic Web, 2011, 2(2): 125-155.
QA over text – START
  START: SynTactic Analysis using Reversible Transformations
  The world's first Web-based QA system (1993)
  Developed by the AI Lab, MIT
  http://start.csail.mit.edu/
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web?: a survey[J]. Semantic Web, 2011, 2(2): 125-155.
QA over text – START
  [Screenshots of a START query: Step 1, Step 2, Answer]
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web?: a survey[J]. Semantic Web, 2011, 2(2): 125-155.
QA over text – Architecture
  Query analysis: preprocessing and light (or deep) semantic analysis; NL query -> query object
  Retrieval engine: IR engine (documents) / search engine (over the Web); query object -> passage list (containing candidate answers)
  Answer generator: filtering and merging of candidate passages; passage list -> answer
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web?: a survey[J]. Semantic Web, 2011, 2(2): 125-155.
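A minimal skeleton of this three-stage pipeline; all class and function names here are hypothetical, not taken from any of the systems discussed:

```python
# Three-stage QA-over-text pipeline skeleton (illustrative names only).
from dataclasses import dataclass, field

@dataclass
class QueryObject:
    question: str
    question_type: str = ""   # e.g. WHAT, WHO, HOW-MANY
    answer_type: str = ""     # e.g. PERSON, LOCATION, NUMBER
    keywords: list = field(default_factory=list)

def query_analysis(question: str) -> QueryObject:
    # Preprocess + (light) semantic analysis: fill type info and keywords.
    keywords = [w for w in question.rstrip("?").split() if len(w) > 3]
    return QueryObject(question, question_type="WHAT", keywords=keywords)

def retrieval_engine(q: QueryObject, corpus: list) -> list:
    # Return passages that contain at least one query keyword.
    return [p for p in corpus if any(k.lower() in p.lower() for k in q.keywords)]

def answer_generator(q: QueryObject, passages: list) -> str:
    # Filter/merge/rank candidates; here we simply return the best-matching passage.
    return max(passages,
               key=lambda p: sum(k.lower() in p.lower() for k in q.keywords),
               default="no answer found")

corpus = ["Berlin is the largest city in Germany.", "Paris is the capital of France."]
q = query_analysis("What is the largest city in Germany?")
print(answer_generator(q, retrieval_engine(q, corpus)))
```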
QA over text – Architecture examples
  CMU JAVELIN in TREC 11, 2002 (left figure)
  IBM PIQUANT II in TREC 13, 2004 (right figure)
Nyberg E, Mitamura T, Carbonell J G, et al. The javelin question-answering system at trec 2002[J]. Computer Science Department, 2002: 322. (left)
Ittycheriah A, Franz M, Zhu W J, et al. IBM's Statistical Question Answering System[C]//TREC. 2000. (right)
QA over text – Question analysis
  Goal: generate a query object (constraints on answer finding)
  Query object:
    keywords / extended keywords
    question type (from a question type hierarchy)
    expected answer type
  Methods: tokenization, POS tagging, parsing, named-entity (NE) recognition, relation extraction, ...
  Examples: LASSO, FALCON
Nyberg E, Mitamura T, Carbonell J G, et al. The javelin question-answering system at trec 2002[J]. Computer Science Department, 2002: 322.
LASSO – Question analysis
  SMU in TREC 8, 1999
  55% short answers, 64.5% long answers
  Keyword drop
Moldovan D I, Harabagiu S M, Pasca M, et al. LASSO: A Tool for Surfing the Answer Net[C]//TREC. 1999, 8: 65-73.
LASSO – Question analysis
  Example: "What is the largest city in Germany?"
    Question type: WHAT
    Question focus: "largest city"
    Answer type: LOCATION
    Keywords: ...
  Flow: NL question -> wh-term -> question type
    Unambiguous wh-terms: who/whom, where, when, why
    Ambiguous wh-terms: what, how, which, name (v.)
  Question type combined with question focus -> answer type; keywords extracted from the question
Moldovan D I, Harabagiu S M, Pasca M, et al. LASSO: A Tool for Surfing the Answer Net[C]//TREC. 1999, 8: 65-73.
LASSO – Question analysis (cont.)
  Example: "What is the largest city in Germany?" (question type WHAT, focus "largest city", answer type LOCATION)
  Keyword selection heuristics: quotations, named entities, modifier nouns
  Keyword drop: relax the keyword set when it is too restrictive
Moldovan D I, Harabagiu S M, Pasca M, et al. LASSO: A Tool for Surfing the Answer Net[C]//TREC. 1999, 8: 65-73.
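A minimal sketch of this kind of rule-based question analysis; the wh-term table, focus words, and stopword list are simplified assumptions, not LASSO's actual rules:

```python
# Rule-based question analysis in the LASSO style (simplified, illustrative).
WH_ANSWER_TYPE = {            # unambiguous wh-terms map directly to an answer type
    "who": "PERSON", "whom": "PERSON",
    "where": "LOCATION", "when": "DATE", "why": "REASON",
}
FOCUS_ANSWER_TYPE = {         # ambiguous wh-terms (what/how/which) need the focus
    "city": "LOCATION", "country": "LOCATION", "person": "PERSON", "year": "DATE",
}
STOPWORDS = {"what", "is", "the", "in", "of", "a", "an"}

def analyze(question: str) -> dict:
    words = question.rstrip("?").lower().split()
    wh = words[0]
    answer_type = WH_ANSWER_TYPE.get(wh)
    if answer_type is None:   # ambiguous wh-term: fall back to the question focus
        answer_type = next((t for w, t in FOCUS_ANSWER_TYPE.items() if w in words), "UNKNOWN")
    keywords = [w for w in words if w not in STOPWORDS]   # drop low-content words first
    return {"question_type": wh.upper(), "answer_type": answer_type, "keywords": keywords}

print(analyze("What is the largest city in Germany?"))
# {'question_type': 'WHAT', 'answer_type': 'LOCATION', 'keywords': ['largest', 'city', 'germany']}
```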
FALCON – Question analysis
  Best of TREC 9 (2000): 58% short answers, 76% long answers
  [Flow diagram: question caching (hit -> cached answers; miss -> question reformulation); parse + NE recognition; answer type via WordNet; keywords; question semantic form]
Harabagiu S M, Moldovan D I, Pasca M, et al. FALCON: Boosting Knowledge for Answer Engines[C]//TREC. 2000, 9: 479-488.
FALCON – Question analysis (cont.)
  Question similarity: similarity matrix, transitive closure (used for question caching)
  [Same flow diagram as the previous slide]
Harabagiu S M, Moldovan D I, Pasca M, et al. FALCON: Boosting Knowledge for Answer Engines[C]//TREC. 2000, 9: 479-488.
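A toy sketch of question caching by similarity; the Jaccard measure and threshold are assumptions, and FALCON's actual similarity computation is considerably more involved:

```python
# Toy question cache: reuse a stored answer when a new question is similar
# enough to a previously answered one (illustrative, not FALCON's algorithm).
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

class QuestionCache:
    def __init__(self, threshold: float = 0.6):
        self.threshold = threshold
        self.entries = []                     # list of (question, answer) pairs

    def lookup(self, question: str):
        best = max(self.entries, key=lambda e: jaccard(question, e[0]), default=None)
        if best and jaccard(question, best[0]) >= self.threshold:
            return best[1]                    # cache hit: return stored answer
        return None                           # cache miss: run full question analysis

    def store(self, question: str, answer: str):
        self.entries.append((question, answer))

cache = QuestionCache()
cache.store("What is the largest city in Germany?", "Berlin")
print(cache.lookup("What is the largest city in Germany ?"))  # Berlin (near-duplicate)
print(cache.lookup("Who wrote Hamlet?"))                       # None (miss)
```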
QA over text – Retrieval Engine
  Input: query object
  Output: passage list (containing candidate answers)
  Example: JAVELIN
Nyberg E, Mitamura T, Carbonell J G, et al. The javelin question-answering system at trec 2002[J]. Computer Science Department, 2002: 322.
JAVELIN – Retrieval Engine
  Input: query object (question type + answer type + keyword set)
  Relax algorithm:
    POS-tag each passage to get word data types
    index <passage word, data type> pairs
    while time and space constraints are not exceeded:
        retrieve with the current keyword set   // compare word and data type between passage and question
        add hits to the candidate passage list
        relax the keyword set                   // e.g. add synonym keywords
    return the candidate passage list
Nyberg E, Mitamura T, Carbonell J G, et al. The javelin question-answering system at trec 2002[J]. Computer Science Department, 2002: 322.
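A runnable sketch of the relax loop above; the synonym table and round limit are hypothetical stand-ins for JAVELIN's resources and time/space constraints:

```python
# Sketch of the keyword-relaxation retrieval loop (illustrative only).
SYNONYMS = {"largest": ["biggest"], "city": ["town", "metropolis"]}  # hypothetical

def retrieve(keywords, passages):
    return [p for p in passages if any(k in p.lower() for k in keywords)]

def relax_retrieval(keywords, passages, max_rounds=3):
    keywords = set(keywords)
    candidates = []
    for _ in range(max_rounds):               # stands in for time/space constraints
        for p in retrieve(keywords, passages):
            if p not in candidates:
                candidates.append(p)
        # relax the keyword set: add synonyms of the current keywords
        keywords |= {s for k in keywords for s in SYNONYMS.get(k, [])}
    return candidates

passages = ["Berlin is the biggest town in Germany.", "Paris hosts the Tour de France."]
print(relax_retrieval({"largest", "city"}, passages))
# round 1 finds nothing; after relaxation 'biggest'/'town' match the first passage
```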
FDUQA – Retrieval Engine
  Dealing with list questions: "Name cities that have an Amtrak terminal."
  Treat it as a factoid question first, obtaining "New York"
  Use "New York" as a seed appearing at an <A> slot in the patterns:
    P1. (including|include|included) (<A>)+ and <A>
    P2. such as (<A>)+ and <A>
    P3. between <A> and <A>
    P4. (<A>)+ as well as <A>
  Matched sentence: "Preliminary plans by Amtrak that were released yesterday call for stops of its high-speed express service in Boston, Providence, R.I., New Haven, Conn., New York, Philadelphia, Baltimore and Washington."
  Type-validate the matched <A> candidates -> "Boston", "Philadelphia", "Baltimore", "Washington"
Wu L, Huang X, You L, Zhang Z, Li X, Zhou Y. FDUQA on TREC2004 QA Track[C]//Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004). 2004.
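A small sketch of the seed-and-pattern idea; the regex is a loose, hypothetical rendering of an enumeration pattern in the spirit of P1 (the sentence is abridged from the slide), and a real system would type-validate each candidate:

```python
# Seed-based list-answer extraction with a loose enumeration pattern
# (illustrative only; not FDUQA's actual patterns or validation).
import re

SENTENCE = ("Preliminary plans by Amtrak call for stops of its high-speed express "
            "service in Boston, Providence, New Haven, New York, Philadelphia, "
            "Baltimore and Washington.")
SEED = "New York"

# Capture the comma-separated list ending in "and <A>" that contains the seed.
m = re.search(r"in ((?:[A-Z][\w.' ]+?, )+[A-Z][\w.' ]+?) and ([A-Z][\w.' ]+?)\.", SENTENCE)
if m and SEED in m.group(1):
    candidates = [c.strip() for c in m.group(1).split(",") if c.strip()] + [m.group(2)]
    answers = [c for c in candidates if c != SEED]   # a real system would type-validate here
    print(answers)
# ['Boston', 'Providence', 'New Haven', 'Philadelphia', 'Baltimore', 'Washington']
```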
QA over text – Answer Generator
  Input: query object (from question analysis) + passage list
  Generates the answer by:
    Merge: combine candidates that contain different parts of the answer
    Filter: drop redundant and wrong answers
    Rank: rank the remaining candidates
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web?: a survey[J]. Semantic Web, 2011, 2(2): 125-155.
PIQUANT II – Answer Generator
  Input passage lists from two sources:
    Statistical-method candidates: wide coverage, redundancy-robust
    Template-method candidates: precise, but sometimes fail
  Strategy: take the top 5 statistical-method candidates and merge them into the template candidates
Chu-Carroll J, Czuba K, Prager J M, et al. IBM's PIQUANT II in TREC 2004[C]//TREC. 2004.
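A toy sketch of this merging strategy; the scores and de-duplication rule are assumptions, not the paper's confidence combination:

```python
# Toy merge of top-k statistical candidates into template-based candidates
# (illustrative; scores and tie-breaking are hypothetical).
def merge_candidates(template, statistical, k=5):
    merged = list(template)                        # precise candidates come first
    seen = {answer.lower() for answer, _ in merged}
    for answer, score in sorted(statistical, key=lambda c: c[1], reverse=True)[:k]:
        if answer.lower() not in seen:             # skip duplicates of template answers
            merged.append((answer, score))
            seen.add(answer.lower())
    return merged

template = [("Berlin", 0.9)]
statistical = [("Berlin", 0.7), ("Hamburg", 0.5), ("Munich", 0.4)]
print(merge_candidates(template, statistical))
# [('Berlin', 0.9), ('Hamburg', 0.5), ('Munich', 0.4)]
```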
JAVELIN – Answer Generator
  Input: candidate documents, each containing ranked candidate passages (documents on the same topic)
  Strategy: merge the top-1 ranked passage of each document
  [Diagram: Document 1, 2, 3, ... each with Passage 1 ... Passage n]
Nyberg E, Mitamura T, Carbonell J G, et al. The javelin question-answering system at trec 2002[J]. Computer Science Department, 2002: 322.
References
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web?: a survey[J]. Semantic Web, 2011, 2(2): 125-155.
Androutsopoulos I, Ritchie G D, Thanisch P. Natural language interfaces to databases–an introduction[J]. Natural language engineering, 1995, 1(01): 29-81. (NLIDB)
Popescu A M, Etzioni O, Kautz H. Towards a theory of natural language interfaces to databases[C]//Proceedings of the 8th international conference on Intelligent user interfaces. ACM, 2003: 149-157. (PRECISE)
References (cont.)
Nyberg E, Mitamura T, Carbonell J G, et al. The javelin question-answering system at trec 2002[J]. Computer Science Department, 2002: 322. (CMU JAVELIN)
Ittycheriah A, Franz M, Zhu W J, et al. IBM's Statistical Question Answering System[C]//TREC. 2000. (IBM PIQUANT)
Chu-Carroll J, Czuba K, Prager J M, et al. IBM's PIQUANT II in TREC 2004[C]//TREC. 2004. (IBM PIQUANT II)
Moldovan D I, Harabagiu S M, Pasca M, et al. LASSO: A Tool for Surfing the Answer Net[C]//TREC. 1999, 8: 65-73. (SMU LASSO)
References (cont.)
Harabagiu S M, Moldovan D I, Pasca M, et al. FALCON: Boosting Knowledge for Answer Engines[C]//TREC. 2000, 9: 479-488. (FALCON)
Wu L, Huang X, You L, Zhang Z, Li X, Zhou Y. FDUQA on TREC2004 QA Track[C]//Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004). 2004. (FDU QA)
Thank you