Traditional Question Answering System: an Overview


1 Traditional Question Answering System: an Overview
汤顺雷 (Tang Shunlei)

2 Category
NLIDB (Natural Language Interface over Databases)
QA over text
  Document-based: TREC, given a corpus
  Web-based: START (AI Lab, MIT, 1993)
Semantic ontology-based
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web? A survey. Semantic Web, 2011, 2(2).

3 NLI over Database
Early: domain-specific, pattern matching
  e.g. Baseball (1961), Lunar (1971)
Current: …
Androutsopoulos I, Ritchie G D, Thanisch P. Natural language interfaces to databases – an introduction. Natural Language Engineering, 1995, 1(1).

4 NLI over Database
Early: example

  COUNTRY   CAPITAL   LANGUAGE
  France    Paris     French
  Italy     Rome      Italian

  pattern1: ... "capital" ... <country>
  pattern2: ... "capital" ... "country"

  "What is the capital of Italy?" → pattern1 → Rome
  "List the capital of every country" → pattern2 → <Paris, France>, <Rome, Italy>

Androutsopoulos I, Ritchie G D, Thanisch P. Natural language interfaces to databases – an introduction. Natural Language Engineering, 1995, 1(1).
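To make the early approach concrete, here is a minimal Python sketch (an illustration, not the actual Baseball or Lunar code): two hand-written patterns are checked against the question and the answer is read straight out of a small in-memory table mirroring the example above.

# Toy illustration of the early pattern-matching approach (not the actual
# Baseball/Lunar code): hand-written patterns are checked against the question
# and the answer is read straight out of a small in-memory table.
import re

TABLE = [  # COUNTRY, CAPITAL, LANGUAGE rows from the slide
    {"country": "France", "capital": "Paris", "language": "French"},
    {"country": "Italy",  "capital": "Rome",  "language": "Italian"},
]

def answer(question):
    q = question.lower()
    # pattern1: ... "capital" ... <country>  ->  the capital of that country
    m = re.search(r"capital of (\w+)", q)
    if m:
        for row in TABLE:
            if row["country"].lower() == m.group(1):
                return row["capital"]
    # pattern2: ... "capital" ... "country"  ->  list <capital, country> pairs
    if "capital" in q and "country" in q:
        return [(row["capital"], row["country"]) for row in TABLE]
    return None

print(answer("What is the capital of Italy?"))       # Rome
print(answer("List the capital of every country"))   # [('Paris', 'France'), ('Rome', 'Italy')]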

5 NLI over Database
Current:
  Intermediate representation
  Front end / back end
  e.g. PRECISE (2003)
Popescu A M, Etzioni O, Kautz H. Towards a theory of natural language interfaces to databases. Proceedings of the 8th International Conference on Intelligent User Interfaces. ACM, 2003.

6 PRECISE
UW, 2003
Ambiguity resolution
Popescu A M, Etzioni O, Kautz H. Towards a theory of natural language interfaces to databases. Proceedings of the 8th International Conference on Intelligent User Interfaces. ACM, 2003.

7 PRECISE
UW, 2003
Question: "What are the HP jobs on a Unix system?"
SQL:
  SELECT DISTINCT Description
  FROM JOB
  WHERE Platform = 'Unix' AND Company = 'HP'
Popescu A M, Etzioni O, Kautz H. Towards a theory of natural language interfaces to databases. Proceedings of the 8th International Conference on Intelligent User Interfaces. ACM, 2003.
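PRECISE maps question tokens to database elements through a lexicon and a graph-matching step before emitting SQL. The minimal sketch below keeps only the token-to-schema lookup and the SQL assembly for the example above; the LEXICON entries and the JOB schema are assumptions made for illustration, not PRECISE's actual data structures.

# Minimal sketch of the idea behind PRECISE: map question tokens to database
# elements via a lexicon, then assemble SQL. The real system uses graph
# matching and semantic tractability checks; the lexicon and schema below are
# invented for the slide's example.
LEXICON = {
    "jobs": ("JOB", "Description"),        # token -> (relation, attribute) being asked for
    "hp":   ("JOB", "Company", "HP"),      # token -> (relation, attribute, value) constraint
    "unix": ("JOB", "Platform", "Unix"),
}

def to_sql(question):
    tokens = [t.strip("?.,").lower() for t in question.split()]
    select, where = None, []
    for t in tokens:
        entry = LEXICON.get(t)
        if entry is None:
            continue
        if len(entry) == 2:                # attribute the question asks for
            select = entry
        else:                              # value constraint
            rel, attr, val = entry
            where.append(f"{attr} = '{val}'")
    rel, attr = select
    return f"SELECT DISTINCT {attr} FROM {rel} WHERE " + " AND ".join(where)

print(to_sql("What are the HP jobs on a Unix system?"))
# SELECT DISTINCT Description FROM JOB WHERE Company = 'HP' AND Platform = 'Unix'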

8 QA over text
Input: NL queries (natural language questions)
Resources:
  Plain text: large, uncontrolled text / Web pages
  Weak knowledge: e.g. WordNet, gazetteers
Problems:
  Semantic meaning of NL questions and text sentences
  Logical reasoning
  Errors, redundancies, ambiguity in text
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web? A survey. Semantic Web, 2011, 2(2).

9 QA over text
Question types:
  Factoids: "How many people live in Israel?"
  List questions: "What countries speak Spanish?"
  Definition questions: "What is a question answering system?"
  Complex interactive QA (ciQA): "What [familial ties] exist between [dinosaurs] and [birds]?"
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web? A survey. Semantic Web, 2011, 2(2).

10 QA over text – START
START: SynTactic Analysis using Reversible Transformations
The world's first Web-based QA system (1993)
Developed by the AI Lab, MIT
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web? A survey. Semantic Web, 2011, 2(2).

11 QA over text – START
Step 1 → Step 2 → Answer
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web? A survey. Semantic Web, 2011, 2(2).

12 QA over text – Architecture
Query analysis: preprocessing and light (or deep) semantic analysis
  NL query → query object
Retrieval engine: IR engine (over documents) / search engine (over the Web)
  Query object → passage list (containing candidate answers)
Answer generator: filtering and merging of candidate passages
  Passage list → answer
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web? A survey. Semantic Web, 2011, 2(2).
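A skeletal Python sketch of this three-stage architecture, just to make the data flow explicit. The QueryObject fields anticipate slide 14; every function body is a placeholder rather than any particular system's implementation.

# Skeleton of the generic three-stage QA-over-text architecture on this slide.
# The query object fields follow slide 14; the bodies are placeholders.
from dataclasses import dataclass, field
from typing import List

@dataclass
class QueryObject:
    question_type: str                                  # e.g. WHAT, WHO
    answer_type: str                                    # expected answer type, e.g. LOCATION
    keywords: List[str] = field(default_factory=list)   # (extended) keywords

def analyze_question(question: str) -> QueryObject:
    """Query analysis: tokenize / POS-tag / parse / NE-tag and build the query object."""
    ...

def retrieve_passages(query: QueryObject, corpus) -> List[str]:
    """Retrieval engine: IR over documents (or a Web search engine) -> passage list."""
    ...

def generate_answer(query: QueryObject, passages: List[str]) -> str:
    """Answer generator: merge, filter and rank candidate passages -> answer."""
    ...

def answer(question: str, corpus) -> str:
    query = analyze_question(question)
    passages = retrieve_passages(query, corpus)
    return generate_answer(query, passages)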

13 QA over text – Architecture
CMU – JAVELIN, TREC-11, 2002
IBM – PIQUANT II, TREC-13, 2004
Nyberg E, Mitamura T, Carbonell J G, et al. The JAVELIN question-answering system at TREC 2002. Computer Science Department, 2002: 322. (left)
Ittycheriah A, Franz M, Zhu W J, et al. IBM's statistical question answering system. TREC. (right)

14 QA over text – Question analysis
Goal: generate a query object (containing constraints on answer finding)
Query object:
  Keywords → extended keywords
  Question type (from a question type hierarchy)
  Expected answer type
Methods: tokenization, POS tagging, parsing, named-entity (NE) recognition, relation extraction, …
Examples: LASSO, FALCON
Nyberg E, Mitamura T, Carbonell J G, et al. The JAVELIN question-answering system at TREC 2002. Computer Science Department, 2002: 322.

15 LASSO – Question analysis
SMU, TREC-8, 1999: 55% short answers, 64.5% long answers
Keyword drop
Moldovan D I, Harabagiu S M, Pasca M, et al. LASSO: A tool for surfing the answer net. TREC, 1999, 8.

16 LASSO – Question analysis
Example: "What is the largest city in Germany?"
  Question type: WHAT
  Question focus: "largest city"
  Answer type: LOCATION
  Keywords: …
Flow: NL question → wh-term → question type → question focus + keywords → combine → answer type
  Unambiguous wh-terms: who/whom, where, when, why
  Ambiguous wh-terms: what, how, which, name (v.)
Moldovan D I, Harabagiu S M, Pasca M, et al. LASSO: A tool for surfing the answer net. TREC, 1999, 8.
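A toy sketch of this answer-type detection idea: unambiguous wh-terms map directly to an answer type, while ambiguous ones (what/how/which/name) fall back to the head noun of the question focus. The DIRECT and FOCUS_TYPES tables are invented for illustration and are not LASSO's actual rules.

# Toy sketch of LASSO-style answer-type detection: unambiguous wh-terms map
# directly to an answer type; ambiguous ones are resolved via the question
# focus. The lookup tables are invented for illustration.
DIRECT = {"who": "PERSON", "whom": "PERSON", "where": "LOCATION",
          "when": "DATE", "why": "REASON"}
FOCUS_TYPES = {"city": "LOCATION", "country": "LOCATION",
               "person": "PERSON", "year": "DATE"}

def answer_type(question, focus):
    wh = question.strip().split()[0].lower()
    if wh in DIRECT:                      # who/whom/where/when/why
        return DIRECT[wh]
    # what/how/which/name: look at the head noun of the question focus
    head = focus.split()[-1].lower()
    return FOCUS_TYPES.get(head, "UNKNOWN")

print(answer_type("What is the largest city in Germany?", focus="largest city"))
# LOCATION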

17 LASSO – Question analysis
Example: "What is the largest city in Germany?"
  Question type: WHAT
  Question focus: "largest city"
  Answer type: LOCATION
  Keywords: …
Keyword heuristics: quotations, named entities, modifier nouns, keyword drop
Moldovan D I, Harabagiu S M, Pasca M, et al. LASSO: A tool for surfing the answer net. TREC, 1999, 8.

18 FALCON – Question analysis
Best of TREC-9, 2000: 58% short answers, 76% long answers
Flow: question caching (yes: cached answers; no: question reformulation, parsing + NE recognition, question semantic form, answer type via WordNet, keywords)
Harabagiu S M, Moldovan D I, Pasca M, et al. FALCON: Boosting knowledge for answer engines. TREC, 2000, 9.
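A toy sketch of the question-caching step: if an incoming question is similar enough to one answered before, the cached answers are returned; otherwise the question goes through full analysis. The word-overlap similarity used here is a stand-in, not FALCON's actual similarity computation.

# Toy sketch of the question-caching idea: reuse cached answers when an
# incoming question is similar enough to a previously answered one. The
# Jaccard word overlap is a stand-in for FALCON's similarity computation.
def similarity(q1, q2):
    w1, w2 = set(q1.lower().split()), set(q2.lower().split())
    return len(w1 & w2) / len(w1 | w2)

class QuestionCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = {}                  # question -> cached answers

    def lookup(self, question):
        for cached_q, answers in self.entries.items():
            if similarity(question, cached_q) >= self.threshold:
                return answers             # cache hit
        return None                        # cache miss -> run full analysis

    def store(self, question, answers):
        self.entries[question] = answers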

19 FALCON – Question analysis
(Flow diagram as on slide 18.)
Question similarity: similarity matrix, transitive closure
Harabagiu S M, Moldovan D I, Pasca M, et al. FALCON: Boosting knowledge for answer engines. TREC, 2000, 9.

20 FALCON – Question analysis
(Flow diagram as on slide 18.)
Harabagiu S M, Moldovan D I, Pasca M, et al. FALCON: Boosting knowledge for answer engines. TREC, 2000, 9.

21 FALCON – Question analysis
(Flow diagram as on slide 18.)
Harabagiu S M, Moldovan D I, Pasca M, et al. FALCON: Boosting knowledge for answer engines. TREC, 2000, 9.

22 QA over text – Retrieval Engine
Input: query object
Output: passage list (containing candidate answers)
Example: JAVELIN
Nyberg E, Mitamura T, Carbonell J G, et al. The JAVELIN question-answering system at TREC 2002. Computer Science Department, 2002: 322.

23 JAVELIN – Retrieval Engine
Input: query object (question type + answer type + keyword set)
Relax algorithm:
  POS-tag the passages → obtain word data types
  Index pairs <passage word, data type>
  while time and space constraints are not exceeded:
    retrieve with the keyword set    // compare word and data type between passage and question
    add the hits to the candidate passage list
    relax the keyword set            // add synonym keywords
  return the candidate passage list
Nyberg E, Mitamura T, Carbonell J G, et al. The JAVELIN question-answering system at TREC 2002. Computer Science Department, 2002: 322.
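A sketch of the relax loop described above: retrieve with the current keyword set, then relax the set by adding synonyms and retrieve again, until a time or size budget is reached. The search_index and synonyms_of callables are placeholders, not JAVELIN's actual components.

# Sketch of the retrieval relax loop on this slide. search_index and
# synonyms_of are placeholder callables supplied by the caller.
import time

def relax_retrieve(keywords, search_index, synonyms_of,
                   time_budget_s=5.0, max_passages=50):
    start = time.time()
    candidates = []
    current = set(keywords)
    while time.time() - start < time_budget_s and len(candidates) < max_passages:
        # compare passage words/data types against the (relaxed) keyword set
        candidates.extend(search_index(current))
        relaxed = current | {s for k in current for s in synonyms_of(k)}
        if relaxed == current:             # nothing left to relax
            break
        current = relaxed
    return candidates[:max_passages]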

24 FDUQA – Retrieval Engine
Dealing with list questions: "Name cities that have an Amtrak terminal."
  Treat it as a factoid question first and obtain "New York"
  Use "New York" as a seed appearing at an <A> slot in the patterns
  Match sentences containing "New York" against the patterns, then validate the other <A> slots
Example sentence: "Preliminary plans by Amtrak that were released yesterday call for stops of its high-speed express service in Boston, Providence, R.I., New Haven, Conn., New York, Philadelphia, Baltimore and Washington."
Patterns:
  P1. (including|include|included) (<A>)+ and <A>
  P2. such as (<A>)+ and <A>
  P3. between <A> and <A>
  P4. (<A>)+ as well as <A>
Type validation → "Boston", "Philadelphia", "Baltimore", "Washington"
Wu L, Huang X, You L, Zhang Z, Li X, Zhou Y. FDUQA on TREC 2004 QA Track. Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004).
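The patterns lend themselves to regular expressions. In the sketch below an <A> slot is approximated by a short capitalized token sequence and type validation is left as a comment; the demo sentence is made up so that pattern P2 fires, and none of this is FDUQA's actual implementation.

# Illustration of the list patterns above as regular expressions. An <A> slot
# is approximated by one or two capitalized tokens; type validation is stubbed.
import re

A = r"[A-Z][a-zA-Z.]*(?: [A-Z][a-zA-Z.]*)?"          # crude <A> slot
PATTERNS = [
    rf"(?:including|include|included) ((?:{A}(?:\s*,\s*)?)+) and ({A})",  # P1
    rf"such as ((?:{A}(?:\s*,\s*)?)+) and ({A})",                         # P2
    rf"between ({A}) and ({A})",                                          # P3
    rf"((?:{A}(?:\s*,\s*)?)+) as well as ({A})",                          # P4
]

def candidates(sentence, seed):
    if seed not in sentence:
        return []
    found = []
    for pattern in PATTERNS:
        for match in re.finditer(pattern, sentence):
            for group in match.groups():
                found.extend(a.strip() for a in group.split(","))
    # real type validation (is it actually a city?) would filter `found` here
    return [a for a in found if a and a != seed]

sentence = "Amtrak serves several cities such as Boston , New York , Philadelphia and Washington ."
print(candidates(sentence, seed="New York"))   # ['Boston', 'Philadelphia', 'Washington']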

25 QA over text – Answer Generator
Input: query object (from question analysis) + passage list
Generate the answer by:
  Merge: combine candidates that contain different parts of the answer
  Filter: drop redundant and wrong answers
  Rank: rank the remaining candidates
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web? A survey. Semantic Web, 2011, 2(2).

26 PIQUANT II – Answer Generator
Input candidates from:
  Statistics-based method: wide coverage, redundancy-robust
  Template-based method: precise, but sometimes fails
Strategy: take the top 5 statistics-based candidates and merge them into the template-based candidates
Chu-Carroll J, Czuba K, Prager J M, et al. IBM's PIQUANT II in TREC 2004. TREC.
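A tiny sketch of the stated combination strategy: keep the precise template-based candidates and merge in the top-5 statistics-based candidates, skipping duplicates. How scores and ties are handled is an assumption here.

# Tiny sketch of the combination strategy stated on the slide.
def combine(template_candidates, statistical_candidates, k=5):
    merged = list(template_candidates)
    for cand in statistical_candidates[:k]:   # top-k from the statistical ranker
        if cand not in merged:
            merged.append(cand)
    return merged

print(combine(["Berlin"], ["Berlin", "Hamburg", "Munich"]))
# ['Berlin', 'Hamburg', 'Munich']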

27 JAVELIN – Answer Generator
Input: candidate documents containing candidate passages; each document covers the same topic
Strategy: merge the top-1 ranked passage of each document
(Diagram: Document 1, Document 2, Document 3, each with Passage 1 … Passage n)
Nyberg E, Mitamura T, Carbonell J G, et al. The JAVELIN question-answering system at TREC 2002. Computer Science Department, 2002: 322.
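A small sketch of this merging strategy: each retrieved document contributes its top-ranked passage to the merged candidate set. The (passage, score) data shape is an assumption for illustration.

# Sketch of the merging strategy on the slide: take the top-1 passage of each
# document. Each inner list holds (passage, score) pairs for one document.
def merge_top_passages(documents):
    merged = []
    for passages in documents:
        if passages:
            best = max(passages, key=lambda p: p[1])   # top-1 passage of this document
            merged.append(best[0])
    return merged

docs = [
    [("passage 1-1", 0.9), ("passage 1-2", 0.4)],
    [("passage 2-1", 0.7)],
    [("passage 3-1", 0.2), ("passage 3-2", 0.8)],
]
print(merge_top_passages(docs))   # ['passage 1-1', 'passage 2-1', 'passage 3-2']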

28 References
Lopez V, Uren V, Sabou M, et al. Is question answering fit for the semantic web? A survey. Semantic Web, 2011, 2(2).
Androutsopoulos I, Ritchie G D, Thanisch P. Natural language interfaces to databases – an introduction. Natural Language Engineering, 1995, 1(1). (NLIDB)
Popescu A M, Etzioni O, Kautz H. Towards a theory of natural language interfaces to databases. Proceedings of the 8th International Conference on Intelligent User Interfaces. ACM, 2003. (PRECISE)

29 References
Nyberg E, Mitamura T, Carbonell J G, et al. The JAVELIN question-answering system at TREC 2002. Computer Science Department, 2002: 322. (CMU JAVELIN)
Ittycheriah A, Franz M, Zhu W J, et al. IBM's statistical question answering system. TREC. (IBM PIQUANT)
Chu-Carroll J, Czuba K, Prager J M, et al. IBM's PIQUANT II in TREC 2004. TREC. (IBM PIQUANT II)
Moldovan D I, Harabagiu S M, Pasca M, et al. LASSO: A tool for surfing the answer net. TREC, 1999, 8. (SMU LASSO)

30 References
Harabagiu S M, Moldovan D I, Pasca M, et al. FALCON: Boosting knowledge for answer engines. TREC, 2000, 9. (FALCON)
Wu L, Huang X, You L, Zhang Z, Li X, Zhou Y. FDUQA on TREC 2004 QA Track. Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004). (FDU QA)

31 Thank you

