Selectively using linguistic resources in the QA Raffaella Bernardi Gilad Mishne Valentin Jijkoun Maarten de Rijke Projects 220-80-001, 612.13.001, 365-20-005,

1 Selectively using linguistic resources in the QA pipeline
Raffaella Bernardi, Gilad Mishne, Valentin Jijkoun, Maarten de Rijke
Projects 220-80-001, 612.13.001, 365-20-005, 612.069.006, 612.000.106, 612.000.207, 612.066.302

2 Selectively Using Linguistic Resources throughout the Question Answering Pipeline 2nd CoLogNET-ElsNET Symposium
Outline
- Quartz: a multistream QA system
- Where's linguistics here? Turning it on and off
- Streams: redundancy vs. linguistic knowledge
- Conclusions

3 Generic QA system
[Diagram: question; question analysis; extracting candidate answers; answer selection; answer; collection; web]

4 Quartz-e: a multistream approach
[Diagram: question; question analysis; streams: Table Lookup, Pattern Match, Ngram mining, Tequesta; KBs; collection; web; answer selection; answer]

5 Why a multistream system?
- Different approaches to QA have proved successful
- Using multiple sources of information can improve both precision and recall
- Combining often helps (known from IR)
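The slides do not say how Quartz-e merges the candidates produced by its streams. As a purely illustrative sketch, the idea that "combining often helps" can be shown with a simple score-summing scheme borrowed from IR result fusion. The function name `combine_streams`, the stream names and all scores below are assumptions made for the example, not part of the system described.

```python
from collections import defaultdict


def combine_streams(stream_outputs):
    """Merge candidate answers from several streams by summing their scores.

    stream_outputs maps a (hypothetical) stream name to a list of
    (answer, score) pairs. Answers are compared case-insensitively.
    Returns (answer, total_score) pairs, best first.
    """
    totals = defaultdict(float)
    for candidates in stream_outputs.values():
        for answer, score in candidates:
            totals[answer.strip().lower()] += score
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Made-up candidates for "What year did Alaska become a state?"
example = {
    "table_lookup": [("1959", 0.6)],
    "ngram_mining": [("1959", 0.3), ("1867", 0.5)],
    "pattern_match": [("1867", 0.2)],
}
ranked = combine_streams(example)
best_answer = ranked[0][0]
```

An answer supported by several streams accumulates score, so agreement between streams is rewarded even when no single stream is very confident.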

6 Quartz-e: question analysis
Classify a question with respect to the information need and/or the expected answer type
- What does ACLU stand for? (T-1959) -> expand-abbreviation
- What continent is the world's largest desert on? (T-2023) -> location, continent

7 Quartz-e: question analysis
Surface text patterns
- ... [Ww](hich|hat) date ... -> date
PoS patterns + WordNet
- What famous model was married to Billy Joel?
  WH JJ NN VBD ...; model IS-A person -> person-ident
- What fruit's stone does Laetrile come from? -> person-ident
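Classification by surface text patterns can be sketched as an ordered list of regular expressions, each mapped to a question class. Only the `[Ww](hich|hat) date` pattern is taken from the slide. The other two patterns, the rule table `SURFACE_RULES` and the function `classify` are assumptions added for illustration.

```python
import re
from typing import Optional

# Hypothetical rule table: (compiled surface pattern, question class).
# The first pattern is from the slide; the other two are illustrative guesses
# at rules that would yield the classes shown for T-1959 and T-2023.
SURFACE_RULES = [
    (re.compile(r"[Ww](hich|hat) date"), "date"),
    (re.compile(r"[Ww]hat does \S+ stand for"), "expand-abbreviation"),
    (re.compile(r"[Ww]hat continent"), "location, continent"),
]


def classify(question: str) -> Optional[str]:
    """Return the class of the first matching rule, or None if none match."""
    for pattern, label in SURFACE_RULES:
        if pattern.search(question):
            return label
    return None
```

A question that matches no surface pattern returns `None`, which is where a fallback such as the PoS and WordNet rules would take over.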

8 Quartz-e: list questions
Answer is a list of entities
- What Chinese provinces have a McDonald's restaurant? (T-2207)
PoS patterns + WordNet + wordforms to convert to a who-, which- or what-question
- What Chinese province has a McDonald's restaurant?
This helps to keep the system modular

9 Quartz-e
[Diagram: question; question analysis; streams: Table Lookup, Pattern Match, Ngram mining, Tequesta; KBs; collection; web; answer selection; answer]

10 Off-line information extraction
Preprocessing the collection to build semi-structured databases
- roles, leaders, geography, dates, capitals, acronyms, inhabitants, languages, ...
- Where is Devil's Tower? (T-1432)
  Devil's Tower in northeastern Wyoming became the first national monument...
  located(Devil's Tower, in northeastern Wyoming)
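The located(...) fact above could be produced by a surface pattern along the following lines. This is a minimal sketch, not the extractor used in Quartz-e. The regular expression `LOCATED` and the function `extract_located` are hypothetical, and the pattern only covers sentences that open with a capitalised name followed by "in" and a place.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a capitalised multi-word name at the start of the
# sentence, then "in", then an optional lowercase modifier and a capitalised
# place name.
LOCATED = re.compile(
    r"(?P<entity>[A-Z][\w'’]*(?: [A-Z][\w'’]*)*)"
    r" (?P<place>in (?:[a-z]+ )?[A-Z]\w+)"
)


def extract_located(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (entity, location phrase) if the sentence fits the pattern."""
    match = LOCATED.match(sentence)
    if match is None:
        return None
    return match.group("entity"), match.group("place")


fact = extract_located(
    "Devil's Tower in northeastern Wyoming became the first national monument"
)
# fact holds the two arguments of located(...)
```

Running such patterns over the whole collection once, off-line, is what lets the Table Lookup stream answer by lookup at question time.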

11 Extracting role information
Employing common IE techniques:
- surface patterns
- NE tags
- PoS patterns
- WordNet (professions & occupations)
Phenomena:
- modifiers, appositions, relative clauses
- copular constructions

12 Extraction: filtering with WordNet
Using WordNet hyponyms
- first, extract all matches
- second, filter out the semantically irrelevant ones
First tried in (Fleischman et al., 2003)
- sophisticated filtering using ML methods
- improvement over another state-of-the-art system
But does filtering improve a specific QA system?
- pragmatic approach of (Katz and Lin, 2003)

13 Extraction: does WordNet help?
Two runs of the stream on 95 role questions
WordNet filtering   # facts in table   Total answers   Correct answers   Stream precision
yes                 396,558            41              16                0.39
no                  1,614,309          49 (+8)         17 (+1)           0.35 (-0.04)
Q: Which baseball star stole 130 bases in 1982? (T-1619)
A: ...Henderson, who stole a record 130 bases in 1982...
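The stream precision figures can be checked from the answer counts, reading the table as 16 correct out of 41 answers with WordNet filtering and 17 correct out of 49 without. The helper name `stream_precision` is introduced here only for the worked example.

```python
def stream_precision(correct: int, total: int) -> float:
    """Precision of a stream: correct answers divided by all answers given."""
    return correct / total


with_filter = stream_precision(16, 41)     # WordNet filtering on
without_filter = stream_precision(17, 49)  # WordNet filtering off

print(f"with filtering:    {with_filter:.2f}")                   # 0.39
print(f"without filtering: {without_filter:.2f}")                # 0.35
print(f"difference:        {without_filter - with_filter:.2f}")  # -0.04
```

Turning filtering off adds eight answers but only one of them is correct, which is why precision drops slightly while the number of correct answers rises by one.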

14 WordNet filtering: analysis
Filtering does not make a dramatic difference
- noise in the table is typically not asked for (no definitions)
- statistical and content-based answer selection in the end
- other streams are not good enough?
Precision vs. recall vs. confidence
- Quartz-e favours recall
- Real users would probably prefer honesty (confidence)
- TREC evaluation: no difference between wrong and NIL

15 Quartz-e: pattern matching
Look for declarative reformulations of the question
- When was the telegraph invented? (T-1400)
  ...the telegraph invented in answer...
Use a set of rewriting rules, conditioned on the output of the question classifier

16 Rewriting: using linguistic features
Only information from the question classifier
- What year did Alaska become a state? (T-1419)
- Alaska become a state (in|on|) answer
PoS patterns and wordforms dictionary
- Alaska becomes a state (in|on|) answer
- Alaska became a state (in|on|) answer
Use PoS, wf   Total answers   Correct answers   Incorrect answers
no            30              4                 26
yes           39              13                26
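The difference between the two rewriting modes can be sketched as follows. Without the wordforms dictionary, the rule can only reuse the verb as it appears in the question. With it, each inflected form yields its own declarative pattern. The question template `QUESTION`, the tiny `WORDFORMS` dictionary and the function `rewrite` are assumptions for illustration. The real rules are conditioned on the question classifier and on PoS patterns, which this sketch replaces with a single fixed template.

```python
import re

# Hypothetical wordforms dictionary: base form -> inflected forms.
WORDFORMS = {"become": ["become", "becomes", "became"]}

# Hypothetical template standing in for a PoS-based rule for date questions.
QUESTION = re.compile(
    r"^What year did (?P<subject>\w+) (?P<verb>\w+) (?P<rest>.+?)\?$"
)


def rewrite(question, use_wordforms=True):
    """Turn a 'What year did X verb ...?' question into declarative patterns."""
    match = QUESTION.match(question)
    if match is None:
        return []
    verb = match.group("verb")
    forms = WORDFORMS.get(verb, [verb]) if use_wordforms else [verb]
    return [
        f"{match.group('subject')} {form} {match.group('rest')} (in|on|) answer"
        for form in forms
    ]


plain = rewrite("What year did Alaska become a state?", use_wordforms=False)
rich = rewrite("What year did Alaska become a state?")
```

Here `plain` contains only the "Alaska become a state" pattern, while `rich` adds the "becomes" and "became" variants, which are the forms more likely to occur in running text.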

17 Rewriting: analysis
- Improves recall, does not hurt precision
- Pattern Match is a low recall/high precision stream

18 Quartz-e
[Diagram: question; question analysis; streams: Table Lookup, Pattern Match, Ngram mining, Tequesta; KBs; collection; web; answer selection; answer]

19 Quartz-e: comparing streams
Correct answers per stream, by question type
Question type   # q's   Web Ngrams   Web patterns   Table Lookup   Tequesta
date            82      21 (26%)     15 (18%)       20 (24%)       22 (27%)
location        101     21 (21%)     14 (14%)       7 (7%)         19 (19%)
pers-ident      54      19 (35%)     5 (9%)                        7 (13%)
agent           35      10 (29%)     2 (6%)         4 (12%)        3 (9%)
object          24      2 (8%)       1 (4%)         0 (0%)
thing-ident     59      9 (15%)      2 (3%)         0 (0%)

20 Quartz-e: comparing streams
All streams contribute to the performance
- removing any stream hurts
Different performance on different q-types
- linguistic information seems to help for questions with clear structure and answer type (location, person, date)
- pure statistics is better for "vague" questions
  What is the Stanley Cup made of?

21 Conclusions
Linguistically informed methods help in some parts of the QA pipeline but may hurt in others
- this depends on the system architecture, performance of other components, robustness of the linguistic methods used
End-to-end performance evaluation is essential
Statistical and language-aware methods are complementary rather than competing

