Multi-Perspective Question Answering Using the OpQA Corpus (HLT/EMNLP 2005)
Veselin Stoyanov, Claire Cardie, Janyce Wiebe
Cornell University, University of Pittsburgh


Slide 1: Multi-Perspective Question Answering Using the OpQA Corpus (HLT/EMNLP 2005)
- Veselin Stoyanov, Claire Cardie, Janyce Wiebe
- Cornell University, University of Pittsburgh

Slide 2: Abstract
- Introduction: Multi-Perspective Question Answering (MPQA)
- OpQA Corpus
- Analysis of characteristics of opinion answers
  - Answer length
  - Partial answers
  - Syntactic constituent of the answer
- Experiments on filters
  - Subjectivity filters
  - Opinion source filters
- Conclusion

Slide 3: Introduction of MPQA
- Fact-based QA: “When did McDonald’s open its first restaurant?”
  - A lot of research has already been done
- Multi-Perspective QA (MPQA): “How do the Chinese regard the human rights record of the United States?”
  - Relatively little research has been done here
  - Will successful approaches in fact-based QA work well for MPQA?

Slide 4: Introduction of the OpQA Corpus
- 98 documents, June 2001 to May 2002
- Phrase-level opinion information
- 4 general and controversial topics, 19 to 33 documents per topic:
  - President Bush’s alternative to the Kyoto protocol (kyoto)
  - The US annual human rights report (humanrights)
  - The 2002 coup d’etat in Venezuela (venezuela)
  - The 2002 elections in Zimbabwe and Mugabe’s reelection (mugabe)
- Questions: 6 to 8 per topic, spread evenly → 30 questions in total
- [Table: Questions]

Slide 5: Introduction of the OpQA Corpus
- Answers: every text segment that contributes to an answer to any of the 30 questions
- Mark the minimum answer span
  - “a Tokyo organization representing about 150 Japanese groups” → “a Tokyo organization”
- Partial answers are segments that either:
  - Lack the specificity needed to constitute a full answer
    - Q: “When was the Kyoto protocol ratified?” A: “before May 2004” (when a specific date is known)
  - Need to be combined with at least one additional answer segment to fully answer the question
    - Q: “Are the Japanese unanimous in their opposition of Bush’s position on the Kyoto protocol?” is answered only partially by a segment expressing a single opinion

Slide 6: Characteristics of opinion answers
- Use the OpQA corpus to analyze and compare the characteristics of answers to fact vs. opinion questions
- Traditional QA architectures:
  - IR module
  - Linguistic filters
    - Semantic filters: when → date/time; who → person/organization
    - Syntactic filters: who → noun phrase

Slide 7: Answer length
- Opinion answers are approximately twice as long as answers to fact questions
  → they are likely to span more than a single syntactic constituent
  → this renders the syntactic and semantic filters less effective

Slide 8: Partial answers
- Opinion answers are much more likely to be partial answers than complete answers
- An answer generator therefore needs to:
  - Distinguish between partial and full answers
  - Recognize redundant partial answers
  - Identify which subset of the partial answers, if any, constitutes a full answer
  - Determine whether additional documents need to be examined to find a complete answer
  - Assemble the final answer from partial pieces of information

Slide 9: Syntactic constituent of the answer
- Use Abney’s (1996) CASS partial parser and count the number of times an answer segment for the question matches each constituent type
- 4 constituent types:
  - noun phrase (n)
  - verb phrase (v)
  - prepositional phrase (p)
  - clause (c)

Slide 10: Syntactic constituent of the answer
- 3 matching criteria:
  - ex: the answer segment’s span corresponds exactly to a constituent in the CASS output
  - up: the constituent completely contains the answer and no more than three additional (non-answer) tokens
  - up/dn: the answer matches according to the up criterion, or the answer completely contains the constituent and no more than three additional tokens
- [Table: Results]
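
The three matching criteria on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes that answers and constituents are given as half-open token spans `(start, end)`, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # assumed representation: half-open token span [start, end)

MAX_EXTRA_TOKENS = 3  # "no more than three additional tokens" from the slide


def contains(outer: Span, inner: Span) -> bool:
    """True if `outer` completely covers `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def extra_tokens(outer: Span, inner: Span) -> int:
    """Number of tokens in `outer` that are not in `inner` (assumes containment)."""
    return (outer[1] - outer[0]) - (inner[1] - inner[0])


def matches_ex(answer: Span, constituent: Span) -> bool:
    """ex: the answer span is exactly the constituent span."""
    return answer == constituent


def matches_up(answer: Span, constituent: Span) -> bool:
    """up: the constituent contains the answer plus at most three extra tokens."""
    return (contains(constituent, answer)
            and extra_tokens(constituent, answer) <= MAX_EXTRA_TOKENS)


def matches_up_dn(answer: Span, constituent: Span) -> bool:
    """up/dn: up holds, or the answer contains the constituent plus at most three extra tokens."""
    if matches_up(answer, constituent):
        return True
    return (contains(answer, constituent)
            and extra_tokens(answer, constituent) <= MAX_EXTRA_TOKENS)


def matches_any(answer: Span, constituents: Iterable[Span], criterion) -> bool:
    """True if the answer matches at least one constituent under the given criterion."""
    return any(criterion(answer, c) for c in constituents)
```

Under these definitions every ex match is also an up match, and every up match is also an up/dn match, so the criteria go from strictest to most lenient.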

Slide 11: Characteristics of opinion answers: overview
- Approximately twice as long as answers to fact questions
- Much more likely to be partial answers than complete answers
- Vary much more widely with respect to syntactic category; in contrast, fact answers are overwhelmingly associated with noun phrases
- Roughly half as likely to correspond to a single syntactic constituent type

Slide 12: Subjectivity Filters for MPQA Systems
- 3 subjectivity filters:
  - Manual: consider a sentence to be opinion if it contains at least one opinion of intensity medium or higher, and to be fact otherwise
  - Rulebased: use a bootstrapping algorithm to perform a sentence-based opinion classification
  - Naïve Bayes: a Naive Bayes subjectivity classifier trained on the labeled set
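
A minimal sketch of the Manual filter rule, assuming each sentence carries the intensity labels of its opinion annotations. The `Sentence` class and the intensity scale `low < medium < high < extreme` are assumptions made for illustration; the slide only states the "medium or higher" threshold.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed ordering of intensity labels, weakest first (not given on the slide).
INTENSITY_ORDER = ["low", "medium", "high", "extreme"]


@dataclass
class Sentence:
    """Hypothetical container: sentence text plus intensities of its opinion annotations."""
    text: str
    opinion_intensities: List[str] = field(default_factory=list)


def is_opinion(sentence: Sentence, threshold: str = "medium") -> bool:
    """Manual filter rule: opinion if any annotation has intensity >= threshold, fact otherwise."""
    min_rank = INTENSITY_ORDER.index(threshold)
    return any(INTENSITY_ORDER.index(level) >= min_rank
               for level in sentence.opinion_intensities)


def subjectivity_filter(sentences: List[Sentence]) -> List[Sentence]:
    """Keep only the sentences classified as opinion."""
    return [s for s in sentences if is_opinion(s)]
```

A sentence with no opinion annotation, or with only low-intensity annotations, is treated as fact and dropped by `subjectivity_filter`.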

Slide 13: Experiments on Subjectivity Filters
- Answer rank experiments: can subjectivity filters improve the answer identification phase?
- For each opinion question, do the following: [steps not preserved in the transcript]
- [Table: Results]

Slide 14: Experiments on Subjectivity Filters
- Answer probability experiments: can opinion information be used in an answer generator?
- Compute the probabilities: [formulas not preserved in the transcript]
- [Table: Results]

Slide 15: Opinion Source Filters for MPQA Systems
- Source filter: removes all sentences that do not have an opinion annotation with a source that matches the source of the question (manually identified)
- Uses the Manual source annotations only
- Answer rank experiment
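
The source filter can be sketched as follows. This is an illustration under stated assumptions: the `AnnotatedSentence` class is hypothetical, and "matches" is approximated by case-insensitive string equality, which the slide does not specify.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnnotatedSentence:
    """Hypothetical container: sentence text plus the sources of its opinion annotations."""
    text: str
    opinion_sources: List[str] = field(default_factory=list)


def normalize(source: str) -> str:
    """Assumed matching rule: compare sources case-insensitively, ignoring outer whitespace."""
    return source.strip().lower()


def source_filter(sentences: List[AnnotatedSentence],
                  question_source: str) -> List[AnnotatedSentence]:
    """Remove every sentence with no opinion annotation whose source matches the question's source."""
    target = normalize(question_source)
    return [s for s in sentences
            if any(normalize(src) == target for src in s.opinion_sources)]
```

Sentences without any opinion annotation have an empty `opinion_sources` list and are always removed, which is consistent with the filter being defined over opinion annotations.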

Slide 16: Opinion Source Filters for MPQA Systems
- Results:
  - Outperforms the baseline on some questions and performs worse on others
  - MRR is worse than the baseline (0.4633 vs. 0.4911)
  - MRFA is the best (11.26 vs. 61.33) → reflects the ability to recognize the answers to the hardest questions
    - M7: What did South Africa want Mugabe to do after the 2002 election? (rank: 153 → 21)
    - M8: What is Mugabe’s opinion about the West’s attitude and actions towards the 2002 Zimbabwe election? (rank: 182 → 11)
  - Exception: V3: Did anything surprising happen when Hugo Chavez regained power in Venezuela after he was removed by a coup?
    - No clear source, only a single answer, opinion not clear ...
  - Always ranked an answer within the first 25 answers
  - Especially useful in the additional processing phase
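
The two metrics on this slide behave differently, and a small sketch makes this visible. It assumes MRR stands for mean reciprocal rank and MRFA for mean rank of the first answer, both computed from the 1-based rank of the first correct answer per question. The rank lists below are illustrative only: they reuse the M7 and M8 ranks quoted above and add two invented questions.

```python
from typing import Sequence


def mean_reciprocal_rank(first_answer_ranks: Sequence[int]) -> float:
    """MRR: average of 1/rank of the first correct answer (ranks are 1-based)."""
    if not first_answer_ranks:
        raise ValueError("need at least one question")
    return sum(1.0 / r for r in first_answer_ranks) / len(first_answer_ranks)


def mean_rank_of_first_answer(first_answer_ranks: Sequence[int]) -> float:
    """MRFA: average rank of the first correct answer (lower is better)."""
    if not first_answer_ranks:
        raise ValueError("need at least one question")
    return sum(first_answer_ranks) / len(first_answer_ranks)


# Illustrative ranks: the last two entries are the M7 and M8 ranks from the slide,
# the first two are made up to show easy questions getting slightly worse.
baseline_ranks = [1, 2, 153, 182]
filtered_ranks = [2, 3, 21, 11]

print(mean_reciprocal_rank(baseline_ranks), mean_reciprocal_rank(filtered_ranks))
print(mean_rank_of_first_answer(baseline_ranks), mean_rank_of_first_answer(filtered_ranks))
```

In this toy example the filter lowers MRR while improving MRFA considerably. MRR is dominated by questions whose answers already rank near the top, whereas MRFA is dominated by the hardest questions, which matches the pattern the slide reports.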

Slide 17: Conclusion
- Used the OpQA corpus to compare the characteristics of answers to fact and opinion questions
- Surmised that traditional QA approaches may not be as effective for MPQA as they have been for fact-based QA
- Investigated machine learning and rule-based opinion filters and showed that they can be used to guide MPQA systems

Slide 18: Q & A

Slide 19: Questions in the OpQA collection by topic

Slide 20: Syntactic Constituent Type
- The table reports the % of correct answers that would remain after filtering
- Opinion answers are roughly half as likely to correspond to a single syntactic constituent type
- Opinion answers vary much more widely with respect to syntactic category

Slide 21: Results for the subjectivity filters
- Labels on the results table: “No filtering”; “at least as high as in the baseline”

