Published by Arron Oliver. Modified over 9 years ago.
1
Natural Language Based Reformulation Resource and Web Exploitation for Question Answering
Ulf Hermjakob, Abdessamad Echihabi, Daniel Marcu
University of Southern California
Presented By: Soobia Afroz
2
Introduction
The degree of difficulty of a question depends on how closely a given corpus matches the question, and NOT on the question itself.
Q: When was the UN founded?
A: The UN was formed in January 1942.
A: The name "United Nations", coined by United States President Franklin D. Roosevelt, was first used in the "Declaration by United Nations" of 1 January 1942, during the Second World War, when representatives of 26 nations pledged their Governments to continue fighting together against the Axis Powers.
Larger text => Good answers => Validation in original text
3
Paraphrasing questions:
-- Create semantically equivalent paraphrases of the questions
-- Match the answer string against any of the paraphrases
Question paraphrases + Retrieval engine:
-- Find documents containing correct answers
-- Rank and select better answers
Questions are paraphrased automatically by TextMap.
Example questions:
-- “How did Mahatma Gandhi die?”
-- “How deep is Crater Lake?”
-- “Who invented the cotton gin?”
4
Automatic Paraphrases of questions:
5
How the system works:
-- Parse questions
-- Identify the answer type of the question
-- Reformulate the question (average reformulations: 3.14)
-- Match at parse-tree level
6
1. Syntactic reformulations Turn a question into declarative form, e.g.,
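A minimal sketch of what turning a question into declarative form can look like. It is a string-level stand-in only: the system described in these slides reformulates and matches at the parse-tree level, whereas the rules below are hypothetical regular-expression patterns written for illustration. The names `RULES`, `ANSWER`, and `reformulate` are assumptions, not part of TextMap.

```python
import re

# Placeholder marking where the answer is expected in a declarative pattern.
ANSWER = "<ANSWER>"

# Hypothetical rewrite rules: (question pattern, declarative templates).
# A real system would operate on parse trees, not on raw strings.
RULES = [
    (re.compile(r"^who (?P<verb>\w+) (?P<rest>.+)\?$", re.IGNORECASE),
     ["{answer} {verb} {rest}"]),
    (re.compile(r"^when was (?P<subj>.+) (?P<verb>\w+)\?$", re.IGNORECASE),
     ["{subj} was {verb} in {answer}",
      "{subj} was {verb} on {answer}"]),
]


def reformulate(question):
    """Return declarative patterns for a question, or [] if no rule applies."""
    q = question.strip()
    for pattern, templates in RULES:
        m = pattern.match(q)
        if m:
            return [t.format(answer=ANSWER, **m.groupdict()) for t in templates]
    return []


if __name__ == "__main__":
    for q in ["Who invented the cotton gin?",
              "When was the UN founded?",
              "How deep is Crater Lake?"]:
        print(q, "->", reformulate(q))
```

Questions that no rule covers simply yield no reformulation here, which is one reason a parse-tree approach generalizes better than fixed string patterns.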
7
2. Inference Reformulations.
8
3. Reformulation Chains
9
4. Generation
10
Information Retrieval and the Web
TREC (Text Retrieval Conference):
-- IR system for Webclopedia
Web:
-- Web-based IR system
-- Query Reformulation module
-- Web search engine
-- Sentence Ranking module
11
1. Query Reformulation module
Previous attempts:
-- Simple, exhaustive string-based manipulations
-- Transformation grammars
-- Learning algorithms
Current attempt: analyze how people naturally form queries to find answers
-- Randomly selected 50 TREC-8 questions
-- Manually produced the simplest queries that yield the most Web pages containing answers
-- Analyzed the manually produced queries and categorized them into seven ‘natural’ techniques used to turn a natural language question into a query
-- Derived algorithms that replicate each of the observed techniques
12
Query Reformulation Techniques
13
2. Sentence Ranking module
Produce a list of Boolean queries for each question using all the query reformulation techniques
Retrieve the top ten results for each query using a Web search engine
Retrieve the documents, strip HTML, and segment the text into sentences
Each sentence is ranked according to two schemes:
Score w.r.t. query terms:
-- Each word in the query is assigned a weight
-- Each quoted term in the query has a weight equal to the sum of the weights of its words
-- Each sentence has a weight equal to its weighted overlap with the query terms
Score w.r.t. answers:
-- Tag sentences using BBN’s IdentiFinder (a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities)
-- Score sentences by checking the expected answer type against the semantic entities found by IdentiFinder
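The query-term scoring scheme above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slides do not say how word weights are assigned, so the `word_weights` dictionary, the default weight of 1.0, the tokenizer, and the function names are all hypothetical. A multi-word entry in `query_terms` stands for a quoted term and only counts when it appears as a contiguous phrase.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_weight(term, word_weights, default=1.0):
    """Weight of a (possibly quoted, multi-word) term: sum of its word weights."""
    return sum(word_weights.get(w, default) for w in tokenize(term))


def score_sentence(sentence, query_terms, word_weights, default=1.0):
    """Weighted overlap between a sentence and the query terms."""
    # Pad with spaces so phrases only match on whole-token boundaries.
    normalized = " " + " ".join(tokenize(sentence)) + " "
    score = 0.0
    for term in query_terms:
        words = tokenize(term)
        if words and " " + " ".join(words) + " " in normalized:
            score += term_weight(term, word_weights, default)
    return score


# Illustrative data only: weights and sentences are made up for the example.
query_terms = ["cotton gin", "invented"]
word_weights = {"cotton": 2.0, "gin": 3.0, "invented": 1.0}
sentences = [
    "Cotton is grown in many countries.",
    "Eli Whitney invented the cotton gin in 1793.",
]
ranked = sorted(
    sentences,
    key=lambda s: score_sentence(s, query_terms, word_weights),
    reverse=True,
)

if __name__ == "__main__":
    for s in ranked:
        print(score_sentence(s, query_terms, word_weights), s)
```

The second scheme (answer-type overlap) would add a separate score based on entity tags from a named-entity tagger; it is omitted here because it depends on an external tool.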
14
Evaluation of the results:
16
Reformulations led to more correct answers when used in conjunction with a large corpus like the Web.
17
Conclusion
Likelihood of finding correct answers is increased by QR:
-- IR module produces higher quality answer candidates
-- Scoring precision is increased for answer candidates
A strong match with a reformulation provides additional confidence in the correctness of the answer