Building a Simple Question Answering System
Mark A. Greenwood
Natural Language Processing Group, Department of Computer Science, University of Sheffield, UK
P3 Seminar, November 27th 2003

Overview
- What is Question Answering?
- Question Types
- Evaluating Question Answering Systems
- A Generic Question Answering Framework
- The Standard Approach
- A Simplified Approach
  - Results and Evaluation
  - Problems with this Approach
  - Possible Extensions
- Question Answering from your Desktop

What is Question Answering?
The main aim of QA is to present the user with a short answer to a question rather than a list of possibly relevant documents. As it becomes more and more difficult to find answers on the WWW using standard search engines, question answering technology will become increasingly important. Answering questions using the web is already enough of a problem for it to appear in fiction (Marshall, 2002):
“I like the Internet. Really, I do. Any time I need a piece of shareware or I want to find out the weather in Bogotá… I’m the first guy to get the modem humming. But as a source of information, it sucks. You got a billion pieces of data, struggling to be heard and seen and downloaded, and anything I want to know seems to get trampled underfoot in the crowd.”

Question Types
Clearly there are many different types of questions:
- When was Mozart born? The question requires a single fact as an answer, which may be found verbatim in text, i.e. “Mozart was born in 1756”.
- How did Socrates die? Finding an answer may require reasoning. In this example die has to be linked with drinking poisoned wine.
- How do I assemble a bike? The full answer may require fusing information from many different sources. The complexity can range from simple lists to script-based answers.
- Is the Earth flat? Requires a simple yes/no answer.
The systems outlined in this presentation attempt to answer the first two types of question.

Evaluating QA Systems
The biggest independent evaluations of question answering systems have been carried out at TREC (Text Retrieval Conference) over the past five years.
- Five hundred factoid questions are provided and the groups taking part have a week in which to process the questions and return one answer per question.
- No changes are allowed to your system between the time you receive the questions and the time you submit the answers.
Not only do these annual evaluations give groups a chance to see how their systems perform against those from other institutions but, more importantly, they are slowly building an invaluable collection of resources, including questions and their associated answers, which can be used for further development and testing.
Different metrics have been used over the years but the current metric is simply the percentage of questions correctly answered.
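As a concrete illustration, the strict metric (and the more lenient top-n variant used later in these slides) amounts to the small sketch below. It assumes answers are plain strings and that exact matching against a set of acceptable answers is good enough, which glosses over the more forgiving answer matching the TREC judges actually perform.

def accuracy_at_n(system_answers, gold_answers, n=1):
    """Fraction of questions whose gold answer appears in the system's top n answers.

    system_answers: question id -> ranked list of answer strings
    gold_answers:   question id -> set of acceptable answer strings
    """
    correct = 0
    for qid, gold in gold_answers.items():
        if any(ans in gold for ans in system_answers.get(qid, [])[:n]):
            correct += 1
    return correct / len(gold_answers)

# accuracy_at_n(runs, gold)       strict scoring, one answer per question
# accuracy_at_n(runs, gold, n=5)  lenient scoring, credit anywhere in the top five answers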

A Generic QA Framework
A search engine is used to find the n most relevant documents in the document collection. These documents are then processed with respect to the question to produce a set of answers which are passed back to the user. Most of the differences between question answering systems are centred around the document processing stage.
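A minimal sketch of this framework, with a crude word-overlap retriever standing in for the search engine and the document processing stage left as a pluggable extract_answers function (both names are illustrative, not part of any real system):

from collections import Counter

def retrieve(question, collection, n=20):
    """Rank documents by word overlap with the question (a crude stand-in for a search engine)."""
    q_words = set(question.lower().split())
    ranked = sorted(collection,
                    key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return ranked[:n]

def answer(question, collection, extract_answers, n=20):
    """Generic framework: retrieve the n most relevant documents, process them with
    respect to the question, and return candidates ranked by how often they were proposed."""
    documents = retrieve(question, collection, n)
    candidates = extract_answers(question, documents)  # the stage where systems differ most
    return [candidate for candidate, _ in Counter(candidates).most_common()]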

The Standard Approach
Obviously different systems use different techniques for processing the relevant documents. Most systems, however, use a pipeline of modules similar to that shown above, including our standard system, known as QA-LaSIE. Clearly this leads to a complicated system in which it is often difficult to see exactly how an answer is arrived at.
This approach can work – some groups report that they can answer approximately 87% of TREC style factoid questions (Moldovan et al, 2002). QA-LaSIE, on the other hand, answers approximately 20% of the TREC style factoid questions (Greenwood et al, 2002).

A Simplified Approach
The answers to the majority of factoid questions are easily recognised named entities, such as countries, cities, dates, people’s names, company names…
The relatively simple techniques of gazetteer lists and named entity recognisers allow us to locate these entities within the relevant documents – the most frequent of which can be returned as the answer.
This leaves just one issue that needs solving – how do we know, for a specific question, what the type of the answer should be?
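A toy version of this idea might look like the sketch below, assuming the expected answer type has already been determined; the tiny gazetteer and the year regular expression are hypothetical stand-ins for the full gazetteer lists and named entity recogniser.

import re
from collections import Counter

# Hypothetical toy gazetteers; the real system uses far larger lists plus a named entity recogniser.
GAZETTEERS = {
    "Location": {"Austria", "Salzburg", "France", "Paris"},
    "Person": {"Mozart", "Socrates", "Patsy Cline"},
}
YEAR = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")  # crude stand-in for a date recogniser

def find_entities(text, expected_type):
    """All mentions in the text that look like the expected answer type."""
    if expected_type == "Date":
        return YEAR.findall(text)
    return [entity for entity in GAZETTEERS.get(expected_type, ()) if entity in text]

def most_frequent_answer(documents, expected_type):
    """Return the entity of the expected type mentioned most often across the documents."""
    counts = Counter(entity for doc in documents for entity in find_entities(doc, expected_type))
    return counts.most_common(1)[0][0] if counts else None

# most_frequent_answer(["Mozart was born in 1756.",
#                       "Mozart was born in Salzburg in 1756."], "Date")  -> '1756'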

A Simplified Approach
The simplest way to determine the expected type of an answer is to look at the words which make up the question:
- who – suggests a person
- when – suggests a date
- where – suggests a location
Clearly this division does not account for every question, but it is easy to add more complex rules:
- country – suggests a location
- how much and ticket – suggests an amount of money
- author – suggests a person
- birthday – suggests a date
- college – suggests an organization
These rules can be easily extended as we think of more questions to ask.
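These rules translate almost directly into an ordered rule table, with the more specific rules checked before the generic wh-word defaults. The sketch below is a hypothetical, heavily trimmed version; returning None corresponds to the “unknown type” questions reported later in the evaluation.

# Hypothetical trimmed-down rule table: (test, expected answer type), checked in order.
RULES = [
    (lambda q: "how much" in q and "ticket" in q, "Money"),
    (lambda q: "country" in q, "Location"),
    (lambda q: "author" in q, "Person"),
    (lambda q: "birthday" in q, "Date"),
    (lambda q: "college" in q, "Organization"),
    (lambda q: q.startswith("who"), "Person"),
    (lambda q: q.startswith("when"), "Date"),
    (lambda q: q.startswith("where"), "Location"),
]

def expected_answer_type(question):
    """Return the expected answer type for a question, or None if no rule fires."""
    q = question.lower()
    for test, answer_type in RULES:
        if test(q):
            return answer_type
    return None  # an 'unknown type' question which this system cannot answer

# expected_answer_type("When was Mozart born?")  -> 'Date'
# expected_answer_type("How did Socrates die?")  -> None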


Results and Evaluation
The system was tested over the 500 factoid questions used in TREC 11 (Voorhees, 2002).
Results for the question typing stage were as follows:
- 16.8% (84/500) of the questions were of an unknown type and hence could never be answered correctly.
- 1.44% (6/416) of those questions which were typed were given the wrong type and hence could never be answered correctly.
- Therefore the maximum attainable score of the entire system, irrespective of any future processing, is 82% (410/500).
Results for the information retrieval stage were as follows:
- At least one relevant document was found for 256 of the correctly typed questions.
- Therefore the maximum attainable score of the entire system, irrespective of further processing, is 51.2% (256/500).

Results and Evaluation
Results for the question answering stage were as follows:
- 25.6% (128/500) of the questions were correctly answered by the system using this approach. This compares to the 22.2% (111/500) of the questions which were correctly answered by QA-LaSIE and the 87.4% (437/500) correctly answered by the best system evaluated at TREC 2002 (Moldovan et al, 2002).
Users of web search engines are, however, used to looking at a set of relevant documents and so would probably be happy looking at a handful of short answers.
- If we examine the top five answers returned for each question then the system correctly answers 35.8% (179/500) of the questions, which is 69.9% (179/256) of the maximum attainable score.
- If we examine all the answers returned for each question then 38.6% (193/500) of the questions are correctly answered, which is 75.4% (193/256) of the maximum attainable score, but this involves displaying over 20 answers per question.

Problems with this Approach
The gazetteer lists and named entity recognisers are unlikely to cover every type of named entity that may be asked about:
- Even those types that are covered may well not be complete.
- It is of course relatively easy to build new lists, e.g. birthstones.
The most frequently occurring instance of the right type might not be the correct answer:
- For example, if you are asking when someone was born, it may be that their death was more notable and hence will appear more often (e.g. John F. Kennedy’s assassination).
There are many questions for which correct answers are not named entities:
- How did Patsy Cline die? – in a plane crash.

Possible Extensions
A possible extension to this approach is to include answer extraction patterns (Greenwood and Gaizauskas, 2003).
- These are basically enhanced regular expressions in which certain tags will match multi-word terms.
- For example, questions such as “What does CPR stand for?” generate patterns such as “NounChunK ( X )”, where CPR is substituted for X to select a noun chunk that will be suggested as a possible answer.
- Often using these patterns will allow us to determine when the answer is not present in the relevant documents (i.e. if none of the patterns match then we can assume the answer is not there).
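As a rough illustration only (this is not the actual pattern syntax used in the cited work), an ordinary regular expression with a run of capitalised words playing the part of the NounChunK tag might look like this:

import re

def acronym_pattern(acronym):
    """Pattern for 'What does X stand for?' questions: an expansion immediately
    followed by '( X )'. A run of capitalised words stands in for the NounChunK tag."""
    noun_chunk = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
    return re.compile(noun_chunk + r"\s*\(\s*" + re.escape(acronym) + r"\s*\)")

text = "The course teaches Cardiopulmonary Resuscitation ( CPR ) to all staff."
match = acronym_pattern("CPR").search(text)
print(match.group(1) if match else "answer not present in the documents")
# -> Cardiopulmonary Resuscitation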

Possible Extensions
Advantages of using these patterns alongside the simple approach:
- The answer extraction patterns can be easily incorporated as they become available.
- These patterns are not constrained to only postulating named entities as answers but also single words (of any type), noun/verb chunks or any other sentence constituent we can recognise.
- Even questions which can be successfully answered by the simple approach may benefit from answer extraction patterns. The patterns for questions of the form “When was X born?” include “X ( DatE –”, which will extract the correct answer from text such as “Mozart ( 1756 – 1791 ) was a musical genius”, correctly ignoring the year of death.
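The birth-date pattern can be sketched in the same crude way, with a four-digit year standing in for the DatE tag; if no pattern matches, the answer can be assumed to be absent from the text.

import re

def birth_year_pattern(name):
    """Pattern for 'When was X born?': 'X ( <year> -' picks out the birth year and
    ignores the year of death. A four-digit year stands in for the DatE tag."""
    year = r"(1[0-9]{3}|20[0-9]{2})"
    return re.compile(re.escape(name) + r"\s*\(\s*" + year + r"\s*[-–]")

match = birth_year_pattern("Mozart").search("Mozart ( 1756 – 1791 ) was a musical genius")
print(match.group(1) if match else "answer not present in the documents")
# -> 1756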

QA from your Desktop
Question answering may be an interesting research topic, but what is needed is an application that is as simple to use as a modern web search engine. The ideas outlined in this talk have been implemented in an application called AnswerFinder. Hopefully this application will allow average computer users access to question answering technology.

Any Questions? Copies of these slides can be found at:

Bibliography
Mark A. Greenwood and Robert Gaizauskas. Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering. In Proceedings of the Workshop on Natural Language Processing for Question Answering (EACL03), pages 29–34, Budapest, Hungary, April 14, 2003.
Mark A. Greenwood, Ian Roberts, and Robert Gaizauskas. The University of Sheffield TREC 2002 Q&A System. In Proceedings of the 11th Text REtrieval Conference, 2002.
M. Marshall. The Straw Men. HarperCollins Publishers, 2002.
Dan Moldovan, Sanda Harabagiu, Roxana Girju, Paul Morarescu, Finley Lacatusu, Adrian Novischi, Adriana Badulescu, and Orest Bolohan. LCC Tools for Question Answering. In Proceedings of the 11th Text REtrieval Conference, 2002.
Ellen M. Voorhees. Overview of the TREC 2002 Question Answering Track. In Proceedings of the 11th Text REtrieval Conference, 2002.