Using Information Extraction for Question Answering. By Rani Qumsiyeh.

Problem: More information is added to the web every day. Search engines exist, but they return lists of documents rather than direct answers. This calls for a different kind of search engine.

History of QA:
QA can be dated back to the 1960s.
Two common approaches to designing QA systems: Information Extraction and Information Retrieval.
Two conferences evaluate QA systems: TREC (Text REtrieval Conference) and MUC (Message Understanding Conference).

Common Issues with QA Systems:
Information retrieval deals with keywords; information extraction tries to understand the question itself.
A question can have multiple variations, which makes it easier for IR but gives broader results, and harder for IE but gives more exact results.

Message Understanding Conference (MUC):
Sponsored by the Defense Advanced Research Projects Agency (DARPA).
Developed methods for formal evaluation of IE systems, in the form of a competition where participants compare their results with each other and against human annotators' key templates.
Short system preparation time, to stimulate portability to new extraction problems: only one month to adapt the system to the new scenario before the formal run.

Evaluation Metrics:
Precision: correct answers / answers produced.
Recall: correct answers / total possible answers.
F-measure: F = (β² + 1)·P·R / (β²·P + R), where β is a parameter representing the relative importance of P and R; e.g., β = 1 weights P and R equally, and β = 0 reduces F to P alone.
Current state of the art: the F = 0.60 barrier.
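To make the metric concrete, here is a small Python sketch of the computation (function and variable names are mine, not taken from any MUC scoring tool):

def precision_recall_f(correct, produced, possible, beta=1.0):
    # correct  = number of correct answers
    # produced = total answers produced by the system
    # possible = total possible (key) answers
    # beta     = relative importance of P vs. R (1 = equal weight, 0 = precision only)
    precision = correct / produced if produced else 0.0
    recall = correct / possible if possible else 0.0
    denominator = beta ** 2 * precision + recall
    f = (beta ** 2 + 1) * precision * recall / denominator if denominator else 0.0
    return precision, recall, f

# Example: 30 correct answers, 50 produced, 60 possible.
p, r, f = precision_recall_f(30, 50, 60)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.6 0.5 0.55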

MUC Extraction Tasks: Named Entity Task (NE), Template Element Task (TE), Template Relation Task (TR), Scenario Template Task (ST), Coreference Task (CO).

Named Entity Task (NE): mark in the text each string that represents a person, organization, or location name, or a date, time, currency amount, or percentage figure.

Template Element Task (TE): extract basic information related to organization, person, and artifact entities, drawing evidence from everywhere in the text.

Template Relation Task (TR): extract relational information on employee_of, manufacture_of, location_of, and similar relations. (TR expresses domain-independent relationships between entities identified by TE.)

Scenario Template Task (ST): extract prespecified event information and relate it to particular organization, person, or artifact entities. (ST identifies domain- and task-specific entities and relations.)

Coreference Task (CO): capture information on coreferring expressions, i.e., all mentions of a given entity, including those marked in NE and TE (nouns, noun phrases, pronouns).

An Example: "The shiny red rocket was fired on Tuesday. It is the brainchild of Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc."
NE: the entities are the rocket, Tuesday, Dr. Head, and We Build Rockets Inc.
CO: "it" refers to the rocket; Dr. Head and Dr. Big Head are the same person.
TE: the rocket is shiny and red, and is Head's brainchild.
TR: Dr. Head works for We Build Rockets Inc.
ST: a rocket-launching event occurred with the various participants.
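One way to picture the output of these tasks for this passage is as a handful of small records; the field names and layout below are purely illustrative, not an actual MUC template format:

# Illustrative only: rough structures for the rocket example.
named_entities = [
    ("rocket", "ARTIFACT"),
    ("Tuesday", "DATE"),
    ("Dr. Big Head", "PERSON"),
    ("We Build Rockets Inc.", "ORGANIZATION"),
]
coreference_chains = [
    ["The shiny red rocket", "It"],        # CO: "it" refers to the rocket
    ["Dr. Big Head", "Dr. Head"],          # CO: same person
]
template_elements = {
    "rocket": {"attributes": ["shiny", "red"], "brainchild_of": "Dr. Big Head"},
}
template_relations = [
    ("Dr. Head", "employee_of", "We Build Rockets Inc."),
]
scenario_template = {
    "event_type": "rocket_launching",
    "vehicle": "rocket",
    "date": "Tuesday",
    "organization": "We Build Rockets Inc.",
}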

Scoring Templates:
Templates are compared on a slot-by-slot basis.
Correct: response = key.
Partial: response partially matches key.
Incorrect: response != key.
Spurious: key is blank (overgeneration = spurious / actual).
Missing: response is blank.
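A rough sketch of how such slot-by-slot tallying might be coded; the category rules follow the slide, but this is not the official MUC scorer, and partial matches are approximated here by substring overlap:

def score_slots(key, response):
    # key, response: dicts mapping slot names to string values ("" = blank slot).
    counts = {"correct": 0, "partial": 0, "incorrect": 0, "spurious": 0, "missing": 0}
    for slot in set(key) | set(response):
        k = key.get(slot, "")
        r = response.get(slot, "")
        if not k and r:
            counts["spurious"] += 1      # key is blank, response filled
        elif k and not r:
            counts["missing"] += 1       # response is blank
        elif r == k:
            counts["correct"] += 1       # response = key
        elif k in r or r in k:
            counts["partial"] += 1       # response partially matches key
        else:
            counts["incorrect"] += 1     # response != key
    return counts

key = {"org": "We Build Rockets Inc.", "person": "Dr. Big Head", "date": "Tuesday"}
response = {"org": "We Build Rockets", "person": "Dr. Big Head", "location": "NY"}
counts = score_slots(key, response)
overgeneration = counts["spurious"] / sum(1 for v in response.values() if v)
print(counts, overgeneration)  # correct 1, partial 1, spurious 1, missing 1; overgen = 1/3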

Maximum Results Reported

KnowItAll, TextRunner, KnowItNow: they differ in implementation, but all do the same thing: extract large collections of facts from web text.

Using Them as QA Systems:
They can handle questions that map to a single relation.
"Who is the president of the US?" can be handled; "Who was the president of the US in 1998?" fails.
They also produce a huge number of facts that the user still has to go through.
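A toy illustration of why a single-relation question is easy and a qualified one is not; the tuples and the exact-match rule below are invented for illustration and do not reflect TextRunner's real interface:

# Invented fact store of (subject, relation, object) tuples, roughly the kind
# of output an open-extraction system produces over web text.
facts = [
    ("Alice Smith", "is the president of", "Acme Corp"),
    ("Acme Corp", "is headquartered in", "Springfield"),
]

def answer(relation, obj):
    # Return subjects of facts whose relation and object match the question.
    return [s for s, r, o in facts if r == relation and o == obj]

print(answer("is the president of", "Acme Corp"))   # ['Alice Smith']
# A temporally qualified question ("... in 1998?") needs a relation the store
# does not contain, so the same lookup returns nothing.
print(answer("was the president of in 1998", "Acme Corp"))  # []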

Textract: aims at solving ambiguity in text by introducing more named entities.
"What is Julian Werver Hill's wife's telephone number?" becomes equivalent to "What is Polly's telephone number?"
"Where is Werver Hill's affiliated company located?" becomes equivalent to "Where is Microsoft located?"

Proposed System:
Determine what named entity we are looking for, using Textract.
Use part-of-speech tagging.
Use TextRunner as the basis for search.
Use WordNet to find synonyms.
Use extra entities in the text as constraints.
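A minimal skeleton of how these steps might fit together; every helper below is a placeholder standing in for the component named above (Textract, a POS tagger, WordNet, TextRunner), not actual code from those systems:

def target_entity_type(question):
    # Stand-in for Textract: "Who" -> PERSON, "Where" -> LOCATION, "When" -> DATE.
    first = question.split()[0].lower()
    return {"who": "PERSON", "where": "LOCATION", "when": "DATE"}.get(first, "UNKNOWN")

def split_question(question):
    # Stand-in for the POS-tagging step; hardcoded to the example question below.
    return "land on", "the moon", ["first"]       # predicate, argument, constraints

def synonym_expansions(predicate):
    # Stand-in for WordNet: the predicate itself plus up to 3 synonyms.
    return [predicate]

def search_facts(predicate, argument):
    # Stand-in for the TextRunner lookup: would return candidate fact tuples.
    return []

def answer_question(question):
    entity_type = target_entity_type(question)
    predicate, argument, constraints = split_question(question)
    candidates = []
    for pred in synonym_expansions(predicate):
        candidates.extend(search_facts(pred, argument))
    # Remaining work: keep only candidates that satisfy the constraints and
    # whose answer entity matches entity_type.
    return candidates

print(answer_question("Who was the first man to land on the moon"))  # []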

Example

(WP who) (VBD was) (DT the) (JJ first) (NN man) (TO to) (VB land) (IN on) (DT the) (NN moon)
The verb (VB) is treated as the argument; the noun (NN) is treated as the predicate.
We make sure that word position is maintained.
We keep prepositions if they connect two nouns (e.g., "president of the US").
Other non-stop words are constraints, e.g., "first".
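A sketch of producing and filtering such tags with NLTK's off-the-shelf tagger; the selection rules below are a simplification of the description above, and the output is assumed to match the tags shown on the slide:

# Requires NLTK plus its tokenizer and tagger data:
#   pip install nltk
#   python -m nltk.downloader punkt averaged_perceptron_tagger
import nltk

question = "who was the first man to land on the moon"
tags = nltk.pos_tag(nltk.word_tokenize(question))
print(tags)
# Expected (assuming the tagger agrees with the slide):
# [('who', 'WP'), ('was', 'VBD'), ('the', 'DT'), ('first', 'JJ'), ('man', 'NN'),
#  ('to', 'TO'), ('land', 'VB'), ('on', 'IN'), ('the', 'DT'), ('moon', 'NN')]

verbs = [w for w, t in tags if t == "VB"]            # -> ['land']
nouns = [w for w, t in tags if t.startswith("NN")]   # -> ['man', 'moon']
constraints = [w for w, t in tags if t == "JJ"]      # -> ['first']
print(verbs, nouns, constraints)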

Example

Anaphora Resolution: use anaphora resolution to determine that the entity in question is associated not with "landed" but with "wrote" instead.

Use Synonyms: we use WordNet to find possible synonyms for verbs and nouns in order to retrieve more facts. We only consider three synonyms, since each additional synonym adds another round of fact retrieval and slows the system down.
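A minimal sketch of that expansion using NLTK's WordNet interface, capped at three synonyms as described (illustrative, not the system's actual code):

# Requires NLTK plus the WordNet data:  python -m nltk.downloader wordnet
from nltk.corpus import wordnet as wn

def top_synonyms(word, pos, limit=3):
    # Collect distinct lemma names from the word's WordNet synsets.
    synonyms = []
    for synset in wn.synsets(word, pos=pos):
        for lemma in synset.lemma_names():
            lemma = lemma.replace("_", " ")
            if lemma != word and lemma not in synonyms:
                synonyms.append(lemma)
    return synonyms[:limit]

print(top_synonyms("land", wn.VERB))   # up to three verb synonyms of "land"
print(top_synonyms("man", wn.NOUN))    # up to three noun synonyms of "man"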

Using constraints

Delimitations:
Works well with Who, When, and Where questions, as the target named entity is easily determined; achieves about 90% accuracy on these.
Works less well with What and How questions; achieves about 70% accuracy.
Takes about 13 seconds to answer a question.

Future Work:
Build an ontology to determine the target named entity and parse the question (faster).
Handle combinations of questions, e.g., "When and where did the Holocaust happen?"