Techniques Used in Modern Question-Answering Systems
Candidacy Exam
Elena Filatova, December 11, 2002
Committee: Luis Gravano, Vasileios Hatzivassiloglou, Rebecca J. Passonneau
Department of Computer Science, Columbia University

Present vs. Past Research on QA
Current systems
– Mainly systems written for the TREC conference: factoid questions, short answers, huge text collections
Related systems
– IR: queries vs. questions; returning documents vs. short answers
– Systems based on semantic representations (Lehnert): questions about one text vs. text collections; inference from the semantic structure of a text vs. searching for an answer in the text
– One type of output (NP) from a closed collection (Kupiec): answer inference vs. answer extraction

Lehnert's system
John loved Mary but she didn't want to marry him. One day, a dragon stole Mary from the castle. John got on top of his horse and killed the dragon. Mary agreed to marry him. They lived happily ever after.
Q: Why did Mary agree to marry John?
A: Because she was indebted to him.
Problems stated:
– getting the right classification
– dependency of the answer-inference procedure on the type of the question

Current QA Systems
Pipeline: question → question analysis → query → Information Retrieval → extracted documents → Information Extraction (rules for answer) → list of answers (see the sketch below)
Issues: getting the right query; long texts; domain dependency; predefined types of answers
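The generic pipeline above can be written down as a chain of stages. Below is a minimal, purely illustrative Python sketch; all function names, the toy rules, and the toy documents are assumptions for illustration, not components of any system discussed in the talk.

```python
# Minimal sketch of the generic QA pipeline described above:
# question -> question analysis -> query -> retrieval -> extraction -> answers.
# Every function body is a toy placeholder, for illustration only.

def analyze_question(question: str) -> dict:
    """Guess a coarse answer type from the question word."""
    q = question.lower()
    if q.startswith("who"):
        answer_type = "PERSON"
    elif q.startswith("when"):
        answer_type = "DATE"
    elif q.startswith("where"):
        answer_type = "LOCATION"
    else:
        answer_type = "OTHER"
    return {"text": question, "answer_type": answer_type}

def build_query(analysis: dict) -> list[str]:
    """Keep the content words of the question as query keywords."""
    stopwords = {"who", "when", "where", "what", "did", "the", "a", "is", "was"}
    return [w.strip("?") for w in analysis["text"].lower().split()
            if w.strip("?") not in stopwords]

def retrieve(query: list[str], collection: list[str]) -> list[str]:
    """Rank documents by simple keyword overlap (stand-in for real IR)."""
    return sorted(collection,
                  key=lambda d: len(set(query) & set(d.lower().split())),
                  reverse=True)

def extract_answers(analysis: dict, documents: list[str]) -> list[str]:
    """Stand-in for answer extraction: return the best-matching document."""
    return documents[:1]

if __name__ == "__main__":
    docs = ["Columbia University is located in New York City.",
            "TREC evaluates question answering systems."]
    analysis = analyze_question("Where is Columbia University located?")
    print(extract_answers(analysis, retrieve(build_query(analysis), docs)))
```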

Plan
– Classification
– Information (document) retrieval: query formation
– Information extraction: passage extraction, answer extraction
– Use of answer redundancy on the Web in QA
– QA for restricted domains
– Evaluation procedures for current QA systems and analysis of their performance

Classification and QA
Pipeline (as above): question → question analysis → query → extracted documents → rules for answer → list of answers

Theory of Classification
Rosch et al.: classification of basic objects
The world is structured: real-world attributes do not occur independently of each other: object_has(wings) => P(object_has(feathers)) > P(object_has(fur))
Each category (class) is a set of attributes that are common to all the objects in the category.
Types of categories:
– Superordinate: few common attributes (furniture)
– Subordinate: many common attributes (floor lamp, desk lamp)
– Basic: optimal number of common attributes (lamp); basic objects are the most inclusive categories that delineate the correlation structure of the environment
Although classification is a converging problem for objects, it is not possible to compile a list of all possible basic categories.

QA classification
Hierarchical vs. non-hierarchical classification
– Even if a hierarchy exists in the classification, it can be represented as flat: detailed classes + an "other" class
Number of types (MULDER: 3 types vs. Webclopedia: over 140 types)
Trade-off between
– detailed classes for better answer extraction, and
– high precision in assigning the classes
Use of semantics
Use of syntax
– Most syntactic parsers are built on corpora that do not contain many questions (WSJ) => an additional corpus is needed
Attempts to automate this process (a flat rule-based variant is sketched below)
– Maximum Entropy (Ittycheriah)
– Classifiers (Li & Roth)
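As a concrete illustration of the flat "detailed classes + other" idea, here is a toy rule-based question classifier; the rules and class labels are assumptions chosen for the example, not the classes used by MULDER, Webclopedia, Ittycheriah, or Li & Roth.

```python
import re

# Toy flat question classifier: a handful of detailed classes plus a
# catch-all "OTHER" class, in the spirit of the trade-off discussed above.
RULES = [
    (r"^who\b", "PERSON"),
    (r"^(when|what year|what date)\b", "DATE"),
    (r"^where\b", "LOCATION"),
    (r"^(how many|how much)\b", "NUMBER"),
    (r"^why\b", "REASON"),
]

def classify_question(question: str) -> str:
    q = question.strip().lower()
    for pattern, label in RULES:
        if re.search(pattern, q):
            return label
    return "OTHER"

print(classify_question("When was Mozart born?"))   # DATE
print(classify_question("Name a US president."))    # OTHER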

Why is QA classification important?
The question type is used for:
1. Query construction
– question keywords + filtering mechanism (Harabagiu)
– synonyms and synsets from WordNet (Webclopedia)
– in both cases there is no connection with the possible answer space
– information retrieval (Agichtein, Berger): there is a connection between the question and answer spaces, but these types do not give the type of the answer
2. Searching for the correct answer in the passage extracted from a text

Logical Forms
Syntactic analysis plus semantics => logical form (LF)
The question and potential answer LFs are mapped to find the best match (Harabagiu, Webclopedia)

Query formation
WordNet: synonyms, hyponyms, etc. (see the expansion sketch below)
Morphology: verbal forms, plural/singular nouns, etc.
Knowledge of the domain (IBM's system)
Statistical methods for connecting question and answer spaces:
– Agichtein: automatic acquisition of patterns that might be good candidates for query expansion; 4 'types' of question
– Berger: to facilitate query modification (expansion), each question term gets a set of answer terms
FAQ: closed set of question-answer pairs
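A minimal sketch of WordNet-based synonym expansion, assuming NLTK and its WordNet data are installed (pip install nltk, then nltk.download('wordnet')); the function names and the example query are illustrative only.

```python
# Sketch of WordNet-based query expansion using NLTK's WordNet interface.
from nltk.corpus import wordnet as wn

def expand_term(term: str, max_synonyms: int = 5) -> set[str]:
    """Collect synonyms (lemma names) from the synsets of a term."""
    synonyms = set()
    for synset in wn.synsets(term):
        for lemma in synset.lemma_names():
            synonyms.add(lemma.replace("_", " ").lower())
            if len(synonyms) >= max_synonyms:
                return synonyms
    return synonyms

def expand_query(keywords: list[str]) -> set[str]:
    """Return the original keywords plus their WordNet synonyms."""
    expanded = set(keywords)
    for word in keywords:
        expanded |= expand_term(word)
    return expanded

print(expand_query(["invent", "telephone"]))
```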

Information retrieval
Classical IR is the first step of QA
Vector-space model: calculation of similarity between terms in the query and terms in the document (a toy implementation is sketched below)
IR techniques used in current QA systems usually target one database (either the Web or the TREC collection)
Is it possible to apply distributed IR techniques?
– domain-restricted QA with extra knowledge about the text collection (IBM system)
– "splitting" one big collection of documents into smaller collections about specific topics
– it might require changes in classification: the type of the question might cause changes in query formulation, document extraction, and answer extraction
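The vector-space model mentioned above can be shown in a few lines: TF-IDF weights plus cosine similarity between the query vector and each document vector. This is a toy, self-contained implementation (the query is vectorized together with the documents for simplicity); the example texts are made up.

```python
import math
from collections import Counter

def tfidf_vectors(texts: list[str]) -> list[dict[str, float]]:
    """Compute a simple tf*idf vector (as a dict) for every text."""
    tokenized = [t.lower().split() for t in texts]
    n_docs = len(tokenized)
    df = Counter(word for doc in tokenized for word in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return vectors

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = ["the dragon stole Mary from the castle",
        "John killed the dragon and married Mary",
        "TREC distributes large text collections"]
query = "who killed the dragon"
vectors = tfidf_vectors(docs + [query])
doc_vecs, query_vec = vectors[:-1], vectors[-1]
ranked = sorted(zip(docs, doc_vecs),
                key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
print(ranked[0][0])   # best-matching document
```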

Pipeline (repeated): question → question analysis → query → Information Retrieval → extracted documents → Information Extraction (rules for answer) → list of answers

Passage extraction
Passages of a particular length (Cardie) + a vector representation for each passage
Paragraphs or sentences
Classical text excerpting (a sentence-scoring sketch follows):
– each sentence is assigned a score
– retrieved passages are formed by taking the sentences with the highest scores
Global-Local Processing (Salton)
McCallum: passage extraction based not only on words but also on other features (e.g., syntactic constructions)
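A minimal sketch of the sentence-scoring style of text excerpting described above: each sentence is scored by its keyword overlap with the query and the highest-scoring sentences are returned. The scoring function and example text are assumptions for illustration.

```python
import re

def extract_passages(document: str, query: str, top_k: int = 2) -> list[str]:
    """Score sentences by query-term overlap and return the top-scoring ones."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(s.lower().split())), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]

doc = ("John loved Mary. One day, a dragon stole Mary from the castle. "
       "John killed the dragon. Mary agreed to marry him.")
print(extract_passages(doc, "who killed the dragon"))
```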

Information Extraction
Domain dependency (Grishman)
– a predefined set of attributes for the search, specific to each topic, e.g., terrorism: victims, locations, perpetrators
– usually a lot of manually tagged data for training, or texts divided into two groups: one topic vs. all other texts (Riloff)
– in both cases the division into topics is a necessary step, which is not applicable to open-domain QA systems

What information can be extracted (IE)
Named entities (NE tagging):
– numbers (incl. dates, ZIP codes, etc.)
– proper names (locations, people, etc.)
– others, depending on the system
TREC-8: 80% of questions asked for NEs
NEs can also support:
– correlated entities: mini-CV (Srihari): Who is Julian Hill? name, age, gender, position, affiliation, education
– general events (Srihari): who did what to whom when
More complicated IE techniques lead QA back to the AI approach
(A rough regex-based NE-tagging sketch follows.)
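To make NE tagging concrete, here is a very rough regular-expression sketch covering dates, numbers, and capitalized name sequences; a real tagger would use trained models and resolve overlapping matches. The patterns and the example sentence are assumptions for illustration.

```python
import re

# Very rough NE-style tagging with regular expressions: dates, numbers, and
# capitalized word sequences (a stand-in for a real named-entity tagger).
PATTERNS = {
    "DATE": r"\b(?:January|February|March|April|May|June|July|August|"
            r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b",
    "NUMBER": r"\b\d+(?:\.\d+)?\b",
    "PROPER_NAME": r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b",
}

def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (label, surface string) pairs; overlaps are not resolved."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            entities.append((label, match.group()))
    return entities

print(tag_entities("Julian Hill was born on July 1, 1904."))
```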

Answer Extraction
Three main techniques for answer extraction are based on:
1. Syntactic-semantic tree dependencies (Harabagiu, Webclopedia): the LF of the question is mapped to the LFs of possible answers
2. Surface patterns (Webclopedia), e.g. "<NAME> ( <DOB> - )" or "<NAME> was born on <DOB>"; good patterns require detailed classification: NUMBER vs. DOB (a regex sketch follows)
3. Text window
– Cardie: query-dependent text summarization of text passages, with/without syntactic and semantic information
Analogy with machine translation:
– LF mapping ~ classical MT
– surface patterns ~ example-based MT
– text window ~ statistical MT
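A minimal sketch of surface-pattern answer extraction in the spirit of the birth-date patterns above; the pattern templates and the function name are assumptions for the example, not the patterns of any particular system.

```python
import re

# Surface-pattern answer extraction: try a small set of templates such as
# "<NAME> ( <YEAR> - ..." and "<NAME> was born on <DATE>".
BIRTH_PATTERNS = [
    r"{name}\s*\(\s*(\d{{4}})\s*[-–]",               # "Mozart (1756 - 1791)"
    r"{name}\s+was\s+born\s+(?:on|in)\s+([^.,]+)",   # "Mozart was born on ..."
]

def extract_birth_date(name, text):
    """Return the first matching birth-date string, or None."""
    for template in BIRTH_PATTERNS:
        pattern = template.format(name=re.escape(name))
        match = re.search(pattern, text)
        if match:
            return match.group(1).strip()
    return None

print(extract_birth_date("Mozart",
                         "Wolfgang Amadeus Mozart was born on 27 January 1756."))
```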

Usage of the Web (answer redundancy)
Multiple formulations of the answer can be useful for:
1. The IR stage: increased chances of finding an answer that matches the query (Clarke, Brill); no need to search for an exact formulation of the answer
2. The IE stage: facilitation of answer extraction (Agichtein, Ravichandran, Brill); create a list of patterns which might contain the answer, either completely automatically (Agichtein) or using handwritten filters based on question types and domain (Brill)
3. Answer validation (Magnini): correct-answer redundancy
(A simple frequency-voting sketch follows.)
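The simplest way to exploit answer redundancy is frequency voting: count how often each candidate answer string appears across retrieved snippets and keep the most frequent. A minimal sketch, with made-up candidate lists:

```python
from collections import Counter

def vote_on_answers(candidates_per_snippet: list[list[str]]) -> list[tuple[str, int]]:
    """Count candidate answers across snippets and return the top three."""
    votes = Counter()
    for candidates in candidates_per_snippet:
        votes.update(c.lower() for c in candidates)
    return votes.most_common(3)

snippets = [
    ["1756", "Salzburg"],   # candidates extracted from snippet 1
    ["1756"],               # snippet 2
    ["1791", "1756"],       # snippet 3
]
print(vote_on_answers(snippets))   # "1756" wins by redundancy
```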

Domain-restricted applications
FAQ (different from IR or QA)
– match the input question against a list of already existing questions
– predefined output (according to the above question matching)
Riloff
– 5 types of questions
– answer extraction from a given text => no IR stage
– there is always an answer (a unique answer)
IBM system
– based on good knowledge of the inner structure of the IBM web site
– uses FAQ techniques
Results are better than for open-domain QA systems (cf. restricted-domain MT vs. open-domain MT)

Evaluation
IR and IE have different evaluation measures:
– IR: each document is marked as relevant/non-relevant => recall + precision
– IE: a gold-standard answer key enumerates all acceptable responses => recall + precision
– QA: mean reciprocal rank (MRR)
For each question, the score is the reciprocal of the rank of the first correct response, or 0 if no correct response is found; the overall system score is the mean of the individual question scores:
MRR = (1/N) * sum_{i=1..N} RAR_i, where RAR_i = 1/K_i; N is the number of questions asked and K_i is the rank of the first correct answer to question i (RAR_i = 0 if no correct answer is returned).
(A small computation sketch follows.)
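The MRR definition above is easy to compute directly; a minimal sketch, where a rank of 0 encodes "no correct answer returned":

```python
def mean_reciprocal_rank(first_correct_ranks: list[int]) -> float:
    """MRR over questions, given the rank of the first correct answer per question."""
    reciprocal = [1.0 / k if k > 0 else 0.0 for k in first_correct_ranks]
    return sum(reciprocal) / len(reciprocal)

# Example: four questions, correct answers at ranks 1, 3, none, 2.
print(mean_reciprocal_rank([1, 3, 0, 2]))   # (1 + 1/3 + 0 + 1/2) / 4 = 0.458...
```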

Future of QA
From → To
– Questions: simple facts → complex questions that use judgments and terms; knowledge of the user's context needed
– Answers: simple factoid answers found in a single document → search of multiple sources; fusion of information; resolution of conflicting data; interpretations and conclusions