PIQUANT Question Answering System


Dave Ferrucci, John Prager, Jennifer Chu-Carroll, Chris Welty, Chris Cesar and Scott Fahlman
ARDA AQUAINT Program June Workshop 2002
This work was supported in part by the Advanced Research and Development Activity (ARDA)'s Advanced Question Answering for Intelligence (AQUAINT) Program under contract number MDA904-01-C-0988.

Overview
- Progress Update
- Architecture
- QPlans
- Working Example
- Answer Selection and Resolution
- Performance Improvements
- Summary
IBM Research Subcontractor: Cycorp

PIQUANT Research Objectives
- Integration and impact of a knowledge-based system (e.g., Cyc) in QA
- Extensible QA architectures
- Declarative question plans
- Parallel solution paths and pervasive confidence processing
- Deeper linguistic and knowledge-based analysis

Progress Since AQUAINT Kickoff
- Architecture design
  - Support for multiple answering agents, solution paths and knowledge sources
  - Centralized ontology management and uniform access to knowledge sources
  - New question plan modules
- Improved ranking
  - Enhanced Answer Selection using deeper linguistic analysis
  - Integration of Cyc in Answer Resolution for "sanity checking"
- Integration of multiple knowledge sources
  - Answering questions previously missed
- Multiple solution paths based on alternative question decompositions
- Integration of Cyc as a knowledge source

Architectural Limitations as of TREC10
- Pipeline
  - Single answering approach
  - Limited extensibility
- Single solution source
  - WordNet added as a second-class citizen
  - No knowledge-system component
- Limited question understanding
  - Shallow conceptual map from Q to A
  - Limited to explicit matches; cut off from inferred possibilities
  - "Explanations" limited to text passages containing answers
  - Can't filter out crazy answers

Classic Pipeline with WordNet
[Diagram. Components: Question Analysis, Answer Classification (Answer Type), Search (Text Query, Hit List), WordNet (WN Query, WN Answer), Answer Selection, Answer]

Knowledge Source Services
[Diagram. Components: Question Analysis, Answer Classification (Answer Type), Text Search (Text Query, Hit List, Answer Selection, Text Search Answers), Cyc (KB Query, Cyc Answers), WordNet (WordNet Answers), Answer Resolution, Answer Justification & Presentation, Answer]
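A minimal sketch of the idea behind knowledge-source services: every source sits behind one uniform interface, and the same question is sent to all of them so that Answer Resolution can compare the results. The names `KnowledgeSource`, `DictSource` and `consult_all`, and the lookup-table sources, are hypothetical stand-ins, not PIQUANT's actual interfaces.

```python
from collections.abc import Sequence
from typing import Protocol


class KnowledgeSource(Protocol):
    """Uniform interface every source adapter is assumed to expose."""
    name: str

    def answer(self, question: str) -> list[str]: ...


class DictSource:
    """Hypothetical adapter: a lookup table standing in for text search, Cyc or WordNet."""

    def __init__(self, name: str, table: dict[str, list[str]]) -> None:
        self.name = name
        self.table = table

    def answer(self, question: str) -> list[str]:
        return list(self.table.get(question, []))


def consult_all(sources: Sequence[KnowledgeSource], question: str) -> dict[str, list[str]]:
    """Send one question to every source and keep each answer list tagged by
    source, so a later resolution step can compare and combine them."""
    return {source.name: source.answer(question) for source in sources}


if __name__ == "__main__":
    q = "What is the capital of Great Britain?"
    sources = [
        DictSource("text", {q: ["London"]}),
        DictSource("cyc", {q: ["London"]}),
        DictSource("wordnet", {}),  # this stub has no answer
    ]
    print(consult_all(sources, q))
```

Keeping answers tagged by source, rather than merging them immediately, leaves the decision of how to reconcile agreement or conflict to the resolution stage.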

Answering Agents
[Diagram. Components: Question Analysis (QFrame, QGoals), Answer Classification, Complex Decomposition & Planning, Answering Agents, Causality, KS Adaptation Layer, knowledge sources (WordNet, Cyc, Web), Search (Convert Question to Web Query, Hit List), Answer Selection, Answer Resolution, Answer Justification & Presentation, Answer]

Planning-Based Answering Agent
[Diagram. Components: Question Analysis (QFrame, QGoals), Answer Classification, Question Plan Selection (QPlans), QPlan Execution Engine, Answering Agent Selection, Answering Agents, KS Adaptation Layer, knowledge sources (WordNet, Cyc, Web), Search (Hit List), Answer Selection, Answer Candidates, Answer Resolution with Cyc QFilter, Answer Justification & Presentation, Answer]

QPlans
- Plans for attacking different question types
- Identify the knowledge sources to use (Text Search, Cyc, WordNet, …)
- Specify preferences among sources, when relevant
- Simple questions have base plans (no recursion)
- Complex questions can be broken into sub-plans
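One way to picture a base QPlan (no recursion) is as plain data: the question type plus the knowledge sources to try, in preference order. This sketch assumes "preference" means fallback order (try the preferred source, move on only if it returns nothing); `QPlan`, `execute`, `PLANS` and the source names are illustrative assumptions, and sub-plans for complex questions are left out.

```python
from dataclasses import dataclass
from typing import Callable

# A source maps a question to zero or more candidate answers.
Source = Callable[[str], list[str]]


@dataclass(frozen=True)
class QPlan:
    """Hypothetical base plan: which sources to use, most preferred first."""
    question_type: str
    sources: tuple[str, ...]


def execute(plan: QPlan, question: str, registry: dict[str, Source]) -> list[str]:
    """Try each source in preference order; return the first non-empty answer list."""
    for name in plan.sources:
        answers = registry[name](question)
        if answers:
            return answers
    return []


# Illustrative plans only; the real per-type source preferences are not given.
PLANS = {
    "Define": QPlan("Define", ("wordnet", "text")),
    "Property": QPlan("Property", ("cyc", "text")),
}

if __name__ == "__main__":
    registry: dict[str, Source] = {
        "cyc": lambda q: [],           # stub: the KB has no answer
        "text": lambda q: ["London"],  # stub: text search finds one
        "wordnet": lambda q: [],
    }
    print(execute(PLANS["Property"], "What is the capital of Great Britain?", registry))
```

Because the plan is data rather than code, adding a question type or reordering sources means editing `PLANS`, not the executor.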

Sample Question Types (10 identified, 5 with QPlans)
- When: When was the Battle of Hastings?
- Define: What is anorexia nervosa?
- Property: What is the population of the capital of Great Britain?
- WhatX: What county is Phoenix AZ in?
- Super: What is the largest snake in the world?

Mapping Questions to QPlans
- Property plan, template: "What is the P of X?"
- Define plan, template: "What is X?"
- Example questions: "What is the capital of Great Britain?" and "What is the Declaration of Independence?"
Both examples fit both templates on the surface, yet the first asks for a property and the second for a definition, so the surface form alone does not determine which plan applies.
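The five sample question types, and the Property/Define ambiguity, can be illustrated with a small surface-pattern classifier. It assumes a hypothetical `KNOWN_PROPERTIES` lexicon to decide whether the P in "What is the P of X?" really names a property; the lexicon, the regular expressions and the "-est" superlative heuristic are illustrative choices, not PIQUANT's question analysis.

```python
import re

# Hypothetical property lexicon standing in for an ontology lookup.
KNOWN_PROPERTIES = {"capital", "population", "area", "height", "weight", "diameter"}

FLAGS = re.IGNORECASE
WHEN = re.compile(r"^when\b", FLAGS)
SUPER = re.compile(r"^what (?:is|was) the (?:\w+est|most \w+)\b", FLAGS)  # crude superlative test
PROPERTY = re.compile(r"^what (?:is|was) the (\w+) of (.+)$", FLAGS)
WHATX = re.compile(r"^what (?!(?:is|was|are|were)\b)\w+ (?:is|was|are|were|does|did)\b", FLAGS)
DEFINE = re.compile(r"^what (?:is|was|are) (.+)$", FLAGS)


def classify(question: str) -> str:
    """Map a question to one of the five sample types, or 'Unknown'."""
    q = question.strip().rstrip("?").strip()
    if WHEN.match(q):
        return "When"
    if SUPER.match(q):
        return "Super"
    m = PROPERTY.match(q)
    if m and m.group(1).lower() in KNOWN_PROPERTIES:
        return "Property"  # "the P of X" where P is a known property
    if WHATX.match(q):
        return "WhatX"
    if DEFINE.match(q):
        return "Define"  # also catches "the P of X" when P is not a property
    return "Unknown"


if __name__ == "__main__":
    for q in [
        "What is the capital of Great Britain?",
        "What is the Declaration of Independence?",
        "What county is Phoenix AZ in?",
        "What is the largest snake in the world?",
    ]:
        print(q, "->", classify(q))
```

The order of the checks matters: the Property test runs before the generic Define pattern, and the lexicon lookup is what sends "the Declaration of Independence" to Define.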

QPlan Example
Ask: "What is the population of the capital of Great Britain?"
- Recognize question type: Property
- Recognize answer type: NUMBER/POPULATION
- Plan:
  - Text Search: "Population of the capital of Great Britain"
  - PA Search: "The capital of Great Britain" and (NUMBER$ or POPULATION$)
  - Cyc, DB and WordNet queries
  - Decomposition: for each answer A to "What is the capital of Great Britain?", ask "What is the population of A?"
- Each element of the decomposition may be answered by a different knowledge source (e.g., Cyc, WordNet).

Our TREC10 System vs. PIQUANT
- TREC10 system: "What is the population of the capital of Tajikistan?" goes to Text Search as a single query. Answer: 5.3 million (wrong).
- PIQUANT: the same question is decomposed into
  - "What is the capital of Tajikistan?", giving X = Dushanbe
  - "What is the population of X?", i.e., "What is the population of Dushanbe?"
  - Text Search and Cyc are consulted; the results are 460,000 and nil
  - Answer: 460,000 (right)
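The decomposition in the QPlan example and the Tajikistan case amounts to answering the innermost "the P of X" phrase first, then substituting each answer A into the outer question. In this minimal sketch the two knowledge sources are hypothetical lookup tables seeded with the values from the slide; assigning the capital fact to the Cyc stub and the population fact to the text-search stub is an assumption.

```python
import re
from typing import Callable

# Hypothetical stand-ins for knowledge sources, seeded with the slide's values.
CYC_FACTS = {("capital", "Tajikistan"): ["Dushanbe"]}
TEXT_FACTS = {("population", "Dushanbe"): ["460,000"]}


def toy_cyc(prop: str, entity: str) -> list[str]:
    return CYC_FACTS.get((prop, entity), [])


def toy_text_search(prop: str, entity: str) -> list[str]:
    return TEXT_FACTS.get((prop, entity), [])


SOURCES: list[Callable[[str, str], list[str]]] = [toy_cyc, toy_text_search]
PROPERTY_OF = re.compile(r"^the (\w+) of (.+)$", re.IGNORECASE)


def resolve(phrase: str) -> list[str]:
    """Resolve a phrase such as 'the population of the capital of Tajikistan'."""
    m = PROPERTY_OF.match(phrase)
    if m is None:
        return [phrase]  # a plain name such as "Tajikistan"
    prop, inner_phrase = m.group(1).lower(), m.group(2)
    answers: list[str] = []
    for a in resolve(inner_phrase):  # for each answer A to the sub-question...
        for source in SOURCES:       # ...any source may answer "the P of A"
            answers.extend(source(prop, a))
    return answers


def answer_property_question(question: str) -> list[str]:
    m = re.match(r"^what is (the .+?)\??$", question.strip(), re.IGNORECASE)
    return resolve(m.group(1)) if m else []


if __name__ == "__main__":
    print(answer_property_question("What is the population of the capital of Tajikistan?"))
    # ['460,000']
```

Each sub-question is routed to every source independently, so the capital can come from one source and the population from another.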

PIQUANT Architecture
[Diagram repeated from the Planning-Based Answering Agent slide]

Enhance Answer Resolution/Selection
- Deeper linguistic analysis
  - Identifying and matching the answer type: Named-Entity Tagger
  - Matching syntactic relationships between Q and A: Deep Parser
- Multiple knowledge sources to reinforce answers: Encyclopedia Britannica
- "Crazy answer" elimination using Cyc

Deeper Linguistic Analysis in Answer Selection
Inputs to Answer Selection:
- Hit list: passages (typically 10) returned by the search engine
- Answer type: semantic type(s) of the answer sought
Output: answers and ranks
Candidate passages for the question "What is the capital of England?":
- "Shaykh Salim Sabah al-Salim continued his talks today with high-ranking officials in the British capital, London."
- "BRISTOL, capital of south-west England, holds a peculiar fascination for psephologists."
Process:
- Identify candidate answers using a semantic-based named-entity tagger, e.g.:
  <PERSON>Shaykh Salim Sabah al-Salim</PERSON> continued his talks <DATE>today</DATE> with <ROLE>high-ranking officials</ROLE> in the British capital, <CAPITAL>London</CAPITAL>.
- Rank candidate answers based on pre-identified features
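The selection step can be sketched as: pull tagged entities out of each passage, keep only those whose tag matches the answer type sought, and rank the survivors. The first tagged passage is copied from the slide; tagging BRISTOL as <CITY> in the second passage, the scoring features (question-word overlap plus a 1/search-rank bonus) and the function names are assumptions, since the slide does not list PIQUANT's ranking features.

```python
import re

# Inline tags in the style shown on the slide, e.g. <CAPITAL>London</CAPITAL>.
TAG = re.compile(r"<(?P<type>[A-Z]+)>(?P<text>.*?)</(?P=type)>")
STOP_WORDS = {"what", "who", "when", "is", "was", "the", "of", "a", "an", "in"}


def words(text: str) -> set[str]:
    return {w.lower() for w in re.findall(r"\w+", text)}


def select_answers(question: str, answer_types: set[str],
                   tagged_passages: list[str]) -> list[tuple[str, float]]:
    """Keep entities whose tag is an acceptable answer type, then rank them by
    a hypothetical score: question-word overlap + 1/(search rank)."""
    q_words = words(question) - STOP_WORDS
    scores: dict[str, float] = {}
    for rank, passage in enumerate(tagged_passages, start=1):
        plain = TAG.sub(lambda m: m.group("text"), passage)  # strip the tags
        score = len(q_words & words(plain)) + 1.0 / rank
        for m in TAG.finditer(passage):
            if m.group("type") in answer_types:
                text = m.group("text")
                scores[text] = max(scores.get(text, 0.0), score)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


PASSAGES = [
    "<PERSON>Shaykh Salim Sabah al-Salim</PERSON> continued his talks <DATE>today</DATE> "
    "with <ROLE>high-ranking officials</ROLE> in the British capital, <CAPITAL>London</CAPITAL>.",
    "<CITY>BRISTOL</CITY>, capital of south-west England, holds a peculiar "
    "fascination for psephologists.",  # hypothetical tagging
]

if __name__ == "__main__":
    print(select_answers("What is the capital of England?", {"CAPITAL"}, PASSAGES))
    # [('London', 2.0)]
```

Note that the second passage shares more words with the question, so keyword overlap alone would favor it; the answer-type filter is what removes BRISTOL.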

Multiple Knowledge Sources
[Diagram. Components: Question Analysis, Answer Classification (Answer Type), Text Search (Text Query, Hit List, Answer Selection, Text Search Answers) over the TREC corpus and Encyclopedia Britannica (EB), each with a PA index, Cyc (KB Query, Cyc Answers), WordNet (WordNet Answers), Answer Resolution, Answer Justification & Presentation, Answer]
- Substantiating answers with multiple sources increases confidence
- TREC corpus + Encyclopedia Britannica:
  - Found previously missed answers
  - Improved the rank of previously found answers
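The point that agreeing sources raise confidence can be made concrete with noisy-OR evidence combination, a standard technique swapped in here for illustration; the slides do not say how PIQUANT actually combines scores. The confidences, the source names and the casefold normalization are assumptions.

```python
from collections import defaultdict


def noisy_or(source_answers: dict[str, list[tuple[str, float]]]) -> list[tuple[str, float]]:
    """Combine per-source (answer, confidence) lists with noisy-OR:
    P(answer) = 1 - product over sources of (1 - p_source).
    Answers are normalized by stripping and casefolding; if one source lists
    the same answer twice, only its highest confidence is used."""
    miss: defaultdict[str, float] = defaultdict(lambda: 1.0)
    for answers in source_answers.values():
        best: dict[str, float] = {}
        for answer, p in answers:
            key = answer.strip().casefold()
            best[key] = max(best.get(key, 0.0), p)
        for key, p in best.items():
            # Probability that every source so far is wrong (independence assumed).
            miss[key] *= 1.0 - p
    combined = {key: 1.0 - m for key, m in miss.items()}
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(noisy_or({
        "trec": [("London", 0.5), ("Bristol", 0.4)],
        "britannica": [("london", 0.6)],
    }))
```

An answer found by both corpora ends up ahead of one found by only a single corpus, even when the single-source confidence is similar.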


“Crazy Answer” Elimination
Semantic type mismatch
- Examples:
  - What city in Florida is Sea World in? London, San Diego, Tulsa
  - Who was Charles Lindbergh’s wife? Babe Ruth, Jack Dempsey
- Issue: need to determine whether an ISA relationship is possible between two entities
Unreasonable numerical ranges
- Examples:
  - What is the weight of a wolf? 300 tons
  - How many states have a lottery? 600, 203
  - How big is our galaxy in diameter? 14 feet, 43 feet
- Issues (under development at Cycorp):
  - Need upper and/or lower bounds on property values
  - Need reasonable units for certain measures
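Both checks can be sketched against a toy knowledge base: a candidate fails the type check only if one of its known types is disjoint with the required type (so unknown entities pass, matching "is an ISA relationship possible?"), and a number fails if it falls outside known bounds. The KB contents, the mapping of "wife" to `FemalePerson`, and reading the lottery question as a count of the 50 US states are assumptions; the Florida example would additionally need a location check, which is not covered here.

```python
# Hypothetical toy KB standing in for Cyc: known entity types, pairs of
# disjoint types, and inclusive (low, high) bounds on quantities.
ENTITY_TYPES: dict[str, set[str]] = {
    "Babe Ruth": {"MalePerson", "BaseballPlayer"},
    "Jack Dempsey": {"MalePerson", "Boxer"},
}
DISJOINT: set[frozenset[str]] = {frozenset({"MalePerson", "FemalePerson"})}
BOUNDS: dict[str, tuple[float, float]] = {"number of US states": (0, 50)}


def isa_possible(entity: str, required_type: str) -> bool:
    """False only if one of the entity's known types is disjoint with required_type."""
    known = ENTITY_TYPES.get(entity, set())  # unknown entity: nothing rules it out
    return all(frozenset({t, required_type}) not in DISJOINT for t in known)


def in_bounds(value: float, quantity: str) -> bool:
    """True if value lies within the known bounds (or no bounds are known)."""
    if quantity not in BOUNDS:
        return True
    low, high = BOUNDS[quantity]
    return low <= value <= high


def drop_type_mismatches(candidates: list[str], required_type: str) -> list[str]:
    return [c for c in candidates if isa_possible(c, required_type)]


def drop_out_of_range(values: list[float], quantity: str) -> list[float]:
    return [v for v in values if in_bounds(v, quantity)]


if __name__ == "__main__":
    # "Who was Charles Lindbergh's wife?" assumed to require a FemalePerson.
    print(drop_type_mismatches(["Babe Ruth", "Jack Dempsey"], "FemalePerson"))  # []
    # "How many states have a lottery?" read as a count of US states.
    print(drop_out_of_range([600, 203], "number of US states"))  # []
```

Treating unknown entities and unbounded quantities as acceptable keeps the filter conservative: it only removes answers the knowledge base can positively rule out.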

Performance Evaluation
Evaluation performed on a set of 364 TREC9 questions.
Results of improved answer selection/resolution (deeper linguistic analysis; multiple knowledge sources to reinforce answers):

  Configuration        MRR     Missed answers   Answers in rank 1
  TREC10               0.666   64               203
  + Improved Ranking   0.720   47               228
  + Multiple Sources   0.739   42               235
  + Sanity Checking    TBD

The number of answers in rank 1 increased substantially (from 203 to 235), which is particularly important in a recursive architecture.
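MRR in the table is mean reciprocal rank: each question scores 1/r, where r is the rank of the first correct answer, and 0 if no correct answer is returned. A rank-1 answer is worth twice a rank-2 answer, so moving answers into rank 1 moves MRR the most. The sketch assumes a TREC-style cutoff of five answers per question; the input list in the usage line is made up for illustration, not PIQUANT data.

```python
from collections.abc import Sequence
from typing import Optional


def mean_reciprocal_rank(first_correct: Sequence[Optional[int]], cutoff: int = 5) -> float:
    """MRR over questions. Each entry is the 1-based rank of the first correct
    answer, or None if none was returned; ranks beyond `cutoff` score 0."""
    if not first_correct:
        raise ValueError("need at least one question")
    total = 0.0
    for rank in first_correct:
        if rank is not None and 1 <= rank <= cutoff:
            total += 1.0 / rank
    return total / len(first_correct)


if __name__ == "__main__":
    print(mean_reciprocal_rank([1, 2, None, 4]))  # 0.4375
```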

Next Six Months
- Richer question classification, plan development and execution
- Ontology synthesis and central management/access
- Richer and more robust integration of knowledge sources
  - Answer aggregation
  - Answer elimination
  - Answer generation
- Answering agent for causality questions
  - Leverage dialog with Cyc regarding event pre- and post-conditions, e.g., postCondition("drink poison", "die")
- Improve answer resolution
- Confidence processing
- Implementation improvements (speed, modularity)

PIQUANT June Workshop Update
The End