PIQUANT at AQUAINT Kick-Off Dec 3-5 2001 PIQUANT Practical Intelligent QUestion ANswering Technology A Question Answering system integrating Information.

Presentation transcript:

PIQUANT: Practical Intelligent QUestion ANswering Technology
A Question Answering system integrating Information Retrieval, Natural Language Processing and Knowledge Representation
Prime Contractor: IBM T.J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, NY
Subcontractor: Cycorp

IBM & Cycorp: Bringing Complementary Strengths to QA
IBM
  – Information Retrieval
  – Natural Language Processing
  – Scalable System Architectures
  – Business Applications Architectures
Cycorp
  – Structured Knowledge Representation
  – Rich Common Sense Knowledge Bases
  – Deep Inferencing
  – Ontologies
Both symbolic and statistical

Experience from TREC8-10
End-to-end system that has performed well
Invaluable experience in learning where the problems are:
  – Coverage
  – Engineering
  – Understanding

IBM’s PIQUANT Principal Extensions
Integration of IR/NLP with Structured KBs and Deep Inference
  – Knowledge System to assist in decomposing and answering questions
  – Provide justification and/or invalidation of candidate answers
Parallel Solution Paths and Pervasive Confidence Analysis
  – Multiple parallel solution approaches to problem/subproblem
  – Pervasive use of confidences to mediate management of alternatives
  – Extensive reinforcement of symbolic approaches by statistical data
Well-Defined Component Architecture
  – Modular
  – Defined interfaces between NLP, IR, KS and Statistical Components
  – Declarative representation of question answering plans

Where Knowledge-Systems Help
The heuristic of finding short passages with all the query terms/semantic classes is good but not sufficient. E.g. from TREC9:
  Q: How much folic acid should an expectant mother take daily?
  A: 360 tons
  Q: What is the diameter of the Earth?
  A: 14 ft.
  Q: How many states have a lottery?
  A: 3,312
We will investigate the use of a sophisticated inference engine and knowledge-base (Cyc) to eliminate such answers.
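A minimal sketch of how such answers might be rejected, assuming a small hand-written table of plausible ranges in place of Cyc. The quantity names, unit table and bounds are illustrative assumptions, not part of PIQUANT or Cyc.

```python
# Hypothetical unit table: unit -> (dimension, factor to the base unit).
# A "ton" is treated as a metric tonne for simplicity.
UNITS = {
    "g": ("mass", 1.0), "mg": ("mass", 1e-3), "ton": ("mass", 1e6),
    "m": ("length", 1.0), "ft": ("length", 0.3048), "mile": ("length", 1609.344),
    "count": ("count", 1.0),
}

# Hypothetical order-of-magnitude bounds, standing in for knowledge that an
# inference engine would derive: quantity -> (dimension, low, high) in base units.
PLAUSIBLE = {
    "daily_folic_acid_dose": ("mass", 1e-5, 1e-2),
    "diameter_of_earth": ("length", 1e6, 1e8),
    "us_states_with_lottery": ("count", 0, 50),
}

def is_sane(quantity, value, unit):
    """Return False only when the answer can be shown to be implausible."""
    if quantity not in PLAUSIBLE or unit not in UNITS:
        return True  # nothing known, so the answer cannot be invalidated
    dimension, low, high = PLAUSIBLE[quantity]
    unit_dimension, factor = UNITS[unit]
    if unit_dimension != dimension:
        return False  # e.g. a length offered where a mass is expected
    return low <= value * factor <= high

# The three TREC9 answers from the slide are all rejected.
print(is_sane("daily_folic_acid_dose", 360, "ton"))
print(is_sane("diameter_of_earth", 14, "ft"))
print(is_sane("us_states_with_lottery", 3312, "count"))
```

The check is deliberately conservative: an answer is dropped only when a known range contradicts it, so missing knowledge never removes a candidate.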

Question Complexity
“Simple” questions are not a solved problem:
  – Complex questions can be decomposed into simpler components.
  – If simpler questions cannot be handled successfully, there’s no hope for more complex ones.
BUT: Areas not explored (intentionally) by TREC to date:
  – spelling errors
  – grammatical errors
  – syntactic precision, e.g. significance of articles
  – not, only, just …

Is there such a thing as a “simple” question?
Which is more complex?
  A: How many members are there in the Cabinet? (Suppose there is no text that gives the answer explicitly.)
  B: What is the meaning of life? (42, from HGTTG)
“simple” -> “simple to state”
Complexity is a function of question and data source

Different Solution Approaches
What is the largest city in England?
Text Match
  – Find text that says “London is the largest city in England” (or paraphrase). Confidence is confidence of NL parser * confidence of source. Find multiple instances and confidence of source -> 1.
“Superlative” Search
  – Find a table of English cities and their populations, and sort.
  – Find a list of the 10 largest cities in the world, and see which are in England. Uses logic: if L > all objects in set R then L > all objects in set E ⊆ R.
  – Find the population of as many individual English cities as possible, and choose the largest.
Heuristics
  – London is the capital of England. (Not guaranteed to imply it is the largest city, but quite likely.)
Complex Inference
  – E.g. “Birmingham is England’s second-largest city”; “Paris is larger than Birmingham”; “London is larger than Paris”; “London is in England”.
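The complex-inference example can be worked through mechanically. The sketch below encodes the four statements from the slide as facts, closes "larger than" under transitivity, and uses the second-largest fact to conclude which English city is largest. The function names and the fact encoding are assumptions made for illustration; the slides do not describe how Cyc would represent this.

```python
from itertools import product

# Facts from the slide's example.
larger_than = {("Paris", "Birmingham"), ("London", "Paris")}
in_england = {"London", "Birmingham"}
second_largest_in_england = "Birmingham"

def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are both known."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new

def largest_city(larger_than, in_region, second_largest):
    closure = transitive_closure(larger_than)
    # Only one city in the region is larger than its second-largest city,
    # so a regional city known to be larger must be the largest.
    candidates = {a for (a, b) in closure if b == second_largest and a in in_region}
    return candidates.pop() if len(candidates) == 1 else None

print(largest_city(larger_than, in_england, second_largest_in_england))
```

Paris is larger than Birmingham too, but it is filtered out because it is not in the region, which is why the "London is in England" fact is needed.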

Parallel Confidence Propagation
[Diagram labels: QFRAMES; QPLANS; Question Classifications; Confidences; Candidate Answers; Selected Answers; Goals (logical forms) with boolean connectives, sequencing and recombination information; Validation and Sanity Checks: Eliminate some Answers and Adjust Confidences]

Probability Management
Associated with every data element
A priori probabilities associated with every processing module
  – Given default values at first, then learned as experience is gained
Bayesian, Dempster-Shafer, …
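The slide names Bayesian and Dempster-Shafer methods without saying which one is used. As one possible reading of "given default values at first, then learned as experience is gained", the sketch below keeps a Beta-style pair of pseudo-counts per processing module. The class name and the default counts are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ModuleConfidence:
    # Pseudo-counts encode the default a priori value; 1 and 1 give 0.5.
    successes: float = 1.0
    failures: float = 1.0

    def update(self, correct: bool) -> None:
        """Record one observed outcome for this module."""
        if correct:
            self.successes += 1
        else:
            self.failures += 1

    @property
    def value(self) -> float:
        """Posterior mean estimate of the module being right."""
        return self.successes / (self.successes + self.failures)

recognizer = ModuleConfidence()
for outcome in (True, True, True, False):
    recognizer.update(outcome)
print(recognizer.value)
```

Larger initial pseudo-counts would make the default value harder to move, which is one way to express how much the initial setting should be trusted.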

IBM PIQUANT High-Level Architecture

IBM PIQUANT Block Diagram

[Block diagram labels: Knowledge Representation; Reasoning Services; Ontology & Data Services; Question Classification; QA-Manager Internals; QFRAMES; QPLANS; QPLAN Execution Engine; IR; WN; DB; CYC KB; NLP Components; Linguistic Question Analysis; Answer Presentation; Answers; QFRAME; Plan Generation; Answer Resolution; Answer Candidates; QGOAL]

Question Classification “Daemons”
Definition
  – What is OPEC?
Comparative & Superlative
  – Does Kuwait export more oil than Venezuela?
  – Which country exports the most uranium?
Profile
  – Who is Rabbani?
Relationship
  – Which countries are allies of Qatar?
Chronology
  – Was OPEC formed before Nixon became president?
Enumeration
  – How many oil refineries are in the U.S.?
Cause & Effect
  – Why did Iraq invade Kuwait?
Combination
  – Which countries are Qatar’s most powerful allies?
Classifiers act as “daemons”; perform recognition and sub-plan generation
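To make the "daemon" idea concrete, the sketch below runs one independent recognizer per question type over the question and reports "Combination" when more than one fires. The regular expressions are crude assumptions chosen only to cover the example questions on the slide; they are not the recognizers used in PIQUANT, and sub-plan generation is left out.

```python
import re

# One hypothetical recognizer per question type from the slide.
DAEMONS = [
    ("Definition", re.compile(r"^what is\b", re.IGNORECASE)),
    ("Comparative & Superlative",
     re.compile(r"\b(more|most|less|least|fewer|fewest)\b", re.IGNORECASE)),
    ("Profile", re.compile(r"^who is\b", re.IGNORECASE)),
    ("Relationship", re.compile(r"\ball(?:y|ies)\b", re.IGNORECASE)),
    ("Chronology", re.compile(r"\b(before|after)\b", re.IGNORECASE)),
    ("Enumeration", re.compile(r"^how many\b", re.IGNORECASE)),
    ("Cause & Effect", re.compile(r"^why\b", re.IGNORECASE)),
]

def fired_daemons(question):
    """Every daemon inspects the question independently."""
    return [name for name, pattern in DAEMONS if pattern.search(question)]

def question_type(question):
    fired = fired_daemons(question)
    if not fired:
        return "Unknown"
    return fired[0] if len(fired) == 1 else "Combination"

print(question_type("What is OPEC?"))
print(fired_daemons("Which countries are Qatar’s most powerful allies?"))
```

Because each recognizer is independent, new question types can be added without touching the others, which matches the modularity goal stated elsewhere in the talk.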

Architectural Features
Modularity
  – Self-contained components with well-defined functions and interfaces
  – Ease of development, experimentation and maintenance
Robustness
  – If a “Knowledge Source” fails the system will continue to operate with (minor) degradation
  – Exploit redundancy to find best answer
Reinforcement
  – Multiple sources of evidence for same answer are synergistic
Transparency
  – Explicit plans permit ready generation of explanations and symbolic analysis
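Robustness and reinforcement can be illustrated together. In the sketch below a failing knowledge source only removes its own evidence, and agreeing sources raise an answer's confidence through a noisy-OR combination. Noisy-OR and the independence it presumes are assumptions chosen for illustration; the slides do not specify the combination rule.

```python
from collections import defaultdict

def gather(sources, question):
    """Query every source; a source that raises is skipped, not fatal."""
    candidates = []
    for source in sources:
        try:
            candidates.extend(source(question))
        except Exception:
            continue  # degrade gracefully: lose only this source's evidence
    return candidates

def combine_evidence(candidates):
    """Noisy-OR over (answer, confidence) pairs from independent sources."""
    remaining_doubt = defaultdict(lambda: 1.0)
    for answer, confidence in candidates:
        remaining_doubt[answer] *= 1.0 - confidence
    return {answer: 1.0 - doubt for answer, doubt in remaining_doubt.items()}

# Hypothetical sources with made-up confidences.
def text_match(question):
    return [("London", 0.6)]

def capital_heuristic(question):
    return [("London", 0.5), ("Birmingham", 0.3)]

def broken_source(question):
    raise RuntimeError("knowledge source unavailable")

sources = [text_match, broken_source, capital_heuristic]
scores = combine_evidence(gather(sources, "What is the largest city in England?"))
print(max(scores, key=scores.get))
```

Two moderately confident sources that agree end up more confident together than either is alone, which is the synergy the slide refers to.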

IBM PIQUANT Implementation Highlights

Implementation Highlights
Predictive Annotation
  – Shift computational burden from NLP towards IR
  – Index semantic labels along with text
  – Beat the Precision-Recall tradeoff by boosting precision at little cost to recall
Virtual Annotation
  – Answer definitional (“What is”) questions by combination of linguistic, ontological and statistical techniques
  – Find the hypernyms in e.g. WordNet that have the best combination of closeness and co-occurrence

Predictive Annotation (1)
  – Annotate entire corpus and index semantic labels along with text
  – Identify sought-after label(s) in questions and include in queries
Example: Question is “Where is Belize?”
  – “Where” can map to CONTINENT$, COUNTRY$, STATE$, CITY$, CAPITAL$, PLACE$.
  – Knowing Belize is a country: “Where is Belize?” → {CONTINENT$ Belize} (assume CONTINENT$ covers continents plus sub-continental regions)
Suppose text is “… including Belize in central America …”
  – Annotated text: including [Belize: COUNTRY$, PLACE$] in [central America: CONTINENT$, PLACE$]

Predictive Annotation (2)
Increased precision of enhanced bag-of-words:
  – “Where is Belize” → {CONTINENT$ Belize}
  – Belize occurs 704 times in TREC corpus
  – Belize and CONTINENT$ co-occur in only 22 sentences
Note: data structure equally appropriate for “Name a country in Central America”, which → {COUNTRY$ Central America}
  – Annotated text: including [Belize: COUNTRY$, PLACE$] in [central America: CONTINENT$, PLACE$]
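A toy version of predictive annotation, assuming a two-entry gazetteer in place of a real annotator and a sentence-level inverted index. The gazetteer, the function names and the second sentence are made up for illustration; only the Belize sentence and the label names come from the slides.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: phrase -> semantic labels to index with the text.
GAZETTEER = {
    "belize": ["COUNTRY$", "PLACE$"],
    "central america": ["CONTINENT$", "PLACE$"],
}

def annotate(sentence):
    """Return the sentence's bag of terms: its words plus semantic labels."""
    text = sentence.lower()
    terms = set(re.findall(r"[a-z]+", text))
    for phrase, labels in GAZETTEER.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            terms.update(labels)
    return terms

def build_index(sentences):
    """Inverted index from term or label to the ids of sentences containing it."""
    index = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences):
        for term in annotate(sentence):
            index[term].add(sentence_id)
    return index

def search(index, query_terms):
    """Sentences that contain every query term (words and labels alike)."""
    postings = [index.get(term, set()) for term in query_terms]
    return set.intersection(*postings) if postings else set()

sentences = [
    "... including Belize in central America ...",
    "Belize exported sugar last year.",  # made-up distractor sentence
]
index = build_index(sentences)
print(search(index, ["belize"]))                 # plain bag-of-words query
print(search(index, ["CONTINENT$", "belize"]))   # "Where is Belize?"
print(search(index, ["COUNTRY$", "central", "america"]))
```

Adding the sought-after label to the query narrows the hits to sentences that could actually contain an answer of the right type, which is the precision gain the slide describes.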

Summary
  – Leverage existing technology base
  – Parallel approach to find answer, exploiting redundancy
  – Declarative plan representation
  – Associate confidences with each component and each intermediate and final result
  – CYC’s knowledge-base and inference engine to solve sub-problems and eliminate nonsensical answer candidates

High-Level 1st Year Development Plan
Finalize design of data-structures:
  – QFRAME: question and derived attributes
  – QPLAN: script for tackling solution
  – QGOAL: logical-form like structure representing predicate for instantiation or verification
Build several recognizers and QPLAN executor (many pieces already exist)
Run on many examples to fine-tune and to develop a priori component confidence values
Build answer resolution module
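The three data structures were still being designed at the time of the talk, so the sketch below is only one guess at their shape, based on the one-line descriptions above. Every field name, the "?x" variable convention and the example predicate are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class QGoal:
    """Logical-form-like predicate; arguments starting with '?' are unbound."""
    predicate: str
    args: Tuple[str, ...]
    confidence: float = 1.0

    def needs_instantiation(self) -> bool:
        # Unbound variables mean the goal asks for a value to be found;
        # a fully bound goal only asks for verification.
        return any(arg.startswith("?") for arg in self.args)

@dataclass
class QFrame:
    """The question plus attributes derived from it."""
    question: str
    question_types: List[str] = field(default_factory=list)
    confidence: float = 1.0

@dataclass
class QPlan:
    """Script for tackling one question type: an ordered list of goals."""
    question_type: str
    steps: List[QGoal] = field(default_factory=list)
    confidence: float = 1.0

frame = QFrame("What is the largest city in England?", ["Comparative & Superlative"])
find = QGoal("largestCityIn", ("England", "?x"))
check = QGoal("largestCityIn", ("England", "London"))
plan = QPlan(frame.question_types[0], [find, check])
print(find.needs_instantiation(), check.needs_instantiation())
```

Each structure carries its own confidence field, reflecting the stated aim of attaching confidences to every data structure.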

IBM PIQUANT Backup Slides

Statistical Features
  – Co-occurrences to support definition answers
  – Machine Learning to evaluate search engine results
  – Machine Learning to assist in answer selection
  – Learn probable confidence of question recognizers

QPLAN
Multiple per question type
Declarative representation of a solution
  – Independent of knowledge source’s details
Executed by planning engine
Sequence of solution steps
  – structured knowledge queries
  – text search queries
  – statistical queries etc.
Confidences learned over time

High-level View of Solution Steps
1. Question is processed by linguistic tools.
2. Question is classified into 1 or more types.
3. Parallel solution plan is generated and executed.
4. Responses are gathered and examined.
5. If necessary, plan is revised and steps 3-5 revisited.
6. Candidate answers are checked for sanity, merged, sorted and presented.
Note:
a. Dialog manager functions are not considered here.
b. All data-structures are assigned confidences and all selections of next steps are mediated by probabilistic computations.
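The six steps can be read as a control loop. The sketch below wires them together with pluggable callables so the flow is visible. The function signature, the "no responses" trigger for plan revision, the round limit and the max-confidence merge are all assumptions made to keep the example small.

```python
def answer_question(question, analyze, classify, make_plan, execute, revise,
                    sane, max_rounds=3):
    frame = analyze(question)              # step 1: linguistic processing
    types = classify(frame)                # step 2: one or more question types
    plan = make_plan(frame, types)         # step 3: plan generation
    responses = []
    for _ in range(max_rounds):
        responses = execute(plan)          # steps 3-4: execute, gather responses
        if responses:
            break
        plan = revise(plan, responses)     # step 5: revise and try again
    merged = {}
    for answer, confidence in responses:   # step 6: sanity check and merge
        if sane(answer):
            merged[answer] = max(confidence, merged.get(answer, 0.0))
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)

# Hypothetical stand-ins for the real components.
def execute(plan):
    if plan == "narrow":
        return []  # first attempt finds nothing
    return [("London", 0.9), ("14 ft", 0.2), ("London", 0.5)]

ranked = answer_question(
    "What is the largest city in England?",
    analyze=lambda q: {"text": q},
    classify=lambda frame: ["Comparative & Superlative"],
    make_plan=lambda frame, types: "narrow",
    execute=execute,
    revise=lambda plan, responses: "broad",
    sane=lambda answer: answer != "14 ft",
)
print(ranked)
```

In the system described by the slides, the choice of next step is mediated by probabilistic computations rather than by the simple emptiness test used here.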