PIQUANT at AQUAINT Kick-Off Dec
PIQUANT: Practical Intelligent QUestion ANswering Technology
A Question Answering system integrating Information Retrieval, Natural Language Processing and Knowledge Representation
Prime Contractor: IBM T.J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, NY
Subcontractor: Cycorp

IBM & Cycorp: Bringing Complementary Strengths to QA
IBM
– Information Retrieval
– Natural Language Processing
– Scalable System Architectures
– Business Applications Architectures
Cycorp
– Structured Knowledge Representation
– Rich Common Sense Knowledge Bases
– Deep Inferencing
– Ontologies
Both symbolic and statistical

Experience from TREC8-10
End-to-end system that has performed well
Invaluable experience in learning where the problems are:
– Coverage
– Engineering
– Understanding

IBM’s PIQUANT Principal Extensions
Integration of IR/NLP with Structured KBs and Deep Inference
– Knowledge System to assist in decomposing and answering questions
– Provide justification and/or invalidation of candidate answers
Parallel Solution Paths and Pervasive Confidence Analysis
– Multiple parallel solution approaches to each problem/subproblem
– Pervasive use of confidences to mediate management of alternatives
– Extensive reinforcement of symbolic approaches by statistical data
Well-Defined Component Architecture
– Modular
– Defined interfaces between NLP, IR, KS and Statistical Components
– Declarative representation of question answering plans

Where Knowledge-Systems Help
The heuristic of finding short passages with all the query terms/semantic classes is good but not sufficient. E.g. from TREC9:
– Q: How much folic acid should an expectant mother take daily? A: 360 tons
– Q: What is the diameter of the Earth? A: 14 ft.
– Q: How many states have a lottery? A: 3,312
We will investigate the use of a sophisticated inference engine and knowledge-base (Cyc) to eliminate such answers.

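The elimination of nonsensical answers described above can be sketched as a plausibility-range filter. This is only a minimal illustration, not the Cyc-based mechanism: the `PLAUSIBLE` ranges, the unit table and the `is_plausible` helper are hypothetical stand-ins for constraints that would come from a knowledge base.

```python
# Hypothetical plausibility ranges (assumed values, for illustration only):
# answer kind -> (canonical unit, lowest plausible value, highest plausible value).
PLAUSIBLE = {
    "daily_dose": ("gram", 0.0, 100.0),
    "planet_diameter": ("km", 1_000.0, 200_000.0),
    "count_of_us_states": ("count", 0, 50),
}

# Approximate conversion factors into the canonical units above.
# "ton" is treated as a metric ton; the check is insensitive to the exact factor.
TO_CANONICAL = {
    ("ton", "gram"): 1_000_000.0,
    ("ft", "km"): 0.0003048,
}


def is_plausible(kind, value, unit):
    """Return True if the candidate value lies inside the assumed range."""
    canonical_unit, low, high = PLAUSIBLE[kind]
    if unit == canonical_unit:
        factor = 1.0
    else:
        factor = TO_CANONICAL.get((unit, canonical_unit))
    if factor is None:
        # Unknown unit: this sketch cannot validate the answer, so reject it.
        return False
    return low <= value * factor <= high
```

With these assumed ranges, all three TREC9 answers quoted above are rejected, while a candidate such as 12,742 km for the Earth's diameter passes.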
Question Complexity
“Simple” questions are not a solved problem:
– Complex questions can be decomposed into simpler components.
– If simpler questions cannot be handled successfully, there’s no hope for more complex ones.
BUT: Areas not explored (intentionally) by TREC to date:
– spelling errors
– grammatical errors
– syntactic precision, e.g. significance of articles, not, only, just …

Is there such a thing as a “simple” question?
Which is more complex?
– A: How many members are there in the Cabinet? (Suppose there is no text that gives the answer explicitly.)
– B: What is the meaning of life? (42, from HGTTG)
“simple” -> “simple to state”
Complexity is a function of the question and the data source

Different Solution Approaches
What is the largest city in England?
Text Match
– Find text that says “London is the largest city in England” (or a paraphrase).
– Confidence is confidence of NL parser * confidence of source.
– Find multiple instances, and confidence of source -> 1.
“Superlative” Search
– Find a table of English cities and their populations, and sort.
– Find a list of the 10 largest cities in the world, and see which are in England. Uses logic: if L > all objects in set R, then L > all objects in any set E ⊂ R.
– Find the population of as many individual English cities as possible, and choose the largest.
Heuristics
– London is the capital of England. (Not guaranteed to imply it is the largest city, but quite likely.)
Complex Inference
– E.g. “Birmingham is England’s second-largest city”; “Paris is larger than Birmingham”; “London is larger than Paris”; “London is in England”.

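One reading of the Text Match confidence rule above can be sketched as follows. The per-instance product is taken from the slide; combining several instances with a noisy-OR (which assumes the instances are independent) is an assumption made here for illustration, chosen because it moves the combined value towards 1 as more instances are found. Function names are hypothetical.

```python
from math import prod


def instance_confidence(parser_conf, source_conf):
    """Confidence of one matched passage: parser confidence times source confidence."""
    return parser_conf * source_conf


def combine_instances(confidences):
    """Noisy-OR over independent instances (an assumed combination rule).

    Each additional supporting instance pushes the result closer to 1.
    """
    return 1.0 - prod(1.0 - c for c in confidences)
```

For example, a single instance with parser confidence 0.8 and source confidence 0.5 scores 0.4; three such instances combine to 1 - 0.6^3 = 0.784.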
Parallel Confidence Propagation
[Diagram. Labels: Question Classifications; QFRAMES; QPLANS; Goals (logical forms) with boolean connectives, sequencing and recombination information; Confidences; Candidate Answers; Validation and Sanity Checks (Eliminate some Answers and Adjust Confidences); Selected Answers]

Probability Management
Probabilities associated with every data element
A priori probabilities associated with every processing module
– Given default values at first, then learned as experience is gained
Bayesian, Dempster-Shafer, …

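The idea of a default a priori probability that is refined with experience can be sketched with a Beta-Bernoulli estimate. The slide names Bayesian and Dempster-Shafer approaches without choosing one, so this is only one possible Bayesian option; the `ModuleReliability` class and its pseudo-count defaults are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModuleReliability:
    """Running estimate of how often a processing module is right."""

    prior_correct: float = 1.0  # pseudo-count encoding the default prior
    prior_wrong: float = 1.0
    correct: int = 0
    wrong: int = 0

    def record(self, was_correct: bool) -> None:
        """Add one observed outcome for this module."""
        if was_correct:
            self.correct += 1
        else:
            self.wrong += 1

    @property
    def probability(self) -> float:
        """Posterior mean of the module's probability of being correct."""
        successes = self.prior_correct + self.correct
        failures = self.prior_wrong + self.wrong
        return successes / (successes + failures)
```

A fresh module starts at the default 0.5; after three correct results and one wrong one, the estimate moves to 4/6.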
IBM PIQUANT High-Level Architecture

IBM PIQUANT Block Diagram

[Block diagram. Labels: Question Classification; Linguistic Question Analysis; NLP Components; QA-Manager Internals; QFRAME; QFRAMES; QPLANS; Plan Generation; QPLAN Execution Engine; QGOAL; Knowledge Representation; Reasoning Services; Ontology & Data Services; IR; WN; DB; CYC KB; Answer Candidates; Answer Resolution; Answer Presentation; Answers]

Question Classification “Daemons”
Definition
– What is OPEC?
Comparative & Superlative
– Does Kuwait export more oil than Venezuela?
– Which country exports the most uranium?
Profile
– Who is Rabbani?
Relationship
– Which countries are allies of Qatar?
Chronology
– Was OPEC formed before Nixon became president?
Enumeration
– How many oil refineries are in the U.S.?
Cause & Effect
– Why did Iraq invade Kuwait?
Combination
– Which countries are Qatar’s most powerful allies?
Classifiers act as “daemons”; they perform recognition and sub-plan generation

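The recognition half of the classifier daemons above can be sketched with one pattern per question type, each applied independently. The regular expressions are hypothetical and deliberately crude (they are tuned only to the example questions on this slide); sub-plan generation and confidences are left out.

```python
import re

# One hypothetical recognizer pattern per question type, in slide order.
DAEMONS = [
    ("Definition", re.compile(r"^what (is|are) ", re.I)),
    ("Comparative & Superlative", re.compile(r"\b(more|less|fewer|most|least)\b", re.I)),
    ("Profile", re.compile(r"^who (is|was) ", re.I)),
    ("Relationship", re.compile(r"\b(ally|allies)\b", re.I)),
    ("Chronology", re.compile(r"\b(before|after)\b", re.I)),
    ("Enumeration", re.compile(r"^how many ", re.I)),
    ("Cause & Effect", re.compile(r"^why ", re.I)),
]


def classify(question):
    """Run every daemon on the question and return the types that fire."""
    hits = [name for name, pattern in DAEMONS if pattern.search(question)]
    if len(hits) > 1:
        # More than one daemon fired: the question is also a Combination.
        hits.append("Combination")
    return hits
```

Because every daemon sees every question, a question such as "Which countries are Qatar's most powerful allies?" triggers both the superlative and the relationship recognizers and is therefore also marked as a Combination.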
Architectural Features
Modularity
– Self-contained components with well-defined functions and interfaces
– Ease of development, experimentation and maintenance
Robustness
– If a “Knowledge Source” fails, the system will continue to operate with (minor) degradation
– Exploit redundancy to find best answer
Reinforcement
– Multiple sources of evidence for the same answer are synergistic
Transparency
– Explicit plans permit ready generation of explanations and symbolic analysis

IBM PIQUANT Implementation Highlights

Implementation Highlights
Predictive Annotation
– Shift computational burden from NLP towards IR
– Index semantic labels along with text
– Beat the Precision-Recall tradeoff by boosting precision at little cost to recall
Virtual Annotation
– Answer definitional (“What is”) questions by a combination of linguistic, ontological and statistical techniques
– Find the hypernyms in e.g. WordNet that have the best combination of closeness and co-occurrence

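The hypernym selection in Virtual Annotation can be sketched as a trade-off between closeness and co-occurrence. The slide does not say how the two are combined, so dividing the co-occurrence count by the hypernym's distance from the term is an assumption. The chain and counts below are hand-written toy inputs, not values read from WordNet or from a corpus.

```python
# Toy inputs (illustrative only): hypernyms of "amphibian", nearest first,
# and made-up counts of how often each co-occurs with the term.
HYPERNYM_CHAIN = ["vertebrate", "chordate", "animal", "organism"]
COOCCURRENCE = {"vertebrate": 4, "chordate": 0, "animal": 30, "organism": 8}


def best_hypernym(chain, cooccurrence):
    """Pick the hypernym with the highest co-occurrence-per-level score.

    `chain` is ordered from the nearest hypernym (level 1) outwards.
    Returns None for an empty chain.
    """
    if not chain:
        return None

    def score(item):
        level, hypernym = item
        return cooccurrence.get(hypernym, 0) / level

    _, hypernym = max(enumerate(chain, start=1), key=score)
    return hypernym
```

With these toy counts the nearest hypernym is too rare and the farthest too remote, so `best_hypernym(HYPERNYM_CHAIN, COOCCURRENCE)` selects "animal".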
Predictive Annotation (1)
Annotate entire corpus and index semantic labels along with text
Identify sought-after label(s) in questions and include in queries
Example: Question is “Where is Belize?”
– “Where” can map to CONTINENT$, COUNTRY$, STATE$, CITY$, CAPITAL$, PLACE$.
– Knowing Belize is a country: “Where is Belize?” -> {CONTINENT$ Belize} (assume CONTINENT$ covers continents plus sub-continental regions)
Suppose text is “… including Belize in central America …”
[Annotation diagram: “Belize” carries the labels COUNTRY$ and PLACE$; “central America” carries the labels CONTINENT$ and PLACE$]

Predictive Annotation (2)
Increased precision of enhanced bag-of-words:
– “Where is Belize” -> {CONTINENT$ Belize}
– Belize occurs 704 times in TREC corpus
– Belize and CONTINENT$ co-occur in only 22 sentences
Note: the data structure is equally appropriate for “Name a country in Central America”, which becomes {COUNTRY$ Central America}
[Annotation diagram: “Belize” carries the labels COUNTRY$ and PLACE$; “central America” carries the labels CONTINENT$ and PLACE$]

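The enhanced bag-of-words idea can be sketched as an inverted index in which semantic labels are indexed as ordinary terms next to the words. This is a toy illustration under stated assumptions: the `GAZETTEER` is a hand-written stand-in for the real annotators, indexing is per sentence, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, hand-written annotator data: phrase -> semantic labels.
GAZETTEER = {
    "belize": ["COUNTRY$", "PLACE$"],
    "central america": ["CONTINENT$", "PLACE$"],
}


def annotate(sentence):
    """Return the index terms of a sentence: its words plus any semantic labels."""
    text = sentence.lower()
    terms = set(re.findall(r"[a-z]+", text))
    for phrase, labels in GAZETTEER.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            terms.update(labels)
    return terms


def build_index(sentences):
    """Inverted index: term (word or label) -> ids of sentences containing it."""
    index = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences):
        for term in annotate(sentence):
            index[term].add(sentence_id)
    return index


def search(index, query_terms):
    """Ids of the sentences that contain every query term."""
    postings = [index.get(term, set()) for term in query_terms]
    return set.intersection(*postings) if postings else set()
```

A query such as `search(index, ["CONTINENT$", "belize"])` then returns only sentences that mention Belize together with something labeled CONTINENT$, and `["COUNTRY$", "central", "america"]` runs against the same index unchanged.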
Summary
Leverage existing technology base
Parallel approach to find answer, exploiting redundancy
Declarative plan representation
Associate confidences with each component and each intermediate and final result
CYC’s knowledge-base and inference engine to solve sub-problems and eliminate nonsensical answer candidates

High-Level 1st Year Development Plan
Finalize design of data-structures:
– QFRAME: question and derived attributes
– QPLAN: script for tackling solution
– QGOAL: logical-form-like structure representing a predicate for instantiation or verification
Build several recognizers and QPLAN executor (many pieces already exist)
Run on many examples to fine-tune and to develop a priori component confidence values
Build answer resolution module

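The three data-structures listed above could take roughly the following shape. The slide says their design is still to be finalized, so every field name here is a hypothetical placeholder rather than the actual design; only the one-line role of each structure comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class QGoal:
    """Logical-form-like predicate; a None argument is an unbound variable."""

    predicate: str
    args: tuple
    confidence: float = 1.0

    def is_verification(self) -> bool:
        # Fully bound goals are verified; goals with unbound slots are instantiated.
        return all(arg is not None for arg in self.args)


@dataclass
class QPlan:
    """Script for tackling one solution approach: an ordered list of goals."""

    question_type: str
    goals: list = field(default_factory=list)
    confidence: float = 1.0


@dataclass
class QFrame:
    """The question plus the attributes derived from it."""

    question: str
    question_types: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)
    plans: list = field(default_factory=list)
```

For instance, `QGoal("largestCityIn", (None, "England"))` asks for an instantiation, while `QGoal("largerThan", ("London", "Paris"))` asks for a verification.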
IBM PIQUANT Back-up Slides

Statistical Features
Co-occurrences to support definition answers
Machine Learning to evaluate search engine results
Machine Learning to assist in answer selection
Learn probable confidence of question recognizers

QPLAN
Multiple per question type
Declarative representation of a solution
– Independent of knowledge source’s details
Executed by planning engine
Sequence of solution steps
– structured knowledge queries
– text search queries
– statistical queries etc.
Confidences learned over time

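A declarative plan that stays independent of knowledge-source details can be sketched as plain data plus a small executor that dispatches each step by kind. The plan layout (`steps`, `kind`, `query`, `confidence`) and the `execute_plan` function are hypothetical; skipping a failing source is an assumption made here so that one broken source only degrades the result.

```python
def execute_plan(plan, sources):
    """Run each step of a declarative plan against the matching knowledge source.

    `plan` is plain data: {"steps": [{"kind": ..., "query": ..., "confidence": ...}]}.
    `sources` maps a step kind to a callable returning (answer, confidence) pairs.
    Returns a list of (answer, combined confidence, kind) candidates.
    """
    candidates = []
    for step in plan["steps"]:
        handler = sources.get(step["kind"])
        if handler is None:
            continue  # no source registered for this kind of query
        try:
            results = handler(step["query"])
        except Exception:
            continue  # a failed source degrades the result but does not stop the plan
        for answer, confidence in results:
            # Weight the source's confidence by the confidence of the plan step.
            candidates.append((answer, confidence * step["confidence"], step["kind"]))
    return candidates
```

Because the plan names only a kind of query, the same plan can run unchanged whether "text" is backed by a search engine or by a stub used in testing.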
High-level View of Solution Steps
1. Question is processed by linguistic tools.
2. Question is classified into 1 or more types.
3. Parallel solution plan is generated and executed.
4. Responses are gathered and examined.
5. If necessary, plan is revised and steps 3-5 revisited.
6. Candidate answers are checked for sanity, merged, sorted and presented.
Note:
a. Dialog manager functions are not considered here.
b. All data-structures are assigned confidences and all selections of next steps are mediated by probabilistic computations.
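The control flow of the six steps above, including the revise-and-retry loop, can be sketched with each stage passed in as a function. All parameter names are hypothetical, the stages themselves are left to the caller, and the `max_rounds` bound is an assumption added so that the loop always terminates.

```python
def answer_question(question, analyze, classify, make_plan, execute,
                    needs_revision, revise, resolve, max_rounds=3):
    """Drive one question through the six solution steps."""
    analysis = analyze(question)                  # step 1: linguistic processing
    question_types = classify(analysis)           # step 2: one or more types
    plan = make_plan(analysis, question_types)    # step 3: generate the plan
    responses = []
    for _ in range(max_rounds):
        responses = execute(plan)                 # steps 3-4: execute and gather
        if not needs_revision(responses):         # step 4: examine
            break
        plan = revise(plan, responses)            # step 5: revise, then retry
    return resolve(responses)                     # step 6: check, merge, sort
```

If the responses still need revision after `max_rounds` rounds, the last set of responses is resolved as it stands.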