QuASI: Question Answering using Statistics, Semantics, and Inference Marti Hearst, Jerry Feldman, Chris Manning, Srini Narayanan Univ. of California-Berkeley.

Presentation transcript:

QuASI: Question Answering using Statistics, Semantics, and Inference Marti Hearst, Jerry Feldman, Chris Manning, Srini Narayanan Univ. of California-Berkeley / ICSI / Stanford University

Outline Project Overview Three topics: Assigning semantic relations via lexical hierarchies From sentences to meanings via syntax From text analysis to inference using conceptual schemas

Main Goals Support Question-Answering and NLP in general by: Deepening our understanding of concepts that underlie all languages Creating empirical approaches to identifying semantic relations from free text Developing probabilistic inferencing algorithms

Two Main Thrusts Text-based: Use empirical corpus-based techniques to extract simple semantic relations Combine these relations to perform simple inferences (a “statistical semantic grammar”) Concept-based: Determine language-universal conceptual principles Determine how inferences are made among these

Relation Recognition Abbreviation Definition Recognition Semantic Relation Identification

UCB, Sept-Nov, 2002 Abbreviation Definition Recognition Developed and evaluated new algorithm Better results than existing approaches Simpler and faster as well Semantic Relation Identification Developed syntactic chunker Analyzed sample relations Began development of a new computational model Incorporates syntax and semantic labels Test example: identify “treatment for disease”

Abbreviation Examples “Heat-shock protein 40 (Hsp40) enables Hsp70 to play critical roles in a number of cellular processes, such as protein folding, assembly, degradation and translocation in vivo.” “Glutathione S-transferase pull-down experiments showed the direct interaction of in vitro translated p110, p64, and p58 of the essential CBF3 kinetochore protein complex with Cbf1p, a basic region helix-loop-helix zipper protein (bHLHzip) that specifically binds to the CDEI region on the centromere DNA.” “Hpa2 is a member of the Gcn5-related N-acetyltransferase (GNAT) superfamily, a family of enzymes with diverse substrates including histones, other proteins, arylalkylamines and aminoglycosides.”

Related Work Pustejovsky et al. present a solution based on hand-built regular expressions and syntactic information. Achieved 72% recall at 98% precision. Chang et al. use linear regression on a pre-selected set of features. Achieved 83% recall at 80% precision* and 75% recall at 95% precision. Park and Byrd present a rule-based algorithm for extraction of abbreviation definitions in general text. Yoshida et al. present an approach close to ours, trying to first match characters on word and syllable boundaries. * Counting partial matches and abbreviations missing from the “gold-standard”, their algorithm achieved 83% recall at 98% precision.

The Algorithm Much simpler than other approaches. Extracts abbreviation-definition candidates adjacent to parentheses. Finds correct definitions by matching characters in the abbreviation to characters in the definition, starting from the right. The first character in the abbreviation must match a character at the beginning of a word in the definition. To increase precision a few simple heuristics are applied to eliminate incorrect pairs. Example: Heat shock transcription factor (HSF). The algorithm finds the correct definition, but not the correct alignment: Heat shock transcription factor
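The right-to-left matching step described on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function name `find_best_long_form` is hypothetical, the candidate text is assumed to be the words immediately preceding the parenthesis, and the precision heuristics mentioned on the slide are omitted because the slide does not spell them out.

```python
from typing import Optional


def find_best_long_form(short_form: str, long_form: str) -> Optional[str]:
    """Return the shortest suffix of long_form that covers short_form, or None.

    Hypothetical helper: characters of the abbreviation are matched against
    the candidate definition from right to left, case-insensitively.
    """
    l_index = len(long_form) - 1
    for s_index in range(len(short_form) - 1, -1, -1):
        ch = short_form[s_index].lower()
        if not ch.isalnum():
            continue  # skip punctuation inside the abbreviation
        # Move left until the character matches. For the first character of
        # the abbreviation, also require a word-initial position.
        while l_index >= 0 and (
            long_form[l_index].lower() != ch
            or (s_index == 0 and l_index > 0 and long_form[l_index - 1].isalnum())
        ):
            l_index -= 1
        if l_index < 0:
            return None  # ran out of definition text before matching everything
        l_index -= 1
    # Extend back to the start of the word that holds the first matched character.
    start = long_form.rfind(" ", 0, l_index + 1) + 1
    return long_form[start:]


print(find_best_long_form("HSF", "Heat shock transcription factor"))
print(find_best_long_form("Hsp40", "Heat-shock protein 40"))
```

On the slide's example, the call with `"HSF"` returns the full phrase, although the `S` is matched inside “transcription” rather than at the start of “shock”, which shows how the definition can be right while the alignment is not.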

Results On the “gold-standard” the algorithm achieved 83% recall at 96% precision.* On a larger test collection the results were 82% recall at 95% precision. These results show that a very simple algorithm produces results comparable to those of the existing, more complex algorithms. * Counting partial matches and abbreviations missing from the “gold-standard”, our algorithm achieved 83% recall at 99% precision.

From sentences to meanings via syntax Factored A* Parsing Relational approaches to semantic relations Learning aspectual distinctions

Factored A* Parsing Goal: develop a lexicalized parser that is fast, accurate and exact [finds the model’s best parse] Technology exists to get any two, but not all three Approximate Parsing – Fast but Inexact Beam or “Best-First” Parsing [Charniak, Collins, etc.] Factored: represent tree and dependencies separately Simple, modular, extensible design Permits fast, high accuracy, exact inference A* estimates combined from product of experts model Available from: [Java, src]
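One way to read “factored” and “A* estimates combined from product of experts model” is the sketch below. The symbols are assumptions introduced here for illustration and do not appear on the slide: $T$ is a phrase-structure tree, $D$ a dependency structure over the same sentence, $e$ a partial parse item, $\beta(e)$ its inside log-score, and $a(e)$ an estimate of the best achievable outside log-score.

```latex
\begin{align}
  \log P(T, D) &= \log P_{\mathrm{syn}}(T) + \log P_{\mathrm{dep}}(D) \\
  \mathrm{priority}(e) &= \beta(e) + a(e), \qquad
  a(e) = a_{\mathrm{syn}}(e) + a_{\mathrm{dep}}(e)
\end{align}
```

If $a_{\mathrm{syn}}$ and $a_{\mathrm{dep}}$ are each an upper bound on the best log-score their own model can assign to any completion of $e$, then their sum is an upper bound for the combined model. Under the usual A* conditions, an agenda that always expands the highest-priority item then returns the model's best parse, which is the sense of “exact” on the slide.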

Factored A* Parsing [Figure: results for combinations of syntactic model (None, Basic, Best) and semantic model (None, Basic, Best); panels: Dependency Accuracy, Labeled Bracketing Accuracy (F1), Work Done]

Learning Semantic Relations FrameNet as starting point and training data Constraint Resolution for Entire Relations Logical Relations Probabilistic Models Combinations of the Two Bootstrap to new domains Building blocks for Q/A relevant tasks: Semantic Roles in Text Inference Improved Syntactic Parsing

Learning Aspect: The Perfect English perfect has experiential, relevant, and durative readings: have been to Bali vs. have just eaten lunch Disambiguation is necessary for text understanding: John has traveled to Malta [now, or in the past?] Siegel (2000) looked at inherent but not contextual aspect Current status: annotation underway for training statistical classifier Ref-time Event John has lived in Miami for ten years now. Event John has lived in Miami before.

Concept-based Analysis From text analysis to inference using conceptual schemas Relational Probabilistic Models Open Domain Conceptual Relations

Inference and Conceptual Schemas Hypothesis: Linguistic input is converted into a mental simulation based on bodily-grounded structures. Components: Semantic schemas: image schemas and executing schemas are abstractions over neurally grounded perceptual and motor representations. Linguistic units: lexical and phrasal construction representations invoke schemas, in part through metaphor. Inference links these structures and provides parameters for a simulation engine.

Conceptual Schemas Much is known about conceptual schemas, particularly image schemas However, this understanding has not yet been formalized We will develop such a formalism They have also not been checked extensively against other languages We will examine Chinese, Russian, and other languages in addition to English

Schema Formalism SCHEMA SUBCASE OF EVOKES AS ROLES : CONSTRAINTS :: :: |

A Simple Example SCHEMA hypotenuse SUBCASE OF line-segment EVOKES right-triangle AS rt ROLES Comment inherited from line-segment CONSTRAINTS SELF rt.long-side

Source-Path-Goal SCHEMA: spg ROLES: source: Place path: Directed Curve goal: Place trajector: Entity
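As a rough illustration of how such schema definitions might be held in code, the sketch below encodes the Source-Path-Goal schema and the skeleton of the hypotenuse example as plain Python data. The `Schema` class, its field names, and the `all_roles` helper are hypothetical, chosen only to mirror the SCHEMA, SUBCASE OF, EVOKES, and ROLES keywords; constraints are left out, and the project's actual formalism and tooling may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Schema:
    """Hypothetical container mirroring the slide's schema keywords."""
    name: str
    roles: Dict[str, str] = field(default_factory=dict)   # role name -> type restriction
    subcase_of: Optional[str] = None                      # parent schema name
    evokes: Dict[str, str] = field(default_factory=dict)  # local alias -> evoked schema name


def all_roles(schema: Schema, registry: Dict[str, Schema]) -> Dict[str, str]:
    """Collect roles along the SUBCASE OF chain; a subcase overrides its parent."""
    chain = []
    current: Optional[Schema] = schema
    while current is not None:
        chain.append(current)
        current = registry.get(current.subcase_of) if current.subcase_of else None
    roles: Dict[str, str] = {}
    for s in reversed(chain):  # most general schema first
        roles.update(s.roles)
    return roles


spg = Schema(
    name="spg",
    roles={
        "source": "Place",
        "path": "Directed Curve",
        "goal": "Place",
        "trajector": "Entity",
    },
)
# The slides do not list the roles of line-segment, so none are assumed here.
line_segment = Schema(name="line-segment")
hypotenuse = Schema(
    name="hypotenuse",
    subcase_of="line-segment",
    evokes={"rt": "right-triangle"},
)
registry = {s.name: s for s in (spg, line_segment, hypotenuse)}

print(all_roles(spg, registry))
print(hypotenuse.evokes)
```

Keeping `subcase_of` and `evokes` as names resolved through a registry, rather than as direct object references, lets schemas be declared in any order, which may matter if definitions are loaded from a markup language.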

Extending Inferential Capabilities Given the formalization of the conceptual schemas How to use them for inferencing? Earlier pilot systems Used metaphor and Bayesian belief networks Successfully construed certain inferences But don’t scale New approach Probabilistic relational models Support an open ontology

A Common Representation Representation should support Uncertainty, probability Conflicts, contradictions Current plan Probabilistic Relational Models (Koller et al.) DAML + OIL

Status of PRM for AQUAINT Fall 2002 Designed the basic PRM code-base/infrastructure Packages for BN’s, OOBN. Designed PRM inference Algorithm. Spring-Summer 2003 Implement the PRM inference Algorithm Design Dynamic Probabilistic Relational Models (DPRM) Implement DPRM to replace Pilot System DBN Test DPRM for QA Related Work Probabilistic OWL (PrOWL) Probabilistic FrameNet

An Open Ontology for Conceptual Relations Build a formal markup language for conceptual schemas We propose to use DAML+OIL/OWL as the base. Advantages of the approach Common framework for extension and reuse Closer ties to other efforts within AQUAINT as well as the larger research community on the Semantic Web. Some Issues Expressiveness of DAML+OIL Representing Probabilistic Information Extension to MetaNet, capture abstract concepts

Current Status Summer/Fall 2002 FrameNet-1 is available in DAML+OIL Image Schemas have been formalized and DAML+OIL representation designed Initial set of Metaphors and an SQL Metaphor database are in place. Spring 2003 Populate Metaphor Database Populate Image Schema Database Summer 2003 Test Inferencing with Image Schemas for QA.

Putting it all Together We have proposed two different types of semantics Universal conceptual schemas Semantic relations In Phase I they will remain separate However, we are exploring using PRMs as a common representational format In later Phases they will be combined