QuASI: Question Answering using Statistics, Semantics, and Inference


Marti Hearst, Jerry Feldman, Chris Manning, Srini Narayanan
Univ. of California-Berkeley / ICSI / Stanford Univ.

Outline
- Project Overview
- Motivating Example
- Research Approaches
  - Text-based Analysis
  - Concept-based Analysis
- Summary

Main Goals
Support Question-Answering and NLP in general by:
- Deepening our understanding of concepts that underlie all languages
- Creating empirical approaches to identifying semantic relations from free text
- Developing probabilistic inferencing algorithms

Two Main Thrusts
Text-based:
- Use empirical corpus-based techniques to extract simple semantic relations
- Combine these relations to perform simple inferences ("statistical semantic grammar")
Concept-based:
- Determine language-universal conceptual principles
- Determine how inferences are made among these

Principal Project Personnel
Text-based:
- Prof. Marti Hearst
- Prof. Chris Manning
Concept-based:
- Prof. Jerry Feldman
- Dr. Srini Narayanan

Target System: Overview
[Figure: system diagram. Labels recoverable from the slide: Training Corpora; Statistical Semantic Parser; Semantic Relations; Cognitive Linguistics; Universal Schemas; Probabilistic Inference Algorithms; Knowledge; Input; Answers; Other Applications; Phase Two.]

Motivating Example
Anthrax Scare Continues to Paralyze the Federal Government
The inhalation anthrax scare threatens to cripple several Federal Government activities. Legislation in Congress continues to remain at a standstill while the Senate is conducting business at a reduced pace. The Postal Service confirms that manual inspections for anthrax spores have reduced mail processing and delivery to a slow crawl.

Motivating Example: Labeling Conceptual Relations
Anthrax Scare [NN] Continues to [Aspect] Paralyze [Force Dynamic] the Federal Government [NN]
The inhalation anthrax scare [NN] threatens to cripple [Force Dynamic] several Federal Government activities [NN]. Legislation in Congress continues to [Aspect] remain at a standstill [ES Map] while the Senate is conducting [Aspect] business at a reduced pace [ES Map]. The Postal Service [NN] confirms that manual inspections [NN] for anthrax spores [NN] have reduced [Aspect, Scale] mail processing and delivery [NN] to a slow crawl [ES Map].

Text-based Analysis

Text-based Analysis
Main Tasks:
- Modify probabilistic parsing algorithms to
  - Take semantic relations into account
  - Better support ambiguity resolution
  - Support co-reference resolution
- Automate identification of semantic relations via machine learning by
  - Leveraging lexical ontologies
  - Building large training sets via bootstrapping techniques

Towards better statistical parsers: Head Corner Based Derivation Process
- Start with a known goal category, which is the start symbol of the grammar
- First find a head for that constituent, and then parse outward from that head
- This approach
  - Better exploits the headed structure of natural language
  - Lets us work outward from "islands of certainty"

Towards better statistical parsers: Head Corner Based Derivation Process (continued)
- Once we have a head, we decide
  - what kind of phrase it heads
  - what kinds of arguments the head is likely to have
- Then recursively apply this procedure to each argument
- This gives us a generative probabilistic model of sentence probabilities
- Crucially, we always have governing and less oblique heads available, thus supporting disambiguation
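The head-outward procedure above can be sketched as a product of local probabilities. This is a minimal illustration, not the project's parser: the tables P_HEAD and P_ARG, their numbers, and the Node class are hypothetical, and a real model would estimate its parameters from a treebank and condition on more context.

```python
from dataclasses import dataclass, field

# Hypothetical toy parameters (not from the slides).
# P(head word | phrase category, governing head); the root has no governor.
P_HEAD = {
    ("S", None, "paralyze"): 0.01,
    ("NP", "paralyze", "scare"): 0.2,
    ("NP", "paralyze", "government"): 0.3,
}
# P(an argument of this category fills this role | governing head)
P_ARG = {
    ("paralyze", "subj", "NP"): 0.8,
    ("paralyze", "obj", "NP"): 0.7,
}

@dataclass
class Node:
    category: str
    head: str
    args: dict = field(default_factory=dict)  # role name -> Node

def tree_probability(node, governor=None):
    """Generate the head first, then each argument outward from it, recursively."""
    p = P_HEAD[(node.category, governor, node.head)]
    for role, child in node.args.items():
        # The governing head is always available when an argument is generated.
        p *= P_ARG[(node.head, role, child.category)]
        p *= tree_probability(child, governor=node.head)
    return p

example = Node("S", "paralyze", {
    "subj": Node("NP", "scare"),
    "obj": Node("NP", "government"),
})
print(tree_probability(example))
```

Because every argument is scored against its governing head, competing analyses of an ambiguous sentence receive different probabilities, which is the sense in which the model supports disambiguation.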

Semantic Role Analysis
- Semantic roles provide a limited level of semantics that nevertheless allows reasoning across lexicalization patterns
- The goal is to explore bootstrapping knowledge of semantic roles from limited lexical resources

Semantic Role Labeling Example on NN Compounds
Compound: inhalation anthrax scare
- anthrax scare -> caused-by relation: caused-by(PublicConcern, InfectiousDisease)
- inhalation anthrax -> type-of relation, or more specifically, contracted-by relation: contracted-by(Disease, ExposureType) -> InfectiousDisease
- Combined: caused-by(PublicConcern, contracted-by(Disease, ExposureType))
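The composition in this example can be sketched as a recursive lookup over a bracketed compound. The sketch assumes a hand-written lexicon and relation table (LEXICON, RELATIONS, and analyze are hypothetical names); the slides propose learning such mappings from labeled data rather than listing them by hand.

```python
# Hypothetical semantic classes for the words in the example compound.
LEXICON = {
    "inhalation": "ExposureType",
    "anthrax": "Disease",
    "scare": "PublicConcern",
}

# (modifier class, head class) -> (relation, class of the resulting compound)
RELATIONS = {
    ("ExposureType", "Disease"): ("contracted-by", "InfectiousDisease"),
    ("InfectiousDisease", "PublicConcern"): ("caused-by", "PublicConcern"),
}

def analyze(compound):
    """Return (relation term, semantic class) for a word or a (modifier, head) pair."""
    if isinstance(compound, str):
        cls = LEXICON[compound]
        return cls, cls
    modifier, head = compound
    mod_term, mod_cls = analyze(modifier)
    head_term, head_cls = analyze(head)
    relation, result_cls = RELATIONS[(mod_cls, head_cls)]
    # Follow the slide's argument order: relation(head, modifier).
    return f"{relation}({head_term}, {mod_term})", result_cls

# Bracketing assumed for illustration: [[inhalation anthrax] scare]
term, cls = analyze((("inhalation", "anthrax"), "scare"))
print(term)
```

The inner compound is resolved first, and its resulting class (InfectiousDisease) is what licenses the outer caused-by relation.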

Semantic Role Labeling Example on NN Compounds (continued)
- Approach: Train a model based on labeled data and a lexical hierarchy
- Preliminary results: ~60% accuracy on an 18-way classification with a small training set
- Next step: Create a larger training set via bootstrapping
  - Find lexico-syntactic patterns that unambiguously indicate the relation of interest
  - Use these to label new instances
  - Use these + a lexical ontology to create a probability model of which subtrees, when combined, yield which relations
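The first two bootstrapping steps can be sketched with a single surface pattern. Everything here is illustrative: the pattern "X caused by Y", the two-sentence corpus, and the function name harvest are assumptions, and a real system would need many patterns plus a check that each one is in fact unambiguous.

```python
import re

# Hypothetical pattern assumed to signal the caused-by relation unambiguously.
PATTERNS = {
    "caused-by": re.compile(r"\b(\w+) caused by (\w+)\b"),
}

def harvest(corpus):
    """Label (modifier, head) noun pairs with the relation their pattern indicates."""
    labeled = {}
    for sentence in corpus:
        for relation, pattern in PATTERNS.items():
            for head, modifier in pattern.findall(sentence.lower()):
                # "scare caused by anthrax" labels the compound "anthrax scare".
                labeled[(modifier, head)] = relation
    return labeled

corpus = [
    "The scare caused by anthrax closed several offices.",
    "A fever caused by malaria can be dangerous.",
]
training_pairs = harvest(corpus)
print(training_pairs)
```

The harvested pairs serve as new labeled training instances for the compound classifier, without manual annotation.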

Concept-based Analysis

Inference and Conceptual Schemas
Hypothesis: Linguistic input is converted into a mental simulation based on bodily-grounded structures.
Components:
- Semantic schemas: image schemas and executing schemas are abstractions over neurally grounded perceptual and motor representations
- Linguistic units: lexical and phrasal construction representations invoke schemas
- Inference: links these structures and provides parameters for a simulation engine

Concept-based Analysis
Main Tasks:
- Formalize Image Schemas
- Identify Cross-lingual Conceptual Schemas
- Apply Probabilistic Relational Models to Inferencing over Conceptual Schemas

Conceptual Schemas
- Much is known about conceptual schemas, particularly image schemas
- However, this understanding has not yet been formalized
  - We will develop such a formalism
- They have also not been checked extensively against other languages
  - We will examine Chinese, Russian, and German, in addition to English

Extending Inferential Capabilities
- Given the formalization of the conceptual schemas, how can they be used for inferencing?
- Earlier pilot systems
  - Used Bayesian belief networks
  - Successfully construed certain inferences
  - But do not scale
- New approach
  - Probabilistic relational models
  - Support an open ontology

A Common Representation
- Representation should support
  - Uncertainty, probability
  - Conflicts, contradictions
- Current plan
  - Probabilistic Relational Models (Koller et al.)
  - DAML+OIL

An Open Ontology for Conceptual Relations
- Build a formal markup language for conceptual schemas
  - We propose to use DAML+OIL as the base
- Advantages of the approach
  - Common framework for extension and reuse
  - Closer ties to other efforts within AQUAINT as well as the larger research community on the Semantic Web
- Some issues
  - Expressiveness of DAML+OIL
  - Representing probabilistic information

DAML-I: An Image Schema Markup Language
  <daml:Class rdf:ID="SPG">
    <rdfs:comment>A basic type of schema</rdfs:comment>
    <rdfs:subClassOf rdf:resource="#Schema"/>
  </daml:Class>
  <daml:objectProperty rdf:ID="source">
    <daml:subPropertyOf rdf:resource="&conc-rel;#role"/>
    <daml:domain rdf:resource="#SPG"/>
    <daml:range rdf:resource="&daml;#Thing"/>
  </daml:objectProperty>

Putting it all Together
- We have proposed two different types of semantics
  - Universal conceptual schemas
  - Semantic relations
- In Phase I they will remain separate
  - However, we are exploring using PRMs as a common representational format
- In later phases they will be combined

Summary
Goal: Deep Semantic Interpretation of Text
- Build a foundation for deep, yet robust and scalable, semantic analysis of human language, with applications to question answering from huge text collections.
- Use semantic schemas, probabilistic language processing and knowledge representation, machine learning, and bootstrapping.