QuASI: Question Answering using Statistics, Semantics, and Inference Marti Hearst, Jerry Feldman, Chris Manning, Srini Narayanan Univ. of California-Berkeley.

QuASI: Question Answering using Statistics, Semantics, and Inference
Marti Hearst, Jerry Feldman, Chris Manning, Srini Narayanan
Univ. of California-Berkeley / ICSI / Stanford Univ.

Outline
- Project Overview
- Motivating Example
- Research Approaches: Text-based Analysis; Concept-based Analysis
- Summary

Main Goals
Support question answering, and NLP in general, by:
- deepening our understanding of the concepts that underlie all languages;
- creating empirical approaches to identifying semantic relations in free text;
- developing probabilistic inferencing algorithms.

Two Main Thrusts
Text-based: use empirical, corpus-based techniques to extract simple semantic relations, then combine these relations to perform simple inferences (a “statistical semantic grammar”).
Concept-based: determine language-universal conceptual principles, and determine how inferences are made among them.

Principal Project Personnel
Text-based: Prof. Marti Hearst, Prof. Chris Manning
Concept-based: Prof. Jerry Feldman, Dr. Srini Narayanan

Target System: Overview
[Architecture diagram. Components shown: Training Corpora, Cognitive Linguistics, Semantic Relations, Universal Schemas, Statistical Semantic Parser, Probabilistic Knowledge, Inference Algorithms, Input, Answers, Other Applications, Phase Two.]

Motivating Example
Anthrax Scare Continues to Paralyze the Federal Government
The inhalation anthrax scare threatens to cripple several Federal Government activities. Legislation in Congress continues to remain at a standstill while the Senate is conducting business at a reduced pace. The Postal service confirms that manual inspections for anthrax spores have reduced mail processing and delivery to a slow crawl.

Motivating Example: Labeling Conceptual Relations
Anthrax Scare[NN] Continues to[Aspect] Paralyze[Force Dynamic] the Federal Government[NN]
The inhalation anthrax scare[NN] threatens to cripple[Force Dynamic] several Federal Government activities[NN]. Legislation in Congress continues to[Aspect] remain at a standstill[ES Map] while the Senate is conducting[Aspect] business at a reduced pace[ES Map]. The Postal service[NN] confirms that manual inspections[NN] for anthrax spores[NN] have reduced[Aspect, Scale] mail processing and delivery[NN] to a slow crawl[ES Map].

Text-based Analysis

Main Tasks
Modify probabilistic parsing algorithms to:
- take semantic relations into account;
- better support ambiguity resolution;
- support co-reference resolution.
Automate the identification of semantic relations via machine learning by:
- leveraging lexical ontologies;
- building large training sets via bootstrapping techniques.

Towards Better Statistical Parsers: Head-Corner-Based Derivation Process
Start with a known goal category, which is the start symbol of the grammar. First find a head for that constituent, and then parse outward from that head. This approach:
- better exploits the headed structure of natural language;
- lets us work outward from “islands of certainty”.
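The derivation order above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's parser. `HEAD_RULES` is a hypothetical, much-reduced stand-in for the head-percolation tables used by real lexicalized parsers. Trees are plain `(label, children)` tuples with `(tag, word)` leaves. Starting from the start symbol `S`, each constituent visits its head child first and then its dependents outward, first on the left side and then on the right.

```python
# Hypothetical, heavily simplified head-percolation rules:
# parent category -> (search direction, child categories in priority order).
HEAD_RULES = {
    "S": ("left", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN"]),
}


def find_head(parent, child_labels):
    """Return the index of the head child of a `parent` constituent."""
    direction, priorities = HEAD_RULES.get(parent, ("left", []))
    positions = list(range(len(child_labels)))
    if direction == "right":
        positions.reverse()
    for category in priorities:
        for i in positions:
            if child_labels[i] == category:
                return i
    return positions[0]  # fallback: first child in the search direction


def head_outward_order(num_children, head):
    """Head first, then left dependents moving outward, then right dependents moving outward."""
    left = list(range(head - 1, -1, -1))
    right = list(range(head + 1, num_children))
    return [head] + left + right


def head_first_traversal(tree):
    """Start at the goal category and return words in head-outward visiting order.

    A tree is (label, [subtrees]); a leaf is (tag, word).
    """
    label, rest = tree
    if isinstance(rest, str):
        return [rest]
    head = find_head(label, [child[0] for child in rest])
    words = []
    for i in head_outward_order(len(rest), head):
        words += head_first_traversal(rest[i])
    return words


sentence = ("S", [
    ("NP", [("NN", "Legislation")]),
    ("VP", [("VBZ", "remains"),
            ("PP", [("IN", "at"), ("NP", [("DT", "a"), ("NN", "standstill")])])]),
])
print(head_first_traversal(sentence))
# ['remains', 'at', 'standstill', 'a', 'Legislation']
```

On this sentence the head words "remains", "at" and "standstill" are reached before their dependents. That is the sense in which the parse grows outward from islands of certainty.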

Towards Better Statistical Parsers: Head-Corner-Based Derivation Process
Once we have a head, we decide what kind of phrase it heads and what kinds of arguments the head is likely to have. We then recursively apply this procedure to each argument. This gives us a generative probabilistic model of sentence probabilities. Crucially, the governing head and any less oblique heads are always already available, which supports disambiguation.
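Here is a toy version of such a generative model. Each node first generates its head child. It then generates dependents outward on each side until a STOP symbol, and each choice is conditioned on the parent and the governing head. This is similar in spirit to Collins-style head-driven models but is unlexicalized for brevity. The tree encoding and every number in `P_HEAD`, `P_DEP` and `P_WORD` are made-up illustrations: they are not estimated from any corpus and are not normalized distributions.

```python
STOP = "STOP"

# Made-up illustrative parameters (not estimated from data, not normalized).
# P_HEAD[(parent, head_child)]: probability that `parent` is headed by `head_child`.
P_HEAD = {("S", "VP"): 0.9, ("NP", "NN"): 0.7, ("VP", "VBD"): 0.8}
# P_DEP[(parent, head_child, side, dependent)]: dependents are generated outward
# from the head, one at a time, until STOP; each is conditioned on the governing head.
P_DEP = {
    ("S", "VP", "L", "NP"): 0.6, ("S", "VP", "L", STOP): 0.9, ("S", "VP", "R", STOP): 0.95,
    ("NP", "NN", "L", STOP): 0.5, ("NP", "NN", "R", STOP): 0.8,
    ("VP", "VBD", "L", STOP): 1.0, ("VP", "VBD", "R", STOP): 0.4,
}
# P_WORD[(tag, word)]: lexical generation at the preterminals.
P_WORD = {("NN", "anthrax"): 0.01, ("VBD", "spread"): 0.02}


def tree_prob(node):
    """Probability of a tree: (label, head_index, children) or a preterminal (tag, word)."""
    if isinstance(node[1], str):
        return P_WORD.get(node, 0.0)
    label, head_index, children = node
    head_label = children[head_index][0]
    p = P_HEAD.get((label, head_label), 0.0)
    left = [child[0] for child in reversed(children[:head_index])]  # nearest first
    right = [child[0] for child in children[head_index + 1:]]
    for side, deps in (("L", left), ("R", right)):
        for dep in deps + [STOP]:
            p *= P_DEP.get((label, head_label, side, dep), 0.0)
    for child in children:  # recurse: each child repeats the head-first procedure
        p *= tree_prob(child)
    return p


tree = ("S", 1, [("NP", 0, [("NN", "anthrax")]), ("VP", 0, [("VBD", "spread")])])
print(tree_prob(tree))
```

Any configuration missing from the tables gets probability zero. A real model would smooth these estimates and also condition on head words.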

Semantic Role Analysis
Semantic roles provide a limited level of semantics that nevertheless allows reasoning across lexicalization patterns. The goal is to explore bootstrapping knowledge of semantic roles from limited lexical resources.
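A small example makes "reasoning across lexicalization patterns" concrete. *Buy* and *sell* describe the same transaction with their arguments in different syntactic positions. Mapping both verbs onto shared roles lets a question phrased with one verb match text phrased with the other. The linking table below is hand-written and hypothetical; it is not learned from any resource.

```python
# Hypothetical linking rules for two verbs that lexicalize the same
# commercial-transaction event differently: syntactic slot -> semantic role.
LINKING = {
    "sell": {"subj": "Seller", "obj": "Goods", "pp_to": "Buyer"},
    "buy": {"subj": "Buyer", "obj": "Goods", "pp_from": "Seller"},
}


def to_roles(verb, **slots):
    """Map a verb's syntactic arguments onto shared, verb-independent roles."""
    linking = LINKING[verb]
    return {linking[slot]: filler for slot, filler in slots.items()}


sold = to_roles("sell", subj="Acme", obj="vaccine", pp_to="the clinic")
bought = to_roles("buy", subj="the clinic", obj="vaccine", pp_from="Acme")
print(sold == bought)  # True
```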

Semantic Role Labeling Example on NN Compounds: “inhalation anthrax scare”
- anthrax scare -> caused-by relation: caused-by(PublicConcern, InfectiousDisease)
- inhalation anthrax -> type-of relation, or more specifically a contracted-by relation: contracted-by(Disease, ExposureType) -> InfectiousDisease
- inhalation anthrax scare -> caused-by(PublicConcern, contracted-by(Disease, ExposureType))
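The composition in this example can be sketched as a recursive interpretation over a bracketed compound. `LEXICON` and `RULES` are hypothetical stand-ins for a lexical hierarchy and a learned relation model. The bracketing [[inhalation anthrax] scare] is supplied as input rather than predicted.

```python
# Hypothetical mini lexical hierarchy: word -> semantic category.
LEXICON = {
    "inhalation": "ExposureType",
    "anthrax": "Disease",
    "scare": "PublicConcern",
}

# Hypothetical relation rules:
# (modifier category, head category) -> (relation, category of the whole compound).
RULES = {
    ("ExposureType", "Disease"): ("contracted-by", "InfectiousDisease"),
    ("InfectiousDisease", "PublicConcern"): ("caused-by", "PublicConcern"),
}


def interpret(compound):
    """Interpret a bracketed noun compound.

    `compound` is either a single word or a (modifier, head) pair whose parts are
    themselves compounds. Returns (relation term, semantic category).
    """
    if isinstance(compound, str):
        category = LEXICON[compound]
        return category, category
    mod_term, mod_cat = interpret(compound[0])
    head_term, head_cat = interpret(compound[1])
    relation, result_cat = RULES[(mod_cat, head_cat)]
    # Head first, then modifier, as in caused-by(PublicConcern, InfectiousDisease).
    return f"{relation}({head_term}, {mod_term})", result_cat


term, category = interpret((("inhalation", "anthrax"), "scare"))
print(term)  # caused-by(PublicConcern, contracted-by(Disease, ExposureType))
```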

Semantic Role Labeling Example on NN Compounds
Approach: train a model based on labeled data and a lexical hierarchy.
Preliminary results: ~60% accuracy on an 18-way classification with a small training set.
Next step: create a larger training set via bootstrapping:
- find lexico-syntactic patterns that unambiguously indicate the relation of interest;
- use these to label new instances;
- use these, plus a lexical ontology, to create a probability model of which subtrees, when combined, yield which relations.
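The bootstrapping loop can be sketched as follows. The sketch assumes hypothetical seed patterns of the form "X caused by Y" and a toy word-to-category table in place of a real lexical ontology. Pattern matches label (head, modifier) pairs. Counting relations per category pair then generalizes to word pairs never seen together. Real patterns would operate on parses rather than on regular expressions over raw words.

```python
import re
from collections import Counter, defaultdict

# Hypothetical seed patterns, assumed to signal a relation unambiguously:
# group 1 is the head noun, group 2 the modifier.
PATTERNS = {
    "caused-by": re.compile(r"\b(\w+) caused by (\w+)\b"),
    "contracted-by": re.compile(r"\b(\w+) contracted by (\w+)\b"),
}
# Hypothetical lexical ontology, collapsed to word -> category.
CATEGORY = {
    "scare": "PublicConcern", "panic": "PublicConcern",
    "anthrax": "Disease", "smallpox": "Disease",
    "inhalation": "ExposureType", "contact": "ExposureType",
}


def harvest(sentences):
    """Label (head, modifier) pairs wherever a seed pattern matches."""
    labeled = []
    for sentence in sentences:
        for relation, pattern in PATTERNS.items():
            for head, modifier in pattern.findall(sentence.lower()):
                labeled.append((head, modifier, relation))
    return labeled


def category_model(labeled):
    """Estimate P(relation | head category, modifier category) by relative frequency."""
    counts = defaultdict(Counter)
    for head, modifier, relation in labeled:
        counts[(CATEGORY.get(head, "?"), CATEGORY.get(modifier, "?"))][relation] += 1
    model = {}
    for key, rel_counts in counts.items():
        total = sum(rel_counts.values())
        model[key] = {rel: n / total for rel, n in rel_counts.items()}
    return model


corpus = [
    "The panic caused by smallpox spread quickly.",
    "A scare caused by anthrax closed the building.",
    "Anthrax contracted by inhalation was confirmed.",
]
labeled = harvest(corpus)
model = category_model(labeled)
print(model[(CATEGORY["panic"], CATEGORY["anthrax"])])  # {'caused-by': 1.0}
```

"Panic caused by anthrax" never occurs in the toy corpus. The category-level model still labels that pair caused-by, which is the kind of generalization a lexical ontology is meant to provide.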

Concept-based Analysis

Inference and Conceptual Schemas
Hypothesis: Linguistic input is converted into a mental simulation based on bodily grounded structures.
Components:
- Semantic schemas: image schemas and executing schemas are abstractions over neurally grounded perceptual and motor representations.
- Linguistic units: lexical and phrasal construction representations invoke schemas.
- Inference links these structures and provides parameters for a simulation engine.
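Narayanan formalized executing schemas (x-schemas) as extended Petri nets. A toy simulation engine can therefore be a Petri net whose marking encodes the state of a process. The schema, its place names and the `aspect` reading below are hypothetical illustrations, not the project's x-schema model. They show how an aspectual cue such as "continues to" could set a simulation parameter.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Minimal place/transition net: a marking plus named transitions."""
    marking: dict  # place -> number of tokens
    transitions: dict = field(default_factory=dict)  # name -> (input places, output places)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1


def make_process_schema():
    """Hypothetical generic process schema: ready -> ongoing (may iterate) -> done."""
    return PetriNet(
        marking={"ready": 1},
        transitions={
            "start": (["ready"], ["ongoing"]),
            "iterate": (["ongoing"], ["ongoing"]),
            "finish": (["ongoing"], ["done"]),
        },
    )


def aspect(net):
    """Read an aspectual state off the current marking."""
    if net.marking.get("done", 0):
        return "completed"
    if net.marking.get("ongoing", 0):
        return "ongoing"
    return "not started"


# "Legislation ... continues to remain at a standstill": started, not yet finished.
net = make_process_schema()
net.fire("start")
net.fire("iterate")
print(aspect(net))  # ongoing
```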

Concept-based Analysis: Main Tasks
- Formalize image schemas.
- Identify cross-lingual conceptual schemas.
- Apply probabilistic relational models to inferencing over conceptual schemas.

Conceptual Schemas
Much is known about conceptual schemas, particularly image schemas. However, this understanding has not yet been formalized; we will develop such a formalism. The schemas have also not been checked extensively against other languages; we will examine Chinese, Russian, and German in addition to English.
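One small piece of such a formalism concerns the Container image schema. That schema is standardly associated with a transitivity inference: if X is in Y and Y is in Z, then X is in Z. The sketch encodes containment facts as pairs and computes that closure. The facts about spores and mail are invented for illustration.

```python
def containment_closure(facts):
    """Container-schema inference: in(x, y) and in(y, z) entail in(x, z).

    `facts` is an iterable of (trajector, container) pairs; returns the
    transitive closure as a set.
    """
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


facts = {("spores", "envelope"), ("envelope", "mailbag"), ("mailbag", "sorting facility")}
closure = containment_closure(facts)
print(("spores", "sorting facility") in closure)  # True
```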

Extending Inferential Capabilities
Given the formalization of the conceptual schemas, how can we use them for inferencing?
Earlier pilot systems used Bayesian belief networks. They successfully construed certain inferences, but they do not scale.
New approach: probabilistic relational models, which support an open ontology.
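Here is a minimal sketch of the PRM idea, under simplifying assumptions. A dependency is stated once at the class level: an `Agency`'s activity depends on the severity of the `Threat`s that affect it, aggregated with `max`. That dependency is reused for every object in the relational skeleton, so adding objects adds no parameters. The classes, the CPD values, the max aggregator and the example objects are all hypothetical. Because every parent here is observed, inference reduces to a table lookup; a real PRM grounds to a Bayesian network over unobserved attributes.

```python
from dataclasses import dataclass, field

# Hypothetical class-level CPD, shared by every Agency object:
# P(Agency.activity = "reduced" | worst severity among threats affecting the agency).
P_REDUCED = {"low": 0.1, "medium": 0.4, "high": 0.8}
SEVERITY_ORDER = ["low", "medium", "high"]


@dataclass
class Threat:
    name: str
    severity: str  # observed attribute: "low", "medium" or "high"


@dataclass
class Agency:
    name: str
    affected_by: list = field(default_factory=list)  # relational slot: Agency -> Threat objects


def p_reduced(agency):
    """Apply the shared CPD through the slot chain affected_by.severity, aggregated with max."""
    if not agency.affected_by:
        return P_REDUCED["low"]  # assumption: no known threat behaves like a low-severity one
    worst = max((t.severity for t in agency.affected_by), key=SEVERITY_ORDER.index)
    return P_REDUCED[worst]


anthrax = Threat("inhalation anthrax", "high")
flu = Threat("seasonal flu", "low")
agencies = [
    Agency("Postal Service", [anthrax, flu]),
    Agency("Senate", [anthrax]),
    Agency("Park Service", [flu]),
]
for agency in agencies:
    print(agency.name, p_reduced(agency))
```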

A Common Representation
The representation should support uncertainty and probability, as well as conflicts and contradictions.
Current plan: Probabilistic Relational Models (Koller et al.) and DAML+OIL.

An Open Ontology for Conceptual Relations
Build a formal markup language for conceptual schemas; we propose to use DAML+OIL as the base.
Advantages of the approach:
- a common framework for extension and reuse;
- closer ties to other efforts within AQUAINT, as well as to the larger research community on the Semantic Web.
Some issues:
- the expressiveness of DAML+OIL;
- representing probabilistic information.

DAML-I: An Image Schema Markup Language
A basic type of schema (excerpt):
<daml:subPropertyOf rdf:resource="&conc-rel;#role"/>
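The fragment above declares a role as a specialization of a generic `role` property in the `conc-rel` namespace. The sketch below generates a declaration of that shape with Python's standard `xml.etree.ElementTree`. The schema and role names (`SPG`, `source`) are hypothetical. `http://example.org/conc-rel` is a placeholder for the `&conc-rel;` entity, whose actual value the slide does not give; only the DAML+OIL and RDF namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DAML = "http://www.daml.org/2001/03/daml+oil#"
CONC_REL = "http://example.org/conc-rel"  # placeholder for the slide's &conc-rel; entity

ET.register_namespace("rdf", RDF)
ET.register_namespace("daml", DAML)


def role_property(schema, role):
    """Declare `schema.role` as a DAML+OIL object property that specializes conc-rel#role."""
    prop = ET.Element(f"{{{DAML}}}ObjectProperty", {f"{{{RDF}}}ID": f"{schema}.{role}"})
    ET.SubElement(prop, f"{{{DAML}}}subPropertyOf", {f"{{{RDF}}}resource": f"{CONC_REL}#role"})
    return prop


# Hypothetical example: the "source" role of a Source-Path-Goal (SPG) image schema.
markup = ET.tostring(role_property("SPG", "source"), encoding="unicode")
print(markup)
```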

Putting It All Together
We have proposed two different types of semantics: universal conceptual schemas and semantic relations. In Phase I they will remain separate, although we are exploring PRMs as a common representational format. In later phases they will be combined.

Summary
Goal: deep semantic interpretation of text. Build a foundation for deep, yet robust and scalable, semantic analysis of human language, with applications to question answering over huge text collections.
Use semantic schemas, probabilistic language processing and knowledge representation, machine learning, and bootstrapping.