CILC2011 A framework for structured knowledge extraction and representation from natural language via deep sentence analysis Stefania Costantini Niva Florio.

Presentation transcript:

CILC2011 A framework for structured knowledge extraction and representation from natural language via deep sentence analysis Stefania Costantini Niva Florio Alessio Paolucci Università Degli Studi Dell’Aquila

Outline
1. Motivation
2. Our Proposal
3. Workflow
4. Deep Analysis: Parsing & Dependency Structure
5. Context Disambiguation
6. Resolution
7. OOLOT
8. RDF/OWL Exporting
9. Example
10. Conclusion

Motivation
“overcome the knowledge acquisition bottleneck”

Structured data from plain text
The most interesting application: ontology population (Semantic Web)
…but endless possibilities!

Our Proposal
Our framework allows us to:
- Extract knowledge from natural language sentences using a deep analysis technique based on linguistic dependencies and phrase syntactic structure.
- Use OOLOT (Ontology Oriented Language of Thought), an intermediate language based on ASP (Answer Set Programming), specifically designed to represent the distinctive features of knowledge extracted from natural language.
- Easily integrate our framework into the Semantic Web.
OOLOT lets us exploit non-monotonic reasoning (through ASP) to deal with common sense reasoning and other typical aspects of the knowledge encoded in natural language.

Workflow

Parsing
Syntactic Parsing:
- It determines the syntactic structure of a sentence
- Chomsky’s constituent analysis
- It builds up the elements in their hierarchical order
- Syntactic parsers decompose a text into tokens and assign each token its grammatical function
Statistical Parsing:
- It is based on a corpus of annotated training data
- It gathers information about the frequency with which elements are used in specific contexts
- Statistics alone may not be enough to determine when to split a symbol into sub-symbols
Probabilistic Context Free Grammar (PCFG):
- More than one production rule may apply to a sequence of words, resulting in a conflict
- It uses the frequency of the various productions to order them
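The PCFG idea of ordering competing productions by frequency can be sketched as follows. This is a minimal illustration, not the Stanford Parser's implementation: the rule observations are a hand-made toy list (an assumption, not taken from any treebank), and the function names are hypothetical. Probabilities are estimated by relative frequency, P(lhs -> rhs) = count(lhs -> rhs) / count(lhs).

```python
from collections import Counter, defaultdict

# Toy, hand-made rule observations (hypothetical; not from a real corpus).
observed_rules = [
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N")),
    ("NP", ("N",)),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP")),
]

def estimate_rule_probabilities(rules):
    """Relative-frequency estimate of each production's probability."""
    rule_counts = Counter(rules)
    lhs_counts = Counter(lhs for lhs, _ in rules)
    probs = defaultdict(dict)
    for (lhs, rhs), n in rule_counts.items():
        probs[lhs][rhs] = n / lhs_counts[lhs]
    return probs

def rank_productions(probs, lhs):
    """Order the competing productions for one symbol, most probable first."""
    return sorted(probs[lhs].items(), key=lambda kv: kv[1], reverse=True)

probs = estimate_rule_probabilities(observed_rules)
ranked_np = rank_productions(probs, "NP")
ranked_vp = rank_productions(probs, "VP")
```

When two rules could expand the same symbol, a parser built on such estimates would prefer the higher-ranked one.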

Parsing Stanford Parser: PCFG parser

Parsing
Statistical parsing is useful for solving problems such as ambiguity and efficiency, BUT we lose part of the semantic information.
Dependency Grammar: words in a sentence are connected by means of binary, asymmetrical governor-dependent relationships.
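A binary, asymmetrical governor-dependent relation can be held in a very small data structure. The sketch below is an assumption-laden illustration: the `Dependency` class and `dependents_of` helper are hypothetical names, and the analysis of the sentence is written by hand with labels in the style of Stanford typed dependencies, not produced by a parser.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dependency:
    relation: str   # label of the grammatical relation
    governor: str   # the head word
    dependent: str  # the word that depends on the head

# Hand-written analysis of "Many girls eat apples" (illustrative labels).
deps = [
    Dependency("nsubj", "eat", "girls"),
    Dependency("dobj", "eat", "apples"),
    Dependency("amod", "girls", "many"),
]

def dependents_of(word, dependencies):
    """Return (relation, dependent) pairs governed by the given word."""
    return [(d.relation, d.dependent) for d in dependencies if d.governor == word]

eat_dependents = dependents_of("eat", deps)
```

The relation is asymmetrical: "eat" governs "girls", but "girls" does not govern "eat".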

Context Disambiguation
Given a (finite) set of contexts, assign each lexical item to one (or more) context(s), each with a score.
[Diagram: a lexical item linked to Context_1, Context_2, …, Context_m]
We use a simple, frequency-based disambiguation algorithm.
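The slides do not spell out the algorithm, so the following is only one plausible reading of "simple, frequency-based": score each context by the share of the item's occurrences that fall in it. The context names, the counts, and the `disambiguate` function are all hypothetical.

```python
from collections import Counter

# Hypothetical per-context word counts (made up for illustration).
context_counts = {
    "automotive": Counter({"car": 8, "engine": 5, "jaguar": 2}),
    "zoology": Counter({"jaguar": 6, "cat": 4}),
    "finance": Counter({"stock": 7, "car": 1}),
}

def disambiguate(item, counts_by_context, threshold=0.0):
    """Return (context, score) pairs, best first.

    The score is the fraction of the item's occurrences seen in that context;
    contexts scoring at or below the threshold are dropped.
    """
    occurrences = {c: counts[item] for c, counts in counts_by_context.items()}
    total = sum(occurrences.values())
    if total == 0:
        return []  # item never seen: no context can be assigned
    scored = [(c, n / total) for c, n in occurrences.items() if n / total > threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

jaguar_contexts = disambiguate("jaguar", context_counts)
```

Returning every context above the threshold, instead of only the best one, matches the "one (or more) context(s)" wording.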

Resolution
Each lexical item (a word or a set of words), for example "Car", is resolved against popular ontologies, including DBPedia, YAGO, GeoNames, WordNet 3 OWL, …

OOLOT
The language of thought is an intermediate format mainly inspired by Kowalski’s LoT. It was introduced to represent the extracted knowledge in a way that is totally independent of the original lexical items and, therefore, of the original language. Our LOT is itself a language, but its lexicon is ontology oriented, so we adopted the acronym OOLOT (Ontology Oriented Language Of Thought). OOLOT is used to represent the knowledge extracted from natural language sentences, so the building blocks of OOLOT (its lexicon) are ontological identifiers related to concepts in the ontology; they are not a lexical-level translation.

OOLOT: Lambda-based translation Example: “Many girls eat apples”

OOLOT: Lambda-based translation Example: “Many girls eat apples”

OOLOT: Lambda-based translation

And, finally, after applying apple to the previous partial expression, we have:
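The expressions shown on these slides are not preserved in this transcript, so the block below is not the authors' derivation and does not use OOLOT's actual terms. It is only a generic sketch of lambda-based composition for the same sentence, with word meanings modelled as Python functions and strings standing in for logical forms. The predicate names, the `many` quantifier form, and the existential reading of the bare plural "apples" are all assumptions.

```python
# Word meanings as lambda terms; returned strings stand in for logical forms.
girl = lambda x: f"girl({x})"
apple = lambda y: f"apple({y})"
eat = lambda y: lambda x: f"eat({x}, {y})"  # takes the object first, then the subject

# Generalized quantifier: takes a restrictor, then a scope (both one-place predicates).
many = lambda restrictor: lambda scope: f"many(X, {restrictor('X')}, {scope('X')})"

# Step 1: apply the quantifier to the noun, leaving a term that awaits a verb phrase.
many_girls = many(girl)

# Step 2: build the verb phrase; the bare plural is read existentially (an assumption).
eat_apples = lambda x: f"exists(Y, {apple('Y')}, {eat('Y')(x)})"

# Step 3: apply the partial expression to the verb phrase.
logical_form = many_girls(eat_apples)
```

Each application consumes one argument and returns either another function (a partial expression) or the finished form, which is the incremental pattern the slides describe.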

RDF/OWL Exporting
Since OOLOT is designed to have a representation very close to RDF, it is possible to export to RDF/OWL. In many cases, when the semantics can be preserved, there is a 1:1 mapping; otherwise, when the original semantics cannot be preserved, we are starting to use RDF/OWL syntactic approximations through reification.
OOLOT: predicate(subject, object)
RDF: Best case:
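The RDF shown on the slide is not preserved in this transcript. As a rough sketch of the two cases described, the code below maps predicate(subject, object) to a single N-Triples line and, as a fallback, to a reified statement using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The `http://example.org/` namespace, the function names, and the `basedIn` term are placeholders chosen for illustration, not the framework's actual identifiers.

```python
BASE = "http://example.org/"  # placeholder namespace, not from the slides
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def to_triple(predicate, subject, obj):
    """1:1 case: predicate(subject, object) becomes one N-Triples statement."""
    return f"<{BASE}{subject}> <{BASE}{predicate}> <{BASE}{obj}> ."

def to_reified(statement_id, predicate, subject, obj):
    """Fallback: describe the statement itself with the RDF reification vocabulary."""
    stmt = f"<{BASE}{statement_id}>"
    return [
        f"{stmt} <{RDF}type> <{RDF}Statement> .",
        f"{stmt} <{RDF}subject> <{BASE}{subject}> .",
        f"{stmt} <{RDF}predicate> <{BASE}{predicate}> .",
        f"{stmt} <{RDF}object> <{BASE}{obj}> .",
    ]

triple = to_triple("basedIn", "Ferrari", "Maranello")
reified = to_reified("stmt1", "basedIn", "Ferrari", "Maranello")
```

A reified statement gets its own identifier, so further triples can be attached to it when a plain triple cannot carry the original meaning.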

Framework In Action “Ferrari is an Italian sports car manufacturer based in Maranello.”

Framework in Action

Conclusion & Future Work
OOLOT; Deep Analysis; RDF Exporting; Dependency Parsing; ASP
Further exploit: OOLOT language, ASP to RDF/OWL Exporting
This is quite a new framework, so many aspects need to be refined and improved.