Logic Programming for Natural Language Processing
Menyoung Lee, TJHSST Computer Systems Lab
Mentor: Matt Parker, Analytic Services, Inc.

Purpose
To link together
  - Recent developments in natural language processing (NLP): Information Extraction (IE)
  - Classical logic programming: Prolog
New paradigm: a bifurcated process
  - An IE application that produces structured output from a corpus of free, unstructured text
  - Transformation of the extracted information into a Prolog knowledge base (sets of fact triples)
Documents: biographies

Why NLP?
Language is the cornerstone of intelligence
  - The Turing Test: the ability to converse like a human
Understanding and generating texts in a natural language, e.g. English
Many specific NLP tasks
  - Chatterbots, e.g. Eliza
  - Machine Translation
  - Information Retrieval (IR), e.g. Google
  - Information Extraction!!
  - SciFi dreams: universal translation, computers you can talk to, etc.

Information Extraction (IE)
Most generally, the transformation of
  - Information contained in free, unstructured text in a natural language into
  - A prescribed, structured format
More specifically, the identification of
  - Instances of certain object classes
  - Their attributes
  - Relationships between object instances
Always restricted to a particular domain
  - In order to have a reasonably sized and sufficiently expressive ontology

Why IE?
An expert must read many documents
Advent of the Internet & Information Age
  - Explosion of the sheer volume of textual information, readily available in electronic form
  - New opportunity: lots and lots of available information to exploit
  - Formidable challenge: impossible for an expert to read and analyze that much text
A pragmatic approach:
  - Full text understanding is out of reach
  - Automate just some of the tasks, i.e. the identification of objects, attributes, and relations

IE - Details
Five tasks in IE
  - Named Entity Recognition (NE)
  - Coreference Resolution (CO)
  - Template Element Construction (TE)
  - Template Relation Construction (TR)
  - Scenario Template Production (ST)
Metrics for evaluation
  - Precision:
  - Recall:
  - F-measure (borrowed from IR):
More intuitive reformulation:
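The formulas on this slide did not survive in the transcript. As a minimal sketch, the Python below uses the standard IR definitions, which are presumably what the slide showed: precision is the fraction of produced annotations that are correct, recall is the fraction of expected annotations that were found, and the F-measure is F = (1 + β²)·P·R / (β²·P + R). The weighted-harmonic-mean form in `f_measure_harmonic` is one plausible reading of the "more intuitive reformulation"; that reading, the function names, and the counts in the example are assumptions, not taken from the slides.

```python
def precision(correct: int, spurious: int) -> float:
    """Fraction of the annotations the system produced that are correct."""
    produced = correct + spurious
    return correct / produced if produced else 0.0


def recall(correct: int, missed: int) -> float:
    """Fraction of the annotations that should exist that the system found."""
    expected = correct + missed
    return correct / expected if expected else 0.0


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """F = (1 + beta^2) * P * R / (beta^2 * P + R); beta > 1 favours recall."""
    denom = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denom if denom else 0.0


def f_measure_harmonic(p: float, r: float, beta: float = 1.0) -> float:
    """Same quantity as a weighted harmonic mean: 1 / (a/P + (1 - a)/R)."""
    if p == 0 or r == 0:
        return 0.0
    a = 1 / (1 + beta ** 2)  # weight on precision; the rest goes to recall
    return 1 / (a / p + (1 - a) / r)


# Hypothetical counts, for illustration only (not results from the project).
p = precision(correct=9, spurious=1)
r = recall(correct=9, missed=3)
print(p, r, f_measure(p, r, beta=2), f_measure_harmonic(p, r, beta=2))
```

Both F functions return the same value for any P, R and β; the second only makes the weighting between precision and recall explicit.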

Annotations
Annotations identify objects in text
Annotation graph: a directed acyclic graph (DAG)
  - Nodes: positions in the text
  - Edges: the literal text and the annotations
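One way to picture the annotation graph is the Python sketch below. Nodes are character offsets, and every edge runs from a smaller offset to a larger one, which is what keeps the graph acyclic. The class names, method names, and the example sentence are hypothetical and chosen for illustration; this is not GATE's internal representation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    start: int  # node: character offset where the span begins
    end: int    # node: character offset where the span ends
    label: str  # annotation type carried by this edge


@dataclass
class AnnotationGraph:
    text: str
    edges: list = field(default_factory=list)

    def annotate(self, start: int, end: int, label: str) -> Edge:
        # Edges must point forward in the text, so no cycle can form.
        if not 0 <= start < end <= len(self.text):
            raise ValueError("edge must point forward within the text")
        edge = Edge(start, end, label)
        self.edges.append(edge)
        return edge

    def nodes(self) -> list:
        """All offsets that start or end an edge, plus both ends of the text."""
        offsets = {0, len(self.text)}
        for e in self.edges:
            offsets.update((e.start, e.end))
        return sorted(offsets)

    def text_edges(self) -> list:
        """Literal-text edges between consecutive nodes."""
        ns = self.nodes()
        return [(a, b, self.text[a:b]) for a, b in zip(ns, ns[1:])]

    def spans(self, label: str) -> list:
        """Text covered by every annotation edge with the given label."""
        return [self.text[e.start:e.end] for e in self.edges if e.label == label]


# Hypothetical sentence, for illustration only.
graph = AnnotationGraph("Galois wrote to Cauchy.")
for name in ("Galois", "Cauchy"):
    begin = graph.text.index(name)
    graph.annotate(begin, begin + len(name), "Mathematician")
print(graph.nodes())
print(graph.spans("Mathematician"))
```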

Frames
Frame: a representation of an object, consisting of slots, which contain values
Typical Prolog fact: Frame(Slot, Value).
We propose to synthesize it with the idea of annotations: Doc(Annot, Text).
  - Main idea: represent the document directly as an object, a compromise between text and knowledge
Several advantages
  - A corpus of multiple related documents
  - Direct link between information and its source
  - Opens the door to the application of Prolog's logic
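The Doc(Annot, Text) scheme can be sketched as a small generator that turns (annotation type, text) pairs into Prolog fact strings. It is written in Python for illustration only, since the project's own frame construction was done in Prolog. The helper names are hypothetical, and quoting is limited to doubling embedded single quotes; other escapes are not handled.

```python
def quote_atom(s: str) -> str:
    """Write s as a quoted Prolog atom, doubling any embedded single quote."""
    return "'" + s.replace("'", "''") + "'"


def frame_facts(doc: str, annotations):
    """Yield one Doc(Annot, Text) fact per distinct (annotation, text) pair."""
    seen = set()
    for annot, text in annotations:
        if (annot, text) in seen:
            continue  # the same entity may be annotated many times in one document
        seen.add((annot, text))
        yield f"{quote_atom(doc)}({quote_atom(annot)}, {quote_atom(text)})."


# Document name and annotation values follow the sample session in the slides.
pairs = [
    ("Mathematician", "Abel"),
    ("Mathematician", "Evariste Galois"),
    ("Mathematician", "Abel"),
]
for fact in frame_facts("Galois.html.xml", pairs):
    print(fact)
```

Quoting every atom lets a document name such as Galois.html.xml serve as the predicate name, and lets a value containing a space, such as Evariste Galois, remain a single atom.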

Design
The IE application
  - Input: corpus of free, unstructured text
  - Output: the annotated documents, represented as annotation graphs
  - How: use GATE (language: JAPE)
The Prolog application
  - Input: the annotated document
  - Output: a frame, i.e. a set of Prolog facts
  - How: use XSB (language: Prolog)

General Architecture for Text Engineering (GATE)
A comprehensive architecture for the development of NLP applications
Each document is treated as an annotation graph
Java Annotation Patterns Engine (JAPE)
  - Its own language for writing grammars that identify instances of object classes to annotate
A Nearly New Information Extraction (ANNIE) system
  - An already implemented, rudimentary IE system that can be extended by adding
      - JAPE grammars for annotating
      - Machine-learning models for annotating

GATE

Procedures
Obtain the corpus (Python script)
Write the JAPE grammars
  - Annotations 'Mathematician', 'Father'
Train a model
  - Annotation 'Protagonist'
Write the Prolog application to
  - Parse GATE's XML output into a structure
  - Construct the annotation graph from it
  - Process the annotations into a document frame
  - Output the document frame
Test by posing queries
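The parsing step can be sketched as follows. The author's application did this in Prolog; the Python below is an illustrative stand-in. The sample XML is a simplified mock-up modeled on GATE's XML serialization, in which node markers are interleaved with the text and annotations refer to a start and an end node. The element names, the attribute names, and the sample sentence are assumptions and should be checked against real GATE output.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a GATE XML document.
SAMPLE = """<GateDocument>
<TextWithNodes><Node id="0"/>Galois<Node id="6"/> wrote to <Node id="16"/>Cauchy<Node id="22"/>.</TextWithNodes>
<AnnotationSet>
<Annotation Id="1" Type="Mathematician" StartNode="0" EndNode="6"/>
<Annotation Id="2" Type="Mathematician" StartNode="16" EndNode="22"/>
</AnnotationSet>
</GateDocument>"""


def parse_annotations(xml_text: str):
    """Return (plain text, [(annotation type, covered text), ...])."""
    root = ET.fromstring(xml_text)
    body = root.find("TextWithNodes")

    # Rebuild the plain text and record where each node marker falls in it.
    pieces = [body.text or ""]
    offset = len(pieces[0])
    position = {}
    for node in body.iter("Node"):
        position[node.get("id")] = offset
        tail = node.tail or ""  # text between this marker and the next one
        pieces.append(tail)
        offset += len(tail)
    text = "".join(pieces)

    # Each annotation is an edge between two node markers.
    annotations = []
    for ann in root.iter("Annotation"):
        start = position[ann.get("StartNode")]
        end = position[ann.get("EndNode")]
        annotations.append((ann.get("Type"), text[start:end]))
    return text, annotations


text, annotations = parse_annotations(SAMPLE)
print(text)
print(annotations)
```

Positions are looked up through the recorded node markers rather than by reading node ids as offsets, so the sketch does not depend on how the ids are numbered.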

IE Result: Fermat.html
Precision: 1 (why so high?)
  - Use of a gazetteer list
  - Aggressive pruning by context
Recall:
  - The price of aggressive pruning: some instances were missed
F-measure (β = 2)
  - 0.973

Prolog Result
Correctly constructs facts. Sample session:
| ?- 'Galois.html.xml'('Mathematician', X).
X = Abel;
X = Cauchy;
X = Evariste Galois;
X = Fourier;
X = Galois;
X = Gauss;
X = Gergonne;
X = Jacobi;
X = Lagrange;
X = Legendre;
X = Libri;
X = Liouville;
X = Poisson;
X = Vernier

Results
The Prolog layer is universal, cross-domain
  - The IE application may produce any annotation, not restricted to one subject area
  - Bifurcation: success
Opens the door to logic and rules, esp. for cross-document relations
| ?- 'Galois.html.xml'('Mathematician', X), 'Cauchy.html.xml'('Protagonist', X).
X = Cauchy;
no
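The cross-document query works because the shared variable X must satisfy both goals at once, which amounts to intersecting two sets of answers. The Python sketch below mirrors that behaviour over a small set of fact triples. The Galois.html.xml values are copied from the sample session, and the single Cauchy.html.xml fact is inferred from the answer shown above. The `query` helper is hypothetical and only imitates what Prolog does natively.

```python
GALOIS_MATHEMATICIANS = [
    "Abel", "Cauchy", "Evariste Galois", "Fourier", "Galois", "Gauss",
    "Gergonne", "Jacobi", "Lagrange", "Legendre", "Libri", "Liouville",
    "Poisson", "Vernier",
]

# Knowledge base as (document, annotation, text) triples.
facts = {("Galois.html.xml", "Mathematician", name) for name in GALOIS_MATHEMATICIANS}
facts.add(("Cauchy.html.xml", "Protagonist", "Cauchy"))  # inferred from the session


def query(doc: str, annot: str) -> set:
    """All X such that Doc(Annot, X) holds in the fact set."""
    return {text for d, a, text in facts if d == doc and a == annot}


# Conjunction with a shared variable X corresponds to a set intersection.
shared = query("Galois.html.xml", "Mathematician") & query("Cauchy.html.xml", "Protagonist")
print(sorted(shared))
```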

Conclusion
With the recent advancements in computing power, logic programming is finally feasible for practical use
  - I ran my Prolog application on the server robustus, giving it 2 GB of memory
  - However, computing power continues to be a limitation (GATE crashed every day)
Where do we go from here?
  - More expressive document frame
  - Context analysis (through proximity, etc.)
  - Better IE applications through statistical processing