Case study: GATE in the NeOn project
Diana Maynard
University of Sheffield, NLP



Aims of this talk
Demonstrate the use of GATE for automating Semantic Web-specific tasks such as semantic annotation and ontology learning from texts
SARDINE: pattern-based relation extraction in the fisheries domain
– Adding new concepts and instances to the ontology
– Finding relations between existing concepts in the ontology
SPRAT: generic version of SARDINE

Recap: IE for the Semantic Web
Traditional IE is based on a flat structure, e.g. recognising Person, Location, Organisation, Date, Time etc.
For the Semantic Web, we need information in a hierarchical structure
The idea is that we attach annotations to the documents, pointing to concepts in an ontology
Information can be exported as an ontology annotated with instances
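As a rough illustration of the difference between flat and hierarchical annotation, the sketch below attaches an annotation to a text span and lets it point to a class in a small ontology, from which the superclasses can be read off. The ontology contents, the `Annotation` type and the helper names are assumptions made for this example; they are not GATE's data model.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: each class maps to its parent (None marks the root).
ONTOLOGY = {
    "Entity": None,
    "Location": "Entity",
    "City": "Location",
    "Organisation": "Entity",
    "University": "Organisation",
}


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the mention starts
    end: int      # character offset just past the mention
    concept: str  # name of the ontology class the mention points to


def ancestors(concept: str) -> list:
    """Return the chain of superclasses from the concept up to the root."""
    chain = []
    parent = ONTOLOGY[concept]
    while parent is not None:
        chain.append(parent)
        parent = ONTOLOGY[parent]
    return chain


def annotate(text: str, mention: str, concept: str) -> Annotation:
    """Annotate the first occurrence of `mention` with an ontology concept."""
    start = text.index(mention)
    return Annotation(start, start + len(mention), concept)


text = "The NLP group is based at the University of Sheffield."
ann = annotate(text, "University of Sheffield", "University")
print(text[ann.start:ann.end], "->", ann.concept, ancestors(ann.concept))
```

A flat scheme would stop at the label Organisation; here the annotation carries the more specific class and the hierarchy supplies the rest.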

Linking the Text to the Ontology

The NeOn project
NeOn (Networked Ontologies) is a 4-year, 14.7 million euro EU project involving 14 European partners.
Focus on using ontologies for large-scale semantic applications in distributed organisations
Handles multiple networked ontologies that exist in a particular context, are created collaboratively, and might be highly dynamic and constantly evolving.

ODd SOFAS
The Food and Agriculture Organization of the UN has odd sofas...

Wall-climbing sofa

Sofa made from bicycle seats

FAO Case Study
Actually, it's nothing to do with sofas, or any kind of seating.
They do, however, have an Ontology-Driven stock Over-Fishing Alert System
Focuses on the agricultural sector and information management for hunger prevention
The case study aims at management of alerts to avoid over-fishing in already stretched global waters
The role of GATE is to analyse textual resources to find new information such as new fish names, and relations between ontology elements, e.g. Atlantic cod are fished in the Gulf of Maine

SARDINE
Species Annotation, Recognition and Indexing of Named Entities
SARDINE identifies mentions of fish species in text
It identifies:
– existing fish names listed in the ontology and their morphological variants
– potential new fish names not listed in the ontology
– potential relations between fish names
For the new fish, it attempts to classify them in the ontology, based on linguistic information such as synonyms and hyponyms of existing fish
It may also generate properties for existing fish in the ontology


Using patterns to find new fish
Synonyms:
– mummichogs (Fundulus heteroclitus)
Names appearing in lists:
– plankton, herring and clams....
– clams, herring and other types of fish
More specific fish names:
– Japanese flounder
– Red salmon
– Suberites sponges
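The first two pattern types above can be approximated with regular expressions. This is only a sketch for illustration: SARDINE itself uses JAPE grammars over linguistic annotations, and the regexes and function names here are assumptions, not the project's actual patterns.

```python
import re

# Synonym pattern: "common name (two-word scientific name)".
SYNONYM = re.compile(r"\b([a-z]+)\s+\(([a-z]+\s+[a-z]+)\)", re.IGNORECASE)

# List pattern: "X, Y and other types of Z" suggests X and Y are kinds of Z.
OTHER = re.compile(r"((?:\w+,\s*)*\w+)\s+and other (?:types|kinds) of (\w+)")


def find_synonyms(text):
    """Return (common name, bracketed name) pairs."""
    return [(m.group(1), m.group(2)) for m in SYNONYM.finditer(text)]


def find_hyponyms(text):
    """Return (candidate subclass, superclass) pairs from 'and other' lists."""
    pairs = []
    for m in OTHER.finditer(text):
        for item in re.split(r",\s*", m.group(1)):
            pairs.append((item, m.group(2)))
    return pairs


print(find_synonyms("mummichogs (fundulus heteroclitus)"))
print(find_hyponyms("clams, herring and other types of fish"))
```

Plain regexes like these have no notion of part of speech or noun phrases, which is one reason a grammar over token and gazetteer annotations is the more robust choice.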

Example of JAPE rule (1)
Example: Suberites sponges (where sponge is a known class)
Rule: AdjClass
(
  ({Token.category == JJ})
  ({Class}):super
):sub
-->
  :sub.SardineSubclass = {rule=AdjClass},
  :super.SardineSuperclass = {rule=AdjClass},
  …

Example of JAPE rule (2)
Example: Frogs are a kind of amphibian.
Rule: Subclass1
(
  ({NP}):sub
  (
    {Lookup.minorType == be}
    {Token.category == DT}
    {Lookup.majorType == kind}
  )
  ({NP}):super
)
-->
  …
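For readers unfamiliar with JAPE, the logic of the two rules (AdjClass and Subclass1) can be mimicked in Python. This is a simplified sketch under several assumptions: tokens arrive already POS-tagged, noun phrases are reduced to single words, the set of known classes is invented, and the trigger words "type" and "sort" are guesses at what a "kind" gazetteer list might contain.

```python
import re

# Hypothetical classes already present in the ontology.
KNOWN_CLASSES = {"sponge", "flounder", "salmon"}


def singular(word):
    """Very naive singulariser, standing in for real morphological analysis."""
    return word[:-1] if word.endswith("s") else word


def adj_class(tokens):
    """AdjClass: an adjective followed by a known class.

    `tokens` is a list of (word, POS tag) pairs. Returns a list of
    (new subclass, existing superclass) pairs.
    """
    found = []
    for (w1, pos1), (w2, _) in zip(tokens, tokens[1:]):
        if pos1 == "JJ" and singular(w2.lower()) in KNOWN_CLASSES:
            found.append((f"{w1} {w2}", singular(w2.lower())))
    return found


# Subclass1: "<NP> is/are a kind of <NP>", with single-word NPs only.
KIND_OF = re.compile(
    r"\b(\w+) (?:is|are) (?:a|an) (?:kind|type|sort) of (\w+)", re.IGNORECASE
)


def subclass1(sentence):
    """Return (subclass, superclass) pairs for 'X are a kind of Y' sentences."""
    return [(m.group(1), m.group(2)) for m in KIND_OF.finditer(sentence)]


print(adj_class([("Japanese", "JJ"), ("flounder", "NN")]))
print(subclass1("Frogs are a kind of amphibian."))
```

In both cases the output mirrors the `:sub` and `:super` bindings of the JAPE rules: the first element is the candidate subclass, the second its superclass.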

Annotated text in GATE

Augmenting the Ontology
The new classes found are linked to existing classes in the ontology
For existing fish, and for new fish identified as a synonym or hyponym of an existing fish, the link is to an existing ontology instance
When we cannot identify a link to any existing fish, we create a new concept
The changes to the ontology are stored and can be verified later by human experts
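The decision procedure described on this slide might be sketched as follows. The `Ontology` class, its fields and the change-log format are all hypothetical; the sketch only shows the three outcomes (link as synonym, link as hyponym, create a new concept) and the log kept for later expert review.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    classes: dict = field(default_factory=dict)  # class name -> parent (or None)
    labels: dict = field(default_factory=dict)   # class name -> set of alternative names
    pending: list = field(default_factory=list)  # changes awaiting expert verification

    def add_candidate(self, name, synonym_of=None, hyponym_of=None):
        """Place a newly found name in the ontology and log the change."""
        if synonym_of in self.classes:
            # Synonym of an existing entry: record it as an alternative label.
            self.labels.setdefault(synonym_of, set()).add(name)
            change = ("add_label", synonym_of, name)
        elif hyponym_of in self.classes:
            # Hyponym of an existing entry: add it underneath that entry.
            self.classes[name] = hyponym_of
            change = ("add_subclass", hyponym_of, name)
        else:
            # No link found: create a new, unattached concept.
            self.classes[name] = None
            change = ("add_concept", None, name)
        self.pending.append(change)
        return change


onto = Ontology(classes={"fish": None, "flounder": "fish"})
onto.add_candidate("Japanese flounder", hyponym_of="flounder")
onto.add_candidate("mummichog")
for change in onto.pending:
    print(change)
```

Keeping every change in `pending` rather than applying it silently reflects the point that human experts verify the additions afterwards.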

Generated animal ontology

Recognising components from the ontology
In addition to the standard IE components, we use some special ontology components.
The OntoRootGazetteer enables us to match words or phrases in the text with classes, instances or properties in an ontology, in any morphological variant
Morphological analysis is performed on both text and ontology, then matching is done between the two at the root level.
Text is annotated with features containing the root and original string(s)
When new elements are added to the ontology, these features can be used to regenerate alternative forms
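Root-level matching can be illustrated with a toy example. The `root` function below is a crude suffix stripper used purely as a stand-in for a real morphological analyser, and the ontology labels are invented; none of this reflects the actual OntoRootGazetteer implementation.

```python
def root(word):
    """Toy lemmatiser: lower-case and strip a few plural endings."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("es") and w[:-2].endswith(("sh", "ch", "x", "ss")):
        return w[:-2]
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def build_gazetteer(ontology_labels):
    """Map the root form of each (single-word) ontology label to the label."""
    return {root(label): label for label in ontology_labels}


def lookup(text, gazetteer):
    """Annotate tokens whose root matches the root of an ontology label.

    Each hit keeps both the original string and the root, as on the slide.
    """
    hits = []
    for token in text.split():
        word = token.strip(".,;:!?")
        r = root(word)
        if r in gazetteer:
            hits.append({"string": word, "root": r, "ontology_label": gazetteer[r]})
    return hits


gaz = build_gazetteer(["Trawl", "Gillnets"])  # hypothetical ontology labels
for hit in lookup("Trawls and gillnet fisheries.", gaz):
    print(hit)
```

Because both sides are reduced to roots, "Trawls" in the text matches "Trawl" in the ontology and "gillnet" matches "Gillnets", which exact string lookup would miss.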

Modifying the ontology
We developed a special GATE plugin called NEBOnE (Named Entity Based ONtology Editor)
This reuses technology taken from CLOnE (Controlled Language ONtology Editor)
CLOnE is designed to create new classes, instances etc. from raw (controlled) text generated by the user
NEBOnE enables changes to be made to the ontology based on information extraction from input texts (e.g. web pages) in natural language
Morphological analysis enables both root forms and variants to be added to the ontology (as properties), along with other variants (e.g. capitalisation)

Finding relations between known elements
In this case study, we use existing information from the ontology to find relations between its elements, e.g. between fish species and gear type
We have already annotated all fish species, gear types, fishing areas and so on in the text, based on ontology lookup
A JAPE grammar first finds the subject of the document (a gear type) and adds the information as a document feature
When a species name is found, we create a new annotation for the relation gear_used, with one property denoting the species and another denoting the ID number of the gear.
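The two-step procedure can be sketched in Python. The slide does not say how the document subject is chosen, so taking the first gear type mentioned is an assumption here, as are the annotation type names, the dictionary layout and the gear ID values.

```python
def find_gear_relations(annotations, gear_ids):
    """Create gear_used relations between the document's gear type and each species.

    `annotations` is a list of dicts with 'type' and 'string', in document order.
    `gear_ids` maps gear type names to their ontology ID numbers (hypothetical).
    """
    # Step 1: find the document subject (assumed: first gear type mentioned).
    subject = next(
        (a["string"] for a in annotations if a["type"] == "GearType"), None
    )
    if subject is None:
        return []
    # Step 2: one gear_used relation per species mention.
    return [
        {
            "type": "gear_used",
            "species": a["string"],
            "gear": subject,
            "gear_id": gear_ids[subject],
        }
        for a in annotations
        if a["type"] == "Species"
    ]


annotations = [
    {"type": "GearType", "string": "bottom trawl"},
    {"type": "Species", "string": "Atlantic cod"},
    {"type": "FishingArea", "string": "Gulf of Maine"},
    {"type": "Species", "string": "haddock"},
]
for rel in find_gear_relations(annotations, {"bottom trawl": 101}):
    print(rel)
```

Storing the subject once per document and reusing it for every species mention is what makes the approach cheap: no syntactic analysis of the sentence linking species and gear is needed.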

Viewing relations

Using ANNIC to view results
By running our application on a Lucene datastore, we can then use ANNIC to view the results
Search for the pattern consisting of the name of the relation annotation (in this case gear_used)
Show the relevant features (species, gear ID, gear type)


SPRAT
Semantic Pattern Recognition and Annotation Tool
This is a generic version of SARDINE that runs on all kinds of texts, not just fisheries
Does not require a seed ontology
Useful for building a domain ontology from scratch
Tested on Wikipedia pages

How well can we do it?
Traditional NE recognition on news texts: ~90% precision/recall
Ontology-based information extraction on news texts: ~80% precision/recall
Pattern-based relation extraction on Wikipedia texts: high precision but low recall (or vice versa, depending on setup)
Relation finding between known entities: ~90% precision/recall
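For reference, precision and recall over extracted items can be computed as below. The example pairs are invented purely to show the arithmetic and are not results from the project.

```python
def precision_recall(predicted, gold):
    """Precision = correct / predicted; recall = correct / gold."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Invented example: 3 relations extracted, 2 of them right, 4 in the gold standard.
predicted = {("cod", "fish"), ("herring", "fish"), ("boat", "fish")}
gold = {("cod", "fish"), ("herring", "fish"), ("clam", "fish"), ("sole", "fish")}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

A strict pattern set tends to push precision up and recall down, while looser patterns do the opposite, which is the trade-off the slide alludes to.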

More information
NeOn Project:
NeOn Toolkit is freely available:
SARDINE application can be downloaded from the GATE website