Linked Open Data Dataset from Related Documents Petya Osenova and Kiril Simov IICT-BAS LDL-2016, LREC, Portoroz.

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

Controlled Vocabularies in TELPlus Antoine ISAAC Vrije Universiteit Amsterdam EDLProject Workshop November 2007.
CH-4 Ontologies, Querying and Data Integration. Introduction to RDF(S) RDF stands for Resource Description Framework. RDF is a standard for describing.
Research topics Semantic Web - Spring 2007 Computer Engineering Department Sharif University of Technology.
NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
Crosslingual Retrieval in an eLearning Environment Cristina Vertan, Kiril Simov, Petya Osenova, Lothar Lemnitzer, Alex Killing, Diane Evans, Paola Monachesi.
Dr. Alexandra I. Cristea RDF.
A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
1 Semantic Web and Retrieval of Scientific Data Semantics Goran Soldar University of Brighton UK Dan Smith University of East Anglia UK.
Chapter 7: Resource Description Framework (RDF) Service-Oriented Computing: Semantics, Processes, Agents – Munindar P. Singh and Michael N. Huhns, Wiley,
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
Editing Description Logic Ontologies with the Protege OWL Plugin.
Improving Data Discovery in Metadata Repositories through Semantic Search Chad Berkley 1, Shawn Bowers 2, Matt Jones 1, Mark Schildhauer 1, Josh Madin.
ONTOLOGY SUPPORT For the Semantic Web. THE BIG PICTURE  Diagram, page 9  html5  xml can be used as a syntactic model for RDF and DAML/OIL  RDF, RDF.
RDF Data Sources (keyword textbox) Search RDF Data Sources: LOM Binding Schemas (keyword textbox) Search XML Bindings: Dublin Core Application Profiles.
The Data Attribution Abdul Saboor PhD Research Student Model Base Development and Software Quality Assurance Research Group Freie.
INF 384 C, Spring 2009 Ontologies Knowledge representation to support computer reasoning.
Logics for Data and Knowledge Representation
Nancy Lawler U.S. Department of Defense ISO/IEC Part 2: Classification Schemes Metadata Registries — Part 2: Classification Schemes The revision.
© Copyright 2008 STI INNSBRUCK NLP Interchange Format José M. García.
INLS 520 – Erik Mitchell INLS 520 Information Organization.
Meta Tagging / Metadata Lindsay Berard Assisted by: Li Li.
Ontologies and Lexical Semantic Networks, Their Editing and Browsing Pavel Smrž and Martin Povolný Faculty of Informatics,
The Agricultural Ontology Service (AOS) A Tool for Facilitating Access to Knowledge AGRIS/CARIS and Documentation Group Library and Documentation Systems.
Metadata. Generally speaking, metadata are data and information that describe and model data and information For example, a database schema is the metadata.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Semantic Network as Continuous System Technical University of Košice doc. Ing. Kristína Machová, PhD. Ing. Stanislav Dvorščák WIKT 2010.
Coastal Atlas Interoperability - Ontologies (Advanced topics that we did not get to in detail) Luis Bermudez Stephanie Watson Marine Metadata Interoperability.
LOD for the Rest of Us Tim Finin, Anupam Joshi, Varish Mulwad and Lushan Han University of Maryland, Baltimore County 15 March 2012
©Ferenc Vajda 1 Semantic Grid Ferenc Vajda Computer and Automation Research Institute Hungarian Academy of Sciences.
STASIS Technical Innovations - Simplifying e-Business Collaboration by providing a Semantic Mapping Platform - Dr. Sven Abels - TIE -
Chapter 7: Resource Description Framework (RDF) Service-Oriented Computing: Semantics, Processes, Agents – Munindar P. Singh and Michael N. Huhns, Wiley,
Using Several Ontologies for Describing Audio-Visual Documents: A Case Study in the Medical Domain Sunday 29 th of May, 2005 Antoine Isaac 1 & Raphaël.
A Systemic Approach for Effective Semantic Access to Cultural Content Ilianna Kollia, Vassilis Tzouvaras, Nasos Drosopoulos and George Stamou Presenter:
EEL 5937 Ontologies EEL 5937 Multi Agent Systems Lecture 5, Jan 23 th, 2003 Lotzi Bölöni.
SemantEco Annotator for Linked Data Generation and Generalized Semantic Mapping Session: Technologies, Reasoning, and Annotation Methods of the Semantics.
SKOS. Ontologies Metadata –Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies –Provide.
Common Terminology Services 2 CTS 2 Submission Team Status Update HL7 Vocabulary Working Group May 17, 2011.
Oreste Signore- Quality/1 Amman, December 2006 Standards for quality of cultural websites Ministerial NEtwoRk for Valorising Activities in digitisation.
Introduction to the Semantic Web and Linked Data Module 1 - Unit 2 The Semantic Web and Linked Data Concepts 1-1 Library of Congress BIBFRAME Pilot Training.
Metadata Common Vocabulary a journey from a glossary to an ontology of statistical metadata, and back Sérgio Bacelar
User Profiling using Semantic Web Group members: Ashwin Somaiah Asha Stephen Charlie Sudharshan Reddy.
GEMET GEneral Multilingual Environmental Thesaurus leading the way to federated terminologies Stefan Jensen, Head of information services group with input.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
ESIP Semantic Web Products and Services ‘triples’ “tutorial” aka sausage making ESIP SW Cluster, Jan ed.
Strategies for subject navigation of linked Web sites using RDF topic maps Carol Jean Godby Devon Smith OCLC Online Computer Library Center Knowledge Technologies.
Semantic Publishing Benchmark Task Force Fourth TUC Meeting, Amsterdam, 03 April 2014.
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
Semantic web Bootstrapping & Annotation Hassan Sayyadi Semantic web research laboratory Computer department Sharif university of.
Metadata By N.Gopinath AP/CSE Metadata and it’s role in the lifecycle. The collection, maintenance, and deployment of metadata Metadata and tool integration.
THE BIBFRAME EDITOR AND THE LC PILOT Module 3 – Unit 1 The Semantic Web and Linked Data : a Recap of the Key Concepts Library of Congress BIBFRAME Pilot.
ELIS – Multimedia Lab PREMIS OWL Sam Coppens Multimedia Lab Department of Electronics and Information Systems Faculty of Engineering Ghent University.
EEL 5937 Ontologies EEL 5937 Multi Agent Systems Lotzi Bölöni.
1 Open Ontology Repository initiative - Planning Meeting - Thu Co-conveners: PeterYim, LeoObrst & MikeDean ref.:
DANIELA KOLAROVA INSTITUTE OF INFORMATION TECHNOLOGIES, BAS Multimedia Semantics and the Semantic Web.
The Akoma Ntoso Naming Convention Fabio Vitali University of Bologna.
© Copyright 2015 STI INNSBRUCK PlanetData D2.7 Recommendations for contextual data publishing Ioan Toma.
OWL Web Ontology Language Summary IHan HSIAO (Sharon)
Ontology Technology applied to Catalogues Paul Kopp.
GoRelations: an Intuitive Query System for DBPedia Lushan Han and Tim Finin 15 November 2011
On Demand RDF Databases in the Cloud Presentation for the Ontology Forum March 3, 2016.
Setting the stage: linked data concepts Moving-Away-From-MARC-a-thon.
Improving Data Discovery Through Semantic Search
Service-Oriented Computing: Semantics, Processes, Agents
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Lecture #11: Ontology Engineering Dr. Bhavani Thuraisingham
Presented by: Hassan Sayyadi
Semantic-Web, Triple-Strores, and SPARQL
Presentation transcript:

Linked Open Data Dataset from Related Documents Petya Osenova and Kiril Simov IICT-BAS LDL-2016, LREC, Portoroz

Plan of the Talk Preliminary Notes Motivation Law Document Modeling Ontology Modeling Reason-able view over LOD Conclusions

Preliminary Notes (1) The paper introduces the methodology and technical details behind the EUCases Legal Linked Open Dataset (EUCases-LeLOD) and Web Interface Querying EUCases Linking Platform. Methodologically, the EUCases-LeLOD consists of a set of ontologies and RDF triples EuroVoc and Syllabus are in the center of the model, since they are used as domain specific ontologies

Preliminary Notes (2) Other supporting ontologies have been added: – GeoNames for the named entities; PROTON as an upper ontology; –SKOS as a mapper between ontologies and terminological lexicons; –Dublin Core as a metadata ontology. –For an efficient reasoning, the so-called FactForge reason-able view was explored as an approach to linked data management

Preliminary Notes (3) EUCases-LeLOD is represented in RDF graphs and uses the SPARQL query language. The input documents have been encoded into the legal XML schema Akoma Ntoso. For its Web Interface the EUCases Linking Platform relies on the customized version of the GraphDB Workbench, developed by Ontotext AD:

Motivation We view the process of linking as a gradual one : –the professionals and stakeholders are usually interested in referenced documents –however, even the step of correct referencing requires NLP processing –thus, the linguistic information is in the data, but it remains hidden –we would like to ‘save’ the linguistic information in providing further linking on paragraph, sentence, phrase and word level In this paper we focus on the first step in the context of the future linguistic linking

Law Document Modeling: Structure (1) Two types of EUCases documents: Act (legislative documents) and Judgment (case description) The structure of each EUCases document is divided into two: metadata and content. –The metadata determines the type of the document, its life cycle, including the date of creation, the author(s), the place of creation, keywords, abstract, etc. –The content of the document is the actual text of the document.

Law Document Modeling: Structure (2) The RDF representation of each EUCases document encodes information coming from these both sources - the metadata and the content. –The metadata is already represented explicitly during the creation of the XML representation of the document. –The content of the document is described on the basis of its text annotation via NLP as well as the linking tools developed within the project.

Law Document Modeling: Namespaces All created instances of EUCases documents, metadata elements and annotations are represented as instances in the EUCases dataset. For all EUCases instances the following namespace is eucinst:.

Law Document Modeling: Namespaces For extending some of the ontologies, new classes and properties are defined within the following eucont:.

Ontology Modeling (1) We start with PROTON ontology and extend it with a definition of new classes and properties where necessary or via mapping to the other ontologies. In EUCases-LeLOD two types of documents are represented: acts and judgments.

Ontology Modeling (2) The EUCases document is modelled as a sub-class of the PROTON class ptop:Document ptop:Document rdf:type owl:Class ; rdfs:comment "The information content of any sort of document. The tangible aspects are ignored. It is usually a document in free text with no formal structure or semantics ; rdfs:label ; rdfs:subClassOf ptop:InformationResource.

Document Properties ptop:Document has the following important for EUCases-LeLOD properties (directly defined for it or inherited): ptop:documentAbstract; ptop:documentSubTitle; ptop:derivedFromSource; ptop:hasContributor; ptop:hasDate; ptop:hasSubject; ptop:inLanguage; ptop:informationResourceCoverage; ptop:informationResourceIdentifier; ptop:informationResourceRights; ptop:resourceType; ptop:title. These properties comply with Dublin Core ones.

Ontology of the documents

Ontology Modeling (3) The history of a EUCases document is presented as a sequence of events (or states) like document creation, document publication, document signature, etc. Events and states are modelled by the PROTON class ptop:Happening. Each happening is determined by its beginning and end time moment as well as the participants. In order to avoid the complex nature of the document events in the current version of the EUCases ontology we encode only time stamps for some of the events.

Ontology Modeling (4) Classification of a EUCases document is done via a set of keyword references. The keyword references point to terms in a thesauri like EuroVoc or Legal Taxonomy Syllabus in our case. Such a classification has a source which can be some real agent - Person or Organization; or software agent like EUCases NLP ToolKit. Each classification is represented by the EUCases class eucont:Classification

Ontology Modeling: Classification

Ontology Modeling: Reference

The reference is divided into two parts: web address of the referred document and internal address The web address is represented as the property eucount:sourceURI Geographical characteristics of the EUCases documents are modelled in two ways –the metadata geographical features determine the area in which the document is in force –the second kind of geographical information is extracted automatically from the content of the document

Mapping PROTON to geo-ont:. [ rdf:type owl:Restriction ; owl:onProperty geo-ont:featureCode ; owl:hasValue geo-ont:A.ADM4 ] rdfs:subClassOf pext:PoliticalRegion.

Mapping from PROTON to EUCases eucontLEUCDocument rdfs:subClassOf ptop:Document. eucont:shortTitle rdfs:subPropertyOf ptop:title.

RDF Representation of the EUCases documents The process for RDFization of a given EUCases document comprises the following steps: –Reading and validation of an XML document –For each element of the document which provides information for the RDF representation one or more appropriate new XML elements are added –For each added triple element the module computes the subject, predicate and object URIs –The triples with defined subject, predicate and objects are extracted from the document and converted into actual RDF Triples –The created sets of RDF triples for a given document are loaded into the EUCases RDF repository

Example

Reason-able View over EUCases- LeLOD Reason-able view is an assembly of independent datasets, which can be used as a single body of knowledge with respect to reasoning and query evaluation The key principles can be summarized as the following instruction: –Group selected datasets and ontologies in a compound dataset –Clean up, post-process and enrich the datasets if necessary –Define a set of sample queries against the compound dataset

Reason-able View Makes reasoning and query evaluation feasible Lowers the cost of entry through interactive user interfaces and retrieval methods such as URI auto-completion and RDF search Guarantees availability Easier exploration and querying of unseen data

Ontologies in LeLOD RV EuroVoc and Syllabus are domain modeling ontologies which are used for the annotation of the content in the EUCases documents; GeoNames ontology describes the structure of GeoNames LOD dataset; Dublin Core provides a vocabulary for description of document metadata; PROTON is a general ontology. It plays the important role of a joined ontology for the reason-able view; SKOS is a metaontology for mapping lexicons and ontologies.

LeLOD Reason-able View The linking between the different ontologies and LOD Datasets was implemented in two ways: –Ontology mappings (ensuring the usage of a single ontology for querying the LOD dataset) –Instance mappings (provided by the NLP pipelines implemented in EUCases)

LeLOD Reason-able View (2) The ontology mappings have been done manually and the instance mappings have been done automatically via the respective NLP pipelines The created LOD dataset is represented as a reason-able view and loaded in the Graph DB RDF repository In this way, it is made available for professionals in the domain area.

Conclusions We consider this as a basis for creation of Linguistic LOD datasets. The creation of LLOD from the linguistic analyses of the domain documents needs also usage of appropriate linguistic ontologies. We plan to annotate the documents with WordNet synsets.

Thank you very much for your attention!

PREFIX eucont: PREFIX gn: PREFIX ptop: PREFIX geonames: SELECT ?doc ?title ?uri ?cname ?c WHERE { ?doc ?p eucont:EUCDocument. ?doc ptop:title ?title. ?doc eucont:sourceURI ?uri. ?doc ptop:informationResourceCoverage ?c. ?c ptop:locatedIn. ?c gn:name ?cname } LIMIT 1000