Www.isocat.org Linking to Linguistic Data Categories in ISOcat Menzo Windhouwer a, Sue Ellen Wright b a The Language Archive - MPI for Psycholinguistics,

Presentation transcript:

Linking to Linguistic Data Categories in ISOcat
Menzo Windhouwer (a), Sue Ellen Wright (b)
(a) The Language Archive - MPI for Psycholinguistics
(b) Kent State University

Outline
A short introduction to data categories
– the ISOcat registry
How to refer to ISOcat data categories
– using PIDs
– from XML and RDF resources
Fine-tuning (personal) relationships between data categories
– the RELcat registry
Status
7-9 March 2012, Linked Data in Linguistics - DGfS 2012

ISOcat: a Data Category Registry
An implementation of ISO 12620:2009
– Terminology and other content and language resources - Specification of data categories and management of a Data Category Registry for language resources
Successor to ISO 12620:1999, which contained a hardcoded list of Data Categories
A data category
– is the result of the specification of a given data field
– an elementary descriptor in a linguistic structure or an annotation scheme

Data Category example
Data category: /Grammatical gender/
– Administrative part:
    Identifier: grammaticalGender
    PID:
– Descriptive part:
    English definition: Category based on (depending on languages) the natural distinction between sex and formal criteria.
    French definition: Catégorie fondée (selon la langue) sur la distinction naturelle entre les sexes ou d'autres critères formels.
– Conceptual domain:
    Morphosyntax conceptual domain: /masculine/, /feminine/, /neuter/, /common/
– Linguistic part:
    French conceptual domain: /masculine/, /feminine/

Data Category types
[Figure: complex data categories with an open domain (writtenForm: string), a closed domain (grammaticalGender: string, with the simple data categories neuter, masculine and feminine as values) and a constrained domain (string)]
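The difference between an open and a closed complex data category can be sketched in a few lines of Python. This is only an illustration: the class name `ComplexDataCategory` and the method `accepts` are hypothetical and not part of ISOcat, and constrained domains are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ComplexDataCategory:
    """Hypothetical stand-in for a complex data category."""
    identifier: str
    data_type: str = "string"
    # None models an open domain; a frozenset lists the simple
    # data categories that form a closed domain.
    value_domain: Optional[FrozenSet[str]] = None

    def accepts(self, value: str) -> bool:
        """Return True if the value is allowed by the domain."""
        if self.value_domain is None:
            return True  # open: any string is allowed
        return value in self.value_domain


# The two examples from the slide.
written_form = ComplexDataCategory("writtenForm")
grammatical_gender = ComplexDataCategory(
    "grammaticalGender",
    value_domain=frozenset({"neuter", "masculine", "feminine"}),
)
```

With these definitions, `written_form.accepts(...)` is true for any string, while `grammatical_gender.accepts(...)` is true only for the three listed values.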

Data Category types
[Figure: container data categories (lexicon, entry, lemma), shown together with language, alphabet, writtenForm, japanese and ipa]

Data Category relationships
Value domain membership
Subsumption relationships between simple data categories (legacy)
Relationships between complex/container data categories are not stored in the DCR
[Figure: partOfSpeech (string) with the value pronoun, which subsumes personal pronoun]

ISOcat: a Data Category Registry
You can:
– Find Data Categories relevant for your resources and embed references to them so the semantics of (parts of) your resources are made explicit
  This can be supported by the tools you use, e.g., ELAN, LEXUS and the CMDI Component Editor interact directly with ISOcat
– Interact with Data Category owners to improve (the coverage of) their Data Categories
– Create (together with others) new Data Categories and/or selections needed for your resources and share those
– Submit (your) Data Categories for standardization
ISOcat is the DCR for ISO TC 37
– Free of charge
– Grass roots approach

The usage of data categories?
[Figure: a (schema for a) lexicon (Lexicon, Lexical Entry 0..*, Form 1..*, Sense, Word Form, Lemma; partOfSpeech, writtenForm, grammaticalGender, lexicalType) next to a (schema for a) typological database (Language, BWO, genders; grammaticalGender, wordOrder). Shared semantics!]

Referencing Data Categories
Each Data Category should be uniquely identifiable
– Ambiguity: different domains use the same term but mean different ‘things’
– Semantic rot: even in the same domain the meaning of a term changes over time
– Persistence: for archived resources, Data Category references should still be resolvable and point to the specification as it was at, or close to, the time of creation
Persistent IDentifiers
– ISO 24619:2011 Language resource management - Persistent identification and sustainable access (PISA)
– ISOcat uses ‘cool URIs’: (/grammaticalGender/)

XML – DC Reference vocabulary
ISO 12620:2009 is rather XML oriented
– why not RDF? history
    terminology management is a separate tradition from Semantic Web/Linked Data
    DCIF -> GMT (TMF) -> own XML vocabulary based on UML data model
– but there is an RDF representation
    needs to cover more of the data model
Annex A provides the DC reference vocabulary
– dcr:datcat to link to any DC
– dcr:valueDatcat to link to a simple DC
Preferably annotate a schema, e.g., a Relax NG or W3C XML Schema document
XML vocabularies might also provide their own means to link to a data category
– TBX XCS, TEI ODD, CMDI, ..., TEI (?)
(Semantics by reference)
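How a consumer might pick up dcr:datcat references from an XML resource can be sketched with Python's standard library. Everything concrete here is an assumption made for illustration: the namespace URI bound to `dcr`, the base URI of the PIDs, the sample document and the function name `collect_datcat_refs`. Only the attribute name dcr:datcat and the pairing of part of speech with DC-396 come from the slides.

```python
import xml.etree.ElementTree as ET

DCR_NS = "http://www.isocat.org/ns/dcr"        # assumed namespace URI
DATCAT_BASE = "http://www.isocat.org/datcat/"  # assumed PID base URI

# Hypothetical, LMF-like instance fragment; not taken from the slides.
SAMPLE = f"""<lexicon xmlns:dcr="{DCR_NS}">
  <entry>
    <feat att="partOfSpeech" dcr:datcat="{DATCAT_BASE}DC-396" val="noun"/>
    <feat att="writtenForm" val="clergyman"/>
  </entry>
</lexicon>"""


def collect_datcat_refs(xml_text):
    """Map each annotated feature name to the data category it references."""
    root = ET.fromstring(xml_text)
    key = f"{{{DCR_NS}}}datcat"  # ElementTree's {namespace}local notation
    refs = {}
    for elem in root.iter():
        pid = elem.get(key)
        if pid is not None:
            # Fall back to the element name when there is no att attribute.
            refs[elem.get("att", elem.tag)] = pid
    return refs


refs = collect_datcat_refs(SAMPLE)
```

In practice the slides recommend annotating the schema rather than every instance; the same lookup would then run over the Relax NG or XML Schema document.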

LMF Example
[XML listing not preserved; only fragments of seven data category references (/DC...) survive]

RDF – DC annotation property
The dcr:datcat RDF annotation property mimics the DC Reference vocabulary
– minimizes impact, i.e., allows the data model to use its own terminology
– can be tuned using OWL (2) equivalentClass, equivalentProperty or sameAs
– problem: annotating literals with simple Data Categories (names can be ambiguous)

    :headword dcr:datcat ... ;
        rdfs:label "head ..." ;
        rdfs:comment "A lemma heading a dictionary ..." .
    :partOfSpeech dcr:datcat ... ;
        rdfs:label "part of ..." ;
        rdfs:comment "A category assigned to a word based on its grammatical and semantic ..." .

RDF – directly use Data Category PIDs
Container Data Categories as RDF classes
Complex Data Categories as RDF properties
Simple Data Categories
– as RDF literals
    problem: names can be ambiguous
– as RDF classes (GrAF example vs ...)

    cat:DC-258 rdfs:label "head ..." ;
        rdfs:comment "A lemma heading a dictionary ..." .
    cat:DC-396 rdfs:label "part of ..." ;
        rdfs:comment "A category assigned to a word based on its grammatical and semantic ..." .

Data Category Relations
In the linked data world it's natural to have, next to structural, ontological relationships
– RDFS, OWL (2), SKOS, ...
But other resource/schema formats lack these features
Relationships between Data Categories (also across vocabularies) are important for federated search, i.e., to find semantically related resources in another archive

RELcat: a Relation Registry
Stores relationships among Data Categories and also with ‘other’ concept registries
– Dublin Core, OLAC, GOLD
– (OLiA, OntoLingAnnot)
– relationships can be the individual view of a (group of) linguist(s)
RELcat is a quad store (graph, subject, predicate, object)
Based on a ‘private’ relation type taxonomy so existing relationships specified in other vocabularies can easily be loaded
– OWL (2), SKOS
– normalized RELcat queries
The aim is to support various levels of traversing the semantic network, not formal reasoning
– conflicting (theoretical) views (parameters of variation)
– but within known combinations of sets reasoning may well be possible
– also targets semantic search outside of the RDF domain

Relation type taxonomy
1. related
   1. same as (a symmetric and transitive relationship)
   2. almost same as (a symmetric relationship)
   3. broader than (a transitive relationship and the inverse of the ‘narrower than’ relationship)
      1. superclass of (a transitive relationship and the inverse of the ‘subclass of’ relationship)
      2. has part (a transitive relationship and the inverse of the ‘part of’ relationship)
         1. has direct part (the inverse of the ‘direct part of’ relationship)
   4. narrower than (a transitive relationship and the inverse of the ‘broader than’ relationship)
      1. subclass of (a transitive relationship and the inverse of the ‘superclass of’ relationship)
      2. part of (a transitive relationship and the inverse of the ‘has part’ relationship)
         1. direct part of (the inverse of the ‘has direct part’ relationship)
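The taxonomy is what allows a query at a coarse level (e.g. ‘narrower than’) to also pick up relations stated at a finer level (e.g. ‘direct part of’). A minimal Python sketch of that lookup follows. The camelCase names are illustrative spellings of the relation types; only `sameAs` and `subClassOf` appear in this form elsewhere in the slides, and the helper names are hypothetical.

```python
# Child -> parent links, copied from the relation type taxonomy.
PARENT = {
    "sameAs": "related",
    "almostSameAs": "related",
    "broaderThan": "related",
    "narrowerThan": "related",
    "superClassOf": "broaderThan",
    "hasPart": "broaderThan",
    "hasDirectPart": "hasPart",
    "subClassOf": "narrowerThan",
    "partOf": "narrowerThan",
    "directPartOf": "partOf",
}


def ancestors(relation):
    """Return the more general relation types, nearest first."""
    chain = []
    while relation in PARENT:
        relation = PARENT[relation]
        chain.append(relation)
    return chain


def satisfies(relation, wanted):
    """True if a stored relation answers a query for the wanted type."""
    return relation == wanted or wanted in ancestors(relation)
```

For example, a stored `directPartOf` relation satisfies a query for `partOf`, `narrowerThan` or `related`, but not one for `broaderThan`.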

Relation
Prefixes declared: relcat, rel, dc, cat (namespace URIs not preserved)

    relcat:cmdi {
        cat:DC-2573 rel:sameAs dc:identifier .
        cat:DC-2482 rel:sameAs dc:language .
        ...
        cat:DC-2556 rel:subClassOf dc:contributor .
        cat:DC-2502 rel:subClassOf dc:coverage .
    }
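The quad structure (graph, subject, predicate, object) can be mimicked with plain Python tuples, which makes it easy to see how a named graph such as relcat:cmdi keeps one relation set apart from another. This is a toy sketch, not how RELcat is implemented; the `Quad` type and the `match` helper are hypothetical, and the four quads are the ones shown in the relation set above.

```python
from typing import List, NamedTuple, Optional


class Quad(NamedTuple):
    graph: str
    subject: str
    predicate: str
    obj: str


QUADS = [
    Quad("relcat:cmdi", "cat:DC-2573", "rel:sameAs", "dc:identifier"),
    Quad("relcat:cmdi", "cat:DC-2482", "rel:sameAs", "dc:language"),
    Quad("relcat:cmdi", "cat:DC-2556", "rel:subClassOf", "dc:contributor"),
    Quad("relcat:cmdi", "cat:DC-2502", "rel:subClassOf", "dc:coverage"),
]


def match(quads: List[Quad],
          graph: Optional[str] = None,
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Quad]:
    """Return the quads that agree with every position that is not None."""
    pattern = (graph, subject, predicate, obj)
    return [
        q for q in quads
        if all(want is None or want == have for want, have in zip(pattern, q))
    ]


same_as = match(QUADS, graph="relcat:cmdi", predicate="rel:sameAs")
```

Restricting the `graph` position is what lets one (group of) linguist(s) keep a personal relation set that does not interfere with anyone else's.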

Extension
1. related
   1. same as (a symmetric and transitive relationship)
      1. owl:equivalentClass
      2. owl:equivalentProperty
      3. owl:sameAs
      4. skos:exactMatch
   2. almost same as (a symmetric relationship)
      1. skos:closeMatch

Normalized query

    PREFIX rel: <...>
    PREFIX cat: <...>
    SELECT ?c WHERE { cat:DC-2482 rel:sameAs ?c . }

Finds the same-as clique for /languageID/ (DC-2482) specified in any vocabulary, e.g., RELcat (CMDI) for Dublin Core and annotated OWL for GOLD
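What "normalized" means here can be sketched in Python: predicates from OWL and SKOS are first mapped onto the registry's own relation types, and the same-as clique is then the set of everything reachable over the symmetric, transitive `rel:sameAs` relation. This is an illustration under stated assumptions, not RELcat's query engine. The spelling `rel:almostSameAs`, the `ex:` resources and the function name are hypothetical; the mapping itself follows the extension slide, and the first triple comes from the CMDI relation set.

```python
from collections import defaultdict, deque

# Vocabulary-specific predicates mapped onto the relation type taxonomy.
NORMALIZE = {
    "owl:equivalentClass": "rel:sameAs",
    "owl:equivalentProperty": "rel:sameAs",
    "owl:sameAs": "rel:sameAs",
    "skos:exactMatch": "rel:sameAs",
    "skos:closeMatch": "rel:almostSameAs",  # assumed spelling
}

TRIPLES = [
    ("cat:DC-2482", "rel:sameAs", "dc:language"),            # from the CMDI set
    ("ex:languageConcept", "owl:equivalentClass", "cat:DC-2482"),  # hypothetical
    ("cat:DC-2482", "skos:closeMatch", "ex:dialect"),        # hypothetical
]


def same_as_clique(start, triples):
    """Collect everything linked to start by normalized same-as relations."""
    neighbours = defaultdict(set)
    for subject, predicate, obj in triples:
        if NORMALIZE.get(predicate, predicate) == "rel:sameAs":
            # same as is symmetric, so record both directions
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first walk gives the transitive closure
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


clique = same_as_clique("cat:DC-2482", TRIPLES)
```

The `skos:closeMatch` triple is deliberately left out of the clique, since ‘almost same as’ is symmetric but not transitive in the taxonomy.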

Semantic network
[Figure labels: Linguistic knowledge base; Linguistic resource (schema); Data Category Registry - ISOcat (data categories, containers); Concept Registry (concepts); Relation Registry - RELcat (relation); Schema Registry - SCHEMAcat]

Status
– ISOcat: in production, mainly lacking in standardization
– RELcat: alpha version gives read-only access to some relation sets, lacking some reasoning and UI
– SCHEMAcat: design phase

Thank you for your attention! Visit Questions?