ISO TC 37/CLARIN SEMANTIC DATA REGISTRY WORKSHOP UTRECHT, DECEMBER 9 2013 ISOcat: Metadata Registry SUE ELLEN WRIGHT DECEMBER 2013.

Presentation transcript:

ISO TC 37/CLARIN SEMANTIC DATA REGISTRY WORKSHOP, UTRECHT, DECEMBER 9, 2013
ISOcat: Metadata Registry
SUE ELLEN WRIGHT, DECEMBER 2013

Terminology Communities of Practice
- Object-oriented terminology
  - Thesauri and controlled language, library community
  - Retrieval of objects and information
- Discourse-oriented terminology
  - Text & discourse production
  - Semantic modeling of concept relations
- Metadata-oriented terminology
  - Definition of metadata
  - Semantic registries for facilitation of interoperability

ISOcat History as a Metadata Registry
- Long evolution within ISO TC 37, Terminology and other language and content resources
- Metadata Registry (MDR) in the spirit of ISO/IEC
- Not intended as a concept database nor as a terminology database
- ISO 1087 not designed to reflect actual data element names and concepts (commonly referred to in TC 37 as Data Categories) used in terminological resources or in terminology concept systems or other ontological resources

ISO TC 37 Terminology Standards
- ISO TC 37 terminology was originally housed in two paper standards, ISO 1087 parts 1 and 2
- Devoted to discourse-oriented terminology used primarily in the standards of ISO TC 37, SC 3, Systems to manage terminology, knowledge and content
- Terms currently housed in the iTerm resource TC37/TC37
- Not compatible with linked data: no PIDs, not exportable in any formalism
- ISO 1087 terms not necessarily designed to reflect actual data element names and concepts (commonly referred to in TC 37 as Data Categories) used in modeling terminological or ontological data
- Overlaps in usage between terminology and data modeling represent serendipitous convergence: common usage, but not necessarily identical

Early Development
- Collaboration with ISO/IEC JTC 1/SC 32, Metadata
- Standardization of the data categories used in terminology and other language resources
- Growing and urgent industry demands for unambiguous, highly efficient interchange of terminological data in localization environments
- Standards:
  - ISO 16642, a high-level metamodel for concept-oriented terminology databases
  - ISO 12620, original paper list of data category specifications
  - ISO 30042, TermBase eXchange format (TBX), for data collections that conform to the standard

ISO/IEC Family of Standards
- Data modeling combines a wide “concept” with an “object class” to form a more specific “data element concept”.
- Example: “grammatical gender” is defined by the broad concept “grammatical category” combined with the limiting characteristic “grammatical relationships between words in sentences” to define the data element concept.
- The specification of this DC includes its definition, its datatype, and, in the case of a DC for which there exists a constrained set of values, its conceptual domain in the form of a set of permissible instances.
- In the DCR as realized, object classes are treated as complex data categories and permissible instances are treated as simple data categories.
- Not just semantics: closely application-oriented
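The relationship between a complex data category and its conceptual domain can be sketched as a small data structure. This is a minimal illustration, not the ISOcat data model itself: the class names `SimpleDC` and `ComplexDC`, the field names, the placeholder definitions, and the three gender values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SimpleDC:
    """A permissible instance (a value) inside a conceptual domain."""
    identifier: str
    definition: str


@dataclass
class ComplexDC:
    """A data category that takes values (hypothetical field layout)."""
    identifier: str
    definition: str
    datatype: str = "string"
    # Empty list = open DC; non-empty list = closed DC.
    conceptual_domain: List[SimpleDC] = field(default_factory=list)

    def is_closed(self) -> bool:
        return bool(self.conceptual_domain)

    def permits(self, value: str) -> bool:
        """Open DCs accept any value; closed DCs only listed instances."""
        if not self.is_closed():
            return True
        return any(dc.identifier == value for dc in self.conceptual_domain)


# Illustrative values only; definitions are placeholders.
gender = ComplexDC(
    identifier="grammaticalGender",
    definition="(placeholder definition)",
    conceptual_domain=[
        SimpleDC("masculine", "(placeholder)"),
        SimpleDC("feminine", "(placeholder)"),
        SimpleDC("neuter", "(placeholder)"),
    ],
)
```

With this sketch, `gender.permits("neuter")` is true and `gender.permits("dual")` is false, while a `ComplexDC` created without a conceptual domain accepts any value.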

ISO 12620:1999 & Core Attributes
- PID (old ID)
- DC name / identifier (e.g., grammaticalGender)
- DC definition
- Note
- Example
- List of permissible instances in the case of closed DCs (values themselves defined as simple DCs)
- (Schemas use the camel case identifier form)
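The camel case identifier form mentioned above (as in grammaticalGender) can be derived mechanically from a human-readable name. The helper below is a hypothetical sketch of one such derivation; the function name and the exact normalization rules are assumptions, not a rule taken from the standard.

```python
import re


def to_camel_case(name: str) -> str:
    """Turn a space- or hyphen-separated name into lowerCamelCase."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    if not words:
        return ""
    head, *tail = words
    # First word lower-cased, later words capitalized. Note that this
    # simple rule also flattens acronyms ("TBX" becomes "Tbx").
    return head.lower() + "".join(w[:1].upper() + w[1:].lower() for w in tail)
```

For instance, `to_camel_case("grammatical gender")` gives `grammaticalGender`, matching the identifier on the slide.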

SYNTAX to ISOcat
- The LIRICS-related SALT project produced SYNTAX, a precursor Metadata Registry strictly for ISO data.
- The CLARIN-based ISOcat project expanded to include a wider range of language resources:
  - Influenced by a dictum from ISO Central Secretariat to enable the extraction of metadata definitions into a broadly conceived concept database, then planned for implementation by the ISO Central Secretariat
  - Supported by a two-stage balloting procedure (since proven to be unworkable) that mirrored the procedures used in customary ISO balloting for paper standards
  - Centered on the ISO approach to the creation of a Metadata Registry

Core Functionalities in ISOcat
- Rigorous definition of core classes (identified in our literature as complex data categories)
- Specification of itemized value domains where relevant (complex closed DCs)
- Data element name agnostic (i.e., specification of synonyms and multilingual equivalent names)
- The ability to group, regroup and subset critical data category selections
- Ability to output data specifications in readily readable (HTML) and processable form (rdf, rng, wsd, etc.)
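Being "data element name agnostic" means that synonyms and equivalents in other languages all point to the same data category. A minimal sketch of such an index follows; the class `NameIndex`, its methods, and the PID `DC-0001` are hypothetical and stand in for whatever the registry does internally.

```python
from typing import Dict, Iterable, Optional, Tuple


class NameIndex:
    """Maps (language, name) pairs to a single data category PID."""

    def __init__(self) -> None:
        self._by_name: Dict[Tuple[str, str], str] = {}

    def register(self, pid: str, names: Iterable[Tuple[str, str]]) -> None:
        # Each name is a (language code, name) pair; case is ignored.
        for language, name in names:
            self._by_name[(language, name.casefold())] = pid

    def resolve(self, language: str, name: str) -> Optional[str]:
        return self._by_name.get((language, name.casefold()))


index = NameIndex()
index.register(
    "DC-0001",  # hypothetical PID
    [("en", "part of speech"), ("en", "word class"), ("de", "Wortart")],
)
```

Here `index.resolve("en", "Word Class")` and `index.resolve("de", "wortart")` both return the same PID, so a resource can keep its own element names and still refer to one shared definition.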

DATA CATEGORY SPECIFICATIONS The DCR Entry

ISOcat DC Specification – Header
- Header info: Key & PID; Type; Owner; Scope
- Critical feature: PID universally resolvable through RESTful interface

PID Resolution
- Yields:
- Designed to serve as a reference from other resources on the web
- Capable of supporting external relation registries or other ontological resources that might in future replace DCR-related functionalities
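Resolving a PID through a RESTful interface amounts to an HTTP GET on the identifier's URL. The sketch below only builds such a request and does not send it. The base URL, the PID `DC-0001`, and the use of an `Accept` header to choose between a readable and a processable representation are assumptions for illustration, not documented ISOcat behavior.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical base URL; substitute the registry's real PID prefix.
BASE_URL = "https://registry.example.org/datcat/"


def build_pid_request(pid: str, media_type: str = "text/html") -> Request:
    """Build (but do not send) a GET request for a data category PID."""
    return Request(BASE_URL + quote(pid), headers={"Accept": media_type})


req = build_pid_request("DC-0001", "application/rdf+xml")
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup against a live registry.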

PID Resolution

ISOcat DC Administrative Information
- Administrative section
- Contains quite a bit of redundant or unnecessary information
- Could be reduced or parts hidden

ISOcat DC Description Section
- Data element name / English language name
- Data element definition (one and only one)
- Examples, explanations, notes, sources
- Repeatable by language
- Note: can become much more complex than shown here

Conceptual Domain, Linguistic Section
- Conceptual Domain (links to permissible instances)
- Language-specific constraints

Link to a Simple DC in the Conceptual Domain
- Click individual item to display its DC spec
- Note: linked items are simple DCs

Multiple Conceptual Domains Part of speech – Morphosyntax To be continued …

Multiple Conceptual Domains Part of speech – Terminology
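A single data category such as part of speech can therefore carry a different conceptual domain for each profile. The sketch below shows one way to represent that; the dictionary layout, the function names, and the listed values are illustrative assumptions, and the real domains are the ones defined in the registry.

```python
from typing import Dict, Set

# Illustrative values only, keyed by data category and then by profile.
CONCEPTUAL_DOMAINS: Dict[str, Dict[str, Set[str]]] = {
    "partOfSpeech": {
        "Morphosyntax": {"noun", "verb", "adjective", "adverb", "determiner"},
        "Terminology": {"noun", "verb", "adjective", "adverb"},
    }
}


def permitted_values(dc: str, profile: str) -> Set[str]:
    """Return the values a profile allows for a DC (empty set if none)."""
    return CONCEPTUAL_DOMAINS.get(dc, {}).get(profile, set())


def shared_values(dc: str, profile_a: str, profile_b: str) -> Set[str]:
    """Values that two profiles have in common for the same DC."""
    return permitted_values(dc, profile_a) & permitted_values(dc, profile_b)
```

Comparing the two profiles with `shared_values("partOfSpeech", "Morphosyntax", "Terminology")` shows which values can be exchanged between resources without mapping.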

DECLARING DOMAIN & APPLICATION- SPECIFIC SUBSETS Data Category Selections

User Access & Data Category Selections
[Diagram labels: DC Selections; Selected DCS; Selected DC; User’s “Basket”; Potential New DCS]

Private Workspace
Registered users can create their own DCSs either by creating new entries or collecting existing DCs into their own new DCSs. DCs are infinitely reusable and referenceable.
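Because a DCS collects references to DCs rather than copies, the same DC can appear in any number of selections. A minimal sketch, assuming a selection is just an ordered list of PIDs; the class name, the selection names, the owners, and the PIDs are all hypothetical.

```python
from typing import List


class DataCategorySelection:
    """A named, owned collection of references (PIDs) to data categories."""

    def __init__(self, name: str, owner: str) -> None:
        self.name = name
        self.owner = owner
        self.pids: List[str] = []

    def add(self, pid: str) -> None:
        # Store the reference once; the DC itself stays in the registry.
        if pid not in self.pids:
            self.pids.append(pid)


def selections_using(pid: str, selections: List[DataCategorySelection]) -> List[str]:
    """Names of all selections that reference the given DC."""
    return [s.name for s in selections if pid in s.pids]


lexicon = DataCategorySelection("My lexicon DCS", owner="user-a")
termbase = DataCategorySelection("My termbase DCS", owner="user-b")
for dcs in (lexicon, termbase):
    dcs.add("DC-0001")  # the same DC reused in both selections
lexicon.add("DC-0002")
```

In this example `selections_using("DC-0001", [lexicon, termbase])` lists both selections, while `DC-0002` belongs only to the first.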

Going Public
Owners can declare a DCS (or a DC) public or share it with a selected group.

Create/Edit Modes
Owners or authorized registered members of a sharing group can edit existing entries or create new ones.

Quality Check
Specs that are incomplete or that violate the rules for proper form trigger QA warnings, which can be resolved by correcting the entries.
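A quality check of this kind can be sketched as a function that inspects a specification and returns a list of warnings. The three rules below are assumptions chosen for illustration (a camel case identifier, a required English section, and exactly one definition per language section), as is the dictionary layout of `spec`; ISOcat's actual rule set may differ.

```python
import re
from typing import Dict, List


def qa_warnings(spec: Dict) -> List[str]:
    """Return human-readable warnings for a DC spec (hypothetical rules)."""
    warnings: List[str] = []

    identifier = spec.get("identifier", "")
    if not re.fullmatch(r"[a-z][A-Za-z0-9]*", identifier):
        warnings.append(f"identifier {identifier!r} is not in camel case")

    sections = spec.get("sections", {})
    if "en" not in sections:
        warnings.append("missing English language section")

    # Each language section should carry one and only one definition.
    for language, section in sorted(sections.items()):
        count = len(section.get("definitions", []))
        if count != 1:
            warnings.append(
                f"{language}: expected exactly one definition, found {count}"
            )
    return warnings


good = {
    "identifier": "grammaticalGender",
    "sections": {"en": {"definitions": ["(placeholder definition)"]}},
}
bad = {"identifier": "Grammatical Gender", "sections": {"fr": {"definitions": []}}}
```

Calling `qa_warnings(good)` returns an empty list, while `qa_warnings(bad)` reports all three problems; correcting the entry and re-running the check clears them.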

Sharing
Sharing groups show up in one’s private pane in the interface.

Sharing Shared selection

Recommended DCs
Moving away from the standardization concept, groups can less formally identify DCs as recommended for a certain context. DCSs can then be standardized in relevant ISO standards.

Standardized DCSs
Standardization is more readily realized by listing the DCS in the relevant ISO standard and instantiating the DCS list in the DCR.
ISO 24611:2012, Language resource management – Morpho-syntactic annotation framework (MAF)

Data Outputs Human-readable HTML representation

Data Outputs Processable data outputs