The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Metadata Component Framework Possible Standardization Work.

Slides:



Advertisements
Similar presentations
DC8 Registries Breakout. Goals of the session Discuss and clarify : Requirements for registry Framework for policy Relate issues raised to EOR prototype.
Advertisements

DC2001, Tokyo DCMI Registry : Background and demonstration DC2001 Tokyo October 2001 Rachel Heery, UKOLN, University of Bath Harry Wagner, OCLC
Agents and Authority Linking Breakout sessions Oct 4 2-4, Oct 5 3-4:30 Goals: Explore issues in Agent discovery. Do we need an Agents working group? If.
IRCS Workshop on Open Language Archives IMDI & Endangered Languages Archives Heidi Johnson / AILLA.
Building metadata components Dieter Van Uytvanck Max Planck Institute for Psycholinguistics CLARIN-NL Info Session Nijmegen
CLARIN Metadata & ISO DCR Daan Broeder. Max-Planck Institute for Psycholinguistics TKE ES05 Workshop, August 14th Dublin.
Putting together a METS profile. Questions to ask when setting down the METS path Should you design your own profile? Should you use someone elses off.
Interoperability aspects in the The Virtual Language Observatory Dieter Van Uytvanck Max Planck Institute for Psycholinguistics
ISOcat Data Category Registry Defining widely accepted linguistic concepts Menzo Windhouwer 1CLARIN-NL MD tutorial, September 2009.
Advanced Metadata Usage Daan Broeder TLA - MPI for Psycholinguistics / CLARIN Metadata in Context, APA/CLARIN Workshop, September 2010 Nijmegen.
Interoperability Aspects in Europeana Antoine Isaac Workshop on Research Metadata in Context 7./8. September 2010, Nijmegen.
From CLARIN Component Metadata to Linked Open Data
ISOcat introduction 19 June 20121CLARIN-NL ISOcat workshop.
Flexible Syntax and Concept Registries as a basis for Metadata Daan Broeder TLA - MPI for Psycholinguistics & CLARIN Metadata in Context, APA/CLARIN Workshop,
Data Category specifications 19 June 20121CLARIN-NL 2012 ISOcat tutorial.
MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information ISO TC37 SC4 WG Samuel Cruz-Lara, Gil Francopoulo, Laurent Romary,
The current state of Metadata - as far as we understand it - Peter Wittenburg The Language Archive - Max Planck Institute CLARIN Research Infrastructure.
UKOLUG - July Metadata for the Web RDF and the Dublin Core Andy Powell UKOLN, University of Bath UKOLN.
ISO as the metadata standard for Statistics South Africa
CLARIN-NL First Call Jan Odijk CLARIN-NL Kick-off Meeting Utrecht, 27 May 2009.
Update on INSPIRE: INSPIRE maintenance and implementation and INSPIRE related EEA activities on biodiversity CDDA/European protected areas technical meeting.
Metadata Standards and Applications 5. Applying Metadata Standards: Application Profiles.
CLARIN-NL Second Open Call Jan Odijk CLARIN-NL Call 2 Info-session Amsterdam, 26 Aug 2010.
Agenda CMDI Workshop 9.15 Welcome 9.30 Introduction to metadata and the CLARIN Metadata Infrastructure (CMDI) 10.15Coffee 10.30Use of ISOCat within CMDI.
CLARIN web services and workflow Marc Kemps-Snijders.
The ISO-DCR 17 January /20111CMDI tutorial Marc Kemps-Snijders a, Menzo Windhouwer b, Sue Ellen Wright c a Meertens Institute, b MPI for.
Sharing Resources in CLARIN-NL Jan Odijk, Arjan van Hessen LRTS Workshop IJCNLP Chiang Mai, Thailand, 12 Nov 2011.
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Increasing the usage of endangered language archives in the.
ISOcat demo and providing RELcat input Menzo Windhouwer The Language Archive tla.mpi.nl Data Archiving and Networked Solutions
CLARIN-NL Call 3 ISOcat follow-up 10/10/20121CLARIN-NL ISOcat Call 3 follow-up.
Content of the Data Category Registry 10 May /20111CLARIN-NL ISOcat workshop.
Metadata & CMDI CLARIN Component Metadata Infrastructure Daan Broeder et al. Max-Planck Institute for Psycholinguistics CLARIN NL CMDI Metadata Tutorial.
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Why should we invest in DWF? Peter Wittenburg CLARIN Research.
CMDI Component Registry Patrick Duin Max Planck Institute for Psycholinguistics 2011.
CLARIN Metadata Infrastructure Component Metadata and intermediate solutions Daan Broeder Claus Zinn Dieter van Uytvanck - Max-Planck Institute for Psycholinguistics.
24 Jan 2005 Kick off meeting (Luxembourg) 1 LIRICS Linguistic Infrastructure for Interoperable Resources and Systems ►Kick off meeting presentation ►Proposal.
CLARIN-NL Call 4 ISOcat follow-up 2/10/20131CLARIN-NL Call 4 ISOcat follow-up.
Linguistics with CLARIN Storing resources in CLARIN Jan Odijk LOT Winterschool Amsterdam,
Customizing the IMDI metadata schema for endangered languages Heidi Johnson (AILLA) Arienne Dwyer (DOBES)
1  Bob Hager Director of Publishing Standards Metadata Specification.
ISOcat introduction 20 March 20121CLARIN-NL ISOcat workshop.
11 CMDI/ISOcat And Semantic Operability Ineke Schuurman ISOcat content coördinator CLARIN-NL Menzo Windhouwer ISOcat system administrator Utrecht
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands NP CMDI-1 Metadata Component Framework New Standardization.
CLARIN Issues Peter Wittenburg MPI for Psycholinguistics Nijmegen, NL.
Technology – Broad View Aspects that play a role when integrating archives leave the details of some core topics to the 2. day Bernhard Neumair:Base Technologies.
A Data Category Registry- and Component- based Metadata Framework Daan Broeder et al. Max-Planck Institute for Psycholinguistics LREC 2010.
Beyond ISOcat 20 June 2013CLARIN-NL ISOcat tutorial1.
1 Understanding Cataloging with DLESE Metadata Karon Kelly Katy Ginger Holly Devaul
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands TLA/MPI requirements for a Semantic Registry.
Agenda CMDI Tutorial 9.30 Welcome & Coffee Introduction to metadata and the CLARIN Metadata Infrastructure (CMDI) 10.30CMDI & ISO-DCR 10.50The CMDI.
ISO TC 37/CLARIN SEMANTIC DATA REGISTRY WORKSHOP UTRECHT, DECEMBER ISOcat: Metadata Registry SUE ELLEN WRIGHT DECEMBER 2013.
CLARIN Concept Registry: the new semantic registry Ineke Schuurman, Menzo Windhouwer, Oddrun Ohren, Daniel Zeman
ISOcat status
Creating & Testing CLARIN Metadata Components A CLARIN-NL project Folkert de Vriend Meertens Institute, Amsterdam 18/05/2010.
Formats, interoperability and standards Marc Kemps-Snijders.
ISO TC 37/CLARIN DISCUSSION UTRECHT, DECEMBER 9/ Thinning Down a Bloated Cat SUE ELLEN WRIGHT DECEMBER 2013.
ISOcat tutorial DCR data model and guidelines. Simple and complex DCs Simple Data CategoryComplex Data CategoryConceptual Domain Data CategoryDescription.
A Data Category Registry- and Component- based Metadata Framework Daan Broeder et al. Max-Planck Institute for Psycholinguistics LREC 2010.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
Group work and standardization features in ISOcat Menzo Windhouwer 8/14/20101Standardizing Data Categories in ISOcat - Implementing Group.
ISWG / SIF / GEOSS OOSSIW - November, 2008 GEOSS “Interoperability” Steven F. Browdy (ISWG, SIF, SCC)
ISWG / SIF / GEOSS OOS - August, 2008 GEOSS Interoperability Steven F. Browdy (ISWG, SIF, SCC)
Online Information and Education Conference 2004, Bangkok Dr. Britta Woldering, German National Library Metadata development in The European Library.
IPDA Registry Definitions Project Dan Crichton Pedro Osuna Alain Sarkissian.
ISOcat introduction 10 May /20111CLARIN-NL ISOcat workshop.
Marc Kemps-Snijders Menzo Windhouwer Sue Ellen Wright
The Re3gistry software and the INSPIRE Registry
Proposal of a Geographic Metadata Profile for WISE
ViCoS Visualising Conceptual Spaces
CMDI Component Registry
Presentation transcript:

The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Metadata Component Framework Possible Standardization Work within TC37/SC4

Status of ISOCat Metadata Thematic Domain Based on extensive discussions with experts (Athens,…) Analysing existing metadata sets & inventories: OLAC, IMDI, ENABLER, … Input from CLARIN EU community that also provided translations Dedicated CLARIN NL metadata project Currently 220 metadata DCs

Further steps Time to take stock of the situation How to proceed with standardization? Metadata TDG chair proposal is: – Today walk through the current set of metadata DCs and take note of all comments – Start round two, contact all previous contributors (incl. those that were active providing translations) – After Christmas, the TDG will take charge again ISOCat supported standardization procedures are in place, but is this sufficient? – TDG members representative enough? – DCs ownership not completely transparent – Maybe just startup problems or …

Standardization process Create “submission group”, collection of DC owners Create (one or more coherent) DC selection(s) After submission, the TDG chair forms a decision group {chair, at least one other member} Decision group decides on all submitted DCs This is further forwarded to the DCR board DCR board has to validate and bring to publication

DCR Decision process

Standardization & Metadata Components Assumption is that there is broad agreement about – limitations existing metadata schemas: DC/OLAC, IMDI, TEI header Inflexible: too many (IMDI) or too few (OLAC) metadata elements Limited interoperability (both semantic and syntactic) Problematic (unfamiliar) terminology for some sub- communities. Limited support for LT tool & services descriptions – Address this by: Explicit defined schema & semantics User/project/community defined components

Metadata Components NOT a single new metadata schema but rather allow coexistence of many (community/researcher) defined schemas with explicit semantics for interoperability How does this work? Components are bundles of related metadata elements that describe an aspect of the resource A complete description of a resource may require several components. Components may contain other components Components should be designed for reusability

Metadata Components Technical Metadata Sample frequency Format Size … Lets describe a speech recording

Metadata Components Language Technical Metadata Name Id … Lets describe a speech recording

Metadata Components Language Technical Metadata Actor Sex Language Age Name … Lets describe a speech recording

Metadata Components Language Technical Metadata Actor Location … Continent Country Address Lets describe a speech recording

Metadata Components Language Technical Metadata Actor Location Project … Name Contact Lets describe a speech recording

Metadata Components Language Technical Metadata Actor Location Project Metadata schema Metadata description Lets describe a speech recording Component definition XML W3C XML Schema XML File Profile definition XML Metadata profile

ActorLanguage Recursive Recursive Component model Components can contain other components Enhances reusability Actor Address Location Project

Country dcr:1001 Language dcr:1002 Location Country Coordinates Actor BirthDate MotherTongue Text Language Title Recording CreationDate Type Component registry BirthDate dcr:1000 ISOcat concept registry user Dance Name Type Semantic interoperability partly solved via references to ISO DCR or other registry Selecting metadata components & profiles from the registry Title: dc:title DCMI concept registry Reusability & Explicit Semantics User selects appropriate components to create a new metadata profile or selects an existing profile ISOCat or ISO DCR implementation of ISO standard for data categories under control of the linguistic community ISO TC37 Metadata is just one of the seven “thematic domains”

How to proceed? Decouple separate issues? Standardization of metadata DCs in the ISO-DCR Defining Requirements for a Metadata Component Model Standardizing the Component Model itself Standardizing a Component Specification Language Design/Specify a number of recommended components for specific data types and usages. Of course building on existing or continuing work like ISOCat, PISA, … where possible

Requirements for the component model For example: Component has attributes: name, multiplicity, concept- link, … Component model should support recursion A component contains a number of metadata elements A metadata element has a: name, value-scheme, multiplicity, concept link A component can refer to a number of resources or to other metadata components A component can contain information about resource relations A component grammar has to be fully deterministic to avoid ambiguity

Metadata Component Model Should embody the requirements without defining a component specification language As an example the CLARIN CMDI component model

Component Specification Language clarin.eu:cr1:c_ iso The list of ISO language families. Based on: aav afa alg alv [...] CLARIN Component example: ISO-635 component

ISO Recommended Components Our CMDI experience is that we need to limit the proliferation of components Offer a set of standardized ones for use with – specific data-types – for specific purposes

Reference implementation CLARIN has been working on: – Metadata component registry and editor – Metadata editor If a potential ISO standard for component model and specification is not too different from CLARIN requirements and practice these could serve as a reference implementation

Persistent Identifier Use All is based on the recent FDIS PISA Cool URIs for the concept links to ISOCat and ISOCDB All references to resources and metadata can contain PIDs

The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Thank you for your attention