The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands TLA/MPI requirements for a Semantic Registry.



Agenda
9:30 Introduction: Daan Broeder
10:00 CLARIN requirements: Menzo Windhouwer
10:45 coffee & tea
11:15 TLA requirements: Daan Broeder
11:45 ISO TC 37 requirements: Sue Ellen Wright
12:30 lunch
13:30 discussion
15:00 coffee & tea
15:30 discussion & conclusion - Sebastian Drude

ISOcat History & TLA
ISOcat precursor: Syntax (INRIA)
< 2007: LIRICS project
TLA involvement with ISOcat 2007 – 2013
MPI as RA since 2008
Support & funding
– MPG specific support for a year
– RELISH: lexicon interoperability
– CLARIN: semantic interoperability: metadata, tagsets
– TLA proper resources
Several drives & cycles of modifications
– Improvements for usability
– Interoperability with other tools: CLARIN CMDI tools
– To support the ISO standardization workflow
– To support community driven standardization

Status
Registration Authority for the ISO-DCR with MPI-PL since 2008
MPI/TLA responsible for the maintenance and extension of the ISO-DCR
– ISO TC37 decides on (standardized) content
– ISO TC37 has inspired functional extensions
Always good discussion under the supervision of ISO TC37 at the plenaries and beyond
As yet no standardized DCs
All ISO inspired ISOcat processes seem stuck

TLA perception wrt. ISOcat issues
Actual use of the ISO-DCR by the user community is below our expectations. Causative factors:
– The complex model of the DCR confuses many users for whom a simple concept definition would be sufficient, but who are now confronted with a user interface that is confusing to them
– The ISO standardization process does not seem to fit the non-terminology linguistic community
– Not a single DC has been standardized
– Current content of many DCs is of low quality and does not inspire participation
– It is especially in research infrastructure projects such as CLARIN, where resource providers have been 'forced' to use the ISO-DCR, that we noticed the aforementioned factors
11 June 2013, ISO TC 37 plenary - DCR meeting

ISOcat Future
Although we are positive about the contribution that the work on the ISO-DCR has made to the discussion and concept development on the need for semantic interoperability, we feel that we should also consider alternatives.
For this we plan to organise a conference in Q where alternative solutions for semantic interoperability can be compared and discussed, and we will of course invite interested people from the TC 37 community to participate.
The outcome of such discussions may lead to MPI-PL changing its use of and support for the current form of the ISO-DCR.
Of course MPI-PL will assist any other party that is prepared to take over the running and maintenance of the ISO-DCR in its current form and become registration authority in our place.
In any case the MPI-PL will honor its commitments regarding the persistency of DCs.

TLA SEMANTIC REGISTRY REQUIREMENTS

What does TLA want for a SR?
A Semantic Registry that we can use with our current and future projects to:
– make semantics explicit: documentation
– support semantic interoperability: functionality
Has some persistency (longer than a project's lifetime)
Supports community processes
– Community coordination, recommendation etc.
Visible content can be controlled
The whole package GUI+content is sufficiently attractive to convince people to use it
Can be used by adjacent communities from SSH
Can be maintained at acceptable costs
– Share with other communities
– Use readily available software (as a basis)

TLA - projects
CLARIN + associated: Joint domain of LR and LT
– ISOcat + RELcat:
– Tagset interoperability
– Architecture needs: PID + term + description + examples + API + relations + coordination
DASISH: Social Sciences & Humanities common services
– Metadata interoperability needs: PID + term + description. Use handcrafted equivalences based on schema inspection and docs
– Tools & Services registry
EUDAT: general Data Management Services
– Metadata interoperability needs: PID + term + description. Use handcrafted equivalences based on schema inspection and docs
– Semantic services: still in planning stage
RDA: Research Data Alliance, for now just talking
– Data Type Registry WG has implementation that can use a SR: PID + term + description
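The "handcrafted equivalences" mentioned for DASISH and EUDAT can be pictured with a small sketch. It assumes the equivalences are kept as a hand-written table that maps a field in a metadata schema to a concept PID. The schema names, field names, PID strings and the `translate` helper are all hypothetical and are not taken from any DASISH or EUDAT implementation.

```python
# Hypothetical hand-crafted equivalence table: (schema, field) -> concept PID.
# In practice such a table would be written by hand after inspecting schemas and docs.
EQUIVALENCES = {
    ("schema_a", "title"): "pid:example/title",
    ("schema_b", "name"): "pid:example/title",
    ("schema_a", "creator"): "pid:example/creator",
    ("schema_b", "author"): "pid:example/creator",
}


def translate(record: dict[str, str], source: str, target: str) -> dict[str, str]:
    """Rename the fields of `record` from schema `source` to schema `target`.

    Two fields are treated as equivalent when they point to the same PID.
    Fields without a known equivalent in the target schema are dropped.
    """
    # Invert the table for the target schema: PID -> target field name.
    pid_to_target = {
        pid: field_name
        for (schema, field_name), pid in EQUIVALENCES.items()
        if schema == target
    }
    translated = {}
    for field_name, value in record.items():
        pid = EQUIVALENCES.get((source, field_name))
        if pid in pid_to_target:
            translated[pid_to_target[pid]] = value
    return translated


record_a = {"title": "Corpus X", "creator": "A. Author", "extra": "no equivalent"}
record_b = translate(record_a, "schema_a", "schema_b")
```

Here only the PID is used for matching; term and description would serve the human who writes the table.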

A (very (too?)) simple model
PID
Term
Description
Examples
Key/value
For grouping concepts e.g. taxonomies and community specific purposes a set of key/value pairs is very powerful
(Can support limited relations and typing when you need that)
Principals and privileges
Key/value
Manage access and collaboration
Organize the concepts
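A minimal sketch of how this simple model could look as a data structure, under stated assumptions: a concept carries PID, term, description, examples and free key/value pairs, and a registry stores privileges per principal. The class names, the "read"/"edit" privilege names and the example PIDs are hypothetical, and reading "principals and privileges" as access control and "key/value" as the grouping mechanism is an interpretation of the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One registry entry: PID, term, description, examples, key/value pairs."""
    pid: str
    term: str
    description: str
    examples: list[str] = field(default_factory=list)
    properties: dict[str, str] = field(default_factory=dict)


@dataclass
class Registry:
    concepts: dict[str, Concept] = field(default_factory=dict)
    # principal -> privileges; the privilege names are assumptions
    privileges: dict[str, set[str]] = field(default_factory=dict)

    def add(self, principal: str, concept: Concept) -> None:
        """Store a concept if the principal holds the 'edit' privilege."""
        if "edit" not in self.privileges.get(principal, set()):
            raise PermissionError(f"{principal} may not edit the registry")
        self.concepts[concept.pid] = concept

    def group(self, key: str, value: str) -> list[Concept]:
        """Return all concepts that carry the given key/value pair."""
        return [c for c in self.concepts.values() if c.properties.get(key) == value]


registry = Registry(privileges={"alice": {"read", "edit"}, "bob": {"read"}})
registry.add("alice", Concept(
    pid="pid:example/noun",
    term="noun",
    description="A word class that typically names entities.",
    examples=["house"],
    properties={"taxonomy": "part-of-speech"},
))
registry.add("alice", Concept(
    pid="pid:example/title",
    term="title",
    description="The name given to a resource.",
    properties={"taxonomy": "metadata"},
))
part_of_speech = registry.group("taxonomy", "part-of-speech")
```

Grouping by a key/value pair is enough for a flat taxonomy; relations between concepts would need either a convention on the values (for example a PID as value) or a separate component.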

What do we currently have?
ISOcat works, but not very well.
Content quality is not always very good and people are reluctant to add.
– Partly for historic reasons (SYNTAX)
– Frequent errors made by contributors: complex model
– Proliferation of DCs
Not able to find useful existing definitions
Insufficient coordination
GUI too complex and slow, caused partly by complex model
Standardization process did not work
– Required modifications to support community coordination
Difficult to convince our own community, certainly difficult to convince colleagues from other disciplines
Expensive in maintenance, both software and coordination
We spent much time implementing unused functionality

TLA Strategies
A. ISOcat stays as it is
– MPI stays RA and responsible for development
– ISO link is maintained
– TLA: costs are high, little synergy with other projects
B. TLA stops as RA
– Lower costs
– ISO link is maintained
– Find suitable partner as RA
– TLA: develops/uses LW SR
C. Adapt ISOcat, leaner model, no stdz. workflow
– Costs for keeping static image of current DCs
– Possibly more synergy, more agile
– Move functions (typing) to RELcat
– Easier maintenance
D. Hybrid model: light-weight/open SR and ISOcat (closed SR)
– Costs do not diminish
– Better management for communities
– ISO link is maintained
E. New Semantic registry: lightweight SR
– Costs for keeping static image of current DCs
– Possibly existing product (SKOS based, MediaWiki, …), no maintenance
– Possibly more synergy, shared development
– ISO link severed, more agile
F. (Mis)use existing CMS such as DRUPAL, PLONE
– Costs for keeping static image of current DCs
– Process functions taken care of
– But data-entry, forms have to be programmed
– PIDs?
– Little maintenance
– ISO link severed, more agile

Cost / Benefits (TLA centric)
Columns: action, Maint., Initial costs, CLARIN, Synergy other projects, ISO compatibility
Keep ISOcat as it is
ISOcat -> other RA, TLA new LW Reg
Leaner ISOcat model without ISO stdz (ISO requirements?)
Hybrid ISOcat + LW Reg
New LW Reg
Use existing CMS: 0--++0