Populating the Infrastructure using Standards
Daan Broeder, CLARIN NL EB, TLA - MPI for Psycholinguistics
CLARIN Coordinators Meeting, June 29-30, Budapest



CLARIN NL Context

- 4 Dutch CLARIN centers, each with their own interests and traditions
  - DANS, Dutch Academy data archiving service
  - INL, Dutch Institute for Lexicography
  - Meertens Institute, Dutch dialects and language variation
  - MPI for Psycholinguistics, endangered languages, acquisition corpora
- Different cross-center relations
  - Organizational relations
  - Past and existing project cooperation
  - Can all lead to different preferences for technical solutions, interoperability approaches and data formats
- All have production environments that need to deliver services, so they tend to be conservative with changes
  - New technology needs to be understood first, and usually parallel systems are created
  - General adaptations for CLARIN requirements can only be introduced slowly
- Although centers made commitments, resources are limited.

CLARIN NL Goals

- Build and support relevant central infrastructure services
- Guide harmonizing the relevant practices and systems at the centers by long-term funded projects
  - Accept and deliver CLARIN metadata (CMDI) for LRT resources
  - Use PIDs to identify resources
  - Federated Identity management as an AAI solution
  - Use CLARIN recommended formats…
- Connect these to the Dutch LRT research world
  - Offering access to resources and technology
  - Offering infrastructure services, e.g. a catalog of LRs
  - Run LT services as standardized web services
- Therefore:
  - infrastructure projects for and by the centers
  - small short-term projects cross-linking research groups with CLARIN centers

Infrastructure Projects

- Creating and testing CLARIN metadata components
  - Two major Dutch Language Resource centers testing CMDI for their resources
- Infrastructure Integration Project
  - Building & maintaining registries:
    - ISO-Cat, REL-Cat
    - CMDI Component registry, ARBIL metadata editor
  - Planning and supporting the AAI for the CLARIN centers and user organizations
  - For format & tag set standards we look to CLARIN EU documentation, but..
    - Archivable format + installed base = ok
    - Should be reluctant to adopt new formats
- Search and Development
  - Federated content search for the CLARIN centers
    - In cooperation with the CLARIN EU EDC initiative
    - We find that we have to extend the SRU/CQL standard
  - CLAVAS, CLARIN Vocabulary Service
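As a rough illustration of the federated content search item, the sketch below builds one standard SRU `searchRetrieve` request per center for a single CQL query. The endpoint URLs and the names `ENDPOINTS`, `build_search_url` and `federate` are hypothetical, and the sketch uses plain SRU 1.2 parameters only; it does not attempt to show the extensions to SRU/CQL that the slide says turned out to be necessary.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoints, one per center (not real CLARIN addresses).
ENDPOINTS = {
    "center-a": "https://sru.center-a.example/sru",
    "center-b": "https://sru.center-b.example/sru",
}


def build_search_url(endpoint: str, cql_query: str, max_records: int = 10) -> str:
    """Build a plain SRU 1.2 searchRetrieve URL for one endpoint."""
    params = {
        "operation": "searchRetrieve",  # standard SRU operation name
        "version": "1.2",
        "query": cql_query,             # the CQL query string
        "maximumRecords": max_records,
    }
    return f"{endpoint}?{urlencode(params)}"


def federate(cql_query: str, endpoints: dict = ENDPOINTS) -> dict:
    """Return the request URL each center would receive for the same query."""
    return {name: build_search_url(url, cql_query) for name, url in endpoints.items()}


if __name__ == "__main__":
    for center, url in federate("cat").items():
        print(center, url)
```

A real federated search would also send the requests, merge the returned records and cope with centers that are slow or unavailable; only the fan-out of one query to several endpoints is shown here.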

CLARIN NL Sub-projects

| Project | Description | Standard. & Interop. Issues | Center |
| AAM-LR | Automatic Annotation of Multi-modal Language Resources | ISO-Cat (audio TDG), Web-services | MPI |
| Adelheid | A Distributed Lemmatizer for Historical Dutch | Web-services (CLAM) | MPI |
| ADEPT | Assaying Differences via Edit-Distance of Pronunciation Transcriptions | Web-app PID (Cool-URI with username) | MI |
| DUELME-LMF | Converting DUELME into LMF format | ISO-Cat, LMF | INL |
| INTER-VIEWS | Curation of Interview Data | PIDs (URN resolver, resource fragments) | DANS |
| MIMORE | Microcomparative Morphosyntax Research Tool | Own format development | MI |
| SignLinc | Linking lexical databases and annotated corpora of signed languages | ISO-Cat (Gesture TDG), Open/closed metadata, formats (LMF, EAF) | MPI |
| TICClops | Text-Induced Corpus Clean-up online processing system | | INL |
| TDS-Curator | A web-services architecture to curate the Typological Database System | | DANS |
| TQE | Transcription Quality Evaluation | AAI (CLAMless), Formats (WAV, TextGrid) | MPI |
| WFT-GTB | Integrating the Wurdboek fan 'e Fryske Taal into the Geïntegreerde Taalbank | | INL |
| TTNWW | (Long-term) Dutch-Flemish project to enable SSH researchers access to existing (STEVIN) HLT tools via web services | Web-services (CLAM), corpus formats & tagsets (D-COI, CGN/SoNaR, LASSY, proposed Folia format) | several |

CLARIN standards info

- CLARIN EU website: the CLARIN EU FAQ has a few standard recommendations and a CLARIN Standardization Action Plan. There was some criticism about the 'too theoretical' content of this document.
- CLARIN short guide (ShortGuide.pdf): the references in this document are out of date.
- The CLARIN EU Standardization Action Plan also has a list of recommended standards and best practices, and points to open issues and the CLARIN position.
- CLARIN official documents: there is a document with a very large enumeration of LR&T standards and best practices, but it contains no specific recommendations.
- CLARIN NL Helpdesk has a FAQ with a standards section: references to known CLARIN docs.

CLARIN Standards for LRT v6

Standards for LRT V6-3.pdf
Marc Kemps-Snijders, Núria Bel, Peter Wittenburg, Daan Broeder, Dieter van Uytvanck (CLARIN), Laurent Romary (ISOTC37, TEI), Erhard Hinrichs (CLARIN) and Gerhard Budin (Flarenet), January 2009

- Each known name of a standard or best-practice guideline is commented according to a few criteria:
  - Standard indicates whether it is a standard (++), a best practice in the field (+) or simply known (0)
  - State indicates the state: proven (++), ready (+) or in progress (0)
  - Pivot indicates whether the guideline is meant as a pivot mechanism
  - Advise indicates whether in CLARIN the usage should be obligatory (++), recommended (+) or whether CLARIN is neutral (0)
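The four criteria above map naturally onto a small record type, which is one way a standards registry entry could be represented. The following is only a sketch: the class and field names are invented for illustration, Pivot is modeled as a simple yes/no flag (an assumption), and the example entry with its ratings is hypothetical rather than taken from the document.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    """Three-level scale shared by the Standard, State and Advise criteria."""
    NEUTRAL = "0"
    PLUS = "+"
    DOUBLE_PLUS = "++"


# Meaning of each symbol per criterion, as listed in the criteria above.
MEANINGS = {
    "standard": {"++": "standard", "+": "best practice in the field", "0": "simply known"},
    "state": {"++": "proven", "+": "ready", "0": "in progress"},
    "advise": {"++": "obligatory", "+": "recommended", "0": "neutral"},
}


@dataclass(frozen=True)
class RegistryEntry:
    """One commented standard or guideline (hypothetical structure)."""
    name: str
    standard: Rating
    state: Rating
    pivot: bool          # assumption: pivot is treated as yes/no
    advise: Rating
    function: str = ""
    comment: str = ""

    def describe(self) -> str:
        """Spell the ratings out in words."""
        return (
            f"{self.name}: {MEANINGS['standard'][self.standard.value]}, "
            f"{MEANINGS['state'][self.state.value]}, "
            f"{'pivot' if self.pivot else 'not a pivot'}, "
            f"CLARIN advice: {MEANINGS['advise'][self.advise.value]}"
        )


if __name__ == "__main__":
    # Hypothetical entry; the ratings are made up for the example.
    entry = RegistryEntry("ExampleFormat", Rating.PLUS, Rating.DOUBLE_PLUS, True, Rating.PLUS)
    print(entry.describe())
```

Keeping the symbol meanings in one lookup table means the same three symbols can be reused across criteria while still reading differently for each.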

Standards for LRT v6 example

| Name | Standard / State / Pivot / Advise | Function | Comment |
| … | | | |
| TEI Tags | ++++ | various tag sets defined by TEI (P5) | will be supported by CLARIN when elements are required |
| ISO TMF | ++ + | Terminology Markup Framework | |
| … | | | |
| OLAC | +++++ | Added refinements on DC elements | Should be supported as a simple pivot format |
| IMDI | ++++ | More detailed description set for various LRs | is a widely used format and will be supported in CLARIN; elements will be in ISOcat |
| TEI Header (header module) | ++++ | Specification of a wide number of elements that can be used as metadata elements | Selected set will be supported in CLARIN |

Recommendations

- Create a CLARIN EU standards registry in the form used in the "standards for LRT" doc
- Set up a governance structure
  - With adequate representation of the
    - National CLARIN partners
    - Kindred organizations & projects such as DARIAH, Flarenet, ISO-TC37
  - But with emphasis on practicality
- Create additional documentation as recipe books to support further uptake and application.

Thank you for your attention CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n°