Populating the Infrastructure using Standards Daan Broeder CLARIN NL EB TLA - MPI for Psycholinguistics CLARIN Coordinators Meeting June 29,30 Budapest.


1 Populating the Infrastructure using Standards Daan Broeder CLARIN NL EB TLA - MPI for Psycholinguistics CLARIN Coordinators Meeting June 29,30 Budapest

2 CLARIN NL Context
• 4 Dutch CLARIN centers, each with their own interests and traditions
  • DANS, Dutch Academy data archiving service
  • INL, Dutch Institute for Lexicography
  • Meertens Institute, Dutch dialects and language variation
  • MPI for Psycholinguistics, endangered languages, acquisition corpora
• Different cross-center relations
  • Organizational relations
  • Past and existing project cooperation
• Can all lead to different preferences for technical solutions, interoperability approaches and data formats
• All have production environments that need to deliver services, so they tend to be conservative with changes
  • New technology needs to be understood first and usually parallel systems are created
  • General adaptations for CLARIN requirements can only be slowly introduced
• Although centers made commitments, resources are limited.

3 CLARIN NL Goals
• Build and support relevant central infrastructure services
• Guide harmonizing the relevant practices and systems at the centers by long-term funded projects
  • Accept and deliver CLARIN metadata (CMDI) for LRT resources
  • Use PIDs to identify resources
  • Federated Identity management as an AAI solution
  • Use CLARIN recommended formats…
• Connect these to the Dutch LRT research world
  • Offering access to resources and technology
  • Offering infrastructure services: e.g. catalog of LRs
  • Run LT services as standardized web-services
• Therefore:
  • infrastructure projects for and by the centers
  • small short-term projects cross-linking research groups with CLARIN centers

4 Infrastructure Projects
• Creating and testing CLARIN metadata components
  • Two major Dutch Language Resource centers testing CMDI for their resources
• Infrastructure Integration Project
  • Building & maintaining registries:
    • ISO-Cat, REL-Cat
    • CMDI Component registry, ARBIL metadata editor
  • Planning and supporting the AAI for the CLARIN centers and user organizations
  • For format & tag set standards we look to CLARIN EU documentation, but...
    • Archivable format + installed base = ok
    • Should be reluctant to adopt new formats
• Search and Development
  • Federated content search for the CLARIN centers
    • In cooperation with the CLARIN EU EDC initiative
    • Find we have to extend the SRU/CQL standard
  • CLAVAS, CLARIN Vocabulary Service
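As an illustration of the federated content search item on this slide, the sketch below shows how one CQL query could be turned into standard SRU searchRetrieve requests for several centers. It is a minimal sketch, not the CLARIN implementation: the endpoint URLs and the function names are hypothetical, and only the base SRU 1.2 parameters (operation, version, query, maximumRecords) are used, without any of the extensions the slide mentions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sru_url(endpoint: str, cql_query: str, max_records: int = 10) -> str:
    """Build one SRU 1.2 searchRetrieve URL for a single endpoint."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,  # CQL query string, URL-encoded by urlencode
        "maximumRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


def federated_search_urls(endpoints, cql_query, max_records=10):
    """Fan the same CQL query out to every endpoint (one URL per center)."""
    return [build_sru_url(e, cql_query, max_records) for e in endpoints]


def query_of(url: str) -> str:
    """Decode the CQL query back out of a built URL (useful for checking)."""
    return parse_qs(urlsplit(url).query)["query"][0]


# Hypothetical endpoints; real centers would publish their own SRU base URLs.
EXAMPLE_ENDPOINTS = [
    "https://example.org/center-a/sru",
    "https://example.org/center-b/sru",
]

if __name__ == "__main__":
    for url in federated_search_urls(EXAMPLE_ENDPOINTS, 'dialect and "language variation"'):
        print(url)
```

A caller would then fetch each URL and merge the returned record lists; that aggregation step is left out here because it depends on the response schema each center exposes.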

5 CLARIN NL Sub-projects

Project | Description | Standard. & Interop. Issues | Center
AAM-LR | Automatic Annotation of Multi-modal Language Resources | ISO-Cat (audio TDG), Web-services | MPI
Adelheid | A Distributed Lemmatizer for Historical Dutch | Web-services (CLAM) | MPI
ADEPT | Assaying Differences via Edit-Distance of Pronunciation Transcriptions | Web-app PID (Cool-URI with username) | MI
DUELME-LMF | Converting DUELME into LMF format | ISO-Cat, LMF | INL
INTER-VIEWS | Curation of Interview Data | PIDs (URN resolver, resource fragments) | DANS
MIMORE | Microcomparative Morphosyntax Research Tool | Own format development | MI
SignLinc | Linking lexical databases and annotated corpora of signed languages | ISO-Cat (Gesture TDG), Open/closed metadata, formats (LMF, EAF) | MPI
TICClops | Text-Induced Corpus Clean-up online processing system |  | INL
TDS-Curator | A web-services architecture to curate the Typological Database System |  | DANS
TQE | Transcription Quality Evaluation | AAI (CLAMless), Formats (WAV, TextGrid) | MPI
WFT-GTB | Integrating the Wurdboek fan 'e Fryske Taal into the Geïntegreerde Taalbank |  | INL
TTNWW | (Long-term) Dutch-Flemish project to enable SSH researchers access to existing (STEVIN) HLT tools via web services | Web-services (CLAM), corpus formats & tagsets (D-COI, CGN/SoNaR, LASSY, proposed Folia format) | several

6 CLARIN standards info
• CLARIN EU website. The CLARIN EU FAQ has a few standard recommendations and a CLARIN Standardization Action Plan. There was some criticism about the ‘too theoretical’ content of this document.
• CLARIN short guide (http://www.clarin.eu/files/standards-CLARIN-ShortGuide.pdf). The references in this document are out of date.
• The CLARIN EU standardization action plan (http://www.clarin.eu/node/2841) also has a list of recommended standards and best practices and points to open issues and the CLARIN position.
• CLARIN official documents: there is a document with a very large enumeration of LR&T standards and best practices, but it contains no specific recommendation (http://www-sk.let.uu.nl/u/D5C-3.pdf).
• CLARIN NL Helpdesk has a FAQ with a standards section (http://trac.clarin.nl/trac/wiki/WikiStart#Formatsandstandards) with references to known CLARIN docs.

7 CLARIN Standards for LRT v6
Standards for LRT V6-3.pdf (http://www.clarin.eu/system/files/Standards%20for%20LRT-v6.pdf): Marc Kemps-Snijders, Núria Bel, Peter Wittenburg, Daan Broeder, Dieter van Uytvanck (CLARIN), Laurent Romary (ISOTC37, TEI), Erhard Hinrichs (CLARIN) and Gerhard Budin (Flarenet) – January 2009
• Each known name of a standard or best-practice guideline is commented according to a few criteria:
  • Standard indicates whether it is a standard (++), a best practice in the field (+) or simply known (0)
  • State indicates the state: proven (++), ready (+) or in progress (0)
  • Pivot indicates whether the guideline is meant as a pivot mechanism
  • Advise indicates whether in CLARIN the usage should be obligatory (++), recommended (+) or whether CLARIN is neutral (0)
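The four criteria above could be captured as a small record type, for example as the basis of a standards registry. The following is only a sketch under stated assumptions: the class and field names are hypothetical, Pivot is modelled as a yes/no flag, and the example entry uses placeholder ratings rather than values from the document.

```python
from dataclasses import dataclass

# Rating scales as described on the slide.
STANDARD = {"++": "standard", "+": "best practice in the field", "0": "simply known"}
STATE = {"++": "proven", "+": "ready", "0": "in progress"}
ADVISE = {"++": "obligatory", "+": "recommended", "0": "neutral"}


@dataclass(frozen=True)
class RegistryEntry:
    """One commented standard or guideline (hypothetical record layout)."""
    name: str
    standard: str   # key of STANDARD
    state: str      # key of STATE
    pivot: bool     # assumption: pivot is a simple yes/no flag
    advise: str     # key of ADVISE
    function: str = ""
    comment: str = ""

    def __post_init__(self):
        # Reject ratings that are not on the documented scales.
        for field_name, scale in (("standard", STANDARD), ("state", STATE), ("advise", ADVISE)):
            value = getattr(self, field_name)
            if value not in scale:
                raise ValueError(f"{field_name} must be one of {sorted(scale)}, got {value!r}")

    def describe(self) -> str:
        """Spell the compact ratings out in words."""
        pivot_text = "yes" if self.pivot else "no"
        return (
            f"{self.name}: {STANDARD[self.standard]}, {STATE[self.state]}, "
            f"pivot: {pivot_text}, usage in CLARIN: {ADVISE[self.advise]}"
        )


# Placeholder entry with made-up ratings, for illustration only.
example = RegistryEntry(name="Example Format", standard="+", state="++", pivot=True, advise="+")

if __name__ == "__main__":
    print(example.describe())
```

Validating the ratings at construction time keeps typos such as "+++" out of the registry, which matters if entries are contributed by several partners.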

8 Standards for LRT v6 example

Name | Standard, State, Pivot, Advise | Function | Comment
…
TEI Tags | ++++ | various tag sets defined by TEI (P5) | will be supported by CLARIN when elements are required
ISO 16642 TMF | ++ + | Terminology Markup Framework |
…
OLAC | +++++ | Added refinements on DC elements | Should be supported as a simple pivot format
IMDI | ++++ | More detailed description set for various LRs | is a widely used format and will be supported in CLARIN; elements will be in ISOcat
TEI Header (header module) | ++++ | Specification of a wide number of elements that can be used as metadata elements | Selected set will be supported in CLARIN

9 Recommendations
• Create a CLARIN EU standard registry of the form used in the “standards for LRT” doc
• Set up a governance structure
  • With adequate representation of the
    • National CLARIN partners
    • Kindred organizations & projects such as DARIAH, Flarenet, ISO-TC37
  • But with emphasis on practicality
• Create additional documentation as recipe books to support further uptake and application.

10 Thank you for your attention
CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n° 212230

