Wishes from Humanities infrastructures. Examples: DOBES and CLARIN. Peter Wittenburg, Max Planck Institute for Psycholinguistics.

Presentation transcript:

DOBES – what is it?
- international collaboration documenting cultures and languages since 2000
- about 65 teams, about 100 languages/cultures
- about 12 regional archives connected bi-directionally to the MPI archive
- long-term archiving, curation, and access for about 80 TB is a must
- unclear legal situation -> trust as basis, a result of years of collaboration

DOBES – which tools? (DOBES tool suite)
- ELAN / LEXUS / SYNPATHY: annotation / lexicon / syntax
- IMDI->CMDI / ARBIL: organization / metadata
- LAMUS / AMS / RSS: data management / replication
- Data Archive
- IMDI->CMDI / GIS / Faceted / OAI-PMH: metadata access / harvesting
- ANNEX / IMEX / LEXUS / TROFA: annotation / pictures / lexicon / search
- VICOS: annotations / relations / conceptual spaces
- ISOcat: semantic interoperability
- COSIX, REPLIX, OAI-PMH, Handle: replication, harvesting, PID
- Shoebox, Transcriber, CLAN, XML: import/export
- Diagram labels: time series annotation; lexicon; conceptual spaces; organization & metadata; metadata & content search; pattern detection & annotation

What does DOBES need?
- trust, trust, trust, ... – otherwise no data deposits & access
- different aspects of trust: protection of data, no copyright claims, adherence to CoC, etc.
- persistent store and access (MPG: 50 years on bit-stream), etc.
- dynamic and safe replication to several remote sites
  - now 4 big centers
  - now exchange with 12 regional centers
  - it was not really safe – now better thanks to EUDAT (usage of PIDs)
- access to remote replicas requires "rights transfer"
  - would like to have distributed AAI
- change to massive crowd sourcing via mobile/smart phones
- large amount of data – automatic + parallel annotation by detectors
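The point about replication becoming safer through the usage of PIDs can be illustrated with a minimal sketch. It assumes, purely for illustration, that a PID record carries a checksum of the deposited bit-stream and that each replica site can be checked against it. The `PidRecord` class, the site names, and the example identifier are hypothetical and do not describe the actual EUDAT or DOBES implementation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a bit-stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class PidRecord:
    """Hypothetical PID record: identifier, reference checksum, replica copies."""
    pid: str
    checksum: str
    replicas: Dict[str, bytes] = field(default_factory=dict)  # site -> stored bytes


def register(pid: str, data: bytes) -> PidRecord:
    """Register a deposit: the checksum is fixed at deposit time."""
    return PidRecord(pid=pid, checksum=sha256_hex(data))


def replicate(record: PidRecord, site: str, data: bytes) -> None:
    """Store a copy at a (hypothetical) remote site."""
    record.replicas[site] = data


def verify_replicas(record: PidRecord) -> Dict[str, bool]:
    """Compare every replica against the checksum held in the PID record."""
    return {
        site: sha256_hex(blob) == record.checksum
        for site, blob in record.replicas.items()
    }


if __name__ == "__main__":
    original = b"annotated recording, session 1"
    rec = register("hdl:0000/example-1", original)  # made-up identifier
    replicate(rec, "center-a", original)
    replicate(rec, "center-b", original[:-1])  # simulate a damaged copy
    print(verify_replicas(rec))
```

Because the checksum travels with the identifier rather than with any single copy, a damaged replica can be detected at any site without trusting that site's own bookkeeping.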

CLARIN – what is it?
- Offer an interoperable and accessible domain of language resources and technology.
- many interviews
- CLARIN ERIC (map labels): Germany, Netherlands, Flanders, Norway, Denmark, Austria, Czech Republic, Bulgaria, Poland, Finland, South Tirol, Oxford, France?
- Landscape: ~200 institutions, 10 audited centers, 15 more to come

CLARIN – trusted centers
- trusted centers as pillars for resources and services
- set of criteria for becoming a recognized CLARIN center
- part of the criteria is DSA compliance (Data Seal of Approval)
- proper repository setup (data organization a la RDA: PID, Metadata, Relations)
- funding/persistency statements
- sufficient staffing
- currently 10 audited centers, ~15 more to come soon
- however: many have problems turning their "data chaos" into a trusted repository; quite a challenge

CLARIN – CMDI to harmonize MD
- from schema-based to semantically based interoperability
- why: so many sub-communities with specific requirements
- common anchor is now the ISOcat "concept" and component registry
- separate relation registry, since relations are often task dependent!
- need a smart tool environment to support metadata (ISOcat, ARBIL)
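The move from schema-based to semantically based interoperability can be sketched as follows. In this toy example, two sub-communities use different field names, and each field is linked to a shared concept identifier so that a single query works across both. The schema names, field names, and concept identifiers are invented for illustration; they are not actual ISOcat entries or CMDI components.

```python
from typing import Dict, List

# Hypothetical concept identifiers standing in for registry entries.
CONCEPT_LANGUAGE = "concept:language-name"
CONCEPT_TITLE = "concept:resource-title"

# Each (made-up) schema links its own field names to shared concepts.
SCHEMA_TO_CONCEPT: Dict[str, Dict[str, str]] = {
    "schemaA": {"Language": CONCEPT_LANGUAGE, "Title": CONCEPT_TITLE},
    "schemaB": {"lang": CONCEPT_LANGUAGE, "name": CONCEPT_TITLE},
}


def to_concepts(schema: str, record: Dict[str, str]) -> Dict[str, str]:
    """Re-key a record by concept identifier; unmapped fields are dropped."""
    mapping = SCHEMA_TO_CONCEPT[schema]
    return {mapping[f]: v for f, v in record.items() if f in mapping}


def search(records: List[Dict[str, str]], concept: str, value: str) -> List[Dict[str, str]]:
    """Find concept-keyed records whose value for `concept` equals `value`."""
    return [r for r in records if r.get(concept) == value]


if __name__ == "__main__":
    harvested = [
        to_concepts("schemaA", {"Language": "Yurakaré", "Title": "Field session 12"}),
        to_concepts("schemaB", {"lang": "Yurakaré", "name": "Lexicon draft"}),
        to_concepts("schemaB", {"lang": "Dutch", "name": "Interview"}),
    ]
    for hit in search(harvested, CONCEPT_LANGUAGE, "Yurakaré"):
        print(hit[CONCEPT_TITLE])
```

The query never mentions `Language` or `lang`; it only names the concept, which is why a new sub-community schema can be added by extending the mapping rather than by changing the search.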

CLARIN – VLO activity
- Virtual Language Observatory
- > resources / > 500 tools/services
- all open metadata being harvested
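As a rough illustration of metadata harvesting, the sketch below builds OAI-PMH `ListRecords` request URLs, assuming the harvesting uses OAI-PMH (the protocol named in the DOBES tool overview). The verb and argument names come from the OAI-PMH specification; the base URL and the helper function are hypothetical, and no network request is made.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: a follow-up
    request carries only the verb and the token, not metadataPrefix.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    base = "https://repository.example.org/oai"  # hypothetical endpoint
    print(list_records_url(base))
    print(list_records_url(base, resumption_token="tok123"))
```

A harvester would fetch the first URL, read the returned records, and repeat with each resumption token the repository hands back until none is returned.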

CLARIN – AAI
- cross-country AAI does not work yet

CLARIN – Workflow activity
- Web 2.0 Application for Tool Chaining and Execution
- Repository: Stuttgart, Tübingen, Berlin, Leipzig, Finland
- Standard-conformant Text Corpus Encoding: Stuttgart, Tübingen, Leipzig, Romania, Poland, Austria, Netherlands
- Diagram labels: language evolution; speech recognition; virtual reality; image recognition; brain imaging; text processing

What does CLARIN need from others?
- PID system for DOs – available (EPIC, DataCite)
- personal ID – improving (ORCID)
- neutral assessment instances – available (DSA)
- common cross-country AAI solution simplifying SSO
- persistent and accessible replication store – available (EUDAT, national)
- easy exchange & store for long-tail data – available (OpenAIRE, EUDAT, national)
- cluster capacity to support open workflow landscape – (EUDAT?, national)
- some HPC for special calculations/simulations – (PRACE?, national)
- improved common semantic solutions – yet unclear (EUDAT?)
- more harmonization in data organization etc. – therefore RDA

Thanks.