Wishes from Humanities Infrastructures. Examples: DOBES and CLARIN. Peter Wittenburg, Max Planck Institute for Psycholinguistics.


1 Wishes from Humanities Infrastructures. Examples: DOBES and CLARIN. Peter Wittenburg, Max Planck Institute for Psycholinguistics

2 DOBES – what is it?
- international collaboration documenting cultures and languages since 2000
- about 65 teams, about 100 languages/cultures
- about 12 regional archives connected bi-directionally to the MPI archive
- long-term archiving, curation, and access for about 80 TB is a must
- unclear legal situation -> trust as the basis, which is a result of years of collaboration

3 DOBES – which tools?
DOBES tool suite (diagram):
- ELAN / LEXUS / SYNPATHY: annotation / lexicon / syntax
- IMDI->CMDI / ARBIL: organization / metadata
- LAMUS / AMS / RSS: data management / replication (data archive)
- IMDI->CMDI / GIS / Faceted / OAI-PMH: metadata access / harvesting
- ANNEX / IMEX / LEXUS / TROFA: annotation / pictures / lexicon / search
- VICOS: annotations / relations / conceptual spaces
- ISOcat: semantic interoperability
- COSIX, REPLIX, OAI-PMH, Handle: replication, harvesting, PID
- Shoebox, Transcriber, CLAN, XML: import/export
- diagram labels: time series, annotation, lexicon, conceptual spaces, organization & metadata, metadata & content search, pattern detection & annotation

4 what does DOBES need?
- trust, trust, trust, ... – otherwise no data deposits & access
- different aspects of trust: protection of data, no copyright claims, adherence to CoC, etc.
- persistent store and access (MPG: 50 years on bit-stream), etc.
- dynamic and safe replication to several remote sites
- now 4 big centers
- now exchange with 12 regional centers
- it was not really safe – now better thanks to EUDAT (usage of PIDs)
- access to remote replicas requires "rights transfer"
- would like to have distributed AAI
- change to massive crowd sourcing via mobile/smart phones
- large amount of data – automatic + parallel annotation by detectors

5 CLARIN – what is it?
- many interviews
- CLARIN ERIC: Germany, Netherlands, Flanders, Norway, Denmark, Austria, Czech Republic, Bulgaria, Poland, Finland, South Tirol, Oxford, France?
- Landscape: ~200 institutions, 10 audited centers, 15 more to come
- Offer an interoperable and accessible domain of language resources and technology.

6 CLARIN – trusted centers
- trusted centers as pillars for resources and services
- set of criteria for becoming a recognized CLARIN center
- part of the criteria is DSA compliance (Data Seal of Approval)
- proper repository setup (data organization à la RDA: PID, Metadata, Relations)
- funding/persistency statements
- sufficient staffing
- currently 10 audited centers, ~15 more to come soon
- however: many have problems turning their "data chaos" into a trusted repository – quite a challenge

7 CLARIN – CMDI to harmonize MD
- from schema-based to semantically based interoperability
- why: so many sub-communities with specific requirements
- common anchor is now ISOcat "concept" and component registry
- separate relation registry since relations are often task dependent!
- need a smart tool environment to support metadata
- ISOcat: www.isocat.org
- ARBIL: www.lta.nl

8 CLARIN – VLO activity
- Virtual Language Observatory
- > 100.000 resources / > 500 tools/services
- all open metadata being harvested
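
The slides list OAI-PMH among the metadata access/harvesting components, and this slide notes that all open metadata is being harvested. As a rough illustration of what one harvesting request looks like, the sketch below builds an OAI-PMH ListRecords URL. The endpoint address and the function name are hypothetical; only the `verb`, `metadataPrefix`, and `set` request parameters and the `oai_dc` prefix come from the OAI-PMH protocol itself. The prefix a CLARIN center exposes for CMDI records is not stated in the slides, so it is left as a parameter.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Return an OAI-PMH ListRecords request URL for a repository endpoint.

    base_url is a hypothetical endpoint; metadata_prefix defaults to
    oai_dc, the Dublin Core format every OAI-PMH repository must support.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting: restrict the request to one set.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
example_url = build_list_records_url("https://repository.example.org/oai")
print(example_url)
```

A real harvester would also have to follow the resumption tokens that a repository returns for large result lists; that part is omitted here.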

9 CLARIN – AAI
- cross-country AAI does not work yet

10 CLARIN – Workflow activity
- Web 2.0 Application for Tool Chaining and Execution
- Repository: Stuttgart, Tübingen, Berlin, Leipzig, Finland
- Standard-conformant Text Corpus Encoding: Stuttgart, Tübingen, Leipzig, Romania, Poland, Austria, Netherlands
- language evolution, speech recognition, virtual reality, image recognition, brain imaging, text processing

11 what does CLARIN need? from others
- PID system for DOs – available (EPIC, DataCite)
- personal ID – improving (ORCID)
- neutral assessment instances – available (DSA)
- common cross-country AAI solution simplifying SSO
- persistent and accessible replication store – available (EUDAT, national)
- easy exchange & store for long-tail data – available (OpenAIRE, EUDAT, national)
- cluster capacity to support open workflow landscape – (EUDAT?, national)
- some HPC for special calculations/simulations – (PRACE?, national)
- improved common semantic solutions – yet unclear (EUDAT?)
- more harmonization in data organization etc. – therefore RDA

12 Thanks.

