The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Why should we invest in DWF? Peter Wittenburg CLARIN Research.

Presentation transcript:

The Language Archive – Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
Why should we invest in DWF?
Peter Wittenburg
CLARIN Research Infrastructure, EUDAT Data Infrastructure

Things that keep us busy I
understanding language roots
  feature matrix extracted from many cross-disciplinary & cross-country resources
  phylogenetic algorithms to compute dependency trees
  can’t easily access required resources
understanding language machine
  so many institutes creating brain image data
  do we know about them and their recording contexts?
  can we access them easily?

Things that keep us busy II
automatic language processing
  recognition of speech and body movement (gesture, signing, facial expression, etc.) is hard
  no single stochastic recognizer will do
  there is so much technology out there worldwide, with components from different disciplines
  do we know about them?
  can we easily access them?

In CLARIN we are so good
developed a flexible component model that allows users to create metadata profiles
have established an open Data Category Registry (ISOcat) system based on ISO (compliant with ISO 11179)
got a professional tool set allowing users to create, register and share components and profiles, and to create MD descriptions efficiently

In CLARIN we are so good
Virtual Language Observatory

In CLARIN we are so good
got a distributed SOA domain with many language & speech tools integrated / being integrated
use metadata profile matching to find appropriate tools when chaining services

but...
there is so much data (& software) out there that still no one knows of or is able to access
from about 200 linguistic departments creating data, there are fewer than a handful of centers in the EU that have a proper repository, do archiving and curation, give access, allow computation and enrichments, are audited, etc.
currently no way to allow machines to access most of the resources blindly
  common way: download & squeeze each individual resource/collection
proper metadata at high granularity still unpopular
only some harmonization at international level
only incidental discipline-crossing chats

cross-disciplinary aspect
large number of discipline-specific centers with access services
all disciplines similar
should we all do LTA, offer capacity computing, run PID, etc.?
a network of strong data & compute hubs
let them give COMMON services such as LTP, data staging, PID, AAI, etc.
network of large data hubs
network of discipline hubs

but...
do we know what common services are, and do we accept them?
do we understand the data organizations of communities well enough to design services?
do we have agreed mechanisms for working on large and complex data sets in a secure way in a federation?
do we agree on the same essential building blocks for a common data infrastructure?
AND - many communities are organized worldwide
Thus - need a GLOBAL forum to agree on some essentials that will make data-driven research more efficient and foster new insights