The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Increasing the usage of endangered language archives in the.


The Language Archive – Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
Increasing the usage of endangered language archives in the years to come
Paul Trilsbeek and Alexander König

Users of EL Archives
Regionally oriented archives: Largest proportion of users are community members*
“Global” archives: Largest proportion are researchers*
How to attract more users and a larger variety of users to both types of archives?
How to increase the language documentation effort?
* Cf. P. Austin: Who uses endangered languages archives?

Community involvement
Getting community members to engage in the documentation process?
Providing an easy, YouTube-like upload mechanism
Some possible issues:
– Technical quality of the recordings
– Metadata
– Ethics, methodology

Technical quality of the data
Limited by available equipment (camcorder, photo camera, mobile phone?)
Limited by a/v recording skills
Offering (online) training could help

Metadata
It is already difficult enough to get current depositors to provide high-quality metadata, even though they are generally obliged to do so
Resources without any kind of metadata are useless
Come up with a core set of essential fields that should be filled in
Controlled vocabularies might be problematic; perhaps re-use previously entered values as auto-complete suggestions and provide mappings (curation process) to standard CVs
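The three metadata ideas on this slide (a core set of required fields, auto-complete suggestions drawn from previously entered values, and a curated mapping to a standard controlled vocabulary) can be sketched roughly as follows. This is only an illustration: the field names in `CORE_FIELDS`, the entries in `CV_MAPPING`, and all function names are hypothetical and do not come from the presentation or from any archive's actual deposit software.

```python
# Hypothetical core set of essential fields; a real archive would choose its own.
CORE_FIELDS = ("title", "language", "date", "contributor")

# Hypothetical curation table: free-text values as typed by depositors,
# mapped to entries of a standard controlled vocabulary (here ISO 639-3 codes).
CV_MAPPING = {
    "dutch": "nld",
    "nederlands": "nld",
    "english": "eng",
}


def missing_core_fields(record):
    """Return the core fields that are absent or empty in a deposit record."""
    return [f for f in CORE_FIELDS if not str(record.get(f, "")).strip()]


def suggest(prefix, previous_values, limit=5):
    """Offer previously entered values that start with the typed prefix.

    Matching is case-insensitive; values that differ only in case or
    surrounding whitespace are collapsed to the first spelling seen.
    """
    wanted = prefix.strip().lower()
    seen = {}
    for value in previous_values:
        cleaned = value.strip()
        if cleaned and cleaned.lower() not in seen:
            seen[cleaned.lower()] = cleaned
    matches = [v for k, v in sorted(seen.items()) if k.startswith(wanted)]
    return matches[:limit]


def curate(value, mapping=CV_MAPPING):
    """Map a free-text value to a standard CV entry, or None if no mapping exists yet."""
    return mapping.get(value.strip().lower())
```

In such a setup, depositors would type freely and see earlier spellings as suggestions, while values for which `curate` returns `None` would be queued for a curator to map later instead of being rejected at upload time.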

Ethics
We assume that current depositors are aware of generally applicable ethical guidelines such as the DOBES code of conduct, i.e. they know, for example, that informed consent must be obtained from the people being recorded
For third-party deposits, one does not know whether this is the case
Providing guidelines prominently on the deposit site might help

Integration with large-scale infrastructures
There are currently many developments in the field of data and research infrastructures, such as the CLARIN and DARIAH infrastructure projects funded by the European Commission
The idea is to make data and service providers interoperable, so that a researcher can use them all together seamlessly in “virtual research environments”
Some topics:
– Federated authentication and authorization
– Interoperable metadata framework
– Interoperable data providers and web services providers

Integration with large-scale infrastructures
What should EL archives do?
– Provide metadata records in formats required by these infrastructures (OLAC, CMDI, …)
– Make use of central data category registries such as ISOcat
– Follow the developments regarding federated AAI infrastructure and participate when possible
– Follow the developments regarding web service specifications for language resources
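As a rough illustration of the first recommendation, the sketch below serialises a simple metadata record into XML using Dublin Core elements, the element set that OLAC metadata builds on. It is not a complete or schema-valid OLAC or CMDI record: the root element name `record`, the function name `to_dc_xml`, and the choice of fields are assumptions made for this example, and a real export would need the wrappers and schema declarations that those formats define.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative subset of Dublin Core elements to export.
DC_FIELDS = ("title", "language", "date", "contributor")


def to_dc_xml(record):
    """Serialise the non-empty fields of a record as Dublin Core elements.

    The <record> wrapper is a placeholder, not part of any standard.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for field in DC_FIELDS:
        value = record.get(field)
        if value:
            child = ET.SubElement(root, f"{{{DC_NS}}}{field}")
            child.text = value
    return ET.tostring(root, encoding="unicode")
```

Keeping the internal record format separate from the export step means the same deposit could later be exposed in several formats, one serialiser per infrastructure.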

Examples of metadata aggregators
VLO faceted browser
OLAC faceted browser
NaLiDa faceted browser

Example of interoperable web services
WebLicht

Access restrictions
Access restrictions are necessary in the field of endangered languages archives to respect the wishes of the speech communities and to protect their privacy
They do, however, frustrate many users of EL archives
Some researchers keep material restricted for personal career reasons. In the case of a young scholar writing a PhD thesis, this is understandable and acceptable. It is less so for established researchers who are the main expert on a certain language and who have been able to collect the data with public funding.