The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands PIDs in Data Infrastructures Peter Wittenburg CLARIN Research.

Presentation transcript:

The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands PIDs in Data Infrastructures Peter Wittenburg CLARIN Research Infrastructure EUDAT Data Infrastructure

Automatic Workflows
- most data is created automatically as part of workflows; manual operations are exceptions
- at data creation time it is not obvious what the data's future life will be
- later association with metadata and PIDs is troublesome and costly
- thus: immediate generation of metadata and PIDs as part of automated workflows
- data resources need to be referable and often citable (published)
- need a reliable and highly performing machinery (registration + resolution) based on stable standards
- typically DOIs via DataCite
- typically Handles via EPIC
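A minimal sketch of what "immediate generation of metadata and PIDs as part of automated workflows" could look like. The registry class, its methods, the `prefx` prefix and the example URL are all hypothetical stand-ins; this is not the EPIC or DataCite API.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryPidRegistry:
    """Hypothetical stand-in for a PID service (NOT the EPIC or DataCite API)."""
    prefix: str
    records: dict = field(default_factory=dict)

    def register(self, location: str, checksum: str) -> str:
        # Mint an opaque identifier under the prefix and store what it points to.
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self.records[pid] = {"URL": location, "CKSM": checksum}
        return pid

    def resolve(self, pid: str) -> dict:
        # Return the record stored for this PID.
        return self.records[pid]


def create_data_object(payload: bytes, location: str, creator: str,
                       registry: InMemoryPidRegistry) -> dict:
    """Workflow step that produces data, metadata and PID in one go."""
    checksum = hashlib.sha256(payload).hexdigest()
    pid = registry.register(location, checksum)
    metadata = {"creator": creator, "size_bytes": len(payload), "pid": pid}
    return {"pid": pid, "metadata": metadata, "checksum": checksum}


registry = InMemoryPidRegistry(prefix="prefx")
do = create_data_object(b"sensor reading 42", "https://repo.example.org/do/1",
                        "workflow-step-3", registry)
print(do["pid"], registry.resolve(do["pid"])["URL"])
```

Because the PID and the metadata are created in the same step as the data, no later manual association is needed.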

PID usage in our domain
- assume that we have a recording of an extinct language and some annotations that tell us what someone said about medicine etc.
- researchers create relations that need to be preserved
- [Figure labels: Video Recording, Sound Recording, Annotations, Recording Session, Metadata Record; from Repository A, from Repository B, from Repository C]
- How long, stable and persistent?
- we are using Handles from the EPIC service

PID usage in our domain
[Example ePublication abstract shown on the slide:]
Biological and cultural processes have evolved together, in a symbiotic spiral; they are now indissolubly linked, with human survival unlikely without such culturally produced aids as clothing, cooked food, and tools. The twelve original essays collected in this volume take an evolutionary perspective on human culture, examining the emergence of culture in evolution and the underlying role of brain and cognition. The essay authors, all internationally prominent researchers in their fields, draw on the cognitive sciences -- including linguistics, developmental psychology, and cognition -- to develop conceptual and methodological tools for understanding the interaction of culture and genome. They go beyond the "how" -- the questions of behavioral mechanisms -- to address the "why" -- the evolutionary origin of our psychological functioning. What was the "X-factor," the magic ingredient of culture -- the element that took humans out of the general run of mammals and other highly social organisms? Several essays identify specific behavioral and functional factors that could account for human culture, including the capacity for "mind reading" that underlies social and cultural learning and the nature of morality and inhibitions, while others emphasize multiple partially independent factors -- planning, technology, learning, and language. The X-factor, these essays suggest, is a set of cognitive adaptations for culture.
- ePublication (Repository 1)
- eResource (Repository 2)
- How long, etc.?
- Handles from EPIC

Data Object World
- let's isolate external properties of our data objects and collections and ignore the content (structure, semantics, packaging, etc.) for a moment
- [Figure labels]
- actors: originator, depositor, repository A, user, repository B, handle generator
- registered DO: data, metadata (Key-MD), location
- PID; property record: access rights, type (from central registry), ROR flag, mutable flag; transaction record
- work: ownership, data, metadata (Key-MD), PID, access rights
- relations: hands-over, requests, deposits via RAP, requests, stores, maintains, receives disseminations via RAP, replicates
- goes back to a paper by Kahn & Wilensky, 2006
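The external properties named on this slide can be written down as a small data structure. This is only a sketch: the class and field names are illustrative choices, and the grouping of access rights, type, ROR flag and mutable flag under one property record is an assumption about the figure.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyRecord:
    """External properties kept with a registered DO (names are illustrative)."""
    access_rights: str
    do_type: str         # assumed to be an entry from a central type registry
    ror_flag: bool       # True if this copy sits in the repository of record
    mutable_flag: bool   # False means the bit sequence must not change


@dataclass
class RegisteredDO:
    pid: str
    data: bytes
    key_metadata: dict
    location: str
    properties: PropertyRecord
    transactions: list = field(default_factory=list)  # transaction record

    def update(self, new_data: bytes) -> None:
        # The mutable flag decides whether the stored data may be replaced.
        if not self.properties.mutable_flag:
            raise PermissionError(f"{self.pid} is registered as immutable")
        self.data = new_data
        self.transactions.append("update")


recording = RegisteredDO(
    pid="prefx/DO1",                            # hypothetical PID
    data=b"...",
    key_metadata={"title": "recording session"},
    location="https://repo-a.example.org/DO1",  # hypothetical URL
    properties=PropertyRecord("restricted", "audio/wav",
                              ror_flag=True, mutable_flag=False),
)
```

Calling `recording.update(b"new")` raises `PermissionError` here, since the object was registered with `mutable_flag=False`.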

2 DO flavours in our domain
- the way we organize data; other variants are possible
- DO: bit sequence (instance), metadata, PID; access via metadata, access via PID; immediate access ?
- MDO: bit sequence (instance), metadata, PID; access via metadata, access via PID; search/browse access

collections in our domain (similar to MPEG21 containers, items, sub-items)
- grouping of related data
- large variety of reasons
  - versions of a DO
  - presentations of a DO
  - same interview/experim.
  - many others
- DO part of many collections
- [Figure labels: bit sequence; metadata (collection): category 1 ... category N, PID1 ... PID K; PID; collection: assoc info PID1, assoc info PID2, assoc info; metadata: category 1 ... category N, PID; assoc info category 1, assoc info category 2, assoc info; ISOcat Registry (ISO 12620, compl. ISO 11179); PID Registry]
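As a rough illustration of the collection idea, the sketch below models a collection as an object with its own PID, its own metadata categories and a list of member PIDs with associated info. All names and PIDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    pid: str                    # the collection has a PID of its own
    categories: dict            # collection-level metadata (category -> value)
    members: dict = field(default_factory=dict)  # member PID -> associated info

    def add(self, member_pid: str, assoc_info: str) -> None:
        # Members are referenced by PID, so the data itself is not copied.
        self.members[member_pid] = assoc_info


def collections_containing(member_pid: str, collections: list) -> list:
    """Return the PIDs of every collection that lists the given DO."""
    return [c.pid for c in collections if member_pid in c.members]


versions = Collection("prefx/C1", {"reason": "versions of a DO"})
session = Collection("prefx/C2", {"reason": "same interview"})
versions.add("prefx/DO1", "version 1")
versions.add("prefx/DO2", "version 2")
session.add("prefx/DO1", "video recording")

print(collections_containing("prefx/DO1", [versions, session]))
```

Referencing members by PID is what lets one DO be part of many collections.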

EUDAT - common services
- two major tracks:
  - understanding data organization & practices in communities
  - provide first common services after 12 months

PID Use V1 in EUDAT Federation
- domain X, repository X: DO1, PIDx (prefx); PID record: URL, URLy, URLz, CKSM, Rights, ....
- domain Y, repository Y: DO1
- domain Z, repository Z: DO1

PID Use V2 in EUDAT Federation
- domain X, repository X: DO1, PIDx (prefx); PID record: URL, RoR, HDL, CKSM, Rights, ....
- domain Y, repository Y: DO1, PIDy (prefy); PID record: URL, RoR, HDL, CKSM, Rights, ....
- domain Z, repository Z: DO1, PIDz (prefz); PID record: URL, RoR, CKSM, Rights, ....
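The difference between the two variants can be sketched with plain dictionaries. The slides do not spell out how RoR and HDL are populated, so this sketch assumes that RoR is a boolean marking the repository of record and that HDL on a replica points to the PID it was copied from. The URLs and checksum values are made up.

```python
# V1: a single PID registered in domain X lists the URL of every copy.
v1_records = {
    "prefx/DO1": {
        "URL": ["https://x.example.org/DO1",
                "https://y.example.org/DO1",
                "https://z.example.org/DO1"],
        "CKSM": "abc123",
    },
}

# V2: every domain registers its own PID under its own prefix.
# ASSUMPTION: RoR is a boolean, HDL names the PID the replica was copied from.
v2_records = {
    "prefx/DO1": {"URL": "https://x.example.org/DO1", "RoR": True,
                  "HDL": None, "CKSM": "abc123"},
    "prefy/DO1": {"URL": "https://y.example.org/DO1", "RoR": False,
                  "HDL": "prefx/DO1", "CKSM": "abc123"},
    "prefz/DO1": {"URL": "https://z.example.org/DO1", "RoR": False,
                  "HDL": "prefy/DO1", "CKSM": "abc123"},
}


def original_of(pid: str, records: dict) -> str:
    """Follow HDL pointers until the repository-of-record entry is reached."""
    seen = set()
    while not records[pid]["RoR"]:
        if pid in seen:
            raise ValueError(f"cycle in HDL pointers at {pid}")
        seen.add(pid)
        pid = records[pid]["HDL"]
    return pid


def replicas_consistent(records: dict) -> bool:
    """All copies should carry the same checksum."""
    return len({rec["CKSM"] for rec in records.values()}) == 1


print(original_of("prefz/DO1", v2_records), replicas_consistent(v2_records))
```

Under these assumptions, V2 lets each domain resolve and manage its own copy, while the chain of pointers still leads back to the original.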

EUDAT relying on EPIC + Handles
- EPIC (European PID Consortium: CSC, SARA, GWDG, more)
- large data centers with national/organizational (MPS) support
- applying redundancy schemes (persistence, availability)
- reliability, robustness, performance (registration, resolution)
- all the same API (agreement on information associated)
- thus PID syntax not crucial, but storing/finding information
- feasible business model for science
- security of administration DB for system
- persistent and balanced governance for HS
- need a worldwide registry of agreed information types to feed our "stupid" machines

Information types in discussion
- multiple links to resources
- checksum
- link to metadata
- citation metadata
- RoR statement
- mutability flag
- persistency statement
- pointers to presentation versions
- provenance statement
- collection statement
- pointer to rights
- (support for parts/fragments)
- (actionable PIDs)
- need agreements
- need standard APIs
- for EUDAT this is crucial
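Why agreements on information types matter to machines can be shown with a small check of a PID record against a registry of agreed types. The type names and the registry contents below are invented for illustration; no such agreed registry is implied to exist in this form.

```python
# Hypothetical registry: agreed information type -> expected Python type.
AGREED_TYPES = {
    "URL": list,       # multiple links to resources
    "CKSM": str,       # checksum
    "METADATA": str,   # link to metadata
    "ROR": bool,       # RoR statement
    "MUTABLE": bool,   # mutability flag
}


def check_record(record: dict, registry: dict = AGREED_TYPES) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, value in record.items():
        expected = registry.get(name)
        if expected is None:
            problems.append(f"{name}: not an agreed information type")
        elif not isinstance(value, expected):
            problems.append(f"{name}: expected {expected.__name__}")
    return problems


record = {
    "URL": ["https://x.example.org/DO1"],
    "CKSM": "abc123",
    "MUTABLE": "no",   # wrong type: should be a bool
    "COLOUR": "red",   # not in the registry
}
print(check_record(record))
```

A client that only understands agreed types can process any conforming record without knowing which community or repository produced it.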