Towards a Persistent Identifier Infrastructure for European e-Research Daan Broeder CLARIN / MPG 2008 CNRI Handle System Workshop.



Content
Domain & Scope
Organizational embedding
Further requirements
Services for e-research with PIDs

Domain & Scope
Reliable references & citations of web-accessible resources
Language resource domain
  – Audio & video recordings, pictures, primary texts, annotations
  – Lexica, grammar descriptions, …
  – Concepts in terminology registries and ontologies
  – …
The number of resources is very large and depends on how the granularity issue is approached
References and citations
  – Embedded in (web) documents
  – In data structures
  – In DBs
  – …

CLARIN: Common Language Resources and Technology Infrastructure
The CLARIN project is a large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily usable. As one of its goals, CLARIN will create a federation of LR repositories, and it aims to create a unified resource registry using persistent identifiers.

CLARIN: Common Language Resources and Technology Infrastructure
Preparatory phase (Construction phase)
European dimension (ICT FP7)
  – 112 members from 35 countries
  – Preparatory phase funded with 4.2 million euro
National dimension
  – Funding until now 6.5 million euro, more to come
  – …

DAM-LR: Distributed Access Management for Language Resources
Small (4 partners) European project aimed at federation building in the LR repository domain
Unified metadata catalogue
Identity federation using Shibboleth
Single resource identifier system for all published resources using the Handle System

DAM-LR HS infrastructure
Developed special tools
Mover
  – Updates Handle DB + catalogue
  – Updates metadata XML files*
Restore operations
  – Recreate the Handle DB (and others) from scratch
Lessons learned
  – Federation technology is not for all organizations
[Diagram: the Lund, MPI (prefix 1839) and INL archives, each holding resources (R), with primary and secondary (sec) handle servers]

User benefits

MPG: Max-Planck Society
Proposal within the MPG to support an MPG-wide PID registration service based on the HS
Run by the MPG computing center GWDG
Will also give support to non-MPG German scientific organizations and (hopefully) CLARIN

Requirements
(Political) independence: European GHR mirror & proxy + no single point of failure
Wide(r) acceptance of the PID scheme
Support for object part addressing, from ISO TC37/SC4 CITER work
Support for (secure) management of resource copies

CLARIN PID Infrastructure
[Diagram labels: proxy, GHR mirror, PID registration service, MPI archive (Class A, primary 1839), Archive Class C, secondary (sec.) servers, example handles 1839/R1 and 1111/R5]

PID Scheme
Difficult to gain acceptance
  – Without the PID syntax being official
  – W3C seems to have problems with anything but HTTP (see recent XRI events)
Can the HS user community help?
Possibly acceptance only via URLified handles:
Perhaps follow ARK for elegance:
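For the URLified-handle option, the following is a minimal Python sketch of how a handle could be rewritten as an HTTP URL. The host hdl.handle.net is the public Handle proxy and is only an assumed default here (a European proxy would use a different host); the function name `urlify_handle` is hypothetical and not taken from the slides.

```python
from urllib.parse import quote

# Assumption: the public Handle proxy; replace with a regional proxy if needed.
DEFAULT_PROXY = "https://hdl.handle.net/"


def urlify_handle(handle: str, proxy: str = DEFAULT_PROXY) -> str:
    """Turn 'hdl:1839/R1' or '1839/R1' into a URL served by an HTTP proxy."""
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]
    # A handle is 'prefix/suffix'; only the first '/' separates the two.
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return proxy + quote(prefix, safe=".") + "/" + quote(suffix, safe="/")


if __name__ == "__main__":
    print(urlify_handle("hdl:1839/R1"))
```

The handle stays the identifier of record; the URL form is only a transport wrapper, so documents can still be switched to another proxy later.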

PIDs & Resource Parts
It is wasteful to issue a PID for each part (think of 100k entries in a lexicon), so use part identifiers.
The resolver can make an adequate translation: A#z -> objectA?part=z
This requires enough flexibility from the resolver to accommodate the object server.
The syntax of z should be standard for the specific data type; borrow from existing fragment identifier syntax standards.
[Diagram: object A with parts x, y and z; labels: 1839/A, 1839/x, 1839/y, 1839/z; 1839/A#x, 1839/A#y, 1839/A#z; PID resolver and object server handling 1839/A#z]
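The translation A#z -> objectA?part=z can be sketched in a few lines of Python. This is an illustration under assumptions: the in-memory `HANDLE_TABLE`, the example.org URL and the query parameter name `part` (taken from the slide's example) stand in for a real resolver and object server.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

# Hypothetical handle table: one handle per object, none per part.
HANDLE_TABLE: Dict[str, str] = {
    "1839/A": "https://archive.example.org/objectA",
}


def split_part_identifier(pid: str) -> Tuple[str, Optional[str]]:
    """Split '1839/A#z' into the handle '1839/A' and the part identifier 'z'."""
    handle, sep, part = pid.partition("#")
    return handle, (part if sep and part else None)


def resolve(pid: str, table: Dict[str, str] = HANDLE_TABLE) -> str:
    """Resolve the handle, then pass the part identifier on to the object server."""
    handle, part = split_part_identifier(pid)
    base_url = table[handle]  # KeyError if the handle is not registered
    if part is None:
        return base_url
    # The object server, not the resolver, interprets the part syntax.
    return f"{base_url}?{urlencode({'part': part})}"


if __name__ == "__main__":
    print(resolve("1839/A#z"))
```

Only one handle is registered for the whole lexicon in this sketch; the 100k entries are addressed through the part identifier alone.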

Resource duplicates
What if MPI moves the resource copy?
MPI should have write access to the Lund Handle record
This would enable changing the Lund URL record too!
[Diagram: resource R in the Lund archive and its copy in the MPI archive (primary 1839); handle 10050/R; labels: move, LHS, Access monitor, MPI Manager]

Resource duplicates
Indirect handles*
TYPE = URL
  – IE-Plugin: ok
  – HS proxy: not ok
TYPE = HS_ALIAS (problem*)
  – IE-Plugin: ok
  – HS proxy: ok
Status of the 1839/Rcpy handle?
  – Use in documents? -> hdl:1839/Rcpy
[Diagram: resource R in the Lund archive and its copy in the MPI archive (primary 1839); labels: 10050/R ->, 1839/Rcpy ->, MPI Manager, move]
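The idea behind indirect handles can be illustrated with a small in-memory model in Python. This is a simplified sketch and not the Handle System's actual resolution algorithm: the record layout, the example.org URLs and the choice to let a hypothetical copy entry 10050/R-copy point at 1839/Rcpy through HS_ALIAS are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical records: handle -> list of (type, data) values.
RECORDS: Dict[str, List[Tuple[str, str]]] = {
    # Lund refers to the MPI copy indirectly, through MPI's own handle.
    "10050/R-copy": [("HS_ALIAS", "1839/Rcpy")],
    # MPI alone maintains the location of its copy.
    "1839/Rcpy": [("URL", "https://mpi.example.org/old/R")],
}


def resolve_url(handle: str, records: Dict[str, List[Tuple[str, str]]],
                max_hops: int = 5) -> str:
    """Follow HS_ALIAS values until a record with a URL value is found."""
    seen = set()
    for _ in range(max_hops):
        if handle in seen:
            raise ValueError(f"alias loop at {handle}")
        seen.add(handle)
        values = dict(records[handle])
        if "HS_ALIAS" in values:
            handle = values["HS_ALIAS"]  # re-resolve with the aliased handle
            continue
        return values["URL"]
    raise ValueError("too many alias hops")


def move_copy(records: Dict[str, List[Tuple[str, str]]], new_url: str) -> None:
    """MPI moves its copy and only touches its own record."""
    records["1839/Rcpy"] = [("URL", new_url)]


if __name__ == "__main__":
    move_copy(RECORDS, "https://mpi.example.org/new/R")
    print(resolve_url("10050/R-copy", RECORDS))
```

In this model MPI never needs write access to the Lund record, which is the point of the indirection; whether a given client or proxy follows the indirection is exactly the compatibility question the slide raises.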

Possible Added PID Services
Establishing resource authenticity
Resource Collection Registration
Resource Citation Information
Lost Resource Detective
…

Collection Registration Service
Much scientific work depends on seemingly accidental, distributed collections of material that have no independent embodiment.
Such a collection needs to be citable with one single PID
  – Encode the collection's resource URIs directly in a handle record
  – Attach a link to a map of the collection's URIs
Compare the recent Aggregation Map concept from ORE
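The two registration options can be contrasted with a short Python sketch that builds the values of a collection's handle record. Handle values do carry an index, a type and data, but the type name `COLLECTION_MEMBER`, the function name and the example.org URIs are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Sequence, Union

HandleValue = Dict[str, Union[int, str]]


def collection_record(member_uris: Sequence[str] = (),
                      map_url: Optional[str] = None) -> List[HandleValue]:
    """Build handle values for a collection, using exactly one of two options."""
    if bool(member_uris) == bool(map_url):
        raise ValueError("give either member URIs or a map URL, not both or neither")
    if map_url:
        # Option 2: a single link to an external map listing the members.
        return [{"index": 1, "type": "URL", "data": map_url}]
    # Option 1: every member URI stored directly in the handle record.
    return [
        {"index": i, "type": "COLLECTION_MEMBER", "data": uri}
        for i, uri in enumerate(member_uris, start=1)
    ]


if __name__ == "__main__":
    print(collection_record(["https://a.example.org/r1", "https://b.example.org/r2"]))
    print(collection_record(map_url="https://archive.example.org/maps/c1.xml"))
```

Storing members directly keeps the collection self-contained in the handle record, while the map link keeps the record small and moves maintenance to whoever hosts the map.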

Citation Information Service
(Collections of) resources need to be cited in documents.
Acknowledgement & credit are also important for primary scientific data
E.g. Dutch Spoken Corpus, © Institute for Dutch Lexicography, ….
Make this citation information part of the metadata associated with the PID

Lost Resource Detective
Establishing Provenance
If by accident the handle-to-URI mapping was not properly maintained, special metadata could be available from the handle record to establish the resource's location or find a copy.
  – URI history, Repository, Depositor, …
Labor intensive
Only for a limited number of resources unless there is a pattern
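A first automated step of such a detective could look like the Python sketch below: try the current URI, then walk the URI history from newest to oldest. The record fields `url` and `uri_history` and the injected `is_reachable` check are assumptions; a real service would also use the repository and depositor metadata, which typically needs human follow-up.

```python
from typing import Callable, Dict, List, Optional, Union

Record = Dict[str, Union[str, List[str]]]


def locate(record: Record, is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable URI: current one first, then history, newest first."""
    history = list(record.get("uri_history", []))  # assumed oldest -> newest
    candidates = [str(record["url"])] + list(reversed(history))
    for uri in candidates:
        if is_reachable(uri):
            return uri
    return None  # nothing found automatically; fall back to manual detective work


if __name__ == "__main__":
    record: Record = {
        "url": "https://archive.example.org/v3/R",
        "uri_history": ["https://archive.example.org/v1/R",
                        "https://archive.example.org/v2/R"],
    }
    alive = {"https://archive.example.org/v2/R"}  # stand-in for a real HTTP check
    print(locate(record, lambda uri: uri in alive))
```

Passing the reachability check in as a function keeps the sketch free of network code and makes the "pattern" case from the slide (many resources moved in the same way) easy to script.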

The End

DAM-LR HS infrastructure
Integration: it should be an optional extension
  – Make sure the HS is not a single point of failure (SPF)
  – IMDI/LAT SW also functions without the HS
Issue handles for objects
  – Only for local resources
Need special tools
Mover
  – Updates Handle DB + catalogue
  – Updates IMDI XML files*
Restore operations
  – Recreate the Handle DB (and others) from scratch
[Diagram labels: MPI1001#, mpi_url, 1839/087-D, LHS, LAT webapps, sync, Handle DB, catalogue, mover, IMDI harvester]