Approaches and Challenges in Managing Persistent Identifiers

Approaches and Challenges in Managing Persistent Identifiers
Nordic Workshop on Data Citation Policies and Practices
Helsinki, 2016/11/23

Motivation and background: About DKRZ
A national service provider for the climate (modeling) community
DKRZ = German Climate Computing Center
  Non-profit service company established 1987
  Located in Hamburg, Germany
Balanced HPC / storage system
  3 PFlop Bull system
  45 PByte Lustre parallel file system
  335 PByte HPSS tape backend
Data services:
  Long-term data archival
  World Data Center for Climate
  Core node in international climate data federation (ESGF, IS-ENES)
Approaches and Challenges in Managing PIDs 2016/11/23

Motivation and background: CMIP6

Motivation and background: Challenges
User-driven:
  Wider user audience
  Downstream usage of climate data – new processing and analysis services
Resource-driven: Same resources, but...
  More objects
  More diversity
  Not monolithic – graph structures
Still: Keep it simple

Addressing the management challenges
Support objects through their life cycle
Give a name to every object
Automate tasks – intelligent agents
Make transitions transparent
Enable users/agents to pull information about the object at hand
Requirement: Understand PIDs not as a guarantee for object persistency

What is persistency?
Persistency of the object
  Not bound to use of a (specific) PID
Persistency of the PID
  Object can be gone
Persistency of the PID-Object link
  Object+PID+link = Citability
  Persistency statements
Persistency of essential metadata
  Object can be gone!
Achieving persistency is not primarily a technical challenge!
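The distinction between these aspects of persistency can be made concrete with a small data model. The following is only a sketch: the class `PidRecord`, its field names, and the `withdraw_object` helper are hypothetical and do not correspond to any real PID service or to the schema used for CMIP6. It encodes the slide's rule that citability needs object, PID and link together, while the PID and its essential metadata can outlive the object.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PidRecord:
    """Hypothetical PID record; all field names are illustrative assumptions."""
    pid: str                         # the identifier itself
    object_exists: bool = True       # persistency of the object
    location: Optional[str] = None   # the PID-object link (None if broken)
    metadata: Dict[str, str] = field(default_factory=dict)  # essential metadata

    def is_citable(self) -> bool:
        # Object + PID + link = citability, as stated on the slide.
        return bool(self.pid) and self.object_exists and self.location is not None

    def withdraw_object(self, reason: str) -> None:
        # The object can be gone while the PID and its metadata persist.
        self.object_exists = False
        self.location = None
        self.metadata["status"] = f"object withdrawn: {reason}"


record = PidRecord(
    pid="hdl:21.14100/example",                  # placeholder suffix
    location="https://example.org/data/file.nc",  # placeholder URL
    metadata={"title": "demo dataset"},
)
print(record.is_citable())
record.withdraw_object("retracted")
print(record.is_citable(), record.metadata)
```

Keeping the metadata when the object is withdrawn is one possible way to express a persistency statement to users and agents; a real infrastructure would define this through policy rather than code.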

Infrastructure view: Automation and abstraction
No longer just management of files in file systems
Management of digital objects through dedicated services/chains
Focus on stable protocols and interfaces, modularity
Hide complexity of automation machinery from users

PIDs in the middle enable automated management
Object management scenarios bring new requirements to PIDs
(Figure courtesy of Larry Lannom)

What are the components of a PID federation?
Federation: Scalable, but needs to be organized well
  Technical expertise (common interfaces, protocols)
  Resources (staff, know-how, funding)
  Support services (help desk, training)
  Governance mechanisms
  Operational schema (processes, QA, reporting, intelligence, innovation management)

Some details on the challenges for CMIP6
Requirement: Put a Handle in every file header, but files may not be changed after the production phase
  tracking_id = hdl:21.14100/<UUID>
A lot of time spent on agreements that ensure sanity of the PID record
  Each object gets a PID, and no object outside our control has an embedded PID
PID not citable – required metadata not ready
  Still: some file headers are extracted and put in the PID record
PIDs are a new development – Handle registration must not interrupt the publication process
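The `tracking_id` pattern above can be illustrated with a short sketch that mints and validates such a string at file production time, before any Handle is registered. The prefix `21.14100` is taken from the slide; the use of a random (version 4) UUID, the function names and the regular expression are assumptions made for illustration and are not the CMIP6 tooling.

```python
import re
import uuid
from typing import Tuple

HANDLE_PREFIX = "21.14100"  # prefix as shown on the slide

# Assumed textual form: hdl:<prefix>/<lowercase UUID with hyphens>
_TRACKING_ID = re.compile(
    r"^hdl:(?P<prefix>\d+(?:\.\d+)*)/"
    r"(?P<suffix>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)


def new_tracking_id(prefix: str = HANDLE_PREFIX) -> str:
    """Mint a tracking_id locally; no registration happens here."""
    # Assumption: a random UUID (version 4) is used as the Handle suffix.
    return f"hdl:{prefix}/{uuid.uuid4()}"


def parse_tracking_id(value: str) -> Tuple[str, uuid.UUID]:
    """Split a tracking_id into Handle prefix and UUID, or raise ValueError."""
    match = _TRACKING_ID.match(value)
    if match is None:
        raise ValueError(f"not a valid tracking_id: {value!r}")
    return match["prefix"], uuid.UUID(match["suffix"])


tracking_id = new_tracking_id()
prefix, suffix = parse_tracking_id(tracking_id)
print(tracking_id, prefix, suffix.version)
```

Because the identifier is generated locally, it can be written into the file header during production and registered later, which fits the constraints that files must not change afterwards and that registration must not block publication.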

Making it scalable requires additional effort
Buurman, Weigel, Juckes, Lautenschlager, Kindermann: Persistent Identifiers for CMIP6 in the Earth System Grid Federation, EGU 2016

The user's reality...

Take-home messages
Use of PIDs for data management presents new requirements, but also new benefits
Automation and machine agent usage are key elements
Data citation is one use case among others and benefits from improved transparency
Multiple aspects of persistency can become relevant

Thank you for your attention.