Approaches and Challenges in Managing Persistent Identifiers
Nordic Workshop on Data Citation Policies and Practices
Helsinki, 2016/11/23

Motivation and background: About DKRZ
- A national service provider for the climate (modeling) community
- DKRZ = German Climate Computing Center
  - Non-profit service company established 1987
  - Located in Hamburg, Germany
- Balanced HPC / storage system
  - 3 PFlop Bull system
  - 45 PByte Lustre parallel file system
  - 335 PByte HPSS tape backend
- Data Services:
  - Long term data archival
  - World Data Center for Climate
  - Core node in international climate data federation (ESGF, IS-ENES)
Approaches and Challenges in Managing PIDs 2016/11/23

Motivation and background: CMIP6

Motivation and background: Challenges
- User-driven:
  - Wider user audience
  - Downstream usage of climate data: new processing and analysis services
- Resource-driven:
  - Same resources, but...
  - More objects
  - More diversity
  - Not monolithic: graph structures
- Still: Keep it simple

Addressing the management challenges
- Support objects through their life cycle
- Give a name to every object
- Automate tasks: intelligent agents
- Make transitions transparent
- Enable users/agents to pull information on the object at hand
- Requirement: Understand that PIDs are not a guarantee of object persistency

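As one illustration of how an agent might pull information on the object at hand, the sketch below reads type/value pairs out of a PID record. It assumes the PIDs are Handles and that the record has the JSON layout returned by the public Handle proxy REST interface (https://hdl.handle.net/api/handles/<handle>). The sample record, its type names (URL, CHECKSUM), the placeholder handle, and the helper names handle_api_url and record_values are hypothetical and chosen only for this example. No network request is made.

```python
import json
from urllib.parse import quote

# Base URL of the public Handle proxy REST interface.
PROXY_API = "https://hdl.handle.net/api/handles/"


def handle_api_url(handle: str) -> str:
    """Build the proxy REST URL for a handle, with or without the 'hdl:' scheme."""
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    return PROXY_API + quote(handle, safe="/")


def record_values(response_text: str) -> dict:
    """Map each value type in a Handle record to its data value."""
    record = json.loads(response_text)
    # Assumption: responseCode 1 signals success in the proxy's JSON responses.
    if record.get("responseCode") != 1:
        raise LookupError(f"handle not resolved: {record.get('responseCode')}")
    return {v["type"]: v["data"]["value"] for v in record.get("values", [])}


# Hypothetical record, shaped like a proxy response (placeholder handle and values).
SAMPLE = json.dumps({
    "responseCode": 1,
    "handle": "21.14100/example",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string", "value": "https://example.org/data/file.nc"}},
        {"index": 2, "type": "CHECKSUM",
         "data": {"format": "string", "value": "abc123"}},
    ],
})

info = record_values(SAMPLE)
```

An agent that holds only the PID could fetch handle_api_url(pid) with any HTTP client and pass the response body to record_values, which keeps the lookup logic independent of where the object currently lives.
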
What is persistency?
- Persistency of the object
  - Not bound to use of a (specific) PID
- Persistency of the PID
  - Object can be gone
- Persistency of the PID-Object link
  - Object+PID+link = Citability
  - Persistency statements
- Persistency of essential metadata
  - Object can be gone!
Achieving persistency is not primarily a technical challenge!

Infrastructure view: Automation and abstraction
- No longer just management of files in file systems
- Management of digital objects through dedicated services/chains
- Focus on stable protocols and interfaces, modularity
- Hide complexity of automation machinery from users

PIDs in the middle enable automated management
- Object management scenarios bring new requirements to PIDs
courtesy of Larry Lannom

What are the components of a PID federation?
- Federation: Scalable, but needs to be organized well
- Technical expertise (common interfaces, protocols)
- Resources (staff, know-how, funding)
- Support services (help desk, training)
- Governance mechanisms
- Operational schema (processes, QA, reporting, intelligence, innovation management)

Some details on the challenges for CMIP6
- Requirement: Put a Handle in every file header, but files may not be changed after the production phase
  tracking_id = hdl:21.14100/<UUID>
- A lot of time spent on agreements that ensure the sanity of the PID record
- Each object gets a PID, and no object with an embedded PID is outside our control
- PID not citable: required metadata not ready
- Still: some file headers are extracted and put in the PID record
- PIDs are a new development: Handle registration is not allowed to interrupt the publication process

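The tracking_id convention above can be sketched in a few lines of Python. This is a minimal illustration, not the CMIP6 production tooling: the hdl: scheme and the prefix 21.14100 are taken from the slide, while the use of a random (version 4) UUID and the helper names make_tracking_id and is_valid_tracking_id are assumptions made for the example.

```python
import re
import uuid

# Handle prefix as shown on the slide; the suffix after the slash is a UUID.
HANDLE_PREFIX = "21.14100"

# "hdl:<prefix>/<UUID>" with the UUID in canonical 8-4-4-4-12 hex form.
_TRACKING_ID_RE = re.compile(
    r"hdl:" + re.escape(HANDLE_PREFIX) + r"/"
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def make_tracking_id() -> str:
    """Return a new tracking_id string (assumption: random version 4 UUID)."""
    return f"hdl:{HANDLE_PREFIX}/{uuid.uuid4()}"


def is_valid_tracking_id(value: str) -> bool:
    """Check the syntactic form only; no Handle server is contacted."""
    return _TRACKING_ID_RE.fullmatch(value) is not None
```

Because the identifier is generated locally, it can be written into the file header at production time and registered as a Handle later, which fits the constraint that files must not change after the production phase.
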
Making it scalable requires additional effort
Buurman, Weigel, Juckes, Lautenschlager, Kindermann: Persistent Identifiers for CMIP6 in the Earth System Grid Federation, EGU 2016

The user's reality...

Take-home messages
- Use of PIDs for data management presents new requirements, but also new benefits
- Automation and machine agent usage are key elements
- Data citation is one use case besides others, benefits from improved transparency
- Multiple aspects of persistency can become relevant

Thank you for your attention.