Persistent Identifiers (PIDs) & Digital Objects (DOs) Christine Staiger & Robert Verkerk SURFsara.


1 Persistent Identifiers (PIDs) & Digital Objects (DOs) Christine Staiger & Robert Verkerk SURFsara

2 Persistent Identifiers (PIDs)
- Pointers to data resources
  - Digital resources: data, metadata, documents
  - Real-world objects: species, patient, cell line
- Globally unique
- Exist indefinitely
- Used to identify and retrieve resources
- Examples: ISBNs, BSNs, DOIs, EPIC PIDs, URIs

3 Digital Object (DO)
Components: data, PID, metadata
Synchronise the PID, data and metadata during creation, maintenance and deletion of a digital object!
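A minimal sketch of the synchronisation idea on this slide, assuming a digital object is modelled as one record that bundles PID, data and metadata. The class name `DigitalObject`, its fields and the choice of a SHA-256 checksum are illustrative assumptions, not something the presentation or a PID library prescribes.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class DigitalObject:
    """Hypothetical container keeping PID, data and metadata together."""
    pid: str
    data: bytes
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        # Record the checksum at creation so data and metadata start in sync.
        self.metadata["checksum"] = hashlib.sha256(self.data).hexdigest()

    def update_data(self, new_data: bytes) -> None:
        # Changing the data also refreshes the metadata that depends on it.
        self.data = new_data
        self.metadata["checksum"] = hashlib.sha256(new_data).hexdigest()

    def in_sync(self) -> bool:
        # True when the stored checksum still matches the current data.
        current = hashlib.sha256(self.data).hexdigest()
        return self.metadata.get("checksum") == current
```

Going through `update_data` keeps the object consistent; writing to `data` directly would make `in_sync()` return `False`, which is the kind of drift the slide warns about.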

4 PIDs are static
[Diagram: PID 1, PID 2, PID 3, PID 4; Data 1, Data 2, Data 3, Data 4; World of data infrastructure (hardware)]

5 Workflow 1: Change storage environment
[Diagram: PID1, PID2; Storage site A, Storage site B]

6 Use Case 1: Digital repositories
- PIDs point to the landing page of the digital repository, which shows the metadata
- The “real” data can be downloaded from this page via another link
- E.g. B2SHARE, 3TU.Datacentrum and DANS repositories
- The PID http://hdl.handle.net/11304/3265434c-4b34-11e4-81ac-dcbd1b51435e resolves to https://b2share.eudat.eu/record/139
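A hedged sketch of how a client could read the landing-page URL out of a handle record. It assumes the JSON layout returned by the public hdl.handle.net proxy REST endpoint (`/api/handles/<handle>`); the helper names `api_url` and `extract_url` are made up for illustration, and `SAMPLE` is a hand-written record built from the values on this slide, not a response fetched from the network.

```python
import json


def api_url(handle: str) -> str:
    # Assumed REST endpoint of the public hdl.handle.net proxy.
    return f"https://hdl.handle.net/api/handles/{handle}"


def extract_url(response_text: str) -> str:
    """Return the first URL-typed value of a handle record in JSON form."""
    record = json.loads(response_text)
    if record.get("responseCode") != 1:  # 1 is assumed to mean success
        raise LookupError(f"handle not resolved: {record.get('responseCode')}")
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    raise LookupError("handle record has no URL value")


# Hand-written sample shaped like a proxy response; not fetched from anywhere.
SAMPLE = json.dumps({
    "responseCode": 1,
    "handle": "11304/3265434c-4b34-11e4-81ac-dcbd1b51435e",
    "values": [{
        "index": 1,
        "type": "URL",
        "data": {"format": "string",
                 "value": "https://b2share.eudat.eu/record/139"},
    }],
})

print(extract_url(SAMPLE))
```

In a real client the text passed to `extract_url` would come from an HTTP GET on `api_url(handle)`.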

7 Use Case 2: Enabling data flows
- PIDs point to the data directly
- If needed, create another field specifying the data type so that the right application can be chosen
- Use data in a workflow via the PID, NOT via the actual location!
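The point of referring to data by PID rather than by location can be sketched with an in-memory stand-in for a resolution service. `PidRegistry` and `run_workflow` are hypothetical names, and the location strings are invented; the sketch only shows that a workflow holding the PID keeps working after the data moves between storage sites.

```python
class PidRegistry:
    """Hypothetical in-memory stand-in for a PID resolution service."""

    def __init__(self):
        self._locations = {}

    def register(self, pid: str, location: str) -> None:
        self._locations[pid] = location

    def move(self, pid: str, new_location: str) -> None:
        # Only the record behind the PID changes; the PID itself stays fixed.
        if pid not in self._locations:
            raise KeyError(f"unknown PID: {pid}")
        self._locations[pid] = new_location

    def resolve(self, pid: str) -> str:
        return self._locations[pid]


def run_workflow(registry: PidRegistry, pid: str) -> str:
    # The workflow stores the PID and looks up the location only when needed.
    return f"processing {registry.resolve(pid)}"


registry = PidRegistry()
registry.register("prefix1/suffix1", "site-a:/data/file1")
print(run_workflow(registry, "prefix1/suffix1"))

# The data moves to another storage site; the workflow call is unchanged.
registry.move("prefix1/suffix1", "site-b:/archive/file1")
print(run_workflow(registry, "prefix1/suffix1"))
```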

8 Resolving PIDs
Client gets a request to resolve hdl:123/456.
1. Client sends a request to the Global Registry (e.g. the Handle System) to resolve 0.NA/123 (the prefix handle for 123/456)
2. Global Registry responds with the Service Information for 123
3. Client sends the request for 123/456 to the Local Handle Service
4. Server responds with the handle data
[Diagram: Global Registry (e.g. Handle System); Service Information table (Local Handle Service, IP); Local Service with Primary Site, Secondary Site A (e.g. SURFsara) and Secondary Site B; servers #1, #2, #3]
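The two-level lookup on this slide can be sketched with plain dictionaries: a global table that maps a prefix handle to the sites serving it, and per-site tables that hold the handle data. All host names, table names and record contents below are invented for illustration; a real client would speak the Handle protocol instead.

```python
# Hypothetical tables standing in for the global registry and local services.
GLOBAL_REGISTRY = {
    "0.NA/123": ["primary.example.org", "secondary-a.example.org"],
}
LOCAL_SERVICES = {
    "primary.example.org": {"123/456": {"URL": "https://example.org/data/456"}},
    "secondary-a.example.org": {"123/456": {"URL": "https://example.org/data/456"}},
}


def resolve(handle: str) -> dict:
    prefix = handle.split("/", 1)[0]
    # Ask the global registry which sites serve this prefix (0.NA/<prefix>).
    sites = GLOBAL_REGISTRY.get(f"0.NA/{prefix}")
    if not sites:
        raise LookupError(f"unknown prefix: {prefix}")
    # Ask the listed sites in turn until one returns the handle data.
    for site in sites:
        record = LOCAL_SERVICES.get(site, {}).get(handle)
        if record is not None:
            return record
    raise LookupError(f"handle not found: {handle}")


print(resolve("123/456"))
```

Listing several sites per prefix mirrors the primary and secondary sites in the diagram: if one site lacks the record, the next one is tried.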

9 Example: Relationships between DOs
PID: prefix1/suffix1
  Metadata: key1: …, key2: prefix2/suffix2, key3: prefix3/suffix3
PID: prefix2/suffix2
  Metadata: key1: …, key2: prefix1/suffix1
PID: prefix3/suffix3
  Metadata: key1: …, key2: prefix1/suffix1
- Part of/has part relationships
- Model cohort-patient relationship
- Model patient-samples relationship
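A small sketch of the part-of/has-part pattern, assuming each PID record stores the PIDs of related objects in its metadata. The key names `has_part` and `part_of` are assumptions that stand in for the generic `key2`/`key3` on the slide, and `broken_links` is a hypothetical helper that checks the references point back in both directions.

```python
# Hypothetical records keyed by PID, mirroring the layout on the slide.
RECORDS = {
    "prefix1/suffix1": {"has_part": ["prefix2/suffix2", "prefix3/suffix3"]},
    "prefix2/suffix2": {"part_of": "prefix1/suffix1"},
    "prefix3/suffix3": {"part_of": "prefix1/suffix1"},
}


def broken_links(records: dict) -> list:
    """Return (parent, child) pairs whose back-reference is missing or wrong."""
    problems = []
    for pid, meta in records.items():
        for child in meta.get("has_part", []):
            if records.get(child, {}).get("part_of") != pid:
                problems.append((pid, child))
    return problems


print(broken_links(RECORDS))
```

The same structure could model a cohort with its patients, or a patient with the associated samples.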

10 Guidelines: Characteristics of PIDs
- What should be identifiable by a PID?
  - Define what is data and what is metadata
- Granularity of PIDs: How much information should a PID contain?
  - Location
  - Checksums
  - Other system-specific information
  - Do not put information about the contents of the data here!
- Don’t mix PIDs with other IDs, e.g. database IDs
- Opacity: No assumptions about data context in the PID
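One way to honour the opacity guideline is to build the suffix from a random UUID, so that the identifier says nothing about content, location or owner. This is a sketch of that one option, not the method the presenters prescribe; `mint_pid` is a hypothetical helper and the prefix is passed in by the caller.

```python
import uuid


def mint_pid(prefix: str) -> str:
    # A random (version 4) UUID carries no information about the data.
    return f"{prefix}/{uuid.uuid4()}"


print(mint_pid("11112"))
```

Descriptive details such as location or checksum then belong in the PID record, not in the identifier string itself.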

11 Guidelines: Referable data
- How persistent is the data?
- What and how much in a DO may change?
- When should a new DO be instantiated?
- Versioning via PIDs?
- Define PID management processes:
  1. Connecting data, metadata and PID
  2. Handling changes in data and metadata
  3. Handling changes in the storage environment
  4. Deleting data, metadata, or PIDs
- Which problem should be addressed with PIDs?

12 The Handle System
- Offers a resolution service for PIDs
- Gives a lot of freedom for implementation, e.g. PID information types
- Software architecture designed for high availability and scalability
- Basis for several PID providers
- Costs: $50 for registering a prefix with Handle plus $50/year maintenance
EPIC PIDs and DOIs build their services on the Handle System. Thus, such a PID is a handle.

13 PID systems
DOIs
- Data registry service
- Library-specific metadata standard incorporated in the PID entry (author info, Dublin Core, …), ensuring interoperability between registered data objects
- Costs: $0.06 to $1 per PID, depending on the service (CrossRef), plus an annual fee
EPIC PIDs
- Data registry service
- Create your own metadata for PIDs for data interoperability
- Only costs for the handle service
- With one prefix one can create as many PIDs as wanted
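The figures quoted on the slides allow a rough cost comparison. The sketch below only encodes those figures: a one-off 50 USD plus 50 USD per year for a handle prefix, and 0.06 to 1 USD per DOI. The DOI annual fee is not quantified in the deck, so it is left out, and both function names are made up for illustration.

```python
def handle_prefix_cost(years: int) -> int:
    # One-off 50 USD registration plus 50 USD maintenance per year.
    return 50 + 50 * years


def doi_per_pid_cost(n_pids: int) -> tuple:
    # Per-PID fees only (low, high); the annual fee is not given on the slide.
    return (round(0.06 * n_pids, 2), round(1.0 * n_pids, 2))


print(handle_prefix_cost(3))
print(doi_per_pid_cost(1000))
```

Because one prefix can carry any number of PIDs, the handle cost does not depend on the number of objects, while the per-PID DOI fee grows with it.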

14 Example: Python epicclient …

15 B2SAFE: iRODS and PIDs @ KNMI
[Diagram: Seismic system; KNMI NFS share; NFS mount; iRODS (two instances); PID; dCache; HPSS; DMF]
- OS: /data/orfeus/data/continuous/...
- iRODS: /ORFEUS/eudat/data/continuous/…
- iRODS: /vzSARA1/eudat/knmi/…

16 Dataflow KNMI → SURFsara
B2SAFE is implemented as a two-step process:
1. Register a file in iRODS
   - ireg a file in iRODS @ KNMI
   - Create a handle/PID @ KNMI
2. Replicate a file in iRODS to another node
   - Replicate the registered file to SURFsara
   - Create a handle/PID @ SURFsara
   - Update the handle/PID @ KNMI
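The two steps above can be mimicked with an in-memory handle store. This is a simulation only: it makes no iRODS or B2SAFE calls, and the record fields `replicas` and `parent`, the function names, and the location strings are all assumptions chosen for illustration. The prefixes 11230 and 11112 are the ones shown for KNMI and SURFsara in the example handles of this deck.

```python
import uuid

handles = {}  # hypothetical handle store: PID -> record


def create_handle(prefix: str, location: str) -> str:
    pid = f"{prefix}/{uuid.uuid4()}"
    handles[pid] = {"URL": location, "replicas": []}
    return pid


def register(prefix: str, location: str) -> str:
    # Step 1: register the file and create a handle at the source site.
    return create_handle(prefix, location)


def replicate(source_pid: str, target_prefix: str, target_location: str) -> str:
    # Step 2: create a handle for the copy at the target site ...
    replica_pid = create_handle(target_prefix, target_location)
    handles[replica_pid]["parent"] = source_pid
    # ... and update the source handle so that it knows about the replica.
    handles[source_pid]["replicas"].append(replica_pid)
    return replica_pid


source = register("11230", "irods://knmi.example/zoneA/file.dat")
replica = replicate(source, "11112", "irods://surfsara.example/zoneB/file.dat")
print(handles[source]["replicas"] == [replica])
```

Updating the source handle last means that a replica is only advertised once its own handle exists.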

17 Example handle
Structure: domain / prefix / unique identifier
- Handle/PID @ KNMI: http://hdl.handle.net/11230/7bc49fd6-2836-11e4-955a-d89d6771dd88?noredirect
- Handle/PID @ SURFsara: http://hdl.handle.net/11112/387ed2e4-5371-11e4-92a8-a0369f0b5f26?noredirect
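The domain / prefix / unique identifier structure can be taken apart with the standard library. `split_handle_url` is a hypothetical helper written for this sketch; it relies only on `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit


def split_handle_url(url: str) -> dict:
    """Split a resolver URL into domain, prefix, suffix and query string."""
    parts = urlsplit(url)
    prefix, _, suffix = parts.path.lstrip("/").partition("/")
    if not prefix or not suffix:
        raise ValueError(f"not a handle URL: {url}")
    return {"domain": parts.netloc, "prefix": prefix,
            "suffix": suffix, "query": parts.query}


info = split_handle_url(
    "http://hdl.handle.net/11112/387ed2e4-5371-11e4-92a8-a0369f0b5f26?noredirect"
)
print(info["prefix"], info["suffix"])
```

The `noredirect` query string ends up in `info["query"]`; it asks the resolver to display the handle record instead of redirecting to the target.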

18 Installation
- EPIC client, e.g. Python or Perl client
- Handle server and an EPIC API server
- iRODS and B2SAFE for ingesting data (optional)
SURFsara provides:
- Handle server
- EPIC API

19 How to obtain a handle prefix
- The production prefix has to be purchased from CNRI. Costs: $50/year plus a one-off $50 for the request
- More information on how to obtain a handle prefix: http://handle.net/service_agreement.html
- More information on how to make use of SURFsara’s PID service: http://eudat.eu/User+Documentation+-+PIDs+in+EUDAT.html

