The RPID Testbed
Rob Quick, Manager – High Throughput Computing, Research Technologies, Indiana University
Some slides provided by Beth Plale
RDA Plenary, Montreal, Canada, Sept 2017
Our vision
Starts with a data network based on the Digital Object Architecture (DOA), a distributed architecture of services spread worldwide that together identify and resolve digital objects. DOA was first espoused by Internet founder Robert Kahn in the mid-1980s. DOA is a network of Handle servers at its core.
The Digital Object Architecture serves as base infrastructure only
DOA is silent on issues of modeling data objects themselves: their content, their relationship to their own metadata, and the relationships between data objects. For object modeling we turn to the FAIR principles and PID Kernel Information.
Handle resolution in a Digital Object Architecture
[Figure: a client, using the PIT API SDK, sends three queries to the Handle System and related services.]
-- Q: prefix authority. Answered by the Global Handle Servers (GHS), scale [80…100]; returns the Local Handle Service IP.
-- Q: local handle. Answered by a Local Handle Service (LHS), scale [1000…5000]; returns handle information. The LHS stores PID kernel information (e.g., PID to Profile, URL to target).
-- Q: DTR with Profile PID. Answered by the Data Type Registry Service (DTR), scale [1..10]; returns the profile definition. The DTR stores type definitions for kernel information.
-- Result: filtered PIDs, then trusted PIDs.
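The three queries in the figure can be sketched with in-memory dictionaries standing in for the services. This is only an illustration of the lookup order, not the Handle System protocol or the PIT API: every prefix, handle, address, and field name below is a made-up placeholder.

```python
# Hypothetical in-memory stand-ins for the three services in the figure.
# None of the identifiers or values below come from a real deployment.

# Global Handle Servers: prefix -> address of the responsible Local Handle Service
GLOBAL_HANDLE_SERVERS = {"20.5000.EXAMPLE": "lhs-a.example.org"}

# Local Handle Services: address -> {handle -> handle record}
LOCAL_HANDLE_SERVICES = {
    "lhs-a.example.org": {
        "20.5000.EXAMPLE/obj-1": {
            "URL": "https://repo.example.org/obj-1",
            "profile": "20.5000.EXAMPLE/profile-1",
        }
    }
}

# Data Type Registry: profile PID -> profile definition
DATA_TYPE_REGISTRY = {
    "20.5000.EXAMPLE/profile-1": {
        "name": "example kernel profile",
        "fields": ["URL", "profile"],
    }
}


def resolve(handle):
    """Resolve a handle in two steps: prefix authority, then local handle."""
    prefix, _, _suffix = handle.partition("/")
    lhs_address = GLOBAL_HANDLE_SERVERS.get(prefix)  # Q: prefix authority
    if lhs_address is None:
        return None
    return LOCAL_HANDLE_SERVICES[lhs_address].get(handle)  # Q: local handle


def lookup_profile(record):
    """Fetch the profile definition named in a handle record, if any."""
    if record is None:
        return None
    return DATA_TYPE_REGISTRY.get(record.get("profile"))  # Q: DTR with Profile PID


record = resolve("20.5000.EXAMPLE/obj-1")
profile = lookup_profile(record)
```

The point of the sketch is the division of labor: a small number of global servers only map prefixes to local services, the many local services hold the per-object records, and a handful of type registries explain what those records mean.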
What should go into the PID Kernel Information?
PID Kernel Information is a small amount of information stored at the resolver (Local Handle Server) in the PID record of a PID. Inspiration: take the FAIR principles as a guide. How far can PID Kernel Information aid in implementing FAIR?
Further imagine an Internet-scale data client that is handed a list of 100,000,000 PIDs. How does the client quickly sift through the list to find research data objects? Further suppose the client is able to winnow the list down to just research data objects. How does it then quickly discard fakes?
PID Kernel Information use case: client filters a list of millions of PIDs to identify research data and makes a simple determination of trust
[Figure: the client queries the Handle System and the Data Type Registry.]
-- Q: prefix authority. Answered by the Global Handle Registry; returns the Local Handle Service IP.
-- Q: local handle. Answered by a Local Handle Service, scale [1000…5000]; returns handle information. The Local Handle Service stores PID kernel information.
-- Q: DTR with Profile PID. Answered by the Data Type Registry Service (DTR); returns the profile definition. The DTR stores type definitions for kernel information.
-- Result: filtered PIDs, then trusted research PIDs.
A client working with PID Kernel Information looks at each PID in the list and accepts those for which:
-- a Kernel Information profile is stored in the Data Type Registry (DTR),
-- that profile is associated with RDA (in some unspecified manner),
-- the PID Kernel Information holds a tiny amount of data provenance from which a basic sense of trust is derived.
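The acceptance test above can be written as a short filter. The sketch below is one possible reading, under stated assumptions: the slides leave the RDA association unspecified, so it is modeled here as a hypothetical boolean flag `rda_associated` on the profile, and "a tiny amount of data provenance" is modeled as a non-empty hypothetical `provenance` field. All PIDs and records are invented sample data.

```python
# Hypothetical sample data: handle records as they might come back from a
# Local Handle Service, and profiles as they might be stored in a DTR.
HANDLE_RECORDS = {
    "prefix-a/data-1": {"profile": "prefix-a/profile-rda",
                        "provenance": {"creator": "example-lab"}},
    "prefix-a/data-2": {"profile": "prefix-a/profile-other",
                        "provenance": {"creator": "example-lab"}},
    "prefix-a/data-3": {"profile": "prefix-a/profile-rda"},  # no provenance
    "prefix-a/web-page": {"URL": "https://example.org/"},    # no profile
}
TYPE_REGISTRY = {
    "prefix-a/profile-rda": {"rda_associated": True},     # assumed flag
    "prefix-a/profile-other": {"rda_associated": False},
}


def is_trusted_research_pid(pid, handle_records, type_registry):
    """Apply the three acceptance checks to one PID."""
    record = handle_records.get(pid)
    if record is None:
        return False  # PID does not resolve
    profile = type_registry.get(record.get("profile"))
    if profile is None:
        return False  # check 1: profile must be stored in the DTR
    if not profile.get("rda_associated", False):
        return False  # check 2: profile must be associated with RDA
    return bool(record.get("provenance"))  # check 3: some provenance present


def filter_pids(pids, handle_records, type_registry):
    """Keep only the PIDs that pass all three checks, preserving order."""
    return [p for p in pids
            if is_trusted_research_pid(p, handle_records, type_registry)]


candidates = list(HANDLE_RECORDS) + ["prefix-a/missing"]
trusted = filter_pids(candidates, HANDLE_RECORDS, TYPE_REGISTRY)
```

Each check needs only the PID record and the profile, never the data object itself, which is what would let a client work through a very large list without contacting the repositories.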
PID Kernel Information Summary
Exploration is driven by identifying and evaluating the minimal information that can go into Kernel Information to help make Data Objects FAIR and less dependent on the repository system to enforce FAIRness. Long-term goal: smart data objects. Kernel information has the potential to spawn a new ecosystem of data services for smart data objects.
RPID testbed
-- Suite of software services for use by community: Data type registry (RDA), PIT API (RDA), Handle service
-- Exploratory services: PID Kernel Information, mapping CTS URNs to handles
-- Packaging for use by others
-- Help and advice
-- User advisory group
The RPID testbed is open for research, education, non-profit, or pre-competitive use. Ideas are being put into action through a US NSF-funded project called the Robust PID (RPID) Testbed. Project partners include Beth Plale, Rob Quick, Robert McDonald, and Yu Lao, Indiana University; Bridget Almas, Tufts University; and Larry Lannom, CNRI. The opinions expressed here are those of the author alone and do not represent the views of the US National Science Foundation.