PID-centric fabric constructed piece by piece
Beth Plale, Tobias Weigel
Drawn in part from RDA PID Training, Garching, 2016/08/31
Objective
  RDA Data Fabric examines fabric composition.
  Composing from RDA Recommendations (largely but not exclusively).
  PID-related recommendations are particularly powerful: the RDA Persistent Identifier Types API Recommendation and the RDA Data Type Registry Recommendation.
Objectives of this Session
  The PIT API WG stopped short of recommending minimal metadata to associate with PIDs.
  There is no universal set of minimal PID metadata that will be agreed upon by all.
  Organize 2 to 3 small groups of RDA members, each bound by some shared interest, who will meet between now and Barcelona (spring 2017) and define a minimal set that works for them.
What is the killer app for this? Bringing universality to provenance
Provenance relates Data Objects (DOs) to one another through:
  Revision: change to an object through time
  Derivation: attribution of one data object to the data objects that influenced its creation
  Replication: relating identical objects to one another
  Metadata: relating a metadata DO to the DO itself
  Part of: data objects that are part of the same collection
  ...
The W3C PROV standard for provenance is well defined.
What is the killer app for this? Universality of provenance
  How: a provenance record is pointed to as part of the minimal metadata record associated with a PID. The provenance definition is available in the DTR.
  Gives: a well-defined map of the relationships of one data object to others.
  Why important: data provenance is siloed, and this approach breaks down the silos. It has the potential to make data provenance universal, a goal that has been elusive since the inception of data provenance in 2005.
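The idea of reaching provenance through the minimal record of each PID can be sketched as a small lineage walk. This is only an illustration: the in-memory `RECORDS` dictionary stands in for a PID resolver, and the `provenance` key and the `pid:...` identifiers are hypothetical. Only the `prov:wasDerivedFrom` property name comes from the deck.

```python
from collections import deque

# Hypothetical in-memory stand-in for a PID resolver: PID -> minimal record.
# The "provenance" key and all identifiers below are illustrative assumptions.
RECORDS = {
    "pid:raw": {"provenance": {}},
    "pid:cleaned": {"provenance": {"prov:wasDerivedFrom": "pid:raw"}},
    "pid:plot": {"provenance": {"prov:wasDerivedFrom": "pid:cleaned"}},
}


def lineage(pid, records):
    """Return the PIDs that `pid` was derived from, nearest ancestor first."""
    seen = {pid}          # guards against cycles in malformed records
    order = []
    queue = deque([pid])
    while queue:
        current = queue.popleft()
        prov = records.get(current, {}).get("provenance", {})
        parent = prov.get("prov:wasDerivedFrom")
        if parent is not None and parent not in seen:
            seen.add(parent)
            order.append(parent)
            queue.append(parent)
    return order


print(lineage("pid:plot", RECORDS))
```

Because every record exposes the same provenance property, the walk needs no knowledge of which repository or community issued each PID, which is the silo-breaking point of the slide.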
Minimal provenance (as JSON-LD)
{
  "@context": {
    "prov": "http://www.w3.org/ns/prov#"
  },
  "prov:wasDerivedFrom": "IDENTIFIER",
  "prov:revisionOf": "IDENTIFIER",
  "prov:primarySourceOf": "IDENTIFIER",
  "prov:quotationOf": "IDENTIFIER",
  "prov:specializationOf": "IDENTIFIER",
  "prov:alternateOf": "IDENTIFIER",
  "prov:hadMember": "IDENTIFIER",
  "prov:memberOf": "IDENTIFIER"
}
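A minimal sketch of how a data provider might assemble such a record programmatically. The helper `minimal_provenance` and the example handle are hypothetical. The property names follow the slide's list and are not checked here against the W3C PROV-O vocabulary.

```python
import json

PROV_NS = "http://www.w3.org/ns/prov#"

# Relation names as listed on the slide; this sketch does not verify
# which of them are defined by PROV-O itself.
ALLOWED = {
    "prov:wasDerivedFrom", "prov:revisionOf", "prov:primarySourceOf",
    "prov:quotationOf", "prov:specializationOf", "prov:alternateOf",
    "prov:hadMember", "prov:memberOf",
}


def minimal_provenance(**relations):
    """Build a minimal provenance document from keyword arguments
    such as wasDerivedFrom="<PID>"."""
    doc = {"@context": {"prov": PROV_NS}}
    for name, target in relations.items():
        key = f"prov:{name}"
        if key not in ALLOWED:
            raise ValueError(f"unknown relation: {key}")
        doc[key] = target
    return doc


# Hypothetical identifier, used only for illustration.
doc = minimal_provenance(wasDerivedFrom="hdl:EXAMPLE/source-object")
print(json.dumps(doc, indent=2))
```

Only the relations that actually apply to a data object need to be supplied, so the record stays small for objects with a single known predecessor.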
Provenance definition in Data Type Registry http://pragma8.cs.indiana.edu:8080/pragmapit-ext-0.2/pitapi/generic/20.5000.347/18536afecc5e6ca6ab41
Approach: core profile and community extensions
[Figure: three example community profiles, labeled Climate sciences, Material sciences, and Linguistics, each built on a shared core profile]
  Size, Fixity key, Data Provenance, Policy, ...
  Size, Fixity key, Timestamp (Creation), Timestamp (Last Mod), Data Provenance, Policy, Owner, ...
  Size, Granularity, Data Provenance, Policy, ...
What is a profile and how does it relate to PID records?
  Base assumption: there is a minimal core set of information associated with each PID.
  The minimal set should be useful not only to the maintainer of a Data Object (DO), but should also facilitate the DO's discovery and use.
  Each user community may design its own profiles.
  No single size fits all: PID profiles are registered and referenced, and they may differ between communities, but recurring elements should be reused.
  Recurring elements: Size, Fixity Key, Timestamps, Data Provenance, ...
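The relationship between a core profile and community extensions can be sketched as simple composition. The choice of core elements here (the three that appear in every example list in the deck) is an assumption, as are the snake_case element names and the `community_a`/`community_b` labels.

```python
# Assumed core: elements that appear in every example profile in the deck.
CORE_PROFILE = ("size", "data_provenance", "policy")


def extend_profile(core, extra):
    """Return a community profile: the core elements followed by
    community-specific ones, with duplicates dropped."""
    profile = list(core)
    for element in extra:
        if element not in profile:
            profile.append(element)
    return tuple(profile)


# Hypothetical community extensions, for illustration only.
community_a = extend_profile(
    CORE_PROFILE, ("fixity_key", "timestamp_creation", "timestamp_last_mod", "owner")
)
community_b = extend_profile(CORE_PROFILE, ("granularity",))

# Elements shared by both communities are exactly the reused core.
shared = [e for e in community_a if e in community_b]
print(shared)
```

Keeping the core as a separate, reusable definition is what lets a generic tool rely on a few elements being present regardless of which community registered the profile.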
Example profile registered in Data Type Registry
Property name              | Target type           | Mandatory?
DO Location                | URL                   | Yes
Policy                     | Policy specification* |
Time stamp (last modified) | Date/Time             |
Data provenance            | PID                   |
Deletion flag              | Boolean               | No
Deletion reason            | String                |
Would Policy be an attribute you think important to your community's profile? Why?
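A minimal sketch of how a consuming tool might check a PID record against a profile like this one. The mapping of target types to Python types is an assumption, and so is the record layout. The mandatory flag is set only where the slide states it explicitly. Properties without a stated flag are left as `None` and not enforced.

```python
from datetime import datetime

# property name -> (assumed Python type, mandatory flag as stated on the slide)
# None means the slide does not state whether the property is mandatory.
PROFILE = {
    "DO Location": (str, True),
    "Policy": (str, None),
    "Time stamp (last modified)": (datetime, None),
    "Data provenance": (str, None),
    "Deletion flag": (bool, False),
    "Deletion reason": (str, None),
}


def validate(record, profile=PROFILE):
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    for name, (py_type, mandatory) in profile.items():
        if name not in record:
            if mandatory:
                problems.append(f"missing mandatory property: {name}")
            continue
        if not isinstance(record[name], py_type):
            problems.append(
                f"wrong type for {name}: expected {py_type.__name__}"
            )
    return problems


# Hypothetical record, for illustration only.
record = {"DO Location": "https://example.org/data/object-1", "Deletion flag": False}
print(validate(record))   # no problems expected for this record
print(validate({}))       # reports the missing mandatory property
```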
Take pulse of room
  How many of you are data providers?
  How many are interested in building tools to consume data under the minimal PID-DTR model?
  Questions
End of public portion of meeting. The remainder of the meeting is targeted to those who are interested in working on this topic.
Next steps
  Parcel interested parties into groups; begin discussion within group.
  Define criteria by which we self-organize into small groups to work Sep 2016 to Mar 2017.
  My interest is as:
    data provider: framework for analysis of HathiTrust's 14M digitized books from university libraries (Plale, director)
    data consumer: PRAGMA, Pacific Rim partners facilitating shared computing and data sharing (Plale, steering board)