Agenda (AM)
9:30-10:15 Introduction to RDA
10:15-10:30 Participant Feedback Exercise
10:30-10:50 Coffee Break
10:50-11:50 RDA Output Deep Dive
11:50-1:00 Participant Introductions
1:00-2:30 Lunch

Agenda (PM)
2:30-3:15 Use Case Walk Through - Applying an output
3:15-4:00 Discussion
  What are barriers to participation and adoption?
  What role will/should RDA play in the future of the humanities?
Adopting RDA Outputs in the Humanities Bridget Almas RDA/ADHO Workshop DH2016 Krakow, July 12, 2016 @BridgetAlmas
All of us are...
Creating data of various types and uses
  Text, Images, XML, RDF, JSON, JSON-LD, Tabular, ...
  Manuscripts, Bibliographic, Prosopographic, Geographic, ...
Assigning stable (if not “persistent”) identifiers to our data
  URNs, URLs, DOIs/Handles, ARKs, ...
Reusing data from other sources
Most or many of us...
Want others to be able to reuse our data
Want our data to be machine-actionable
Have copyright/access requirements for our data
Very few of us...
Publish formal, machine-actionable descriptions of our data
Use automated systems for assignment of identifiers to data
Have standard practices which we apply across projects
Use identifiers for our data which guarantee persistence, advertise what they are capable of, and can be reliably resolved by anyone anywhere
How do we make data sharing a reality?
A first step is to publish data with stable identifiers, but it isn’t enough. We need to know:
  What sort of data does it identify?
  How can we get it in a format that we can process?
  What is its provenance and how to cite it?
  Are there newer/older versions of it?
  Is it a part of a collection?
  … ?
Right now there is no consistent way to get answers to these questions across different projects, providers, domains
We’re all figuring it out and building ad-hoc solutions that work in some cases for some data and not others
RDA Data Fabric + Tools, Services, Manual processes, ...
Diagram Source: Peter Wittenburg
RDA DTR, PIT and Collections
Data Types Registry: provides a recommendation for formalizing, registering and communicating definitions of machine-actionable data types
PID Types: provides a recommendation for a standard approach to coupling metadata with persistent identifiers, enabling services that support discovery, access, verification of integrity and authenticity, and a variety of other use cases
Collections WG: will provide recommendations for common collection models and a multidisciplinary API for building, sharing and expanding collections of data objects
RDA Output: Data Types Registry
Defines “Data Types” as characterizations of data at any level of granularity which are identified, defined and registered
Proposes a Data Model and JSON Schema
Defines an API for Creating, Reading, Updating, Deleting and Querying Data Type Records
Defines Requirements for Registry Implementation and Federation
RDA Output: DTR Proposed Data Model
Identifier
Type Name
Human Readable Description
Provenance (including contributors/source, creation date, modification date)
Related Standards and Recommendations
Expected Uses
Representations and Semantics
Properties Specific to this Type
Relationships to Other Types
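The fields of the proposed data model can be pictured as a simple record. The following is a minimal sketch only: the class name, the Python field names, and every example value are assumptions made for illustration, and this is not the JSON Schema proposed by the DTR output.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class DataTypeRecord:
    """Hypothetical in-memory shape mirroring the field list on the slide."""
    identifier: str                      # Identifier
    name: str                            # Type Name
    description: str                     # Human Readable Description
    provenance: dict = field(default_factory=dict)       # contributors, dates
    standards: list = field(default_factory=list)        # Related Standards
    expected_uses: list = field(default_factory=list)    # Expected Uses
    representations: list = field(default_factory=list)  # Representations and Semantics
    properties: dict = field(default_factory=dict)       # Properties Specific to this Type
    relationships: dict = field(default_factory=dict)    # Relationships to Other Types


# All values below are placeholders, not entries from a real registry.
record = DataTypeRecord(
    identifier="example/cts-urn-type",
    name="CTS URN",
    description="Illustrative description of a text identifier type",
    provenance={"contributors": ["example contributor"], "created": "2016-07-12"},
    expected_uses=["citation", "retrieval"],
)

# Serialize to JSON, the kind of payload a registry API might exchange.
record_json = json.dumps(asdict(record), indent=2)
print(record_json)
```

Keeping the record a plain data structure makes it easy to serialize for whatever registry API is eventually used.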
RDA Output: Data Type Registries
RDA Output: PID Types
Provides:
  A conceptual model for a PID record
  An API for Creating, Reading and Querying PID records
Can work on top of existing PID systems in a brokering model, and/or be provided directly by the PID system
Depends upon the Data Types Registry
RDA Output: PID Types
Source: dx.doi.org/10.15497/FDAA09D5-5ED0-403D-B97A-2675E1EBE786
PID Record
Consists of a number of properties
Each property itself has a value and bears a PID, pointing to a property definition with a name and range
A PID record type is a specific aggregation of properties, mandatory and optional
A PID record profile is a specific aggregation of types, mandatory and optional
All properties, types, and profiles have PIDs and are registered in the Data Types Registry
Source: dx.doi.org/10.15497/FDAA09D5-5ED0-403D-B97A-2675E1EBE786
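The relationship between properties and record types can be sketched in a few lines. This is an illustration under stated assumptions, not the PIT data model itself: the class and function names and the record type PID `example/record-type` are hypothetical, the conformance rule is a simplification, and the two property PIDs are the License and Predecessor identifier PIDs shown elsewhere in these slides.

```python
from dataclasses import dataclass

# Property PIDs taken from the slides; what they resolve to is not checked here.
LICENSE = "11314.2/31810b2c24913929bb5e0d4d949de9f7"
PREDECESSOR = "11314.2/467d9ba30e2d9879fd9d483f319e462c"


@dataclass(frozen=True)
class RecordType:
    """Hypothetical record type: an aggregation of property PIDs."""
    pid: str
    mandatory: frozenset
    optional: frozenset = frozenset()


def missing_mandatory(record: dict, record_type: RecordType) -> set:
    """Return the mandatory property PIDs that the record does not carry."""
    return set(record_type.mandatory) - set(record)


def conforms(record: dict, record_type: RecordType) -> bool:
    """Simplified rule: a record conforms when no mandatory property is missing."""
    return not missing_mandatory(record, record_type)


# A PID record is modeled as a mapping from property PID to property value.
record = {LICENSE: "CC-BY-SA"}
record_type = RecordType(
    pid="example/record-type",  # placeholder, not a registered type
    mandatory=frozenset({LICENSE}),
    optional=frozenset({PREDECESSOR}),
)

print(conforms(record, record_type))
```

A profile could be modeled the same way one level up, as an aggregation of record type PIDs.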
PID Record properties for a CTS URN Type?
urn:cts:greekLit:tlg0012.tlg001.perseus-grc2

Property ID (Property Name) | Property Value
11314.2/31810b2c24913929bb5e0d4d949de9f7 (License) | CC-BY-SA
11314.2/467d9ba30e2d9879fd9d483f319e462c (Predecessor identifier) | urn:cts:greekLit:tlg0012.tlg001.perseus-grc1
11314.2/5546b0166091d9ae869f081f5548f3fc (Repository of Record) | http://data.perseus.org
…. (CTS API Endpoint) | http://cts.perseids.org/api
...
PID Record properties for a LOD/URL Record Type?
https://pleiades.stoa.org/places/530809

Property ID (Property Name) | Property Value
11314.2/31810b2c24913929bb5e0d4d949de9f7 (License) | CC-BY
... (Available Formats) | JSON,CSV,HTML,RDF,KML
…. (Format Specifier) | HTTP Header Accepts
RDA WIP: Collections WG
Formalization of Collections Models
API for Create/Read/Update/List/Query operations on Collections
Use cases include virtual, local and mixed collections, collections with open and access-protected data, heterogeneous and homogeneous data types, ...
Operations will include basic CRUD/L, but also query and set operations
Builds upon the PID Types and DTR components
Collections will be identified by Data Types and have typed Capabilities
Must be implementable by existing collection solutions
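Since the Collections API is still work in progress, the following is only a rough sketch of what list and set operations over collections of PIDs could look like. The `Collection` class, its method names, the helper functions, and the collection PIDs are all assumptions for illustration; the two member identifiers are the CTS URN and Pleiades URL used as examples in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical collection: a PID plus a set of member PIDs."""
    pid: str
    members: set = field(default_factory=set)

    def add(self, member_pid: str) -> None:
        self.members.add(member_pid)

    def remove(self, member_pid: str) -> None:
        # discard() does nothing if the member is absent
        self.members.discard(member_pid)

    def list_members(self) -> list:
        return sorted(self.members)


def union(pid: str, a: Collection, b: Collection) -> Collection:
    """New collection holding every member of a or b."""
    return Collection(pid, a.members | b.members)


def intersection(pid: str, a: Collection, b: Collection) -> Collection:
    """New collection holding only members present in both a and b."""
    return Collection(pid, a.members & b.members)


TEXT = "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2"
PLACE = "https://pleiades.stoa.org/places/530809"

texts = Collection("example/texts", {TEXT})           # placeholder PIDs
mixed = Collection("example/mixed", {TEXT, PLACE})

print(union("example/all", texts, mixed).list_members())
print(intersection("example/shared", texts, mixed).list_members())
```

Treating members as PIDs rather than as the objects themselves is what would let the same operations work for virtual, local and mixed collections.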
RDA WIP: Collections WG (Modeling proposal) Diagram Source: Tobias Weigel, DKRZ
Simple (Re)Use Case
A service wants to analyze data referenced in scholarly publications by PID, such as a text passage referenced by CTS URN and a place identified by Gazetteer URL.
A PID Types broker service provides the PIT API.
CTS text and Gazetteer data providers register their URNs and URLs with the PID Types broker service (via HTTP calls to the PIT API).
The analysis service can query the PIT broker to find out whether the PIDs in a publication are registered, then retrieve properties that tell it where to resolve the URN to the text, which formats are available for the Gazetteer URL, and how to request them to get data it can use.
The underlying data is then available for reuse by the service.
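The flow of this use case can be sketched with an in-memory stand-in for the broker. This is not the PIT API: the `InMemoryBroker` class, its method names, and the `plan_retrieval` helper are hypothetical, no HTTP calls are made, and properties are keyed by name rather than by property PID to keep the example readable. The identifiers and property values come from the example records in these slides.

```python
class InMemoryBroker:
    """Hypothetical stand-in for a PID Types broker service."""

    def __init__(self):
        self._records = {}

    def register(self, pid: str, properties: dict) -> None:
        # A data provider registers its identifier and properties.
        self._records[pid] = dict(properties)

    def is_registered(self, pid: str) -> bool:
        return pid in self._records

    def get_properties(self, pid: str) -> dict:
        return dict(self._records[pid])


def plan_retrieval(broker: InMemoryBroker, pids: list) -> dict:
    """For each PID found in a publication, collect what the broker knows.

    Unregistered PIDs map to None so the caller can skip or report them.
    """
    plan = {}
    for pid in pids:
        plan[pid] = broker.get_properties(pid) if broker.is_registered(pid) else None
    return plan


TEXT = "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2"
PLACE = "https://pleiades.stoa.org/places/530809"

broker = InMemoryBroker()
broker.register(TEXT, {"CTS API Endpoint": "http://cts.perseids.org/api"})
broker.register(PLACE, {"Available Formats": ["JSON", "CSV", "HTML", "RDF", "KML"]})

# "urn:example:unknown" is a made-up identifier that no provider registered.
plan = plan_retrieval(broker, [TEXT, PLACE, "urn:example:unknown"])
for pid, props in plan.items():
    print(pid, "->", props)
```

The analysis service never needs built-in knowledge of CTS or of the gazetteer; it only reads the properties returned for each registered PID.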
More Complex Data Management Use Case
Our data types
Text (Structured, unstructured, digitized books)
Persistent identifiers
Bibliographic
Geographic/Map Tiles
Prosopographic
Ethnographic/Fieldwork (Traditional and virtual)
Museum data
Images
Historical attributes, relationships
Text alignments
Treebanks
….
Our unmet infrastructure needs
Institutional service for assigning persistent, nationally or internationally recognized identifiers for our digital publications and datasets
Data curation systems (that are) integrated with the active research phase
Authentication services
Tools for converting data outputs from different sources and formats
Data visualization services
Data mining tools and services
Data storage services
Pre-made secure endpoints for managing ontological models
Narratives about how to choose an appropriate tool and how to get started with research data
Storage for datasets during the course of my research, as opposed to finalised datasets
Services for hosting URI-based gazetteers of specific regions, periods, etc.
Registry for hosting data about collections
...
What’s next?
Do the RDA outputs provide value and a means to begin addressing some of our unmet needs?
If not, why?
If so, what do we need to do to start taking advantage of them?
  Identify our core Data Types (primitives and derived types)
  Identify our core PID record types/profiles
  Evaluate their use with a test DTR and PIT API
  Begin the work of implementing