
1 Agenda (AM)
9:30-10:15 Introduction to RDA
10:15-10:30 Participant Feedback Exercise
10:30-10:50 Coffee Break
10:50-11:50 RDA Output Deep Dive
11:50-1:00 Participant Introductions
1:00-2:30 Lunch

2 Agenda (PM)
2:30-3:15 Use Case Walk Through - Applying an output
3:15-4:00 Discussion
  What are barriers to participation and adoption?
  What role will/should RDA play in the future of the humanities?

3 Adopting RDA Outputs in the Humanities
Bridget Almas RDA/ADHO Workshop DH2016 Krakow, July 12, 2016 @BridgetAlmas

4 All of us are...
Creating data of various types and uses
  Text, Images, XML, RDF, JSON, JSON-LD, Tabular, ...
  Manuscripts, Bibliographic, Prosopographic, Geographic, ...
Assigning stable (if not “persistent”) identifiers to our data
  URNs, URLs, DOIs/Handles, ARKs, ...
Reusing data from other sources

5 Most or many of us...
Want others to be able to reuse our data
Want our data to be machine-actionable
Have copyright/access requirements for our data

6 Very few of us...
Publish formal, machine-actionable descriptions of our data
Use automated systems for assignment of identifiers to data
Have standard practices which we apply across projects
Use identifiers for our data which guarantee persistence, advertise what they are capable of, and can be reliably resolved by anyone anywhere

7 How do we make data sharing a reality?
A first step is to publish data with stable identifiers, but it isn’t enough. We also need to know:
  What sort of data does it identify?
  How can we get it in a format that we can process?
  What is its provenance, and how should it be cited?
  Are there newer/older versions of it?
  Is it a part of a collection?
  … ?
Right now there is no consistent way to get answers to these questions across different projects, providers, and domains.
We’re all figuring it out and building ad-hoc solutions that work in some cases for some data and not others.

8 RDA Data Fabric + Tools, Services, Manual processes, ...
Diagram Source: Peter Wittenburg

9 RDA DTR, PIT and Collections
Data Types Registry: provides a recommendation for formalizing, registering and communicating definitions of machine-actionable data types.
PID Types: provides a recommendation for a standard approach to coupling metadata with persistent identifiers, enabling services that support discovery, access, verification of integrity and authenticity, and a variety of other use cases.
Collections WG: will provide recommendations for common collection models and a multidisciplinary API for building, sharing and expanding collections of data objects.

10 RDA Output: Data Types Registry
Defines “Data Types” as characterizations of data at any level of granularity which are identified, defined and registered
Proposes a Data Model and JSON Schema
Defines an API for Creating, Reading, Updating, Deleting and Querying Data Type Records
Defines Requirements for Registry Implementation and Federation

11 RDA Output: DTR Proposed Data Model
Identifier
Type Name
Human Readable Description
Provenance (including contributors/source, creation date, modification date)
Related Standards and Recommendations
Expected Uses
Representations and Semantics
Properties Specific to this Type
Relationships to Other Types
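
As a rough illustration of the field list above, a data type record could be modelled as a small Python structure that serializes to JSON. This is only a sketch: the attribute names, the `DataTypeRecord` class, and the `example/cts-urn` identifier are assumptions made for illustration and do not reproduce the JSON Schema proposed by the working group.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class DataTypeRecord:
    """Illustrative record mirroring the slide's field list (not the official schema)."""
    identifier: str
    name: str
    description: str
    provenance: dict = field(default_factory=dict)       # contributors/source, creation and modification dates
    standards: list = field(default_factory=list)        # related standards and recommendations
    expected_uses: list = field(default_factory=list)
    representations: list = field(default_factory=list)  # representations and semantics
    properties: dict = field(default_factory=dict)       # properties specific to this type
    relationships: list = field(default_factory=list)    # relationships to other types


# Hypothetical example: the identifier is a placeholder, not a registered PID.
cts_urn_type = DataTypeRecord(
    identifier="example/cts-urn",
    name="CTS URN",
    description="Canonical Text Services URN identifying a text or passage",
    provenance={"contributors": ["example contributor"], "created": "2016-07-12"},
    expected_uses=["citation", "text retrieval"],
)

# Serialize the record the way a registry client might before submitting it.
print(json.dumps(asdict(cts_urn_type), indent=2))
```

Keeping the record as plain data makes it straightforward to validate against whatever schema a given registry actually publishes.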

12

13 RDA Output: Data Type Registries

14 RDA Output: PID Types
Provides:
  A conceptual model for a PID record
  An API for Creating, Reading and Querying PID records
Can work on top of existing PID systems in a brokering model, and/or be provided directly by the PID system
Depends upon the Data Types Registry

15 RDA Output: PID Types
Source: dx.doi.org/ /FDAA09D5-5ED0-403D-B97A-2675E1EBE786

16 PID Record
Consists of a number of properties
Each property itself has a value and bears a PID, pointing to a property definition with a name and range
A PID record type is a specific aggregation of properties, mandatory and optional
A PID record profile is a specific aggregation of types, mandatory and optional
All properties, types, and profiles have PIDs and are registered in the Data Types Registry
Source: dx.doi.org/ /FDAA09D5-5ED0-403D-B97A-2675E1EBE786
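
The relationship between property definitions, record types, and PID records can be sketched in a few lines of Python. Everything here is hypothetical: the class names, the `missing_properties` helper, and the `example/...` identifiers stand in for real registered PIDs and are not part of the PID Types API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertyDefinition:
    pid: str     # every property definition bears its own PID
    name: str
    range: str   # permitted values, kept as free text in this sketch


@dataclass
class RecordType:
    pid: str
    mandatory: set = field(default_factory=set)  # PIDs of required properties
    optional: set = field(default_factory=set)   # PIDs of permitted extras


def missing_properties(record: dict, record_type: RecordType) -> set:
    """Return the PIDs of mandatory properties that a PID record lacks."""
    return record_type.mandatory - set(record)


# Placeholder PIDs; a real deployment would use PIDs registered in a DTR.
LICENSE = PropertyDefinition("example/license", "License", "license identifier")
PREDECESSOR = PropertyDefinition("example/predecessor", "Predecessor identifier", "PID")

cts_type = RecordType(
    "example/cts-urn-record",
    mandatory={LICENSE.pid},
    optional={PREDECESSOR.pid},
)

# A PID record maps property PIDs to values; this one omits the license.
record = {PREDECESSOR.pid: "urn:cts:greekLit:tlg0012.tlg001.perseus-grc1"}
print(missing_properties(record, cts_type))
```

A profile could be modelled the same way, as a set of mandatory and optional record type PIDs.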

17 PID Record properties for a CTS URN Type?
urn:cts:greekLit:tlg0012.tlg001.perseus-grc2
Property ID | Property Name | Property Value
/31810b2c bb5e0d4d949de9f7 | License | CC-BY-SA
/467d9ba30e2d9879fd9d483f319e462c | Predecessor identifier | urn:cts:greekLit:tlg0012.tlg001.perseus-grc1
/5546b d9ae869f081f5548f3fc | Repository of Record |
…. | CTS API Endpoint | ...

18 PID Record properties for a LOD/URL Record Type?
Property ID | Property Name | Property Value
/31810b2c bb5e0d4d949de9f7 | License | CC-BY
... | Available Formats | JSON, CSV, HTML, RDF, KML
…. | Format Specifier | HTTP Accept header
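
To show how a client might act on such a record, the sketch below reads the Available Formats and Format Specifier properties (taken here to mean the HTTP Accept header) and prepares a request for one format without sending it. The media type mapping, the `build_request` helper, and the gazetteer.example.org URL are assumptions for illustration only.

```python
from urllib.request import Request

# Assumed mapping from the format names on the slide to media types.
MEDIA_TYPES = {
    "JSON": "application/json",
    "CSV": "text/csv",
    "HTML": "text/html",
    "RDF": "application/rdf+xml",
    "KML": "application/vnd.google-earth.kml+xml",
}


def build_request(url: str, properties: dict, wanted: str) -> Request:
    """Prepare (but do not send) a request for one of the advertised formats."""
    if wanted not in properties["Available Formats"]:
        raise ValueError(f"{wanted} is not advertised for {url}")
    # The record says formats are selected through the Accept header.
    return Request(url, headers={"Accept": MEDIA_TYPES[wanted]})


# Property names are used as keys for readability; a real record would key on property PIDs.
record = {
    "License": "CC-BY",
    "Available Formats": ["JSON", "CSV", "HTML", "RDF", "KML"],
    "Format Specifier": "HTTP Accept header",
}

request = build_request("https://gazetteer.example.org/place/1", record, "JSON")
print(request.get_header("Accept"))
```

The point of the record is that the client never has to guess: both the available formats and the mechanism for choosing one are read from the PID's properties.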

19 RDA WIP: Collections WG
Formalization of Collections Models
API for Create/Read/Update/List/Query operations on Collections
Use cases include virtual, local and mixed collections, collections with open and access-protected data, heterogeneous and homogeneous data types, ...
Operations will include basic CRUD/L, but also query and set operations
Builds upon the PID Types and DTR components
Collections will be identified by Data Types and have typed Capabilities
Must be implementable by existing collection solutions
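
Since the Collections API was still work in progress at the time of this talk, the following is only an in-memory sketch of what CRUD/L plus set operations over collections of PIDs might look like. The `CollectionStore` class and all of its method names are hypothetical.

```python
class CollectionStore:
    """In-memory stand-in for a collections service; method names are hypothetical."""

    def __init__(self):
        self._collections = {}  # collection PID -> set of member PIDs

    def create(self, pid, members=()):
        if pid in self._collections:
            raise KeyError(f"collection {pid} already exists")
        self._collections[pid] = set(members)

    def read(self, pid):
        return frozenset(self._collections[pid])

    def update(self, pid, members):
        if pid not in self._collections:
            raise KeyError(f"collection {pid} does not exist")
        self._collections[pid] = set(members)

    def delete(self, pid):
        del self._collections[pid]

    def list_ids(self):
        return sorted(self._collections)

    # Set operations return member PIDs without creating a new collection.
    def union(self, a, b):
        return self._collections[a] | self._collections[b]

    def intersection(self, a, b):
        return self._collections[a] & self._collections[b]


store = CollectionStore()
store.create("example/homer", ["urn:cts:greekLit:tlg0012.tlg001", "urn:cts:greekLit:tlg0012.tlg002"])
store.create("example/epic", ["urn:cts:greekLit:tlg0012.tlg001"])
print(store.list_ids())
print(store.intersection("example/homer", "example/epic"))
```

Because members are referenced by PID, the same operations apply whether a collection is virtual, local, or mixed.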

20 RDA WIP: Collections WG (Modeling proposal)
Diagram Source: Tobias Weigel, DKRZ

21 Simple (Re)Use Case
A service wants to analyze data referenced in scholarly publications by PID, such as a text passage referenced by CTS URN and a place identified by Gazetteer URL.
A PID Types broker service provides the PIT API.
CTS text and Gazetteer data providers register their URNs and URLs with the PID Types broker service (via HTTP calls to the PIT API).
The analysis service can query the PIT broker to find out whether the PIDs in a publication are registered, then retrieve properties that tell it where to resolve the URN to the text, which formats are available for the Gazetteer URL, and how to request them to get data it can use.
The underlying data is then available for reuse by the service.
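
The flow in this use case can be sketched with an in-memory broker standing in for the HTTP service. The `PidBroker` class, its `register` and `lookup` methods, the `resolvable` helper, and the example.org endpoints are all assumptions; the actual PIT API is defined by the RDA output and is not reproduced here.

```python
class PidBroker:
    """In-memory stand-in for a PID Types broker; not the real PIT API."""

    def __init__(self):
        self._records = {}

    def register(self, pid, properties):
        # Data providers would do this through HTTP calls to the broker.
        self._records[pid] = dict(properties)

    def lookup(self, pid):
        # Returns the PID record's properties, or None if the PID is unknown.
        return self._records.get(pid)


def resolvable(pids, broker):
    """Split PIDs cited in a publication into registered (with properties) and unknown."""
    known, unknown = {}, []
    for pid in pids:
        props = broker.lookup(pid)
        if props is None:
            unknown.append(pid)
        else:
            known[pid] = props
    return known, unknown


broker = PidBroker()
broker.register(
    "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2",
    {"CTS API Endpoint": "https://cts.example.org/api"},  # hypothetical endpoint
)
broker.register(
    "https://gazetteer.example.org/place/1",  # hypothetical gazetteer URL
    {"Available Formats": ["JSON", "CSV"], "Format Specifier": "HTTP Accept header"},
)

cited = [
    "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2",
    "https://gazetteer.example.org/place/1",
    "urn:cts:example:unregistered",
]
known, unknown = resolvable(cited, broker)
print(sorted(known))
print(unknown)
```

The analysis service only needs the broker's answer to decide where and how to fetch each cited resource; unregistered PIDs are simply reported back.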

22 More Complex Data Management Use Case

23 Our data types
Text (Structured, unstructured, digitized books)
Persistent identifiers
Bibliographic
Geographic/Map Tiles
Prosopographic
Ethnographic/Fieldwork (Traditional and virtual)
Museum data
Images
Historical attributes, relationships
Text alignments
Treebanks
…

24 Our unmet infrastructure needs
Institutional service for assigning persistent, nationally or internationally recognized identifiers for our digital publications and datasets
Data curation systems (that are) integrated with the active research phase
Authentication services
Tools for converting data outputs from different sources and formats
Data visualization services
Data mining tools and services
Data storage services
Pre-made secure endpoints for managing ontological models
Narratives about how to choose an appropriate tool and how to get started with research data
Storage for datasets during the course of my research, as opposed to finalised datasets
Services for hosting URI-based gazetteers of specific regions, periods, etc.
Registry for hosting data about collections
...

25 What’s next?
Do the RDA outputs provide value and a means to begin addressing some of our unmet needs?
If not, why?
If so, what do we need to do to start taking advantage of them?
  Identify our core Data Types (primitives and derived types)
  Identify our core PID record types/profiles
  Evaluate their use with a test DTR and PIT API
  Begin the work of implementing

