Using RMap to Describe Distributed Works as Linked Data Graphs: Outcomes and Preservation Implications iPres. Bern, Switzerland, October 5, 2016 Karen.


1 Using RMap to Describe Distributed Works as Linked Data Graphs: Outcomes and Preservation Implications iPres. Bern, Switzerland, October 5, 2016 Karen Hanson, Data Conservancy

2 View of a scholarly work
Diagram: an ARTICLE by AUTHORS that cites other ARTICLES.
The concept of a scholarly work has shifted. It used to be focused mostly on articles, and citation lists would point to other articles.

3 View of a scholarly work
Diagram: the ARTICLE and its AUTHORS now sit in a larger map with cited ARTICLES and WEBPAGES, CODE and its DEVELOPER, a DATASET, a BIG DATASET and an ORG, linked by relationships such as "by", "cites", "used", "from" and "output of".
Now more and more articles cite webpages, which are much more fragile. Also, under pressure from funders and publishers, researchers need to provide more evidence for reproducibility. So the map is starting to get complex, and its components are published in different places at different times. Note that this map is oriented toward a scientific research example, but it is easy to imagine the distributed components of other kinds of works described in other sessions of this conference, such as interactive art pieces and computer games.

4 Scholarly works continually evolve
Adding to the complexity are the ideas laid out in this OCLC report. Digital objects are produced throughout the research process, and this does not end at the time of publication. A static, linear citation list at the end of a published article is a constraint, given that a scholarly work is distributed in time and space.
Lavoie, Brian, Eric Childress, Ricky Erway, Ixchel Faniel, Constance Malpas, Jennifer Schaffner, and Titia van der Werf. The Evolving Scholarly Record. Dublin, Ohio: OCLC Research.

5 How can we capture this evolving context?

6 Maps of resources
Diagram: the same map of ARTICLE, AUTHORS, cited ARTICLES and WEBPAGES, CODE, DEVELOPER, DATASET, BIG DATASET and ORG, with one node labelled Robert Smith, linked by "by", "cites", "used", "from" and "output of".
You can think of this diagram as a map of resources: different kinds of identifiers, distributed across different kinds of repositories.

7 21st Century DiSCO (Distributed Scholarly Compound Object)
Slide: Encode as RDF (linked data); RMap DiSCO.
Something that is good at describing maps of resources is the Resource Description Framework (RDF). RMap lets you store this map as an RDF graph. We call these graphs “DiSCOs”, short for Distributed Scholarly Compound Objects.
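To make "encode the map as RDF" concrete, here is a minimal sketch in Python: each edge of the diagram becomes one subject-predicate-object triple, and the set of triples is written out as N-Triples. All URIs, including the `http://example.org/rel/...` predicates, are hypothetical placeholders; they are not the identifiers or vocabulary used in the talk.

```python
# Sketch: a map of resources as a set of RDF triples (subject, predicate, object).
# Every URI here is a hypothetical placeholder chosen for illustration.

EX = "http://example.org/"  # hypothetical namespace


def uri(local: str) -> str:
    """Build a full URI in the hypothetical example namespace."""
    return EX + local


# One triple per edge in the diagram.
triples = {
    (uri("article"), uri("rel/creator"), uri("author/1")),
    (uri("article"), uri("rel/cites"), uri("webpage/1")),
    (uri("article"), uri("rel/uses"), uri("dataset/1")),
    (uri("dataset/1"), uri("rel/derivedFrom"), uri("bigdataset/1")),
    (uri("code/1"), uri("rel/creator"), uri("developer/1")),
}


def to_ntriples(graph) -> str:
    """Serialize URI-only triples as N-Triples lines, sorted for stable output."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


print(to_ntriples(triples))
```

Because every node is a URI, two such graphs produced by different parties can later be combined simply by matching identical URIs.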

8 RMap in a nutshell
A REST API service for managing and retrieving the maps of relationships amongst distributed scholarly works. Just to summarize…

9 As RDF the graph looks like this
As RDF, the graph looks like this. If you submit it to RMap, RMap assigns an identifier that you can use to retrieve it later. This is difficult for the human eye to parse, so…
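The submit-then-retrieve cycle described above can be sketched with an in-memory store. This is not the RMap REST API; the class name, the random identifier scheme and the `ark:/00000/` prefix (borrowed from the example identifier on slide 13) are assumptions for illustration only.

```python
import uuid


class InMemoryDiscoStore:
    """Toy stand-in for submitting a graph and retrieving it later (not the RMap API)."""

    def __init__(self, prefix: str = "ark:/00000/"):
        # Hypothetical prefix copied from the slide 13 example identifier.
        self._prefix = prefix
        self._discos: dict[str, frozenset] = {}

    def submit(self, triples) -> str:
        """Store the graph and return a newly minted identifier for it."""
        disco_id = self._prefix + uuid.uuid4().hex[:12]
        self._discos[disco_id] = frozenset(triples)
        return disco_id

    def retrieve(self, disco_id: str) -> frozenset:
        """Look up a previously submitted graph by its identifier."""
        return self._discos[disco_id]


store = InMemoryDiscoStore()
disco_id = store.submit({("http://example.org/article", "http://example.org/rel/cites",
                          "http://example.org/webpage/1")})
print(disco_id)
```

The key design point mirrored here is that the service, not the submitter, mints the identifier, so clients only need to keep that identifier to get the graph back.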

10 Here is the pretty version
Here is the pretty version: our web GUI representation of a DiSCO.

11 Some features of a DiSCO

12 Describes an aggregation of 1 or more scholarly resources
Diagram: an ARTICLE and a DATASET.
A DiSCO describes an aggregation of one or more scholarly resources, in this case an article and a dataset.

13 Based on OAI-ORE*, but simpler
Slide labels: NAMESPACES, DISCO PID, TYPE, AGGREGATED RESOURCES.
    @prefix ore: <…> .
    @prefix rmap: <…> .
    <ark:/00000/akfjlsdkf> a rmap:DiSCO ;
        ore:aggregates <…> , <…> .
This is the simplest DiSCO you could create. The identifier is generated by RMap.
* Open Archives Initiative – Object Reuse and Exchange
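As a sketch of the same minimal structure, the function below renders a DiSCO identifier, its type and its aggregated resources as Turtle. The ORE namespace is the published OAI-ORE terms namespace; the `rmap:` namespace URL was not preserved in this transcript, so `http://example.org/rmap#` is a hypothetical placeholder, as are the example resource URIs.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RMAP = "http://example.org/rmap#"  # hypothetical placeholder for the RMap vocabulary


def minimal_disco_turtle(disco_id: str, aggregated: list) -> str:
    """Render the simplest DiSCO: a typed node that ore:aggregates one or more resources."""
    if not aggregated:
        raise ValueError("a DiSCO aggregates at least one resource")
    # Multiple objects of one predicate are separated by commas in Turtle.
    objects = " ,\n        ".join(f"<{r}>" for r in aggregated)
    return (
        f"@prefix ore: <{ORE}> .\n"
        f"@prefix rmap: <{RMAP}> .\n\n"
        f"<{disco_id}> a rmap:DiSCO ;\n"
        f"    ore:aggregates {objects} .\n"
    )


print(minimal_disco_turtle("ark:/00000/akfjlsdkf",
                           ["http://example.org/article", "http://example.org/dataset"]))
```

In the real service the identifier is generated by RMap on submission, so a client would normally send the graph without choosing the DiSCO identifier itself.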

14 Can include additional assertions encoded as RDF
In addition to this basic information, you can add as many additional assertions as you like. They just need to be encoded as RDF and form a connected graph with the aggregated resources. You can use them to define the types of the aggregated resources, describe their exact relationships, and add any other relevant context. There is no restriction on the ontologies used.
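The "connected graph" requirement can be checked with a breadth-first search that starts from the aggregated resources and treats each triple as an undirected edge. This is a sketch of the idea, not RMap's validation code; it assumes every object is a resource identifier (literals are not handled specially).

```python
from collections import defaultdict, deque


def is_connected_to_aggregates(triples, aggregated) -> bool:
    """Return True if every triple is reachable from an aggregated resource,
    treating each triple as an undirected edge between subject and object."""
    neighbours = defaultdict(set)
    for s, _, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)
    seen = set(aggregated)
    queue = deque(aggregated)
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # A triple is connected once its subject (and therefore its object) was reached.
    return all(s in seen for s, _, _ in triples)


aggregated = ["ex:article", "ex:dataset"]
ok = {("ex:article", "ex:creator", "ex:agent1"),
      ("ex:agent1", "ex:memberOf", "ex:org"),
      ("ex:dataset", "ex:type", "ex:Dataset")}
print(is_connected_to_aggregates(ok, aggregated))
print(is_connected_to_aggregates(ok | {("ex:other", "ex:p", "ex:thing")}, aggregated))
```

The second call is rejected because the extra triple describes resources with no path back to anything the DiSCO aggregates.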

15 Provenance and status captured in RMap Event
Each time a DiSCO is created or updated, provenance is captured in an RMap Event
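A provenance record of this kind might be modelled as an immutable event appended to a log on every create or update. The `Event` fields below (`event_type`, `agent`, `affected`, `timestamp`) are hypothetical and only illustrate the idea of capturing who changed which DiSCO and when; they are not RMap's actual Event model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical shape of a provenance record; field names are illustrative."""
    event_type: str   # e.g. "creation" or "update"
    agent: str        # identifier of whoever made the change
    affected: tuple   # DiSCO identifiers created or updated
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


log: list = []


def record(event_type: str, agent: str, *disco_ids: str) -> Event:
    """Create an event for the given DiSCOs and append it to the log."""
    ev = Event(event_type, agent, tuple(disco_ids))
    log.append(ev)
    return ev


ev = record("creation", "http://example.org/agent/1", "ark:/00000/akfjlsdkf")
print(ev.event_type, ev.affected)
```

Freezing the dataclass reflects the intent that provenance, once written, is not edited.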

16 DiSCOs are versionable, have status
Diagram: DiSCO V1 (INACTIVE), DiSCO V2 (INACTIVE) and DiSCO V3 (ACTIVE), with a "Get Latest" label.
DiSCOs are also versionable. You can always retrieve previous versions or ask the system for the latest. Each version gets a new identifier. When you tell the system to create a new version, the previous version is marked as inactive.
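The versioning behaviour described on this slide (new identifier per version, previous version set to INACTIVE, "get latest") can be sketched as a small in-memory model. The `disco:vN` identifier scheme and the method names are hypothetical.

```python
import itertools


class VersionedDiscos:
    """Toy model: each version gets a new id; updating marks the old one INACTIVE."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.status = {}    # disco id -> "ACTIVE" | "INACTIVE"
        self.previous = {}  # new version id -> id of the version it replaced
        self.graphs = {}

    def create(self, triples) -> str:
        disco_id = f"disco:v{next(self._counter)}"  # hypothetical id scheme
        self.graphs[disco_id] = frozenset(triples)
        self.status[disco_id] = "ACTIVE"
        return disco_id

    def update(self, old_id: str, triples) -> str:
        if self.status.get(old_id) != "ACTIVE":
            raise ValueError(f"{old_id} is not the active version")
        new_id = self.create(triples)
        self.previous[new_id] = old_id
        self.status[old_id] = "INACTIVE"
        return new_id

    def history(self, disco_id: str) -> list:
        """Walk back to the first version, newest first."""
        chain = [disco_id]
        while chain[-1] in self.previous:
            chain.append(self.previous[chain[-1]])
        return chain

    def latest(self, disco_id: str) -> str:
        """Follow successors forward until reaching the newest version."""
        successors = {old: new for new, old in self.previous.items()}
        while disco_id in successors:
            disco_id = successors[disco_id]
        return disco_id


store = VersionedDiscos()
v1 = store.create({("ex:article", "ex:cites", "ex:page")})
v2 = store.update(v1, {("ex:article", "ex:cites", "ex:page2")})
v3 = store.update(v2, {("ex:article", "ex:cites", "ex:page3")})
print(store.latest(v1), store.status)
```

Old versions stay retrievable through `graphs` and `history`, which matches the point that previous versions can always be fetched.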

17 The ideal version of a DiSCO
Earlier I showed this diagram, which is an ideal version of a DiSCO. If you are purely harvesting metadata, you don't see this in the wild.

18 Relationship metadata is scattered, incomplete, and created at different times
Instead, the metadata, like the underlying objects, is scattered, incomplete and created at different times. It is difficult to coordinate the proliferation of this context.

19 DiSCOs can be connected through a shared URI
Diagram (Publisher Metadata): an ARTICLE, its creators (AGENTS), the JOURNAL it is partOf, and a second ARTICLE linked by citedBy.
In the case of RMap, it's OK if this metadata is piecemeal, because linked data lets you join it up based on shared identifiers. So if a publisher gives you this view…

20 DiSCOs can be connected through shared URIs
Diagram (DataCite Metadata): a DATASET with its creators (AGENTS), linked to the ARTICLE by supplementOf, to a BIG DATASET by derivedFrom, and to a WEBPAGE by cites.
And then DataCite gives you this view…

21 Combined view
Diagram: the publisher and DataCite views joined at the shared ARTICLE, together with AGENTS, JOURNAL, the citing ARTICLE, DATASET, BIG DATASET and WEBPAGE.
Through shared URIs, they join together.
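Joining the two views works because an RDF graph without blank nodes is just a set of triples, so merging is set union and the join points are the URIs that occur in both graphs. The short `ex:` identifiers and the direction of each relationship below are hypothetical stand-ins for the publisher and DataCite views in the diagrams.

```python
def shared_uris(graph_a, graph_b) -> set:
    """URIs that appear as subject or object in both graphs."""
    def nodes(g):
        return {s for s, _, _ in g} | {o for _, _, o in g}
    return nodes(graph_a) & nodes(graph_b)


publisher = {
    ("ex:article", "creators", "ex:agent1"),
    ("ex:article", "partOf", "ex:journal"),
    ("ex:article", "citedBy", "ex:article2"),
}
datacite = {
    ("ex:dataset", "creators", "ex:agent1"),
    ("ex:dataset", "supplementOf", "ex:article"),
    ("ex:dataset", "derivedFrom", "ex:bigdataset"),
    ("ex:dataset", "cites", "ex:webpage"),
}

# Merging URI-only graphs (no blank nodes) is plain set union.
combined = publisher | datacite
print(sorted(shared_uris(publisher, datacite)))  # ['ex:agent1', 'ex:article']
```

If either source used blank nodes instead of URIs, the union would no longer join them, which is why shared, persistent identifiers matter here.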

22 RMap allows you to view all links for a resource
Diagram: the links around a single ARTICLE: its creators (AGENTS), the JOURNAL it is partOf, a citing ARTICLE (citedBy) and a DATASET (supplementOf).
RMap allows you to ask it everything it knows about a particular URI, and it will return the combination of all DiSCOs that mention it.
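One way to answer "everything you know about this URI" is to keep an index from each resource to the DiSCOs that mention it, then collect the matching triples from those DiSCOs. This is a sketch of the idea with hypothetical identifiers; it does not reflect how RMap actually stores or queries its graphs.

```python
from collections import defaultdict


def index_discos(discos: dict) -> dict:
    """Map each resource URI to the ids of the DiSCOs whose triples mention it."""
    index = defaultdict(set)
    for disco_id, triples in discos.items():
        for s, _, o in triples:
            index[s].add(disco_id)
            index[o].add(disco_id)
    return index


def describe(resource: str, discos: dict, index: dict):
    """Return (ids of contributing DiSCOs, all triples that mention the resource)."""
    ids = sorted(index.get(resource, ()))
    triples = {t for d in ids for t in discos[d] if resource in (t[0], t[2])}
    return ids, triples


discos = {
    "disco:1": {("ex:article", "partOf", "ex:journal")},
    "disco:2": {("ex:dataset", "supplementOf", "ex:article"),
                ("ex:dataset", "cites", "ex:webpage")},
}
idx = index_discos(discos)
print(describe("ex:article", discos, idx))
```

Returning the contributing DiSCO ids alongside the triples corresponds to the list of DiSCOs the GUI shows next to the combined view.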

23 This is how it looks through the RMap GUI
This is how it looks through the RMap GUI. Notice the list of DiSCOs on the right that make up this visualization.

24 Preservation use cases
So what does this have to do with preservation?

25 This map represents a preservation challenge
Diagram: the map of ARTICLE, AUTHORS, cited ARTICLES and WEBSITES, CODE, DEVELOPER, DATASET, BIG DATASET and ORG.
This map of distributed resources represents a preservation challenge.

26 Institutional Archive
Preservation is often distributed, asynchronous and incomplete, with no repository having full context.
Diagram: an Institutional Archive, e.g. Portico, and e.g. Internet Archive, each holding some of the DATASET, ARTICLE, ARTICLES and WEBSITES.
Currently there are repositories that specialize in preserving specific types of material, and that's unlikely to change any time soon. It is unclear what their coverage is, and each holds only a piece of the work's context.

27 RMap can help add context around each research object
Diagram: the same repositories (Institutional Archive, e.g. Portico, e.g. Internet Archive) with the rest of the map (CODE, AUTHORS, DEVELOPER, BIG DATASET, ORG) drawn around them.
What RMap could do is help fill in that context for each piece and help determine how these works fit together. In fact, the RMap DiSCOs themselves could be valuable to preserve as well.

28 Expose the location of preserved copies
Diagram: a PRESERVED ARTICLE at PORTICO that isFormatOf the ARTICLE, hasParts FILES, and has a location and a status.
Another possible preservation use case would be to give archives an opportunity to identify which parts of the map they are preserving. For example, Portico could provide graphs that show which articles it holds preserved copies of and which files are included.
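A graph like the one on this slide could be produced by an archive as a handful of triples about its preserved copy. In this sketch, mapping the slide's "isFormatOf" and "hasParts" labels to the Dublin Core terms `dcterms:isFormatOf` and `dcterms:hasPart` is an assumption, and the `location` and `status` predicates and all `ex:` URIs are hypothetical.

```python
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace for location/status predicates


def preserved_copy_triples(original, preserved, files, archive, status):
    """Describe an archive's preserved copy of a resource, mirroring the slide labels:
    PRESERVED ARTICLE isFormatOf ARTICLE, hasParts FILES, location, status."""
    triples = {
        (preserved, DCTERMS + "isFormatOf", original),
        (preserved, EX + "location", archive),
        (preserved, EX + "status", status),
    }
    triples |= {(preserved, DCTERMS + "hasPart", f) for f in files}
    return triples


graph = preserved_copy_triples("ex:article", "ex:portico/copy1",
                               ["ex:portico/copy1/article.pdf", "ex:portico/copy1/article.xml"],
                               "ex:portico", "preserved")
for triple in sorted(graph):
    print(triple)
```

Because the preserved copy points back to the original article's URI, this graph joins the rest of the map in the same way as any other DiSCO.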

29 It could support discovery of preservation gaps
Diagram: the repositories and resource map from slide 27.
This could help identify possible preservation gaps in these distributed works.
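Finding gaps then reduces to a set difference: take every resource in the map and remove those that some archive graph reports a preserved copy of. The sketch assumes archives express this with `dcterms:isFormatOf`, as in the preserved-copy example of slide 28. That vocabulary choice, and all `ex:` identifiers, are hypothetical.

```python
ISFORMATOF = "http://purl.org/dc/terms/isFormatOf"


def preservation_gaps(map_resources, archive_graphs) -> list:
    """Resources in the map that no archive graph reports a preserved copy of,
    i.e. with no triple '<copy> dcterms:isFormatOf <resource>'."""
    preserved = {o for g in archive_graphs for _, p, o in g if p == ISFORMATOF}
    return sorted(set(map_resources) - preserved)


resources = ["ex:article", "ex:dataset", "ex:code", "ex:webpage"]
portico = {("ex:portico/1", ISFORMATOF, "ex:article")}
web_archive = {("ex:webarchive/1", ISFORMATOF, "ex:webpage")}
print(preservation_gaps(resources, [portico, web_archive]))  # ['ex:code', 'ex:dataset']
```

The result is only as complete as the archive graphs that have been contributed, so an empty gap list means "no known gaps", not "fully preserved".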

30 … perhaps be used to trigger preservation events
Diagram: the combined publisher and DataCite view, annotated "WARC it!".
RMap data could also be used to trigger preservation events. Say a map is created with a website reference; this could trigger a web archive to preserve it. That archive could then add its own DiSCO containing the new archived link.
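A trigger of this kind could inspect each new DiSCO for resources typed as web pages, queue the ones not yet archived, and afterwards emit a follow-up DiSCO linking the archived copy to the original. Using `schema:WebPage` for the type and `dcterms:isFormatOf` for the link are assumptions; the function names and `example.org` URIs are hypothetical, and no real archiving service is called.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
WEBPAGE = "http://schema.org/WebPage"
ISFORMATOF = "http://purl.org/dc/terms/isFormatOf"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


def pages_to_archive(disco_triples, already_archived) -> list:
    """Web pages typed as schema:WebPage in a new DiSCO that are not yet archived."""
    pages = {s for s, p, o in disco_triples if p == RDF_TYPE and o == WEBPAGE}
    return sorted(pages - set(already_archived))


def archive_follow_up(disco_id, original, archived_copy) -> set:
    """Triples an archive might contribute in its own DiSCO after capturing a page."""
    return {
        (disco_id, ORE_AGGREGATES, original),
        (disco_id, ORE_AGGREGATES, archived_copy),
        (archived_copy, ISFORMATOF, original),
    }


new_disco = {
    ("ex:dataset", "http://purl.org/dc/terms/references", "http://example.org/page"),
    ("http://example.org/page", RDF_TYPE, WEBPAGE),
    ("http://example.org/old", RDF_TYPE, WEBPAGE),
}
todo = pages_to_archive(new_disco, ["http://example.org/old"])
print(todo)  # ['http://example.org/page']
```

Keeping the trigger separate from the capture step means any archive could subscribe to new DiSCOs without RMap itself doing the archiving.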

31 Distribute context metadata across repositories
Diagram: Preservation Services, Publishers, Researchers, Registries, Funders, Data Repositories and Academic Institutions.
Finally, RMap could be used more generally to help harmonize metadata across multiple platforms.

32 Status
Working prototype: REST API, web GUI, harvesting tools. Code is on GitHub (…). Public sandboxes are available (see the tech wiki for docs). Work continues…
Here is the current status. There was some buzz at the conference about overlaying dinosaurs onto long-tail graphs… my graph doesn't have a long tail, so I've chosen a pterodactyl.

33 Acknowledgements RMap is funded by the Alfred P. Sloan Foundation
Thanks to RMap project colleagues: JHU: Sayeed Choudhury (PI), Aaron Birkland, Tim DiLauro Portico: Kate Wittenberg (PI), Sheila Morrissey, Jabin White, Vinay Cheruku, Amy Kirchhoff, John Meyer, Stephanie Orphan, Joseph Rogowski IEEE: Gerry Grenier (PI), Mark Donoghue, Renny Guida, Ken Rawson, Ken Moore.

34 http://www.rmap-project.info (general info)
(tech wiki) | @rmapproject | @karenhansn

